Evaluation Category | Enter Your Score
Appropriateness
Does this paper fit in EMNLP-CoNLL 2007?
(The focus of EMNLP-CoNLL 2007 is learned models and data-driven
systems concerning all aspects of human language. Both empirical and
theoretical results are welcome; see the
Call for Papers.)
5 = Appropriate for EMNLP-CoNLL. (most submissions)
4 = Computational linguistics or NLP, though it's not typical EMNLP or CoNLL material.
3 = Possibly relevant to the audience, though it's not quite computational linguistics or NLP.
2 = Only marginally relevant.
1 = Inappropriate.
| |
Clarity
For the reasonably well-prepared reader, is it clear what was done
and why? Is the paper well-written and well-structured? Does the
English or the mathematics need cleaning up? Would the explanation
benefit from more examples or pictures?
Is there sufficient detail for an expert to validate the work, i.e., by replicating experiments or filling in theoretical steps?
(Take into account whether any obscurity or minor English errors could be fixed with relatively
little effort, or whether the paper requires more work than is
likely to be carried out in the 2.5 weeks available.)
5 = Admirably clear.
4 = Understandable by most readers.
3 = Mostly understandable to me with some effort.
2 = Important questions were hard to resolve even with effort.
1 = Much of the paper is confusing.
| |
Originality / Innovativeness
How original is the approach? Does this paper break new ground in
topic, methodology, or content? How exciting and innovative is the
research it describes?
(Note that a paper could score high for originality even if the
results did not show a convincing benefit.)
5 = Surprising: Noteworthy new problem, technique, methodology, or insight.
4 = Creative: Relatively few people in our community would have put these ideas together.
3 = Somewhat conventional: A number of people could have come up with this if they thought about it for a while.
2 = Rather boring: Obvious, or a minor improvement on familiar techniques.
1 = Significant portions have actually been done before or done better.
| |
Soundness / Correctness
First, is the technical approach sound and well-chosen? Second,
can one trust the claims of the paper -- are they supported by
proper experiments, proofs, or other argumentation?
5 = The approach is very apt, and the claims are convincingly
supported.
4 = Generally solid work, though I have a few suggestions
about how to strengthen the technical approach or evaluation.
3 = Fairly reasonable work. The approach is not bad, and at least the
main claims are probably correct, but I am not entirely ready to accept them
(based on the material in the paper).
2 = Troublesome. There are some ideas worth salvaging here,
but the work should really have been done or evaluated differently, or
justified better.
1 = Fatally flawed.
| |
Meaningful Comparison
Does the author make clear where the
problems and methods sit with respect to existing literature?
Are the references adequate? Are
any experimental results meaningfully compared with
the best prior approaches?
5 = Precise and complete comparison with related work. Good job
given the space constraints.
4 = Mostly solid bibliography and comparison, but I have
some suggestions.
3 = Bibliography and comparison are somewhat helpful, but it could
be hard for a reader to determine exactly how this work relates
to previous work.
2 = Only partial awareness and understanding of related work, or a flawed empirical comparison.
1 = Little awareness of related work, or lacks necessary empirical comparison.
| |
Thoroughness [formerly called Depth]
Does this paper have enough substance,
or would it benefit from more ideas or results?
(Note that this question mainly concerns the amount of work;
its quality is evaluated in other categories.)
5 = Contains more ideas or results than most publications in this conference; goes the extra mile.
4 = Represents an appropriate amount of work for a
publication in this conference. (most submissions)
3 = Leaves open one or two natural questions that should have been
pursued within the paper.
2 = Work in progress. There are enough good ideas, but perhaps
not enough results yet.
1 = Seems thin. Not enough ideas here for a full-length paper.
| |
Impact of Ideas or Results
How significant is the work described? If the ideas are novel, will they also
be useful or inspirational? If the results are sound, are
they also important?
5 = Could alter other people's choice of research topics or basic approach.
4 = Some of the ideas or results will substantially help other people's ongoing research.
3 = Interesting but not too influential. The work will be cited, but mainly
for comparison or as a source of minor contributions.
2 = Marginally interesting. May or may not be cited.
1 = Will have no impact on the field.
| |
Impact of Resources
In addition to its direct intellectual contributions, does the
paper promise to release any new resources, such as an
implementation, a toolkit, or new data?
If so, is it clear what will be released and when? If so, will
these resources be valuable to others in the form in which they are
released? Do they fill an unmet need? Are they at least sufficient
to replicate or better understand the research in the paper?
(This question encourages authors to help the field advance, by releasing their systems, data, or tools.)
5 = Enabling: The newly released resources should affect other people's choice of research or development projects to undertake.
4 = Useful: I would recommend the new resources to other researchers or developers for their ongoing work.
3 = Potentially useful: Someone might find the new resources useful for their work.
2 = Documentary: The new resources are useful to study or replicate the reported research,
although for other purposes they may have limited interest or limited
usability. (this is a positive rating)
1 = No usable resources released. (most submissions)
| |
Recommendation
There are many good submissions competing for slots at EMNLP-CoNLL 2007; how
important is it to feature this one? Will people learn a
lot by reading this paper or seeing it presented?
In deciding on your ultimate recommendation, please think over all
your scores above. But remember that no paper is perfect, and
remember that we want a conference full of interesting, diverse, and
timely work. If a paper has some weaknesses, but you really got a
lot out of it, feel free to fight for it. If a paper is solid but you
could live without it, let us know that you're ambivalent. Remember
also that the author has a couple of weeks to address reviewer
comments before the camera-ready deadline.
Should the paper be accepted or rejected?
5 = Exciting: I'd fight to get it accepted
4 = Worthy: I would like to see it accepted
3 = Borderline: I'm ambivalent about this one
2 = Mediocre: I'd rather not see it in the conference
1 = Poor: I'd fight to have it rejected
Use an integer score (1-5) if you can, but if you
have trouble choosing one of the above options, half-points are
allowed.
| |
Reviewer Confidence
5 = Positive that my evaluation is correct. I read the paper
very carefully and am very familiar with related work.
4 = Quite sure. I tried to check the important points carefully,
and checked for uncited prior work. It's unlikely, though
conceivable, that I missed something that should affect my ratings.
3 = Pretty sure, but there's a chance I missed something.
Although I have a good feel for this area in general, I did not
carefully check the paper's details, e.g., math, experimental design,
novelty.
2 = Willing to defend evaluation, but it is fairly likely that I
missed some details, didn't understand some central points, or can't
be sure about the novelty of the work.
1 = Not my area, or paper is very hard to understand. My
evaluation is just an educated guess.
| |
Audience
If the paper is accepted, we will have to decide whether to present
it in a larger auditorium, a smaller auditorium, or a poster session.
This decision depends on the paper's quality but also may be affected
by scheduling considerations, perhaps including the size of its
likely audience.
Is the work addressed to a large subset of the community? This is
not a question about the quality of the work -- rather about the topic
and how it is presented. It asks who the natural audience would
be.
5 = Potentially relevant to many people from different parts of the EMNLP-CoNLL community.
4 = Potentially relevant to a large subcommunity.
3 = Potentially relevant to a small subcommunity.
2 = Potentially relevant to a few specialized researchers.
1 = Relevant only to the author.
Note that good specialized papers are welcome in the conference, so
a low score here does not imply a low overall recommendation.
| |