EMNLP-CoNLL 2007 Review Form

EMNLP-CoNLL 2007: 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning

Review Form (slightly improved after the conference)

Submission #7: On A Permutation-Invariant Class of Conference Paper Reviews

Authors: Jason Kudo

Reviewer: Taku Eisner

Secondary Reviewer (if any):

Summary Ranking

Please evaluate the submission according to the criteria below.

Evaluation Category Enter Your Score

Appropriateness
Does this paper fit in EMNLP-CoNLL 2007?

(The focus of EMNLP-CoNLL 2007 is learned models and data-driven systems concerning all aspects of human language. Both empirical and theoretical results are welcome; see the Call for Papers.)
5 = Appropriate for EMNLP-CoNLL. (most submissions)
4 = Computational linguistics or NLP, though it's not typical EMNLP or CoNLL material.
3 = Possibly relevant to the audience, though it's not quite computational linguistics or NLP.
2 = Only marginally relevant.
1 = Inappropriate.

Clarity
For the reasonably well-prepared reader, is it clear what was done and why? Is the paper well-written and well-structured? Does the English or the mathematics need cleaning up? Would the explanation benefit from more examples or pictures?

Is there sufficient detail for an expert to validate the work, i.e., by replicating experiments or filling in theoretical steps?

(Take into account whether any obscurity or minor English errors could be fixed with relatively little effort, or whether the paper requires more work than is likely to be carried out in the 2.5 weeks available.)
5 = Admirably clear.
4 = Understandable by most readers.
3 = Mostly understandable to me with some effort.
2 = Important questions were hard to resolve even with effort.
1 = Much of the paper is confusing.

Originality / Innovativeness
How original is the approach? Does this paper break new ground in topic, methodology, or content? How exciting and innovative is the research it describes?

(Note that a paper could score high for originality even if the results did not show a convincing benefit.)
5 = Surprising: Noteworthy new problem, technique, methodology, or insight.
4 = Creative: Relatively few people in our community would have put these ideas together.
3 = Somewhat conventional: A number of people could have come up with this if they thought about it for a while.
2 = Rather boring: Obvious, or a minor improvement on familiar techniques.
1 = Significant portions have actually been done before or done better.

Soundness / Correctness
First, is the technical approach sound and well-chosen? Second, can one trust the claims of the paper -- are they supported by proper experiments, proofs, or other argumentation?
5 = The approach is very apt, and the claims are convincingly supported.
4 = Generally solid work, though I have a few suggestions about how to strengthen the technical approach or evaluation.
3 = Fairly reasonable work. The approach is not bad, and at least the main claims are probably correct, but I am not entirely ready to accept them (based on the material in the paper).
2 = Troublesome. There are some ideas worth salvaging here, but the work should really have been done or evaluated differently, or justified better.
1 = Fatally flawed.

Meaningful Comparison
Does the author make clear where the problems and methods sit with respect to existing literature? Are the references adequate? Are any experimental results meaningfully compared with the best prior approaches?
5 = Precise and complete comparison with related work. Good job given the space constraints.
4 = Mostly solid bibliography and comparison, but I have some suggestions.
3 = Bibliography and comparison are somewhat helpful, but it could be hard for a reader to determine exactly how this work relates to previous work.
2 = Only partial awareness and understanding of related work, or a flawed empirical comparison.
1 = Little awareness of related work, or lacks necessary empirical comparison.

Thoroughness [formerly called Depth]
Does this paper have enough substance, or would it benefit from more ideas or results?

(Note that this question mainly concerns the amount of work; its quality is evaluated in other categories.)
5 = Contains more ideas or results than most publications in this conference; goes the extra mile.
4 = Represents an appropriate amount of work for a publication in this conference. (most submissions)
3 = Leaves open one or two natural questions that should have been pursued within the paper.
2 = Work in progress. There are enough good ideas, but perhaps not enough results yet.
1 = Seems thin. Not enough ideas here for a full-length paper.

Impact of Ideas or Results
How significant is the work described? If the ideas are novel, will they also be useful or inspirational? If the results are sound, are they also important?
5 = Could alter other people's choice of research topics or basic approach.
4 = Some of the ideas or results will substantially help other people's ongoing research.
3 = Interesting but not too influential. The work will be cited, but mainly for comparison or as a source of minor contributions.
2 = Marginally interesting. May or may not be cited.
1 = Will have no impact on the field.

Impact of Resources
In addition to its direct intellectual contributions, does the paper promise to release any new resources, such as an implementation, a toolkit, or new data?

If so, is it clear what will be released and when? If so, will these resources be valuable to others in the form in which they are released? Do they fill an unmet need? Are they at least sufficient to replicate or better understand the research in the paper?

(This question encourages authors to help the field advance, by releasing their systems, data, or tools.)
5 = Enabling: The newly released resources should affect other people's choice of research or development projects to undertake.
4 = Useful: I would recommend the new resources to other researchers or developers for their ongoing work.
3 = Potentially useful: Someone might find the new resources useful for their work.
2 = Documentary: The new resources are useful to study or replicate the reported research, although for other purposes they may have limited interest or limited usability. (this is a positive rating)
1 = No usable resources released. (most submissions)

Recommendation
There are many good submissions competing for slots at EMNLP-CoNLL 2007; how important is it to feature this one? Will people learn a lot by reading this paper or seeing it presented?

In deciding on your ultimate recommendation, please think over all your scores above. But remember that no paper is perfect, and remember that we want a conference full of interesting, diverse, and timely work. If a paper has some weaknesses, but you really got a lot out of it, feel free to fight for it. If a paper is solid but you could live without it, let us know that you're ambivalent. Remember also that the author has a couple of weeks to address reviewer comments before the camera-ready deadline.

Should the paper be accepted or rejected?
5 = Exciting: I'd fight to get it accepted
4 = Worthy: I would like to see it accepted
3 = Borderline: I'm ambivalent about this one
2 = Mediocre: I'd rather not see it in the conference
1 = Poor: I'd fight to have it rejected
Use an integer score (1-5) if you can, but if you have trouble choosing one of the above options, half-points are allowed.

Reviewer Confidence
5 = Positive that my evaluation is correct. I read the paper very carefully and am very familiar with related work.
4 = Quite sure. I tried to check the important points carefully, and checked for uncited prior work. It's unlikely, though conceivable, that I missed something that should affect my ratings.
3 = Pretty sure, but there's a chance I missed something. Although I have a good feel for this area in general, I did not carefully check the paper's details, e.g., math, experimental design, novelty.
2 = Willing to defend evaluation, but it is fairly likely that I missed some details, didn't understand some central points, or can't be sure about the novelty of the work.
1 = Not my area, or paper is very hard to understand. My evaluation is just an educated guess.

Audience
If the paper is accepted, we will have to decide whether to present it in a larger auditorium, a smaller auditorium, or a poster session. This decision depends on the paper's quality but also may be affected by scheduling considerations, perhaps including the size of its likely audience.

Is the work addressed to a large subset of the community? This is not a question about the quality of the work -- rather about the topic and how it is presented. It asks who the natural audience would be.
5 = Potentially relevant to many people from different parts of the EMNLP-CoNLL community.
4 = Potentially relevant to a large subcommunity.
3 = Potentially relevant to a small subcommunity.
2 = Potentially relevant to a few specialized researchers.
1 = Relevant only to the author.
Note that good specialized papers are welcome in the conference, so a low score here does not imply a low overall recommendation.


Detailed Comments

Please supply detailed comments to back up your rankings. These comments will be forwarded to the authors of the paper. The comments will help the committee decide the outcome of the paper, and will help justify this decision for the authors. Moreover, if the paper is accepted, the comments should guide the authors in making revisions for a final manuscript. Hence, the more detailed you make your comments, the more useful your review will be, both for the committee and for the authors.

Enter comments here:

Confidential Comments for Committee

You may wish to withhold some comments from the authors, and include them solely for the committee's internal use. For example, you may want to express a very strong (negative) opinion on the paper, which might offend the authors in some way. Or perhaps you wish to write something that would expose your identity to the authors. If you wish to share comments of this nature with the committee, this is the place to put them.