## The 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL 2007)

The main track of the conference received 398 paper submissions (not counting 21 that were withdrawn or rejected without review). We were able to accept 66 papers for presentation as full talks and 43 for presentation as posters. All 109 accepted papers were allowed 9 pages plus bibliography.

In a separate track, 22 specially designated short papers reported results in the CoNLL Shared Task competition, an annual tradition. Nine of these were presented as short talks.

The program also featured invited talks by LouAnn Gerken and Piotr Indyk, and a best paper talk.

The detailed schedule is below, with links to abstracts. The conference proceedings are freely available in the ACL Anthology (including all individual papers, the full proceedings volume to flip through, and a BibTeX database).

### CONFERENCE PROGRAM OVERVIEW

#### Thursday, June 28, 2007

9:00–10:45 Session 1: Plenary Session
10:45–11:15 Morning Break
11:15–12:30 Sessions 2a and 2b
12:30–14:00 Lunch
14:00–15:40 Sessions 3a and 3b
15:40–16:00 Afternoon Break
16:00–18:30 Session 4: All Posters

#### Friday, June 29, 2007

9:00–10:40 Sessions 5a and 5b
10:40–11:15 Morning Break
11:15–12:30 Sessions 6a and 6b
12:30–13:00 SIGNLL Business Meeting
12:30–14:00 Lunch
14:00–15:40 Sessions 7a and 7b
15:40–16:00 Afternoon Break
16:00–18:30 Session 8: All Posters

#### Saturday, June 30, 2007

9:00–10:00 Session 9: Plenary Session
10:00–10:50 Sessions 10a, 10b, and 10c
10:50–11:15 Morning Break
11:15–12:30 Sessions 11a, 11b, and 11c
12:30–14:00 Lunch
14:00–15:40 Sessions 12a, 12b, and 12c
15:40–16:15 Afternoon Break
16:15–17:30 Sessions 13a, 13b, and 13c
17:30 Closing Remarks

### CONFERENCE PROGRAM

#### Thursday, June 28, 2007

Session 1: Plenary Session
9:00–9:10 Opening Remarks
9:10–10:10 Invited Talk: Baby Bayesians? Evidence for Statistical Hypothesis Selection in Infant Language Learning
LouAnn Gerken, University of Arizona
10:15–10:45 Modelling Compression with Discourse Constraints
James Clarke and Mirella Lapata
Session 2a: Question Answering
11:15–11:40 Using Semantic Roles to Improve Question Answering
Dan Shen and Mirella Lapata
11:40–12:05 What is the Jeopardy Model? A Quasi-Synchronous Grammar for QA
Mengqiu Wang, Noah A. Smith and Teruko Mitamura
12:05–12:30 Learning Unsupervised SVM Classifier for Answer Selection in Web Question Answering
Youzheng Wu, Ruiqiang Zhang, Xinhui Hu and Hideki Kashioka
Session 2b: Machine Translation
11:15–11:40 Improving Word Alignment with Bridge Languages
Shankar Kumar, Franz J. Och and Wolfgang Macherey
11:40–12:05 Getting the Structure Right for Word Alignment: LEAF
Alexander Fraser and Daniel Marcu
12:05–12:30 Improving Statistical Machine Translation Using Word Sense Disambiguation
Marine Carpuat and Dekai Wu
Session 3a: Generation, Summarization, and Discourse
14:00–14:25 Large Margin Synchronous Generation and its Application to Sentence Compression
Trevor Cohn and Mirella Lapata
14:25–14:50 Incremental Text Structuring with Online Hierarchical Ranking
Erdong Chen, Benjamin Snyder and Regina Barzilay
14:50–15:15 Automatically Identifying the Arguments of Discourse Connectives
Ben Wellner and James Pustejovsky
15:15–15:40 Incremental Generation of Plural Descriptions: Similarity and Partitioning
Albert Gatt and Kees van Deemter
Session 3b: Parsing
14:00–14:25 A Comparative Evaluation of Deep and Shallow Approaches to the Automatic Detection of Common Grammatical Errors
Joachim Wagner, Jennifer Foster and Josef van Genabith
14:25–14:50 Characterizing the Errors of Data-Driven Dependency Parsing Models
Ryan McDonald and Joakim Nivre
14:50–15:15 Probabilistic Models of Nonprojective Dependency Trees
David A. Smith and Noah A. Smith
15:15–15:40 Structured Prediction Models via the Matrix-Tree Theorem
Terry Koo, Amir Globerson, Xavier Carreras and Michael Collins
Session 4: All Posters (16:00–18:30)
Using Foreign Inclusion Detection to Improve Parsing Performance
Beatrice Alex, Amit Dubey and Frank Keller
LEDIR: An Unsupervised Algorithm for Learning Directionality of Inference Rules
Rahul Bhagat, Patrick Pantel and Eduard Hovy
Modelling Polysemy in Adjective Classes by Multi-Label Classification
Gemma Boleda, Sabine Schulte im Walde and Toni Badia
Improving Query Spelling Correction Using Web Search Results
Qing Chen, Mu Li and Ming Zhou
Towards Robust Unsupervised Personal Name Disambiguation
Ying Chen and James Martin
Compressing Trigram Language Models With Golomb Coding
Kenneth Church, Ted Hart and Jianfeng Gao
Joint Morphological and Syntactic Disambiguation
Shay B. Cohen and Noah A. Smith
Unsupervised Part-of-Speech Acquisition for Resource-Scarce Languages
Sajib Dasgupta and Vincent Ng
Semi-Supervised Classification for Extracting Protein Interaction Sentences using Dependency Parsing
Gunes Erkan, Arzucan Ozgur and Dragomir R. Radev
A Sequence Alignment Model Based on the Averaged Perceptron
Dayne Freitag and Shahram Khadivi
Instance Based Lexical Entailment for Ontology Population
Claudio Giuliano and Alfio Gliozzo
Recovering Non-Local Dependencies for Chinese
Yuqing Guo, Haifeng Wang and Josef van Genabith
Exploiting Multi-Word Units in History-Based Probabilistic Generation
Deirdre Hogan, Conor Cafferkey, Aoife Cahill and Josef van Genabith
Hierarchical System Combination for Machine Translation
Fei Huang and Kishore Papineni
Using RBMT Systems to Produce Bilingual Corpus for SMT
Xiaoguang Hu, Haifeng Wang and Hua Wu
Why Doesn’t EM Find Good HMM POS-Taggers?
Mark Johnson
Probabilistic Coordination Disambiguation in a Fully-Lexicalized Japanese Parser
Daisuke Kawahara and Sadao Kurohashi
A New Perceptron Algorithm for Sequence Labeling with Non-Local Features
Jun’ichi Kazama and Kentaro Torisawa
Extending a Thesaurus in the Pan-Chinese Context
Oi Yee Kwong and Benjamin K. Tsou
Low-Quality Product Review Detection in Opinion Summarization
Jingjing Liu, Yunbo Cao, Chin-Yew Lin, Yalou Huang and Ming Zhou
Improving Statistical Machine Translation Performance by Training Data Selection and Optimization
Yajuan Lu, Jin Huang and Qun Liu
Topic Segmentation with Hybrid Document Indexing
Irina Matveeva and Gina-Anne Levow
Syntactic Re-Alignment Models for Machine Translation
Jonathan May and Kevin Knight
Detecting Compositionality of Verb-Object Combinations using Selectional Preferences
Diana McCarthy, Sriram Venkatapathy and Aravind Joshi
Explorations in Automatic Book Summarization
Rada Mihalcea and Hakan Ceylan
Part-of-Speech Tagging for Middle English through Alignment and Projection of Parallel Diachronic Texts
Taesun Moon and Jason Baldridge
Flexible, Corpus-Based Modelling of Human Plausibility Judgements
Sebastian Padó, Ulrike Padó and Katrin Erk
V-Measure: A Conditional Entropy-Based External Cluster Evaluation Measure
Andrew Rosenberg and Julia Hirschberg
Bayesian Document Generative Model with Explicit Multiple Topics
Issei Sato and Hiroshi Nakagawa
Smooth Bilingual N-Gram Translation
Holger Schwenk, Marta R. Costa-jussa and Jose A. R. Fonollosa
Morphological Disambiguation of Hebrew: A Case Study in Classifier Combination
Danny Shacham and Shuly Wintner
Enhancing Single-Document Summarization by Combining RankNet and Third-Party Sources
Krysta Svore, Lucy Vanderwende and Christopher Burges
Automatic Identification of Important Segments and Expressions for Mining of Business-Oriented Conversations at Contact Centers
Hironori Takeuchi, L Venkata Subramaniam, Tetsuya Nasukawa and Shourya Roy
Smoothed Bloom Filter Language Models: Tera-Scale LMs on the Cheap
David Talbot and Miles Osborne
Word Sense Disambiguation Incorporating Lexical and Structural Semantic Information
Takaaki Tanaka, Francis Bond, Timothy Baldwin, Sanae Fujita and Chikara Hashimoto
An Approach to Text Corpus Construction which Cuts Annotation Costs and Maintains Reusability of Annotated Data
Katrin Tomanek, Joachim Wermter and Udo Hahn
Antecedent Selection Techniques for High-Recall Coreference Resolution
Yannick Versley
Methods to Integrate a Language Model with Semantic Information for a Word Prediction Component
Tonio Wandmacher and Jean-Yves Antoine
Bilingual Cluster Based Models for Statistical Machine Translation
Hirofumi Yamamoto and Eiichiro Sumita
A Systematic Comparison of Training Criteria for Statistical Machine Translation
Richard Zens, Sasa Hasan and Hermann Ney
Phrase Reordering Model Integrating Syntactic Knowledge for SMT
Dongdong Zhang, Mu Li, Chi-Ho Li and Ming Zhou
Identification and Resolution of Chinese Zero Pronouns: A Machine Learning Approach
Shanheng Zhao and Hwee Tou Ng
Parsimonious Data-Oriented Parsing
Willem Zuidema

#### Friday, June 29, 2007

Session 5a: Semantics
9:00–9:25 Generating Lexical Analogies Using Dependency Relations
Andy Chiu, Pascal Poupart and Chrysanne DiMarco
9:25–9:50 Cross-Lingual Distributional Profiles of Concepts for Measuring Semantic Distance
Saif Mohammad, Iryna Gurevych, Graeme Hirst and Torsten Zesch
9:50–10:15 Lexical Semantic Relatedness with Random Graph Walks
Thad Hughes and Daniel Ramage
10:15–10:40 Experimental Evaluation of LTAG-Based Features for Semantic Role Labeling
Yudong Liu and Anoop Sarkar
Session 5b: Parsing
9:00–9:25 Japanese Dependency Analysis Using the Ancestor-Descendant Relation
Akihiro Tamura, Hiroya Takamura and Manabu Okumura
9:25–9:50 A Discriminative Learning Model for Coordinate Conjunctions
Masashi Shimbo and Kazuo Hara
9:50–10:15 Recovery of Empty Nodes in Parse Structures
Denis Filimonov and Mary Harper
10:15–10:40 Treebank Annotation Schemes and Parser Evaluation for German
Ines Rehbein and Josef van Genabith
Session 6a: Document Analysis
11:15–11:40 Semi-Markov Models for Sequence Segmentation
Qinfeng Shi, Yasemin Altun, Alex Smola and S.V.N. Vishwanathan
11:40–12:05 A Graph-Based Approach to Named Entity Categorization in Wikipedia Using Conditional Random Fields
Yotaro Watanabe, Masayuki Asahara and Yuji Matsumoto
12:05–12:30 MavenRank: Identifying Influential Members of the US Senate Using Lexical Centrality
Anthony Fader, Dragomir R. Radev, Michael H. Crespin, Burt L. Monroe, Kevin M. Quinn and Michael Colaresi
Session 6b: Grammar Learning
11:15–11:40 Bootstrapping Feature-Rich Dependency Parsers with Entropic Priors
David A. Smith and Jason Eisner
11:40–12:05 Online Learning of Relaxed CCG Grammars for Parsing to Logical Form
Luke Zettlemoyer and Michael Collins
12:05–12:30 The Infinite PCFG Using Hierarchical Dirichlet Processes
Percy Liang, Slav Petrov, Michael Jordan and Dan Klein
Session 7a: Information Extraction
14:00–14:25 Exploiting Wikipedia as External Knowledge for Named Entity Recognition
Jun’ichi Kazama and Kentaro Torisawa
14:25–14:50 Large-Scale Named Entity Disambiguation Based on Wikipedia Data
Silviu Cucerzan
14:50–15:15 Effective Information Extraction with Semantic Affinity Patterns and Relevant Regions
Siddharth Patwardhan and Ellen Riloff
15:15–15:40 Tree Kernel-Based Relation Extraction with Context-Sensitive Structured Parse Tree Information
GuoDong Zhou, Min Zhang, DongHong Ji and QiaoMing Zhu
Session 7b: Machine Translation
14:00–14:25 Chinese Syntactic Reordering for Statistical Machine Translation
Chao Wang, Michael Collins and Philipp Koehn
14:25–14:50 Binarizing Syntax Trees to Improve Syntax-Based Machine Translation Accuracy
Wei Wang, Kevin Knight and Daniel Marcu
14:50–15:15 What Can Syntax-Based MT Learn from Phrase-Based MT?
Steve DeNeefe, Kevin Knight, Wei Wang and Daniel Marcu
15:15–15:40 Online Large-Margin Training for Statistical Machine Translation
Taro Watanabe, Jun Suzuki, Hajime Tsukada and Hideki Isozaki
Session 8: All Posters (16:00–18:30)
Consult the list of poster titles under Session 4.

#### Saturday, June 30, 2007

Session 9: Plenary Session
9:00–10:00 Invited Talk: Hashing, Sketching, and Other Approximate Algorithms for High-Dimensional Data
Piotr Indyk, Massachusetts Institute of Technology
Session 10a: Machine Learning (supervised classifiers)
10:00–10:25 Scalable Term Selection for Text Categorization
Jingyang Li and Maosong Sun
10:25–10:50 Active Learning for Word Sense Disambiguation with Methods for Addressing the Class Imbalance Problem
Jingbo Zhu and Eduard Hovy
Session 10b: Machine Learning (sequential models)
10:00–10:25 Semi-Supervised Structured Output Learning Based on a Hybrid Generative and Discriminative Approach
Jun Suzuki, Akinori Fujino and Hideki Isozaki
10:25–10:50 Finding Good Sequential Model Structures using Output Transformations
Edward Loper
Session 10c: Information Retrieval
10:00–10:25 A Statistical Language Modeling Approach to Lattice-Based Spoken Document Retrieval
Tee Kiah Chia, Haizhou Li and Hwee Tou Ng
10:25–10:50 Learning Noun Phrase Query Segmentation
Shane Bergsma and Qin Iris Wang
Session 11a: Information Extraction
11:15–11:40 Bootstrapping Information Extraction from Field Books
Sander Canisius and Caroline Sporleder
11:40–12:05 Extracting Data Records from Unstructured Biomedical Full Text
Donghui Feng, Gully Burns and Eduard Hovy
12:05–12:30 Multiple Alignment of Citation Sentences with Conditional Random Fields and Posterior Decoding
Ariel Schwartz, Anna Divoli and Marti Hearst
Session 11b: Machine Translation
11:15–11:40 Large Language Models in Machine Translation
Thorsten Brants, Ashok C. Popat, Peng Xu, Franz J. Och and Jeffrey Dean
11:40–12:05 Factored Translation Models
Philipp Koehn and Hieu Hoang
12:05–12:30 Translating Unknown Words by Analogical Learning
Philippe Langlais and Alexandre Patry
Session 11c: Phonetics and Phonology
11:15–11:40 A Probabilistic Approach to Diachronic Phonology
Alexandre Bouchard, Percy Liang, Thomas Griffiths and Dan Klein
11:40–12:05 Learning Structured Models for Phone Recognition
Slav Petrov, Adam Pauls and Dan Klein
12:05–12:30 Inducing Search Keys for Name Filtering
L. Karl Branting
Session 12a: CoNLL Shared Task Session (dependency parsing)
14:00–14:15 The CoNLL 2007 Shared Task on Dependency Parsing
Joakim Nivre, Johan Hall, Sandra Kübler, Ryan McDonald, Jens Nilsson, Sebastian Riedel and Deniz Yuret
14:15–14:30 Single Malt or Blended? A Study in Multilingual Parser Optimization
Johan Hall, Jens Nilsson, Joakim Nivre, Gülsen Eryigit, Beáta Megyesi, Mattias Nilsson and Markus Saers
14:30–14:45 Probabilistic Parsing Action Models for Multi-Lingual Dependency Parsing
Xiangyu Duan, Jun Zhao and Bo Xu
14:45–15:00 Fast and Robust Multilingual Dependency Parsing with a Generative Latent Variable Model
Ivan Titov and James Henderson
15:00–15:15 Multilingual Dependency Parsing Using Global Features
Tetsuji Nakagawa
15:15–15:30 Experiments with a Higher-Order Projective Dependency Parser
Xavier Carreras
15:30–15:45 Log-Linear Models of Non-Projective Trees, k-best MST Parsing and Tree-Ranking
Keith Hall, Jiri Havelka and David A. Smith
Session 12b: Machine Translation
14:00–14:25 Improving Translation Quality by Discarding Most of the Phrasetable
Howard Johnson, Joel Martin, George Foster and Roland Kuhn
14:25–14:50 Hierarchical Phrase-Based Translation with Suffix Arrays
Adam Lopez
14:50–15:15 An Empirical Study on Computing Consensus Translations from Multiple Machine Translation Systems
Wolfgang Macherey and Franz J. Och
15:15–15:40 Learning to Find English to Chinese Transliterations on the Web
Jian-Cheng Wu and Jason S. Chang
Session 12c: Word Senses
14:00–14:25 Learning to Merge Word Senses
Rion Snow, Sushant Prakash, Daniel Jurafsky and Andrew Y. Ng
14:25–14:50 Improving Word Sense Disambiguation Using Topic Features
Junfu Cai, Wee Sun Lee and Yee Whye Teh
14:50–15:15 A Topic Model for Word Sense Disambiguation
Jordan Boyd-Graber, David Blei and Xiaojin Zhu
15:15–15:40 Validation and Evaluation of Automatically Acquired Multiword Expressions for Grammar Engineering
Aline Villavicencio, Valia Kordoni, Yi Zhang, Marco Idiart and Carlos Ramisch
Session 13a: CoNLL Shared Task Session (dependency parsing)
16:15–16:30 Dependency Parsing and Domain Adaptation with LR Models and Parser Ensembles
Kenji Sagae and Jun’ichi Tsujii
16:30–16:45 Frustratingly Hard Domain Adaptation for Dependency Parsing
Mark Dredze, John Blitzer, Partha Pratim Talukdar, Kuzman Ganchev, João Graca and Fernando Pereira
16:45–17:15 Analysis: Sandra Kübler, Ryan McDonald
17:15–17:30 Discussion
Session 13b: Sentiment
16:15–16:40 Crystal: Analyzing Predictive Opinions on the Web
Soo-Min Kim and Eduard Hovy
16:40–17:05 Extracting Aspect-Evaluation and Aspect-Of Relations in Opinion Mining
Nozomi Kobayashi, Kentaro Inui and Yuji Matsumoto
17:05–17:30 Building Lexicon for Sentiment Analysis from Massive Collection of HTML Documents
Nobuhiro Kaji and Masaru Kitsuregawa
Session 13c: Tagging
16:15–16:40 Determining Case in Arabic: Learning Complex Linguistic Behavior Requires Complex Linguistic Features
Nizar Habash, Ryan Gabbard, Owen Rambow, Seth Kulick and Mitch Marcus
16:40–17:05 Mandarin Part-of-Speech Tagging and Discriminative Reranking
Zhongqiang Huang, Mary Harper and Wen Wang
17:05–17:30 Building Domain-Specific Taggers without Annotated (Domain) Data
John Miller, Manabu Torii and K. Vijay-Shanker
Concluding Session
17:30 Closing Remarks
Additional CoNLL Shared Task Papers (dependency parsing)
Multilingual Dependency Parsing and Domain Adaptation using DeSR
Giuseppe Attardi, Felice Dell’Orletta, Maria Simi, Atanas Chanev and Massimiliano Ciaramita
Hybrid Ways to Improve Domain Independence in an ML Dependency Parser
Eckhard Bick
A Constraint Satisfaction Approach to Dependency Parsing
Sander Canisius and Erik Tjong Kim Sang
A Two-Stage Parser for Multilingual Dependency Parsing
Wenliang Chen, Yujie Zhang and Hitoshi Isahara
Incremental Dependency Parsing Using Online Learning
Richard Johansson and Pierre Nugues
Online Learning for Deterministic Dependency Parsing
Prashanth Reddy Mannem
Covington Variations
Svetoslav Marinov
A Multilingual Dependency Analysis System Using Online Passive-Aggressive Learning
Le-Minh Nguyen, Akira Shimazu, Phuong-Thai Nguyen and Xuan-Hieu Phan
Global Learning of Labeled Dependency Trees
Michael Schiehlen and Kristina Spranger
Pro3Gres Parser in the CoNLL Domain Adaptation Shared Task
Gerold Schneider, Kaarel Kaljurand, Fabio Rinaldi and Tobias Kuhn
Structural Correspondence Learning for Dependency Parsing
Nobuyuki Shimizu and Hiroshi Nakagawa
Adapting the RASP System for the CoNLL07 Domain-Adaptation Task
Rebecca Watson and Ted Briscoe
Multilingual Deterministic Dependency Parsing Framework using Modified Finite Newton Method Support Vector Machines
Yu-Chieh Wu, Jie-Chi Yang and Yue-Shi Lee

### CONFERENCE PROGRAM WITH ABSTRACTS

#### Thursday, June 28, 2007

Session 1: Plenary Session
9:00–9:10 Opening Remarks
9:10–10:10 Invited Talk: Baby Bayesians? Evidence for Statistical Hypothesis Selection in Infant Language Learning
LouAnn Gerken, University of Arizona

The past 20 years of work on human language learning have highlighted infants' extreme sensitivity to input statistics, including the frequency of particular phonemes and sequences, distributions of acoustic tokens of phoneme types, and transitional probabilities between adjacent and non-adjacent syllables. Recent work from our lab suggests that infants not only keep close track of statistical properties of their input, but also use their statistical sensitivity to select among hypotheses about the underlying structures that might have given rise to those statistics. After a brief introduction to infant testing methods and an overview of experiments demonstrating infants' statistical prowess, I will describe two new lines of research from our lab that are consistent with Bayesian approaches to linguistic generalization. One line of research focuses on the amount of evidence that infants need to generalize principles that are either found or not found among human languages. The other line focuses on how infants generalize from input that has at least two possible structural descriptions. The new research begins to provide a sketch of infants as hypothesis selectors whose hypothesis space is narrowed over development by the statistical properties of their input.
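
The Bayesian "size principle" underlying this kind of hypothesis selection can be illustrated with a toy calculation (this sketch and its numbers are ours, not from the talk): a narrower hypothesis that happens to fit the data gains posterior probability with every consistent observation, because it spreads its likelihood over fewer possible outcomes.

```python
def narrow_posterior(narrow_size, broad_size, n_obs):
    """Posterior probability of the narrower hypothesis after n_obs
    observations, all consistent with both hypotheses, assuming each
    hypothesis generates observations uniformly and priors are equal."""
    like_narrow = (1.0 / narrow_size) ** n_obs
    like_broad = (1.0 / broad_size) ** n_obs
    return like_narrow / (like_narrow + like_broad)

# With a 2-item vs. 4-item hypothesis space, a single consistent
# observation already favors the narrow hypothesis (posterior 2/3),
# and four observations push it above 0.94.
```
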

10:15–10:45 Modelling Compression with Discourse Constraints
James Clarke and Mirella Lapata

Sentence compression holds promise for many applications ranging from summarisation to subtitle generation and information retrieval. The task is typically performed on isolated sentences without taking the surrounding context into account, even though most applications would operate over entire documents. In this paper we present a discourse informed model which is capable of producing document compressions that are coherent and informative. Our model is inspired by theories of local coherence and formulated within the framework of Integer Linear Programming. Experimental results show significant improvements over a state-of-the-art discourse agnostic approach.
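
The flavor of objective such an ILP encodes can be sketched as follows (our own toy stand-in, not the paper's model: it uses exhaustive search instead of an ILP solver, made-up word scores, and a single dependency-style constraint in place of the paper's discourse constraints).

```python
from itertools import product

def compress(words, scores, max_keep, must_follow=()):
    """Pick the 0/1 keep-vector maximizing the total score of retained
    words, subject to a length budget and pairwise constraints (i, j):
    word j may be kept only if word i is kept. Brute force stands in
    for an Integer Linear Programming solver on this toy scale."""
    best, best_score = None, float("-inf")
    for keep in product((0, 1), repeat=len(words)):
        if sum(keep) > max_keep:
            continue
        if any(keep[j] and not keep[i] for i, j in must_follow):
            continue
        score = sum(s for s, k in zip(scores, keep) if k)
        if score > best_score:
            best_score, best = score, keep
    return [w for w, k in zip(words, best) if k]

# Keeping at most two words retains the two highest-scoring ones.
# compress(["the", "old", "cat", "slept"], [0.1, 0.4, 0.9, 0.8], 2)
# returns ["cat", "slept"]
```
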

Session 2a: Question Answering
11:15–11:40 Using Semantic Roles to Improve Question Answering
Dan Shen and Mirella Lapata

Shallow semantic parsing, the automatic identification and labeling of sentential constituents, has recently received much attention. Our work examines whether semantic role information is beneficial to question answering. We introduce a general framework for answer extraction which exploits semantic role annotations in the FrameNet paradigm. We view semantic role assignment as an optimization problem in a bipartite graph and answer extraction as an instance of graph matching. Experimental results on the TREC datasets demonstrate improvements over state-of-the-art models.

11:40–12:05 What is the Jeopardy Model? A Quasi-Synchronous Grammar for QA
Mengqiu Wang, Noah A. Smith and Teruko Mitamura

This paper presents a syntax-driven approach to question answering, specifically the answer-sentence selection problem for short-answer questions. Rather than using syntactic features to augment existing statistical classifiers (as in previous work), we build on the idea that questions and their (correct) answers relate to each other via loose but predictable syntactic transformations. We propose a probabilistic quasi-synchronous grammar, inspired by one proposed for machine translation (D. Smith and Eisner, 2006), and parameterized by mixtures of a robust non-lexical syntax/alignment model with an (optional) lexical-semantics-driven log-linear model. Our model learns soft alignments as a hidden variable in discriminative training. Experimental results on the TREC dataset show that our model significantly outperforms strong state-of-the-art baselines.

12:05–12:30 Learning Unsupervised SVM Classifier for Answer Selection in Web Question Answering
Youzheng Wu, Ruiqiang Zhang, Xinhui Hu and Hideki Kashioka

Previous machine learning techniques for answer selection in question answering (QA) have required question-answer training pairs. It has been too expensive and labor-intensive, however, to collect these training pairs. This paper presents a novel unsupervised support vector machine (U-SVM) classifier for answer selection, which is independent of language and does not require hand-tagged training pairs. The key ideas are the following: 1. unsupervised learning of training data for the classifier by clustering web search results; and 2. selecting the answer from the candidates by classifying the question. The comparative experiments demonstrate that the proposed approach significantly outperforms the retrieval-based model (Retrieval-M), the supervised SVM classifier (S-SVM), and the pattern-based model (Pattern-M) for answer selection. Moreover, the cross-model comparison showed that the performance ranking of these models was: U-SVM > Pattern-M > S-SVM > Retrieval-M.

Session 2b: Machine Translation
11:15–11:40 Improving Word Alignment with Bridge Languages
Shankar Kumar, Franz J. Och and Wolfgang Macherey

We describe an approach to improve Statistical Machine Translation (SMT) performance using multi-lingual, parallel, sentence-aligned corpora in several bridge languages. Our approach consists of a simple method for utilizing a bridge language to create a word alignment system and a procedure for combining word alignment systems from multiple bridge languages. The final translation is obtained by consensus decoding that combines hypotheses obtained using all bridge language word alignments. We present experiments showing that multilingual, parallel text in Spanish, French, Russian, and Chinese can be utilized in this framework to improve translation performance on an Arabic-to-English task.

11:40–12:05 Getting the Structure Right for Word Alignment: LEAF
Alexander Fraser and Daniel Marcu

Automatic word alignment is the problem of automatically annotating parallel text with translational correspondence. Previous generative word alignment models have made structural assumptions such as the 1-to-1, 1-to-N, or phrase-based consecutive word assumptions, while previous discriminative models have either made one of these assumptions directly or used features derived from a generative model using one of these assumptions. We present a new generative alignment model which avoids these structural limitations, and show that it is effective when trained using both unsupervised and semi-supervised training methods. Experiments show strong improvements in word alignment accuracy and usage of the generated alignments in hierarchical and phrasal SMT systems increases the BLEU score.

12:05–12:30 Improving Statistical Machine Translation Using Word Sense Disambiguation
Marine Carpuat and Dekai Wu

We show for the first time that incorporating the predictions of a word sense disambiguation system within a typical phrase-based statistical machine translation (SMT) model consistently improves translation quality across all three different IWSLT Chinese-English test sets, as well as producing statistically significant improvements on the larger NIST Chinese-English MT task---and moreover never hurts performance on any test set, according not only to BLEU but to all eight most commonly used automatic evaluation metrics. Recent work has challenged the assumption that word sense disambiguation (WSD) systems are useful for SMT. Yet SMT translation quality still obviously suffers from inaccurate lexical choice. In this paper, we address this problem by investigating a new strategy for integrating WSD into an SMT system, that performs fully phrasal multi-word disambiguation. Instead of directly incorporating a Senseval-style WSD system, we redefine the WSD task to match the exact same phrasal translation disambiguation task faced by phrase-based SMT systems. Our results provide the first known empirical evidence that lexical semantics are indeed useful for SMT, despite claims to the contrary.

Session 3a: Generation, Summarization, and Discourse
14:00–14:25 Large Margin Synchronous Generation and its Application to Sentence Compression
Trevor Cohn and Mirella Lapata

This paper presents a tree-to-tree transduction method for text rewriting. Our model is based on synchronous tree substitution grammar, a formalism that allows local distortion of the tree topology and can thus naturally capture structural mismatches. We describe an algorithm for decoding in this framework and show how the model can be trained discriminatively within a large margin framework. Experimental results on sentence compression bring significant improvements over a state-of-the-art model.

14:25–14:50 Incremental Text Structuring with Online Hierarchical Ranking
Erdong Chen, Benjamin Snyder and Regina Barzilay

Many emerging applications require documents to be repeatedly updated. Such documents include newsfeeds, webpages, and shared community resources such as Wikipedia. In this paper we address the task of inserting new information into existing texts. In particular, we wish to determine the best location in a text for a given piece of new information. For this process to succeed, the insertion algorithm should be informed by the existing document structure. Lengthy real-world texts are often hierarchically organized into chapters, sections, and paragraphs. We present an online ranking model which exploits this hierarchical structure -- representationally in its features and algorithmically in its learning procedure. When tested on a corpus of Wikipedia articles, our hierarchically informed model predicts the correct insertion paragraph more accurately than baseline methods.

14:50–15:15 Automatically Identifying the Arguments of Discourse Connectives
Ben Wellner and James Pustejovsky

In this paper we consider the problem of automatically identifying the arguments of discourse connectives (e.g. and, because, nevertheless) in the Penn Discourse TreeBank (PDTB). Rather than identifying the full extents of these arguments as annotated in the PDTB, however, we re-cast the problem to that of identifying the argument heads, effectively side-stepping the problem of discourse segmentation. We demonstrate significant gains using features derived from a dependency parse representation over those derived from a constituency-based tree parse. By also capturing inter-argument dependencies using a log-linear re-ranking model, we achieve very promising results on this difficult task, identifying both arguments correctly for over 74% of the connectives on held-out test data using gold-standard parses.

15:15–15:40 Incremental Generation of Plural Descriptions: Similarity and Partitioning
Albert Gatt and Kees van Deemter

Approaches to plural reference generation emphasise descriptive brevity, but often lack empirical backing. This paper describes a corpus-based study of plural descriptions, and proposes a psycholinguistically-motivated algorithm for plural reference generation. The descriptive strategy is based on partitioning, and incorporates corpus-derived heuristics. An exhaustive evaluation shows that the output closely matches human data.

Session 3b: Parsing
14:00–14:25 A Comparative Evaluation of Deep and Shallow Approaches to the Automatic Detection of Common Grammatical Errors
Joachim Wagner, Jennifer Foster and Josef van Genabith

This paper compares a deep and a shallow processing approach to the problem of classifying a sentence as grammatically well-formed or ill-formed. The deep processing approach uses the XLE LFG parser and English grammar: two versions are presented, one which uses the XLE directly to perform the classification, and another one which uses a decision tree trained on features consisting of the XLE's output statistics. The shallow processing approach predicts grammaticality based on n-gram frequency statistics: we present two versions, one which uses frequency thresholds and one which uses a decision tree trained on the frequencies of the rarest n-grams in the input sentence. We find that the use of a decision tree improves on the basic approach only for the deep parser-based approach. We also show that combining both the shallow and deep decision tree features is effective. Our evaluation is carried out using a large test set of grammatical and ungrammatical sentences. The ungrammatical test set is generated automatically by inserting grammatical errors into well-formed BNC sentences.
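
The shallow frequency-threshold idea can be sketched in a few lines (our own minimal illustration, with a toy corpus standing in for BNC-scale counts): train bigram counts on well-formed text, then flag a sentence if its rarest bigram is too infrequent.

```python
from collections import Counter

def bigram_counts(sentences):
    """Count bigrams (with boundary markers) over a training corpus
    of well-formed sentences."""
    counts = Counter()
    for sent in sentences:
        toks = ["<s>"] + sent.split() + ["</s>"]
        counts.update(zip(toks, toks[1:]))
    return counts

def looks_ungrammatical(sentence, counts, threshold=1):
    """Flag the sentence if its rarest bigram falls below the
    frequency threshold (unseen bigrams count as zero)."""
    toks = ["<s>"] + sentence.split() + ["</s>"]
    return min(counts[bg] for bg in zip(toks, toks[1:])) < threshold
```

A scrambled sentence such as "cat the sat" contains bigrams unseen in training and is flagged, while a sentence built from attested bigrams passes.
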

14:25–14:50 Characterizing the Errors of Data-Driven Dependency Parsing Models
Ryan McDonald and Joakim Nivre

We present a comparative error analysis of the two dominant approaches in data-driven dependency parsing: global, exhaustive, graph-based models, and local, greedy, transition-based models. We show that, in spite of similar performance overall, the two models produce different types of errors, in a way that can be explained by theoretical properties of the two models. This analysis leads to new directions for parser development.

14:50–15:15 Probabilistic Models of Nonprojective Dependency Trees
David A. Smith and Noah A. Smith

A notable gap in research on statistical dependency parsing is a proper conditional probability distribution over nonprojective dependency trees for a given sentence. We exploit the Matrix Tree Theorem (Tutte, 1984) to derive an algorithm that efficiently sums the scores of all nonprojective trees in a sentence, permitting the definition of a conditional log-linear model over trees. While discriminative methods, such as those presented in McDonald et al. (2005), obtain very high accuracy on standard dependency parsing tasks and can be trained and applied without marginalization, "summing trees" permits some alternative techniques of interest. Using the summing algorithm, we present experimental results on four nonprojective languages, for maximum conditional likelihood estimation, minimum Bayes-risk parsing, and hidden variable training.

15:15–15:40Structured Prediction Models via the Matrix-Tree Theorem
Terry Koo, Amir Globerson, Xavier Carreras and Michael Collins

This paper provides an algorithmic framework for learning statistical models involving directed spanning trees, or equivalently non-projective dependency structures. We show how partition functions and marginals for directed spanning trees can be computed by an adaptation of Kirchhoff's Matrix-Tree Theorem. To demonstrate an application of the method, we perform experiments which use the algorithm in training both log-linear and max-margin dependency parsers. The new training methods give improvements in accuracy over perceptron-trained models.
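
Both Matrix-Tree papers above rest on the same core computation: the partition function over directed spanning trees is the determinant of a minor of the graph Laplacian. A toy sketch of that computation (the edge weights and the pure-Python determinant are ours, purely for illustration, not either paper's code):

```python
def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in m]
    n = len(m)
    d = 1.0
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(m[r][i]))
        if abs(m[p][i]) < 1e-12:
            return 0.0
        if p != i:
            m[i], m[p] = m[p], m[i]
            d = -d
        d *= m[i][i]
        for r in range(i + 1, n):
            f = m[r][i] / m[i][i]
            for c in range(i, n):
                m[r][c] -= f * m[i][c]
    return d

def partition_function(w):
    """Sum of edge-weight products over all spanning arborescences
    rooted at node 0, via the directed Matrix-Tree Theorem.
    w[h][d] is the weight of edge h -> d (head to dependent)."""
    n = len(w)
    # Laplacian: L[d][d] = total incoming weight, L[h][d] = -w[h][d]
    L = [[0.0] * n for _ in range(n)]
    for d in range(n):
        for h in range(n):
            if h != d:
                L[d][d] += w[h][d]
                L[h][d] = -w[h][d]
    # Delete the root's row and column and take the determinant.
    minor = [[L[r][c] for c in range(1, n)] for r in range(1, n)]
    return det(minor)

# Uniform weights: counts the 3 spanning trees of the complete digraph on 3 nodes.
print(partition_function([[0, 1, 1], [1, 0, 1], [1, 1, 0]]))  # 3.0
```

With weights set to exp(score), this quantity is exactly the normaliser of the conditional log-linear models both papers train.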

Session 4: All Posters (16:00–18:30)
Using Foreign Inclusion Detection to Improve Parsing Performance
Beatrice Alex, Amit Dubey and Frank Keller

Inclusions from other languages can be a significant source of errors for monolingual parsers. We show this for English inclusions, which are sufficiently frequent to present a problem when parsing German. We describe an annotation-free approach for accurately detecting such inclusions, and develop two methods for interfacing this approach with a state-of-the-art parser for German. An evaluation on the TIGER corpus shows that our inclusion entity model achieves a performance gain of 4.3 points in F-score over a baseline of no inclusion detection, and even outperforms a parser with access to gold standard part-of-speech tags.

LEDIR: An Unsupervised Algorithm for Learning Directionality of Inference Rules
Rahul Bhagat, Patrick Pantel and Eduard Hovy

Semantic inference is a core component of many natural language applications. In response, several researchers have developed algorithms for automatically learning inference rules from textual corpora. However, these rules are often either imprecise or underspecified in directionality. In this paper we propose an algorithm called LEDIR that filters incorrect inference rules and identifies the directionality of correct ones. Based on an extension to Harris's distributional hypothesis, we use selectional preferences to gather evidence of inference directionality and plausibility. Experiments show empirical evidence that our approach can classify inference rules significantly better than several baselines.

Modelling Polysemy in Adjective Classes by Multi-Label Classification
Gemma Boleda, Sabine Schulte im Walde and Toni Badia

This paper assesses the role of multi-label classification in modelling polysemy for language acquisition tasks. We focus on the acquisition of semantic classes for Catalan adjectives, and show that polysemy acquisition naturally suits architectures used for multi-label classification. Furthermore, we explore the performance of information drawn from different levels of linguistic description, using feature sets based on morphology, syntax, semantics, and n-gram distribution. Finally, we demonstrate that ensemble classifiers are a powerful and adequate way to combine different types of linguistic evidence: a simple, majority voting ensemble classifier improves the accuracy from 62.5% (best single classifier) to 84%.

Improving Query Spelling Correction Using Web Search Results
Qing Chen, Mu Li and Ming Zhou

Traditional research on spelling correction in the natural language processing and information retrieval literature mostly relies on pre-defined lexicons to detect spelling errors. But this method does not work well for web query spelling correction, because no lexicon can cover the vast number of terms occurring across the web. Recent work showed that using search query logs helps to solve this problem to some extent. However, such approaches cannot deal well with rarely-used query terms due to the data sparseness problem. In this paper, a novel method is proposed that uses web search results to improve existing query spelling correction models based solely on query logs, by leveraging the rich information on the web related to the query and its top-ranked candidate. Experiments are performed on real-world queries randomly sampled from a search engine's daily logs, and the results show that our new method can achieve a 16.9% relative F-measure improvement and a 35.4% overall error rate reduction in comparison with the baseline method.

Towards Robust Unsupervised Personal Name Disambiguation
Ying Chen and James Martin

The increasing use of large open-domain document sources is exacerbating the problem of ambiguity in named entities. This paper explores the use of a range of syntactic and semantic features in unsupervised clustering of documents that result from ad hoc queries containing names. From these experiments, we find that the use of robust syntactic and semantic features can significantly improve the state of the art for disambiguation performance for personal names for both Chinese and English.

Compressing Trigram Language Models With Golomb Coding
Kenneth Church, Ted Hart and Jianfeng Gao

Trigram language models are compressed using a Golomb coding method inspired by the original Unix spell program. Compression methods trade off space, time and accuracy (loss). The proposed HashTBO method optimizes space at the expense of time and accuracy. Trigram language models are normally considered memory hogs, but with HashTBO, it is possible to squeeze a trigram language model into a few megabytes or less. HashTBO made it possible to ship a trigram contextual speller in Microsoft Office 2007.
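
The core of Golomb coding is short enough to sketch. This is a generic Golomb code (unary quotient plus truncated-binary remainder), not the HashTBO implementation; the choice of parameter m is left to the caller:

```python
def golomb_encode(n, m):
    """Golomb-encode non-negative integer n with parameter m:
    unary quotient, then truncated-binary remainder."""
    q, r = divmod(n, m)
    out = "1" * q + "0"                  # quotient in unary
    b = max(1, (m - 1).bit_length())     # remainder bit budget
    t = (1 << b) - m                     # truncated-binary threshold
    if r < t:
        if b > 1:
            out += format(r, "0%db" % (b - 1))
    else:
        out += format(r + t, "0%db" % b)
    return out

def golomb_decode(bits, m):
    """Decode a single Golomb-coded integer from a bit string."""
    q = bits.index("0")                  # leading 1s give the quotient
    i = q + 1
    b = max(1, (m - 1).bit_length())
    t = (1 << b) - m
    x = int(bits[i:i + b - 1], 2) if b > 1 else 0
    r = x if x < t else int(bits[i:i + b], 2) - t
    return q * m + r
```

Round-tripping `golomb_decode(golomb_encode(n, m), m) == n` holds for any n >= 0 and m >= 1; small values get short codes, which suits the skewed count distributions of n-gram statistics.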

Joint Morphological and Syntactic Disambiguation
Shay B. Cohen and Noah A. Smith

In morphologically rich languages, should morphological and syntactic disambiguation be treated sequentially or as a single problem? We describe several efficient, probabilistically-interpretable ways to apply joint inference to morphological and syntactic disambiguation using lattice parsing. Joint inference is shown to compare favorably to pipeline parsing methods across a variety of component models. State-of-the-art performance on Hebrew Treebank parsing is demonstrated using the new method. The benefits of joint inference are modest with the current component models, but appear to increase as components themselves improve.

Unsupervised Part-of-Speech Acquisition for Resource-Scarce Languages
Sajib Dasgupta and Vincent Ng

This paper proposes a new bootstrapping approach to unsupervised part-of-speech induction for resource-scarce languages. In comparison to previous bootstrapping algorithms developed for this problem, our approach aims to improve the quality of the seed clusters by employing seed words that are both distributionally and morphologically reliable. In particular, we present a novel method for combining morphological and distributional information for seed selection. Experimental results demonstrate that our approach works well for English and Bengali, thus providing suggestive evidence that it is applicable to both morphologically impoverished languages and highly inflectional languages.

Semi-Supervised Classification for Extracting Protein Interaction Sentences using Dependency Parsing
Gunes Erkan, Arzucan Ozgur and Dragomir R. Radev

We introduce a relation extraction method to identify the sentences in biomedical text that indicate an interaction among the protein names mentioned. Our approach is based on the analysis of the paths between two protein names in the dependency parse trees of the sentences. Given two dependency trees, we define two separate similarity functions (kernels) based on cosine similarity and edit distance among the paths between the protein names. Using these similarity functions, we investigate the performance of two classes of learning algorithms, Support Vector Machines and k-nearest-neighbor, and the semi-supervised counterparts of these algorithms, transductive SVMs and harmonic functions, respectively. We report significant improvement over previous results in the literature and introduce a new benchmark dataset. The semi-supervised algorithms outperform their supervised counterparts by a wide margin, especially when the amount of labeled data is limited.

A Sequence Alignment Model Based on the Averaged Perceptron
Dayne Freitag and Shahram Khadivi

We describe a discriminatively trained sequence alignment model based on the averaged perceptron. In common with other approaches to sequence modeling using perceptrons, and in contrast with comparable generative models, this model permits and transparently exploits arbitrary features of input strings. The simplicity of perceptron training lends more versatility than comparable approaches, allowing the model to be applied to a variety of problem types for which a learned edit model might be useful. We enumerate some of these problem types, describe a training procedure for each, and evaluate the model's performance on several problems. We show that the proposed model performs at least as well as an approach based on statistical machine translation on two problems of name transliteration, and provide evidence that the combination of the two approaches promises further improvement.
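
The averaged-perceptron update at the heart of the model is simple; here is a minimal sketch for plain binary classification (not the authors' alignment model — the sparse feature representation and toy data are invented for illustration):

```python
def train_averaged_perceptron(data, epochs=10):
    """Averaged perceptron over sparse feature dicts.

    data: list of (features, label) pairs with label in {-1, +1}.
    Returns the weight vector averaged over all steps, which is
    what gives the averaged perceptron its robustness."""
    w, totals, steps = {}, {}, 0
    for _ in range(epochs):
        for feats, y in data:
            score = sum(w.get(f, 0.0) * v for f, v in feats.items())
            if y * score <= 0:                 # mistake-driven update
                for f, v in feats.items():
                    w[f] = w.get(f, 0.0) + y * v
            steps += 1
            for f, val in w.items():           # naive running sum for the average
                totals[f] = totals.get(f, 0.0) + val
    return {f: total / steps for f, total in totals.items()}

avg = train_averaged_perceptron([({"good": 1.0}, 1), ({"bad": 1.0}, -1)])
```

In the paper's setting, features would be defined over candidate edit operations between two strings rather than over whole inputs, but the mistake-driven update and the averaging are the same.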

Instance Based Lexical Entailment for Ontology Population
Claudio Giuliano and Alfio Gliozzo

In this paper we propose an instance based method for lexical entailment and apply it to automatic ontology population from text. The approach is fully unsupervised and based on kernel methods. We demonstrate the effectiveness of our technique, which largely surpasses both the random and most-frequent baselines and outperforms current state-of-the-art unsupervised approaches on a benchmark ontology available in the literature.

Recovering Non-Local Dependencies for Chinese
Yuqing Guo, Haifeng Wang and Josef van Genabith

To date, work on Non-Local Dependencies (NLDs) has focused almost exclusively on English, and it is an open research question how well these approaches migrate to other languages. This paper surveys non-local dependency constructions in Chinese as represented in the Penn Chinese Treebank (CTB) and provides an approach for generating proper predicate-argument-modifier structures, including NLDs, from surface context-free phrase structure trees. Our approach recovers non-local dependencies at the level of Lexical-Functional Grammar f-structures, using automatically acquired subcategorisation frames and f-structure paths linking antecedents and traces in NLDs. Currently our algorithm achieves 92.2% f-score for trace insertion and 84.3% for antecedent recovery evaluating on gold-standard CTB trees, and 64.7% and 54.7%, respectively, on CTB-trained state-of-the-art parser output trees.

Exploiting Multi-Word Units in History-Based Probabilistic Generation
Deirdre Hogan, Conor Cafferkey, Aoife Cahill and Josef van Genabith

We present a simple history-based model for sentence generation from LFG f-structures, which improves on the accuracy of previous models by breaking down PCFG independence assumptions so that more conditioning context is used in the prediction of grammar rule expansions. In addition, we present work on experiments with named entities and other multi-word units, showing a statistically significant improvement of generation accuracy.

Hierarchical System Combination for Machine Translation
Fei Huang and Kishore Papineni

Given multiple translations of the same source sentence, how can we combine them to produce a translation that is better than any single system's output? We propose a hierarchical system combination framework for machine translation. This framework integrates multiple MT systems' output at the word-, phrase- and sentence-levels. By boosting common word and phrase translation pairs, pruning unused phrases, and exploring decoding paths adopted by other MT systems, this framework achieves better translation quality with much less re-decoding time. The full sentence translation hypotheses from multiple systems are additionally selected based on N-gram language models trained on a word/word-POS mixed stream, which further improves the translation quality. We consistently observed significant improvements on several test sets in multiple languages covering different genres.

Using RBMT Systems to Produce Bilingual Corpus for SMT
Xiaoguang Hu, Haifeng Wang and Hua Wu

This paper proposes a method that uses an existing Rule-based Machine Translation (RBMT) system as a black box to produce a synthetic bilingual corpus, which is then used as training data for a Statistical Machine Translation (SMT) system. We use the existing RBMT system to translate a monolingual corpus into a synthetic bilingual corpus. With the synthetic bilingual corpus, we can build an SMT system even if there is no real bilingual corpus. In our experiments using BLEU as a metric, the system achieves a relative improvement of 11.7% over the best RBMT system used to produce the synthetic bilingual corpora. We also interpolate the model trained on a real bilingual corpus with the models trained on the synthetic bilingual corpora. The interpolated model achieves an absolute improvement of 0.0245 BLEU (13.1% relative) compared with the individual model trained on the real bilingual corpus.

Why Doesn't EM Find Good HMM POS-Taggers?
Mark Johnson

This paper investigates why the HMMs estimated by Expectation-Maximization (EM) produce such poor results as Part-of-Speech (POS) taggers. We find that the HMMs estimated by EM generally assign a roughly equal number of word tokens to each hidden state, while the empirical distribution of tokens to POS tags is highly skewed.

This motivates a Bayesian approach using a sparse prior to bias the estimator toward such a skewed distribution. We investigate Gibbs Sampling (GS) and Variational Bayes (VB) estimators and show that VB converges faster than GS for this task and that VB significantly improves 1-to-1 tagging accuracy over EM. We also show that EM does nearly as well as VB when the number of hidden HMM states is dramatically reduced. We also point out the high variance in all of these estimators, and that they require many more iterations to approach convergence than usually thought.

Probabilistic Coordination Disambiguation in a Fully-Lexicalized Japanese Parser
Daisuke Kawahara and Sadao Kurohashi

This paper describes a probabilistic model for coordination disambiguation integrated into syntactic and case structure analysis. Our model probabilistically assesses the parallelism of a candidate coordinate structure using syntactic/semantic similarities and cooccurrence statistics. We integrate these probabilities into the framework of fully-lexicalized parsing based on large-scale case frames. This approach simultaneously addresses two tasks of coordination disambiguation: the detection of coordinate conjunctions and the scope disambiguation of coordinate structures. Experimental results on web sentences indicate the effectiveness of our approach.

A New Perceptron Algorithm for Sequence Labeling with Non-Local Features
Jun’ichi Kazama and Kentaro Torisawa

Current major methods of sequence labeling, such as CRFs, cannot use non-local features due to concerns about complexity. We propose a new perceptron algorithm that can use non-local features. Our algorithm allows the use of all types of non-local features whose values are determined from the sequence and the labels. The weights of local and non-local features are learned together in the training process with guaranteed convergence. We present experimental results from the CoNLL 2003 named entity recognition (NER) task to demonstrate the performance of the proposed algorithm.

Extending a Thesaurus in the Pan-Chinese Context
Oi Yee Kwong and Benjamin K. Tsou

In this paper, we address a unique problem in Chinese language processing and report on our study on extending a Chinese thesaurus with region-specific words, mostly from the financial domain, from various Chinese speech communities. With the larger goal of automatically constructing a Pan-Chinese lexical resource, this work aims at taking an existing semantic classificatory structure as leverage and incorporating new words into it. In particular, it is important to see if the classification could accommodate new words from heterogeneous data sources, and whether simple similarity measures and clustering methods could cope with such variation. We use the cosine function for similarity and test it on automatically classifying 120 target words from four regions, using different datasets for the extraction of feature vectors. The automatic classification results were evaluated against human judgement, and the performance was encouraging, with accuracy reaching over 85% in some cases. Thus while human judgement is not straightforward and it is difficult to create a Pan-Chinese lexicon manually, it is observed that combining simple clustering methods with the appropriate data sources appears to be a promising approach toward its automatic construction.
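
The similarity computation described above is standard cosine over feature vectors. A minimal sketch with invented toy vectors (in the paper the vectors come from corpus extraction, and the categories are thesaurus classes rather than these hypothetical labels):

```python
import math

def cosine(u, v):
    """Cosine similarity of two sparse vectors (dicts of feature counts)."""
    dot = sum(u[f] * v[f] for f in u if f in v)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def classify(target_vec, category_vecs):
    """Assign a target word to the category whose aggregate
    vector is most cosine-similar to the word's vector."""
    return max(category_vecs, key=lambda c: cosine(target_vec, category_vecs[c]))

# Toy context-count vectors, invented for illustration
categories = {
    "finance": {"bank": 4, "stock": 3, "rate": 2},
    "weather": {"rain": 5, "wind": 2, "cold": 1},
}
print(classify({"stock": 2, "rate": 1}, categories))  # -> finance
```

The paper's finding is that the quality of such a classifier depends heavily on which data source supplies the feature vectors, more than on the similarity function itself.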

Low-Quality Product Review Detection in Opinion Summarization
Jingjing Liu, Yunbo Cao, Chin-Yew Lin, Yalou Huang and Ming Zhou

Product reviews posted at online shopping sites vary greatly in quality. This paper addresses the problem of detecting low-quality product reviews. Three types of biases in the existing evaluation standard of product reviews are discovered. To assess the quality of product reviews, a set of specifications for judging the quality of reviews is first defined. A classification-based approach is proposed to detect the low-quality reviews. We apply the proposed approach to enhance opinion summarization in a two-stage framework. Experimental results show that the proposed approach effectively (1) discriminates low-quality reviews from high-quality ones and (2) enhances the task of opinion summarization by detecting and filtering low-quality reviews.

Improving Statistical Machine Translation Performance by Training Data Selection and Optimization
Yajuan Lu, Jin Huang and Qun Liu

A parallel corpus is an indispensable resource for translation model training in statistical machine translation (SMT). Instead of collecting more and more parallel training corpora, this paper aims to improve SMT performance by exploiting the full potential of the existing parallel corpora. Two kinds of methods are proposed: offline data optimization and online model optimization. The offline method adapts the training data by redistributing the weight of each training sentence pair. The online method adapts the translation model by redistributing the weight of each predefined submodel. An information retrieval model is used for the weighting scheme in both methods. Experimental results show that without using any additional resources, both methods can improve SMT performance significantly.

Topic Segmentation with Hybrid Document Indexing
Irina Matveeva and Gina-Anne Levow

We present a domain-independent unsupervised topic segmentation approach based on hybrid document indexing. Lexical chains have been successfully employed to evaluate lexical cohesion of text segments and to predict topic boundaries. Our approach is based on the notion of semantic cohesion. It uses spectral embedding to estimate semantic association between content nouns over a span of multiple text segments. Our method significantly outperforms the baseline on the topic segmentation task and achieves performance comparable to state-of-the-art methods that incorporate domain-specific information.

Syntactic Re-Alignment Models for Machine Translation
Jonathan May and Kevin Knight

We present a method for improving word alignment for statistical syntax-based machine translation that employs a syntactically informed alignment model closer to the translation model than commonly-used word alignment models. This leads to extraction of more useful linguistic patterns and improved BLEU scores on translation experiments in Chinese and Arabic.

Detecting Compositionality of Verb-Object Combinations using Selectional Preferences
Diana McCarthy, Sriram Venkatapathy and Aravind Joshi

In this paper we explore the use of selectional preferences for detecting non-compositional verb-object combinations. To characterise the arguments in a given grammatical relationship we experiment with three models of selectional preference. Two use WordNet and one uses the entries from a distributional thesaurus as classes for representation. In previous work on selectional preference acquisition, the classes used for representation are selected according to the coverage of argument tokens rather than being selected according to the coverage of argument types. In our distributional thesaurus models and one of the methods using WordNet we select classes for representing the preferences by virtue of the number of argument types that they cover, and then only tokens under these classes which are representative of the argument head data are used to estimate the probability distribution for the selectional preference model. We demonstrate a highly significant correlation between measures which use these 'type-based' selectional preferences and compositionality judgements from a data set used in previous research. The type-based models perform better than the models which use tokens for selecting the classes. Furthermore, the models which use the automatically acquired thesaurus entries produced the best results. The correlation for the thesaurus models is stronger than any of the individual features used in previous research on the same dataset.

Explorations in Automatic Book Summarization
Rada Mihalcea and Hakan Ceylan

Most of the text summarization research carried out to date has been concerned with the summarization of short documents (e.g., news stories, technical reports), and very little work if any has been done on the summarization of very long documents. In this paper, we try to address this gap and explore the problem of book summarization. We introduce a new data set specifically designed for the evaluation of systems for book summarization, and describe summarization techniques that explicitly account for the length of the documents.

Part-of-Speech Tagging for Middle English through Alignment and Projection of Parallel Diachronic Texts
Taesun Moon and Jason Baldridge

We demonstrate an approach for inducing a tagger for historical languages based on existing resources for their modern varieties. Tags from Present Day English source text are projected to Middle English text using alignments on parallel Biblical text. We explore the use of multiple alignment approaches and a bigram tagger to reduce the noise in the projected tags. Finally, we train a maximum entropy tagger on the output of the bigram tagger on the target Biblical text and test it on tagged Middle English text. This leads to tagging accuracy in the low 80's on Biblical test material and in the 60's on other Middle English material. Our results suggest that our bootstrapping methods have considerable potential, and could be used to semi-automate an approach based on incremental manual annotation.

Flexible, Corpus-Based Modelling of Human Plausibility Judgements
Sebastian Padó, Ulrike Padó and Katrin Erk

In this paper, we consider the computational modelling of human plausibility judgements for verb-relation-argument triples, a task equivalent to the computation of selectional preferences. Such models have applications both in psycholinguistics and in computational linguistics.

By extending a recent model, we obtain a completely corpus-driven model for this task which achieves significant correlations with human judgements. It rivals or exceeds deeper, resource-driven models while exhibiting higher coverage. Moreover, we show that our model can be combined with deeper models to obtain better predictions than from either model alone.

V-Measure: A Conditional Entropy-Based External Cluster Evaluation Measure
Andrew Rosenberg and Julia Hirschberg

We present V-measure, an external entropy-based cluster evaluation measure. V-measure provides an elegant solution to many problems that affect previously defined cluster evaluation measures, including 1) dependence on clustering algorithm or data set, 2) the "problem of matching", where the clustering of only a portion of the data points is evaluated, and 3) accurate evaluation and combination of two desirable aspects of clustering, homogeneity and completeness. We compare V-measure to a number of popular cluster evaluation measures and demonstrate that it satisfies several desirable properties of clustering solutions, using simulated clustering results. Finally, we use V-measure to evaluate two clustering tasks: document clustering and pitch accent type clustering.
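
V-measure itself is straightforward to compute from a labeled clustering. A sketch following the definitions in the abstract — homogeneity and completeness from conditional entropies, combined by a (weighted) harmonic mean; this is our reading of the measure, not the authors' code:

```python
import math
from collections import Counter

def v_measure(classes, clusters, beta=1.0):
    """V-measure from parallel lists of gold class labels and
    cluster assignments."""
    n = len(classes)
    joint = Counter(zip(classes, clusters))
    c_tot, k_tot = Counter(classes), Counter(clusters)

    def entropy(counts):
        return -sum(m / n * math.log(m / n) for m in counts.values() if m)

    h_c, h_k = entropy(c_tot), entropy(k_tot)
    # Conditional entropies H(C|K) and H(K|C) from the joint counts
    h_c_k = -sum(m / n * math.log(m / k_tot[k]) for (c, k), m in joint.items())
    h_k_c = -sum(m / n * math.log(m / c_tot[c]) for (c, k), m in joint.items())

    hom = 1.0 if h_c == 0 else 1.0 - h_c_k / h_c   # homogeneity
    com = 1.0 if h_k == 0 else 1.0 - h_k_c / h_k   # completeness
    if beta * hom + com == 0:
        return 0.0
    return (1 + beta) * hom * com / (beta * hom + com)
```

A permutation of cluster labels leaves the score unchanged, which is exactly the independence from the "problem of matching" the abstract claims.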

Bayesian Document Generative Model with Explicit Multiple Topics
Issei Sato and Hiroshi Nakagawa

In this paper, we propose a novel probabilistic generative model to deal with multiple-topic documents: the Parametric Dirichlet Mixture Model (PDMM). PDMM extends an existing probabilistic generative model, the Parametric Mixture Model (PMM), via a hierarchical Bayes model. PMM models multiple-topic documents by mixing the model parameters of each single topic with equal mixture ratios. PDMM models multiple-topic documents by mixing the model parameters of each single topic with mixture ratios following a Dirichlet distribution. We evaluate PDMM and PMM by comparing F-measures on the MEDLINE corpus. The evaluation showed that PDMM is more effective than PMM.

Smooth Bilingual N-Gram Translation
Holger Schwenk, Marta R. Costa-jussa and Jose A. R. Fonollosa

We address the problem of smoothing translation probabilities in a bilingual N-gram-based statistical machine translation system. It is proposed to project the bilingual tuples onto a continuous space and to estimate the translation probabilities in this representation. A neural network is used to perform the projection and the probability estimation.

Smoothing probabilities is most important for tasks with a limited amount of training material. We consider here the BTEC task of the 2006 IWSLT evaluation. Improvements in all official automatic measures are reported when translating from Italian to English. Using a continuous space model for both the translation model and the target language model, an improvement of 1.5 BLEU on the test data is observed.

Morphological Disambiguation of Hebrew: A Case Study in Classifier Combination
Danny Shacham and Shuly Wintner

Morphological analysis and disambiguation are crucial stages in a variety of natural language processing applications, especially when languages with complex morphology are concerned. We present a system which disambiguates the output of a morphological analyzer for Hebrew. It consists of several simple classifiers and a module which combines them under linguistically motivated constraints. We investigate a number of techniques for combining the predictions of the classifiers. Our best result, 91.44% accuracy, reflects a 25% reduction in error rate compared with the previous state of the art.

Enhancing Single-Document Summarization by Combining RankNet and Third-Party Sources
Krysta Svore, Lucy Vanderwende and Christopher Burges

We present a new approach to automatic summarization based on neural nets, called NetSum. We extract a set of features from each sentence that helps identify its importance in the document. We apply novel features based on news search query logs and Wikipedia entities. Using the RankNet learning algorithm, we train a pair-based sentence ranker to score every sentence in the document and identify the most important sentences. We apply our system to documents gathered from CNN.com, where each document includes highlights and an article. Our system significantly outperforms the standard baseline in the ROUGE-1 measure on over 70% of our document set.

Automatic Identification of Important Segments and Expressions for Mining of Business-Oriented Conversations at Contact Centers
Hironori Takeuchi, L Venkata Subramaniam, Tetsuya Nasukawa and Shourya Roy

Textual records of business-oriented conversations between customers and agents need to be analyzed properly to acquire useful business insights that improve productivity. For such an analysis, it is critical to identify appropriate textual segments and expressions to focus on, especially when the textual data consists of complete transcripts, which are often lengthy and redundant. In this paper, we propose a method to identify important segments from the conversations by looking for changes in the accuracy of a categorizer designed to separate different business outcomes. We extract effective expressions from the important segments to define various viewpoints. In text mining a viewpoint defines the important associations between key entities and it is crucial that the correct viewpoints are identified. We show the effectiveness of the method by using real datasets from a car rental service center.

Smoothed Bloom Filter Language Models: Tera-Scale LMs on the Cheap
David Talbot and Miles Osborne

A Bloom filter (BF) is a randomised data structure for set membership queries. Its space requirements fall significantly below lossless information-theoretic lower bounds but it produces false positives with some quantifiable probability. Here we present a general framework for deriving smoothed language model probabilities from BFs.

We investigate how a BF containing n-gram statistics can be used as a direct replacement for a conventional n-gram model. Recent work has demonstrated that corpus statistics can be stored efficiently within a BF; here we consider how smoothed language model probabilities can be derived efficiently from this randomised representation. Our proposal takes advantage of the one-sided error guarantees of the BF and simple inequalities that hold between related n-gram statistics in order to further reduce the BF storage requirements and the error rate of the derived probabilities. We use these models as replacements for a conventional language model in machine translation experiments.
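
A minimal Bloom filter for n-gram membership, to make the one-sided error guarantee concrete. This is a generic textbook BF, not the paper's smoothed-LM construction; the sizes and hash scheme below are arbitrary choices for illustration:

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter for set-membership queries.
    One-sided error: no false negatives; false positives occur
    with a quantifiable probability that shrinks with size_bits."""

    def __init__(self, size_bits, num_hashes):
        self.m = size_bits
        self.k = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item):
        # k derived hash positions from salted md5 digests
        for i in range(self.k):
            h = hashlib.md5(("%d:%s" % (i, item)).encode()).hexdigest()
            yield int(h, 16) % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item):
        return all(self.bits[p // 8] >> (p % 8) & 1 for p in self._positions(item))

bf = BloomFilter(size_bits=10000, num_hashes=4)
bf.add("language model")
bf.add("model probabilities")
print("language model" in bf)  # True: an added n-gram is never missed
# an unseen n-gram may, rarely, come back as a false positive
```

The paper's contribution sits on top of such a structure: it derives smoothed LM probabilities from stored counts while exploiting the guarantee that a stored count is never silently dropped.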

Word Sense Disambiguation Incorporating Lexical and Structural Semantic Information
Takaaki Tanaka, Francis Bond, Timothy Baldwin, Sanae Fujita and Chikara Hashimoto

We present results showing that incorporating lexical and structural semantic information is effective for word sense disambiguation. We evaluated the method using precise information from a large treebank and an ontology automatically created from dictionary sentences. Exploiting this information improves precision by 2-3% overall, and by 5.7% for verbs, over a model using only bag-of-words and n-gram features.

An Approach to Text Corpus Construction which Cuts Annotation Costs and Maintains Reusability of Annotated Data
Katrin Tomanek, Joachim Wermter and Udo Hahn

We consider the impact Active Learning (AL) has on effective and efficient text corpus annotation, and report on reductions in annotation effort of up to 72%. We also address the issue of whether a corpus annotated by means of AL -- using a particular classifier and a particular feature set -- can be re-used to train classifiers different from the ones employed by AL, supplying alternative feature sets as well. Finally, we report on our experience with the AL paradigm under real-world conditions, i.e., the annotation of large-scale document corpora for the life sciences.

Antecedent Selection Techniques for High-Recall Coreference Resolution
Yannick Versley

We investigate methods to improve the recall in coreference resolution by also trying to resolve those definite descriptions where no earlier mention of the referent shares the same lexical head (coreferent bridging). The problem, which is notably harder than identifying coreference relations among mentions which have the same lexical head, has been tackled with several rather different approaches, and we attempt to provide a meaningful classification along with a quantitative comparison. Based on the different merits of the methods, we discuss possibilities to improve them and show how they can be effectively combined.

Methods to Integrate a Language Model with Semantic Information for a Word Prediction Component
Tonio Wandmacher and Jean-Yves Antoine

Most current word prediction systems make use of n-gram language models (LM) to estimate the probability of the following word in a phrase. In the past years there have been many attempts to enrich such language models with further syntactic or semantic information. We want to explore the predictive powers of Latent Semantic Analysis (LSA), a method that has been shown to provide reliable information on long-distance semantic dependencies between words in a context. We present and evaluate here several methods that integrate LSA-based information with a standard language model: a semantic cache, partial reranking, and different forms of interpolation. We found that all methods show significant improvements, compared to the 4-gram baseline, and most of them to a simple cache model as well.

Bilingual Cluster Based Models for Statistical Machine Translation
Hirofumi Yamamoto and Eiichiro Sumita

We propose utilizing domain-specific models for statistical machine translation. It is well known that domain-specific language models perform well in automatic speech recognition, and domain-specific language and translation models likewise perform well in statistical machine translation. However, there are two problems with using domain-specific models. The first is data sparseness; an adaptation technique is used to avoid this problem. The second is domain estimation: for adaptation, the domain must be given in advance, but in many cases the domain is not given or changes dynamically. In this case, not only the translation of the source sentence but also the domain must be estimated. This paper focuses on the domain estimation problem in statistical machine translation. In the proposed method, a bilingual training corpus is automatically clustered into sub-corpora, each of which is regarded as a domain. A domain is estimated by the similarity between the translation source sentence and each sub-corpus. The estimated domain-specific (sub-corpus-specific) language and translation models are then used for the translation. On the IWSLT05 Japanese-to-English evaluation set, this method gave a 2.7-point improvement in BLEU score (52.4 to 55.1). These results indicate the validity of the proposed bilingual cluster-based models.

A Systematic Comparison of Training Criteria for Statistical Machine Translation
Richard Zens, Sasa Hasan and Hermann Ney

We address the problem of training the free parameters of a statistical machine translation system. We show significant improvements over a state-of-the-art minimum error rate training baseline on a large Chinese-English translation task. We present novel training criteria based on maximum likelihood estimation and expected loss computation. Additionally, we compare the maximum a-posteriori decision rule and the minimum Bayes risk decision rule. We show that not only from a theoretical point but also in terms of translation quality the minimum Bayes risk decision rule is preferable.

Phrase Reordering Model Integrating Syntactic Knowledge for SMT
Dongdong Zhang, Mu Li, Chi-Ho Li and Ming Zhou

The reordering model is important for statistical machine translation (SMT). Current phrase-based SMT systems are good at capturing local reordering but not global reordering. This paper introduces syntactic knowledge to improve the global reordering capability of an SMT system. Syntactic knowledge such as boundary words, POS information, and dependencies is used to guide phrase reordering. Constraints over the syntax tree are proposed to avoid reordering errors, and the syntax tree is also modified to strengthen the capability of capturing phrase reordering. Furthermore, combining parse trees can compensate for reordering errors caused by a single parse tree. Finally, experimental results show that the performance of our system is superior to that of a state-of-the-art phrase-based SMT system.

Identification and Resolution of Chinese Zero Pronouns: A Machine Learning Approach
Shanheng Zhao and Hwee Tou Ng

In this paper, we present a machine learning approach to the identification and resolution of Chinese anaphoric zero pronouns. We perform both identification and resolution automatically, with two sets of easily computable features. Experimental results show that our proposed learning approach achieves anaphoric zero pronoun resolution accuracy comparable to a previous state-of-the-art, heuristic rule-based approach. To our knowledge, our work is the first to perform both identification and resolution of Chinese anaphoric zero pronouns using a machine learning approach.

Parsimonious Data-Oriented Parsing
Willem Zuidema

This paper explores a parsimonious approach to Data-Oriented Parsing. While allowing, in principle, all possible subtrees of trees in the treebank to be productive elements, our approach aims at finding a manageable subset of these subtrees that can accurately describe empirical distributions over phrase-structure trees. The proposed algorithm leads to computationally much more tractable parsers, as well as linguistically more informative grammars. The parser is evaluated on the OVIS and WSJ corpora, and shows improvements in efficiency, parse accuracy, and test-set likelihood.

#### Friday, June 29, 2007

Session 5a: Semantics
9:00–9:25Generating Lexical Analogies Using Dependency Relations
Andy Chiu, Pascal Poupart and Chrysanne DiMarco

A lexical analogy is a pair of word-pairs that share a similar semantic relation. Lexical analogies occur frequently in text and are useful in various natural language processing tasks. In this study, we present a system that generates lexical analogies automatically from text data. Our system discovers semantically related pairs of words by using dependency relations, and applies novel machine learning algorithms to match these word-pairs to form lexical analogies. Empirical evaluation shows that our system generates valid lexical analogies with a precision of 70%, and produces quality output although not at the level of the best human-generated lexical analogies.

9:25–9:50Cross-Lingual Distributional Profiles of Concepts for Measuring Semantic Distance
Saif Mohammad, Iryna Gurevych, Graeme Hirst and Torsten Zesch

We present the idea of estimating semantic distance in one, possibly resource-poor, language using a knowledge source in another, possibly resource-rich, language. We do so by creating cross-lingual distributional profiles of concepts, using a bilingual lexicon and a bootstrapping algorithm, but without the use of any sense-annotated data or word-aligned corpora. The cross-lingual measures of semantic distance are evaluated on two tasks: (1) estimating semantic distance between words and ranking the word pairs according to semantic distance, and (2) solving 'Reader's Digest Word Power' problems. In task (1), cross-lingual measures are superior to conventional monolingual measures based on a wordnet. In task (2), cross-lingual measures are able to solve more problems correctly, and despite scores being affected by many tied answers, their overall performance is again better than the best monolingual measures.

9:50–10:15Lexical Semantic Relatedness with Random Graph Walks
Thad Hughes and Daniel Ramage

Many systems for tasks such as question answering, multi-document summarization, and information retrieval need robust numerical measures of lexical relatedness. Standard thesaurus-based measures of word pair similarity are based on only a single path between those words in the thesaurus graph. By contrast, we propose a new model of lexical semantic relatedness that incorporates information from every explicit or implicit path connecting the two words in the entire graph. Our model uses a random walk over nodes and edges derived from WordNet links and corpus statistics. We treat the graph as a Markov chain and compute a word-specific stationary distribution via a generalized PageRank algorithm. Semantic relatedness of a word pair is scored by a novel divergence measure, ZKL, that outperforms existing measures on certain classes of distributions. The resulting relatedness measure is the WordNet-based measure most highly correlated with human similarity judgments by rank ordering at rho=.90.
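
The walk-and-compare construction in this abstract can be sketched in a few lines. Everything below is illustrative: a tiny hand-made graph stands in for the WordNet-derived graph, the teleport probability is arbitrary, and a symmetrized KL divergence stands in for the paper's ZKL measure.

```python
import numpy as np

# Toy adjacency over four "word" nodes (hypothetical stand-in for WordNet links).
words = ["cat", "dog", "pet", "car"]
A = np.array([
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)
P = A / A.sum(axis=1, keepdims=True)  # row-stochastic transition matrix

def stationary(source, beta=0.15, iters=200):
    """Word-specific stationary distribution: a random walk that teleports
    back to the source word with probability beta (the generalized-PageRank
    idea the abstract describes)."""
    n = len(words)
    e = np.zeros(n); e[source] = 1.0
    v = np.full(n, 1.0 / n)
    for _ in range(iters):
        v = beta * e + (1 - beta) * (v @ P)
    return v

v_cat, v_dog, v_car = (stationary(words.index(w)) for w in ("cat", "dog", "car"))

def sym_kl(p, q, eps=1e-12):
    """Symmetrized KL divergence, a simple stand-in for the paper's ZKL."""
    p, q = p + eps, q + eps
    kl = lambda a, b: float(np.sum(a * np.log(a / b)))
    return kl(p, q) + kl(q, p)

# Under this measure "cat" should come out closer to "dog" than to "car".
closer = sym_kl(v_cat, v_dog) < sym_kl(v_cat, v_car)
print(closer)
```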

10:15–10:40Experimental Evaluation of LTAG-Based Features for Semantic Role Labeling
Yudong Liu and Anoop Sarkar

This paper proposes the use of Lexicalized Tree-Adjoining Grammar (LTAG) formalism as an important additional source of features for the Semantic Role Labeling (SRL) task. Using a set of one-vs-all Support Vector Machines (SVMs), we evaluate these LTAG-based features. Our experiments show that LTAG-based features can improve SRL accuracy significantly. When compared with the best known set of features that are used in state of the art SRL systems we obtain an improvement in F-score from 82.34% to 85.25%.

Session 5b: Parsing
9:00–9:25Japanese Dependency Analysis Using the Ancestor-Descendant Relation
Akihiro Tamura, Hiroya Takamura and Manabu Okumura

We propose a novel method for Japanese dependency analysis, which is usually reduced to the construction of a dependency tree. In deterministic approaches to this task, dependency trees are constructed by series of actions of attaching a bunsetsu chunk to one of the nodes in the tree being constructed. Conventional techniques select the node based on whether the new bunsetsu chunk and each node in the trees are in a parent-child relation or not. However, tree structures include relations between two nodes other than parent-child relation. Therefore, we use ancestor-descendant relation in addition to parent-child relation, so that the added redundancy helps errors be corrected. Experimental results show that the proposed method achieves higher accuracy.

9:25–9:50A Discriminative Learning Model for Coordinate Conjunctions
Masashi Shimbo and Kazuo Hara

We propose a sequence-alignment based method for detecting and disambiguating coordinate conjunctions. In this method, averaged perceptron learning is used to adapt the substitution matrix to training data drawn from the target language and domain. To reduce the cost of training data construction, our method accepts training examples in which complete word-by-word alignment labels are missing and only the boundaries of coordinated conjuncts are marked. We report promising empirical results in detecting and disambiguating coordinated noun phrases in the GENIA corpus, despite the relatively small number of training examples and the minimal features employed.

9:50–10:15Recovery of Empty Nodes in Parse Structures
Denis Filimonov and Mary Harper

In this paper, we describe a new algorithm for recovering WH-trace empty nodes. Our approach combines a set of hand-written patterns together with a probabilistic model. Because the patterns heavily utilize regular expressions, the pertinent tree structures are covered using a limited number of patterns. The probabilistic model is essentially a probabilistic context-free grammar (PCFG) approach with the patterns acting as the terminals in production rules. We evaluate the algorithm’s performance on gold trees and parser output using three different metrics. Our method compares favorably with state-of-the-art algorithms that recover WH-traces.

10:15–10:40Treebank Annotation Schemes and Parser Evaluation for German
Ines Rehbein and Josef van Genabith

Recent studies focussed on the question whether less-configurational languages like German are harder to parse than English, or whether the lower parsing scores are an artifact of treebank encoding schemes and data structures, as claimed by Kübler et al. (2006). In this paper we present new experiments to test this claim. We use the PARSEVAL metric, the Leaf-Ancestor metric as well as a dependency-based evaluation, and present novel approaches measuring the effect of controlled error insertion on treebank trees and parser output. We also provide extensive cross-treebank conversion. The results of the experiments show that, contrary to Kübler et al. (2006), the question whether or not German is harder to parse than English remains undecided.

Session 6a: Document Analysis
11:15–11:40Semi-Markov Models for Sequence Segmentation
Qinfeng Shi, Yasemin Altun, Alex Smola and S.V.N. Vishwanathan

In this paper, we study the problem of automatically segmenting written text into paragraphs. This is inherently a sequence labeling problem. However, previous approaches ignore this dependency. We propose a novel approach for this task, namely training Semi Markov models discriminatively using a Max-Margin method. This method allows us to model the sequence dependency of the problem and to incorporate properties of a whole paragraph, such as coherence, which cannot be used in previous methods. Experimental evaluation on a collection of English and German books shows improvement over the previous state-of-the-art method on this task.

11:40–12:05A Graph-Based Approach to Named Entity Categorization in Wikipedia Using Conditional Random Fields
Yotaro Watanabe, Masayuki Asahara and Yuji Matsumoto

This paper presents a method for categorizing named entities in Wikipedia. In Wikipedia, an anchor text is glossed in a linked HTML text. We formalize named entity categorization as a task of categorizing anchor texts with linked HTML texts that gloss a named entity. Using this representation, we introduce a graph structure in which anchor texts are regarded as nodes. In order to incorporate HTML structure into the graph, three types of cliques are defined based on the HTML tree structure. We propose a method with Conditional Random Fields (CRFs) to categorize the nodes on the graph. Since the defined graph may include cycles, exact inference for CRFs is computationally expensive. We introduce an approximate inference method using Tree-based Reparameterization (TRP) to reduce the computational cost. In experiments, our proposed model obtained significant improvements compared to baseline models that use Support Vector Machines.

12:05–12:30MavenRank: Identifying Influential Members of the US Senate Using Lexical Centrality
Anthony Fader, Dragomir R. Radev, Michael H. Crespin, Burt L. Monroe, Kevin M. Quinn and Michael Colaresi

We introduce a technique for identifying the most salient participants in a discussion. Our method, MavenRank, is based on lexical centrality: a random walk is performed on a graph in which each node is a participant in the discussion and an edge links two participants who use similar rhetoric. As a test, we used MavenRank to identify the most influential members of the US Senate using data from the US Congressional Record, and used committee ranking to evaluate the output. Our results show that MavenRank scores are largely driven by committee status in most topics, but can capture speaker centrality in topics where speeches are used to indicate ideological position rather than to influence legislation.

Session 6b: Grammar Learning
11:15–11:40Bootstrapping Feature-Rich Dependency Parsers with Entropic Priors
David A. Smith and Jason Eisner

One may need to build a statistical parser for a new language, using only a very small labeled treebank together with raw text. We argue that bootstrapping a parser is most promising when the model uses a rich set of redundant features, as in recent models for scoring dependency parses (McDonald et al., 2005). Drawing on Abney's (2004) analysis of the Yarowsky algorithm, we perform bootstrapping by entropy regularization: we maximize a linear combination of conditional likelihood on labeled data and confidence (negative Renyi entropy) on unlabeled data. In initial experiments, this surpassed EM for training a simple feature-poor generative model, and also improved the performance of a feature-rich, conditionally estimated model where EM could not easily have been applied. For our models and training sets, more peaked measures of confidence, measured by Renyi entropy, outperformed smoother ones. We discuss how our feature set could be extended with cross-lingual or cross-domain features, to incorporate knowledge from parallel or comparable corpora during bootstrapping.

11:40–12:05Online Learning of Relaxed CCG Grammars for Parsing to Logical Form
Luke Zettlemoyer and Michael Collins

We consider the problem of learning to parse sentences to lambda-calculus representations of their underlying semantics and present an algorithm that learns a weighted combinatory categorial grammar (CCG). A key idea is to introduce non-standard CCG combinators that relax certain parts of the grammar---for example allowing flexible word order, or insertion of lexical items---with learned costs. We also present a new, online algorithm for inducing a weighted CCG. Results for the approach on ATIS data show 86% F-measure in recovering fully correct semantic analyses and 95.9% F-measure by a partial-match criterion, a more than 5% improvement over the 90.3% partial-match figure reported by He and Young (2006).

12:05–12:30The Infinite PCFG Using Hierarchical Dirichlet Processes
Percy Liang, Slav Petrov, Michael Jordan and Dan Klein

We present a nonparametric Bayesian model of tree structures based on the hierarchical Dirichlet process (HDP). Our HDP-PCFG model allows the complexity of the grammar to grow as more training data is available. In addition to presenting a fully Bayesian model for the PCFG, we also develop an efficient variational inference procedure. On synthetic data, we recover the correct grammar without having to specify its complexity in advance. We also show that our techniques can be applied to full-scale parsing applications by demonstrating their effectiveness in learning state-split grammars.

12:30–13:00SIGNLL Business Meeting
Session 7a: Information Extraction
14:00–14:25Exploiting Wikipedia as External Knowledge for Named Entity Recognition
Jun’ichi Kazama and Kentaro Torisawa

We explore the use of Wikipedia as external knowledge to improve named entity recognition (NER). Our method retrieves the corresponding Wikipedia entry for each candidate word sequence and extracts a category label from the first sentence of the entry, which can be thought of as a definition part. These category labels are used as features in a CRF-based NE tagger. We demonstrate using the CoNLL 2003 dataset that the Wikipedia category labels extracted by such a simple method actually improve the accuracy of NER.

14:25–14:50Large-Scale Named Entity Disambiguation Based on Wikipedia Data
Silviu Cucerzan

This paper presents a large-scale system for the recognition and semantic disambiguation of named entities based on information extracted automatically from a large encyclopedic collection and Web search results, over a space of more than 1.4 million entities. It describes in detail the disambiguation paradigm employed and the information extraction process from Wikipedia. The disambiguation component employs a vector space model and a process of maximizing the agreement between the contextual information extracted from Wikipedia and the context of a document, as well as the agreement among the category tags associated with the candidate entities. In tests on both real news data and Wikipedia text, the system obtained accuracies exceeding 91% and 88%.

14:50–15:15Effective Information Extraction with Semantic Affinity Patterns and Relevant Regions
Siddharth Patwardhan and Ellen Riloff

We present an information extraction system that decouples the tasks of finding relevant regions of text and applying extraction patterns. We create a self-trained relevant sentence classifier to identify relevant regions, and use a semantic affinity measure to automatically learn domain-relevant extraction patterns. We then distinguish primary patterns from secondary patterns and apply the patterns selectively in the relevant regions. The resulting IE system achieves good performance on the MUC-4 terrorism corpus and ProMed disease outbreak stories. This approach requires only a few seed extraction patterns and a collection of relevant and irrelevant documents for training.

15:15–15:40Tree Kernel-Based Relation Extraction with Context-Sensitive Structured Parse Tree Information
GuoDong Zhou, Min Zhang, DongHong Ji and QiaoMing Zhu

This paper proposes a tree kernel with context-sensitive structured parse tree information for relation extraction. It resolves two critical problems in previous tree kernels for relation extraction in two ways. First, it automatically determines a dynamic context-sensitive tree span for relation extraction by extending the widely used Shortest Path-enclosed Tree (SPT) to include necessary context information outside the SPT. Second, it proposes a context-sensitive convolution tree kernel, which enumerates both context-free and context-sensitive sub-trees by considering their ancestor node paths as their contexts. Moreover, this paper evaluates the complementary nature of our tree kernel and a state-of-the-art linear kernel. Evaluation on the ACE RDC corpora shows that our dynamic context-sensitive tree span is much more suitable for relation extraction than the SPT and that our tree kernel outperforms the state-of-the-art Collins and Duffy convolution tree kernel. It also shows that our tree kernel achieves much better performance than state-of-the-art linear kernels. Finally, it shows that feature-based and tree kernel-based methods complement each other well and that a composite kernel can integrate both flat and structured features.

Session 7b: Machine Translation
14:00–14:25Chinese Syntactic Reordering for Statistical Machine Translation
Chao Wang, Michael Collins and Philipp Koehn

Syntactic reordering approaches are an effective method for handling word-order differences between source and target languages in statistical machine translation (SMT) systems. This paper introduces a reordering approach for translation from Chinese to English. We describe a set of syntactic reordering rules that exploit systematic differences between Chinese and English word order. The resulting system is used as a preprocessor for both training and test sentences, transforming Chinese sentences to be much closer to English in terms of their word order. We evaluated the reordering approach within the MOSES phrase-based SMT system. The reordering approach improved the BLEU score for the MOSES system from 28.52 to 30.86 on the NIST 2006 evaluation data. We also conducted a series of experiments to analyze the accuracy and impact of different types of reordering rules.

14:25–14:50Binarizing Syntax Trees to Improve Syntax-Based Machine Translation Accuracy
Wei Wang, Kevin Knight and Daniel Marcu

We show that phrase structures in Penn Treebank style parses are not optimal for syntax-based machine translation. We exploit a series of binarization methods to restructure the Penn Treebank style trees such that syntactified phrases smaller than Penn Treebank constituents can be acquired and exploited in translation. We find that employing the EM algorithm to determine the binarization of a parse tree among a set of alternative binarizations gives us the best translation result.

14:50–15:15What Can Syntax-Based MT Learn from Phrase-Based MT?
Steve DeNeefe, Kevin Knight, Wei Wang and Daniel Marcu

We compare and contrast the strengths and weaknesses of a syntax-based machine translation model with a phrase-based machine translation model on several levels. We briefly describe each model, highlighting points where they differ. We include a quantitative comparison of the phrase pairs that each model has to work with, as well as the reasons why some phrase pairs are not learned by the syntax-based model. We then evaluate proposed improvements to the syntax-based extraction techniques in light of phrase pairs captured. We also compare the translation accuracy for all variations.

15:15–15:40Online Large-Margin Training for Statistical Machine Translation
Taro Watanabe, Jun Suzuki, Hajime Tsukada and Hideki Isozaki

We achieved state-of-the-art performance in statistical machine translation by using a large number of features with an online large-margin training algorithm. The millions of parameters were tuned on only a small development set consisting of less than 1K sentences. Experiments on Arabic-to-English translation indicated that a model trained with sparse binary features outperformed a conventional SMT system with a small number of features.

Session 8: All Posters (16:00–18:30)
Consult the list of poster titles under Session 4.

#### Saturday, June 30, 2007

Session 9: Plenary Session
9:00–10:00Invited Talk: Hashing, Sketching, and Other Approximate Algorithms for High-Dimensional Data
Piotr Indyk, Massachusetts Institute of Technology

In the last decade, there have been significant developments in the design of approximate randomized algorithms for high-dimensional data. These include: hashing-based algorithms for similarity search problems, computing succinct approximate "sketches" of high-dimensional objects, etc. In this talk I will present an overview of algorithms and techniques in this area.
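
As a concrete taste of the sketching techniques the talk surveys, here is a minimal random-hyperplane (SimHash) sketch for cosine similarity; the dimensionality, bit count, and vectors are arbitrary choices for illustration, not material from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

def simhash_sketch(v, planes):
    """Compress a high-dimensional vector into a short bit sketch:
    one bit per random hyperplane (the sign of the projection)."""
    return (planes @ v) >= 0

def agreement(s1, s2):
    """Fraction of sketch bits on which two vectors agree; this
    approximates their angular similarity."""
    return float(np.mean(s1 == s2))

d, n_bits = 100, 256
planes = rng.standard_normal((n_bits, d))

x = rng.standard_normal(d)
y = x + 0.1 * rng.standard_normal(d)   # near-duplicate of x
z = rng.standard_normal(d)             # unrelated vector

# Near-duplicates agree on far more sketch bits than unrelated vectors.
agree_xy = agreement(simhash_sketch(x, planes), simhash_sketch(y, planes))
agree_xz = agreement(simhash_sketch(x, planes), simhash_sketch(z, planes))
print(agree_xy, agree_xz)
```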

Session 10a: Machine Learning (supervised classifiers)
10:00–10:25Scalable Term Selection for Text Categorization
Jingyang Li and Maosong Sun

In text categorization, term selection is an important step for the sake of both categorization accuracy and computational efficiency. Different dimensionalities are expected under different practical resource restrictions of time or space. Traditionally in text categorization, the same scoring or ranking criterion, such as $\chi^2$ or IG, is adopted for all target dimensionalities, considering both the discriminability and the coverage of a term. In this paper, the poor accuracy at low dimensionalities is attributed to the small average vector length of the documents. Scalable term selection is proposed to optimize the term set at a given dimensionality according to an expected average vector length. Discriminability and coverage are measured separately; by adjusting the ratio of their weights in a combined criterion, the expected average vector length can be reached, representing a good compromise between the specificity and the exhaustivity of the term subset. Experiments show that accuracy is considerably improved at lower dimensionalities, and that larger term subsets can lower the average vector length for a lower computational cost. These observations might inspire further investigations.
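
The combined criterion described here can be sketched as follows. The $\chi^2$ formula is standard, but the geometric weighting, the contingency counts, and the two example terms are invented for illustration and are not the paper's actual criterion or data.

```python
def chi_square(n11, n10, n01, n00):
    """Chi-square term-category association from a 2x2 contingency table
    (term present/absent x document in-category/not)."""
    n = n11 + n10 + n01 + n00
    num = n * (n11 * n00 - n10 * n01) ** 2
    den = (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    return num / den

# Two hypothetical terms in a 1000-document collection: "rare" occurs in
# 5 documents (all in-category); "common" occurs in 300 (40 in-category).
terms = {
    "rare":   {"chi2": chi_square(5, 0, 95, 900),    "df": 5},
    "common": {"chi2": chi_square(40, 260, 60, 640), "df": 300},
}

def combined_score(t, lam):
    """One way to trade off discriminability (chi-square) against coverage
    (document frequency): a geometric combination controlled by lam."""
    s = terms[t]
    return s["chi2"] ** lam * s["df"] ** (1 - lam)

best = lambda lam: max(terms, key=lambda t: combined_score(t, lam))
# Discriminability-heavy vs coverage-heavy weighting picks different terms.
print(best(0.9), best(0.1))
```

Shifting `lam` toward coverage selects longer document vectors (more frequent terms), which is the lever the abstract describes for hitting an expected average vector length.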

10:25–10:50Active Learning for Word Sense Disambiguation with Methods for Addressing the Class Imbalance Problem
Jingbo Zhu and Eduard Hovy

Active learning is a promising way to solve the knowledge bottleneck problem faced by supervised word sense disambiguation (WSD) methods. Unfortunately, in real-world data, the distribution of the senses of a word is often skewed, which causes a problem for learning methods for WSD. In this paper, we study active learning with methods for addressing the class imbalance problem for WSD. We analyze the effect of resampling techniques, including under-sampling and over-sampling used in active learning. Experimental results show that under-sampling causes negative effects on active learning, but over-sampling is a relatively good choice. To alleviate the within-class imbalance problem of over-sampling, we propose a bootstrap-based over-sampling (BootOS) method that works better than ordinary over-sampling in active learning for WSD. Finally, we investigate when to stop active learning, and adopt two strategies, max-confidence and min-error, as stopping conditions for active learning. According to experimental results, we suggest a prediction solution by considering max-confidence as the upper bound and min-error as the lower bound of stopping conditions.

Session 10b: Machine Learning (sequential models)
10:00–10:25Semi-Supervised Structured Output Learning Based on a Hybrid Generative and Discriminative Approach
Jun Suzuki, Akinori Fujino and Hideki Isozaki

This paper proposes a framework for semi-supervised structured output learning (SOL), specifically for sequence labeling, based on a hybrid generative and discriminative approach. We define the objective function of our hybrid model, which is written in log-linear form, by discriminatively combining discriminative structured predictor(s) with generative model(s) that incorporate unlabeled data. Then, unlabeled data is used in a generative manner to increase the sum of the discriminant functions for all outputs during the parameter estimation. Experiments on named entity recognition (CoNLL-2003) and syntactic chunking (CoNLL-2000) data show that our hybrid model significantly outperforms the state-of-the-art performance obtained with supervised SOL methods, such as conditional random fields (CRFs).

10:25–10:50Finding Good Sequential Model Structures using Output Transformations
Edward Loper

In Sequential Viterbi Models, such as HMMs, MEMMs, and Linear Chain CRFs, the type of patterns over output sequences that can be learned by the model depends directly on the model's structure: any pattern that spans more output tags than are covered by the model's order will be very difficult to learn. However, increasing a model's order can lead to an increase in the number of model parameters, making the model more susceptible to sparse data problems.

This paper shows how the notion of output transformation can be used to explore a variety of alternative model structures. Using output transformations, we can selectively increase the amount of contextual information available for some conditions, but not for others, thus allowing us to capture longer-distance consistencies while avoiding unnecessary increases to the model's parameter space. The appropriate output transformation for a given task can be selected by applying a hill-climbing approach to held-out data. On the NP Chunking task, our hill-climbing system finds a model structure that outperforms both first-order and second-order models with the same input feature set.

Session 10c: Information Retrieval
10:00–10:25A Statistical Language Modeling Approach to Lattice-Based Spoken Document Retrieval
Tee Kiah Chia, Haizhou Li and Hwee Tou Ng

Speech recognition transcripts are far from perfect; they are not of sufficient quality to be useful on their own for spoken document retrieval. This is especially the case for conversational speech. Recent efforts have tried to overcome this issue by using statistics from speech lattices instead of only the 1-best transcripts; however, these efforts have invariably used the classical vector space retrieval model. This paper presents a novel approach to lattice-based spoken document retrieval using statistical language models: a statistical model is estimated for each document, and probabilities derived from the document models are directly used to measure relevance. Experimental results show that the lattice-based language modeling method outperforms both the language modeling retrieval method using only the 1-best transcripts, as well as a recently proposed lattice-based vector space retrieval method.
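
The document-model scoring idea can be sketched for plain 1-best text (the paper's lattice-based estimation is not reproduced here); the Jelinek-Mercer smoothing and the toy two-document collection below are illustrative assumptions.

```python
import math
from collections import Counter

docs = {
    "d1": "the cat sat on the mat".split(),
    "d2": "stock markets fell sharply today".split(),
}
collection = Counter(w for d in docs.values() for w in d)
coll_len = sum(collection.values())

def query_log_likelihood(query, doc, lam=0.5):
    """Score a document by the smoothed probability that its unigram
    language model generates the query: Jelinek-Mercer interpolation
    of the document model with the collection model."""
    tf, dlen = Counter(doc), len(doc)
    score = 0.0
    for w in query:
        p_doc = tf[w] / dlen
        p_coll = collection[w] / coll_len
        score += math.log(lam * p_doc + (1 - lam) * p_coll)
    return score

query = "cat mat".split()
ranked = sorted(docs, key=lambda d: query_log_likelihood(query, docs[d]),
                reverse=True)
print(ranked[0])  # the cat/mat document ranks first
```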

10:25–10:50Learning Noun Phrase Query Segmentation
Shane Bergsma and Qin Iris Wang

Query segmentation is the process of taking a user's search-engine query and dividing the tokens into individual phrases or semantic units.

Identification of these query segments can potentially improve both document retrieval precision, by first returning pages which contain the exact query segments, and document retrieval recall, by allowing query expansion or substitution via the segmented units. We train and evaluate a machine-learned query segmentation system that can achieve 86% segmentation-decision accuracy on a gold standard set of segmented noun phrase queries, well above recently published approaches. Key enablers of this high performance are features derived from previous natural language processing work in noun compound bracketing. For example, token association features beyond simple N-gram counts provide powerful indicators of segmentation.
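
One of the "token association features beyond simple N-gram counts" mentioned above can be illustrated with pointwise mutual information driving a greedy segmenter; the counts, threshold, and greedy decision rule below are invented for illustration, not the paper's learned model.

```python
import math
from collections import Counter

# Tiny hypothetical query-log statistics (illustrative counts only).
bigram_counts = Counter({("new", "york"): 50, ("york", "hotels"): 2})
unigram_counts = Counter({"new": 60, "york": 55, "hotels": 30,
                          "machine": 45, "learning": 50, "rate": 25})
N = sum(unigram_counts.values())

def pmi(w1, w2):
    """Pointwise mutual information: a token-association score that
    normalizes the bigram count by the unigram frequencies."""
    p_xy = bigram_counts[(w1, w2)] / N
    p_x = unigram_counts[w1] / N
    p_y = unigram_counts[w2] / N
    return math.log(p_xy / (p_x * p_y))

def segment(tokens, threshold=1.0):
    """Greedy segmentation: keep adjacent tokens in one segment when
    their association exceeds the threshold, else insert a boundary."""
    segments, current = [], [tokens[0]]
    for prev, tok in zip(tokens, tokens[1:]):
        if bigram_counts[(prev, tok)] and pmi(prev, tok) > threshold:
            current.append(tok)
        else:
            segments.append(current)
            current = [tok]
    segments.append(current)
    return segments

print(segment("new york hotels".split()))  # [['new', 'york'], ['hotels']]
```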

Session 11a: Information Extraction
11:15–11:40 Bootstrapping Information Extraction from Field Books
Sander Canisius and Caroline Sporleder

We present two machine learning approaches to information extraction from semi-structured documents that can be used if no annotated training data are available, but there does exist a database filled with information derived from the type of documents to be processed. One approach tries to employ standard supervised learning for information extraction by artificially constructing labelled training data from the contents of the database. The second approach combines unsupervised hidden Markov modelling with language models. Empirical evaluation of both systems showed that the hidden Markov model best learned the task of segmenting and labelling biological field book entries from a derived database alone.

11:40–12:05 Extracting Data Records from Unstructured Biomedical Full Text
Donghui Feng, Gully Burns and Eduard Hovy

In this paper, we address the problem of extracting data records and their attributes from unstructured biomedical full text. There has been little effort reported on this in the research community. We argue that semantics is important for record extraction or finer-grained language processing tasks. We derive a data record template including semantic language models from unstructured text and represent them with a discourse-level Conditional Random Field (CRF) model. We evaluate the approach from the perspective of Information Extraction and achieve significant improvements in system performance compared with baseline systems.

12:05–12:30 Multiple Alignment of Citation Sentences with Conditional Random Fields and Posterior Decoding
Ariel Schwartz, Anna Divoli and Marti Hearst

In scientific literature, sentences that cite related work can be a valuable resource for applications such as summarization, synonym identification, and entity extraction. In order to determine which equivalent entities are discussed in the various citation sentences, we propose aligning the words within these sentences according to semantic similarity. This problem is partly analogous to the problem of multiple sequence alignment in the biosciences, and is also closely related to the word alignment problem in statistical machine translation. In this paper we address the problem of multiple citation concept alignment by combining and modifying the CRF based pairwise word alignment system of Blunsom & Cohn (2006) and a posterior decoding based multiple sequence alignment algorithm of Schwartz & Pachter (2007). We evaluate the algorithm on hand-labeled data, achieving results that improve on a baseline.

Session 11b: Machine Translation
11:15–11:40 Large Language Models in Machine Translation
Thorsten Brants, Ashok C. Popat, Peng Xu, Franz J. Och and Jeffrey Dean

This paper reports on the benefits of large-scale statistical language modeling in machine translation. A distributed infrastructure is proposed which we use to train on up to 2 trillion tokens, resulting in language models having up to 300 billion n-grams. It is capable of providing smoothed probabilities for fast, single-pass decoding. We introduce a new smoothing method, dubbed Stupid Backoff, that is inexpensive to train on large data sets and approaches the quality of Kneser-Ney Smoothing as the amount of training data increases.
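For reference, Stupid Backoff has a simple closed form: the raw relative frequency when the full n-gram was observed, otherwise the score of the shortened context multiplied by a fixed backoff factor (0.4 in the paper); the resulting scores are not normalized probabilities. A minimal in-memory sketch (illustrative only; the paper's contribution is the distributed training infrastructure, which is not shown here):

```python
from collections import Counter

ALPHA = 0.4  # fixed backoff penalty used in the paper

def train(tokens, max_order=3):
    """Count all n-grams up to max_order over a token list."""
    counts = Counter()
    for n in range(1, max_order + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts, len(tokens)

def stupid_backoff(counts, total, context, word):
    """Score `word` given a context tuple, backing off to shorter contexts."""
    if not context:
        return counts.get((word,), 0) / total  # unigram base case
    ngram = context + (word,)
    if counts.get(ngram, 0) > 0:
        return counts[ngram] / counts[context]  # seen: relative frequency
    return ALPHA * stupid_backoff(counts, total, context[1:], word)
```

Because unseen events receive a discounted but unnormalized score rather than redistributed probability mass, training reduces to counting, which is what makes the method cheap at very large scale.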

11:40–12:05 Factored Translation Models
Philipp Koehn and Hieu Hoang

We present an extension of phrase-based statistical machine translation models that enables the straightforward integration of additional annotation at the word level, whether linguistic markup or automatically generated word classes. In a number of experiments we show that factored translation models lead to better translation performance, both in terms of automatic scores and in terms of grammatical coherence.

12:05–12:30 Translating Unknown Words by Analogical Learning
Philippe Langlais and Alexandre Patry

Unknown words are a well-known hindrance to natural language applications. In particular, they drastically impact machine translation quality. The easy way out that commercial translation systems usually offer their users is the possibility of adding unknown words and their translations into a dedicated lexicon. Recently, Stroppa and Yvon (2005) showed how analogical learning alone deals nicely with morphology in different languages. In this study we show that analogical learning also offers an elegant and effective solution to the problem of identifying potential translations of unknown words.

Session 11c: Phonetics and Phonology
11:15–11:40 A Probabilistic Approach to Diachronic Phonology
Alexandre Bouchard, Percy Liang, Thomas Griffiths and Dan Klein

We present a probabilistic model of diachronic phonology in which individual word forms undergo stochastic edits along the branches of a phylogenetic tree. Our approach allows us to achieve three goals with a single unified model: (1) reconstruction of both ancient and modern word forms, (2) discovery of general phonological changes, and (3) selection among different phylogenies. We learn our model using a Monte Carlo EM algorithm and present quantitative results validating the model.

11:40–12:05 Learning Structured Models for Phone Recognition
Slav Petrov, Adam Pauls and Dan Klein

We present a maximally streamlined approach to learning HMM-based acoustic models for automatic speech recognition. In our approach, an initial monophone, single-Gaussian HMM is iteratively refined using a split-merge EM procedure which makes no assumptions about subphone structure or context-dependent structure and which uses only a single Gaussian per HMM state. Despite the much simplified training process, our acoustic model achieves state-of-the-art results on phone classification (where it outperforms almost all other methods) and competitive performance on phone recognition (where it outperforms standard CD triphone/subphone/GMM approaches). We also present an analysis of what is and is not learned by our system.

12:05–12:30 Inducing Search Keys for Name Filtering
L. Karl Branting

This paper describes ETK (Ensemble of Transformation-based Keys), a new algorithm for inducing search keys for name filtering. ETK has the low computational cost and ability to filter by phonetic similarity characteristic of phonetic keys such as Soundex, but is adaptable to alternative similarity models. The accuracy of ETK in a preliminary empirical evaluation suggests that it is well suited for phonetic filtering applications such as recognizing alternative cross-lingual transliterations.
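As a concrete point of comparison for the fixed phonetic keys the paper generalizes, classic Soundex maps a name to its first letter plus three digits coding consonant classes. A sketch of the standard rules (illustrative only; ETK induces its transformation-based keys rather than using this fixed scheme):

```python
def soundex(name):
    """Classic Soundex: first letter plus three digits coding consonant classes."""
    codes = {**dict.fromkeys("bfpv", "1"), **dict.fromkeys("cgjkqsxz", "2"),
             **dict.fromkeys("dt", "3"), "l": "4",
             **dict.fromkeys("mn", "5"), "r": "6"}
    name = name.lower()
    key = name[0].upper()
    prev = codes.get(name[0], "")
    for ch in name[1:]:
        code = codes.get(ch, "")
        if code and code != prev:   # skip repeats of the same code
            key += code
        if ch not in "hw":          # 'h'/'w' do not separate repeated codes
            prev = code             # vowels reset prev, allowing repeats
    return (key + "000")[:4]        # pad or truncate to four characters
```

Names with the same key ("Robert" and "Rupert" both map to R163) pass the filter together; ETK's point is that such equivalence classes can be adapted to similarity models other than this hard-coded one.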

Session 12a: CoNLL Shared Task Session (dependency parsing)
14:00–14:15 The CoNLL 2007 Shared Task on Dependency Parsing
Joakim Nivre, Johan Hall, Sandra Kübler, Ryan McDonald, Jens Nilsson, Sebastian Riedel and Deniz Yuret

The Conference on Computational Natural Language Learning features a shared task, in which participants train and test their learning systems on the same data sets. In 2007, as in 2006, the shared task has been devoted to dependency parsing, this year with both a multilingual track and a domain adaptation track. In this paper, we define the tasks of the different tracks and describe how the data sets were created from existing treebanks for ten languages. In addition, we characterize the different approaches of the participating systems, report the test results, and provide a first analysis of these results.

14:15–14:30 Single Malt or Blended? A Study in Multilingual Parser Optimization
Johan Hall, Jens Nilsson, Joakim Nivre, Gülsen Eryigit, Beáta Megyesi, Mattias Nilsson and Markus Saers

We describe a two-stage optimization of the MaltParser system for the ten languages in the multilingual track of the CoNLL 2007 shared task on dependency parsing. The first stage consists in tuning a single-parser system for each language by optimizing parameters of the parsing algorithm, the feature model, and the learning algorithm. The second stage consists in building an ensemble system that combines six different parsing strategies, extrapolating from the optimal parameter settings for each language. When evaluated on the official test sets, the ensemble system significantly outperforms the single-parser system and achieves the highest average labeled attachment score.

14:30–14:45 Probabilistic Parsing Action Models for Multi-Lingual Dependency Parsing
Xiangyu Duan, Jun Zhao and Bo Xu

Deterministic dependency parsers use parsing actions to construct dependencies. These parsers do not compute the probability of the whole dependency tree; they only determine parsing actions step by step using a trained classifier. To globally model the parsing actions of all steps taken on the input sentence, we propose two kinds of probabilistic parsing action models that can compute the probability of the whole dependency tree. The tree with the maximal probability is output. The experiments are carried out on 10 languages, and the results show that our probabilistic parsing action models outperform the original deterministic dependency parser.

14:45–15:00 Fast and Robust Multilingual Dependency Parsing with a Generative Latent Variable Model
Ivan Titov and James Henderson

We use a generative history-based model to predict the most likely derivation of a dependency parse. Our probabilistic model is based on Incremental Sigmoid Belief Networks, a recently proposed class of latent variable models for structure prediction. Their ability to automatically induce features results in multilingual parsing which is robust enough to achieve accuracy well above the average for each individual language in the multilingual track of the CoNLL-2007 shared task. This robustness led to the third best overall average labeled attachment score in the task, despite using no discriminative methods. We also demonstrate that the parser is quite fast, and can provide even faster parsing times without much loss of accuracy.

15:00–15:15 Multilingual Dependency Parsing Using Global Features
Tetsuji Nakagawa

In this paper, we describe a two-stage multilingual dependency parser used for the multilingual track of the CoNLL 2007 shared task. The system consists of two components: an unlabeled dependency parser using Gibbs sampling which can incorporate sentence-level (global) features as well as token-level (local) features, and a dependency relation labeling module based on Support Vector Machines. Experimental results show that the global features are useful in all the languages.

15:15–15:30 Experiments with a Higher-Order Projective Dependency Parser
Xavier Carreras

We present experiments with a dependency parsing model defined on rich factors. Our model represents dependency trees with factors that include three types of relations between the tokens of a dependency and their children. We extend the projective parsing algorithm of Eisner (1996) for our case, and train models using the averaged perceptron. Our experiments show that considering higher-order information yields significant improvements in parsing accuracy, but comes at a high cost in terms of both time and memory consumption. In the multilingual exercise of the CoNLL-2007 shared task (Nivre et al., 2007), our system obtains the best accuracy for English, and the second best accuracies for Basque and Czech.

15:30–15:45 Log-Linear Models of Non-Projective Trees, k-best MST Parsing and Tree-Ranking
Keith Hall, Jiri Havelka and David A. Smith

We present our system used in the CoNLL 2007 shared task on multilingual parsing. The system is composed of three components: a k-best maximum spanning tree (MST) parser, a tree labeler, and a reranker that orders the k-best labeled trees. We present two techniques for training the MST parser: tree-normalized and graph-normalized conditional training. The tree-based reranking model allows us to explicitly model global syntactic phenomena. We describe the reranker features which include non-projective edge attributes. We provide an analysis of the errors made by our system and suggest changes to the models and features that might rectify the current system.

Session 12b: Machine Translation
14:00–14:25 Improving Translation Quality by Discarding Most of the Phrasetable
Howard Johnson, Joel Martin, George Foster and Roland Kuhn

It is possible to reduce the bulk of phrasetables for Statistical Machine Translation using a technique based on the significance testing of phrase pair co-occurrence in the parallel corpus. The savings can be quite substantial (up to 90%) and cause no reduction in BLEU score. In some cases, an improvement in BLEU is obtained at the same time although the effect is less pronounced if state-of-the-art phrasetable smoothing is employed.

14:25–14:50 Hierarchical Phrase-Based Translation with Suffix Arrays
Adam Lopez

A major engineering challenge in statistical machine translation systems is the efficient representation of extremely large translation rulesets. In phrase-based models, this problem can be addressed by storing the training data in memory and using a suffix array as an efficient index to quickly look up and extract rules on the fly. Hierarchical phrase-based translation introduces the added wrinkle of source phrases with gaps. Lookup algorithms used for contiguous phrases no longer apply and the best approximate pattern matching algorithms are much too slow, taking several minutes per sentence. We describe new lookup algorithms for hierarchical phrase-based translation that reduce the empirical computation time by nearly two orders of magnitude, making on-the-fly lookup feasible for source phrases with gaps.
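For the contiguous-phrase case described above, the suffix-array index amounts to sorted suffix start positions plus two binary searches per query. A minimal sketch over a token list (illustrative only; the paper's contribution, lookup of gapped source phrases, requires additional machinery not shown here):

```python
def build_suffix_array(tokens):
    """All suffix start positions, sorted by the token sequence they begin."""
    return sorted(range(len(tokens)), key=lambda i: tokens[i:])

def find_phrase(tokens, sa, phrase):
    """Return every start position of `phrase`, via binary search on the suffix array."""
    n = len(phrase)
    lo, hi = 0, len(sa)
    while lo < hi:  # lower bound: first suffix whose n-token prefix >= phrase
        mid = (lo + hi) // 2
        if tokens[sa[mid]:sa[mid] + n] < phrase:
            lo = mid + 1
        else:
            hi = mid
    start, hi = lo, len(sa)
    while lo < hi:  # upper bound: first suffix whose n-token prefix > phrase
        mid = (lo + hi) // 2
        if tokens[sa[mid]:sa[mid] + n] <= phrase:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])
```

Each occurrence position then gives direct access to the surrounding training sentence, so rule extraction can happen on the fly instead of from a precomputed phrasetable.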

14:50–15:15 An Empirical Study on Computing Consensus Translations from Multiple Machine Translation Systems
Wolfgang Macherey and Franz J. Och

This paper presents an empirical study on how different selections of input translation systems affect translation quality in system combination. We give empirical evidence that the systems to be combined should be of similar quality and need to be almost uncorrelated in order to be beneficial for system combination. Experimental results are presented for composite translations computed from large numbers of different research systems as well as a set of translation systems derived from one of the best-ranked machine translation engines in the 2006 NIST machine translation evaluation.

15:15–15:40 Learning to Find English to Chinese Transliterations on the Web
Jian-Cheng Wu and Jason S. Chang

We present a method for learning to find English to Chinese transliterations on the Web. In our approach, proper nouns are expanded into new queries aimed at maximizing the probability of retrieving transliterations from existing search engines. The method involves learning the sublexical relationships between names and their transliterations. At run-time, a given name is automatically extended into queries with relevant morphemes, and transliterations in the returned search snippets are extracted and ranked. We present a new system, TermMine, that applies the method to find transliterations of a given name. Evaluation on a list of 500 proper names shows that the method achieves high precision and recall, and outperforms commercial machine translation systems.

Session 12c: Word Senses
14:00–14:25 Learning to Merge Word Senses
Rion Snow, Sushant Prakash, Daniel Jurafsky and Andrew Y. Ng

It has been widely observed that different NLP applications require different sense granularities in order to best exploit word sense distinctions, and that for many applications WordNet senses are too fine-grained. In contrast to previously proposed automatic methods for sense clustering, we formulate sense merging as a supervised learning problem, exploiting human-labeled sense clusterings as training data. We train a discriminative classifier over a wide variety of features derived from WordNet structure, corpus-based evidence, and evidence from other lexical resources. Our learned similarity measure outperforms previously proposed automatic methods for sense clustering on the task of predicting human sense merging judgments, yielding an absolute F-score improvement of 4.1% on nouns, 13.6% on verbs, and 4.0% on adjectives. Finally, we propose a model for clustering sense taxonomies using the outputs of our classifier, and we make available several automatically sense-clustered WordNets of various sense granularities.

14:25–14:50 Improving Word Sense Disambiguation Using Topic Features
Junfu Cai, Wee Sun Lee and Yee Whye Teh

This paper presents a novel approach for exploiting the global context for the task of word sense disambiguation (WSD). This is done by using topic features constructed using the latent Dirichlet allocation (LDA) algorithm on unlabeled data. The features are incorporated into a modified naive Bayes network alongside other features such as part-of-speech of neighboring words, single words in the surrounding context, local collocations, and syntactic patterns. In both the English all-words task and the English lexical sample task, the method achieved significant improvement over the simple naive Bayes classifier and higher accuracy than the best official scores on Senseval-3 for both tasks.

14:50–15:15 A Topic Model for Word Sense Disambiguation
Jordan Boyd-Graber, David Blei and Xiaojin Zhu

We develop latent Dirichlet allocation with WordNet (LDAWN), an unsupervised probabilistic topic model that includes word sense as a hidden variable. We develop a probabilistic posterior inference algorithm for simultaneously disambiguating a corpus and learning the domains in which to consider each word. Using the WordNet hierarchy, we embed the construction of Abney and Light in the topic model and show that automatically learned domains improve WSD accuracy compared to alternative contexts.

15:15–15:40 Validation and Evaluation of Automatically Acquired Multiword Expressions for Grammar Engineering
Aline Villavicencio, Valia Kordoni, Yi Zhang, Marco Idiart and Carlos Ramisch

This paper focuses on the evaluation of methods for the automatic acquisition of Multiword Expressions (MWEs) for robust grammar engineering. First we investigate the hypothesis that MWEs can be detected by the distinct statistical properties of their component words, regardless of their type, comparing 3 statistical measures: mutual information (MI), chi-square and permutation entropy (PE). Our overall conclusion is that at least two measures, MI and PE, seem to differentiate MWEs from non-MWEs. We then investigate the influence of the size and quality of different corpora, using the BNC and the Web search engines Google and Yahoo. We conclude that, in terms of language usage, web generated corpora are fairly similar to more carefully built corpora, like the BNC, indicating that the lack of control and balance of these corpora is probably compensated by their size. Finally, we show a qualitative evaluation of the results of automatically adding extracted MWEs to existing linguistic resources. We argue that such a process yields qualitative improvements if a more compositional approach to automated grammar/lexicon extension is adopted.

Session 13a: CoNLL Shared Task Session (dependency parsing)
16:15–16:30 Dependency Parsing and Domain Adaptation with LR Models and Parser Ensembles
Kenji Sagae and Jun’ichi Tsujii

We present a data-driven variant of the LR algorithm for dependency parsing, and extend it with a best-first search for probabilistic generalized LR dependency parsing. Parser actions are determined by a classifier, based on features that represent the current state of the parser. We apply this parsing framework to both tracks of the CoNLL 2007 shared task, in each case taking advantage of multiple models trained with different learners. In the multilingual track, we train three LR models for each of the ten languages, and combine the analyses obtained with each individual model with a maximum spanning tree voting scheme. In the domain adaptation track, we use two models to parse unlabeled data in the target domain to supplement the labeled out-of-domain training set, in a scheme similar to one iteration of co-training.

16:30–16:45 Frustratingly Hard Domain Adaptation for Dependency Parsing
Mark Dredze, John Blitzer, Partha Pratim Talukdar, Kuzman Ganchev, João Graca and Fernando Pereira

We describe some challenges of adaptation in the 2007 CoNLL Shared Task on Domain Adaptation. Our error analysis for this task suggests that the primary source of error is differences in annotation guidelines among treebanks. Our suspicions are supported by the observation that no team was able to improve target domain performance substantially over a state-of-the-art baseline.

16:45–17:15 Analysis: Sandra Kübler, Ryan McDonald
17:15–17:30 Discussion
Session 13b: Sentiment
16:15–16:40 Crystal: Analyzing Predictive Opinions on the Web
Soo-Min Kim and Eduard Hovy

In this paper, we present an election prediction system (Crystal) based on web users' opinions posted on an election prediction website. Given a prediction message, Crystal first identifies which party the message predicts to win and then aggregates prediction analysis results of a large amount of opinions to project the election results. We collect past election prediction messages from the Web and automatically build a gold standard. We focus on capturing lexical patterns that people frequently use when they express their predictive opinions about a coming election. To predict election results, we apply SVM-based supervised learning. To improve performance, we propose a novel technique which generalizes n-gram feature patterns. Experimental results show that Crystal significantly outperforms several baselines as well as a non-generalized n-gram approach. Crystal predicts future elections with 81.68% accuracy.

16:40–17:05 Extracting Aspect-Evaluation and Aspect-Of Relations in Opinion Mining
Nozomi Kobayashi, Kentaro Inui and Yuji Matsumoto

The technology of opinion extraction allows users to retrieve and analyze people's opinions scattered over Web documents. We define an opinion unit as a quadruple consisting of the opinion holder, the subject being evaluated, the part or the attribute in which it is evaluated, and the value of the evaluation that expresses a positive or negative assessment. We use this definition as the basis for our opinion extraction task. We focus on two important subtasks of opinion extraction: (a) extracting aspect-evaluation relations, and (b) extracting aspect-of relations, and we approach each task using methods which combine contextual and statistical clues. Our experiments on Japanese weblog posts show that the use of contextual clues improves the performance of both tasks.

17:05–17:30 Building Lexicon for Sentiment Analysis from Massive Collection of HTML Documents
Nobuhiro Kaji and Masaru Kitsuregawa

Recognizing polarity requires a list of polar words and phrases. For the purpose of building such a lexicon automatically, many studies have investigated (semi-) unsupervised methods of learning the polarity of words and phrases. In this paper, we explore the use of structural clues to extract polar sentences from Japanese HTML documents, and build a lexicon from the extracted polar sentences. The key idea is to develop the structural clues so that they achieve extremely high precision at the cost of recall. In order to compensate for the low recall, we used a massive collection of HTML documents, which allowed us to prepare a sufficiently large corpus of polar sentences.

Session 13c: Tagging
16:15–16:40 Determining Case in Arabic: Learning Complex Linguistic Behavior Requires Complex Linguistic Features
Nizar Habash, Ryan Gabbard, Owen Rambow, Seth Kulick and Mitch Marcus

This paper discusses automatic determination of case in Arabic. This task is a major source of errors in full diacritization of Arabic. We use a gold-standard syntactic tree, and obtain an error rate of about 4.2%, with a machine learning based system outperforming a system using hand-written rules. A careful error analysis suggests that when we account for annotation errors in the gold standard, the error rate drops to 0.9%, with the hand-written rules outperforming the machine learning-based system.

16:40–17:05 Mandarin Part-of-Speech Tagging and Discriminative Reranking
Zhongqiang Huang, Mary Harper and Wen Wang

We present in this paper methods to improve HMM-based part-of-speech (POS) tagging of Mandarin. We model the emission probability of an unknown word using all the characters in the word, and enrich the standard left-to-right trigram estimation of word emission probabilities with a right-to-left prediction of the word by making use of the current and next tags. In addition, we utilize the RankBoost-based reranking algorithm to rerank the N-best outputs of the HMM-based tagger using various n-gram, morphological, and dependency features. Two methods are proposed to improve the generalization performance of the reranking algorithm. Our reranking model achieves an accuracy of 94.68% using n-gram and morphological features on the Penn Chinese Treebank 5.2, and is able to further improve the accuracy to 95.11% with the addition of dependency features.

17:05–17:30 Building Domain-Specific Taggers without Annotated (Domain) Data
John Miller, Manabu Torii and K. Vijay-Shanker

Part-of-speech tagging is a fundamental component in many NLP systems. When taggers developed in one domain are used in another domain, performance can degrade considerably. We present a method for developing taggers for new domains without requiring POS-annotated text in the new domain. Our method involves using raw domain text and identifying related words to form a domain-specific lexicon. This lexicon provides the initial lexical probabilities for EM training of an HMM. We evaluate the method by applying it in the biology domain and show that we achieve results comparable with some taggers developed for this domain.

Concluding Session
17:30 Closing Remarks
Additional CoNLL Shared Task Papers (dependency parsing)
Multilingual Dependency Parsing and Domain Adaptation using DeSR
Giuseppe Attardi, Felice Dell’Orletta, Maria Simi, Atanas Chanev and Massimiliano Ciaramita

We describe our experiments using the DeSR parser in the multilingual and domain adaptation tracks of the CoNLL 2007 shared task. DeSR implements an incremental deterministic Shift/Reduce parsing algorithm, using specific rules to handle non-projective dependencies. For the multilingual track we adopted a second order averaged perceptron and performed feature selection to tune a feature model for each language. For the domain adaptation track we applied a tree revision method which learns how to correct the mistakes made by the base parser on the adaptation domain.

Hybrid Ways to Improve Domain Independence in an ML Dependency Parser
Eckhard Bick

The paper reports a hybridization experiment, where an existing ML dependency parser (LingPars) was allowed access to Constraint Grammar analyses provided by a rule-based parser (EngGram) for the same data. Descriptive compatibility issues and their influence on performance are discussed, such as tokenization problems, category bundling, and dependency head conventions.

The hybrid system performed considerably better than its ML base line, and proved more robust than the latter in the domain adaptation task, where it was the best-scoring system in the open class for the official biomedical test data, and the best overall system for the CHILDES test data.

A Constraint Satisfaction Approach to Dependency Parsing
Sander Canisius and Erik Tjong Kim Sang

We present an adaptation of constraint satisfaction inference (Canisius et al., 2006b) for predicting dependency trees. Three different classifiers are trained to predict weighted soft-constraints on parts of the complex output. From these constraints, a standard weighted constraint satisfaction problem can be formed, the solution to which is a valid dependency tree.

A Two-Stage Parser for Multilingual Dependency Parsing
Wenliang Chen, Yujie Zhang and Hitoshi Isahara

We present a two-stage multilingual dependency parsing system submitted to the Multilingual Track of CoNLL-2007. The parser first identifies dependencies using a discriminative classifier and then labels those dependencies as a sequence labeling problem. We propose features for both stages. For the four languages that have multiple values of ROOT, we design special features for the ROOT labeler. We then present evaluation results and an error analysis focusing on Chinese.

Incremental Dependency Parsing Using Online Learning
Richard Johansson and Pierre Nugues

We describe an incremental parser that was trained to minimize cost over sentences rather than over individual parsing actions. This is an attempt to use the advantages of the two top-scoring systems in the CoNLL-X shared task.

In the evaluation, we present the performance of the parser in the Multilingual task, as well as an evaluation of the contribution of bidirectional parsing and beam search to the parsing performance.

Online Learning for Deterministic Dependency Parsing
Prashanth Reddy Mannem

Deterministic parsing has emerged as an effective alternative for complex parsing algorithms which search the entire search space to get the best probable parse tree. In this paper, we present an online large margin based training framework for deterministic parsing using Nivre's Shift-Reduce parsing algorithm. Online training facilitates the use of high dimensional features without creating memory bottlenecks unlike the popular SVMs. We participated in the CoNLL Shared Task-2007 and evaluated our system for ten languages. We got an average multilingual labeled attachment score of 74.54% (with 65.50% being the average and 80.32% the highest) and an average multilingual unlabeled attachment score of 80.30% (with 71.13% being the average and 86.55% the highest).

Covington Variations
Svetoslav Marinov

Three versions of the Covington algorithm for non-projective dependency parsing have been tested on the ten different languages for the Multilingual track of the CoNLL-X Shared Task. The results were achieved by using only information about heads and daughters as features to guide the parser which obeys strict incrementality. A memory-based learner was used to predict the next action of the parser.

A Multilingual Dependency Analysis System Using Online Passive-Aggressive Learning
Le-Minh Nguyen, Akira Shimazu, Phuong-Thai Nguyen and Xuan-Hieu Phan

This paper presents an online algorithm for dependency parsing problems. We propose an adaptation of the passive-aggressive online learning algorithm to the dependency parsing domain. We evaluate the proposed algorithms on the CoNLL 2007 Shared Task, and report an error analysis. Experimental results show that the system's score is better than the average score among the participating systems.

Global Learning of Labeled Dependency Trees
Michael Schiehlen and Kristina Spranger

In the paper we describe a dependency parser that uses exact search and global learning (Crammer et al., 2006) to produce labelled dependency trees. Our system integrates the task of learning tree structure and learning labels in one step, using the same set of features for both tasks. During label prediction, the system automatically selects for each feature an appropriate level of smoothing. We report on several experiments that we conducted with our system. In the shared task evaluation, it scored better than average.

Pro3Gres Parser in the CoNLL Domain Adaptation Shared Task
Gerold Schneider, Kaarel Kaljurand, Fabio Rinaldi and Tobias Kuhn

We present Pro3Gres, a deep-syntactic, fast dependency parser that combines a hand-written competence grammar with probabilistic performance disambiguation and that has been used in the biomedical domain. We discuss its performance in the domain adaptation open submission. We achieve average results, which is partly due to difficulties in mapping to the dependency representation used for the shared task.

Structural Correspondence Learning for Dependency Parsing
Nobuyuki Shimizu and Hiroshi Nakagawa

Following (Blitzer et al., 2006), we present an application of structural correspondence learning to non-projective dependency parsing (McDonald et al., 2005). To induce the correspondences among dependency edges from different domains, we looked at every two tokens in a sentence and examined whether or not there is a preposition, a determiner or a helping verb between them. Three binary linear classifiers were trained to predict the existence of a preposition, etc., on unlabeled data, and we used singular value decomposition to induce new features. During training, the parser was trained with these additional features in addition to those described in (McDonald et al., 2005). We discriminatively trained our parser in an on-line fashion using a variant of the voted perceptron (Collins, 2002; Collins and Roark, 2004; Crammer and Singer, 2003).
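The SCL recipe above (pivot predictors plus SVD) can be sketched as follows. The least-squares pivot predictors are a simplification for brevity; Blitzer et al. train them with a modified Huber loss, and the pivot features here (e.g. "a preposition occurs between the two tokens") are stand-ins for the three described in the abstract.

```python
import numpy as np

def scl_projection(X_unlabeled, pivot_labels, h=2):
    """Structural correspondence learning, simplified sketch.

    X_unlabeled  : (n, d) feature matrix of token pairs from both domains
    pivot_labels : (n, m) 0/1 matrix; column j marks whether pivot j fires
                   (preposition / determiner / helping verb between tokens)
    h            : number of induced correspondence dimensions
    Returns a (d, h) projection; X @ theta gives the new shared features.
    """
    # one linear predictor per pivot, fit on unlabeled data
    W, *_ = np.linalg.lstsq(X_unlabeled, pivot_labels, rcond=None)  # (d, m)
    # SVD of the stacked predictor weights: the top-h left singular
    # vectors capture feature correspondences that transfer across domains
    U, _, _ = np.linalg.svd(W, full_matrices=False)
    return U[:, :h]
```

The induced features `X @ theta` are then appended to the original edge features before training the parser, which is how correspondences learned from unlabeled target-domain text reach the supervised learner.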

Adapting the RASP System for the CoNLL07 Domain-Adaptation Task
Rebecca Watson and Ted Briscoe

We describe our submission to the domain adaptation track of the CoNLL07 shared task in the open class for systems using external resources. Our main finding was that it was very difficult to map from the annotation scheme used to prepare training and development data to one that could be used to effectively train and adapt the RASP system's unlexicalised parse-ranking model. Nevertheless, we were able to demonstrate a significant improvement in performance utilising bootstrapping over the PBIOTB data.

Multilingual Deterministic Dependency Parsing Framework using Modified Finite Newton Method Support Vector Machines
Yu-Chieh Wu, Jie-Chi Yang and Yue-Shi Lee

In this paper, we present a three-step multi-lingual dependency parser based on a deterministic shift-reduce parsing algorithm. Different from last year, we separate the root-parsing strategy as a sequential labeling task and try to link neighboring word dependencies via a near-neighbor parsing step. The outputs of the root and neighbor parsers were encoded as features for the shift-reduce parser. We found that our method could benefit from the two preprocessing stages. To speed up training, this year we employ MFN-SVM (modified finite-Newton method support vector machines), which can be trained in linear time. The experimental results show that our method achieved a middle rank among the 23 teams. We expect that our method could be further improved via well-tuned parameter validation for different languages.
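A deterministic shift-reduce dependency parser of the kind described above can be sketched with the classic arc-standard transition system; the `oracle` argument stands in for the trained classifier (here an SVM), and the specific control flow is an illustrative assumption rather than the authors' system.

```python
def shift_reduce_parse(words, oracle):
    """Deterministic arc-standard shift-reduce dependency parsing sketch.

    words  : token indices 1..n (0 is the artificial root)
    oracle : function(stack, buffer) -> 'shift' | 'left' | 'right',
             standing in for the trained classifier
    Returns head[i] for every token.
    """
    stack, buffer, heads = [0], list(words), {}
    while buffer or len(stack) > 1:
        if buffer and len(stack) > 1:
            action = oracle(stack, buffer)
        else:  # forced moves: must shift if only root is stacked, reduce at end
            action = 'shift' if buffer else 'right'
        if action == 'shift':
            stack.append(buffer.pop(0))
        elif action == 'left':          # top attaches s[-2] as its dependent
            heads[stack[-2]] = stack[-1]
            del stack[-2]
        else:                           # 'right': s[-1] becomes dependent of s[-2]
            heads[stack[-1]] = stack[-2]
            stack.pop()
    return heads

# toy run: a two-word sentence with a shift-happy oracle yields a right chain
print(shift_reduce_parse([1, 2], lambda s, b: 'shift'))  # → {2: 1, 1: 0}
```

Each token is shifted once and reduced once, so the parser runs in linear time in sentence length, which is why the classifier's prediction speed (here the motivation for MFN-SVM) dominates overall cost.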

Last modified: Tue Jul 17 16:30:58 EDT 2007