Statistical Language Learning
Prof. Jason Eisner
Course #600.665, Spring 2002


"When the going gets tough, the tough get empirical"  Jon Carroll
Course Description
Catalog description: This course focuses on past and present research that has attempted,
with mixed success, to induce the structure of language from raw data
such as text. Lectures will be intermixed with reading and discussion of
the primary literature. Students will critique the readings, answer
open-ended homework questions, and undertake a final project.
[Applications]
Prereq: 600.465 or perm
req'd.
The main goals of the seminar are (a) to cover some techniques
people have tried for inducing hidden structure from text, and (b) to get
you thinking about how to do it better.
Since most of the techniques in (a) don't perform that well, (b) is
more important.
The course should also help to increase your comfort with the
building blocks of statistical NLP -- weighted transducers,
probabilistic grammars, graphical models, etc., and the supervised
training procedures for these building blocks.
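As a toy illustration of the supervised case, the sketch below trains a bigram HMM tagger by relative-frequency estimation and decodes with Viterbi. The corpus, tag set, and resulting probabilities are all invented for illustration, not taken from any reading:

```python
from collections import defaultdict

# Tiny tagged corpus; all words, tags, and probabilities are invented
# toy values for illustration.
corpus = [
    [("the", "DET"), ("dog", "N"), ("barks", "V")],
    [("the", "DET"), ("cat", "N"), ("sleeps", "V")],
]

# Supervised training: relative-frequency estimates of the bigram
# transition and emission probabilities of an HMM tagger.
trans = defaultdict(lambda: defaultdict(int))
emit = defaultdict(lambda: defaultdict(int))
for sent in corpus:
    prev = "<s>"
    for word, tag in sent:
        trans[prev][tag] += 1
        emit[tag][word] += 1
        prev = tag

def p_trans(prev, tag):
    total = sum(trans[prev].values())
    return trans[prev][tag] / total if total else 0.0

def p_emit(tag, word):
    total = sum(emit[tag].values())
    return emit[tag][word] / total if total else 0.0

def viterbi(words):
    """Most probable tag sequence under the trained model (Viterbi DP)."""
    tags = list(emit)
    # chart[t] = (best probability of any path ending in tag t, that path)
    chart = {t: (p_trans("<s>", t) * p_emit(t, words[0]), [t]) for t in tags}
    for w in words[1:]:
        new = {}
        for t in tags:
            best_p, best_path = 0.0, None
            for s, (p, path) in chart.items():
                q = p * p_trans(s, t) * p_emit(t, w)
                if q > best_p:
                    best_p, best_path = q, path + [t]
            new[t] = (best_p, best_path)
        chart = new
    return max(chart.values(), key=lambda v: v[0])

prob, path = viterbi(["the", "dog", "sleeps"])
```

Unsupervised induction, the topic of this course, starts from the same parameterization but must fill in the tags itself (e.g., by EM).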
Vital Statistics
Lectures:  MTW 2-3 pm, Shaffer 304 (but we'll
move to the NEB 325a conference room if we're not too big) 
Prof:  Jason Eisner  jason@cs.jhu.edu 
Office hrs: 
MW 3-4 pm, or by appt, in NEB 326

Web page:  http://cs.jhu.edu/~jason/665 
Mailing list:  cs665@cs.jhu.edu (cs665 also works on NLP lab machines) 
Textbook:  none, but the textbooks for 465 may come in handy 
Policies: 
Grading: 30% written responses (graded as check/check-plus, etc.), 30% class participation, 40% project.
Announcements: New readings announced by email and posted below.
Submission: Email me written responses to the whole week's
readings by 11 am each Monday.
Academic honesty: dept. policy (but you can work in pairs on reading responses)

Readings and Responses
Generally we will discuss about 3 related papers each week. Since we
may flit from paper to paper, comparing and contrasting,
you should read all the papers by the start of the week.
A centerpiece of the course is the requirement to respond
thoughtfully to each paper in writing. You should email me your
responses to the upcoming week's papers, in separate plaintext or
postscript messages, by noon each Monday. (Include "665
response" and the paper's authors in the subject line.) I will print
the responses out for everyone, and they will anchor our class
discussion. They will also be a useful source of ideas for your final
projects.
A typical response is 1-3 paragraphs; in a given week you might
respond at greater length to some papers than others. It's okay to
work with another person. What should you write about? Some
possibilities:
 Idea for a new experiment, model or other research opportunity
inspired by the reading
 A clearer explanation of some point that everyone probably had to
struggle with
 Unremarked consequences of the experimental design or results
 Additional experiments you really wish the author had done
 Other ways the research could be improved (e.g., flaws you spotted)
 Non-obvious connections to other work you know about from class or elsewhere
Please be as concrete as possible, and write clearly, since your
classmates will be reading your words of wisdom!
The Readings
Suggestions for readings are welcome, especially well in advance.
 Week of Jan. 28: Bootstrapping
We will read one or two of these for Wednesday (to be chosen
in class on Monday).
 Week of Feb. 4: Classes of "interchangeable" words
 Chapter 3 of: Lillian Lee (1997). Similarity-based approaches to natural
language processing. Ph.D. thesis.
Harvard University Technical Report TR-11-97.
http://xxx.lanl.gov/ps/cmp-lg/9708011
 Chapter 4 of the same thesis.
 Deerwester, S., Dumais, S. T., Landauer, T. K., Furnas, G. W. and
Harshman, R. A. (1990). Indexing by latent semantic analysis. Journal of the
American Society for Information Science, 41(6), 391-407.
http://lsi.research.telcordia.com/lsi/papers/JASIS90.pdf;
scanned version with figures
 Week of Feb. 11: Word meanings, word boundaries
 Carl de Marcken (1996). Linguistic structure as
composition and perturbation. Proceedings of ACL-96.
http://xxx.lanl.gov/ps/cmp-lg/9606027
 Chengxiang Zhai (1997). Exploiting context to identify lexical atoms: A statistical view of
linguistic context.
Proceedings of the International and Interdisciplinary Conference on Modelling and Using Context
(CONTEXT-97), Rio de Janeiro, Brazil, Feb. 4-6, 1997, pp. 119-129.
http://arXiv.org/ps/cmp-lg/9701001
 Jeffrey Mark Siskind:
 (1995) `Robust Lexical Acquisition Despite Extremely Noisy Input,' Proceedings of the 19th Boston University
Conference on Language Development (edited by
D. MacLaughlin and S. McEwen), Cascadilla Press, March.
ftp://ftp.nj.nec.com/pub/qobi/bucld95.ps.Z
 Section 6 of: (1996) A Computational Study of Cross-Situational Techniques for Learning Word-to-Meaning Mappings.
Cognition 61(1-2): 39-91, October/November.
ftp://ftp.nj.nec.com/pub/qobi/cognition96.ps.Z
 Week of Feb. 18: HMMs and Part-of-Speech Tagging
 Week of Feb. 25: Unsupervised Finite-State Topology
 Eric Brill (1995). Unsupervised Learning of
Disambiguation Rules for Part of Speech Tagging. Proc. of 3rd
Workshop on Very Large Corpora, MIT, June. Also appears in
Natural Language Processing Using Very Large Corpora,
1997. http://www.cs.jhu.edu/~brill/aclwkshp.ps.
 Sections 2.4-2.5 and Chapter 3 of: Andreas Stolcke (1994). Bayesian Learning of
Probabilistic Language Models. Ph.D. thesis, University of
California at Berkeley.
ftp://ftp.icsi.berkeley.edu/pub/ai/stolcke/thesis.ps.Z

Jose Oncina (1998). The data driven approach applied to the OSTIA algorithm.
In Proceedings of the Fourth International Colloquium on Grammatical Inference,
Lecture Notes in Artificial Intelligence Vol. 1433, pp. 50-56.
Springer-Verlag, Berlin, 1998. ftp://altea.dlsi.ua.es/people/oncina/articulos/icgi98.ps.gz
(draft)
Please also glance at the following papers so that you
roughly understand a couple of the variants that Oncina and his colleagues
have proposed: section 1 of this
paper on learning stochastic DFAs, and section 3
of this
paper dealing with OSTIA-D and OSTIA-R.
 Week of Mar. 4: Learning Tied Finite-State Parameters
 Kevin Knight and Jonathan Graehl (1998). Machine
Transliteration. Computational Linguistics
24(4):599-612, December. [Hardcopy available and preferred; in a pinch,
read the slightly less detailed ACL-97
version.]
 Richard Sproat and Michael Riley (1996). Compilation of
Weighted FiniteState Transducers from Decision
Trees. Proceedings of ACL. http://arXiv.org/ps/cmp-lg/9606018
 Jason Eisner (2002). Parameter Estimation for
Probabilistic FiniteState Transducers. Submitted to ACL.
http://cs.jhu.edu/~jason/papers/#acl02fst
 Week of Mar. 11: Inside-Outside Algorithm
If you need to review the inside-outside algorithm, check
my course
slides before reading the following papers. The slide fonts are
unfortunately a bit screwy unless you view under Windows.
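For a quick reminder of what the inside pass computes, here is a minimal sketch for a toy PCFG in Chomsky normal form; the grammar and probabilities are invented for illustration, and the outside pass and reestimation step build on the same chart:

```python
from collections import defaultdict

# Toy PCFG in Chomsky normal form; the grammar and probabilities are
# invented for illustration.
binary = {   # (parent, left child, right child) -> rule probability
    ("S", "NP", "VP"): 1.0,
    ("NP", "DET", "N"): 1.0,
}
lexical = {  # (preterminal, word) -> rule probability
    ("DET", "the"): 1.0,
    ("N", "dog"): 0.5,
    ("N", "cat"): 0.5,
    ("VP", "sleeps"): 1.0,
}

def inside(words):
    """CKY-style inside pass: beta[i, j, X] = p(X derives words[i:j])."""
    n = len(words)
    beta = defaultdict(float)
    for i, w in enumerate(words):                    # width-1 spans
        for (tag, word), p in lexical.items():
            if word == w:
                beta[i, i + 1, tag] += p
    for width in range(2, n + 1):                    # wider spans, bottom up
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):                # split point
                for (parent, left, right), p in binary.items():
                    beta[i, j, parent] += p * beta[i, k, left] * beta[k, j, right]
    return beta

beta = inside(["the", "dog", "sleeps"])
sentence_prob = beta[0, 3, "S"]   # total probability of the sentence
```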
 K. Lari and S. Young (1990). The estimation of
stochastic context-free grammars using the inside-outside
algorithm. Computer Speech and Language 4:35-56. scanned PDF version
 Fernando Pereira and Yves Schabes (1992). Inside-outside
reestimation from partially bracketed corpora. Proceedings of
the 20th Meeting of the Association for Computational
Linguistics. scanned PDF version
 Carl de Marcken (1995). On the unsupervised induction of
phrase-structure grammars. Proc. of the 3rd Workshop on Very
Large Corpora. http://bobo.link.cs.cmu.edu/grammar/demarcken.ps
 Week of Mar. 18: Spring break!
 Week of Mar. 25: More CFG Learning
 Week of Apr. 2: Maximum Entropy Parsing Models
 Week of Apr. 9: Bootstrapping Syntax
 Week of Apr. 16: Neural nets
 Week of Apr. 23
 John M. Zelle and Raymond J. Mooney (1996). Comparative Results on Using Inductive Logic Programming for Corpus-based Parser
Construction. In S. Wermter, E. Riloff and G. Scheler
(Eds.), Symbolic, Connectionist, and Statistical Approaches to
Learning for Natural Language Processing. Springer Verlag.
http://www.cs.utexas.edu/users/ml/papers/chillbkchapter95.ps.gz
 Robert C. Berwick and Sam Pilato (1987). Learning Syntax
by Automata Induction. Machine Learning 2: 9-38.
scanned individual pages
Note: No class on Wednesday April 24.
 Week of Apr. 30
 Monday, May 13: Due date for final project
 Wednesday, May 15, 9 am-12 pm: Project presentation party (in
lieu of final exam) with 20-minute talks