|
|
||||
|
|
|
|
|
|
Natural Language Processing
|
|
http://cs.jhu.edu/~jason/465).Course catalog entry: This course is an in-depth overview of techniques for processing human language. How should linguistic structure and meaning be represented? What algorithms can recover them from text? And crucially, how can we build statistical models to choose among the many legal answers?
The course covers methods for trees (parsing and semantic interpretation), sequences (finite-state transduction such as tagging and morphology), and words (sense and phrase induction), with applications to practical engineering tasks such as information retrieval and extraction, text classification, part-of-speech tagging, speech recognition, and machine translation. There are a number of structured but challenging programming assignments. Prerequisite: 600.226 or equivalent. [Eisner, Applications, Fall] 3 credits
More information: Welcome! This course is designed to introduce you to some of the problems and solutions of NLP, and their relation to linguistics and statistics. You need to know how to program (e.g., 600.120) and use common data structures (600.226). It might also be nice to have some previous familiarity with automata (600.271) and probabilities (600.475, 550.420, or 550.310). At the end you should agree (I hope!) that language is subtle and interesting, feel some ownership over some of NLP's formal and statistical techniques, and be able to understand research papers in the field.
| Lectures: | MWF 3-4 pm (sometimes 3-4:30), Hackerman B17 | |
| Prof: | Jason Eisner - ( ) | |
| TA: | Xuchen Yao - (xuchen at cs dot jhu dot edu) | |
| CA: | TBA | |
| Office hrs: |
For Prof: MW 4pm after class (or by appt) in Hackerman 324C For TA: M 12-1, Th 11-12, in Hackerman 226 | |
| Discussion session: | TA-led session (optional) for activities/discussion/questions/review: Th 3:30-4:30, in Hackerman 306 | |
| Discussion site: |
http://piazza.com/class#fall2011/600465
... public questions, discussion, announcements | |
| Web page: | http://cs.jhu.edu/~jason/465 | |
| Textbook: |
Jurafsky &
Martin, 2nd ed. (semi-required - P98.J87 2009 in Science Ref
section on C-Level) Roark & Sproat (recommended - P98.R63 2007 in same section) Manning & Schütze (recommended - free online PDF version here!) | |
| Policies: |
Grading: homework 50%, participation 5%, midterm 15%, final 30% Submission: TBA Lateness: floating late days policy Honesty: here's what it means Intellectual engagement: much encouraged Announcements: Read mailing list and this page! | |
| Related sites: |
|
This class is in the 3-4:30 time slot, a "flex slot" that is intended to permit either three 50-minute lectures or two 75-minute lectures per week. Please try to keep the whole slot free, especially on MW:
Warning: For future lectures and assignments, the links below take you to last year's versions, which are subject to change. The schedule may also change.
Warning: I sometimes turn off the PDF links when they are not up to date with the PPT links. If they don't work, just click on "ppt" instead.
| Week | Monday | Wednesday | Friday | Suggested Reading | |
| 8/29 |
Introduction
(ppt)
|
Assignment 1 given: Designing CFGs Chomsky hierarchy (ppt) |
Language models
(ppt)
|
|
|
| 9/5 | No class (Labor Day) |
Probability concepts
(ppt; video lecture)
|
Bayes' Theorem
(ppt) Smoothing n-grams (ppt) |
|
|
| 9/12 |
(& another sign meant 3 ... ?) Assignment 2 given: Probabilities Limitations of CFG |
Improving CFG with features
(ppt)
|
Catch-up day (we'll be behind schedule by now) |
|
|
| 9/19 |
Assignment 2 due Assignment 3 given: Language Models Context-free parsing (ppt) |
Context-free parsing
|
Earley's algorithm
(ppt)
|
|
|
| 9/26 |
(not covered this year) Extending CFG (summary (ppt)) |
No class (Rosh Hashanah) |
Probabilistic parsing
(ppt)
|
| |
| 10/3 |
Assignment 3 due Assignment 4 given: Parsing Parsing tricks (ppt) |
Catch-up day (we'll be behind schedule by now) |
(not covered this year) Human sentence processing (ppt) |
TBA | |
| 10/10 |
(Monday 10/10 is fall break day;
but class meets on Tuesday 10/11,
which will follow a Monday schedule) Semantics (ppt) |
Semantics continued
|
Assignment 5 given: Semantics |
Semantics continued J&M 17-18; also this
web page, up to but not including "denotational semantics"
section; and you could try the Penn Lambda Calculator;
and how about lambda
calculus for kids? |
| |
| 10/17 |
Midterm exam (3-4:30 in classroom) |
Forward-backward algorithm (ppt)
(Excel spreadsheet; Viterbi version; lesson plan; video lecture)
|
Forward-backward continued
|
J&M 6 or perhaps Allen pp. 195-208 (handout); M&S 11 | |
| 10/24 |
Assignment 4 due Assignment 6 given: Hidden Markov Models Expectation Maximization (ppt) |
Finite-state algebra
(ppt)
|
Finite-state machines
|
John Lafferty's inside-outside notes; R&S 1 | |
| 10/31 |
Finite-state implementation
(ppt)
|
Finite-state tagging
(ppt)
|
Assignment 5 due Noisy channels and FSTs (ppt) |
chaps 2-3 of xfst book draft (only accessible from barley and other Solaris machines at JHU CS; don't distribute); R&S 2; perhaps also this paper | |
| 11/7 |
More FST examples
(ppt)
|
Programming with regexps
(ppt)
|
(covered only partially this year) |
J&M 5 or M&S 10 | |
| 11/14 |
Assignment 7 given: Finite-State Modeling Log-linear models (ppt) |
Assignment 6 due Structured prediction |
Global models
|
J&M 6 | |
| 11/21 |
(not covered this year) Topic models (video lecture part 1, part 2) |
No class (Thanksgiving break) |
No class (Thanksgiving break) |
M&S 15.2, 15.4 | |
| 11/28 |
(not covered this year) Machine translation |
Training MT systems
|
Current NLP tasks and
competitions
(ppt)
|
J&M 25, M&S 13, statmt.org; tutorial (2003), workbook (1999), introductory essay (1997), technical paper (1993); tutorial (2006) focusing on more recent developments (slides, 3-hour video part 1, part 2) |
|
| 12/5 |
Assignment 7 due NLP tasks continued | Sun 12/11 is absolute deadline for late assignments ---> |
Final exam: Thu 12/15, 9am-noon ---> |