|
|
||||
|
|
|
|
|
|
Natural Language Processing
|
|
http://cs.jhu.edu/~jason/465).Welcome! This course is designed to introduce you to some of the problems and solutions of NLP, and their relation to linguistics and statistics. You need to know how to program (e.g., 600.120) and use common data structures (600.226). It might also be nice to have some previous familiarity with automata (600.271) and probabilities (600.475, 550.420, or 550.310). At the end you should agree (I hope!) that language is subtle and interesting, feel some ownership over some of NLP's formal and statistical techniques, and be able to understand research papers in the field.
Course catalog entry: This course is an in-depth overview of techniques for processing human language. How should linguistic structure and meaning be represented? What algorithms can recover them from text? And crucially, how can we build statistical models to choose among the many legal answers? The course covers methods for trees (parsing and semantic interpretation), sequences (finite-state transduction such as morphology), and words (sense and phrase induction), with applications to practical engineering tasks such as information retrieval and extraction, text classification, part-of-speech tagging, speech recognition and machine translation. There are a number of structured but challenging programming assignments. Prerequisite: 600.226. [Eisner, Applications, Fall] 3 credits
| Lectures: | MWF 3-4 pm (sometimes 3-4:30), CSEB B17 | |
| Prof: | Jason Eisner - ( ) | |
| TA: | Jonathan Weese - (jonny at cs dot jhu dot edu) | |
| CA: | Chuan Liu - (chuan at cs dot jhu dot edu) | |
| Office hrs: |
For Prof: MW 4pm after class (or by appt) in CSEB 324C For TA: Th 2-4 (or by appt) in CSEB 322 | |
| Mailing list: |
... public questions, discussion, announcements | |
| Web page: | http://cs.jhu.edu/~jason/465 | |
| Textbook: |
Jurafsky &
Martin, 2nd ed. (semi-required - P98.J87 2009 in Science Ref section on C-Level) Manning & Schütze (recommended - online PDF version here!) | |
| Policies: |
Grading: homework 50%, participation 5%, midterm 15%, final 30% Submission: TBA Lateness: floating late days policy Honesty: here's what it means Intellectual engagement: much encouraged Announcements: Read mailing list and this page! | |
| Related sites: |
|
This class is in the 3-4:30 time slot, a "flex slot" that is intended to permit either three 50-minute lectures or two 75-minute lectures per week. Usually, we have three 50-minute lectures per week (MWF). However, you will see on the schedule below that I am fairly likely to cancel class on M 9/29, M 10/27, W 11/19, and M 12/8. I may therefore ask you to let me make up the time with a few 75-minute lectures, for example, on M and W in the week before a canceled class.
Warning: For future lectures and assignments, the links below take you to last year's versions, which are subject to change.
Warning: The Jurafsky & Martin chapter numbers refer to the 1st edition, not the 2nd. We will update them shortly.
Warning: I've turned off the PDF links for now because they are not all up to date with the PPT links. Just click on "ppt" instead.
| Week | Monday | Wednesday | Friday | Suggested Reading | |
| 8/31 |
Introduction
(ppt)
|
Assignment 1 given: Designing CFGs Chomsky hierarchy (ppt) |
J&M chapter 1; for assignment, J&M 9 (or M&S 3) | ||
| 9/7 | No class (Labor Day) |
Language models
(ppt)
|
Probability concepts
(ppt)
Bayes' Theorem (ppt) |
J&M chapters 13, 6.2 (see also M's slides
or Andrew
Moore's slides);
M&S chapter 2 |
|
| 9/14 |
Smoothing n-grams
(ppt)
|
Assignment 2 given: Using n-Grams (lecture postponed till 10/5 or 10/12) Human sentence processing (ppt) |
No class (Rosh Hashanah) But could have a Q&A / homework help session with the TA ... |
M&S chapter 6; Rosenfeld (2000) survey | |
| 9/21 |
(& another sign meant 3 ... ?) Limitations of CFG |
Improving CFG with features
(ppt)
|
Context-free parsing
(ppt)
|
J&M 10, 11.1-11.4 | |
| 9/28 | No class (Yom Kippur) But could have a Q&A / homework help session with the TA ... |
Context-free parsing
|
Earley's algorithm
(ppt)
|
J&M 10 | |
| 10/5 |
(lecture skipped this year) Extending CFG (summary (ppt)) |
Probabilistic parsing
(ppt)
|
Assignment 2 due
Assignment 3 given: Parsing Parsing tricks (ppt) A song about parsing |
J&M 12 (or M&S 11.1-11.3 and 12.1.1-12.1.5) | |
| 10/12 |
(lecture skipped this year) Learning is impossible? (ppt) |
Semantics
(ppt)
|
Semantics continued
|
J&M 14-15; also this web page, up to but not including "denotational semantics" section; and you could try the Penn Lambda Calculator; and how about lambda calculus for kids? | |
| 10/19 |
Assignment 4 given: Semantics Finite-state functions (ppt) |
Finite-state implementation
(ppt)
|
Midterm exam (in class) |
chap 2 of xfst book draft (only accessible from barley and other Solaris machines at JHU CS; don't distribute) | |
| 10/26 |
Programming with Regexps
(ppt)
|
Noisy Channels and FSTs
(ppt)
|
Assignment 3 due
(lecture skipped this year) |
Morphology and Phonology (ppt) chap 3 of xfst book draft;
perhaps also this paper |
| |
| 11/2 |
Finite-state parsing
|
Finite-state tagging
(ppt)
|
HMMs
|
J&M 8 or M&S 10 | |
| 11/9 |
Assignment 4 due Assignment 5 given: Finite-State Grammars Forward-backward algorithm (Excel spreadsheet; Viterbi version; lesson plan) |
Forward-backward continued
|
Expectation Maximization
(ppt)
|
J&M 6 (2nd ed.) or perhaps Allen pp. 195-208 (handout); M&S 11; John Lafferty's inside-outside notes | |
| 11/16 |
Final FSM Examples
(ppt)
|
Assignment 5 due Assignment 6 given: Training an HMM Grouping words (ppt) |
Splitting words
(ppt)
|
M&S 14, 7, 5, (since J&M 16-17 covers only some of this) | |
| 11/23 |
Words vs. senses in IR
(ppt)
|
No class (Thanksgiving break) |
No class (Thanksgiving break) |
M&S 15.2, 15.4 | |
| 11/30 |
(may replace this & next lecture with more general overview of machine learning in NLP) Text categorization (ppt) |
Log-linear models
(ppt)
|
Machine Translation
|
Text cat: M&S 16,
Science News article Log-linear: J&M 6 MT: J&M 25, M&S 13, statmt.org; tutorial (2003), workbook (1999), introductory essay (1997), technical paper (1993); tutorial (2006) focusing on more recent developments (slides, 3-hour video part 1, part 2) |
|
| 12/7 |
Assignment 6 due Current and Future Research (ppt) |
Sun 12/13 is absolute deadline for late assignments ---> |
Final exam: Thu 12/17, 9am-noon ---> |