Natural Language Processing
Prof. Jason Eisner
Course # 601.465/665 — Fall 2020
- 12/13/20 Final exam will be Wed 12/16.
You may take it either at 9am or at 11pm, depending on your time zone.
The exam will be administered on Gradescope and will last 3 hours,
with an extra half-hour to upload it.
- 12/1/20 While we're not assigning HW7 this year,
you are welcome to try the 2018 version.
If you submit this extra homework, we'll give credit for the best 6
out of 7. We'd need it by 12/13.
- 11/25/20 HW6
is finally available -- the culmination of the course. (We already sent
you most of the reading handout a week ago.)
You may work in pairs. It is due on Wednesday, December 9, at
11:59pm (as late as we could make it without cutting into reading period).
- 10/25/20 HW5 is available, with a short
"reading handout" appended to it. It deals with
attaching semantic λ-expressions to grammar rules. It is
due on Wednesday, 11/4, at 2pm.
- 10/18/20 Midterm will be held on Wed 10/21,
online, with details announced on Piazza.
- 10/8/20 HW4
is available, with a separate "reading handout" appended to it. You may want
to do HW3 first, but we're making HW4 available now so that you can read the
handout while parsing is still fresh in your mind from lecture.
The reading might also help you study parsing for the midterm.
This is the conceptually hardest homework project in the course, with two
major challenges: probabilistic Earley parsing, and making parsing
efficient. It is due on
Wednesday, 10/28, at 2pm (moved from Monday, 10/26). You
may work with a partner on this one.
- 10/6/20 HW3
is now available, with a separate "reading handout" appended to it.
The due date is Friday, 10/16, at 2pm. Start now: This is
a long and detailed homework that requires you to write
some smoothing code and experiment with its parameters and design
to see what happens. It should be manageable because we've provided
you with a good deal of code, but it may take some time to understand
that code and the libraries that it uses (especially PyTorch).
I strongly suggest
that you start reading the 20-page reading handout now, then study
the starter code and ask questions on Piazza as needed.
Spread the work out. You may work in pairs.
- 9/18/20 HW2 (11 pages) is available. It's due in 2 weeks, Friday 10/2 at 2pm. This homework is mostly a problem set about manipulating probabilities. But it is a long homework! Most significantly, question 6 asks you to read a separate handout and to work through a series of online lessons, preferably with a partner or two. Question 8 asks you to write a small program. It is okay to work on questions 6 and 8 out of order.
- 9/4/20 HW1
(12 pages) is available. It is due on Friday, 9/18 at 2pm: please
get this one in on time so we can discuss it in class an hour later.
- 8/31/20 Start of class is today, Monday
8/31, 3pm. Zoom information will be emailed to you or can be
found at https://meetinginfo.jhu.edu/. Please bookmark this page.
[Office hours for the TAs/CAs will be posted here.]
- Syllabus -- reference info about the course's staff, meetings, textbooks, goals,
expectations, and policies. May be updated on occasion.
- Piazza site for discussion and announcements. Sign up, follow, and participate!
- Recordings of class meetings. Use these if you're sick or
stuck in a time zone where you can't come to the live class.
As the syllabus says: This year we'll often "flip the classroom," to make the best interactive use of our precious synchronous class meeting times. Thus, many of the class meetings will be used for Q&A, discussion, enrichment, and collaborative problem-solving. You'll be expected to watch lecture videos ahead of time, which will be announced on Piazza and posted below.
Warning: The schedule below may change! It should be correct up to the present, but for future dates, it shows last year's timetable and materials (adjusted to this year's dates). Watch Piazza for important updates such as new lecture videos, homeworks, and due dates. Links to future lecture slides and homeworks currently point to last year's versions.
What's Important? What's Hard? What's Easy? [1 week]
- What's wrong with n-grams?
Regular expressions, FSAs, CFGs, ...
Optional reading about formal languages: J&M 16 (2nd ed.)
Language Models [1+ week]
Mon 9/7 (Labor Day: no class)
- Video lecture
- Joint & conditional prob
- Chain rule and backoff
- Modeling sequences
- Cross-entropy and perplexity
- Optional reading about probability, Bayes' Theorem, information theory: M&S 2; slides by Andrew Moore
Grammars and Parsers [3 weeks]
- HW1 due Fri 9/18
- Improving CFG with attributes
- Gaps (slashes)
- Optional reading about attributes: J&M 15 (2nd ed.)
HW2 due on 10/2
- PCFG parsing
- Dependency grammar
- Lexicalized PCFGs
- Optional reading on probabilistic parsing: M&S 12, J&M 14
Quick in-class quiz: Log-linear models
HW4 given: Parsing
- Pruning; best-first
- Rules as regexps
- Left-corner strategy
- Optional listening: A song about parsing
Representing Meaning [1 week]
- What is understanding?
- Lambda terms
- Semantic phenomena and representations
- More semantic phenomena and representations
- Adding semantics to CFG rules
Optional readings on semantics:
HW5 given: Semantics
Unsupervised Learning [2 weeks]
Fri 10/23 (fall break day)
Midterm exam (details TBA)
- Catch-up lecture day
Algebraic Methods [2+ weeks]
HW5 due on 11/4
- Regexp review
- Functions, relations, composition
- Simple applications
- Optional reading on finite-state operators: chaps 2-3 of XFST book draft
- Expressive power
- Weights and semirings
- Lattice parsing
Noisy channels and FSTs
- Spelling correction
- The noisy channel generalization
- Implementation using FSTs
- Optional reading on finite-state NLP: Karttunen (1997)
- Baby talk
- Edit distance
- Speech recognition
- The task
- Hidden Markov Models
- Optional reading on tagging: J&M 8 or M&S 10
- Morphology and phonology [not covered in 2020]
- English, Turkish, Arabic
- Compounds, segmentation
- Two-level morphology
- Rewrite rules
- Optional reading on morphology: R&S 2
Tying it All Together [1- weeks]
Applications [2 weeks]
Mon 11/23 (Thanksgiving break),
Wed 11/25 (Thanksgiving break),
Fri 11/27 (Thanksgiving break)
Lectures from past years, some still useful:
Optimal paths in graphs: A Dyna perspective
Tree-adjoining grammars (by Darcey Riley)
Grouping words: Semantic clustering
Word sense disambiguation and the Yarowsky algorithm
Words vs. terms in IR: Collocation discovery and Latent Semantic Indexing
Knowledge extraction and dialogue systems
Applied NLP tasks: