601.465/665 - Natural Language Processing

Natural Language Processing
Prof. Jason Eisner
Course # 601.465/665 — Fall 2024

Announcements

11/23/24 HW8 is out! It's 1 page long with no reading handout, but goes with a couple of Python notebooks and some code to inspect. You may work in pairs. Due date is Sunday, December 8, at 11:59pm (as late as we could make it without illegally cutting into reading period).
11/16/24 HW7 is finally available as well. We provided its medium-length reading handout several days ago. You may continue to work with your HW6 partner if you like. HW7 is now due on Tuesday, November 26, at noon -- but try to finish it by Friday, Nov 22 so it doesn't ruin your Thanksgiving break! (There may not be office hours etc. during Thanksgiving break.)
11/3/24 The rest of HW6 is finally available. We provided a long reading handout (several days ago) to review the ideas and fill in some details. This homework shouldn't be too hard conceptually if you followed the HMM and CRF lectures, but you'll still have to keep track of a lot of ideas, code, and experiments. You may work in pairs. The deadline is Tuesday 11/12. Note that HW7 will build on HW6, so you'll continue working with this codebase (and optionally with the same partner).
10/21/24 HW5 is available, with a short "reading handout" appended to it. It deals with attaching semantic λ-expressions to grammar rules. It is due on Monday, 10/28, at 11pm.
9/27/24 HW4 is available, with a separate "reading handout" appended to it. You may want to do HW3 first, but we're making HW4 available now so that you can read the handout while parsing is still fresh in your mind from lecture. The reading might also help you study parsing for the midterm. This is the conceptually hardest homework project in the course, with two major challenges: probabilistic Earley parsing, and making parsing efficient. It is due on Monday, 10/21, at 11pm. You may work with a partner on this one.
9/22/24 HW3 is now available, with a separate "reading handout" appended to it. The due date is Sun, 10/6, at 11pm. Start early: This is a long and detailed homework that requires you to write some smoothing code and experiment with its parameters and design to see what happens. It should be manageable because we've already covered the ideas in class and on HW2, and because we've provided you with a good deal of code. But it may take some time to understand that code and the libraries that it uses (especially PyTorch). I strongly suggest that you start reading the 27-page reading handout now, then study the starter code and ask questions on Piazza as needed. Spread the work out. You may work in pairs.
9/6/24 HW2 (11 pages) is available. It's due in a little over 2 weeks: Mon 9/23 at 2pm. This homework is mostly a problem set about manipulating probabilities. But it is a long homework! Most significantly, question 6 asks you to read a separate handout and to work through a series of online lessons, preferably with a partner or two. Question 8 asks you to write a small program. It is okay to work on questions 6 and 8 out of order.
8/28/24 HW1 (12 pages) is available. It is due on Wed 9/11 at 2pm: please get this one in on time so we can discuss it in class an hour later.
8/26/24 First class is Mon 8/26, 3pm, Krieger 205. As explained on the syllabus, please keep MWF 3-4:30 pm open to accommodate a variable class schedule as well as office hours after class. Our weekly recitations are Tue 6-7:30 pm.
8/26/24 Please bookmark this page. All enrolled students will soon be added to Piazza. (If you are waitlisted, I will send you a code by email that you can use to join Piazza if you are attending the class in hopes of getting a seat.) Later, when Homework 1 is due, we will tell you how to join Gradescope.

Key Links

Syllabus -- reference info about the course's staff, meetings, office hours, textbooks, goals, expectations, and policies. May be updated on occasion.
Piazza site for discussion and announcements. Sign up, follow, and participate!
Gradescope for submitting your homework.
Office hours for the course staff.
Video recordings (see policy on syllabus)

Schedule

Warning: The schedule below is adapted from last year's schedule and may still change! Links to future lecture slides, homeworks, and dates currently point to last year's versions. Watch Piazza for important updates, including when assignments are given and when they are due.

What's Important? What's Hard? What's Easy? [1 week]

Mon 8/26:

Introduction
- NLP applications
- Ambiguity
- Levels of language
- Random language via n-grams
- Optional reading about scope and history of the field: J&M (2nd ed.) chapter 1

Wed 8/28:

Modeling grammaticality
- What's wrong with n-grams?
- Regular expressions, FSAs, CFGs, ...
- Optional reading about formal languages: J&M 2
HW1 given: Designing CFGs

Fri 8/30:

Uses of language models
- Language ID
- Text categorization
- Spelling correction
- Segmentation
- Speech recognition
- Machine translation
- Optional reading about n-gram language models: J&M 3 (or M&S 6)

Probabilistic Modeling [1 week]

Mon 9/2 (Labor Day: no class)
Wed 9/4, Fri 9/6:

Probability concepts
- Joint & conditional prob
- Chain rule and backoff
- Modeling sequences
- Surprisal, cross-entropy, perplexity
- Optional reading about probability, Bayes' Theorem, information theory: M&S 2; slides by Andrew Moore
Smoothing n-grams (video lessons, 52 min. total)
- Maximum likelihood estimation
- Bias and variance
- Add-one or add-λ smoothing
- Cross-validation
- Smoothing with backoff
- Good-Turing, Witten-Bell (bonus slides)
- Optional reading about smoothing: M&S 6; J&M 4; Rosenfeld (2000)
HW2 given: Probabilities

Mon 9/9:

Bayes' Theorem
Log-linear models (self-guided interactive visualization with handout)
- Parametric modeling: Features and their weights
- Maximum likelihood and moment-matching
- Non-binary features
- Gradient ascent
- Regularization (L2 or L1) for smoothing and generalization
- Conditional log-linear models
- Application: Language modeling
- Application: Text categorization
- Optional readings about log-linear models: Collins (pp. 1-4), Smith (section 3.5), J&M 5

Grammars and Parsers [3- weeks]

Wed 9/11:

HW1 due
- In-class discussion of HW1
Improving CFG with attributes (video lessons, 62 min. total)
- Morphology
- Lexicalization
- Post-processing (CFG-FST composition)
- Tenses
- Gaps (slashes)
- Optional reading about syntactic attributes: J&M 15 (2nd ed.)

Wed 9/11 (continued), Fri 9/13, Mon 9/16:

HW3 given: Language Models
Context-free parsing
- What is parsing?
- Why is it useful?
- Bottom-up algorithms, working up to CKY algorithm
- From recognition to parsing
- Incremental strategy
- Dotted rules
- Sparse matrices
- Quick reference: CKY and Earley algorithms
- Optional reading about parsing: J&M 18

Wed 9/18, Fri 9/20:

Earley's algorithm
- Top-down parsing: Recursive descent
- Earley's algorithm

Mon 9/23:

HW2 due
Quick in-class quiz: Log-linear models
Probabilistic parsing
- PCFG parsing
- Dependency grammar
- Lexicalized PCFGs
- Optional reading on probabilistic parsing: M&S 12, J&M Appendix C

Wed 9/25:

Parsing tricks
- Pruning; best-first
- Rules as regexps
- Left-corner strategy
- Evaluation
- Optional listening: A song about parsing
HW4 given: Parsing

Fri 9/27:

Human sentence processing (time permitting)
- Methodology
- Frequency sensitivity
- Incremental interpretation
- Unscrambling text
- Optional reading on psycholinguistics: Tanenhaus & Trueswell (2006), Human Sentence Processing website

Representing Meaning [1 week]

Mon 9/30, Wed 10/2, Fri 10/4:

HW3 due on ~~Wed 10/2~~ Sun 10/6
Semantics
- What is understanding?
- Lambda terms
- Semantic phenomena and representations
- More semantic phenomena and representations
- Adding semantics to CFG rules
- Compositional semantics
- Optional readings on semantics:
  - J&M 17-18
  - this chapter from an online programming languages textbook
  - this web page, up to but not including "denotational semantics" section
  - try the Lambda Calculator
  - lambda calculus for kids
HW5 given: Semantics

Midterm

~~Mon 10/7~~ Fri 10/11:

Midterm exam (3-4:30, in classroom)

Representing Everything: Deep Learning for NLP [1+ week]

Wed 10/9, Fri 10/11, Mon 10/14, Wed 10/16:

Back-propagation (video lesson, 33 min.)
Neural methods
- Vectors, matrices, tensors; PyTorch operations; linear and affine operations
- Log-linear models, temperatures, learned features, nonlinearities
- Vectors as an alternative semantic representation
- Training signals: Categorical labels, similarity, matching
- Encoders and decoders
- End-to-end training, multi-task training, pretraining + fine-tuning
- Self-supervised learning
- word2vec (skip-gram / CBOW)
- Recurrent neural nets (RNNs, BiRNNs, ELMo)
Optional reading about neural nets and RNNs: J&M 7, 8

Fri 10/18 (fall break: no class)

Unsupervised Learning [1+ week]

Mon 10/21, Wed 10/23:

HW4 due on Mon 10/21
Forward-backward algorithm (Excel spreadsheet; Viterbi version; lesson plan; video lecture)
- Ice cream, weather, words and tags
- Forward and backward probabilities
- Inferring hidden states
- Controlling the smoothing effect
- Reestimation
- Likelihood convergence
- Symmetry breaking
- Local maxima
- Uses of states
- Optional reading about forward-backward: J&M Appendix A
HW6 given: Unsupervised Tagging with HMMs

Fri 10/25, Mon 10/28:

Expectation Maximization
- Generalizing the forward-backward strategy
- Inside-outside algorithm
- Posterior decoding
- Optional reading on inside-outside and EM: John Lafferty's notes; M&S 11; Eisner on relation to backprop
HW5 due on Mon 10/28

Discriminative Modeling [1- week]

Wed 10/30, Fri 11/1:

Structured prediction
- Perceptrons
- CRFs
- Feature engineering
- Generative vs. discriminative
HW7 given: Discriminative Tagging with (neural) CRFs

Deep Learning for Structured Prediction; Transformers [1- week]

Mon 11/4, Wed 11/6:

Neural methods (continued)
- seq2seq: Structure prediction via sequence prediction (or via tagging)
- Decoders: Exact, greedy, beam search, independent, dynamic programming, stochastic, Minimum Bayes Risk (MBR)
- Attention
- Transformers (encoder-decoder, encoder-only (BERT), decoder-only (LM))
- Positional embeddings
- Tokenization
- Parameter-efficient fine tuning, distillation, RLHF (REINFORCE, PPO, DPO)
- Optional reading on Transformers: The Illustrated Transformer; J&M 9, J&M 11; GPT-2 spreadsheet

Harnessing Large Language Models [1+ week]

Fri 11/8, Mon 11/11, Wed 11/13, Fri 11/15:

HW6 due on ~~Fri 11/8~~ Tue 11/12
Few-shot learning with prompted language models [at recitation]
Black-box use of large language models
HW8 given: Large Language Models
Optional reading on LLMs: J&M 10, J&M 12

NLP Applications [2 weeks]

Mon 11/18, Wed 11/20, Fri 11/22,
Mon 11/25 (Thanksgiving break), Wed 11/27 (more break), Fri 11/29 (more break),
Mon 12/2, Wed 12/4, Fri 12/6:

HW7 due on ~~Fri 11/22~~ Tue 11/26
Current NLP tasks and competitions
- The NLP research community
- Text annotation tasks
- Other types of tasks
- Optional reading: Explore links in the "NLP tasks" slides!
HW8 due on 12/8

Final

Exam period (12/11 - 12/19):

Final exam review session (date TBA)
Final exam (Wed 12/18, 6pm-9pm, Krieger 205)

Unofficial Summary of Homework Schedule

These dates were copied from the schedule above, which is subject to change. Homeworks are due approximately every two weeks, with longer homeworks getting more time. But the homework periods are generally longer than two weeks -- they overlap. This gives you more flexibility about when to do each assignment, which is useful if you have other classes and activities. We assign homework n as soon as you've seen the lectures you need, rather than waiting until after homework n-1 is due. So you can jump right in while the material is fresh.

HW1 (grammar): given Wed 8/28, due Wed 9/11
HW2 (probability): given Fri 9/6, due Mon 9/23
HW3 (empiricism): given Fri 9/13, due ~~Wed 10/2~~ Sun 10/6
Midterm: ~~Mon 10/7~~ Fri 10/11
HW4 (algorithms): given Wed 9/25, due Mon 10/21
HW5 (logic): given Fri 10/4, due Mon 10/28
HW6 (unsupervised learning): given Wed 10/23, due ~~Fri 11/8~~ Tue 11/12
HW7 (discriminative learning): given Fri 11/1, due ~~Fri 11/22~~ Tue 11/26
HW8 (large language models): given Mon 11/11, due ~~Fri 12/6 (last day of class)~~ Sun 12/8

Recitation Schedule

Recitations are normally held on Tuesdays (see the syllabus). Enrolled students are expected to attend the recitation and participate in solving practice problems. This will be more helpful than an hour of solo study. The following schedule is subject to change.

Tue 8/27: Warmup puzzles: rules and n-grams (solution), combinatorics (solution)
Tue 9/3: ~~No recitation (to compensate for 2 upcoming hours of video lectures)~~ Extra lecture (but we'll skip Mon 9/9 lecture)
Tue 9/10: Probabilities (solutions)
Tue 9/17: Syntax (solutions)
Tue 9/24: Log-linear models (solutions)
Tue 10/1: Parsing (solutions)
Tue 10/8: Midterm review (try practice midterms in advance)
Tue 10/15: Semantics (solutions)
Tue 10/22: Deep learning (solutions)
Tue 10/29: HMMs and EM (solutions)
Tue 11/5: Discriminative modeling (solutions)
Tue 11/12: Interactive presentation of GPT-4
Tue 11/19: ~~Finite-state methods (solutions)~~ TBD (LLMs?)
Tue 11/26: No recitation (Thanksgiving break)
Tue 12/3: NLP applications (solutions)
TBA: Final exam review (try practice final in advance)

Old Materials

Lectures from past years, some still useful:

Tree-adjoining grammars (guest lecture by Darcey Riley)
Learning in the limit: Gold's Theorem
Finite-state methods:
- Finite-state algebra
- Finite-state implementation
- Programming with regexps
- Noisy channels and FSTs
- Finite-state tagging
- Morphology and phonology
- Optimal paths in graphs: A Dyna perspective
Word senses:
- Grouping words: Semantic clustering
- Clustering spreadsheet
- Splitting words: Word sense disambiguation and the Yarowsky algorithm
- Words vs. terms in IR: Collocation discovery and Latent Semantic Indexing
Topic models:
- Intro readings/slides from Dave Blei
- Slides by Jason Eisner (video lecture part 1, part 2)
Machine translation:
- Guest lecture by Matt Post
- Competitive linking: A simple bitext alignment algorithm
- Guest lecture by Adam Lopez
- Guest lectures by Chris Callison-Burch on word-based and phrase-based models
NLP applications:
- Methods for text categorization
- Named entity recognition (guest lecture by Delip Rao)
- Entity disambiguation and linking, question answering, sentiment analysis, dependency parsing, semantic role labeling (guest lecture by Delip Rao)
- Knowledge extraction and dialogue systems

Old homeworks:
