Novel methods in NLP and ML, focusing on probabilistic modeling and
inference in complex, structured, or ill-defined settings. This often
involves new machine learning methods; creative uses and modifications
of large language models; probabilistic models of linguistic structure,
human behavior, and machine behavior; and combinatorial algorithms and
approximate inference.
I'm also interested in designing declarative specification languages backed by general, efficient algorithms (and adaptive execution). This yields a coherent view of all of the modeling and algorithmic options, and it accelerates the research of others.
The questions: Large language models attempt to
imitate typical human behavior. How can we combine this imitative
ability with disciplines for ensuring rational behavior, such as
statistics, case analysis and planning, reinforcement learning, the
scientific method, and probabilistic modeling of the world? And how can
we use the result to support humans, including by integrating human
preferences and expertise?
The engineering motivation: Computers must learn to
understand human language. A huge portion of human communication,
thought, and culture now passes through computers. Ultimately, we
want our devices to help us by understanding text and speech as a human
would—both at the small scale of intelligent user interfaces and
at the large scale of the entire multilingual Internet.
The scientific motivation: Human language is
fascinatingly complex and ambiguous. Yet babies are born with the
incredible ability to discover the structure of the language around
them. Soon they are able to rapidly comprehend and produce that
language and relate it to events and concepts in the world. Figuring
out how this is possible is a grand challenge for both cognitive
science and machine learning.
The disciplines: My research program combines computer
science with statistics and linguistics. The challenge is to
fashion statistical models that are nuanced enough to capture
good intuitions about linguistic structure and, especially, to
develop efficient algorithms to apply these models to data
(including training them with as little supervision as possible, or
making use of large pre-trained models).
Models: I've developed significant modeling approaches
for a wide variety of domains in natural language
processing—syntax, phonology, morphology, and machine
translation, as well as semantic preferences, name variation, and even
database-backed websites. The goal is to capture not just the
structure of sentences, but also deep regularities within the grammar
and lexicon of a language (and across languages). My students and I
are always thinking about new problems and better models. For
example, latent variables and nonparametric Bayesian methods let us
construct a linguistically plausible account of how the data arose.
Our latest models continue to include linguistic ideas, but they
also incorporate deep neural networks, to fit
unanticipated regularities, and large pre-trained language models, to
exploit the knowledge implicit in large corpora.
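To make the idea of a generative latent-variable account concrete, here is a minimal sketch (my own toy illustration, not a model from these papers): a tiny hidden Markov model whose hypothetical parameters first generate a latent tag sequence and then generate each observed word from its tag.

```python
# A minimal sketch of a latent-variable generative story (illustration only):
# a toy HMM that first samples hidden tags, then samples each word given its
# tag.  The tag sequence is the latent structure that inference must recover.
import random

# Hypothetical toy parameters, for illustration only.
TRANS = {"START": {"DET": 0.7, "NOUN": 0.3},
         "DET":   {"NOUN": 1.0},
         "NOUN":  {"VERB": 0.6, "STOP": 0.4},
         "VERB":  {"DET": 0.5, "STOP": 0.5}}
EMIT  = {"DET":  {"the": 0.8, "a": 0.2},
         "NOUN": {"dog": 0.5, "cat": 0.5},
         "VERB": {"barks": 0.5, "sleeps": 0.5}}

def sample(dist):
    """Draw one outcome from a {outcome: probability} dictionary."""
    r, total = random.random(), 0.0
    for outcome, p in dist.items():
        total += p
        if r < total:
            return outcome
    return outcome  # guard against floating-point rounding

def generate():
    """Sample (tags, words) from the toy generative story."""
    tags, words, tag = [], [], "START"
    while True:
        tag = sample(TRANS[tag])
        if tag == "STOP":
            return tags, words
        tags.append(tag)
        words.append(sample(EMIT[tag]))

print(generate())
```

Real models replace these hand-set toy tables with richer, learned parameterizations, but the same "generative story" view of the data carries over.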
Algorithms: A good mathematical model will define
the best analysis of the data, but can we compute that
analysis? My students and I are constantly developing new algorithms
to cope with the tricky structured prediction and learning problems
posed by increasingly sophisticated models. Unlike much of
machine learning, our work must handle probability distributions over
unboundedly large structured variables such as strings, trees,
alignments, and grammars. My favorite tools include dynamic
programming, Markov chain Monte Carlo (MCMC), belief propagation and
other variational approximations, automatic differentiation,
deterministic annealing, stochastic local search, coarse-to-fine
search, integer linear programming, and relaxation methods. I
especially enjoy connecting disparate techniques in fruitful new ways.
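For instance, here is a minimal sketch (my own illustration, not an algorithm from these papers) of dynamic programming over strings: the classical forward algorithm, which sums over exponentially many latent tag sequences in polynomial time to score a sentence under the same kind of toy HMM as in the sketch above.

```python
# A minimal sketch of dynamic programming over strings (illustration only):
# the forward algorithm sums over all hidden tag sequences in polynomial time.
from collections import defaultdict

# Hypothetical toy parameters, same toy HMM as in the generative sketch.
TRANS = {"START": {"DET": 0.7, "NOUN": 0.3},
         "DET":   {"NOUN": 1.0},
         "NOUN":  {"VERB": 0.6, "STOP": 0.4},
         "VERB":  {"DET": 0.5, "STOP": 0.5}}
EMIT  = {"DET":  {"the": 0.8, "a": 0.2},
         "NOUN": {"dog": 0.5, "cat": 0.5},
         "VERB": {"barks": 0.5, "sleeps": 0.5}}

def sentence_prob(words):
    """p(words) = sum over all tag sequences of p(tags, words)."""
    alpha = {"START": 1.0}                    # prob of each state after 0 words
    for w in words:
        new_alpha = defaultdict(float)
        for prev, a in alpha.items():
            for tag, p_trans in TRANS.get(prev, {}).items():
                p_emit = EMIT.get(tag, {}).get(w, 0.0)
                new_alpha[tag] += a * p_trans * p_emit
        alpha = new_alpha
    # Finally, transition into the STOP state.
    return sum(a * TRANS.get(tag, {}).get("STOP", 0.0) for tag, a in alpha.items())

print(sentence_prob(["the", "dog", "barks"]))   # marginalizes over all taggings
```

The same dynamic-programming pattern generalizes to trees, alignments, and other structures, and it composes with the neural and pre-trained parameterizations mentioned above.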
General paradigms: My students and I also work to
pioneer general statistical and algorithmic paradigms that cut across
problems (not limited to NLP). We are developing a high-level
declarative programming language, Dyna, which allows startlingly short
programs, backed by general efficiency techniques so that these
don't have to be reinvented and reimplemented for each new setting.
We are also showing how to learn execution
strategies that perform fast and accurate approximate statistical
inference, and how to properly train these essentially discriminative
strategies in a Bayesian way. We have also developed other
machine learning techniques and modeling frameworks of general
interest, primarily for structured prediction and temporal
sequence modeling.
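As a rough illustration (my own sketch in ordinary Python, not code from the Dyna project), here is the kind of dynamic program that such a declarative language aims to state in just a few rules: the inside algorithm for a hypothetical toy probabilistic grammar in Chomsky normal form, written out by hand with its loops and chart made explicit.

```python
# A hand-written sketch (illustration only, not the Dyna implementation) of a
# dynamic program that a declarative language could state in a few rules:
# the inside algorithm for a toy PCFG in Chomsky normal form.
from collections import defaultdict

# Hypothetical toy grammar, for illustration only.
BINARY = {("S", "NP", "VP"): 1.0,           # p(S  -> NP VP)
          ("NP", "DET", "N"): 1.0}          # p(NP -> DET N)
LEXICAL = {("DET", "the"): 1.0,
           ("N", "dog"): 1.0,
           ("VP", "barks"): 1.0}

def inside(words):
    """chart[i, j][X] = total probability that X derives words[i:j]."""
    n = len(words)
    chart = defaultdict(lambda: defaultdict(float))
    for i, w in enumerate(words):                      # width-1 spans
        for (x, word), p in LEXICAL.items():
            if word == w:
                chart[i, i + 1][x] += p
    for width in range(2, n + 1):                      # wider spans, bottom up
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):                  # all split points
                for (x, y, z), p in BINARY.items():
                    chart[i, j][x] += p * chart[i, k][y] * chart[k, j][z]
    return chart[0, n]["S"]                            # prob of the sentence

print(inside(["the", "dog", "barks"]))                 # 1.0 for this toy grammar
```

Roughly speaking, a declarative version would specify only the grammar rules and how items combine, leaving the loop order, chart storage, and execution strategy to the solver.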
Measuring success: We implement our new methods and evaluate
them carefully on collections of naturally occurring language. We
have repeatedly improved the state of the art. While our work can
certainly be used within today's end-user applications, such as
machine translation and information extraction, we ourselves are generally
focused on building up the long-term fundamentals of the field.
In general, I have broad interests and have worked on a wide range
of fundamental topics in NLP, drawing on varied areas of computer
science. See my papers, CV,
and research summary for more information;
see also notes on
my advising
style.
Undergraduates are often curious about their teachers' secret lives.
In the name of encouraging curiosity-driven research, here are a few
photos: