Nicholas Andrews

I'm a Senior Research Scientist at the HLTCOE where I work on making AI systems safer and more reliable, with a recent interest in questions around privacy, detecting machine-generated content, and controllable/personalized generation. Please see my papers below for an up-to-date picture of my interests.

PhD students (primary or co-primary advisor):
David Mueller (→ Applied Scientist @ Amazon Search)
Aleem Khan (co-advised by Benjamin Van Durme)
Sophia Hager (co-advised by Kevin Duh)
Rafael Rivera-Soto
Andrew Wang (co-advised by Daniel Khashabi)

Postdocs:
Cristina Aggazzotti
Zexin Cai (co-advised by Matthew Wiesner)

Other student collaborators: Ashi Garg, Henry Li, Rachel Wicks

Hiring: I am actively recruiting postdoctoral fellows interested in generation, particularly in the areas of speech synthesis and detection of synthesized speech. Please feel free to contact me directly if you are interested! For PhD programs, please refer to this link for detailed instructions on how to apply.

Papers

Uncertainty Distillation: Teaching Language Models to Express Semantic Confidence. Preprint (2025)
Sophia Hager, David Mueller, Kevin Duh, Nicholas Andrews
[pdf]
Less is More for Synthetic Speech Detection in the Wild. Preprint (2025)
Ashi Garg, Zexin Cai, Henry Li Xinyuan, Leibny Paola García-Perera, Kevin Duh, Sanjeev Khudanpur, Matthew Wiesner, Nicholas Andrews
[pdf] [code]
GenVC: Self-Supervised Zero-Shot Voice Conversion. Preprint (2025)
Zexin Cai, Henry Li Xinyuan, Ashi Garg, Leibny Paola García-Perera, Kevin Duh, Sanjeev Khudanpur, Matthew Wiesner, Nicholas Andrews
[pdf]
HLTCOE Submission to the VoicePrivacy Attacker Challenge. ICASSP (2025)
Henry Li Xinyuan, Zexin Cai, Ashi Garg, Leibny Paola Garcia-Perera, Kevin Duh, Sanjeev Khudanpur, Nicholas Andrews, and Matthew Wiesner
Are Paraphrases Generated by Large Language Models Invertible? Preprint (2024)
Rafael Rivera Soto, Barry Chen, Nicholas Andrews
[pdf]
Learning to Generate Verbalized Confidences. SFLLM @ NeurIPS (2024)
Sophia Hager and Nicholas Andrews
Financial Forecasting from Textual and Tabular Time Series. EMNLP Findings (2024)
Ross Koval, Nicholas Andrews, and Xifeng Yan
HLTCOE Submission to the 2024 Voice Privacy Challenge. SPSC (2024)
Henry Li Xinyuan, Zexin Cai, Ashi Garg, Leibny Paola Garcia-Perera, Kevin Duh, Sanjeev Khudanpur, Nicholas Andrews, and Matthew Wiesner
[pdf] [Best paper award]
Privacy versus Emotion Preservation Trade-Offs in Emotion-Preserving Speaker Anonymization. SLT (2024)
Zexin Cai, Henry Li Xinyuan, Ashi Garg, Leibny Paola Garcia-Perera, Kevin Duh, Sanjeev Khudanpur, Nicholas Andrews, and Matthew Wiesner
[arxiv] [pdf]
Can Optimization Trajectories Explain Multi-Task Transfer? TMLR (2024)
David Mueller, Mark Dredze, Nicholas Andrews
[arxiv] [pdf]
Multi-Task Transfer Matters During Instruction-Tuning. ACL Findings (2024)
David Mueller, Mark Dredze, Nicholas Andrews
[pdf]
AnaloBench: Benchmarking the Identification of Abstract and Long-context Analogies. EMNLP (2024)
Xiao Ye, Andrew Wang, Jacob Choi, Yining Lu, Shreya Sharma, Lingfeng Shen, Vijay Tiyyala, Nicholas Andrews, Daniel Khashabi
[arxiv] [pdf]
Can Authorship Attribution Models Distinguish Speakers in Speech Transcripts? TACL (2024)
Cristina Aggazzotti, Nicholas Andrews, Elizabeth Allyn Smith
[arxiv] [pdf]
Learning to Compare Financial Reports for Financial Forecasting. Findings of EACL (2024)
Ross Koval, Nicholas Andrews, and Xifeng Yan
[pdf]
Few-Shot Detection of Machine-Generated Text using Style Representations. ICLR (2024)
Rafael Rivera Soto, Kailin Koch, Aleem Khan, Barry Chen, Marcus Bishop, Nicholas Andrews
[arxiv] [pdf] [code] [demo]
Learning to Generate Text in Arbitrary Writing Styles. Preprint (2023)
Aleem Khan, Andrew Wang, Sophia Hager, Nicholas Andrews
[arxiv] [pdf]
Can Authorship Representation Learning Capture Stylistic Features? TACL (2023)
Andrew Wang, Cristina Aggazzotti, Rebecca Kotula, Rafael Rivera-Soto, Marcus Bishop, and Nicholas Andrews
[arxiv] [pdf]
Forecasting Earnings Surprises from Conference Call Transcripts. Findings of ACL (2023)
Ross Koval, Nicholas Andrews, and Xifeng Yan
[aclanthology] [pdf]
Low-Resource Authorship Style Transfer: Can Non-Famous Authors Be Imitated? Preprint (2023)
Ajay Patel, Nicholas Andrews, Chris Callison-Burch [arxiv] [pdf]
The Importance of Temperature in Multi-Task Optimization. OPT Workshop @ NeurIPS (2022)
David Mueller, Mark Dredze, and Nicholas Andrews
[pdf]
Do Text-to-Text Multi-Task Learners Suffer from Task Conflict? Findings of EMNLP (2022)
David Mueller, Mark Dredze, and Nicholas Andrews
[pdf]
Learning Universal Authorship Representations. EMNLP (2021)
Rafael Rivera-Soto, Olivia Miano, Juanita Ordonez, Barry Chen, Aleem Khan, Marcus Bishop and Nicholas Andrews
[pdf]
A Deep Metric Learning Approach to Account Linking. NAACL (2021)
Aleem Khan, Elizabeth Fleming, Noah Schofield, Marcus Bishop, Nicholas Andrews
[aclweb] [arxiv] [code + data]
Ensemble Distillation for Structured Prediction: Calibrated, Accurate, Fast - Choose Three. EMNLP (2020)
Steven Reich, David Mueller, Nicholas Andrews
[aclweb] [arxiv] [code]
Sources of Transfer in Multilingual Named Entity Recognition. ACL (2020)
David Mueller, Nicholas Andrews, Mark Dredze
[aclweb] [paper] [bib] [arxiv] [video]
Compressing BERT: Studying the Effects of Weight Pruning on Transfer Learning. RepL4NLP (2020)
Mitchell A Gordon, Kevin Duh, Nicholas Andrews
[aclweb] [paper] [bib] [arxiv] [video]
Learning Invariant Representations of Social Media Users. EMNLP (2019)
Nicholas Andrews and Marcus Bishop
[aclweb] [arxiv] [code] [data] [video]
Convolutions Are All You Need (For Classifying Character Sequences). EMNLP WNUT (2018)
Zach Wood-Doughty, Nicholas Andrews, and Mark Dredze
[pdf]
Predicting Twitter User Demographics from Names Alone. NAACL PEOPLES (2018)
Zach Wood-Doughty, Nicholas Andrews, Rebecca Marvin, and Mark Dredze
[pdf]
Bayesian Modeling of Lexical Resources for Low-Resource Settings. ACL (2017)
Nicholas Andrews, Mark Dredze, Benjamin Van Durme, and Jason Eisner
[pdf] [code] [slides]
Twitter at the Grammys: A Social Media Corpus for Entity Linking and Disambiguation.
SocialNLP (2016)
Mark Dredze, Nicholas Andrews and Jay DeYoung
[pdf]
Generative Non-Markov Models for Information Extraction. Dissertation (2015)
Nicholas Andrews (advised by Jason Eisner and Mark Dredze)
Robust Entity Clustering via Phylogenetic Inference. ACL (2014)
Nicholas Andrews, Jason Eisner, and Mark Dredze
[pdf] [full paper] [bib] [code] [slides]
PARMA: A Predicate Argument Aligner. ACL (2013)
Travis Wolfe, Benjamin Van Durme, Mark Dredze, Nicholas Andrews, Charley Beller, Chris Callison-Burch, Jay DeYoung, Justin Snyder, Jonathan Weese, Tan Xu, and Xuchen Yao
[pdf] [bib]
Name Phylogeny: A Generative Model of String Variation. EMNLP (2012)
Nicholas Andrews, Jason Eisner, and Mark Dredze
[pdf] [bib]
Entity Clustering Across Languages. NAACL (2012)
Spence Green, Nicholas Andrews, Matthew R. Gormley, Mark Dredze,
and Christopher D. Manning
[pdf]
Transformation Process Priors. NeuIPS NP Bayes (2011)
Nicholas Andrews and Jason Eisner
[pdf] [bib]
Seeded Discovery of Base Relations in Large Corpora. EMNLP (2008)
Nicholas Andrews and Naren Ramakrishnan
[pdf] [bib]
Recent Developments in Document Clustering. TR (2008)
Nicholas Andrews and Edward A. Fox
[pdf]