Data
- PHOENIX14T-HS [data]
A continuous sign language recognition and translation dataset with handshape annotations.
We enriched the PHOENIX14T sign language recognition dataset with handshape labels derived from a public dictionary and manual labeling.
- Tabular Hyperparameter Optimization Dataset for Neural Machine Translation [data]
A benchmark dataset for comparing HPO methods on NMT models.
We trained 2,245 Transformer models on six corpora, at a cost of approximately 1,547 GPU days,
and recorded each hyperparameter setting together with its performance metrics.
Tools
- Handshape-Aware Sign Language Recognition Systems [code]
A sign language recognition system that incorporates handshape information.
- A Hyperparameter Optimization Toolkit for Neural Machine Translation Research [code]
A hyperparameter optimization toolkit for neural machine translation to help researchers focus their time on the creative rather than the mundane.
The toolkit is implemented as a wrapper on top of the open-source Sockeye NMT software using the Asynchronous Successive Halving Algorithm (ASHA).
- Graph-based Hyperparameter Optimization [code]
An extension of graph-based semi-supervised regression to hyperparameter optimization.
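As a rough illustration of the idea behind the ASHA-based toolkit listed above, the sketch below implements plain synchronous successive halving, which is a simplification and not ASHA itself or the toolkit's code. The function names, the `lr` hyperparameter, and the toy scoring function are all hypothetical; a real setup would replace `toy_evaluate` with an actual training run at the given budget.

```python
import math


def successive_halving(configs, evaluate, min_budget=1, eta=2):
    """Synchronous successive halving (a simplified relative of ASHA).

    configs:  list of hyperparameter settings (treated as opaque objects).
    evaluate: hypothetical callable (config, budget) -> score, higher is better.
    Returns the surviving configuration and a list of (budget, num_configs) rungs.
    """
    survivors = list(configs)
    budget = min_budget
    history = []
    while len(survivors) > 1:
        # Score every surviving configuration at the current budget.
        # The index breaks ties so the configs themselves are never compared.
        scored = sorted(
            ((evaluate(c, budget), i, c) for i, c in enumerate(survivors)),
            key=lambda t: (-t[0], t[1]),
        )
        history.append((budget, len(survivors)))
        # Keep the top 1/eta fraction and give them eta times more budget.
        keep = max(1, len(survivors) // eta)
        survivors = [c for _, _, c in scored[:keep]]
        budget *= eta
    return survivors[0], history


def toy_evaluate(config, budget):
    # Stand-in for training: the score peaks at lr = 0.01 (an arbitrary choice),
    # and the penalty for a poor lr shrinks as the budget grows.
    return -abs(math.log10(config["lr"]) + 2) * (1 + 1 / budget)


learning_rates = [1.0, 0.3, 0.1, 0.03, 0.01, 0.003, 0.001, 0.0003]
candidates = [{"lr": lr} for lr in learning_rates]
best, history = successive_halving(candidates, toy_evaluate)
print(best, history)
```

ASHA differs in that it promotes configurations asynchronously as soon as enough results are available at a rung, so workers never wait for a whole rung to finish.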
Talks
Tutorial
AutoML for Natural Language Processing
Kevin Duh, Xuan Zhang
EACL 2023
[website]
[slides]
[recording]
AutoML for Neural Machine Translation
Kevin Duh, Xuan Zhang
AMTA 2022
[slides]
Practical Tips on BERT Applications
Large Language Model Bootcamp, JHU, 2022 [slides]
Knowledge Base-Based Language Model Pre-training
CLSP Seminar, JHU, 2020 [slides]
Reproducible and Efficient Benchmarks for Hyperparameter Optimization of Neural Machine Translation Systems
Microsoft Research, 2020 [slides]
Hyperparameter Optimization of Neural Machine Translation Systems
CLSP Seminar, JHU, 2020 [slides]
Train Better Models Faster -- Curriculum Learning and Intelligent Hyperparameter Search for Neural Machine Translation
CLSP Seminar, JHU, 2018 [slides]
Teaching
Machine Learning, CS 475/675, Mark Dredze and Anqi Liu, JHU, Fall 2022
Teaching Assistant
Handwritten Recitation Slides:
Probability and Linear Algebra
Bias vs. Variance and Linear Regression
Support Vector Machines and Kernels
Expectation Maximization and Graphical Models