- [ ] language model is built incorrectly when starting at MERT with
  a parsed corpus (maybe SAMT should expect a plain corpus and a .parsed one)
- [ ] add recasing with a recursive call to pipeline.pl (provide a 1-1
  alignment)
- [ ] pipeline should output a script that can easily be
  used to decode another test set
- [ ] add tree output for test sets
- [ ] run MERT multiple times
- [X] hadoop cluster roll-out
- [X] rm -r hadoop directory after retrieving grammar successfully
- [ ] change qsub arg defaults when doing SAMT
- [ ] don't put number in train files if maxlen == 0
- [ ] should be easier to stop and start runs (locations of canonical files)
- [ ] add KenLM binarization of the language model
- [ ] better tokenization (e.g., URL-aware)
