The annotated data required to train most modern natural language processing systems is expensive and domain-specific. We aim to build useful models starting from cheaper data – ideally, raw text. This talk is about weighted grammar induction, an unsupervised learning problem. While much previous work on grammar induction has focused on acquiring grammar rules, we assume simple, over-generating grammars (for coverage) and concentrate instead on learning their weights. We address two problems with the standard maximum likelihood (ML) approach: the desired model is under-specified, and search is hard. We present Contrastive Estimation, a novel, elegant, and efficient generalization of ML that allows the learner to exploit basic linguistic intuitions encoded as “implicit” negative evidence. We tackle the search problem separately with a new technique, Structural Annealing, which first focuses the learner on easy explanations and then challenges it with gradually harder ones. We report state-of-the-art performance on unsupervised dependency parsing in six diverse languages.
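As a rough sketch of the idea behind Contrastive Estimation (the notation p_θ for the model, y for a hidden dependency structure, and N for the neighborhood function is ours, not the abstract's): given observed sentences x_1, …, x_n, the learner maximizes

$$\sum_{i=1}^{n} \log \frac{\sum_{y} p_\theta(x_i, y)}{\sum_{x' \in N(x_i)} \sum_{y} p_\theta(x', y)},$$

where N(x_i) is a small set of perturbed variants of x_i (for example, versions with two adjacent words transposed) that serve as the “implicit” negative evidence; ordinary ML is recovered as the special case in which N(x_i) is the entire event space. Under the same caveat, one concrete way to realize Structural Annealing for dependency parsing is to add to each tree's log-score a bias δ multiplied by the total string distance between linked words, annealing δ from strongly negative (favoring short, easy attachments) toward zero as learning proceeds.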