Many pattern recognition applications use statistical models with a large number of parameters, even though the amount of available training data is often insufficient for robust parameter estimation. A common technique for reducing the effect of data sparseness is the divide-and-conquer approach, which decomposes a problem into a number of smaller subproblems, each of which can be handled by a more specialized and potentially more robust model. This talk describes how this principle can be applied to a variety of problems in speech and language processing. The general procedure is to adopt a feature-based representation for the objects to be modelled (such as phones or words), learn statistical models describing the features of an object rather than the object itself, and recombine these partial probability estimates. This enables more efficient use of data and the sharing of data from heterogeneous sources (such as different languages). I will present both knowledge-inspired and data-driven techniques for designing appropriate feature representations, as well as unsupervised methods for optimizing the model combination on task-specific criteria. Experimental results will be presented for four applications: articulatory feature-based speech recognition, multi-stream automatic language identification, factored statistical language modeling, and statistical machine translation.
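The decompose-and-recombine procedure described above can be sketched in a few lines. This is a minimal toy illustration, not the talk's actual method: the "phone" inventory, the two feature streams ("manner", "voicing"), and the log-linear weight are all invented for the example. The point is only that per-feature counts are denser than joint counts, and that the partial estimates can be recombined into a single score.

```python
from collections import Counter

# Toy data: each "phone" token is represented by a tuple of features
# (manner, voicing). Both the inventory and the counts are hypothetical.
training = [
    ("stop", "voiced"), ("stop", "voiced"), ("stop", "unvoiced"),
    ("fricative", "voiced"), ("fricative", "unvoiced"),
]
n = len(training)

# Model each feature stream separately: these marginal counts are far
# less sparse than counts over whole (manner, voicing) combinations.
manner = Counter(m for m, v in training)
voicing = Counter(v for m, v in training)

def p_factored(m, v, weight=0.5):
    """Recombine the partial per-feature estimates log-linearly.

    `weight` plays the role of a stream weight that could be tuned
    on a task-specific criterion.
    """
    return (manner[m] / n) ** weight * (voicing[v] / n) ** (1 - weight)

# For comparison: the direct joint estimate over whole objects,
# which needs every combination to be observed often enough.
joint = Counter(training)

def p_joint(m, v):
    return joint[(m, v)] / n
```

Unseen combinations such as `("fricative", "unvoiced")` still receive a nonzero factored score as long as each feature value was seen somewhere in the data, which is exactly the sparseness advantage the abstract refers to.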