This talk concerns the design of algorithms for large-scale data science using massively parallel computation. It discusses theoretical models and algorithms for massively parallel frameworks such as MapReduce and Spark. The constraints of these models are well connected to practice but pose significant algorithmic challenges.
The talk introduces recent developments that overcome these challenges, widely applicable massively parallel algorithmic techniques, and key questions on the theoretical foundations of massively parallel computation. The methods introduced will be applied to large-data problems that are central to operations research, theoretical computer science, and machine learning: clustering and submodular function optimization.
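To give a flavor of the kind of massively parallel technique involved, below is a minimal sketch, not the speaker's specific algorithm, of a two-round distributed greedy for monotone submodular maximization under a cardinality constraint, in the style of GreeDi: partition the data across machines, run the classical greedy on each partition in parallel, then run greedy once more over the union of the partial solutions. The coverage objective and all names in the sketch are illustrative assumptions.

```python
# Illustrative sketch: two-round distributed greedy for monotone submodular
# maximization under a cardinality constraint (GreeDi-style). The coverage
# objective and all names here are assumptions for illustration only.
import random

def greedy(items, f, k):
    """Classical greedy: repeatedly add the element with the largest marginal gain."""
    solution = []
    for _ in range(k):
        best = max((e for e in items if e not in solution),
                   key=lambda e: f(solution + [e]) - f(solution),
                   default=None)
        if best is None:
            break
        solution.append(best)
    return solution

def distributed_greedy(items, f, k, num_machines):
    """Round 1 ('map'): each machine greedily solves its own partition.
    Round 2 ('reduce'): one machine runs greedy over the union of the
    partial solutions, which is small enough to fit on a single machine."""
    random.shuffle(items)
    partitions = [items[i::num_machines] for i in range(num_machines)]
    partial = [greedy(part, f, k) for part in partitions]  # parallel in practice
    candidates = [e for sol in partial for e in sol]
    return greedy(candidates, f, k)

if __name__ == "__main__":
    # Toy coverage function: f(S) = number of distinct elements covered by S.
    universe_sets = [frozenset(random.sample(range(100), 10)) for _ in range(200)]
    cover = lambda S: len(set().union(*S)) if S else 0
    picked = distributed_greedy(universe_sets, cover, k=5, num_machines=4)
    print("elements covered:", cover(picked))
```

The design mirrors the constraints the talk highlights: each round touches only data that fits on one machine, and the number of communication rounds is constant.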
The work presented in this talk has been supported by Google, Yahoo, and the NSF.