With the invention of DNA sequencing came an explosion of sequence data, and a lot of work for computer scientists. With the advent of the Human Genome Project came another explosion of sequence data, and a lot more work for computer scientists. In just the last five years, we have seen another huge breakthrough: the invention of second-generation DNA sequencing. DNA sequencers can now generate hundreds of billions of nucleotides of data, enough to cover the human genome hundreds of times over, in about a week for a few thousand dollars. Consequently, sequencing has become a very common tool in the study of biology, genetics, and disease.
But with these developments comes a problem: growth in per-sequencer throughput is drastically outpacing growth in computer speed. As the gap widens, the crucial research bottlenecks are increasingly computational: compute capacity, storage, labor, and power. Life science simply cannot advance without help from computer science.
I will survey some current computational problems in genomics and discuss a few solutions. My goal is to provide background, convey what kinds of computational problems are currently important in the field, and highlight problems that cut across computational disciplines.
Speaker Biography
Ben Langmead recently joined Johns Hopkins as an Assistant Professor in the Department of Computer Science. He received a Ph.D. in Computer Science from the University of Maryland, where he worked with Prof. Steven L. Salzberg. Prof. Langmead uses approaches from computer science – algorithms, text indexing, and high-performance computing, especially cloud computing – to create high-impact software tools for life science researchers. His goal is to make all types of high-throughput biological data, especially sequencing data, easy to analyze and interpret. Software tools written by Prof. Langmead, including Bowtie and Crossbow, are widely used in the genomics community. His work on Bowtie won the Genome Biology award for outstanding publication in 2009.