Spring 2014
February 11, 2014
Computational biology is emerging as one of the exemplar data sciences, with abundant data, complex interactions, and the need for scalable algorithms and statistics. During my presentation, I will describe my research on two major problems.

The first is de novo genome assembly, in which the genome of an organism must be computationally reconstructed from millions or billions of short DNA sequences. An emerging assembly strategy is to use PacBio single-molecule sequencing to overcome the limitations of Illumina and other older technologies. We and others have developed new assembly algorithms that use the long reads (currently averaging over 8,500bp) to achieve near-perfect assemblies of many microbes and small eukaryotes, and greatly improved assemblies of several significant plant and animal species. Even though the raw sequence data have high error rates (>10%) and a non-uniform error model, the accuracy of the assembled sequences approaches 100%. I’ll summarize the field with a support vector regression-based model that can predict the outcome of a genome assembly project today and into the future as read lengths and available coverage improve.

The second major problem I’ll discuss is disease analytics: how we can identify disease-relevant genetic mutations in a population of healthy and affected individuals. Specifically, I will describe my lab’s work examining the genetic components of autism spectrum disorders (ASD) using our new variation detection algorithm Scalpel. Scalpel uses a hybrid approach of read mapping and de novo assembly to accurately discover insertion/deletion (indel) mutations up to 100bp long. In a battery of >10,000 simulated and >1,000 experimentally validated indel mutations, Scalpel is significantly more accurate than the other leading algorithms, GATK and SOAPindel. Using Scalpel, we have analyzed the exomes of >800 families (>3,200 individuals) in which one child in each family is affected with ASD, and we see a strong enrichment of “gene killing” de novo mutations associated with the disorder. Finally, I’ll conclude with a brief description of our work using single-cell sequencing to study genetic heterogeneity in cancer.
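The assembly-prediction model mentioned in the abstract is, at its core, a regression from sequencing parameters to expected assembly quality. The sketch below is a minimal illustration of that idea using scikit-learn's support vector regression; the features and training values are hypothetical placeholders, not data from the talk.

    # Toy illustration of predicting assembly contiguity with support vector
    # regression (SVR). Features and numbers are made up for demonstration.
    import numpy as np
    from sklearn.svm import SVR
    from sklearn.preprocessing import StandardScaler
    from sklearn.pipeline import make_pipeline

    # Hypothetical training data: [mean read length (bp), coverage (x), genome size (Mbp)]
    X = np.array([
        [3000, 20, 5], [5000, 30, 5], [8500, 50, 5],
        [3000, 20, 100], [5000, 30, 100], [8500, 50, 100],
    ])
    # Hypothetical outcome: log10 of contig N50 (larger is better)
    y = np.array([4.5, 5.2, 6.1, 3.8, 4.6, 5.5])

    model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0))
    model.fit(X, y)

    # Predict the outcome for a hypothetical future project with longer reads.
    print(model.predict([[12000, 60, 100]]))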
Speaker Biography: Michael Schatz is an assistant professor in the Simons Center for Quantitative Biology at Cold Spring Harbor Laboratory. His research interests include developing large-scale sequence analysis methods for sequence alignment, de novo assembly, variation detection, and related analysis. Schatz received his Ph.D. in Computer Science from the University of Maryland in 2010, and his B.S. in Computer Science from Carnegie Mellon University in 2000, with 4 years at the Institute for Genomic Research in between. For more information see: http://schatzlab.cshl.edu.
February 18, 2014
As cloud computing becomes increasingly popular, organizations face greater security threats. Public clouds have become a central point of attack, and successful compromises can potentially cause billions of dollars in damage. Physical attacks on data center machines are especially concerning because an attacker can gain full control of the machines and circumvent software protections.
We present an efficient processor architecture that allows us to build a more secure cloud that is resistant to physical attacks. We are able to achieve full security against malicious adversaries by only trusting and securing the CPU of a machine. We can leverage commodity components such as DRAM, hard drives, and network interfaces without requiring that they be secured against physical attacks. We achieve this by designing a novel Oblivious RAM algorithm ideal for hardware and building a memory controller that hides access patterns to DRAM and storage. The memory controller is integrated into the CPU and makes data-dependent computation indistinguishable to an adversary.
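Oblivious RAM hides which memory location a computation touches. The sketch below is a deliberately naive "linear-scan" ORAM: every logical access reads and re-encrypts every block, so the physical access pattern is identical no matter which address was requested. Real designs, including the hardware-oriented algorithm described in the talk, achieve the same property far more efficiently; this toy and its XOR "encryption" are illustrative assumptions only.

    # Naive linear-scan ORAM: every access touches every block, so an observer
    # of physical reads/writes learns nothing about which address was used.
    # The "encryption" here is a toy XOR pad purely for illustration.
    import os

    BLOCK = 16  # bytes per block

    class LinearScanORAM:
        def __init__(self, n_blocks):
            self.key = os.urandom(BLOCK)
            self.store = [self._enc(bytes(BLOCK)) for _ in range(n_blocks)]

        def _enc(self, plain):
            pad = os.urandom(BLOCK)  # fresh randomness so ciphertexts always change
            return pad, bytes(a ^ b ^ c for a, b, c in zip(plain, pad, self.key))

        def _dec(self, ct):
            pad, body = ct
            return bytes(a ^ b ^ c for a, b, c in zip(body, pad, self.key))

        def access(self, addr, new_value=None):
            result = None
            for i in range(len(self.store)):          # scan every block...
                plain = self._dec(self.store[i])
                if i == addr:
                    result = plain
                    if new_value is not None:
                        plain = new_value.ljust(BLOCK, b"\0")[:BLOCK]
                self.store[i] = self._enc(plain)      # ...and re-encrypt it
            return result

    oram = LinearScanORAM(8)
    oram.access(3, b"secret")
    print(oram.access(3))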
Speaker Biography: Emil Stefanov is a 5th year graduate student at UC Berkeley working with Professor Dawn Song. His research interests include systems security, privacy, and applied cryptography, focusing on secure cloud computing and privacy-preserving storage outsourcing. Some of his recent research topics include oblivious RAM, secure processor architecture, searchable encryption, integrity-verified file systems, dynamic proofs of retrievability, and private set intersection. Before joining UC Berkeley, Emil received his B.S. degree in Computer Science from Purdue University in 2009, and he is expected to defend his Ph.D. in the summer of this year.
Emil was awarded an NSF graduate fellowship in 2009 and an NDSEG graduate fellowship in 2011. He is a co-author of 15 conference papers and 5 journal papers, and has won a best paper award, the AT&T Best Applied Security Paper Award in 2012, and was a finalist for the same award in 2013. Besides his academic experience, Emil has also worked as a summer intern at NVIDIA, Microsoft, RSA Labs, and Motorola.
February 27, 2014
There is often interest in predicting an individual’s latent health status based on high-dimensional biomarkers that vary over time. Motivated by time-course gene expression array data that we have collected in two influenza challenge studies, we develop a novel time-aligned Bayesian dynamic factor analysis methodology. The time course trajectories in the gene expressions are related to a relatively low-dimensional vector of latent factors, which vary dynamically starting at the latent initiation time of infection.
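For context, a generic dynamic factor model of the kind described relates each subject's high-dimensional expression vector to a low-dimensional latent factor trajectory shifted by a subject-specific latent onset time. The display below is standard textbook notation offered only as an illustration, not the exact time-aligned model from the talk:

    y_{i,t} = \Lambda \, f_{i,\, t - \tau_i} + \epsilon_{i,t}, \qquad
    \epsilon_{i,t} \sim \mathcal{N}(0, \Sigma),

where y_{i,t} is the gene-expression vector for subject i at time t, \Lambda is the factor-loading matrix mapping the low-dimensional factors f to genes, and \tau_i is subject i's latent time of infection.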
Speaker Biography: Minhua Chen received his Ph.D. from Duke University in May 2012, advised by Profs. Lawrence Carin and David Dunson, working on Bayesian and information-theoretic learning of high-dimensional data. He is currently working on statistical machine learning problems at the University of Chicago in collaboration with Prof. John Lafferty. His research interests broadly span machine learning, signal processing, and bioinformatics.
March 11, 2014
Over the last decade, my research group has extensively studied Internet threats: the nature of botnets, worms, denial-of-service, and a wide range of malware. While there’s much to learn from this attack-centric viewpoint, in recent work we have focused on understanding the threat landscape through the role of networks themselves.
In this talk, I will highlight a recent series of work in which we measure and model how networks, suitably defined, influence and are influenced by malicious behavior on the Internet. Through analysis of Internet-scale observations, we show that networks’ observed maliciousness is strongly correlated with spatial distance, that aggregate maliciousness evolves over time in predictable ways, and that the dynamics of this evolution can serve as a proxy for understanding the overall security hygiene and responsiveness of each network. Taken together, these observations lead to models of network maliciousness that can inform strategies for improving the overall health of the Internet. Applying these models, we explore a policy that seeks to quarantine the carriers of harmful traffic, and analyze the tradeoffs between improvements in security, stability, and performance versus losses in important core Internet properties that would ensue from actively disconnecting the most egregiously malicious networks.
Speaker Biography: Michael Bailey is Research Associate Professor and Co-Director of the Network and Security Research Group at the University of Michigan. His research is focused on the security and availability of complex distributed systems. Prior to his appointment at the University of Michigan, he was the Director of Engineering at Arbor Networks, a Lecturer at DePaul University, and a Programmer/Analyst at Amoco Corporation. He was awarded the College of Engineering Kenneth M. Reese Outstanding Research Scientist Award in 2011, the University of Michigan Research Faculty Recognition Award in 2012, and was elevated to senior member of IEEE in 2009 and senior member of ACM in 2013. Michael received his PhD in Computer Science and Engineering from the University of Michigan in 2006.
March 13, 2014
We have been investigating compiler-generated software diversity as a defense mechanism against cyber attacks. This approach is in many ways similar to biodiversity in nature.
Imagine an “App Store” containing a diversification engine (a “multicompiler”) that automatically generates a unique version of every program for every user. All the different versions of the same program behave in exactly the same way from the perspective of the end-user, but they implement their functionality in subtly different ways. As a result, any specific attack will succeed only on a small fraction of targets and a large number of different attack vectors would be needed to take over a significant percentage of them.
Because an attacker has no way of knowing a priori which specific attack will succeed on which specific target, this method also very significantly increases the cost of attacks directed at specific targets.
We have built such a multicompiler, which is now available as a prototype. We can diversify large software distributions such as the Firefox and Chromium web browsers or a complete Linux distribution. I will present some preliminary benchmarks and will also address practical issues such as reporting errors when every binary is unique, and updating diversified software.
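To make the idea concrete, the toy pass below shuffles the order of independent "functions" and pads them with random amounts of no-op filler, so each generated "binary" lays out code at different offsets while computing the same results. This is only a cartoon of what a real multicompiler does (which transforms actual compiler IR and machine code); every name in it is hypothetical.

    # Toy "diversifying compiler" pass: same observable behavior, different layout.
    import random

    FUNCTIONS = {
        "add": ["load a", "load b", "add", "ret"],
        "mul": ["load a", "load b", "mul", "ret"],
        "neg": ["load a", "neg", "ret"],
    }

    def diversify(functions, seed):
        rng = random.Random(seed)          # one seed per user/download
        names = list(functions)
        rng.shuffle(names)                 # randomize function layout order
        image, symbols = [], {}
        for name in names:
            image.extend(["nop"] * rng.randint(0, 8))  # random padding before each function
            symbols[name] = len(image)                 # functions land at different offsets
            image.extend(functions[name])
        return image, symbols

    # Two "users" get functionally identical but differently laid-out binaries.
    for user in (1, 2):
        _, symbols = diversify(FUNCTIONS, seed=user)
        print(user, symbols)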
Speaker Biography: Prof. Michael Franz was an early pioneer in the areas of mobile code and dynamic compilation. He created an early just-in-time compilation system, contributed to the theory and practice of continuous compilation and optimization, and co-invented the trace compilation technology that eventually became the JavaScript engine in Mozilla’s Firefox browser. His current research emphases lie in the areas of Systems Software, particularly focusing on compilers and virtual machines, in Trustworthy Computing, with a focus on biologically-inspired defenses such as automated code diversity and on information-flow, and in Software Engineering, with an emphasis on software architecture for secure systems and on reducing the trusted code base. Dr. Franz is the Principal Investigator on many competitive grants from the federal government, totaling well over $11M (of which more than $7M as sole PI), and has received well over half a million dollars in unrestricted gifts from industry in appreciation of the research innovations he has contributed. Franz received a Dr. sc. techn. degree in Computer Science (advisor: Niklaus Wirth) and a Dipl. Informatik-Ing. ETH degree, both from the Swiss Federal Institute of Technology, ETH Zurich. He is a Distinguished Scientist of the Association for Computing Machinery (ACM) and a Senior Member of The Institute of Electrical and Electronics Engineers (IEEE).
March 21, 2014
Internet measurement and experimentation platforms, such as PlanetLab – a research network with over 570 sites – have become essential for the study and evaluation of distributed systems and networks. Despite their many benefits and strengths, an increasingly recognized problem with current platforms is their inability to capture the geographic and network diversity of the wider, commercial Internet. Lack of diversity and poor visibility into the network hamper progress in a number of important research areas, from network troubleshooting to broadband characterization and Internet mapping, and complicate our attempts to generalize from test-bed evaluations of networked systems. In this talk, I will present Dasu, a measurement experimentation platform for the Internet’s edge. Dasu explicitly aligns the objectives of researchers with those of the users hosting the platform, supporting both flexible experimentation and broadband service characterization. Dasu has been publicly available since mid-2010 and has been adopted by over 96,000 users across 162 countries. I will discuss some of the challenges we faced building a platform for experimentation in the larger Internet. Our work on Dasu, and its first instantiation, was an offspring of two previous Internet-scale systems we deployed – Ono and NEWS. Dasu has, in turn, proven to be richly generative. I will illustrate the value of Dasu’s unique perspective and generative power by presenting two concrete projects on content distribution and broadband network analysis.
Speaker Biography: Fabián E. Bustamante is an associate professor of computer science in the EECS department at Northwestern University. He joined Northwestern in 2002, after receiving his Ph.D. and M.S. in Computer Science from the Georgia Institute of Technology. His research focuses on the measurement, analysis, and design of Internet-scale distributed systems and their supporting infrastructure. Fabián is a recipient of the US National Science Foundation CAREER award and the E.T.S. Watson Fellowship Award from the Science Foundation of Ireland, and a senior member of both the ACM and the IEEE. He currently serves on the editorial boards of IEEE Internet Computing and the ACM SIGCOMM CCR, the Steering Committee for IEEE P2P (as chair), and the External Advisory Board for the mPlane initiative. Fabián is also the general co-chair for ACM SIGCOMM 2014, to be held in Chicago. For more detailed information and a list of publications, please visit: http://www.aqualab.cs.northwestern.edu.
March 25, 2014
Modern technologies are increasingly capable, interconnected, and used in diverse aspects of our lives. Securing these devices is critical: attackers can leverage their properties to perform attacks with greater ease and at a larger scale, and attacks can result in novel or amplified harms to users and bystanders. It is necessary to approach securing these devices from a human-centric perspective in order to design application-specific security solutions that maximally protect the relevant human assets via defenses of appropriately calibrated costs. Human-centric investigations are often necessary to understand the nuances of a specific usage domain: the diverse human assets affected, the various costs that might be incurred by security system designs, and how humans weigh their respective values. I ground the importance of this approach, and illustrate example methodologies for such investigations, with studies in two domains: implantable medical devices and augmented reality. I conclude my talk with a call for the development of more toolkits to bootstrap the security process, and present one such toolkit: the Security Cards, a physical deck of brainstorming cards that I developed to help computer science students, technologists, and researchers explore the threats that might be posed to and by a technology system.
Speaker Biography: Tamara Denning is a senior PhD student at the University of Washington working with Tadayoshi Kohno in the Security and Privacy Research Lab. She received her B.S. in Computer Science from the University of California, San Diego in 2007. Tamara’s interests are in the human aspects of computer security and privacy, with a focus on emerging technologies. Past areas of work include security for implantable medical devices, the security of consumer technologies in the home, security and privacy issues surrounding augmented reality, and security toolkits for awareness and education. Tamara’s work is published in both HCI and computer security venues, and has been covered by news outlets such as CNN, MSNBC, the NY Times, and Wired.
March 27, 2014
Control systems used in manufacturing, transportation, and energy delivery connect embedded controllers to IT networks. Recently, such systems have been gaining increasing attention from attackers, as in the well-known Stuxnet attack. The majority of efforts, both in attacking and in defending control systems, have focused solely on the IT perimeter. We argue that this is insufficient. We first show that compromise of the IT perimeter does not necessarily allow an adversary to execute a Stuxnet-like targeted attack. In response, we introduce our tool SABOT, which incrementally model checks embedded controller code against an adversary-supplied specification. The result is an automatically generated attack program for the victim control system. Our results show that SABOT can instantiate malicious payloads in 4 out of 5 systems tested, even when the adversary does not know the full system behavior.
As a response to SABOT-style attacks, we then present a Trusted Safety Verifier (TSV). TSV uses a combination of symbolic execution and model checking to ensure that all controller code satisfies engineer-provided safety specifications. We show that TSV can verify the safety of controller code from a representative set of control systems in under two minutes, a small overhead in the control system lifecycle.
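In miniature, a check of this kind can be thought of as exploring the controller's possible states and verifying that an engineer-supplied safety predicate holds in all of them. The sketch below brute-forces the inputs of a tiny boolean controller against a safety invariant; it is an illustrative assumption about the general technique, not TSV's actual algorithm or PLC code format, and all names are hypothetical.

    # Toy safety check: enumerate all inputs of a small boolean "controller"
    # and verify an engineer-provided safety predicate on its outputs.
    from itertools import product

    def controller(start_button, tank_full, estop):
        """Hypothetical ladder-logic-style controller: drive a pump."""
        pump_on = start_button and not estop
        drain_valve = tank_full
        return {"pump_on": pump_on, "drain_valve": drain_valve}

    def safety(inputs, outputs):
        """Safety spec: the pump must never run while the emergency stop is pressed."""
        return not (inputs["estop"] and outputs["pump_on"])

    violations = []
    for start_button, tank_full, estop in product([False, True], repeat=3):
        inputs = {"start_button": start_button, "tank_full": tank_full, "estop": estop}
        outputs = controller(**inputs)
        if not safety(inputs, outputs):
            violations.append((inputs, outputs))

    print("safe" if not violations else f"unsafe: {violations}")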
Speaker Biography: Stephen McLaughlin recently defended his thesis in Computer Science and Engineering at Penn State. His past work has identified vulnerabilities in electronic voting machines and smart electric meters. His current work on specification-based control system security has been presented at CCS 2012, ACSAC 2013, and NDSS 2014. He is a two-year recipient of the Diefenderfer graduate fellowship in Penn State’s College of Engineering.
April 1, 2014
Today, end users generate large volumes of private data, some of which may be stored on the cloud in an encrypted form. The need to perform computation on this data to extract meaningful information has become ubiquitous.
The following fundamental questions arise in this setting: Can the cloud compute on the encrypted data of multiple users without knowing their secret keys? What functions can be computed in this manner? What if the users are mutually distrustful?
My research provides the first positive resolution of these questions. In this talk I will describe these new results and my other interests.
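To give a flavor of the setting (though not of the specific encrypted-computation constructions in the talk), the toy below uses additive secret sharing: two non-colluding servers each hold a random-looking share of every user's input, yet the sum of the users' values can still be computed. The scheme, modulus, and numbers here are purely illustrative assumptions.

    # Toy additive secret sharing over a prime modulus: each server sees only
    # random-looking shares, yet the sum of all users' inputs is recoverable.
    import random

    P = 2**61 - 1  # a prime modulus (illustrative choice)

    def share(value):
        r = random.randrange(P)
        return r, (value - r) % P          # share_1 + share_2 = value (mod P)

    users = {"alice": 42, "bob": 7, "carol": 13}
    server1, server2 = [], []
    for value in users.values():
        s1, s2 = share(value)
        server1.append(s1)
        server2.append(s2)

    # Each server adds its shares locally without learning any individual input.
    partial1 = sum(server1) % P
    partial2 = sum(server2) % P
    print((partial1 + partial2) % P)       # -> 62, the sum of all inputs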
Speaker Biography: Abhishek Jain is currently a postdoctoral researcher in the Cryptography and Information Security Group at MIT CSAIL and Boston University. He received his PhD in Computer Science from UCLA in 2012 where he was the recipient of the Symantec Outstanding Graduate Student Research award. Abhishek’s research interests are in cryptography and security, and related areas of theoretical computer science.
April 3, 2014
Users create, store and access a lot of personal data, both on their devices and in the cloud. Although this provides tremendous benefits, it also creates risks to security and privacy, ranging from the inconvenient (private photos posted around the office) to the serious (loss of a job; withdrawal of college admission). Simply refusing to share personal data is not feasible or desirable, but sharing indiscriminately is equally problematic. Instead, users should be able to efficiently accomplish their primary goals without unnecessarily compromising their privacy. In this talk, I describe my work toward developing usable access-control mechanisms for personal data. I review the results of three user studies that provided insight into users’ policy needs and preferences. I then discuss the design and implementation of Penumbra, a distributed file system with built-in access control designed to support those needs. Penumbra has two key building blocks: semantic-tag-based policy specification and logic-based policy enforcement. Our results show that Penumbra can enforce users’ preferred policies securely with low overhead.
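As a rough illustration of the approach (the data structures and policy syntax below are hypothetical, not Penumbra's actual implementation), files carry semantic tags and a policy is a logical condition over those tags that the system evaluates on every access.

    # Toy tag-based access control: files carry semantic tags; a policy is a
    # predicate over tags that must hold for a given requester to read the file.
    files = {
        "beach.jpg": {"photo", "vacation", "shared-with-family"},
        "offer.pdf": {"document", "job-search", "private"},
    }

    policies = {
        # requester -> predicate over a file's tag set (hypothetical rules)
        "mom":      lambda tags: "shared-with-family" in tags,
        "coworker": lambda tags: "private" not in tags and "photo" in tags,
    }

    def can_read(requester, filename):
        policy = policies.get(requester)
        return policy is not None and policy(files[filename])

    print(can_read("mom", "beach.jpg"))        # True
    print(can_read("coworker", "offer.pdf"))   # False: tagged private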
Speaker Biography: Michelle Mazurek is a Ph.D. candidate in Electrical and Computer Engineering at Carnegie Mellon University, co-advised by Lujo Bauer and Greg Ganger. Her research interests span security, systems, and HCI, with particular emphasis on designing systems from the ground up for usable security. She has worked on projects related to usable access control, distributed systems, and passwords.
2014 Carolyn and Edward Wenk Jr. Lecture in Technology and Public Policy
April 10, 2014
We tend to think of computer science as operating systems, programming languages, networking, data management, computer architecture, and algorithms. But, it is much more! It is also efficient transportation, energy independence, health and wellness, personalized education and life-long learning, national security, and 21st century scientific discovery. The growing role of computing has broad implications for society at large: “computational thinking” at a minimum – probably more – will be an essential capability for every citizen, going forward.
The dramatically expanding role of the field is creating exponentially expanding opportunities and pressures. Computer scientists, academic leaders, and policymakers in fields ranging from STEM education to research priorities to privacy, security, and human rights need to recognize and embrace this expanding role.
In this talk, I’ll begin by taking a look at past progress in the field. I’ll then explore the coming decade, during which we’ll “put the smarts into everything.” I’ll discuss some of the implications of this for the way we view the field (both from within and from without), for the education of the next generation, and for institutional and national policies.
Speaker Biography: Ed Lazowska holds the Bill & Melinda Gates Chair in Computer Science & Engineering at the University of Washington, where he also serves as the Founding Director of the University of Washington eScience Institute.
Lazowska received his A.B. from Brown University in 1972 and his Ph.D. from the University of Toronto in 1977, when he joined the University of Washington faculty. His research and teaching concern the design, implementation, and analysis of high performance computing and communication systems, and, more recently, the techniques and technologies of data-intensive discovery. He is a Member of the National Academy of Engineering and a Fellow of the American Academy of Arts & Sciences, the Association for Computing Machinery, the Institute of Electrical and Electronics Engineers, and the American Association for the Advancement of Science. He has received the Vollum Award for Distinguished Accomplishment in Science and Technology from Reed College, as well as the University of Washington Computer Science & Engineering Undergraduate Teaching Award.
Lazowska has been active in public policy issues, ranging from STEM education to Federal policies concerning research and innovation. He recently co-chaired (with David E. Shaw) the Working Group of the President’s Council of Advisors on Science and Technology charged with reviewing the Federal Networking and Information Technology Research and Development Program, and previously co-chaired (with Marc Benioff) the President’s Information Technology Advisory Committee. From 2007-13 he served as the Founding Chair of the Computing Community Consortium, whose goal is to catalyze the computing research community and enable the pursuit of innovative, high-impact research aligned with pressing national and global challenges. He has served on the Technical Advisory Board for Microsoft Research since its inception, and serves as a technical advisor to a number of high-tech companies and venture firms.
April 22, 2014
Succinct data structures let us work directly on near-optimally compressed data representations without decompressing.
How can we derive new functional data structures from these techniques? Applications range from novel encodings for handling “big data” on disk with efficient aggregates and order statistics down to parsing JSON faster and in less memory.
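A canonical building block in this area is a rank query over a bit vector: with a small amount of precomputed block metadata, rank can be answered quickly without decompressing anything. The sketch below shows the idea in simplified form; it is an illustration of the general technique, not the speaker's Haskell libraries.

    # Rank over a bit vector with precomputed per-block counts: rank1(i) is the
    # number of 1-bits in positions [0, i). Succinct structures build indexes
    # like this (plus select) on top of compressed representations.
    BLOCK = 8

    class RankBitVector:
        def __init__(self, bits):
            self.bits = bits
            self.block_ranks = []          # 1-bits before the start of each block
            total = 0
            for start in range(0, len(bits), BLOCK):
                self.block_ranks.append(total)
                total += sum(bits[start:start + BLOCK])

        def rank1(self, i):
            block = i // BLOCK
            return self.block_ranks[block] + sum(self.bits[block * BLOCK:i])

    bv = RankBitVector([1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1])
    print(bv.rank1(7))   # -> 4 ones among the first 7 positions
    print(bv.rank1(12))  # -> 7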
Speaker Biography: Edward spent most of his adult life trying to build reusable code in imperative languages before realizing he was building castles in sand. He converted to Haskell in 2006 while searching for better building materials. He now chairs the Haskell core libraries committee, collaborates with hundreds of other developers on over 150 projects on GitHub, builds tools for quants and traders at S&P Capital IQ using the purely functional programming language Ermine, and is obsessed with finding better tools so that seven years from now he won’t be stuck solving the same problems with the same tools he was stuck using seven years ago.
Distinguished Lecturer
April 24, 2014
Healthcare for chronic disease is the dominant cost for many healthcare systems, now and for the foreseeable future. The unique capabilities of pervasive technologies have the potential to transform healthcare practices by shifting care from institutional to home settings, by helping individuals engage in their own care, by facilitating problem solving and observational learning, and by creating a network of communication and collaboration channels that extends healthcare delivery to everyday settings. In this talk, I will draw from a number of research projects that combine computing research, human-centered design, and health management theory to create promising approaches for promoting wellness, supporting behavior change, and delivering improved health outcomes.
Speaker Biography: Elizabeth Mynatt is a Professor of Interactive Computing at Georgia Tech and the Executive Director of Georgia Tech’s Institute for People and Technology. The Institute for People and Technology (IPaT) connects industry, government and nonprofit leaders with Georgia Tech’s world-class researchers and innovations to transform media, health, education and humanitarian systems. IPaT integrates academic and applied research through living laboratories and multidisciplinary projects to deliver real-world, transformative solutions that balance the needs of people with the possibilities of new technologies. Dr. Mynatt is an internationally recognized expert in the areas of ubiquitous computing, personal health informatics, computer-supported collaborative work and human-computer interface design. Named Top Woman Innovator in Technology by Atlanta Woman Magazine in 2005, Dr. Mynatt has created new technologies that support the independence and quality of life of older adults “aging in place,” that help people manage diabetes, and that increase creative collaboration in workplaces. Dr. Mynatt is a member of the ACM SIGCHI Academy, a Sloan and Kavli Research Fellow, and a member of Microsoft Research’s Technical Advisory Board. She is also a member of the Computing Community Consortium, an NSF-sponsored effort to catalyze and empower the U.S. computing research community to pursue audacious, high-impact research. Dr. Mynatt earned her Bachelor of Science summa cum laude in computer science from North Carolina State University and her Master of Science and Ph.D. in computer science from Georgia Tech.
May 1, 2014
My research has largely been concerned with developing quantitative methods that seek to characterize variation in large-scale genomic studies. Often the goal of these studies is to identify a particular type of signal, for example, genes with expression levels that are associated with disease status. I will describe some of my group’s major research themes aimed at the problem of identifying relevant signals in genomic data in the presence of complex sources of “noise.” This involves false discovery rates, latent variable modeling, and empirical Bayes methods, which are all active research topics at the interface of statistics and machine learning.
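For readers unfamiliar with false discovery rates: the best-known procedure in this family (Benjamini-Hochberg, closely related to the q-value methods Storey is known for) sorts the p-values and finds the largest cutoff that keeps the expected fraction of false discoveries below a target level. The sketch below is a generic illustration with made-up p-values, not an example from the talk.

    # Benjamini-Hochberg step-up procedure at FDR level q: reject H_(1..k) where
    # k is the largest index with p_(k) <= (k / m) * q. (p-values are illustrative.)
    def benjamini_hochberg(pvalues, q=0.05):
        m = len(pvalues)
        order = sorted(range(m), key=lambda i: pvalues[i])
        k = 0
        for rank, idx in enumerate(order, start=1):
            if pvalues[idx] <= rank / m * q:
                k = rank
        return {order[i] for i in range(k)}   # indices of rejected hypotheses

    pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.9]
    print(sorted(benjamini_hochberg(pvals, q=0.05)))   # -> [0, 1]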
Speaker Biography: John Storey received his PhD in statistics from Stanford University. He has been a faculty member at UC-Berkeley and University of Washington, and he is currently a professor at Princeton University. He is known for developing methods for high-dimensional data, particularly with applications to genomics. Storey is an elected fellow of the American Association for the Advancement of Science (AAAS) and the Institute of Mathematical Statistics (IMS).
2014 ACM Nathan Krasnopoler Lecture
May 2, 2014
This talk will go into the details of how the Linux kernel is developed, the current rate of change, who is doing the work, and how all of this goes against everything you have learned in school about doing software development. It will also introduce some ways that you can get involved in Linux kernel development.
Speaker Biography: Greg Kroah-Hartman is a Linux kernel developer and a Fellow at the Linux Foundation. He is responsible for the stable Linux kernel releases, and is the maintainer of the USB, driver core, tty, serial, staging, and many other driver subsystems in the kernel. He has written two books about Linux kernel development and many papers and magazine articles.
This lecture is sponsored by the Nathan Krasnopoler Memorial Fund, established at the Whiting School of Engineering, to benefit the Johns Hopkins chapter of the Association for Computing Machinery.