Spring 2009
Distinguished Lecturer
February 3, 2009
Miniaturization and Moore’s law have enabled us to combine sensing, computation, and wireless communication in integrated, low-power devices, and to embed networks of these devices in the physical world. By placing sensing devices close to physical phenomena, we have been able to study details in space and time that were previously unobservable. Looking back over the past few years, we have made significant progress toward the vision of programmable, distributed sensing systems by using judicious application of server-side and in situ processing, human-in-the-loop as well as automated systems, mobility at multiple scales, and data and models as context for in situ measurements. We are now applying these lessons and technical approaches to human as well as natural systems, in particular by exploring use of the installed base of image, acoustic, location, and activity sensors that we all carry around in our pockets or on our belts: mobile phones. This talk will present work in progress and lessons learned from several prototype applications and systems being explored at the NSF Science and Technology Center for Embedded Networked Sensing (CENS).
Speaker Biography: Deborah Estrin is a Professor of Computer Science and Electrical Engineering at UCLA, holds the Jon Postel Chair in Computer Networks, and is Founding Director of the National Science Foundation-funded Center for Embedded Networked Sensing (CENS). CENS’ mission is to explore and develop innovative, end-to-end, distributed sensing systems across an array of applications, from ecosystems to human systems. Since the late 1990s, Estrin’s work has focused on multi-disciplinary, experimental-systems research as applied to a variety of environmental monitoring challenges. Most recently this work includes mobile personal sensing systems, leveraging the location, acoustic, image, and attached-sensor data streams increasingly available globally from mobile phones. Previously, Estrin’s research addressed Internet protocol design and scaling, in particular inter-domain and multicast routing. She received her PhD in 1985 from MIT and her BS in 1980 from UC Berkeley, both in EECS. Estrin currently serves on the National Research Council’s Computer Science and Telecommunications Board (CSTB) and the TTI/Vanguard Advisory Board, and was previously a member of the NSF National Ecological Observatory Network (NEON) Advisory Board, the NSF CISE Advisory Committee, and DARPA-ISAT. Estrin was selected as the first ACM-W Athena Lecturer in 2006, was awarded the Anita Borg Institute’s Women of Vision Award for Innovation in 2007, and was inducted as a member of the American Academy of Arts and Sciences in 2007. She is a fellow of the IEEE, ACM, and AAAS and was granted Doctor Honoris Causa from EPFL in 2008.
February 17, 2009
Current electronic voting systems have experienced many high-profile software, hardware, and usability failures in real elections. Such failures cast doubt on the accuracy of the tally and threaten to undermine public trust in the electoral system. Recent research has revealed systemic flaws in commercial voting systems that can cause malfunctions, lose votes, and possibly even allow outsiders to influence the outcome of a national election. While some consequently argue for total abandonment of electronic voting, VoteBox shows how careful application of security, distributed systems, and cryptographic principles can yield voting systems that exceed current systems as well as their analog forebears in both trustworthiness and usability.
VoteBox machines keep secure logs of essential election events, allowing credible audits during or after the election; they are connected using the Auditorium, a peer-to-peer network that replicates and intertwines secure logs to survive failure, attack, and poll worker error.
While the election is ongoing, any voter may choose to challenge a VoteBox to immediately produce cryptographic proof that it will correctly and faithfully cast ballots.
The work demonstrates how these and other approaches from the e-voting research community can be composed in a single system to increase assurance. VoteBox is a model for new implementations, but its techniques can be practically applied to current commercial electronic voting system designs as well.
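A note on the secure logs mentioned above: tamper-evident logs of this kind are typically built by hash chaining, where each entry commits to the hash of its predecessor, so any retroactive edit invalidates every subsequent hash. The following is a minimal, hypothetical sketch of such a chained log in Python; it illustrates the general idea only and is not VoteBox’s or Auditorium’s actual log format.

```python
import hashlib
import json

class HashChainLog:
    """Append-only log where each entry commits to the previous entry's hash."""

    def __init__(self):
        self.entries = []          # list of (payload, prev_hash, entry_hash)
        self.head = "0" * 64       # genesis value for the chain

    def append(self, payload):
        record = json.dumps({"payload": payload, "prev": self.head}, sort_keys=True)
        entry_hash = hashlib.sha256(record.encode()).hexdigest()
        self.entries.append((payload, self.head, entry_hash))
        self.head = entry_hash
        return entry_hash

    def verify(self):
        """Recompute every hash; tampering with any earlier entry breaks the chain."""
        prev = "0" * 64
        for payload, stored_prev, stored_hash in self.entries:
            if stored_prev != prev:
                return False
            record = json.dumps({"payload": payload, "prev": stored_prev}, sort_keys=True)
            if hashlib.sha256(record.encode()).hexdigest() != stored_hash:
                return False
            prev = stored_hash
        return prev == self.head

log = HashChainLog()
log.append({"event": "polls-open", "machine": 42})
log.append({"event": "ballot-cast", "machine": 42})
assert log.verify()

# Tamper with the first entry's payload: verification now fails.
log.entries[0] = ({"event": "polls-open", "machine": 7},) + log.entries[0][1:]
assert not log.verify()
```

In Auditorium, chains from different machines additionally reference one another, so corrupting one machine’s log would also require rewriting the entangled logs held by its peers.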
Speaker Biography: Daniel Sandler is a Ph.D. student in the Department of Computer Science at Rice University, currently completing his dissertation on VoteBox. His research concerns the design and implementation of resilient secure systems, including electronic voting, peer-to-peer and distributed systems, and Web applications. He received his M.S. (2007) and B.S. (1999) in computer science from Rice University; from 1999 to 2004 he worked in industry, including work on systems and application software for Be, Inc. and Palm, Inc.
February 24, 2009
I will describe two areas in computer security that demonstrate the wide range of techniques, from both theory and practice, we need to make an impact. First, I treat privacy and security in Radio Frequency Identification (RFID). RFID refers to a range of technologies in which a small device with an antenna, or “tag,” is attached to an item and can later be queried wirelessly by a reader. While proponents of RFID promise security and efficiency benefits, the technology also raises serious security concerns. I will describe my work on practical security analysis of RFID deployments in library books and in the United States e-passport. These deployments in turn uncover a new theoretical problem, that of “scalable private authentication.” I will describe the first solution to this problem that scales sub-linearly in the number of RFID tags.
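One widely cited route to sub-linear private authentication is to arrange tag secrets in a tree, so the reader identifies a tag by descending the tree level by level rather than trying every tag’s key. The sketch below illustrates that tree-of-secrets idea under simplifying assumptions (HMAC as the PRF, a complete binary tree); it is not necessarily the exact construction presented in the talk.

```python
import hashlib
import hmac
import os
import secrets

BRANCH, DEPTH = 2, 10        # 2**10 = 1024 tags; reader work ~ BRANCH * DEPTH per query

def prf(key, msg):
    return hmac.new(key, msg, hashlib.sha256).digest()

# One secret per node of a complete BRANCH-ary key tree; node_key[(d, p)] is the
# key of the node reached by prefix p at depth d. Each tag stores only the keys
# on its own root-to-leaf path.
node_key = {(d, p): secrets.token_bytes(16)
            for d in range(1, DEPTH + 1) for p in range(BRANCH ** d)}

def path_prefixes(t):
    """Prefixes of tag index t at depths 1..DEPTH, most significant digit first."""
    return [(d, t // BRANCH ** (DEPTH - d)) for d in range(1, DEPTH + 1)]

def tag_respond(t, nonce):
    """The tag MACs the reader's nonce once under each key on its path."""
    return [prf(node_key[node], nonce) for node in path_prefixes(t)]

def reader_identify(nonce, response):
    """Descend the key tree: at each level try only the BRANCH children of the
    current node, instead of trying every tag's key (linear in the tag count)."""
    prefix = 0
    for d, answer in enumerate(response, start=1):
        for child in range(BRANCH):
            candidate = prefix * BRANCH + child
            if hmac.compare_digest(prf(node_key[(d, candidate)], nonce), answer):
                prefix = candidate
                break
        else:
            return None      # no child key matched: unknown or invalid tag
    return prefix

nonce = os.urandom(16)
assert reader_identify(nonce, tag_respond(617, nonce)) == 617
```

The reader performs roughly BRANCH times DEPTH PRF evaluations per identification, i.e. logarithmic in the number of tags, instead of one trial per tag.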
Second, I describe recent work in “whitebox fuzz testing,” a new approach to finding security bugs. Security bugs cost millions of dollars to patch after the fact, so we want to find and fix them as early in the deployment cycle as possible. I review previous fuzz testing work, how fuzzing has been responsible for serious security bugs, and classic fuzz testing’s inability to deal with “unlikely” code paths. I then show how marrying the idea of dynamic test generation with fuzz testing overcomes these shortcomings, but raises significant scaling problems. Two recent tools, SAGE at Microsoft Research and SmartFuzz at Berkeley, overcome these scaling problems; I present results on the effectiveness of these tools on commodity Windows and Linux media-playing software. Finally, I close with directions for leveraging cloud computing to improve developers’ testing and debugging experience.
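The core step behind whitebox fuzzing, turning the branch conditions observed on one execution into inputs that exercise new paths, can be illustrated with a constraint solver. The sketch below is a hypothetical toy: the parser and its hand-encoded branch conditions are my own, and it assumes the z3-solver Python package; real tools such as SAGE collect path constraints automatically by instrumenting the program under test.

```python
from z3 import BitVec, Solver, sat   # assumes the z3-solver package is installed

# A toy "parser": random (blackbox) fuzzing is unlikely to reach the crash,
# since only 1 in 2**24 random 3-byte inputs satisfies the nested checks.
def parse(data):
    """Returns how many nested checks passed; raises on the deepest path."""
    if len(data) < 3 or data[0] != ord('b'):
        return 0
    if data[1] != ord('a'):
        return 1
    if data[2] != ord('d'):
        return 2
    raise RuntimeError("crash: reached the 'unlikely' code path")

# Symbolic input bytes and the checks the parser performs, in order. A real
# whitebox fuzzer recovers these constraints from an instrumented execution.
sym = [BitVec(f"b{i}", 8) for i in range(3)]
checks = [sym[0] == ord('b'), sym[1] == ord('a'), sym[2] == ord('d')]

# Simplified generational search: keep the constraints for the branches the
# last input already satisfied, flip the first failed check, and ask the
# solver for a concrete input that drives execution down the new path.
data = b"xyz"                                    # seed input
while True:
    try:
        depth = parse(data)
    except RuntimeError as err:
        print("found the bug with input", data, "->", err)
        break
    s = Solver()
    s.add(*checks[:depth + 1])
    assert s.check() == sat
    model = s.model()
    data = bytes(model.eval(b, model_completion=True).as_long() for b in sym)
    print("solver-generated input:", data)
```

Each iteration reaches one branch deeper, so the “unlikely” crashing input is found in three solver calls rather than an expected 2**24 random trials.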
The talk describes joint work with Ari Juels and David Wagner (RFID), and with Patrice Godefroid, Michael Y. Levin, Xue Cong Li, and David Wagner (whitebox fuzzing).
Speaker Biography: David Molnar is a PhD candidate at the University of California, Berkeley, degree expected Spring 2009. His work centers on privacy, cryptography, and computer security, advised by David Wagner. Most recently, he has been interested in RFID privacy, and in applying constraint solvers to finding software bugs at scale (see http://www.metafuzz.com). He is a previous National Science Foundation Graduate Fellow and Intel Open Collaboration Research Graduate Fellow.
March 3, 2009
Analysis of human genetic variation has gained significant momentum due to the success of many large-scale sequencing projects. Within the last five years, millions of single nucleotide polymorphisms (SNPs) have been genotyped over thousands of individuals belonging to several different ethnic groups. These large-scale efforts have now made it possible to analyze genetic variation within humans at very fine-scales.
In this talk, we develop maximum parsimony phylogenetic (evolutionary) tree reconstruction algorithms that are specifically tailored to SNP data. Such a phylogeny should cluster closely related individuals (a sub-population) together, so these techniques are widely used to answer questions about human migration and genome evolution. Mathematically, we work with two variants of the phylogeny reconstruction problem, both of which are NP-complete. The first variant is equivalent to finding a Steiner minimum tree on a hypercube, and the second is a generalization of this problem. Traditionally such problems are solved using hill-climbing heuristics. We show that the problems can be solved in polynomial time when the size of the phylogeny (Steiner minimum tree) is ‘small’ with respect to the dimensions of the hypercube (‘near-perfect’), a standard assumption for SNP data.
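To make the objective concrete (a toy illustration of my own, not the authors’ algorithm): each binary SNP haplotype is a corner of a hypercube, a candidate phylogeny is a tree whose leaves are the observed haplotypes, and its parsimony cost is the number of mutations it implies, i.e. the sum of Hamming distances along its edges. Choosing the unobserved internal (Steiner) labels to minimize this cost is the Steiner minimum tree problem on the hypercube; the ‘near-perfect’ assumption says the optimum is only slightly larger than the number of SNP sites (the dimension of the hypercube).

```python
from itertools import product

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def parsimony_cost(edges, labels):
    """Total number of mutations implied by a fully labeled tree."""
    return sum(hamming(labels[u], labels[v]) for u, v in edges)

# Four observed SNP haplotypes (leaves) over 4 sites, connected through one
# Steiner (internal) node whose label we choose by brute force over the hypercube.
leaves = {"A": "0000", "B": "0011", "C": "1100", "D": "0111"}
edges = [("A", "S"), ("B", "S"), ("C", "S"), ("D", "S")]   # star tree, one Steiner node

best = min(
    ("".join(bits) for bits in product("01", repeat=4)),
    key=lambda s: parsimony_cost(edges, {**leaves, "S": s}),
)
print("best Steiner label:", best,
      "cost:", parsimony_cost(edges, {**leaves, "S": best}))
```

Brute force over internal labels is exponential in general; the talk’s contribution is showing that the near-perfect case admits polynomial-time algorithms.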
We will also briefly outline related work on clustering sub-populations by solving a max-cut instance on graphs, another well-known NP-complete problem. Again, we show that under practical assumptions on SNP data the max-cut can be found in polynomial time, and that the resulting partition provably recovers the correct clusters.
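To make this formulation concrete as well (again a toy of my own): build a graph whose vertices are individuals and whose edge weights measure genetic difference, here the Hamming distance between SNP vectors; a maximum-weight cut then splits the sample into two maximally dissimilar groups, i.e. two candidate sub-populations. For a handful of individuals the cut can be found by brute force:

```python
from itertools import combinations

individuals = {
    "i1": "000010", "i2": "000110", "i3": "100010",   # planted sub-population 1
    "i4": "011001", "i5": "111001", "i6": "011101",   # planted sub-population 2
}

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

names = list(individuals)

def cut_weight(group):
    """Sum of pairwise differences crossing the (group, rest) partition."""
    rest = [n for n in names if n not in group]
    return sum(hamming(individuals[u], individuals[v]) for u in group for v in rest)

best = max(
    (set(c) for r in range(1, len(names)) for c in combinations(names, r)),
    key=cut_weight,
)
print("max-cut side:", sorted(best), "weight:", cut_weight(best))
```

On this toy data the maximum cut recovers exactly the two planted groups; the point of the result described above is that, under realistic assumptions on SNP data, this remains tractable and provably correct at scale.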
We provide extensive empirical analysis to show that our methods work well in practice.
Speaker Biography: Srinath Sridhar completed his BS in Computer Science at the University of Texas at Austin with highest honors in 2003. He received his PhD in Computer Science from Carnegie Mellon University in 2007. His research interests include applied algorithms broadly, with a specific focus on computational genomics.
Student Seminar
March 6, 2009
The scientific method is founded on the principles of validation and comparison of experimental results. For network and information security research, this means that researchers must have access to freely available network data to validate and improve on published results. Releasing network data, however, can be problematic for the data publisher because it may contain potentially sensitive information about the network and its users. Recently, network data anonymization methods have been proposed in an effort to provide generally useful network data to researchers while removing sensitive information. Unfortunately, these state-of-the-art anonymization systems rely on intuition and expert knowledge, rather than rigorous privacy guarantees, to prevent the leakage of sensitive information. As a result, unforeseen and dangerous channels of information leakage may exist despite the use of these anonymization techniques. In this talk, we focus on developing more rigorous foundations for the analysis of anonymization methods and the privacy they provide to data publishers. To do so, we begin by highlighting several areas of information leakage within anonymized network data discovered through the application of novel data mining and machine learning techniques. Specifically, we are able to reveal the security posture of the networks, identify hosts and the services they offer, and even expose user behaviors (e.g., web browsing activity). From these attacks, we derive an adversarial model and analytic framework that captures the risk involved with these and other inference attacks on anonymized network data. This analysis framework provides the data publisher with a method for quantifying the privacy risks of anonymized network data. Furthermore, we discuss the parallels between our proposed security notions and those of microdata anonymization in an effort to bridge the gap between the two fields.
Speaker Biography: Scott Coull graduated from Rensselaer Polytechnic Institute with a B.Sc. in Computer Science in December of 2003, and a M.Sc. in Computer Science in May of 2005. He is the recipient of several awards, including the Best Student Paper Award at the Annual Computer Security Applications Conference, and the Stanley I. Landgraf Prize for Outstanding Academic Achievement. His research focuses on computer security and applied cryptography, with a particular emphasis on the use of machine learning techniques in identifying and mitigating threats to privacy and security.
March 10, 2009
Browsers are rapidly improving as a platform for compelling, interactive applications. Unfortunately, the Web security model is still not fully understood. Despite browser vendors’ impressive performance gains, the Web cannot succeed without a secure foundation.
This talk will cover my recent efforts to observe, analyze, and improve browser security. Existing security policies were designed in an era where Web users only interacted with one principal at a time, but modern browsers often have many tabs open simultaneously, and these tabs often contain third-party content from multiple sources. By articulating threat models that capture these multi-principal interactions, my research has revealed attacks on a variety of browser features, such as frame navigation, cross-document communication, and HTTPS. I’ll discuss how I worked with browser and plug-in vendors to address these attacks and deploy industry-wide solutions.
Speaker Biography: Collin Jackson is a computer science Ph.D. candidate at Stanford University, specializing in browser and web application security. While at Stanford, Collin worked with Google on the security of the Chrome browser. He has also consulted for Yahoo!, Microsoft, the U.S. Department of Homeland Security, Silicon Valley start-up Cooliris, and the Center for Democracy and Technology. Collin holds a Bachelor of Science degree from Yale University.
March 24, 2009
Humans differ in many “phenotypes” such as weight, hair color and, more importantly, disease susceptibility. These phenotypes are largely determined by each individual’s specific genotype, stored in the 3.2 billion bases of his or her genome sequence. Deciphering the sequence by finding which sequence variations cause a certain phenotype would have a great impact. The recent advent of high-throughput genotyping methods has enabled retrieval of an individual’s sequence information on a genome-wide scale. Classical approaches have focused on identifying which sequence variations are associated with a particular phenotype. However, the complexity of the cellular mechanisms through which sequence variations cause a particular phenotype makes it difficult to directly infer such causal relationships. In this talk, I will present machine learning approaches that address these challenges by explicitly modeling the cellular mechanisms induced by sequence variations. Our approach takes as input genome-wide expression measurements and aims to generate a finer-grained hypothesis such as “sequence variation S induces cellular processes M, which lead to changes in phenotype P”. Furthermore, we have developed the “meta-prior algorithm,” which can learn the regulatory potential of each sequence variation based on its intrinsic characteristics. This improvement helps to identify a true causal sequence variation among a number of polymorphisms in the same chromosomal region. Our approaches have led to novel insights into sequence variations, and some of the hypotheses have been validated through biological experiments. Many of the machine learning techniques are generally applicable to a wide-ranging set of applications, and as an example I will present the meta-prior algorithm in the context of movie rating prediction tasks using the Netflix data set.
Speaker Biography: Su-In Lee is a Visiting Assistant Professor in the Machine Learning Department and Lane Center for Computational Biology at Carnegie Mellon University. She recently graduated from Stanford University (thesis advisor: Prof. Daphne Koller), where she was a member of the Stanford Artificial Intelligence Laboratory. Her research focuses on devising computational methodologies for understanding the genetic basis of complex traits. She is also interested in developing general machine learning algorithms for broader applications. Su-In graduated summa cum laude with a B.Sc. in Electrical Engineering and Computer Science from the Korea Advanced Institute of Science and Technology and was a recipient of the Stanford Graduate Fellowship.
April 9, 2009
Over the past several years, Bayesian filtering techniques have become mainstream tools for handling uncertainty in robotics research. Variations include the classical Kalman filter and more recent particle filters, all of which are routinely used for robot localization, navigation, and map building. In this talk, I will introduce a new class of filters, called combinatorial filters, that are distinctive in several ways: 1) they simplify modeling burdens by avoiding probabilities, 2) they are designed to process information from the weakest sensors possible, and 3) they avoid unnecessary state estimation. In many ways they are the direct analog of Bayesian filters, but they handle substantial amounts of uncertainty by refusing to model it. The emphasis is on detecting and maintaining the tiny pieces of information that are critical to solving robotic tasks such as navigation, map building, target tracking, and pursuit-evasion.
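As a toy example of the flavor of such a filter (my own illustration, not one from the talk): suppose a robot moves one cell left or right along a corridor, and the only sensor is a binary door detector that fires exactly when the current cell contains a door. A combinatorial filter simply maintains the set of cells consistent with the observation history, with no probabilities attached:

```python
# Corridor of 10 cells; the only sensor is a binary door detector. The
# combinatorial filter tracks the SET of cells consistent with all readings.
N = 10
DOORS = {2, 5, 8}                              # cells that contain a door

def predict(possible):
    """Unknown motion: the robot moved one cell left or right (no probabilities)."""
    return {c + d for c in possible for d in (-1, +1) if 0 <= c + d < N}

def update(possible, door_seen):
    """Keep only the cells consistent with the door sensor reading."""
    return {c for c in possible if (c in DOORS) == door_seen}

possible = set(range(N))                       # initially the robot could be anywhere
for reading in [True, False, False, True]:     # an example observation sequence
    possible = update(predict(possible), reading)
    print("cells consistent with history:", sorted(possible))
```

A Bayesian filter would instead maintain a probability for every cell together with motion and sensor noise models; the combinatorial filter keeps only the possible/impossible distinction, which is often sufficient for tasks such as navigation or pursuit-evasion.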
Speaker Biography: Steve LaValle is a Professor in the Department of Computer Science at the University of Illinois. He received his Ph.D. in Electrical Engineering from the University of Illinois in 1995. From 1995 to 1997 he was a postdoctoral researcher and lecturer in the Department of Computer Science at Stanford University. From 1997 to 2001 he was an Assistant Professor in the Department of Computer Science at Iowa State University. His research interests include robotics, sensing, cyber-physical systems, planning algorithms, computational geometry, and control theory. He authored the book Planning Algorithms, Cambridge University Press, 2006 (which is available online at http://planning.cs.uiuc.edu/).
April 16, 2009
The Internet ranks among the greatest achievements of 20th-century computer science. The basic technology was so well conceived that it has remained virtually unchanged despite completely new applications and dramatic growth in traffic and in the number of connected computers. This eclectic talk presents a series of lessons drawn from the Internet experience that may help us better understand how to proceed with new research. It considers the design of protocols, general principles, technologies, the underlying architecture, the effect of economics on networking research, and ways that experimental research projects can be organized to ensure success.
Speaker Biography: Douglas Comer is VP of Research Collaboration at Cisco Systems and Distinguished Professor of Computer Science at Purdue University, where he is currently on an extended leave. An internationally recognized expert on computer networking, Comer has been involved in Internet research since the late 1970s. His series of groundbreaking textbooks has been translated into 16 languages and is used by professional engineers and students around the world. For twenty years, Comer was editor-in-chief of the journal Software: Practice and Experience. He is a Fellow of the ACM.
April 23, 2009
I will present recent work in modeling the visual appearance of materials. These models are a critical component both of computer graphics systems that aim to synthesize realistic images of virtual scenes and of computer vision systems that aim to infer properties of a 3D scene from natural images. In particular, I will focus on “data-driven” strategies for modeling complex spatially varying opaque and translucent materials such as brushed metal, plastic, cloth, wood, candle wax, and human skin. This includes our Inverse Shade Trees framework, which describes how to decompose these measured high-dimensional functions into compact and understandable pieces that are useful within a production setting. I will also discuss a new design interface that allows a user to specify sparse edits that are intelligently propagated across a large collection of measurements, and a new empirical model of translucent materials based on a large simulation. I will conclude with a discussion of interesting open problems in this area.
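One way to picture the “decompose measured high-dimensional functions into compact pieces” step is as a matrix factorization: stack the measurements into a matrix with one row per surface point and one column per lighting/viewing condition, then factor it into a small set of basis curves plus per-point blend weights. The sketch below uses plain non-negative matrix factorization with multiplicative updates on synthetic data; this is a stand-in I chose for illustration, since the Inverse Shade Trees work employs its own, more constrained decomposition.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "measured appearance" matrix: 500 surface points x 64 light/view
# conditions, generated as mixtures of 3 hidden basis reflectance curves.
n_points, n_conditions, n_basis = 500, 64, 3
true_basis = rng.random((n_basis, n_conditions))
true_weights = rng.random((n_points, n_basis))
V = true_weights @ true_basis                  # what a measurement rig might record

# Non-negative matrix factorization, V ~= W @ H, via multiplicative updates.
k = 3
W = rng.random((n_points, k)) + 1e-3           # per-point blend weights
H = rng.random((k, n_conditions)) + 1e-3       # compact basis curves
for _ in range(300):
    H *= (W.T @ V) / (W.T @ W @ H + 1e-9)      # update basis curves
    W *= (V @ H.T) / (W @ H @ H.T + 1e-9)      # update per-point weights

rel_error = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(f"relative reconstruction error: {rel_error:.4f}")
```

In a decomposition like this, editing a basis curve in H or a point’s weights in W changes the reconstruction coherently, which is what makes factored representations attractive for appearance design and editing.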
Speaker Biography: Jason Lawrence received his Ph.D. from Princeton and is currently an assistant professor in the Computer Science Department at the University of Virginia. His research focuses on efficient representations and acquisition strategies for material appearance and 3D geometry, real-time rendering algorithms, and global illumination rendering algorithms. He was the recipient of an NSF CAREER award titled “The Inverse Shade Tree Framework for Material Acquisition, Analysis, and Design.”