Hardware trends over the last several decades have lead to shifting priorities with respect to performance bottlenecks in the implementations of dataflows typically present in large-scale data analytics applications. In particular, efficient use of main memory has emerged as a critical aspect of dataflow implementation, due to the proliferation of multi-core architectures, as well as the rapid development of faster-than-disk storage media. At the same time, the wealth of static domain-specific information about applications remains an untapped resource when it comes to optimizing the use of memory in a dataflow application.
We propose a compilation-based approach to the synthesis of memory-efficient dataflow implementations, using static analysis to extract and leverage domain-specific information about the application. Our program transformations use the combined results of type, effect, and provenance analyses to infer time- and space- effective placement of primitive memory operations, precluding the need for dynamic memory management and its attendant costs. The experimental evaluation of implementations synthesized with our framework shows both the importance of optimizing for memory performance, as well as significant benefits of our approach, along multiple dimensions.
Finally, we also demonstrate a framework for formally verifying the soundness of these transformations, laying the foundation for their use as a component of a more general implementation synthesis ecosystem.
Speaker Biography
Shyam is a PhD candidate in Computer Science at JHU, and a member of the Data Management and Systems Lab (DaMSL) under the supervision of Dr. Yanif Ahmad. Over the course of his PhD, Shyam’s work and interests have spanned the gamut from the most arcane of theory, to the depths of systems performance engineering.
After wrapping up at Hopkins, Shyam will be joining Galois — a private research consultancy — as a member of the technical staff.