When: Apr 03 2025 @ 12:00 PM
Where: 228 Malone Hall
Categories: Computer Science Seminar Series

Lunch will be available starting at noon. The seminar will begin at 12:15 p.m.

Abstract

What are the limitations of our current AI alignment techniques? And what risks might emerge as we hand off decision-making to AI agents built on deep learning systems such as large language models? In this talk, David Krueger will give an overview of his prior work on AI safety, which includes studying how deep learning systems generalize and when and why alignment techniques such as reward modeling can fail. He will then describe his ongoing work applying these findings to LLMs and LLM agents and identifying novel alignment failure modes. Finally, Krueger will explain how his work can support AI policy tools such as frontier model safety evaluations by providing a more detailed understanding of deep learning models, and will touch on some of his projects on the governance of frontier AI systems.

Speaker Biography

David Krueger is an assistant professor in robust, reasoning, and responsible AI at the University of Montréal and Mila. He works on understanding and addressing the risks of advanced AI systems, especially AI agents. His current work is focused on LLM safety, and in particular on characterizing the risks of LLM systems—such as LLM agents—and informing AI safety policy. Krueger’s past research spans many areas of deep learning, AI alignment, AI safety, AI ethics, and AI governance, including alignment failure modes, algorithmic manipulation, interpretability, robustness, and understanding how AI systems learn and generalize. He was previously an assistant professor at the University of Cambridge and a research director at the UK AI Security Institute.
