We increasingly rely on AI models in our daily lives—from traffic navigation and shopping apps to AI-informed care decisions made by our doctors.
Given their ubiquity and influence, how and why should we trust these decisions? Can we be certain the models’ predictions are free of biases or errors?
To explore these and other timely issues that academic researchers are grappling with—Can and should AI be regulated? How have rapid AI advances changed teaching?—we brought together a panel of AI experts from across the Whiting School of Engineering and tapped AI researcher Rama Chellappa to lead them in a wide-ranging exchange. Highlights from their discussion follow.
THE “BLACK-BOX PROBLEM”
When it comes to artificial intelligence, how does the “black-box problem” impact your work?
RAMA CHELLAPPA: When we go to a doctor’s office and we see diplomas from Harvard or Hopkins, we think the doctor knows what they are doing, and we trust them. But if the doctor is [employing] AI, we want to know where the AI learned to diagnose illnesses and recommend therapies. When we have a physician, a patient, and AI working together, that requires complete trust. So, interpretability is a key component of trusting an AI-based decision.
NATALIA TRAYANOVA: In my center—the Alliance for Cardiovascular Diagnostic and Treatment Innovation—our work is very clinically translational. Our goal is to bring pretty much everything we develop straight to the patient’s bedside. We work very closely with clinicians. We are fully embedded in the Department of Cardiology, particularly interventional cardiology.
In medical applications of AI, it is really important that we can learn how decisions are made. Black-box algorithms can restrict the clinician-patient relationship. The medical advice the doctor provides depends on medical reasoning and clarification, which is not available from black-box algorithms. When a black-box AI algorithm provides only a decision and not a justification for that decision, patients are deprived of an understanding of the underlying medical problems and why the decision is made. This can hurt the clinician-patient relationship and erode trust in the medical system. It’s on our shoulders to develop AI applications that create and maintain that trust.
MATHIAS UNBERATH: This black-box problem requires nuanced discussion. The AI community is proposing explanations as a possible answer, but I think explainable AI may not always be the preferable solution—and it may not even be necessary. Rigorous testing and evaluation to build trust in AI at the technology level, together with broader education, will also contribute to an answer.
But of course, achieving trustworthiness now is important. To contribute to this goal, we are working on a few things. One is to better understand the datasets being used to develop these algorithms, in the hope that understanding data-inherent biases will help us create models that don’t perpetuate [those biases]. Insights from causal reasoning as well as synthetic data can play an important role in this endeavor.
The other is about how we deploy these models. If I train a model, will it be self-sufficient, and involve no human interaction? In some cases, this might be happening, but in other high-stakes environments, such as medicine, we might not want to run the model without human interaction. Then, we don’t need to think exclusively about the model and its performance, but about the model and its possible explanations in the context of human interaction. That is a much more complicated problem.
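One concrete form the first of those efforts can take is a subgroup audit: before deploying a model, check whether its accuracy differs across groups in the data. The sketch below is illustrative only, built on synthetic data, a generic scikit-learn classifier, and a hypothetical subgroup label g; it is not code from the panelists.

```python
# Illustrative subgroup audit on synthetic data (a toy assumption, not real patient
# data): features for group 0 are measured more noisily, and the audit surfaces the
# resulting accuracy gap that a single overall metric would hide.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

n = 4000
g = rng.integers(0, 2, size=n)                      # hypothetical subgroup label
X = rng.normal(size=(n, 5))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)       # underlying signal
noise = np.where(g == 0, 1.5, 0.2)[:, None]         # group 0 gets noisier features
X = X + rng.normal(size=(n, 5)) * noise

X_tr, X_te, y_tr, y_te, g_tr, g_te = train_test_split(
    X, y, g, test_size=0.3, random_state=0
)
model = LogisticRegression().fit(X_tr, y_tr)

print(f"overall accuracy: {accuracy_score(y_te, model.predict(X_te)):.3f}")
for group in (0, 1):
    mask = g_te == group
    acc = accuracy_score(y_te[mask], model.predict(X_te[mask]))
    print(f"  group {group} accuracy: {acc:.3f}")
```

A gap like this does not by itself explain the cause, but it is the kind of signal that prompts a closer look at how the data were collected before a model reaches a patient.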
DANIEL KHASHABI: I want to second what Mathias said about making sure that users are part of this analysis. I don’t think there will ever be an AI system that has responses interpretable to every human, so we need to define interpretability with respect to certain audiences.
For example, people trust airplanes, not because we can “interpret” them, but because we trust that someone who’s an expert in airplanes can interpret their functionality.
The same logic will likely also hold for various applications of AI. We should aim to build highly accurate AI applications in, for example, the medical domain that are interpretable to the experts in that domain. Increasingly, people will trust that those experts can interpret the results.
MATHIAS UNBERATH: What Daniel said about tailoring explanations to people is interesting. Right now, there seem to be very different philosophies in how we do “explainable AI.”
There is the philosophy driven by people in the computer science/computer vision/machine learning AI community who are very interested in building novel explanation techniques that satisfy computational criteria, such as factual accuracy or fidelity. These people focus on computation first, with users and humans often omitted or relegated to an afterthought.
And there’s the human-computer interaction community, which publishes in completely different venues. Their papers look and feel completely different, and they de-emphasize technical feasibility, especially early on, in favor of a more human-centered approach. They focus on tailoring a specific message to a specific user and measuring that. These two communities don’t really connect.
NATALIA TRAYANOVA: To connect to Daniel’s example of why people trust airplanes, patients don’t question why they are given aspirin. They don’t understand the mechanism by which aspirin works, but they take it because it’s so commonly prescribed and has undergone so many clinical trials. I think this is something the AI community can look forward to.
DANGER AHEAD?
Let’s move on to talking about the risks of AI.
DANIEL KHASHABI: I will briefly touch upon risks that keep me worried about the future of AI.
AI thrives on data, and with the push to make AI systems more accurate, there’s a lot of appetite for more data, especially user data. The private sector is going to continue to collect increasingly personalized data. Legislation in this country is so passive. That basically means that citizens here need to be vigilant, monitoring emerging applications and watching for anything that could possibly go wrong as a result of overuse or overreach of personalized data.
Relatedly, I’m quite worried about the personalization of propaganda. In 2016, when we had an election, there was plenty of evidence that social media platforms were used to create propaganda to interfere with it. Now that AI is being industrialized, it’s possible to tailor this propaganda to an individual’s mindset, framing arguments around their world model, their values, and their most personal weaknesses to sway their opinion. This is a really dangerous weapon for attacking a democracy.
MATHIAS UNBERATH: I have similar concerns, including that it’s not always transparent what the objectives are that specific algorithms are optimized for.
Think about bail decisions or credit approval. The values or the reasons that go into making such decisions are very complicated and quantifying them in a way that can be optimized computationally via empirical risk minimization is not trivial. And there’s an additional risk that we can introduce malicious behavior or exploitative intent by simply optimizing models to prioritize specific scenarios. Combine this with the scalability of automated approaches, and poor decisions might be made at unprecedented scales.
Some say that we don’t need to fear AI because we get to assign power to it, but I wholeheartedly disagree, and that is because there is “creep.” AI’s influence is not limited to whether I consciously do or do not use it; it can affect people’s experiences subtly. For example, when it’s used in areas like content moderation, it can change the landscape. On the individual level, the effects might be marginal, but at the population level, they might tip the scales in favor of one policy or outcome, and that’s power that nobody consciously assigned to AI.
RAMA CHELLAPPA: As far as risks, I always talk about domain shift—you train AI on one kind of data, the data changes, and the performance becomes unpredictable. This happens in pathology, where different labs produce slightly different kinds of images.
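Domain shift is easy to see with a few lines of synthetic data. The sketch below is purely illustrative (made-up numbers, not pathology images or the panelists’ code): a classifier is trained on inputs drawn from one distribution and then evaluated on inputs whose values have shifted, the way images from a different lab might.

```python
# Illustrative sketch of domain shift on synthetic data (an assumption-laden toy,
# not the panelists' method): accuracy degrades as the test distribution moves
# away from the training distribution, even though the model itself never changes.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(1)

def make_data(n, shift=0.0):
    # Two classes separated along every feature; `shift` mimics, say, a lab whose
    # imaging pipeline produces systematically different measurements.
    y = rng.integers(0, 2, size=n)
    X = rng.normal(loc=y[:, None] * 1.0 + shift, scale=1.0, size=(n, 4))
    return X, y

X_train, y_train = make_data(5000, shift=0.0)
model = LogisticRegression().fit(X_train, y_train)

for shift in (0.0, 0.5, 1.5):
    X_test, y_test = make_data(2000, shift=shift)
    acc = accuracy_score(y_test, model.predict(X_test))
    print(f"test-set shift = {shift:.1f} -> accuracy = {acc:.3f}")
```

As the shift grows, accuracy drops even though nothing about the model has changed, which is one reason continuous monitoring of deployed models matters.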
I also worry about attacks on AI systems and bias—whether a system is fair to every group.
AI’s ability to generalize so it works for everybody makes it robust to attacks, but because people will try to attack any software system, we have to continuously monitor it. I think of it as a precocious child who will let the water run over in the kitchen sink to understand hydrology. We have to tell the kid to learn things properly and be useful.
MATHIAS UNBERATH: I agree with you. I am incredibly excited, and AI is a large part of my research agenda. I just think there’s a lot of hype right now [and what’s getting lost] is a more cautionary approach—ensuring AI is scalable and that we can sustain its ethical deployment. I think those voices are a bit suppressed.
Should AI be regulated . . . and is regulation even feasible?
MATHIAS UNBERATH: I am not a regulatory expert, so I don’t think I sufficiently understand the regulatory powers and opportunities to meaningfully control AI. I invite the experts to help us better understand the situation and shape the conversations properly.
However, I do think we have to think more carefully about how to contend with the risks AI systems pose when using data that is continually evolving over time. For instance, if an AI model is trained on one fixed dataset and then applied to data that is constantly changing, there is the risk of the model experiencing “drift,” which leads to inaccuracy.
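One common way to watch for that kind of drift (sketched here purely as an illustration, not a regulatory recipe) is to compare the distribution of incoming data against the data the model was trained on, feature by feature. The arrays, threshold, and two-sample Kolmogorov–Smirnov test below are assumptions chosen for demonstration.

```python
# Illustrative drift check on synthetic data: compare each feature of newly arriving
# data against a training-time reference sample and flag features whose distribution
# appears to have shifted. The threshold and data are hypothetical.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(2)

# Hypothetical reference sample the model was trained on (rows x features).
reference = rng.normal(loc=0.0, scale=1.0, size=(5000, 3))

# Hypothetical data arriving after deployment; feature 2 has quietly drifted.
live = rng.normal(loc=[0.0, 0.0, 0.8], scale=1.0, size=(1000, 3))

for j in range(reference.shape[1]):
    result = ks_2samp(reference[:, j], live[:, j])
    flag = "possible drift" if result.pvalue < 0.01 else "ok"
    print(f"feature {j}: KS statistic = {result.statistic:.3f}, "
          f"p = {result.pvalue:.1e} [{flag}]")
```

In practice, a flag like this would trigger human review or retraining rather than an automatic decision, which connects back to the question of keeping people in the loop.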
There is also the issue that we cannot necessarily regulate what we don’t understand. So, if companies bring out a new product with a certain behavior that we might not be able to even understand, how do we think about approving—versus not approving—something as ethical or not ethical?
RAMA CHELLAPPA: The FDA must approve many AI-based procedures. So, what is your opinion about AI regulation? And is it possible?
NATALIA TRAYANOVA: There are several possible big-picture approaches. One is that we pump the brakes a bit on the use of deep learning in very high-stakes applications. And the FDA is very much in line with that: They are not very keen to approve black-box deep learning applications for clinical decision-making. The FDA is very interested in explainability in these applications.
For medical applications and beyond, the European Union is taking a more aggressive approach, creating a regulatory framework that sorts potential applications into risk categories. On March 13, 2024, they approved a landmark law governing AI. It imposes a blanket ban on some uses of AI, such as biometric-based tools used to ascertain a person’s race, sexual orientation, or political beliefs; prohibits the use of deep learning in situations where the potential for harm is high, such as finance and criminal justice; and allows low-stakes applications with guardrails.
DANIEL KHASHABI: Like my fellow panelists, I think the bulk of regulation should focus on applications, concentrating on where technology interacts with humans. We have various institutions that regulate and oversee sensitive domains, like the FDA with medical applications and the FAA with aviation.
Given AI’s widespread application, we now need another regulatory body with a broad perspective on AI capabilities deployed across a variety of applications. Maybe this body will work with the FDA and other domain-specific regulators. I worry about overregulating AI development because I don’t want our progress to be curbed, but we do need some regulation on the development side, such as a certain level of transparency into how pre-training data was collected, ideally with some level of involvement from third-party investigators and researchers.
RAMA CHELLAPPA: I’m with you on that, Daniel. We had a meeting last week at the Johns Hopkins University Bloomberg Center [in Washington, D.C.] with U.S. Senators Mark Warner and Todd Young, and they talked about AI regulation and what Congress wants to do. There are many bills in development, and Senator Warner talked about how AI can negatively impact areas like financial markets. So, he says regulation may be coming, and Senator Young felt we don’t have to start something entirely new, but could probably build on what is already out there. I agree with a cautious approach to regulating AI.
Do you cover AI in your teaching? Has it changed your approach to teaching?
NATALIA TRAYANOVA: I teach a core undergraduate class, a sort of computational medicine course, that students who are applying to medical school have to take. I include the issues that we talked about today, but it’s more general: What does it mean to bring AI into the clinical arena in medicine?
MATHIAS UNBERATH: I teach a class on machine learning and deep learning and one on explainable machine learning in the context of human-centered design. After [completing] these classes, students should understand the limitations of using these types of approaches.
I built some of my code based on platforms and frameworks that are available online. So, if students want to use generative AI to help them code the homework assignments, given that these large language models are trained on everything that is on the internet, there is a high chance they will do quite well.
But [for quizzes and exams], I’ve been reverting to the “Middle Ages,” where we use paper and a pen, and hopefully, we also use a brain, to demonstrate skills acquired in the class. I have reintroduced closed-book, pen and paper exams.
DANIEL KHASHABI: I always encourage my students in the class or my research lab to use generative AI to increase their productivity. This is just a new tool in their toolbox to become effective researchers.
This year, we needed to make a slight adaptation to my class. We are careful to make sure that questions we assign for homework are deeper and require more analysis. Even if [students] can prompt [a] language model, they have to be very creative with how they prompt the model.
I’ve also tried to incorporate more interactivity, with more in-class activities and assignments, so that most learning happens during class hours.
Do you think AI and black-box issues should be taught as part of the first-year curriculum?
NATALIA TRAYANOVA: In general, I don’t think it’s suitable to teach AI and black-box issues in the first-year curriculum, because black-box issues in AI require some prior knowledge.
It’s important to appropriately gauge when [AI curriculum] would fit a given population of students. You don’t want to turn people off or scare them away. You want to introduce knowledge appropriately, so the approach should be to decide on a department-by-department and a school-by-school basis. I think in biomedical engineering it could be taught as part of the undergraduate experience, particularly if ethics issues are emphasized.
MATHIAS UNBERATH: I don’t know the right moment to start discussing this. These things move very, very quickly—like when the smartphone was introduced, making this crazy high level of computing accessible at your fingertips.
I don’t consider myself old, but when I meet younger people, I realize I’m no longer young. Growing up, we had this modem making funny sounds, and you had to decide whether you wanted to place a telephone call or browse the internet at glacial speeds. We went from that to having a computer in our pockets very quickly.
Professors may be nimble at adapting to challenges and changing their course, but overall, schools might not have moved fast enough in areas such as digital well-being, where we need to consider the implications of disparities in technology use and access.
There are new opportunities and challenges related to how we want to make up the fabric that holds society together: connectedness versus isolation, the reverberation of personal views through content moderation versus engagement with other viewpoints.
So we have to think more broadly, and that might start well before a freshman AI ethics class.
What is a unique or creative use of AI that most people haven’t considered yet?
MATHIAS UNBERATH: I would love to see AI help us make more time for personal connections because I think we are spending too much time on tasks that don’t require a human touch. If I look at my job or personal life satisfaction, most of it is attributable to interaction with others. Those are usually the most memorable and important moments, but I don’t have enough of them because I end up spending too much time on things that I’m not sure I need to be doing. So, this is the one direction where I would love to see AI go.
DANIEL KHASHABI: My first wish is for a personal assistant. We are busy people, and I would love to have an assistant I could tell to “fix this paper” or handle other tasks. That would save me time! My second wish is for trash-collecting robots. Every weekend, I volunteer to collect trash in our neighborhood, and collecting trash is going to be a very difficult problem for AI-driven robots since every piece is different. So maybe in a couple of years, I will focus on trash-collecting robots.
NATALIA TRAYANOVA: What I want most is medical foundation models (generalist medical AI) that can make medical decisions without the current need to train AI algorithms on specific datasets, predict specific outcomes, and then demonstrate generalizability on external datasets.
Our community has huge difficulties in getting anonymized patient data. So, I would like generalist models developed so we don’t need data for each particular medical problem, we don’t need to search for months to access external data, and so on. The medical generalist models will be our “base camp”—where we have general medical knowledge, and we just climb from there to solve a specific medical problem [she gestures], not climbing from the bottom every time, developing approaches de novo for each medical problem.
RAMA CHELLAPPA: I think AI will greatly impact engineering design. My hope is to give it a layout of 10,000 buildings and then tell the AI, “Hey, give me a cool new building and these are my constraints. I want it to have four floors. I want this much made of glass,” and have it somehow come up with a design that I probably never would have thought of.