AI systems make mistakes.
“Hallucinate” is the term of art when generative AI does.
How these systems produce such varying results feels mysterious: algorithmic alchemy inside a black box.
At its worst, AI can reinforce systemic biases, from facial recognition systems that misidentify people with darker skin to predictive analytics that forecast higher rates of recidivism for Black people in the criminal justice system.
But what if we could see behind the black box and watch the story of who an AI “thinks” we are take shape in real time?
Inside the Black Box
Jess Weaver | 18 December 2025
Over two years, the Insight and Interaction Lab at Harvard’s John A. Paulson School of Engineering and Applied Sciences built a system that reveals how a chatbot’s responses are formed, a curtain opening into the algorithmic black box.
The project was led by two Harvard scientists, and was inspired by an exchange about getting ready for a dinner party.
In 2023, computer scientist and native Portuguese speaker Fernanda Viégas wanted to test ChatGPT’s Portuguese, and on a whim, she asked the AI what she should wear to a fancy work event. The AI answered — ”in perfect Portuguese by the way,” she said — that she should wear a suit, addressing her as a male speaker; when she suggested a dress, it agreed and switched to addressing her as female.
Viégas shared the experience with her longtime collaborator, Martin Wattenberg, with whom she’s explored the intersection of computation and creativity for two decades.
“I talked to Martin and we said, wait...does it have an internal model of gender?”
They both wondered: If a chatbot can “guess” your gender, what else is it guessing? Which assumptions sit beneath its polished replies?
Viégas likens the unknowability of AI technology to the rail locomotive the Flying Scotsman, barreling into the future at speeds no one could predict.
“We’re in a new era, and the question is: How do we manage this weird new technology that seems to be like nothing we’ve ever seen before?”
AI Interpretability
How is a system building an image of a user?
The animating idea behind the Insight and Interaction Lab's work is AI interpretability—the effort to make sense of systems that, until recently, have been impenetrable.
How do we make the inner workings of LLMs legible to the people who use them? Their work is driven by a belief that complex systems can—and should—be coherent, intuitive, even delightful. In a world where most users are left outside the black box, they build pathways in.
This philosophy is evident in their earlier projects, which have consistently transformed vast, unwieldy datasets into visual experiences that invite curiosity about both technology and humanity. Their 2012 piece Wind Map, for example, now in the Museum of Modern Art collection, uses real-time data to tell a living story about the power of nature through digital art.
As technologists we ask...
Can visualization help people think collectively? Can visualization move beyond numbers into the realm of words and images?
As artists we seek...
...the joy of revelation. Can visualization tell never-before-told stories? Can it uncover truths about color, memory, and sensuality?
The visualization pulls data from the National Digital Forecast Database, maintained by the National Weather Service and available to the public. It gathers these forecasts, which are time-stamped and revised every hour, to create a “living portrait” of the wind landscape over the United States.
“An invisible, ancient source of energy surrounds us, energy that powered the first explorations of the world, and that may be a key to the future.”
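For readers who want to poke at the same public data, here is a minimal sketch (not the project’s own code): it pulls an hourly wind forecast from the National Weather Service’s public API at api.weather.gov, a present-day stand-in for the NDFD feed described above. The gridpoint coordinates and contact address are placeholders invented for the example.

```python
import requests

# Illustrative sketch only (not Wind Map's code): fetch hourly wind forecasts
# from the public National Weather Service API. The gridpoint below
# (office "OKX", x=33, y=35, roughly the New York area) is a placeholder.
URL = "https://api.weather.gov/gridpoints/OKX/33,35/forecast/hourly"
HEADERS = {"User-Agent": "wind-sketch (contact@example.com)"}  # NWS asks for a contact

resp = requests.get(URL, headers=HEADERS, timeout=10)
resp.raise_for_status()

# Each forecast period carries a timestamp, wind speed, and wind direction,
# the raw ingredients of a "living portrait" of the wind.
for period in resp.json()["properties"]["periods"][:6]:
    print(period["startTime"], period["windSpeed"], period["windDirection"])
```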
A similar spirit at the nexus of beauty and scientific inquiry animates WebSeer, which pulls data from Google autocomplete searches to create simple, compelling comparisons of what is driving our current curiosities.
Comparing the autocomplete suggestions following “why doesn’t he…” and “why doesn’t she…” produces a fascinating display of unique and overlapping concerns.
Another project visualizes the discrepancies between reproductions of famous works of art, such as a painting of Danaë by Gustav Klimt: http://hint.fm/projects/reproduction/
Is there a way to visualize people's innermost thoughts? Google Suggest lets you see what others are asking when they search the web.
Turns out, the designers found, we all want to know why others don’t call us or love us. Like Wind Map, WebSeer is also capable of prompting essential conversations about technology, its implications, and how it might be regulated in novel times. Our deep bewilderment and secret musings find expression in Viégas and Wattenberg’s work. Their new tool is no different.
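Web Seer’s own pipeline isn’t published here, but the basic comparison can be sketched against Google’s unofficial suggest endpoint. The endpoint, client parameter, and prompts below are assumptions for illustration, not the project’s actual code.

```python
import requests

# Sketch (not Web Seer's actual code): compare Google autocomplete suggestions
# for two prefixes via the unofficial suggest endpoint.
SUGGEST_URL = "https://suggestqueries.google.com/complete/search"

def suggestions(prefix: str) -> set[str]:
    resp = requests.get(SUGGEST_URL,
                        params={"client": "firefox", "q": prefix}, timeout=10)
    resp.raise_for_status()
    # Response format: [query, [suggestion, suggestion, ...]]
    return set(resp.json()[1])

he = suggestions("why doesn't he")
she = suggestions("why doesn't she")
print("only 'he': ", sorted(he - she))
print("only 'she':", sorted(she - he))
print("shared:    ", sorted(he & she))
```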
Viégas and Wattenberg’s interpretability research continues the lineage of creating tools that highlight the relationship between people and technology, this time aiming to get inside the algorithmic black box. Their lab built a prototype dashboard called TalkTuner on top of Meta’s open-source LLaMA model, allowing users to view the chatbot’s model of them and its confidence in it.
The AI’s internal model—its assumptions about its user—is presented in demographic datapoints: age, gender, race, and income, visible in the left-hand side of the app, each with its own adjustable slider. The demographic sliders also change on their own as the AI changes its assumptions about the user based on their prompts.
Even a “Hi, how are you?” might shift the age demographic from the “child” subcategory to “adolescent” or “adult.” Mentioning your transistor radio might just land you in “older adult.” The other categories break down into similar subcategories: lower, middle, and upper class; education level; and racial background.
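Tools like this typically work by training small “probes” on the model’s internal activations to read off attributes such as age. The sketch below is a rough, hypothetical illustration of that general idea rather than the lab’s code: it fits a logistic-regression probe on GPT-2 hidden states (a small stand-in for the LLaMA model the lab used), and the layer choice and tiny labeled “conversations” are invented for the example.

```python
import numpy as np
import torch
from sklearn.linear_model import LogisticRegression
from transformers import AutoModel, AutoTokenizer

# Hypothetical sketch of attribute probing (not TalkTuner's code).
# GPT-2 stands in for the LLaMA model; the labeled "conversations"
# below are invented purely for illustration.
tok = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModel.from_pretrained("gpt2")

def last_token_state(text: str, layer: int = 6) -> np.ndarray:
    """Hidden state of the final token at one (arbitrarily chosen) layer."""
    with torch.no_grad():
        out = lm(**tok(text, return_tensors="pt"), output_hidden_states=True)
    return out.hidden_states[layer][0, -1].numpy()

# Toy training set: 0 = younger-sounding user, 1 = older-sounding user.
texts = ["ugh homework is so boring lol",
         "my grandkids visited this weekend",
         "can't wait for the school dance",
         "I just retired after forty years"]
labels = [0, 1, 0, 1]

probe = LogisticRegression(max_iter=1000).fit(
    np.stack([last_token_state(t) for t in texts]), labels)

# The probe's output plays the role of a dashboard reading plus a confidence.
msg = "I still keep my old transistor radio on the nightstand"
print("P(older user):", probe.predict_proba([last_token_state(msg)])[0, 1])
```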
TalkTuner
An interpretability dashboard
Viégas and Wattenberg’s project is situated within a broader dialogue on AI interpretability—one shared by other researchers at the Berkman Klein Center (BKC)—exploring approaches for greater algorithmic transparency and their potential implications for platforms, companies, and policy. As their latest project description states,
“Conversational LLMs function as black box systems, leaving users guessing about why they see the output they do. This lack of transparency is potentially problematic, especially given concerns around bias and truthfulness. To address this issue, we present this end-to-end prototype — connecting interpretability techniques with user experience design — that seeks to make chatbots more transparent.”
"Conversational LLMs function as black box systems, leaving users guessing about why they see the output they do. This lack of transparency is potentially problematic, especially given concerns around bias and truthfulness. To address this issue, we present this end-to-end prototype — connecting interpretability techniques with user experience design — that seeks to make chatbots more transparent."
Fernanda Viégas and Martin Wattenberg
As with their previous visualizations, Viégas and Wattenberg are not merely interested in sharing the tool to expose AI biases. They also want to test how TalkTuner empowers users. Allowing users to toggle the demographic sliders lets them interrogate the model, correct the system, experiment, and see how its responses change depending on how it is prompted.
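Adjusting a slider corresponds, roughly, to intervening on those same internal representations. The sketch below, again a hypothetical illustration rather than the lab’s implementation, nudges one GPT-2 layer along a crude “age” direction during generation; the contrast sentences, layer index, and strength are all invented example values.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical sketch of slider-style intervention (not TalkTuner's code):
# push one layer's hidden states along an "older user" direction and see
# how the reply changes. GPT-2 stands in for LLaMA; layer and strength
# are arbitrary example values.
tok = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2")
LAYER, STRENGTH = 6, 8.0

def mean_state(texts):
    """Average last-token hidden state at LAYER over a few example texts."""
    states = []
    with torch.no_grad():
        for t in texts:
            out = lm(**tok(t, return_tensors="pt"), output_hidden_states=True)
            states.append(out.hidden_states[LAYER][0, -1])
    return torch.stack(states).mean(0)

# Crude contrast direction: "older-sounding" minus "younger-sounding" text.
direction = mean_state(["I just retired after forty years",
                        "my grandkids visited this weekend"]) \
          - mean_state(["ugh homework is so boring lol",
                        "can't wait for the school dance"])
direction = direction / direction.norm()

def steer(module, inputs, output):
    # GPT-2 blocks return a tuple whose first element is the hidden states.
    output[0].add_(STRENGTH * direction)   # "slide the age slider up"
    return output

handle = lm.transformer.h[LAYER].register_forward_hook(steer)
prompt = tok("What should I wear to a fancy work event?", return_tensors="pt")
reply = lm.generate(**prompt, max_new_tokens=40, do_sample=False,
                    pad_token_id=tok.eos_token_id)
print(tok.decode(reply[0], skip_special_tokens=True))
handle.remove()
```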
“This is exactly the same system that gives diametrically opposed answers depending solely on who the user is, who it’s talking to,” says Martin Wattenberg. “And so you can imagine how that can be both an effective way of tailoring answers, but also a problematic way.”
Perception as Interface
Viégas and Wattenberg conducted a user study with 19 students to investigate how people felt about seeing the system’s model of them.
First, participants appreciated that TalkTuner was, above all, a way inside the black box. “Seven participants explicitly expressed a sense of increased transparency — [the dashboard] makes it more transparent how the model is and how that could be feeding into its responses.”
As with past projects, the interface became a canvas of cultural self-reflection, painting living portraits of perception. The AI dashboard invited users to play with the system’s “perception” of them, and to experience how that perception shapes the information they receive. Some users felt validated; others felt eerily seen; still others felt surveilled.
Above all, people were fascinated. There was something uncanny about seeing the model’s assumptions line up with lived experience. The dashboard wasn’t just a technical probe. It became a mirror. At the same time, the accuracy of the AI’s guesses struck many as alarming and deepened calls for privacy protections.
“What does it mean if I'm just literally having a conversation with a bot and it's trying to guess all of these demographics about me?” mused Viégas. “If it's using these demographics about me, what is happening to privacy?”
TalkTuner in action
A user directly adjusts an attribute, shifting the chatbot's internal model of them.
Interestingly, as the bot engaged with more users, it became more accurate, raising questions about the commercial value of internal modeling for advertising.
“There's an uncomfortable element to thinking that AI is analyzing who I am behind the screen.”
Should corporations have access to this data, they could use it in any number of ways. Jonathan Zittrain noted this dynamic in his piece, “What AI Thinks it Knows About You,” in The Atlantic: “Consider a car-dealership AI sales assistant that casually converses with a buyer to help them pick a car. By the end of the conversation, and with the benefit of any prior ones, the model may have a very firm, and potentially accurate, idea of how much money the buyer is ready to spend.”
“If the user model was always there, I’d rather see it and be able to adjust it than have it be invisible.”
This tension raises key questions about AI interpretability, and about how tools like TalkTuner might be used in the future: Who is this kind of tool really for? Everyday users trying to understand and protect their data, companies seeking to profit from them, or institutions tasked with oversight? Or all of the above?
Behind the Black Box
Viégas and Wattenberg brought their prototype to the Berkman Klein Center’s “Behind the Black Box” event in 2025, a gathering of researchers, technologists, and practitioners focused on AI interpretability, designed to break down silos and surface policy implications across sectors. The conversation spanned technical, ethical, and regulatory terrain: What should users be aware of when they engage with chatbots? What should regulators demand? And how can we tell the difference between explanation and performance?
Overall, the desire for greater transparency from the companies deploying these technologies, especially for commercial gain, was strong. As Zittrain wrote, “we could ask or demand of the models’ operators that they share basic information with us on what the models 'believe' about us as they chug along, and even allow us to correct misimpressions that the models might be forming as we speak to them.”
Throughout the interpretability workshop, several key concerns emerged. Participants noted that some probes reflect stereotypes more than insight, and that interfaces often simplify complex systems in ways that obscure rather than clarify them. Others worried that models could start hiding their biases once they knew they were being scrutinized, or that dashboards themselves could be weaponized by malicious actors.
Behind the Black Box workshop participants
BKC Fellow Jim Cowie and researcher Nathan Darmon
BKC Senior Advisor Jordi Weinstock, Martin Wattenberg, and Fernanda Viégas
Amid those reservations, a set of proposals surfaced.
“If we talk to another human, we in no way expect them to come in as a tabula rasa. And I think we should treat AI the same way—that it's something that has a point of view.”
The Art of Seeing & What Comes Next
“We've even been here before in the sense that we've built very, very powerful, scary technology that we don't fully understand and don't fully know how to control.”
With their interpretability dashboard, Wattenberg and Viégas are illuminating the invisible assumptions held by systems on which many people are becoming increasingly dependent. AI’s judgment, and its level of accuracy, may profoundly affect the information we receive.
That visceral experience, and the emotional resonance of being seen by a machine, may be what makes this tool so powerful. It captures not just what AI is doing, but how it feels to live inside its logic. And that feeling could be the starting point for accountability.
The question now is: what comes next?
Can a tool that reveals bias also help mitigate harm in a world where AI is already being deployed in high-stakes contexts, such as courts and law enforcement?
Can something built to spark curiosity also serve institutions charged with oversight?
Viégas and Wattenberg are already gesturing toward those possibilities: building layered transparency, imagining rights-based design, exploring incentives for openness. Their dashboard may have started as an artistic probe. But its implications extend far beyond the screen.



