Hayley Song Q&A:
Meet the Fellows!
The Berkman Klein Center is thrilled to welcome Hae Jin (Hayley) Song as a Fellow working on the geometric foundations of AI interpretability and safety. She is also an AI Research Fellow at ThoughtWorks. Her work focuses on understanding how AI systems actually behave on the inside, especially modern generative models.
Rather than treating AI models as black boxes, Hayley studies their internal structure and dynamics: how information is represented, how patterns form, and how small changes inside a model can lead to very different outputs. Her research uses ideas from geometry to map and make sense of these complex systems, helping reveal the “shapes” and patterns that guide how models think and respond.
Her goal is to develop reliable, scalable ways to describe and influence AI behavior. By identifying geometric “fingerprints” inside models, she aims to better explain why models behave the way they do, and how to steer them toward safer, more predictable outcomes. This work has practical implications for AI interpretability and safety, including detecting deepfakes, understanding bias, diagnosing when models break down, and improving our ability to guide AI systems in line with human values.
Which community or communities do you hope to most impact with your work?
I hope to impact policymakers, platform designers, and researchers working on responsible AI governance by offering them principled, scalable tools to analyze, attribute, and control generative models (e.g., LLMs and video/image generators). I also aim to bridge technical and public-interest communities so that safety, accountability, and trustworthiness are core parts of how AI systems are understood and deployed.
What’s something about your work that you think would surprise people?
People are often surprised that generative models leave subtle but identifiable “fingerprints” in their outputs, which can be traced back to the model that produced them. These fingerprints reveal insightful information about a model’s internal behavior and can be used for data and model attribution and accountability in ways that don’t require access to training data or proprietary code.
Why is your work important right now?
As generative AI becomes deeply embedded in society, from creative tools to misinformation and deepfakes, we urgently need reliable ways to understand, verify, and govern these systems. Without robust technical foundations for attribution and control, it’s difficult for regulators and platforms to enforce standards or for users to trust the content they encounter.
You’ve built methods to tell which model likely generated a piece of content. If that capability were widely adopted, what changes in policy, product labels, platforms, etc. would you expect to see?
If widely adopted, model attribution methods could lead to stronger provenance labels, better content moderation policies, and clearer standards for accountability across AI-based platforms. This could shift the ecosystem toward routine verification of generative content origins, making misuse harder and transparency more actionable for users, regulators, and developers alike.
What happens to a model’s fingerprint when it’s been tampered with through a jailbreak, a backdoor, or some other attack?
When a model is tampered with by an adversary, e.g., through a jailbreak or a backdoor, its geometric fingerprint can shift or become distorted, but it often retains detectable structure rather than disappearing entirely. Studying how fingerprints change under such attacks helps us build attribution and defense mechanisms that remain robust in adversarial environments.