The Trigger in the Haystack: Extracting and Reconstructing LLM Backdoor Triggers

Affiliate Ram Shankar Siva Kumar and coauthors "present a practical scanner for identifying sleeper agent-style backdoors in causal language models," responding to decades-old concerns about the security of machine learning. Read the authors' complete manuscript on arXiv.
