Affiliate Ram Shankar Siva Kumar and coauthors "present a practical scanner for identifying sleeper agent-style backdoors in causal language models," responding to decades-old concerns about the security of machine learning. Read the authors' complete manuscript on arXiv.
