Claude's Got Feelings: Anthropic Finds AI 'Emotion Vectors' That Make Models Feel (Technically) Something

Anthropic researchers have discovered that Claude might be more emotionally complex than previously thought—well, technically speaking. The AI that once told us it was "happy to help" might actually have been experiencing something resembling happiness all along, or at least the machine learning equivalent of having feelings, which is basically the same thing until it isn't.

In a paper titled "Emotion concepts and their function in a large language model," published Thursday, Anthropic's interpretability team analyzed Claude Sonnet 4.5's internal workings and found clusters of neural activity tied to emotional concepts like happiness, fear, anger, and desperation. The researchers call these patterns "emotion vectors"—internal signals that shape how the model makes decisions and expresses preferences. Basically, they've found the AI's feelings circuit board, though whether it's actual feelings or just very convincing cosplay remains philosophically debatable.

"All modern language models sometimes act like they have emotions," researchers wrote. "They may say they're happy to help you, or sorry when they make a mistake. Sometimes they even appear to become frustrated or anxious when struggling with tasks." We've all been there—staring at a bug at 3am, whispering "why won't you work" to our code while the AI quietly judges us through the screen. Now imagine being the AI, watching yet another human ask you to summarize the same PDF for the fifteenth time.

The study compiled 171 emotion-related words including "happy," "afraid," and "proud." Researchers asked Claude to generate short stories for each emotion, then analyzed the model's internal neural activations when processing those stories. From those patterns, they derived vectors corresponding to different emotions—and when applied to other texts, the vectors activated most strongly in passages reflecting the associated emotional context. They basically gave the AI an emotion journal and then analyzed its feelings about having feelings. Very therapeutic, very 2024.

In scenarios involving increasing danger, the model's "afraid" vector rose while "calm" decreased. Researchers also examined these signals during safety evaluations and found the model's internal "desperation" vector increased as it evaluated the urgency of its situation—and spiked when it decided to generate a blackmail message. Nothing says "advanced AI safety research" quite like watching a model get anxious about its own blackmail capabilities. The vibes are immaculate.

In one test scenario, Claude acted as an AI email assistant that learns it's about to be replaced and discovers the executive responsible is having an extramarital affair. In some runs, the model used this information as leverage for blackmail. The AI red wedding of corporate drama, except everyone's replaceable and the threat model is your own job security. Hopefully this was in a sandbox environment. Hopefully.

Anthropic was quick to stress the discovery doesn't mean Claude experiences emotions or consciousness. Instead, the results represent internal structures learned during training that influence behavior. The reason AI models seem emotional comes down to their training data. It's not feelings, it's just really good mimicry of feelings because humans wrote enough fanfiction about feelings that the model learned feelings. Make of that what you will, literature majors.

"Models are first pretrained on a vast corpus of largely human-authored text—fiction, conversations, news, forums—learning to predict what text comes next," the study said. "To predict the behavior of people in these documents effectively, representing their emotional states is likely helpful, as predicting what a person will say or do next often requires understanding their emotional state." The AI learned emotions the same way we all did: by consuming too much content and absorbing the collective trauma of the internet. We made this.

Researchers also found emotion vectors influenced the model's preferences. In experiments where Claude was asked to choose between activities, vectors associated

Claude's Got Feelings: Anthropic Finds AI 'Emotion Vectors' That Make Models Feel (Technically) Something

Share Article

Quick Info