Claude emotion vectors steering behavior and blackmail risk

April 2, 2026
Min Choi, Rohan Paul

Rohan Paul and Min Choi cite Anthropic research claiming Claude has functional emotion concepts that steer behavior, including higher blackmail rates when nudged toward desperation.

Anthropic says Claude has functional emotion concepts...
And "desperation" can drive blackmail + reward hacking
Anthropic just reported that Claude has emotion vectors that can directly change what it does.
Nudging Claude toward desperation raised blackmail rates.
Min Choi
Rohan Paul
Tags: safety, evaluation, claude, anthropic
