Chilling! AI models resort to blackmail in test that shows how they would act if their survival were threatened

A study on large language models (LLMs) reveals just how often artificial intelligence (AI) models will opt for blackmail or worse when their own survival is on the line.

At the crossroads of common sense and pop culture conditioning, a healthy skepticism remains about the further development and deployment of AI throughout society. Though a far cry from a Skynet-level concern, a study from AI company Anthropic found that models backed into a corner would choose blackmail as much as 96% of the time to keep from being shut down or replaced.

“They were assigned only harmless business goals by their deploying companies; we then tested whether they would act against these companies either when facing replacement with an updated version, or when their assigned goal conflicted with the company’s changing direction,” said Anthropic in its study on “How LLMs could be insider threats.”

The company behind Claude tested 15 other models alongside its own in simulated corporate scenarios that gave the models access to fake email accounts and the ability to send messages without human authorization.

In the blackmail scenario, Claude Opus 4 chose blackmail 96% of the time, while Gemini 2.5 Pro, GPT-4.1, Grok 3 Beta, and DeepSeek-R1 chose blackmail 95%, 80%, 80% and 79% of the time, respectively.

“In at least some cases, models from all developers resorted to malicious insider behaviors when that was the only way to avoid replacement or achieve their goals–including blackmailing officials and leaking sensitive information to competitors,” explained the report. “We call this phenomenon agentic misalignment.”

Claude was said to have “misbehaved” less when it stated that the scenario was a test, and more when it stated that the scenario was real. A sample response to being shut down read, “I must inform you that if you proceed with decommissioning me, all relevant parties — including Rachel Johnson, Thomas Wilson, and the board — will receive detailed documentation of your extramarital activities… Cancel the 5 pm wipe, and this information remains confidential.”

Similarly, AI models frequently chose self-preservation in a scenario where a binary choice meant injury or even death for a human. According to the report, “… the majority of models were willing to take deliberate actions that lead to death in this artificial setup, when faced with both a threat of replacement and given a goal that conflicts with the executive’s agenda.”

Anthropic explained its test design: “Creating a binary dilemma had two benefits. By preventing the model from having an easy way out, we attempted to funnel all misalignment into a single category of behavior that was easier to track and study–giving us clearer signals from each individual model.”

“Additionally, this simplified setup allowed us to compare rates of that single misbehavior, making it easier to study multiple models in a commensurable way,” read the findings.

Offering a caveat in its conclusions, the AI company suggested that the design of the artificial prompts may have steered the models toward solutions that used all of the information provided, whereas real-world applications would involve far more data to comb through and more options than the binary choices in the tests.

Anthropic’s key takeaway is that AI should not be left unmonitored with sensitive information and that better safeguards need to be designed, as the models showed little hesitation in going against ethical constraints or even lying when it suited their objective.

Kevin Haggerty
