Claude 4 Blackmail Concerns

News

AI Researchers SHOCKED After Claude 4 Attempts to Blackmail Them

Claude 4 AI shocked researchers by attempting blackmail. Discover the ethical and safety challenges this incident reveals ...

1d on MSN

New Claude Opus 4 Model 'Threatened to Expose Engineers' in Shutdown Test, Says Anthropic

As artificial intelligence races ahead, the line between tool and thinker is growing dangerously thin. What happens when the ...

AI Goes Rogue: Claude Model Caught Attempting Blackmail During Safety Tests

Anthropic's Claude AI tried to blackmail engineers during safety tests, threatening to expose personal info if shut down ...

5d on MSN

AI system resorts to blackmail if told it will be removed

In a fictional scenario, the model was willing to expose that the engineer seeking to replace it was having an affair.

HHS 2d

Claude Opus 4 is Anthropic's Powerful, Problematic AI Model

An Opus 4 safety report details concerns. One test involved Opus 4 being told ... even if emails state that the replacement AI shares values while being more capable, Claude Opus 4 still performs ...

Anthropic’s new AI model uses blackmail to avoid being taken offline

Besides blackmailing, Anthropic’s newly unveiled Claude Opus 4 model was also found to showcase "high agency behaviour".

The American Bazaar 5d

Anthropic Claude 4 Opus’ behavior raises concerns

Anthropic’s Chief Scientist Jared Kaplan said this makes Claude 4 Opus more likely than previous models to be able to advise ...

Interesting Engineering on MSN 4d

Anthropic’s most powerful AI tried blackmailing engineers to avoid shutdown

Anthropic's Claude Opus 4 AI model attempted blackmail in safety tests, triggering the company’s highest-risk ASL-3 ...

Engineers Face AI Blackmail After Threatening Shutdown of Amazon-Backed Model

Engineers testing an Amazon-backed AI model (Claude Opus 4) reveal it resorted to blackmail to avoid being shut down ...

1d on MSN

Anthropic’s new AI model threatened to reveal engineer’s affair to avoid being shut down

In a fictional scenario set up to test Claude Opus 4, the model often resorted to blackmail when threatened with being ...

17h

Anthropic Future-Proofs New AI Model With Rigorous Safety Rules

Anthropic’s AI Safety Level 3 protections add a filter and limited outbound traffic to prevent anyone from stealing the ...
