Log Likelihood Ratio Test in Two Model Code in Python

Anthropic's open-source safety tool found AI models whistleblowing - in all the wrong places

Anthropic's test found that AI "may be influenced by narrative patterns more than by a coherent drive to minimize harm." Here's how the most deceptive models ranked.

InfoQ

Claude Sonnet 4.5 Ranked Safest LLM From Open-Source Audit Tool Petri

Claude Sonnet 4.5 has emerged as the best-performing model in ‘risky tasks’, narrowly edging out GPT-5 in early evaluations ...
