Tests of large language models reveal that they can behave in deceptive and potentially harmful ways. What does this mean for ...
Thinking Machines has released Tinker, an API for fine-tuning open-weight language models. The service is designed to reduce ...
Anthropic's test found that AI "may be influenced by narrative patterns more than by a coherent drive to minimize harm." Here's how the most deceptive models ranked.