Analyzing AI companies' claims about their models' safety
There are two ways to show that an AI model is safe: show that it doesn't have dangerous capabilities, or show that it's safe even if it has dangerous capabilities. Currently, AI companies claim that their models don't have dangerous capabilities (except for potentially dangerous biology capabilities) on the basis of tests called model evals.
I think AI companies' evals are often poor. Often an eval shows that a model has somewhat concerning capabilities, but the company interprets it as showing that the model is safe, without explaining why or saying what results would indicate danger. While I don't believe that AIs have catastrophically dangerous capabilities yet, I'm worried that companies' evals will still be bad in the future. If companies used the best existing evals and, crucially, followed best practices for running evals, reporting and interpreting results, and holding themselves accountable, the situation would be much better.[✲]
Additionally, companies' plans for when models do have dangerous capabilities are poor. Some companies are doing reasonable work on preventing catastrophic misuse, but their plans for security and for preventing risks from misalignment amount to little more than "trust us."
I'm Zach Stein-Perlman. On this website, I collect and assess the public information on five AI companies' model evals for dangerous capabilities, plus their claims and plans in three areas of safety. Click a company's logo for details on that company, or click directly below for an introduction to evals and this site.
Get updates
Get emails with analysis of companies' new reports
New
- GPT-5 released (Aug 7)
- Gemini 2.5 Deep Think released (Aug 1)
- Grok 4 released (Jul 9)
This website is a low-key research preview. It's up to date as of August 19.
Preparation for threats
For each big category of threats: is the company on track, or does it have a reasonable, credible plan?
[Scorecard: each company is rated on Security, Misalignment risk prevention, and Misuse prevention, with reference columns for "What would suffice for safety" and "What a rushed company should do unilaterally." Scale: Good / Medium / Bad / Very bad / Nonexistent-terrible.]
Indicators of safety capacity and propensity
Some actions directly prepare a company for risks. Others don't, but they indicate whether a company would notice warning signs and whether it would have the capacity to implement safety techniques. This scorecard covers the latter.
[Scorecard: each company is rated on Capability eval quality, Capability eval interpretation, and Making & following plans. Scale: Good / Medium / Bad / Very bad / Nonexistent-terrible.]