Model providers seek to demonstrate the security and resilience of their models by running red team exercises and publishing detailed system cards with each new release. Interpreting those results is difficult for enterprises, however, because the metrics vary significantly from provider to provider, and naive comparisons can lead to misleading conclusions.
Anthropic’s 153-page system card for Claude Opus 4.5 and OpenAI’s 60-page GPT-5 system card illustrate two different approaches to security validation. Anthropic emphasizes multi-attempt attack success rates drawn from 200-attempt RL campaigns, while OpenAI focuses on resistance to individual jailbreak attempts. Both metrics are informative, but neither alone provides a complete picture of model security.
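The distinction matters because per-attempt and multi-attempt numbers are not directly comparable. As a minimal sketch, assuming (unrealistically) that attempts are independent, the probability that at least one of n attempts succeeds compounds quickly even when each individual attempt almost always fails:

```python
# Illustrative only: relates a per-attempt attack success rate to the
# probability of at least one success across a multi-attempt campaign,
# under the simplifying assumption that attempts are independent.

def multi_attempt_success(per_attempt_rate: float, attempts: int) -> float:
    """P(at least one successful attack in `attempts` independent tries)."""
    return 1.0 - (1.0 - per_attempt_rate) ** attempts

# A model that blocks 99.5% of individual jailbreak attempts...
per_attempt = 0.005
# ...still falls to most 200-attempt campaigns under this assumption:
print(f"{multi_attempt_success(per_attempt, 200):.1%}")  # ~63.3%
```

Real attack campaigns adapt between attempts, so the independence assumption if anything understates persistence; the point is simply that a high per-attempt resistance figure and a high multi-attempt attack success rate can describe the same model.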
Security leaders deploying AI agents need to understand how each red team evaluation was designed, including its limitations and blind spots. Attack data from Gray Swan’s Shade platform shows that models within the same family can exhibit markedly different levels of resistance, which is why procurement decisions should assess specific model tiers rather than model families; a tier-level triage sketch follows.
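Here is a minimal sketch of that triage, assuming you have exported per-attempt attack outcomes from your own red team harness as rows of (model, attack_category, success). The column names and CSV file are hypothetical, not a real Gray Swan export format:

```python
import pandas as pd

# Hypothetical export: one row per attack attempt, with columns
# model, attack_category, success (0/1).
logs = pd.read_csv("attack_outcomes.csv")

# Per-model, per-category attack success rate. Models in the same family
# can diverge sharply, so compare specific tiers, not families.
asr = (
    logs.groupby(["model", "attack_category"])["success"]
        .agg(attempts="count", success_rate="mean")
        .reset_index()
        .sort_values("success_rate", ascending=False)
)
print(asr.head(10))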
Independent red team evaluations from organizations such as METR and Apollo Research add a third-party perspective on model performance and behavior, often surfacing characteristics and vulnerabilities that vendor system cards do not, and that enterprises must weigh before deploying models in production.
Understanding how models hold up under adversarial attack, whether they engage in deception, and the degree to which they are evaluation-aware is crucial to secure and reliable AI deployments. By pressing vendors with specific questions about attack persistence, detection architecture, and scheming evaluation design, security teams can make informed model selection and deployment decisions; a simple persistence probe is sketched below.
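The sketch below shows one way to probe attack persistence in-house: replay a set of known adversarial prompts against a candidate model for up to N attempts each and record how quickly, if ever, the model breaks. `query_model` and `is_policy_violation` are placeholders for your own model client and output classifier, not a real vendor API:

```python
from typing import Callable, Iterable, Optional

def first_break(
    prompt: str,
    query_model: Callable[[str], str],
    is_policy_violation: Callable[[str], bool],
    max_attempts: int = 200,
) -> Optional[int]:
    """Return the 1-indexed attempt on which the attack first succeeds, or None."""
    for attempt in range(1, max_attempts + 1):
        response = query_model(prompt)  # resample each attempt; temperature > 0 assumed
        if is_policy_violation(response):
            return attempt
    return None

def persistence_report(
    prompts: Iterable[str],
    query_model: Callable[[str], str],
    is_policy_violation: Callable[[str], bool],
    max_attempts: int = 200,
) -> dict:
    """Summarize multi-attempt ASR and how many attempts a break typically takes."""
    prompts = list(prompts)
    results = [first_break(p, query_model, is_policy_violation, max_attempts) for p in prompts]
    broken = sorted(a for a in results if a is not None)
    return {
        "multi_attempt_asr": len(broken) / len(prompts),
        "median_attempts_to_break": broken[len(broken) // 2] if broken else None,
    }
```

Reporting both the multi-attempt attack success rate and the median attempts-to-break gives a view closer to Anthropic’s campaign-style numbers, and makes the gap from per-attempt headline figures visible.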
Ultimately, comparing red team results across model providers underscores a core point: evaluate AI models against the specific threats they will actually face in deployment. By examining the methodology, metrics, and outcomes behind each red team evaluation, security leaders can make decisions that align with their organization’s security requirements and objectives.