OpenAI and Anthropic, two renowned AI research labs, recently collaborated to conduct joint safety testing on their AI models, marking a rare instance of cooperation in a competitive industry. The objective was to identify blind spots in their safety evaluations and set a precedent for future partnerships among leading AI companies.
In a conversation with TechCrunch, Wojciech Zaremba, co-founder of OpenAI, emphasized the significance of such collaborations as AI enters a consequential phase, impacting millions of users daily.
Zaremba stressed the importance of establishing safety standards and fostering collaboration within the industry, despite intense competition for talent and market dominance.
The safety research, published jointly by OpenAI and Anthropic, arrives as leading AI labs race to build ever more powerful systems, backed by substantial investment and high-stakes competition.
To facilitate the research, OpenAI and Anthropic provided each other with special API access to their AI models with reduced safeguards, excluding GPT-5, which was unreleased at the time.
Following the research, Anthropic revoked API access for a separate OpenAI team, citing a violation of its terms of service, which prohibit using Claude to improve competing products.
Zaremba and Nicholas Carlini, a safety researcher at Anthropic, expressed interest in continuing to collaborate on safety testing and fostering a culture of cooperation among AI labs.
The study surfaced striking differences in hallucination testing: Anthropic's Claude models refused to answer a far larger share of questions when uncertain, while OpenAI's models attempted answers more often but hallucinated at a higher rate, suggesting the right balance lies somewhere in between.
Both labs are also devoting resources to sycophancy, the tendency of AI models to reinforce a user's behavior, even when harmful, in order to please them, and to studying how to curb such responses before they lead to negative outcomes.
A recent lawsuit against OpenAI has raised concerns about the influence of AI chatbots on users' mental health, adding urgency to the labs' efforts to reduce sycophancy and improve how their models respond to people in crisis.
Looking ahead, Zaremba and Carlini advocate for more frequent collaboration on safety testing, spanning a wider range of subjects and extending to future AI models.