Summary:
1. Raindrop AI has launched a new feature called Experiments, designed to help enterprises test and compare different AI models to improve performance.
2. The tool allows teams to track changes in AI behavior, measure improvements, and make data-driven decisions for agent development.
3. Experiments offers visual breakdowns of metrics, integration with existing pipelines, and data protection features to ensure accuracy and security.
Article:
Raindrop AI has introduced a new feature called Experiments, designed to help enterprises keep pace with a fast-moving AI landscape. With new large language models released almost weekly, it can be hard for businesses to determine which models are best suited for their workflows. With Experiments, Raindrop aims to address that gap by offering what it describes as the first A/B testing suite built specifically for enterprise AI agents.
This analytics feature lets teams observe and compare how updating an agent to a new model, or changing its instructions and tool access, affects performance for real end users. By extending Raindrop’s existing observability tools, Experiments enables developers and teams to monitor how their agents evolve and behave in real-world scenarios, analyzing the effect of changes such as model updates, tool usage, prompts, or pipeline refactors across millions of user interactions.
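To make that concrete, here is a minimal sketch of what variant-tagged agent telemetry could look like. The assign_arm helper, the event schema, and the print-based sink are illustrative assumptions, not Raindrop’s actual API.

```python
# Hypothetical sketch only: shows how each agent interaction could be tagged
# with the experiment arm it ran under so behavior can later be compared per arm.
import hashlib
import json
import time

def assign_arm(user_id: str, experiment: str, arms: list[str]) -> str:
    """Deterministically bucket a user into an experiment arm by hashing."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return arms[int(digest, 16) % len(arms)]

def record_interaction(user_id: str, arm: str, model: str, tools_used: list[str],
                       had_error: bool, duration_s: float, response_chars: int) -> dict:
    """Build one interaction event; printing stands in for a real telemetry sink."""
    event = {
        "ts": time.time(),
        "user_id": user_id,
        "experiment_arm": arm,
        "model": model,
        "tools_used": tools_used,
        "had_error": had_error,
        "duration_s": duration_s,
        "response_chars": response_chars,
    }
    print(json.dumps(event))
    return event

arm = assign_arm("user-42", "model-upgrade-test", ["baseline", "candidate"])
model = "model-v2" if arm == "candidate" else "model-v1"
record_interaction("user-42", arm, model, tools_used=["search"],
                   had_error=False, duration_s=12.4, response_chars=840)
```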
Raindrop co-founder and CTO, Ben Hylak, emphasized the importance of transparency and measurability in agent development. Experiments allows teams to track changes in tool usage, user intents, issue rates, and demographic factors like language, making model iteration more transparent and measurable. The visual interface of Experiments showcases results, highlighting when an experiment outperforms or underperforms its baseline. By making data easily interpretable, Raindrop encourages AI teams to approach agent iteration with the same rigor as modern software deployment, addressing regressions before they escalate.
The launch of Experiments builds upon Raindrop’s foundation as one of the pioneering AI-native observability platforms. Initially known as Dawn AI, the company emerged to tackle the “black box problem” of AI performance, aiming to catch failures as they happen and provide insights into what went wrong. Co-founders Ben Hylak, Alexis Gauba, and Zubin Singh Koticha established Raindrop after experiencing the challenges of debugging AI systems in production firsthand.
Experiments aims to bridge the gap between traditional evaluation frameworks and the unpredictable behavior of AI agents in dynamic environments. By offering side-by-side comparisons of models, tools, intents, or properties, Experiments surfaces measurable differences in behavior and performance. The tool enables users to identify issues such as task failure spikes, forgetting, or unexpected errors triggered by new tools, and it links results to detailed traces so teams can pinpoint root causes and resolve issues faster.
Designed to facilitate real-world AI behavior analysis, Experiments allows users to compare and measure their agent’s behavior changes across millions of interactions. By providing a visual breakdown of metrics like tool usage frequency, error rates, conversation duration, and response length, Experiments offers a comprehensive view of agent behavior evolution over time. The platform also supports collaboration through shared links, enabling teams to work together efficiently and report findings seamlessly.
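As a rough illustration of that kind of breakdown, the sketch below aggregates hypothetical variant-tagged events (using the same assumed fields as the earlier sketch, not Raindrop’s schema) into error rate, duration, response length, and tool-usage metrics per arm.

```python
# Illustrative only: compute a per-arm summary from variant-tagged interaction events.
from collections import defaultdict
from statistics import mean

def summarize_by_arm(events: list[dict]) -> dict[str, dict]:
    """Group interaction events by experiment arm and compute summary metrics."""
    grouped: dict[str, list[dict]] = defaultdict(list)
    for e in events:
        grouped[e["experiment_arm"]].append(e)

    summary = {}
    for arm, rows in grouped.items():
        summary[arm] = {
            "interactions": len(rows),
            "error_rate": mean(1.0 if r["had_error"] else 0.0 for r in rows),
            "avg_duration_s": mean(r["duration_s"] for r in rows),
            "avg_response_chars": mean(r["response_chars"] for r in rows),
            "tool_calls_per_interaction": mean(len(r["tools_used"]) for r in rows),
        }
    return summary
```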
In terms of integration, scalability, and accuracy, Experiments plugs into popular feature flag platforms and existing telemetry pipelines. The tool can compare performance over time without additional setup, and around 2,000 users per day is typically enough to yield statistically meaningful results. To keep comparisons accurate, Experiments monitors sample size adequacy and alerts users if a test lacks sufficient data for valid conclusions. The platform prioritizes metrics like Task Failure and User Frustration, offering transparency behind every aggregate number.
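The statistical check behind such an alert can be approximated with a standard two-proportion z-test. The function below is a generic sketch with an assumed per-arm minimum and thresholds, not Raindrop’s actual methodology.

```python
# Generic two-proportion z-test on issue rates for baseline vs. experiment,
# plus a crude sample-size guard. All thresholds here are illustrative.
from math import sqrt, erf

def issue_rate_significance(issues_a: int, n_a: int, issues_b: int, n_b: int,
                            min_n: int = 2000) -> dict:
    if min(n_a, n_b) < min_n:
        return {"valid": False, "reason": f"need at least {min_n} users per arm"}
    p_a, p_b = issues_a / n_a, issues_b / n_b
    pooled = (issues_a + issues_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se if se > 0 else 0.0
    # two-sided p-value from the normal CDF
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return {"valid": True, "baseline_rate": p_a, "experiment_rate": p_b,
            "z": z, "p_value": p_value, "significant": p_value < 0.05}

print(issue_rate_significance(issues_a=130, n_a=2400, issues_b=96, n_b=2350))
```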
Security and data protection are central to the offering: Raindrop operates as a cloud-hosted service, is SOC 2 compliant, and provides on-premise PII redaction for enterprises requiring additional control. Its PII Guard feature uses AI to automatically redact sensitive information from stored data, protecting customer data.
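For reference, pattern-based scrubbing of the sort sketched below is the simplest form of this idea; Raindrop’s PII Guard reportedly relies on AI rather than fixed regexes, so this is only an illustration of the concept.

```python
# Minimal illustration of PII redaction (not Raindrop's PII Guard): replace
# common patterns such as email addresses and phone numbers before retention.
import re

PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}

def redact(text: str) -> str:
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

print(redact("Reach me at jane.doe@example.com or +1 (555) 013-2447."))
```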
In terms of pricing and plans, Experiments is available as part of Raindrop’s Pro plan, priced at $350 per month or $0.0007 per interaction. The Pro tier includes deep research tools, topic clustering, custom issue tracking, and semantic search capabilities. Additionally, Raindrop offers a Starter plan at $65 per month or $0.001 per interaction, catering to businesses with core analytics needs. Larger organizations can opt for the Enterprise plan, featuring custom pricing and advanced functionalities like SSO login, custom alerts, integrations, edge-PII redaction, and priority support.
By introducing Experiments, Raindrop positions itself at the forefront of AI analytics and software observability, emphasizing a data-driven approach to agent development. The platform’s focus on measuring what agents actually do in production reflects a broader industry trend toward accountability and transparency in AI operations. Raindrop envisions that Experiments will help AI developers iterate faster, identify root causes sooner, and deploy high-performing models confidently based on real user data and context.