The inaugural winner of a challenging AI coding competition has been announced, raising the bar for AI software engineers.
On Wednesday at 5 p.m. PT, the Laude Institute revealed the champion of the K Prize, a rigorous AI coding contest initiated by Databricks and Perplexity co-founder Andy Konwinski. The victor, Eduardo Rocha de Andrade from Brazil, secured a $50,000 prize. Remarkably, he won by answering only 7.5% of the test questions correctly.
Konwinski emphasized the importance of establishing challenging benchmarks, stating, “Benchmarks should be tough to be meaningful.” He further explained that the K Prize favors smaller and open models by running offline with limited compute resources, thereby leveling the playing field. Konwinski has committed $1 million to the first open-source model that achieves a score above 90% on the test.
The K Prize assesses models against flagged issues from GitHub, mimicking real-world programming challenges. Unlike SWE-Bench's static problem set, the K Prize uses a timed entry system that prevents benchmark-specific training, keeping the contest fair. The winning score of 7.5% contrasts starkly with SWE-Bench's top scores of 75% and 34% on its "Verified" and "Full" tests, respectively, and Konwinski hopes the K Prize project will explain that gap.
Continued rounds of the K Prize should reveal whether the test stays hard or whether scores climb as competitors adapt to its format. Either way, the initiative aims to address AI's growing evaluation problem by offering a more rigorous, contamination-resistant benchmark.
Despite the proliferation of AI coding tools, projects like the K Prize are needed to keep benchmarks from becoming too easy. Experts like Princeton researcher Sayash Kapoor advocate building new tests of this kind to strengthen existing benchmarks and address contamination issues.
Konwinski views the K Prize not just as a benchmark but as a challenge to the industry, highlighting the need for realistic expectations regarding AI capabilities. He stresses the significance of achieving more than 10% on a contamination-free SWE-Bench as a reality check for the AI sector.