
Open black-box benchmark for AI API testing agents. Objective scoring of bug detection, coverage, and efficiency against live APIs with planted bugs.

APIEval-20 is the first benchmark designed specifically to evaluate how well AI agents can generate API test suites that actually find bugs, given only a schema and a sample payload, with no access to source code or documentation. It measures real-world black-box testing capability across 20 diverse API scenarios spanning e-commerce, payments, authentication, and more.
APIEval-20 serves AI researchers building testing agents, engineering teams evaluating automation tools, and QA leaders seeking objective metrics to compare agent performance against human-level testing standards.
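
To make the setup concrete, below is a minimal sketch of the kind of black-box loop APIEval-20 evaluates: test cases derived from nothing but a schema and one sample payload, fired at a live API, with mismatched status codes flagged as suspected planted bugs. The `/orders` endpoint, `api.example.com` base URL, mutation strategy, and scoring formulas are illustrative assumptions, not the benchmark's actual harness or metric definitions.

```python
import copy
import requests

BASE_URL = "https://api.example.com"  # hypothetical API under test, not part of APIEval-20

def mutate_payload(sample: dict, schema: dict):
    """Yield (description, payload, expected_status) test cases derived
    only from the schema and one sample payload, the same black-box
    inputs an APIEval-20 agent receives."""
    # Happy path: the unmodified sample should be accepted.
    yield "valid sample", sample, 200
    # Drop each required field; a correct API must reject the request.
    for field in schema.get("required", []):
        broken = copy.deepcopy(sample)
        broken.pop(field, None)
        yield f"missing required '{field}'", broken, 400
    # Send a wrong type for each declared property.
    for field in schema.get("properties", {}):
        broken = copy.deepcopy(sample)
        broken[field] = ["unexpected", "type"]
        yield f"wrong type for '{field}'", broken, 400

def run_suite(path: str, sample: dict, schema: dict) -> list[dict]:
    """Fire every generated case at the live API and flag mismatches
    between expected and observed status codes as suspected bugs."""
    results = []
    for desc, payload, expected in mutate_payload(sample, schema):
        resp = requests.post(f"{BASE_URL}{path}", json=payload, timeout=10)
        results.append({
            "case": desc,
            "expected": expected,
            "actual": resp.status_code,
            "bug_suspected": resp.status_code != expected,
        })
    return results

def score_run(bugs_found: int, bugs_planted: int,
              endpoints_hit: int, endpoints_total: int,
              requests_sent: int) -> dict:
    """Illustrative scores along the three axes named above (detection,
    coverage, efficiency); APIEval-20's real weighting may differ."""
    return {
        "detection": bugs_found / bugs_planted,
        "coverage": endpoints_hit / endpoints_total,
        "efficiency": bugs_found / max(requests_sent, 1),
    }

if __name__ == "__main__":
    order_schema = {  # toy stand-in for a real scenario schema
        "required": ["item_id", "quantity"],
        "properties": {"item_id": {}, "quantity": {}},
    }
    sample_order = {"item_id": "sku-123", "quantity": 2}
    for result in run_suite("/orders", sample_order, order_schema):
        print(result)
```

Under this framing, a stronger agent is one whose generated cases expose more of the planted bugs while exercising more of the API and spending fewer requests to do it.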