
A black-box benchmark for testing AI agents on API bug detection. Objective scoring based on coverage, efficiency, and real-bug discovery. Available open source.

APIEval-20 is the first benchmark designed specifically to evaluate how well AI agents can build API test suites that actually find bugs, using only JSON schemas and example payloads, with no access to source code or documentation. It measures real-world black-box testing capability across 20 diverse API scenarios covering e-commerce, payments, authentication, and more.
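To make that setup concrete, here is a minimal sketch of what a single benchmark task input might look like. The field names and values are illustrative assumptions, not the actual APIEval-20 format.

```python
# Hypothetical sketch of one APIEval-20 task input (field names are
# illustrative assumptions, not the benchmark's published format).
# The agent receives only a JSON schema and example payloads --
# no source code, no documentation.
task = {
    "scenario": "e-commerce/checkout",  # one of the 20 scenarios
    "endpoint": "POST /orders",
    "schema": {  # JSON Schema for the request body
        "type": "object",
        "required": ["items", "currency"],
        "properties": {
            "items": {"type": "array", "minItems": 1},
            "currency": {"type": "string", "enum": ["USD", "EUR"]},
        },
    },
    "example_payloads": [
        {"items": [{"sku": "A-1", "qty": 2}], "currency": "USD"},
    ],
}
# The agent's job: generate a test suite for this endpoint that
# surfaces real bugs, using nothing beyond the data above.
```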
APIEval-20 serves AI researchers building testing agents, engineering teams evaluating automation tools, and QA leads looking for objective metrics to compare agent performance against human testing standards. One way to picture such a metric is sketched below.
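The sketch combines the three stated scoring dimensions (coverage, efficiency, and real-bug discovery) into a single number. The weights and formula are assumptions for illustration, not the benchmark's published scoring rule.

```python
def score(coverage: float, bugs_found: int, bugs_total: int,
          requests_used: int, request_budget: int,
          w_cov: float = 0.3, w_bug: float = 0.5, w_eff: float = 0.2) -> float:
    """Hypothetical composite score over the three stated dimensions.

    coverage       -- fraction of schema branches exercised, in [0, 1]
    bugs_found     -- bugs the generated suite actually detected
    bugs_total     -- bugs present in the scenario
    requests_used  -- API calls the agent spent
    request_budget -- maximum calls allowed

    Weights and formula are illustrative assumptions, not APIEval-20's rule.
    """
    bug_rate = bugs_found / bugs_total if bugs_total else 0.0
    efficiency = max(0.0, 1.0 - requests_used / request_budget)
    return w_cov * coverage + w_bug * bug_rate + w_eff * efficiency

# Example: 80% coverage, 3 of 5 bugs found, 120 of 200 requests used.
print(round(score(0.8, 3, 5, 120, 200), 3))  # -> 0.62
```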