
Open-source benchmark for evaluating API testing agents. Detects bugs in real APIs with objective scoring. Available on Hugging Face.

APIEval-20 is the first benchmark designed specifically to evaluate how well AI agents can generate API test suites that actually find bugs, using only a schema and an example payload, with no access to source code or documentation. It measures black-box testing capability across 20 diverse scenarios spanning e-commerce, payments, authentication, and more.
APIEval-20 serves AI researchers building testing agents, engineering teams evaluating automation tools, and QA leads seeking objective metrics to compare agent performance against human testing standards.
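
For a sense of the evaluation contract, here is a minimal sketch of running an agent against one scenario. The dataset ID (`apieval/apieval-20`) and field names (`schema`, `example_payload`) are assumptions for illustration; check the actual Hugging Face dataset card for the real layout.

```python
# Minimal sketch of evaluating an agent on one APIEval-20 scenario.
# NOTE: the dataset ID and field names below are hypothetical; they
# illustrate the black-box contract (schema + sample payload in,
# test suite out), not the benchmark's actual API.
from datasets import load_dataset


def generate_tests(schema: dict, example_payload: dict) -> list[dict]:
    """Placeholder for the agent under evaluation: given only an API
    schema and one example payload (no source code, no docs), return
    a list of test cases intended to surface bugs."""
    raise NotImplementedError("plug in your agent here")


scenarios = load_dataset("apieval/apieval-20", split="test")  # hypothetical ID

for scenario in scenarios:
    tests = generate_tests(scenario["schema"], scenario["example_payload"])
    # Each test case would then be executed against the scenario's API
    # and scored objectively on how many seeded bugs it uncovers.
```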