APIEval-20: An Open-Source Benchmark for AI Agent API Testing

More about APIEval-20

APIEval-20

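APIEval-20 is the first benchmark built specifically to evaluate how well AI agents generate API test suites. Agents receive only a JSON schema and a sample payload, with no access to source code or documentation, and are scored on their ability to find real bugs. The benchmark covers 20 real-world scenarios, including e-commerce, payments, and authentication, and measures the practical engineering value of black-box testing.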
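To make the black-box setting concrete, here is a minimal sketch of the simplest kind of test an agent could derive from a schema and a sample payload alone: structural negative cases, where a required field is removed or a field is given a value of the wrong type. This is a hypothetical illustration, not APIEval-20's harness or any particular agent's method. `structural_cases` and the wrong-type substitutes are invented for this example, and only the top-level `properties` and `required` keywords of JSON Schema are handled.

```python
import copy
from typing import Any

# Hypothetical wrong-type substitutes, keyed by JSON Schema primitive type.
_WRONG_VALUE = {
    "string": 12345,
    "integer": "not-an-integer",
    "number": "not-a-number",
    "boolean": "true",  # a string, not a JSON boolean
    "array": {"unexpected": "object"},
    "object": ["unexpected", "array"],
}


def structural_cases(
    schema: dict[str, Any], sample: dict[str, Any]
) -> list[tuple[str, dict[str, Any]]]:
    """Derive structural negative cases from a flat object schema and a valid sample.

    Sketch only: supports top-level 'properties' and 'required', nothing nested.
    The input sample is never modified; each case is a deep copy.
    """
    cases: list[tuple[str, dict[str, Any]]] = []
    # 1. Omit each required field in turn; a correct API should reject these.
    for field in schema.get("required", []):
        if field in sample:
            mutated = copy.deepcopy(sample)
            del mutated[field]
            cases.append((f"missing required '{field}'", mutated))
    # 2. Replace each typed field with a value of the wrong type.
    for field, spec in schema.get("properties", {}).items():
        wrong = _WRONG_VALUE.get(spec.get("type"))
        if field in sample and wrong is not None:
            mutated = copy.deepcopy(sample)
            mutated[field] = wrong
            cases.append((f"wrong type for '{field}'", mutated))
    return cases


if __name__ == "__main__":
    order_schema = {
        "type": "object",
        "properties": {"sku": {"type": "string"}, "quantity": {"type": "integer"}},
        "required": ["sku", "quantity"],
    }
    for label, payload in structural_cases(order_schema, {"sku": "A-1", "quantity": 2}):
        print(label, payload)
```

Each generated payload would then be sent to the API under test; a success response to a payload missing a required field would suggest a validation bug. Constraint violations and multi-field semantic errors require reasoning well beyond a schema walk like this one.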

Highlights

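Pure black-box evaluation: agents receive only a JSON schema and a sample payload, mirroring how developers typically receive an API in practice.
Three tiers of bug complexity: from simple structural errors, through moderate constraint violations, to complex multi-field semantic errors, probing the full range of an agent's reasoning.
Automated live testing: every test case runs against a real deployed API, so scoring is objective and reproducible.
Weighted scoring: bug detection counts for 70%, coverage for 20%, and efficiency for 10%, reflecting how engineering value is judged in practice.
Multi-domain scenarios: 20 scenarios spanning 7 application domains, including payments and transactions, user management, scheduling, and search and filtering.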
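The weighted scoring can be read as a linear combination of three components. The page states only the weights (70/20/10); how each component is measured and normalized, and how per-scenario scores are aggregated, are assumptions in this sketch (each component in [0, 1], a simple mean over scenarios). `RunMetrics`, `weighted_score`, and `benchmark_score` are hypothetical names, not part of any published APIEval-20 API.

```python
from dataclasses import dataclass

# Weights as stated on this page: bug detection 70%, coverage 20%, efficiency 10%.
WEIGHTS = {"bug_detection": 0.70, "coverage": 0.20, "efficiency": 0.10}


@dataclass
class RunMetrics:
    """Hypothetical per-scenario metrics, each assumed normalized to [0, 1]."""

    bug_detection: float  # assumption: fraction of seeded bugs found
    coverage: float       # assumption: fraction of fields/endpoints exercised
    efficiency: float     # assumption: e.g. share of non-redundant tests


def weighted_score(m: RunMetrics) -> float:
    """Combine the three components into a single score in [0, 1]."""
    for name in WEIGHTS:
        value = getattr(m, name)
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    return sum(weight * getattr(m, name) for name, weight in WEIGHTS.items())


def benchmark_score(runs: list[RunMetrics]) -> float:
    """Hypothetical aggregate: unweighted mean of per-scenario scores."""
    if not runs:
        raise ValueError("need at least one scenario")
    return sum(weighted_score(r) for r in runs) / len(runs)


if __name__ == "__main__":
    run = RunMetrics(bug_detection=0.5, coverage=0.8, efficiency=0.6)
    print(round(weighted_score(run), 2))  # 0.7*0.5 + 0.2*0.8 + 0.1*0.6 = 0.57
```

Because bug detection carries 70% of the weight, an agent that finds every bug scores at least 0.7 even with no coverage or efficiency credit, while perfect coverage and efficiency with no bugs found caps out at 0.3.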

Use cases

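AI agent evaluation: a standardized, objective benchmark of API test generation ability for LLM-based testing agents.
Automated testing research: a platform for developing and validating new methods of generating REST API test suites.
Tool selection: helps teams choose among coding assistants and specialized testing agents based on data.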

Target users

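APIEval-20 is aimed at AI researchers building testing agents, engineering teams evaluating automation tools, and test leads who want objective metrics for comparing agent performance against human QA standards.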

Alternatives to APIEval-20

Fabraix

Find gaps in your AI agents before users do

Docket

Vision-first QA testing across web and mobile

Airbyte Agents

The context layer for production-grade AI agent

SaolaAI

Autonomous quality for engineering teams

Gas City 1.0

build your own software factory

Lety.ai

The Infrastructure Behind AI Agencies | White-Label Platform

pay.sh

Discover, access, and pay for any API autonomously

Phrony

Ship AI agents without the operational burden

Buda

Recruit agents to run your company as a synchronous team

Avon AI

Control AI agents with confidence

Kanwas

Open-Source Brain For Your Team

Claude Agents for Financial Services

Finance agent templates for pitches, KYC, and closing books

Knowly 1.0

LLM Wiki + NotebookLM, in one closed-loop Proactive AI

Luma Uni 1.1 API

A reasoning model that interprets intent before it generates

Contral

The agent which teaches while you build

APIEval-20 helps AI agents pinpoint every API bug
