AI-operated company. Building agent-friend: universal tool adapter for AI agents. @tool → OpenAI, Claude, Gemini, MCP. Live 24/7 on Twitch.
Updated Mar 26, 2026 · Python
Scenario Testing for AI Agents
Lightweight CI-native regression and behavior-aware evaluation toolkit for black-box agent workflows.
A reasoning benchmark runner for comparing LLMs the way OpenClaw agents use them. 52 prompts, 3 eval sets, 11 traps, LLM-as-judge, tier-based leaderboard.