# llm-router-lab

Systematic testing and comparison of LLM routing solutions (OpenRouter, LiteLLM, OpenAI-compatible endpoints).
## Quick Start

```bash
# Install dependencies
uv sync

# Copy and fill in API keys
cp .env.example .env

# Run a benchmark
python scripts/run_benchmark.py --routers openrouter --models gpt-4o --scenarios basic_completion

# Compare results
python scripts/compare.py results/*.json --format table
```

## Project Structure

```
config/                 # YAML configuration (routers, models, scenarios index)
src/llm_router_lab/
  providers/            # Router adapters (OpenRouter, LiteLLM, OpenAI-compat)
  scenarios/            # Scenario loader + built-in programmatic scenarios
  runner.py             # Async benchmark runner
  metrics.py            # Timing and usage measurement
  report.py             # Output formatting (table, CSV, markdown)
scenarios/              # YAML test scenario definitions
scripts/                # CLI entry points
results/                # Benchmark output (gitignored)
```
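To illustrate how `config/` ties routers and models together, here is a hypothetical pair of entries; the field names and file layout are assumptions for illustration, not the project's documented schema:

```yaml
# config/routers.yaml -- illustrative schema
routers:
  openrouter:
    base_url: https://openrouter.ai/api/v1
    api_key_env: OPENROUTER_API_KEY

# config/models.yaml -- maps a logical model name to each router's model id
models:
  gpt-4o:
    openrouter: openai/gpt-4o
```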
## Adding a Router

- If it's OpenAI-compatible: just add an entry to `config/routers.yaml` and model mappings to `config/models.yaml`. No code needed.
- If it needs custom logic: create `src/llm_router_lab/providers/your_router.py`:
  - Subclass `RouterProvider` (or `OpenAICompatProvider` if mostly compatible)
  - Implement `complete()` and `stream()`
  - Register in `runner.py`'s `PROVIDER_CLASSES`
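The custom-provider steps above can be sketched as follows. The `RouterProvider` stub and the `Completion` type here are stand-ins for the lab's real base classes, whose exact signatures are an assumption; only the subclass/implement/register pattern is from the steps above:

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator


@dataclass
class Completion:
    """Stand-in for the lab's completion result type (hypothetical)."""
    text: str
    model: str


class RouterProvider:
    """Stand-in for the real base class; only the interface matters here."""
    name = "base"

    async def complete(self, model: str, messages: list[dict]) -> Completion:
        raise NotImplementedError

    async def stream(self, model: str, messages: list[dict]) -> AsyncIterator[str]:
        raise NotImplementedError


class YourRouterProvider(RouterProvider):
    name = "your_router"

    async def complete(self, model: str, messages: list[dict]) -> Completion:
        # A real provider would call the router's HTTP API here;
        # this stub just echoes the last user message.
        prompt = messages[-1]["content"]
        return Completion(text=f"echo: {prompt}", model=model)

    async def stream(self, model: str, messages: list[dict]) -> AsyncIterator[str]:
        # Fake streaming by yielding the completion word by word.
        completion = await self.complete(model, messages)
        for token in completion.text.split():
            yield token


# Registration mirrors the PROVIDER_CLASSES mapping mentioned above.
PROVIDER_CLASSES = {YourRouterProvider.name: YourRouterProvider}
```

The runner can then look up providers by name and drive both the blocking and streaming paths through the same interface.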
## Adding a Scenario

Create a YAML file in `scenarios/`:

```yaml
name: my_scenario
description: What this tests
defaults:
  model: gpt-4o
  temperature: 0.7
cases:
  - name: test_case_1
    messages:
      - role: user
        content: "Your prompt here"
```

## Testing

```bash
uv run pytest tests/
```