This guide covers the development setup and workflow for the MLPerf Inference Endpoint Benchmarking System. For contribution guidelines, see CONTRIBUTING.md.
- Python: 3.12+ (3.12 recommended)
- Git: Latest version
- OS: Linux or macOS (Windows is not supported)
```bash
# 1. Fork https://github.com/mlcommons/endpoints on GitHub, then clone your fork
git clone https://github.com/YOUR_USERNAME/endpoints.git
cd endpoints

# 2. Add the upstream repo as a remote
git remote add upstream https://github.com/mlcommons/endpoints.git

# 3. Create virtual environment (Python 3.12+ required)
python3.12 -m venv venv
source venv/bin/activate

# 4. Install development dependencies
pip install -e ".[dev,test]"

# 5. Install pre-commit hooks
pre-commit install

# 6. Verify installation
inference-endpoint --version
pytest --version
```

```
endpoints/
├── src/inference_endpoint/   # Main package source
│   ├── main.py               # Entry point and CLI app
│   ├── exceptions.py         # Project-wide exception types
│   ├── async_utils/          # Event loop, ZMQ transport, pub/sub
│   ├── commands/             # CLI command implementations
│   ├── config/               # Configuration and schema management
│   ├── core/                 # Core types and orchestration
│   ├── dataset_manager/      # Dataset handling and loading
│   ├── endpoint_client/      # HTTP/ZMQ endpoint communication
│   ├── evaluation/           # Accuracy evaluation and scoring
│   ├── load_generator/       # Load generation and scheduling
│   ├── metrics/              # Performance measurement and reporting
│   ├── openai/               # OpenAI API compatibility
│   ├── plugins/              # Plugin system
│   ├── profiling/            # Performance profiling tools
│   ├── sglang/               # SGLang API adapter
│   ├── testing/              # Test utilities (echo server, etc.)
│   └── utils/                # Common utilities
├── tests/                    # Test suite
│   ├── unit/                 # Unit tests
│   ├── integration/          # Integration tests
│   ├── performance/          # Performance benchmarks
│   └── datasets/             # Test data (dummy_1k.jsonl, squad_pruned/)
├── docs/                     # Documentation
├── examples/                 # Usage examples
└── scripts/                  # Utility scripts
```
```bash
# All tests (excludes slow/performance)
pytest

# Unit tests only
pytest -m unit

# Integration tests
pytest -m integration

# Single file with verbose output
pytest -xvs tests/unit/path/to/test_file.py

# With coverage
pytest --cov=src --cov-report=html
```

Every test function must have a marker:

```python
import pytest


@pytest.mark.unit
def test_something():
    ...


@pytest.mark.unit
@pytest.mark.asyncio  # strict mode is configured globally in pyproject.toml
async def test_async_something():
    ...
```

Available markers: `unit`, `integration`, `slow`, `performance`, `run_explicitly`.
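Since marker registration and strict asyncio mode both live in `pyproject.toml`, the relevant configuration plausibly looks something like the sketch below. This is an illustration of the standard `pytest` / `pytest-asyncio` settings, not a copy of the project's actual file; the marker descriptions and the `addopts` exclusion are assumptions inferred from the text above.

```toml
[tool.pytest.ini_options]
# Exclude slow/performance tests by default (assumed from "All tests (excludes slow/performance)")
addopts = '-m "not slow and not performance"'
# pytest-asyncio strict mode: async tests run only when explicitly marked
asyncio_mode = "strict"
markers = [
    "unit: fast, isolated unit tests",
    "integration: tests that exercise multiple components",
    "slow: long-running tests",
    "performance: performance benchmarks",
    "run_explicitly: tests that only run when selected directly",
]
```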
Defined in `tests/conftest.py` — use these instead of mocking:

- `mock_http_echo_server` — real HTTP echo server on a dynamic port
- `mock_http_oracle_server` — dataset-driven response server
- `dummy_dataset` — in-memory test dataset
- `events_db` — pre-populated SQLite events database
Target >90% coverage for all new code.
All of these run automatically on commit:

- trailing-whitespace, end-of-file-fixer, check-yaml, check-merge-conflict, debug-statements
- `ruff` (lint + autofix) and `ruff-format`
- `mypy` type checking
- `prettier` for YAML/JSON/Markdown
- License header enforcement
- YAML template validation and regeneration
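As a rough illustration of how the first two groups of hooks are wired up, a `.pre-commit-config.yaml` excerpt typically looks like the sketch below. The `rev` versions are illustrative placeholders; consult the repository's actual `.pre-commit-config.yaml` for the full, authoritative list:

```yaml
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.6.0 # illustrative version
    hooks:
      - id: trailing-whitespace
      - id: end-of-file-fixer
      - id: check-yaml
      - id: check-merge-conflict
      - id: debug-statements
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.4.4 # illustrative version
    hooks:
      - id: ruff
        args: [--fix]
      - id: ruff-format
```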
IMPORTANT: Always run `pre-commit run --all-files` before every commit. Hooks may modify files. If files are modified, stage the changes and commit once.

```bash
# Run all hooks
pre-commit run --all-files

# Install hooks (done during setup)
pre-commit install
```

- Formatter/Linter: `ruff` (line-length 88, target Python 3.12)
- Type checking: `mypy`
- Formatting: `ruff-format` (double quotes, space indent)
- License headers: Required on all Python files (auto-added by pre-commit)
- Commit messages: Conventional commits — `feat:`, `fix:`, `docs:`, `test:`, `chore:`, `perf:`
- Comments: Only where the why isn't obvious from the code
```bash
# Sync your fork with upstream before starting
git fetch upstream
git checkout main
git merge upstream/main

# Create a feature branch on your fork
git checkout -b feat/your-feature-name

# Make changes and test
pytest
pre-commit run --all-files

# Commit changes
git add <specific files>
git commit -m "feat: add your feature description"

# Push to your fork and open a PR against mlcommons/endpoints
git push origin feat/your-feature-name
```

Branch names follow the pattern:

- `feat/short-description`
- `fix/short-description`
- `docs/short-description`
Config templates in `src/inference_endpoint/config/templates/` are auto-generated from schema defaults. When you change `config/schema.py`, regenerate them:

```bash
python scripts/regenerate_templates.py
```

The pre-commit hook auto-regenerates templates when `schema.py`, `config.py`, or `regenerate_templates.py` change. CI validates that templates are up to date via `--check` mode.

Two variants are generated per mode (offline, online, concurrency):

- `_template.yaml` — minimal: only required fields + placeholders
- `_template_full.yaml` — all fields with schema defaults + inline `# options:` comments
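To make the two variants concrete, a single field might render as follows in each. The field names and values here are invented for illustration, not taken from the actual generated templates:

```yaml
# <mode>_template.yaml (minimal): required fields + placeholders only
endpoint_url: <REQUIRED>

# <mode>_template_full.yaml (full): every field with its schema default
endpoint_url: <REQUIRED>
timeout_s: 60 # options: any positive number
log_level: info # options: debug, info, warning, error
```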
Add dependencies to `pyproject.toml` (always pin to exact versions with `==`):

- Runtime dependencies: `[project.dependencies]`
- Optional groups (dev, test, etc.): `[project.optional-dependencies]`

After adding a dependency, run `pip-audit` (included in the dev extras) to verify it has no known vulnerabilities.
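For example, exact pins in `pyproject.toml` take this shape. The package names and version numbers below are illustrative only, not the project's actual dependency list:

```toml
[project]
dependencies = [
    "msgspec==0.18.6",   # illustrative pin
    "httptools==0.6.1",  # illustrative pin
]

[project.optional-dependencies]
dev = ["pip-audit==2.7.3"]
test = ["pytest==8.2.0"]
```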
```bash
pip install -e ".[dev,test]"
```

Code in `load_generator/`, `endpoint_client/worker.py`, and `async_utils/transport/` is latency-critical. In these paths:

- No `match` statements — use dict dispatch
- Use `dataclass(slots=True)` or `msgspec.Struct` for frequently instantiated classes
- Minimize async suspends
- Use `msgspec` over `json`/`pydantic` for serialization
- The HTTP client uses a custom `ConnectionPool` with an `httptools` parser — not `aiohttp`/`requests`
```bash
# Run with verbose logging
inference-endpoint -v benchmark offline ...

# Run tests with stdout visible
pytest -xvs tests/unit/path/to/test.py

# Use Python debugger
python -m pdb -m pytest tests/unit/path/to/test.py
```

- Issues: GitHub Issues
- Project Board: Q2 Board
- Documentation: See the `docs/` directory for guides