1 | | -# ProtoMQ Benchmarks |
2 | | - |
3 | | -This directory contains the ProtoMQ benchmark suite for measuring performance across various scenarios. |
4 | | - |
5 | | -## Directory Structure |
6 | | - |
7 | | -``` |
8 | | -benchmarks/ |
9 | | -├── common/protomq_benchmarks/ # Shared benchmark library |
10 | | -│ ├── environment.py # System environment detection |
11 | | -│ ├── thresholds.py # Threshold validation |
12 | | -│ ├── metrics.py # Measurement utilities |
13 | | -│ └── runner.py # BenchmarkRunner |
14 | | -├── b1-baseline-concurrency/ # B1: Baseline concurrency test |
15 | | -│ ├── benchmark.py |
16 | | -│ ├── thresholds.json |
17 | | -│ └── README.md |
18 | | -├── results/ # All benchmark outputs (JSON) |
19 | | -└── benchmarks.md # Detailed benchmark plans (B1-B7) |
20 | | -``` |
21 | | - |
22 | | -## Running Benchmarks |
23 | | - |
24 | | -### Setup |
25 | | - |
26 | | -**One-time setup** (from benchmarks/ directory): |
27 | | -```bash |
28 | | -cd benchmarks |
29 | | -uv venv # Create virtual environment |
30 | | -uv pip install -e common/ # Install protomq_benchmarks library |
31 | | -uv pip install -e . # Install benchmarks package (creates console scripts) |
32 | | -``` |
33 | | - |
34 | | -This creates console scripts: |
35 | | -- `protomq-bench-b1` - Baseline concurrency benchmark |
36 | | -- `protomq-bench-b2` - Thundering herd benchmark |
37 | | - |
38 | | -### Running Benchmarks |
39 | | - |
40 | | -```bash |
41 | | -# Start server first |
42 | | -zig build run-server |
43 | | - |
44 | | -# Run benchmarks (from benchmarks/ directory with activated venv) |
45 | | -cd benchmarks |
46 | | -source .venv/bin/activate |
47 | | -protomq-bench-b1 |
48 | | -protomq-bench-b2 |
49 | | -``` |
50 | | - |
51 | | -Results are saved to `benchmarks/results/{commit_id}_{benchmark_name}.json` |
52 | | - |
53 | | -## Benchmark Library (`protomq_benchmarks`) |
54 | | - |
55 | | -### BenchmarkRunner |
56 | | - |
57 | | -Main interface for running benchmarks with automatic environment collection and threshold validation. |
58 | | - |
59 | | -```python |
60 | | -from protomq_benchmarks import BenchmarkRunner |
61 | | - |
62 | | -runner = BenchmarkRunner( |
63 | | - name="b1-baseline-concurrency", |
64 | | - version="1.0.0", |
65 | | - timeout_seconds=300 |
66 | | -) |
67 | | - |
68 | | -runner.register_thresholds_from_file("thresholds.json") |
69 | | - |
70 | | -@runner.benchmark |
71 | | -async def run_test(): |
72 | | - # Your benchmark logic |
73 | | - return {"metric1": value1, "metric2": value2} |
74 | | - |
75 | | -if __name__ == "__main__": |
76 | | - runner.run(output_dir="../results") |
77 | | -``` |
78 | | - |
79 | | -### Environment Detection |
80 | | - |
81 | | -Automatically collects: |
82 | | -- CPU model, architecture (normalized: aarch64 → arm64), cores, frequency |
83 | | -- RAM capacity |
84 | | -- Storage type and model (via `diskutil` on macOS, `/sys/block` on Linux) |
85 | | -- OS, kernel, Zig version, Python version |
86 | | -- Build mode (Release/Debug) |
87 | | -- ProtoMQ version and git commit hash |
88 | | -- Network backend (kqueue/epoll) |
89 | | - |
90 | | -### Threshold Management |
91 | | - |
92 | | -Define pass/warn/fail criteria with directional indicators: |
93 | | - |
94 | | -```json |
95 | | -{ |
96 | | - "p99_latency_ms": { |
97 | | - "direction": "lower", |
98 | | - "max": 5.0, |
99 | | - "warn": 1.0, |
100 | | - "description": "p99 latency threshold" |
101 | | - }, |
102 | | - "concurrent_connections": { |
103 | | - "direction": "higher", |
104 | | - "min": 100, |
105 | | - "description": "Must connect at least 100 clients" |
106 | | - } |
107 | | -} |
108 | | -``` |
109 | | - |
110 | | -- **`direction: "lower"`**: For metrics where lower is better (latency, memory) |
111 | | -- **`direction: "higher"`**: For metrics where higher is better (throughput, connections) |
112 | | - |
113 | | -### Metrics Utilities |
114 | | - |
115 | | -```python |
116 | | -from protomq_benchmarks import Timer, measure_memory |
117 | | -from protomq_benchmarks.metrics import LatencyStats |
118 | | - |
119 | | -# Measure time |
120 | | -with Timer() as t: |
121 | | - await some_operation() |
122 | | -print(f"Elapsed: {t.elapsed_ms()}ms") |
123 | | - |
124 | | -# Measure memory |
125 | | -memory_mb = measure_memory(server_pid) |
126 | | - |
127 | | -# Calculate latency statistics |
128 | | -stats = LatencyStats.from_measurements(latencies) |
129 | | -print(f"p99: {stats.p99:.3f}ms") |
130 | | -``` |
131 | | - |
132 | | -## Result Format |
133 | | - |
134 | | -Each benchmark produces a JSON file: `{commit_id}_{benchmark_name}.json` |
135 | | - |
136 | | -```json |
137 | | -{ |
138 | | - "benchmark": { |
139 | | - "name": "b1-baseline-concurrency", |
140 | | - "version": "1.0.0", |
141 | | - "timestamp": "2026-01-24T13:45:00Z", |
142 | | - "duration_s": 1.07 |
143 | | - }, |
144 | | - "environment": { |
145 | | - "hardware": {...}, |
146 | | - "software": {...}, |
147 | | - "protomq": {"commit_hash": "72144c15", ...} |
148 | | - }, |
149 | | - "metrics": { |
150 | | - "concurrent_connections": 100, |
151 | | - "p99_latency_ms": 0.432, |
152 | | - ... |
153 | | - }, |
154 | | - "thresholds": { |
155 | | - "passed": true, |
156 | | - "warnings": [], |
157 | | - "failures": [] |
158 | | - } |
159 | | -} |
160 | | -``` |
161 | | - |
162 | | -## Creating New Benchmarks |
163 | | - |
164 | | -1. Create directory: `benchmarks/bN-benchmark-name/` |
165 | | -2. Create `benchmark.py`: |
166 | | - ```python |
167 | | - from pathlib import Path |
168 | | - from protomq_benchmarks import BenchmarkRunner |
169 | | - |
170 | | - runner = BenchmarkRunner(name="bN-benchmark-name", timeout_seconds=600) |
171 | | - runner.register_thresholds_from_file(Path(__file__).parent / "thresholds.json") |
172 | | - |
173 | | - @runner.benchmark |
174 | | - async def run_test(): |
175 | | - # Your test logic |
176 | | - return {"metric": value} |
177 | | - |
178 | | - if __name__ == "__main__": |
179 | | - runner.run(output_dir=Path(__file__).parent.parent / "results") |
| 1 | +# ProtoMQ Benchmarking |
| 2 | + |
| 3 | +The main goal of the ProtoMQ project is to provide a high-performance, type-safe MQTT server implementation in Zig. To verify that the server keeps meeting this goal, we benchmark it regularly: run `protomq-bench-b1` after each commit to catch performance regressions, and the full suite before every release. |
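| | + |
| | +A benchmark signals threshold failures through its exit code (`0`: all thresholds passed, `1`: a threshold failed or the benchmark errored), so the per-commit check is easy to script. A minimal sketch, assuming the console scripts from the setup steps below are installed: |
| | + |
| | +```bash |
| | +protomq-bench-b1 |
| | +if [ $? -ne 0 ]; then |
| | +    echo "Benchmark failed thresholds" |
| | +    exit 1 |
| | +fi |
| | +``` |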
| 4 | + |
| 5 | +## Regular Test Environments |
| 6 | + |
| 7 | +macOS: |
| 8 | +- **CPU**: Apple M2 Pro |
| 9 | +- **OS**: macOS 26.2 (Darwin kernel 25.2.0) |
| 10 | +- **Backend**: kqueue |
| 11 | +- **Zig Version**: 0.15.2 |
| 12 | + |
| 13 | +Linux: |
| 14 | +- **CPU**: ARM Cortex-A76 (Raspberry Pi 5) |
| 15 | +- **OS**: Debian, kernel 6.6.62+rpt-rpi-2712 (package 1:6.6.62-1+rpt1, 2024-11-25), aarch64 |
| 16 | +- **Backend**: epoll |
| 17 | +- **Zig Version**: 0.15.2 |
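| | + |
| | +Both environments are detected automatically and embedded in every result file under the "environment" key. An abbreviated sketch for the macOS machine, using the top-level field names from the result format (the nested field names are illustrative): |
| | + |
| | +```json |
| | +"environment": { |
| | +  "hardware": {"cpu_model": "Apple M2 Pro", ...}, |
| | +  "software": {"os": "macOS 26.2", "zig_version": "0.15.2", ...}, |
| | +  "protomq": {"commit_hash": "72144c15", "network_backend": "kqueue", ...} |
| | +} |
| | +``` |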
| 18 | + |
| 19 | +## Results |
| 20 | + |
| 21 | +Each benchmark run is saved under the "results" directory ("protomq/benchmarks/results"), inside a subdirectory specific to the hardware. Each result is a JSON file named after the benchmark and the commit ID of the repository, and it holds the value of every metric the benchmark defines along with the environment configuration. |
| 22 | + |
| 23 | +The most recent results are available under the name "latest" inside each hardware directory, as in the layout sketched below. |
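| | + |
| | +An illustrative layout (hardware directory names are hypothetical; result files follow the `{commit_id}_{benchmark_name}.json` pattern): |
| | + |
| | +``` |
| | +results/ |
| | +├── apple-m2-pro/ |
| | +│   ├── 72144c15_b1-baseline-concurrency.json |
| | +│   └── latest |
| | +└── raspberry-pi-5/ |
| | +    └── ... |
| | +``` |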
| 24 | + |
| 25 | +### Overall Summary (2026-01-25) |
| 26 | + |
| 27 | +| Test Scenario | Metric | Apple M2 Pro | Raspberry Pi 5 | |
| 28 | +|--------------|--------|--------------|----------------| |
| 29 | +| **100 concurrent connections** | p99 latency | 0.44 ms | 0.13 ms | |
| 30 | +| | Memory usage | 2.6 MB | 2.5 MB | |
| 31 | +| **10,000 concurrent clients** | Connection time | 0.96 s | 1.76 s | |
| 32 | +| | Message fan-out | 0.12 s | 0.21 s | |
| 33 | +| | Message loss | 0% | 0% | |
| 34 | +| **Sustained load (10 min)** | Throughput | 8,981 msg/s | 9,012 msg/s | |
| 35 | +| | Memory growth | 0.16 MB | 0.09 MB | |
| 36 | +| **Wildcard subscriptions** | Topic matching | 7.2 µs | 5.2 µs | |
| 37 | +| | 1000 subscribers | 100% correct | 100% correct | |
| 38 | +| **Connection churn** | Total connections | 100,000 | 100,000 | |
| 39 | +| | Connection rate | 1,496 conn/s | 1,548 conn/s | |
| 40 | +| | Memory leak | 0 MB | 0 MB | |
| 41 | +| **Message throughput** | 10-byte messages | 208k msg/s | 147k msg/s | |
| 42 | +| | 64 KB messages | 39k msg/s | 27k msg/s | |
| 43 | + |
| 44 | +**Notes:** |
| 45 | +- All tests run over the loopback interface. |
| 46 | +- Server built with Zig 0.15.2, ReleaseSafe mode. |
| 47 | +- Raspberry Pi 5 shows competitive performance, especially in sustained throughput and topic matching. |
| 48 | + |
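| | +The latency figures above are percentile statistics computed by the shared `protomq_benchmarks` library in `common/`. A minimal sketch using the library's metrics utilities (sample values illustrative): |
| | + |
| | +```python |
| | +from protomq_benchmarks.metrics import LatencyStats |
| | + |
| | +# Per-message round-trip times in milliseconds (illustrative values). |
| | +latencies = [0.41, 0.39, 0.52, 0.47] |
| | +stats = LatencyStats.from_measurements(latencies) |
| | +print(f"p99: {stats.p99:.3f}ms") |
| | +``` |
| | + |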
| 49 | +## Reproducing the Results |
| 50 | +1. Start the server: |
| 51 | +   ```bash |
| 52 | +   zig build -Doptimize=ReleaseSafe run-server |
| 53 | +   ``` |
| 54 | +2. Create a virtual environment, activate it, and install the benchmark suite (from the `benchmarks/` directory): |
| 55 | +   ```bash |
| | +   cd benchmarks |
| 56 | +   python3 -m venv venv |
| | +   source venv/bin/activate |
| 57 | +   pip install -e ./common |
| 58 | +   pip install -e . |
| 59 | +   ``` |
| 60 | +3. Run any benchmark: |
| 61 | +   ```bash |
| 62 | +   source benchmarks/venv/bin/activate  # if not already active |
| 63 | +   protomq-bench-b1 |
| 64 | +   # protomq-bench-b2 |
| 65 | +   # protomq-bench-b3 |
| 66 | +   # ... |
180 | 67 | ``` |
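| | + |
| | +Each console script is a thin entry point built on the shared `protomq_benchmarks` library in `common/`. A skeleton mirroring the `BenchmarkRunner` API from that library (metric names and values illustrative): |
| | + |
| | +```python |
| | +from pathlib import Path |
| | + |
| | +from protomq_benchmarks import BenchmarkRunner |
| | + |
| | +runner = BenchmarkRunner(name="b1-baseline-concurrency", version="1.0.0", timeout_seconds=300) |
| | +runner.register_thresholds_from_file(Path(__file__).parent / "thresholds.json") |
| | + |
| | +@runner.benchmark |
| | +async def run_test(): |
| | +    # Benchmark logic: connect clients, exchange messages, collect metrics. |
| | +    return {"concurrent_connections": 100, "p99_latency_ms": 0.432} |
| | + |
| | +if __name__ == "__main__": |
| | +    runner.run(output_dir=Path(__file__).parent.parent / "results") |
| | +``` |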
181 | | -3. Create `thresholds.json` with metric criteria |
182 | | -4. Create `README.md` documenting the benchmark |
183 | | - |
184 | | -## Code Quality |
185 | | - |
186 | | -The benchmark suite uses `ruff` for linting and formatting (configured at project root): |
187 | | - |
188 | | -```bash |
189 | | -# Check code |
190 | | -ruff check benchmarks/ |
191 | | - |
192 | | -# Format code |
193 | | -ruff format benchmarks/ |
194 | | - |
195 | | -# Install pre-commit hooks |
196 | | -pre-commit install |
197 | | -``` |
198 | | - |
199 | | -All benchmarks must be PEP-8 compliant with: |
200 | | -- Module-level imports only (no `sys.path` hacks) |
201 | | -- Type hints where applicable |
202 | | -- Proper error handling |
203 | | -- No emojis in output (professional appearance) |
204 | | - |
205 | | -## Planned Benchmarks |
206 | | - |
207 | | -See `benchmarks.md` for detailed plans: |
208 | | - |
209 | | -- **B1**: Baseline Concurrency & Latency ✅ (implemented) |
210 | | -- **B2**: Thundering Herd (10k concurrent clients) |
211 | | -- **B3**: Sustained Throughput (10-minute stress test) |
212 | | -- **B4**: Wildcard Subscription Explosion |
213 | | -- **B5**: Protobuf Decoding Under Load |
214 | | -- **B6**: Connection Churn (rapid connect/disconnect) |
215 | | -- **B7**: Message Size Variations |
216 | | - |
217 | | -## CI/CD Integration |
218 | | - |
219 | | -Benchmarks can be integrated into CI/CD pipelines: |
220 | | - |
221 | | -```bash |
222 | | -# Run benchmark and check exit code |
223 | | -uv run b1-baseline-concurrency/benchmark.py |
224 | | -if [ $? -ne 0 ]; then |
225 | | - echo "Benchmark failed thresholds" |
226 | | - exit 1 |
227 | | -fi |
228 | | -``` |
229 | | - |
230 | | -Exit codes: |
231 | | -- `0`: All thresholds passed |
232 | | -- `1`: One or more thresholds failed or benchmark errored |