A Distributed Control Plane for Background Job Execution with Runtime Configuration
Helios is not another job queue. It is a control-plane–first backend system that cleanly separates intent, state, and execution.
It is designed to model how modern infrastructure platforms (Kubernetes, Airflow, Temporal, AWS control planes) actually work under the hood.
Most backend systems tightly couple:
- API request handling
- Execution logic
- Runtime behavior
This coupling makes systems fragile, hard to scale, and difficult to modify without redeployments.
Helios deliberately breaks this coupling.
It introduces a Control Plane Architecture where:
- APIs only capture intent
- Workers only execute state
- Runtime behavior is controlled dynamically
This design enables high availability, safer concurrency, and operational flexibility.
Helios is built around two orthogonal planes:
The Control Plane is responsible for:
- Job ingestion
- Persistent state management
- Lifecycle tracking
- Ownership arbitration
This plane is the source of truth.
The Execution Plane is responsible for:
- Polling and claiming work
- Executing jobs
- Respecting runtime policies
Workers are stateless and replaceable.
The Feature Flag Engine, which lives inside the control plane, is responsible for:
- Dynamic configuration
- Live operational controls
- Behavior tuning without redeployments
The system abandons traditional Request → Response execution in favor of an event/state-driven model.
```mermaid
graph LR
    Client -->|Intent| API[Ingress API]
    API -->|Persist State| DB[(PostgreSQL)]
    Worker[Worker Node] -->|Poll & Claim| DB
    Worker -->|Read Runtime Config| Config[Feature Flag Engine]

    subgraph "Execution Plane"
        Worker
    end

    subgraph "Control Plane"
        API
        DB
        Config
    end
```
- The API never executes jobs
- Requests are acknowledged immediately
- Jobs enter the system in a `CREATED` state
This guarantees:
- Fast ingress
- High availability
- Backpressure resistance
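The ingress path can be sketched as a thin layer that records intent and acknowledges immediately. The shape below is illustrative only — the field names and the `captureIntent` helper are assumptions, not the actual Helios schema:

```typescript
import { randomUUID } from "crypto";

// Illustrative job record -- field names are assumptions, not the real Helios schema.
interface JobRecord {
  id: string;
  type: string;
  payload: unknown;
  status: "CREATED"; // every job starts life in CREATED
  createdAt: Date;
}

// The API layer only does this: capture intent as a row of state.
// In Helios this record would be INSERTed into PostgreSQL; no execution happens here.
function captureIntent(type: string, payload: unknown): JobRecord {
  return {
    id: randomUUID(),
    type,
    payload,
    status: "CREATED",
    createdAt: new Date(),
  };
}

// An HTTP handler would persist the record and acknowledge with 202 Accepted,
// e.g. res.status(202).json({ jobId: job.id }) -- workers pick it up later.
const job = captureIntent("send-email", { to: "user@example.com" });
console.log(job.status); // "CREATED"
```

Because the handler never touches execution, ingress latency stays flat no matter how deep the backlog grows — the backlog is just rows in the database.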
Workers read runtime flags from a local Feature Flag Engine to control behavior at runtime:
- Concurrency limits
- Artificial delays
- Maintenance mode toggles
- Kill switches
All without:
- Code changes
- Redeployments
- Restarts
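A minimal in-memory sketch of how a worker can consult such a flag engine on every loop iteration — the flag names (`killSwitch`, `maintenanceMode`, `maxConcurrency`) are illustrative, not Helios's exact keys:

```typescript
// Minimal in-memory feature-flag engine sketch.
type FlagValue = boolean | number;

class FlagEngine {
  private flags = new Map<string, FlagValue>();

  set(name: string, value: FlagValue): void {
    this.flags.set(name, value); // flipped live by an operator; no redeploy needed
  }

  get<T extends FlagValue>(name: string, fallback: T): T {
    return (this.flags.get(name) as T) ?? fallback;
  }
}

// The worker checks flags on every poll, so a change takes effect
// on the very next iteration -- no code change, redeploy, or restart.
function shouldPoll(flags: FlagEngine, inFlight: number): boolean {
  if (flags.get("killSwitch", false)) return false;      // hard stop
  if (flags.get("maintenanceMode", false)) return false; // drain mode
  return inFlight < flags.get("maxConcurrency", 4);      // concurrency limit
}

const flags = new FlagEngine();
console.log(shouldPoll(flags, 0)); // true with defaults
flags.set("maintenanceMode", true);
console.log(shouldPoll(flags, 0)); // false -- toggled at runtime
```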
Job claiming is performed using ACID-compliant SQL transactions:
- Prevents double execution
- Safely handles concurrent workers
- Uses the database as a synchronization primitive
This mirrors how real-world schedulers and control planes operate.
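One common way to implement atomic claiming in PostgreSQL — a sketch of the pattern, not necessarily Helios's exact query; the table and column names are assumptions — is a single `UPDATE` driven by `FOR UPDATE SKIP LOCKED`:

```typescript
// Sketch of atomic job claiming -- table/column names are assumptions.
// SKIP LOCKED makes concurrent workers pass over rows another worker has
// already locked, so no two workers can ever claim the same job.
const CLAIM_JOB_SQL = `
  UPDATE jobs
  SET status = 'RUNNING', claimed_by = $1, claimed_at = now()
  WHERE id = (
    SELECT id FROM jobs
    WHERE status = 'CREATED'
    ORDER BY created_at
    LIMIT 1
    FOR UPDATE SKIP LOCKED
  )
  RETURNING id, payload;
`;

// With the pg client a worker would run something like (sketch):
//   const { rows } = await client.query(CLAIM_JOB_SQL, [workerId]);
//   if (rows.length === 0) { /* queue empty -> back off */ }
// The single UPDATE runs as its own transaction, so the claim is atomic:
// either this worker owns the row and it is RUNNING, or it sees nothing.
console.log(CLAIM_JOB_SQL.includes("FOR UPDATE SKIP LOCKED")); // true
```

The database's row locks are doing the coordination here, which is exactly what "using the database as a synchronization primitive" means in practice.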
Workers are designed to be fault-tolerant and self-healing:
- Automatic backoff on empty queues
- Graceful handling of database pressure
- Safe recovery after crashes
No centralized coordinator is required.
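The backoff behavior can be sketched as a capped exponential delay that resets whenever work is found — the constants here are illustrative defaults, not Helios's tuned values:

```typescript
// Capped exponential backoff for empty-queue polling.
// Constants are illustrative, not Helios's actual tuning.
function backoffMs(emptyPolls: number, baseMs = 100, capMs = 5000): number {
  return Math.min(capMs, baseMs * 2 ** emptyPolls);
}

// Sketch of the worker loop: the counter resets whenever work is found,
// so a busy queue is polled eagerly while an idle one is polled gently.
// Because all coordination lives in the database, a crashed worker can
// simply restart this loop -- no coordinator has to notice or intervene.
async function workerLoop(claim: () => Promise<boolean>): Promise<void> {
  let emptyPolls = 0;
  while (true) {
    const gotJob = await claim(); // e.g. the SKIP LOCKED claim against PostgreSQL
    if (gotJob) {
      emptyPolls = 0; // queue is hot -- poll again immediately
    } else {
      emptyPolls++;
      await new Promise((r) => setTimeout(r, backoffMs(emptyPolls)));
    }
  }
}

console.log(backoffMs(0), backoffMs(3), backoffMs(10)); // 100 800 5000
```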
| Layer | Technology |
|---|---|
| Runtime | Node.js, TypeScript |
| Database | PostgreSQL (pg) |
| Infrastructure | Docker, Docker Compose |
| Architecture | Monorepo (api, worker, shared) |
Planned extensions include:
- Redis hot queue (`LPUSH`/`BRPOP`)
- Hybrid database + Redis scheduling model
- Distributed leasing with heartbeats
- Zombie job detection
- Automatic job re-queuing
- Janitor process to repair inconsistent state
- Periodic audits of job ownership
- React-based dashboard
- Live job state visualization
- Feature flag toggling UI
Helios demonstrates:
- Control Plane vs Data Plane separation
- Real-world scheduler design patterns
- Safe concurrency using SQL
- Operationally flexible backend systems
- Backend architecture beyond CRUD
This project intentionally avoids heavy frameworks and abstractions, focusing instead on fundamental system design principles that scale to distributed systems.
Built by Kritagya
Backend • Systems • Distributed Architecture
“Execution is temporary. Control is permanent.”