A single-header, lock-free, C++20 work-stealing deque used in Taskflow.
| Class | Capacity | Push behaviour |
|---|---|---|
wsq::BoundedWSQ<T, LogSize> |
Fixed at compile time (2^LogSize) | try_push returns false when full |
wsq::UnboundedWSQ<T> |
Grows dynamically via array doubling | push always succeeds |
Both classes implement the Chase-Lev work-stealing deque described in:
Correct and Efficient Work-Stealing for Weak Memory Models
Lê, Pop, Cohen, Nardelli — PPoPP 2013
https://www.di.ens.fr/~zappa/readings/ppopp13.pdf
- C++20 compiler (GCC ≥ 10, Clang ≥ 11, MSVC ≥ 19.29)
- CMake ≥ 3.18
#include "wsq.hpp"
// Bounded queue (256 slots by default)
wsq::BoundedWSQ<int*> bq;
int x = 42;
bq.try_push(&x);
int* item = bq.pop(); // owner: LIFO
item = bq.steal(); // any thread: FIFO
// Unbounded queue (grows automatically)
wsq::UnboundedWSQ<int*> uq;
uq.push(&x);
item = uq.pop();
item = uq.steal();bool try_push(T item) // owner only
size_t try_bulk_push(I first, size_t N) // owner only
value_type pop() // owner only (LIFO)
value_type steal() // any thread (FIFO)
value_type steal_with_feedback(size_t& n_empty) // any thread (FIFO)
bool empty() const noexcept
size_t size() const noexcept
size_t capacity() const noexcept // compile-time constantvoid push(T item) // owner only
void bulk_push(I first, size_t N) // owner only
value_type pop() // owner only (LIFO)
value_type steal() // any thread (FIFO)
value_type steal_with_feedback(size_t& n_empty) // any thread (FIFO)
bool empty() const noexcept
size_t size() const noexcept
size_t capacity() const noexceptvalue_type is T (with nullptr for empty) for pointer types, and
std::optional<T> (with std::nullopt for empty) for value types.
cd work-stealing-queue/
mkdir build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_STANDARD=20
make -j$(nproc)cd build
make test # run full grid (scale x thieves combinations)
ctest --output-on-failure # same, with output on failure
ctest -R "scale=1" # quick pass only
ctest -R "scale=4" # thorough pass only
ctest -R "thieves=1" # single-threaded stress onlyThe test grid is controlled by two CMake cache variables:
cmake .. -DWSQ_TEST_SCALES="1;4" -DWSQ_TEST_THIEVES="1;2;4;8;16"./examples/simple # usage walkthrough
./examples/evaluate # default: 1M ops, 3 thieves, r=10
./examples/evaluate 5000000 7 # 5M ops, 7 thieves, r=10
./examples/evaluate 5000000 7 20 # 5M ops, 7 thieves, r=20 roundsEach benchmark runs r+1 rounds; round 0 is a warm-up and is excluded from
the reported mean ± stddev.
Work-Stealing Queue Performance Evaluation
num_ops = 1000000
n_thieves = 3
r = 10 (+ 1 warm-up round, not included in stats)
hw_threads = 16
benchmark mean ± stddev Mops/s
--------------------------------------------------------------------------------
BoundedWSQ push+pop (owner only) 342.51 ± 4.12 Mops/s
BoundedWSQ bulk_push+pop (batch=64, owner only) 491.07 ± 6.83 Mops/s
BoundedWSQ concurrent steal (3 thieves) 97.34 ± 2.01 Mops/s
UnboundedWSQ push+pop (owner only) 253.18 ± 3.55 Mops/s
UnboundedWSQ bulk_push+pop (batch=64, owner only) 379.44 ± 5.22 Mops/s
UnboundedWSQ concurrent steal (3 thieves) 88.91 ± 1.74 Mops/s
| Macro | Default | Meaning |
|---|---|---|
WSQ_CACHELINE_SIZE |
64 |
Cache line size in bytes |
WSQ_DEFAULT_BOUNDED_LOG_SIZE |
8 |
Default log2 capacity for BoundedWSQ (256 slots) |
WSQ_DEFAULT_UNBOUNDED_LOG_SIZE |
10 |
Default initial log2 capacity for UnboundedWSQ (1024 slots) |
MIT — see LICENSE.