Version
v5.70.1
Platform
NodeJS
What happened?
Description
Hi!
We are experiencing a critical issue in our production environment where workers completely stop consuming tasks when a massive influx of jobs saturates our Dragonfly instance.
We are using BullMQ Pro v7.39.1 with Dragonfly, configured with maxmemory-policy=noeviction.
Expected behavior
Ideally, if the queue reaches a defined memory or task limit, subsequent add operations should be rejected or dropped without crashing the Lua script execution for existing tasks. Workers should be able to continue consuming the tasks already present in the queue to clear the backlog.
Actual behavior
We get Out of Memory errors directly inside the Lua script executed by BullMQ (evalsha). Once this occurs, workers completely fail to consume any remaining tasks, effectively paralyzing the queue. The Dragonfly instance also throws Lua script errors downstream.
Our workers consume jobs with these options:
removeOnComplete: {
  age: 1,
  count: 1,
},
removeOnFail: {
  age: 1,
  count: 1,
},
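For context, these cleanup options sit in the worker options object roughly like this (a sketch; the connection details and queue name are assumptions, not our exact config):

```javascript
// Worker options fragment: aggressive cleanup of finished jobs.
const workerOpts = {
  connection: { host: "localhost", port: 6379 }, // assumed Dragonfly endpoint
  removeOnComplete: { age: 1, count: 1 }, // drop completed jobs after 1s, keep at most 1
  removeOnFail: { age: 1, count: 1 },     // same policy for failed jobs
};

// In production this object would be passed to BullMQ's Worker, e.g.:
//   new Worker("tasks_queue_A", processor, workerOpts);
```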
Feature Request
Is there a recommended pattern or an existing feature to define a hard limit on the queue size (e.g., maximum number of wait jobs) to prevent this OOM state? In our use case, it is completely acceptable to lose/drop incoming jobs if the queue is full, as long as the workers remain alive to process the existing backlog.
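As a stopgap we have considered a client-side guard: check the wait-list size before adding and drop the job when a cap is reached. This is only a sketch of the idea, not a built-in BullMQ feature; the count getter and add function are injected here so the helper is self-contained, but in production they would be `queue.getWaitingCount()` and `queue.add(...)`:

```javascript
// Hypothetical producer-side cap: drop incoming jobs when the wait list is full,
// so workers stay alive and can drain the existing backlog.
async function addIfBelowCap(getWaitingCount, add, cap) {
  const waiting = await getWaitingCount();
  if (waiting >= cap) {
    return false; // drop the incoming job
  }
  await add();
  return true; // job was enqueued
}
```

Note that this check is racy under concurrent producers (the count can grow between the read and the add), so the cap is soft rather than a hard limit, which is why a server-side or library-level limit would be preferable.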
How to reproduce
Configure Dragonfly with maxmemory-policy=noeviction.
Rapidly insert hundreds of thousands of tasks into a single queue.
The queue saturates the available memory.
BullMQ cannot consume any jobs.
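The flood in step 2 can be sketched as follows. The batch-building helper is self-contained; the queue name, connection details, and job counts are illustrative assumptions:

```javascript
// Split a large job count into addBulk-sized batches.
function buildBatches(total, batchSize) {
  const batches = [];
  for (let start = 0; start < total; start += batchSize) {
    const batch = [];
    for (let i = start; i < Math.min(start + batchSize, total); i++) {
      batch.push({ name: "flood", data: { i } });
    }
    batches.push(batch);
  }
  return batches;
}

// Against a live Dragonfly (assumed at localhost:6379, maxmemory-policy=noeviction)
// the batches would be pushed with BullMQ's addBulk until memory is exhausted:
//   const queue = new Queue("tasks_queue_A", { connection: { host: "localhost", port: 6379 } });
//   for (const batch of buildBatches(500_000, 1_000)) await queue.addBulk(batch);
```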
Relevant log output
ReplyError: ERR Error running script (call to 861ba35a698a949808d371213e913903f59472ad): @user_script:247: Out of memory
at parseError (/usr/src/A/node_modules/redis-parser/lib/parser.js:179:12)
at parseType (/usr/src/A/node_modules/redis-parser/lib/parser.js:302:14) {
command: {
name: 'evalsha',
args: [
'861ba35a698a949808d371213e913903f59472ad',
'8',
'{tasks_queue_A}:AAAA:stalled',
'{tasks_queue_A}:AAAA:wait',
'{tasks_queue_A}:AAAA:active',
'{tasks_queue_A}:AAAA:stalled-check',
'{tasks_queue_A}:AAAA:meta',
'{tasks_queue_A}:AAAA:paused',
'{tasks_queue_A}:AAAA:marker',
'{tasks_queue_A}:AAAA:events',
'1',
'{tasks_queue_A}:AAAA:',
'1772119814080',
'30000',
''
]
}
}
Code of Conduct