Does this deployment repository implement a task queue mechanism for handling burst requests? #266
Replies: 8 comments 5 replies

---
Hi, this repo does not implement any task queue. Since async patterns are used throughout and most heavy processing happens in the separate LLM service (OpenAI, Claude, or whatever), I don't expect this service to be the bottleneck under moderate (and maybe even somewhat large) simultaneous traffic, assuming it's running on a decently sized machine. We also recently added better support for connection pooling with the agent state database if you use the Postgres connection.

With that said, I have not tested it, and am not aware of anyone else testing it, under substantial simultaneous load. This is NOT designed to be a scaled-out, production-grade service. I would love to hear about any testing efforts and results that anyone finds, for future reference.

It's unlikely that we would add extensive support for increasing production load capacity with features like a task queue, since that's beyond the intended scope of the project and would likely add complexity that isn't useful for most casual-to-moderate users. But I'm happy to hear feedback and discuss further if there's high demand and folks willing to work on it (which I hadn't heard until now).
---
Hi, there is strong demand for this.
---
Hello, this is Cheng Yonghui! Your message has been received and I will review it as soon as possible. Thank you!
---
This is a great question, and it gets at one of the most under-discussed challenges in production agent deployments. Beyond task queues, there's a broader architectural question here: when you're running agents as a service (which is essentially what this toolkit enables), you need to think about the full stack of production concerns.

For queuing specifically, I've seen two patterns work well. The toolkit's architecture (FastAPI + LangGraph) is well-suited to adding a lightweight queue layer, and the key design decision is whether to queue at the HTTP level (request buffering) or at the agent execution level (step-by-step checkpointing). @JoshuaC215, curious whether you've considered adding a queue/worker architecture as a first-class feature? As more people deploy agent services in production, this seems like it would be one of the most requested capabilities.
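To make the request-buffering pattern concrete, here's a minimal, hypothetical sketch using only the standard library (all names here are illustrative, not part of the toolkit): a bounded `asyncio.Queue` absorbs bursts, a fixed pool of workers drains it, and a full queue rejects new requests rather than letting them pile up.

```python
import asyncio

MAX_QUEUED = 100   # illustrative limits, would be configurable in practice
NUM_WORKERS = 4

async def run_agent(payload: str) -> str:
    # Stand-in for the real agent invocation (e.g. a LangGraph run).
    await asyncio.sleep(0)
    return f"handled:{payload}"

async def enqueue(queue: asyncio.Queue, payload: str) -> asyncio.Future:
    """Buffer a request; fail fast with backpressure if the queue is full."""
    fut = asyncio.get_running_loop().create_future()
    try:
        queue.put_nowait((payload, fut))  # raises QueueFull when at capacity
    except asyncio.QueueFull:
        fut.set_exception(RuntimeError("server busy, try again later"))
    return fut

async def worker(queue: asyncio.Queue) -> None:
    """Drain the queue one task at a time, resolving each caller's future."""
    while True:
        payload, fut = await queue.get()
        try:
            fut.set_result(await run_agent(payload))
        except Exception as exc:
            fut.set_exception(exc)
        finally:
            queue.task_done()

async def demo_buffering() -> list[str]:
    queue: asyncio.Queue = asyncio.Queue(maxsize=MAX_QUEUED)
    workers = [asyncio.create_task(worker(queue)) for _ in range(NUM_WORKERS)]
    futures = [await enqueue(queue, f"req-{i}") for i in range(3)]
    results = await asyncio.gather(*futures)
    for w in workers:
        w.cancel()
    return results
```

In a FastAPI service, the HTTP handler would call something like `enqueue` and await the returned future, so burst traffic is smoothed by the worker pool instead of spawning unbounded concurrent agent runs.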
---
Thanks @xtaq and others for sharing your thoughts. I haven't thought deeply about this and don't expect I'll have time in the next couple of months to put major effort towards it. With that said, I would welcome contributions, especially if they start with a thoughtful spec and design (it doesn't need to be too formal) and provide an implementation that is modular, consistent with the wider project, and well covered by tests. I would lean towards request buffering for simplicity, but I'm open to proposals at the step-by-step level too.
---
Thanks for the openness to contributions, @JoshuaC215. I'd like to take a stab at a lightweight spec for this. Based on your preference for request buffering, here's what I'm thinking as a starting point: a minimal queue layer using the request-buffering approach, along with a short set of design principles.

Would this kind of scope feel right before I put together a more detailed design doc? I want to make sure it's aligned with "modular, consistent with the wider project" before investing the effort. Also curious: is there a preferred way to share design proposals — GitHub Discussion, Issue, or PR with a docs/proposals folder?
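One way to keep the layer modular, sketched here purely as an assumption about what the spec might propose (the names `QueueBackend` and `InMemoryQueue` are hypothetical, not existing toolkit APIs): the service depends only on a small protocol, so the in-memory backend can later be swapped for Redis without touching the rest of the code.

```python
import asyncio
from typing import Protocol

class QueueBackend(Protocol):
    """Minimal contract the service would code against."""
    async def put(self, task: dict) -> None: ...
    async def get(self) -> dict: ...

class InMemoryQueue:
    """V0.1-style backend built on asyncio.Queue; no external dependencies."""

    def __init__(self, maxsize: int = 0) -> None:
        self._q: asyncio.Queue = asyncio.Queue(maxsize=maxsize)

    async def put(self, task: dict) -> None:
        await self._q.put(task)

    async def get(self) -> dict:
        return await self._q.get()

async def demo_backend() -> dict:
    backend: QueueBackend = InMemoryQueue()
    await backend.put({"id": "t1", "payload": "hello"})
    return await backend.get()
```

A Redis-backed class implementing the same two methods would then be a drop-in replacement, which is the kind of modularity the maintainer asked for.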
---
Absolutely, that's a great approach: start minimal, iterate based on real usage. Here's how I'd scope the initial version:

V0.1 — Minimal Queue (asyncio.Queue)

V0.2 — Optional Redis backend

I'll draft this as a proper design doc (markdown, PR-ready) so it's easy to review inline. Will aim to have it up within a few days. Should I submit it as a PR to …?

One thing I've been exploring in my work on agent infrastructure: task queues become really interesting when you layer in cost attribution, e.g., tracking compute cost per task for billing/marketplace scenarios. Happy to include a section on how the interface could support that extensibility without adding complexity to V0.1.
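For the cost-attribution idea, one lightweight shape it could take, offered only as an illustrative sketch (the `Task` record and the per-token rates are assumptions, not part of any V0.1 proposal): the queue just moves task records, and each record carries enough metadata for accounting to be done elsewhere.

```python
import time
from dataclasses import dataclass, field

@dataclass
class Task:
    """Queue payload with optional cost-attribution metadata attached."""
    task_id: str
    payload: str
    enqueued_at: float = field(default_factory=time.monotonic)
    input_tokens: int = 0    # filled in after the agent run completes
    output_tokens: int = 0

    def cost_usd(self, in_rate: float, out_rate: float) -> float:
        # Per-token USD rates are caller-supplied assumptions, not real prices.
        return self.input_tokens * in_rate + self.output_tokens * out_rate
```

The queue interface stays oblivious to these fields, so V0.1 wouldn't need to change; billing or marketplace code would simply read the metadata after a task finishes.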
---
Spec is up as a PR: #296. Followed your guidance (starts with …). Happy to iterate on the design before moving to implementation. Let me know what you think!