Question about Context Trimming and Token Limits #235
Unanswered
e-razdumina asked this question in Q&A
Replies: 1 comment

Question (e-razdumina):

Hi! Thank you for the amazing work - it’s been incredibly helpful and inspiring. 🙌
I’ve been exploring a question and would love to get your thoughts:
What’s your take on trimming context or bounding models with max_tokens to manage performance and response quality?
Also, I wasn’t quite sure: is this already handled somewhere in the project, or is it something that needs tuning on the integration side?
Looking forward to your insights!

Answer:

Hi @e-razdumina, thanks for the question and the kind words! I don't have any particularly novel insights to share on this topic; there are a lot of guides and discussions online, and ultimately I think it depends on the particular application. In terms of how to handle it in the project:

Hope it helps!
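For readers landing on this thread, the kind of context trimming the question asks about is often done by dropping the oldest non-system messages until the conversation fits a token budget. The sketch below is a hypothetical illustration, not this project's implementation: the message dict format and the rough 4-characters-per-token estimate are assumptions, and a real integration would count tokens with the model's actual tokenizer.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate: roughly 4 characters per token for English
    text. This is a stand-in heuristic; use the model's tokenizer in a
    real integration."""
    return max(1, len(text) // 4)


def trim_context(messages: list[dict], max_tokens: int) -> list[dict]:
    """Keep any system messages plus the most recent other messages whose
    combined estimated size fits within max_tokens.

    `messages` is assumed to be a list of {"role": ..., "content": ...}
    dicts in chronological order (a common chat-API shape, not anything
    specific to this project)."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]

    # Reserve budget for the system messages, which are always kept.
    budget = max_tokens - sum(estimate_tokens(m["content"]) for m in system)

    kept: list[dict] = []
    for m in reversed(rest):  # walk newest-to-oldest
        cost = estimate_tokens(m["content"])
        if cost > budget:
            break  # everything older is dropped too
        kept.append(m)
        budget -= cost

    return system + list(reversed(kept))
```

Bounding the output with max_tokens is a separate, complementary knob: trimming controls how much goes in, max_tokens caps how much comes out, and both are typically tuned on the integration side per application.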