Feature Area
Other (please specify in additional context)
Is your feature request related to an existing bug? Please link it here.
NA
Describe the solution you'd like
The stop_reason field on Anthropic's Message response object is already accessible within _handle_completion() and _handle_tool_use_conversation() immediately after the API call — but is never read. Both methods extract token usage from the response and then discard the object, so the truncation signal is permanently lost before it can reach any hook, callback, or event subscriber.
The minimal fix is a logging.warning() call at two points where the raw response is still in scope:
Location 1 — _handle_completion(), after _track_token_usage_internal(usage):
if getattr(response, "stop_reason", None) == "max_tokens":
    agent_hint = f" [{from_agent.role}]" if from_agent else ""
    logging.warning(
        f"Truncated response{agent_hint}: stop_reason='max_tokens'. "
        f"Consider increasing max_tokens (current: {self.max_tokens})."
    )
Location 2 — _handle_tool_use_conversation(), after _track_token_usage_internal(follow_up_usage):
if getattr(final_response, "stop_reason", None) == "max_tokens":
    agent_hint = f" [{from_agent.role}]" if from_agent else ""
    logging.warning(
        f"Truncated response{agent_hint}: stop_reason='max_tokens'. "
        f"Consider increasing max_tokens (current: {self.max_tokens})."
    )
Location 2 is the more critical path — it handles the final synthesis response after all tool calls complete, which is where silent truncation caused significant downstream data corruption in our use case.
A more complete solution would also add stop_reason: str | None as a field on LLMCallCompletedEvent, allowing downstream subscribers (hooks, event listeners) to react to truncation programmatically rather than only through log monitoring.
The same fix should be applied to the async counterparts: _ahandle_completion() and _ahandle_tool_use_conversation().
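To make the event-field proposal concrete, here is a rough sketch of what a subscriber could do once stop_reason is carried on the event. Only LLMCallCompletedEvent and the proposed stop_reason field come from this issue; the other fields and the handler are illustrative placeholders, not crewai's actual definitions:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LLMCallCompletedEvent:
    # Placeholder sketch — crewai's real event carries more fields.
    response: str
    stop_reason: Optional[str] = None  # e.g. "end_turn", "max_tokens"


def on_llm_call_completed(event: LLMCallCompletedEvent) -> None:
    # A downstream subscriber can now fail fast on truncation instead of
    # relying on log monitoring.
    if event.stop_reason == "max_tokens":
        raise RuntimeError("LLM response was truncated; aborting pipeline")
```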
Describe alternatives you've considered
- after_llm_call hook with token ratio heuristic: The hook receives context.agent.role and context.llm._token_usage, so one could warn when completion_tokens ≥ max_tokens × 0.95. This works as a proxy but is not the same as checking stop_reason directly — it can produce false positives and misses cases where max_tokens is not explicitly set.
- Monkey-patching self.client.messages.create: Wrapping the Anthropic SDK client on each AnthropicCompletion instance post-init to intercept the raw response. This works today without any framework changes but is fragile to internal refactoring and not a suitable permanent solution.
- Content sentinel / downstream heuristics: Checking whether the response ends mid-sentence or lacks expected structural markers. Too unreliable for production use — the failure mode (silent truncation mid-table) produces syntactically valid but semantically incomplete output that passes all surface checks.
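For reference, the monkey-patching workaround above can be sketched generically as a wrapper around any messages.create-style callable. This assumes only that the wrapped callable returns an object exposing a stop_reason attribute, as Anthropic's Message response does; the function name and the patching snippet in the trailing comment are illustrative, not part of any framework API:

```python
import logging
from functools import wraps


def wrap_messages_create(create_fn, max_tokens):
    """Wrap an Anthropic-style messages.create callable to surface truncation.

    Sketch only: assumes the wrapped callable returns an object with a
    stop_reason attribute, as Anthropic's Message response does.
    """
    @wraps(create_fn)
    def wrapped(*args, **kwargs):
        response = create_fn(*args, **kwargs)
        if getattr(response, "stop_reason", None) == "max_tokens":
            logging.warning(
                "Truncated response: stop_reason='max_tokens'. "
                f"Consider increasing max_tokens (current: {max_tokens})."
            )
        return response  # pass the raw response through unchanged
    return wrapped


# Post-init patching of a specific client instance (fragile, as noted above):
# completion.client.messages.create = wrap_messages_create(
#     completion.client.messages.create, completion.max_tokens
# )
```

This keeps the workaround contained to one instance and is easy to delete once the framework reads stop_reason itself.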
Additional context
LLM provider / response observability
re: Willingness to Contribute (below):
I'm happy to submit a pull request for the minimal fix (logging warning at both sync locations). Happy to also include the async paths and/or the LLMCallCompletedEvent field addition if that's the preferred direction — just let me know in the issue before I open it.
However, this would be my first crewai pull request.
Willingness to Contribute
Yes, I'd be happy to submit a pull request