fix(gateway): add inactivity timeout for /background tasks #8298
Open
jooray wants to merge 1 commit into NousResearch:main from
Contributor
Pull request overview
This PR adds inactivity-based timeout handling and cancellation reporting for /background-spawned agent runs in the gateway, aligning background behavior with the existing inactivity timeout logic used for regular sessions.
Changes:
- Add inactivity polling for background tasks using `agent.get_activity_summary()` with `HERMES_AGENT_TIMEOUT` and `HERMES_AGENT_TIMEOUT_WARNING`.
- Send a one-time warning message before timing out, then interrupt the agent and report diagnostics on timeout.
- Catch `asyncio.CancelledError` to best-effort notify the user when background work is cancelled during gateway restart/shutdown.
Comment on lines +5377 to +5381

            source.chat_id,
            "\n".join(_diag_parts),
            metadata=_thread_metadata,
        )
        return
Comment on lines +5436 to +5440

    except asyncio.CancelledError:
        logger.warning("Background task %s cancelled (gateway shutdown?)", task_id)
        try:
            await adapter.send(
                chat_id=source.chat_id,
Comment on lines +5307 to +5347

    _executor_task = asyncio.ensure_future(
        loop.run_in_executor(None, run_sync)
    )

    _inactivity_timeout = False
    result = None
    while True:
        done, _ = await asyncio.wait(
            {_executor_task}, timeout=_POLL_INTERVAL
        )
        if done:
            result = _executor_task.result()
            break
        if _bg_timeout is None:
            continue
        _bg_agent = agent_holder[0]
        _idle_secs = 0.0
        if _bg_agent and hasattr(_bg_agent, "get_activity_summary"):
            try:
                _act = _bg_agent.get_activity_summary()
                _idle_secs = _act.get("seconds_since_activity", 0.0)
            except Exception:
                pass
        if (not _bg_warning_fired and _bg_warning is not None
                and _idle_secs >= _bg_warning):
            _bg_warning_fired = True
            _elapsed_warn = int(_bg_warning // 60) or 1
            try:
                await adapter.send(
                    source.chat_id,
                    f"⚠️ Background task {task_id}: no activity for "
                    f"{_elapsed_warn} min. Will time out soon if it "
                    f"remains idle.",
                    metadata=_thread_metadata,
                )
            except Exception:
                pass
        if _idle_secs >= _bg_timeout:
            _inactivity_timeout = True
            break
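The polling pattern in the excerpt above can be tried in isolation. The following is a minimal sketch, not the gateway's code: `FakeAgent` and `run_with_inactivity_timeout` are hypothetical names introduced for illustration, and only the `get_activity_summary()` method and its `seconds_since_activity` key are taken from the diff.

```python
import asyncio
import time


class FakeAgent:
    """Hypothetical stand-in for the real agent; it only mimics
    get_activity_summary() and never records new activity."""

    def __init__(self) -> None:
        self._last_activity = time.monotonic()

    def get_activity_summary(self) -> dict:
        return {"seconds_since_activity": time.monotonic() - self._last_activity}


async def run_with_inactivity_timeout(run_sync, agent, timeout, poll_interval=0.05):
    """Run run_sync in the default executor and poll for idleness.

    Returns (result, timed_out). timeout=None disables the idle check,
    mirroring the `_bg_timeout is None` branch in the diff.
    """
    loop = asyncio.get_running_loop()
    task = asyncio.ensure_future(loop.run_in_executor(None, run_sync))
    while True:
        # Wake up every poll_interval seconds even if the work is not done.
        done, _ = await asyncio.wait({task}, timeout=poll_interval)
        if done:
            return task.result(), False
        if timeout is None:
            continue
        idle = agent.get_activity_summary().get("seconds_since_activity", 0.0)
        if idle >= timeout:
            # The worker thread cannot be killed from here; the PR
            # interrupts the agent at this point instead.
            return None, True
```

Calling `asyncio.run(run_with_inactivity_timeout(lambda: "ok", FakeAgent(), timeout=1.0))` returns as soon as the callable finishes, while a callable that blocks longer than `timeout` makes the loop exit with the timed-out flag set.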
be70313 to 5cf7d8b
…ckground tasks

Background tasks (/background) previously had no inactivity timeout, unlike regular agent sessions. A hung API call or stuck tool would run forever with no user notification. Additionally, gateway restart/shutdown cancelled tasks silently (CancelledError is a BaseException, not caught by except Exception).

Changes:
- Add polling loop with same HERMES_AGENT_TIMEOUT inactivity detection used by regular sessions (default 1800s)
- Fire a warning at HERMES_AGENT_TIMEOUT_WARNING (default 900s)
- On timeout, interrupt the agent and notify the user with diagnostics
- Catch asyncio.CancelledError to notify user on gateway shutdown
5cf7d8b to a6ec52d
Summary
Background tasks spawned by `/background` had two ways to silently drop results:
- No inactivity timeout: Regular agent sessions have a configurable inactivity timeout (default 1800s) that kills stuck agents and notifies the user. Background tasks used a bare `await loop.run_in_executor(None, run_sync)` with no timeout, so a hung API call or stuck tool would run forever with no user notification.
- CancelledError not caught: On gateway restart/shutdown, background asyncio tasks are cancelled. `asyncio.CancelledError` inherits from `BaseException` (not `Exception`), so the existing error handler never fired and the task disappeared silently.

Changes
- Poll `agent.get_activity_summary()` every 5s against `HERMES_AGENT_TIMEOUT` (default 1800s)
- Send a one-time warning at `HERMES_AGENT_TIMEOUT_WARNING` (default 900s)
- On timeout, interrupt the agent and notify the user with diagnostics
- Catch `asyncio.CancelledError` separately to send a best-effort notification on gateway shutdown

No behavioral change when background tasks complete normally.
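The `CancelledError` point can be checked with a few lines of standalone asyncio. This is an illustrative sketch rather than the gateway's handler: `worker` and `main` are hypothetical names, and re-raising after the notification is a general asyncio convention, not something this description states about the PR.

```python
import asyncio


async def worker(log: list) -> None:
    try:
        await asyncio.sleep(3600)  # stands in for the long background run
    except Exception:
        # Not reached on cancellation: CancelledError is not an Exception.
        log.append("except Exception")
    except asyncio.CancelledError:
        # A best-effort user notification would go here.
        log.append("except CancelledError")
        raise  # re-raise so the task still ends up cancelled


async def main() -> list:
    log: list = []
    task = asyncio.create_task(worker(log))
    await asyncio.sleep(0)  # give the worker a chance to start
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return log
```

Running `asyncio.run(main())` on Python 3.8 or later records only the `CancelledError` branch, which is why a handler that catches `Exception` alone misses gateway shutdown.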
Test plan
- Run `/background` with a prompt that completes normally; verify result is still delivered
- Run `/background` with a task likely to hang; verify warning at 15 min and timeout at 30 min
- Restart the gateway while a `/background` task is running; verify cancellation message is sent
- Set `HERMES_AGENT_TIMEOUT=0`; verify background tasks run without timeout (unlimited mode)
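The last test case relies on `0` meaning "unlimited", which the diff represents as a timeout of `None`. The gateway's actual parsing is not shown on this page, so the helper below is only a sketch of one way to get that behavior: `read_timeout` is a hypothetical name, and falling back to the default on an unparsable value is an assumption.

```python
import os
from typing import Mapping, Optional


def read_timeout(
    name: str, default: float, env: Mapping[str, str] = os.environ
) -> Optional[float]:
    """Return the timeout in seconds, or None for unlimited mode.

    Hypothetical helper: unset or unparsable values fall back to the
    default (an assumption), and values <= 0 disable the timeout.
    """
    raw = env.get(name)
    if raw is None or raw.strip() == "":
        return default
    try:
        value = float(raw)
    except ValueError:
        return default
    return value if value > 0 else None
```

For example, `read_timeout("HERMES_AGENT_TIMEOUT", 1800.0, {"HERMES_AGENT_TIMEOUT": "0"})` yields `None`, which the polling loop would treat as "never time out".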