fix(gateway): add inactivity timeout for /background tasks #8298

Open
jooray wants to merge 1 commit into NousResearch:main from jooray:fix/background-task-timeout

Conversation


jooray commented Apr 12, 2026

Summary

Background tasks spawned by /background had two ways to silently drop results:

  1. No inactivity timeout: Regular agent sessions have a configurable inactivity timeout (default 1800s) that kills stuck agents and notifies the user. Background tasks used a bare await loop.run_in_executor(None, run_sync) with no timeout — a hung API call or stuck tool would run forever with no user notification.

  2. CancelledError not caught: On gateway restart/shutdown, background asyncio tasks are cancelled. asyncio.CancelledError inherits from BaseException (not Exception), so the existing error handler never fired and the task disappeared silently.
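The second failure mode can be reproduced in isolation. The following is a minimal sketch, assuming Python 3.8 or later (where asyncio.CancelledError derives from BaseException). The `worker` coroutine and the `events` list are hypothetical names used only for illustration; they are not part of the gateway code.

```python
import asyncio

events = []


async def worker():
    try:
        await asyncio.sleep(3600)  # stands in for a long-running background task
    except Exception:
        # Never reached on cancellation: CancelledError is not an Exception.
        events.append("except Exception")
    except asyncio.CancelledError:
        # A dedicated handler is the only place a notification could be sent.
        events.append("except CancelledError")
        raise  # re-raise so the task is still reported as cancelled


async def main():
    task = asyncio.ensure_future(worker())
    await asyncio.sleep(0)  # let the worker start and reach its await
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return task.cancelled()


cancelled = asyncio.run(main())
```

Only the dedicated `except asyncio.CancelledError` branch runs; the `except Exception` branch is skipped, which matches the silent disappearance described above.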

Changes

  • Add the same inactivity polling loop used by regular sessions, checking agent.get_activity_summary() every 5s against HERMES_AGENT_TIMEOUT (default 1800s)
  • Fire a warning message at HERMES_AGENT_TIMEOUT_WARNING (default 900s)
  • On timeout, interrupt the agent and send diagnostic info to the user (timeout duration, last active tool)
  • Catch asyncio.CancelledError separately to send a best-effort notification on gateway shutdown

No behavioral change when background tasks complete normally.
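The changes listed above can be sketched as a self-contained polling loop. This is an illustration under stated assumptions, not the code from this PR: `FakeAgent`, its `interrupt()` method, `run_with_inactivity_timeout`, and `notify` are hypothetical names, and the timings are shortened to fractions of a second. Only `get_activity_summary()` and its `seconds_since_activity` key come from the PR itself.

```python
import asyncio
import threading
import time


class FakeAgent:
    """Hypothetical stand-in for the real agent; mimics only what the loop needs."""

    def __init__(self):
        self._last_activity = time.monotonic()
        self._stop = threading.Event()

    def get_activity_summary(self):
        return {"seconds_since_activity": time.monotonic() - self._last_activity}

    def interrupt(self):  # assumed name; the real interrupt API may differ
        self._stop.set()

    def run_sync(self):
        self._stop.wait()  # simulates a hung call that blocks until interrupted
        return "interrupted"


async def run_with_inactivity_timeout(agent, timeout, warning, poll_interval, notify):
    loop = asyncio.get_running_loop()
    task = asyncio.ensure_future(loop.run_in_executor(None, agent.run_sync))
    warned = False
    while True:
        done, _ = await asyncio.wait({task}, timeout=poll_interval)
        if done:
            return task.result()
        if timeout is None:  # unlimited mode: keep polling for completion only
            continue
        idle = agent.get_activity_summary().get("seconds_since_activity", 0.0)
        if not warned and warning is not None and idle >= warning:
            warned = True  # the warning fires at most once
            await notify("warning")
        if idle >= timeout:
            agent.interrupt()
            await task  # let the worker thread unwind after the interrupt
            await notify("timeout")
            return None


messages = []


async def notify(text):
    messages.append(text)


result = asyncio.run(
    run_with_inactivity_timeout(
        FakeAgent(), timeout=0.3, warning=0.1, poll_interval=0.05, notify=notify
    )
)
```

Because the loop measures idle time rather than total runtime, a task that keeps reporting activity is never interrupted, however long it runs.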

Test plan

  • Run /background with a prompt that completes normally — verify result is still delivered
  • Run /background with a task likely to hang — verify the warning after 15 min of inactivity and the timeout after 30 min of inactivity (the defaults)
  • Restart the gateway while a /background task is running — verify cancellation message is sent
  • Set HERMES_AGENT_TIMEOUT=0 — verify background tasks run without timeout (unlimited mode)
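The unlimited-mode case in the last item depends on how the environment variables are turned into thresholds. Here is one possible sketch, assuming that 0 maps to None (disabled) and that unparseable values fall back to the default. `read_timeout` is a hypothetical helper, and both of those rules are assumptions rather than confirmed gateway behavior; only the variable names and defaults come from the PR description.

```python
import os


def read_timeout(name, default, env=None):
    """Hypothetical helper: return the threshold in seconds, or None if disabled."""
    env = os.environ if env is None else env
    raw = env.get(name, "").strip()
    try:
        value = float(raw) if raw else float(default)
    except ValueError:
        value = float(default)  # assumption: bad input falls back to the default
    # Assumption: zero (or a negative value) disables the check entirely.
    return value if value > 0 else None


timeout = read_timeout("HERMES_AGENT_TIMEOUT", 1800)
warning = read_timeout("HERMES_AGENT_TIMEOUT_WARNING", 900)
```

Returning None for the disabled case lets the polling loop skip the idle checks with a single `is None` test.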

Copilot AI review requested due to automatic review settings April 12, 2026 09:45

Copilot AI left a comment


Pull request overview

This PR adds inactivity-based timeout handling and cancellation reporting for /background-spawned agent runs in the gateway, aligning background behavior with the existing inactivity timeout logic used for regular sessions.

Changes:

  • Add inactivity polling for background tasks using agent.get_activity_summary() with HERMES_AGENT_TIMEOUT and HERMES_AGENT_TIMEOUT_WARNING.
  • Send a one-time warning message before timing out, then interrupt the agent and report diagnostics on timeout.
  • Catch asyncio.CancelledError to best-effort notify the user when background work is cancelled during gateway restart/shutdown.

Comment thread gateway/run.py
Comment on lines +5377 to +5381
            source.chat_id,
            "\n".join(_diag_parts),
            metadata=_thread_metadata,
        )
        return
Comment thread gateway/run.py
Comment on lines +5436 to +5440
    except asyncio.CancelledError:
        logger.warning("Background task %s cancelled (gateway shutdown?)", task_id)
        try:
            await adapter.send(
                chat_id=source.chat_id,
Comment thread gateway/run.py
Comment on lines +5307 to +5347
    _executor_task = asyncio.ensure_future(
        loop.run_in_executor(None, run_sync)
    )

    _inactivity_timeout = False
    result = None
    while True:
        done, _ = await asyncio.wait(
            {_executor_task}, timeout=_POLL_INTERVAL
        )
        if done:
            result = _executor_task.result()
            break
        if _bg_timeout is None:
            continue
        _bg_agent = agent_holder[0]
        _idle_secs = 0.0
        if _bg_agent and hasattr(_bg_agent, "get_activity_summary"):
            try:
                _act = _bg_agent.get_activity_summary()
                _idle_secs = _act.get("seconds_since_activity", 0.0)
            except Exception:
                pass
        if (not _bg_warning_fired and _bg_warning is not None
                and _idle_secs >= _bg_warning):
            _bg_warning_fired = True
            _elapsed_warn = int(_bg_warning // 60) or 1
            try:
                await adapter.send(
                    source.chat_id,
                    f"⚠️ Background task {task_id}: no activity for "
                    f"{_elapsed_warn} min. Will time out soon if it "
                    f"remains idle.",
                    metadata=_thread_metadata,
                )
            except Exception:
                pass
        if _idle_secs >= _bg_timeout:
            _inactivity_timeout = True
            break
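The loop above polls with asyncio.wait rather than asyncio.wait_for. The difference matters: when its timeout elapses, asyncio.wait simply returns and leaves the task running, whereas asyncio.wait_for cancels the awaited task. A small sketch of that difference, where `slow` is a hypothetical coroutine standing in for the executor future:

```python
import asyncio


async def slow():
    await asyncio.sleep(0.2)
    return "finished"


async def main():
    # asyncio.wait: the timeout only ends the wait; the task keeps running.
    t1 = asyncio.ensure_future(slow())
    done, pending = await asyncio.wait({t1}, timeout=0.05)
    wait_left_running = (not done) and (t1 in pending) and not t1.cancelled()
    first = await t1  # the same task can still be awaited to completion

    # asyncio.wait_for: the timeout cancels the awaited task.
    t2 = asyncio.ensure_future(slow())
    try:
        await asyncio.wait_for(t2, timeout=0.05)
    except asyncio.TimeoutError:
        pass
    return wait_left_running, first, t2.cancelled()


outcome = asyncio.run(main())
```

Polling with asyncio.wait is what lets a single task be checked repeatedly for inactivity without being torn down between checks.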

@jooray jooray force-pushed the fix/background-task-timeout branch 2 times, most recently from be70313 to 5cf7d8b Compare April 17, 2026 21:34
…ckground tasks

Background tasks (/background) previously had no inactivity timeout,
unlike regular agent sessions. A hung API call or stuck tool would
run forever with no user notification. Additionally, gateway
restart/shutdown cancelled tasks silently (CancelledError is
BaseException, not caught by except Exception).

Changes:
- Add polling loop with same HERMES_AGENT_TIMEOUT inactivity detection
  used by regular sessions (default 1800s)
- Fire a warning at HERMES_AGENT_TIMEOUT_WARNING (default 900s)
- On timeout, interrupt the agent and notify the user with diagnostics
- Catch asyncio.CancelledError to notify user on gateway shutdown
@jooray jooray force-pushed the fix/background-task-timeout branch from 5cf7d8b to a6ec52d Compare April 20, 2026 09:13
