refactor(ci): rewrite fix-dependabot to capture all CI failures by umair-ably · Pull Request #333 · ably/ably-cli

umair-ably · 2026-04-15T17:40:28Z

Summary

Rewrites the Fix Dependabot PRs workflow from a single job that duplicated build/lint/test internally to a two-job architecture that waits for all CI workflows to complete and captures their failures.

Problem

The previous workflow ran its own build, lint, and unit test steps. When those passed, it assumed everything was fine — but other CI workflows (E2E CLI, Web CLI Playwright E2E, Security Audit) run separately. Their failures were invisible to the Claude fixer.

Example: PR #332 had a React useState duplication bug caught by Playwright E2E tests, but Claude was never invoked because the unit tests passed within the fix-dependabot workflow.

Solution

Job 1 (regen-lockfile): Same as before — guard for dependabot PRs, regenerate pnpm-lock.yaml, commit + push. Outputs the HEAD SHA.
Job 2 (fix-failures): Polls the GitHub check runs API on the HEAD SHA every 30s, waiting for all other CI workflows to complete. If any fail, fetches their logs via gh run view --log-failed and passes everything to Claude Code Action in one shot.

Key design decisions

Removes duplicated work: No more internal build/lint/test steps — we rely on the real CI workflows instead
Polls check runs API: Waits for at least 3 of 4 core CI checks (test, e2e-cli, setup, audit) to appear, then waits for all to complete
Skips non-CI checks: Filters out own workflow jobs, Vercel deployments, and PR tooling (claude-review, PR overview)
25-minute polling timeout: Leaves ~15 minutes for Claude within the 45-minute job timeout
Concurrency group: Prevents duplicate polling when the lockfile push re-triggers this workflow
Initial 60s wait: Gives CI checks time to be queued after the lockfile push
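The filtering and readiness rules in the list above can be sketched as follows. This is a minimal illustration, not the workflow's actual code: the check names come from this description, but the exact SKIP_PATTERN regex, the `name<TAB>status` input format, and the function names are assumptions made for the example.

```shell
#!/usr/bin/env bash
# Sketch only: regex, input format and function names are assumptions.
EXPECTED_CHECKS=(test e2e-cli setup audit)
MIN_EXPECTED=3
SKIP_PATTERN='^(regen-lockfile|fix-failures|Vercel.*|claude-review|Generate PR Overview)$'

# Reads "name<TAB>status" lines on stdin and drops checks whose name
# matches SKIP_PATTERN (own jobs, Vercel deployments, PR tooling).
filter_ci_checks() {
  awk -F'\t' -v pat="$SKIP_PATTERN" '$1 !~ pat'
}

# Succeeds once at least MIN_EXPECTED of the expected checks have appeared
# and no remaining check has a status other than "completed".
checks_ready() {
  local checks="$1" seen=0 name
  for name in "${EXPECTED_CHECKS[@]}"; do
    if printf '%s\n' "$checks" |
      awk -F'\t' -v n="$name" '$1 == n { found = 1 } END { exit !found }'; then
      seen=$((seen + 1))
    fi
  done
  (( seen >= MIN_EXPECTED )) || return 1
  # awk exits 0 when something is still pending, so negate the result.
  ! printf '%s\n' "$checks" |
    awk -F'\t' 'NF && $2 != "completed" { pending = 1 } END { exit !pending }'
}
```

Filtering before the readiness test matters: a skipped check that never completes (for example this workflow's own job) would otherwise keep the poller waiting until the timeout.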

Test plan

Verify the workflow triggers correctly on a dependabot PR
Verify the polling correctly waits for and detects CI check completions
Verify failed check logs are collected and passed to Claude
Verify the concurrency group cancels stale runs when re-triggered

🤖 Generated with Claude Code

Instead of duplicating build/lint/test steps internally, the workflow now polls the GitHub check runs API to wait for all other CI workflows (unit tests, E2E CLI, Web CLI E2E, security audit) to complete, then collects failure logs and passes them to Claude in one shot. This fixes the gap where Playwright E2E failures (e.g., React useState duplication from dependency bumps) were invisible to the Claude fixer.

Structure:
- Job 1 (regen-lockfile): guard + regen pnpm-lock.yaml + push
- Job 2 (fix-failures): poll check runs API, collect failures, invoke Claude

Also adds concurrency group to prevent duplicate polling when the lockfile push re-triggers the workflow.

vercel · 2026-04-15T17:40:33Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
cli-web-cli	Ready	Preview, Comment	Apr 15, 2026 6:06pm

claude-code-ably-assistant · 2026-04-15T17:41:22Z

Walkthrough

This PR rewrites the Fix Dependabot PRs CI workflow from a single self-contained job that ran its own build/lint/test steps into a two-job architecture. Job 1 regenerates the lockfile and captures the HEAD SHA; Job 2 polls the GitHub check-runs API until all other CI workflows complete, then collects failure logs and passes them to Claude Code Action for repair. The motivation is that the old workflow missed failures in separately-running workflows (E2E CLI, Playwright Web CLI, security audit) — as demonstrated by PR #332 where a React bug slipped through.

Changes

Area	Files	Summary
Config / CI	`.github/workflows/dependabot-lockfile.yml`	Full rewrite: split into `regen-lockfile` + `fix-failures` jobs; replace internal build/lint/test steps with polling the check-runs API; add concurrency group and `checks: read` permission

Review Notes

Behavioral change: The workflow no longer runs pnpm install, pnpm build, pnpm exec eslint ., or pnpm test:unit itself — it relies entirely on the existing CI workflows. If a check is added/renamed in CI, the EXPECTED_CHECKS array and SKIP_PATTERN regex in the polling step may need updating.
New permission: checks: read added at the workflow level to allow polling the check-runs API.
Concurrency group: cancel-in-progress: true cancels stale polling runs when the lockfile push re-triggers the workflow — reviewers should confirm this is the desired behaviour (i.e., only the latest run should attempt fixes).
Timeout budget: fix-failures job has a 45-minute timeout, with 25 minutes reserved for polling and ~15 minutes left for Claude. If any CI workflow routinely takes >25 minutes, the poller will time out with a warning rather than an error — Claude won't be invoked.
Skip logic for checks: The SKIP_PATTERN regex filters out regen-lockfile, fix-failures, Vercel, claude-review, and Generate PR Overview. Any new PR-level check added in future should be added here to avoid it blocking the poller.
Log truncation: Each failed workflow's logs are capped at tail -n 500 lines before being written to $GITHUB_OUTPUT. Very long failure outputs will be silently truncated.
actions/create-github-app-token@v3: Introduced in fix-failures (not present in the old single job). Ensure CI_APP_ID and CI_APP_PRIVATE_KEY secrets are available in the repo/org scope.
No source-code changes: All modifications are confined to the single workflow YAML file — no TypeScript, tests, or docs affected.
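On the log truncation note above: the workflow as described applies a plain `tail -n 500`, so dropped lines leave no trace. One possible variant, shown here only as a sketch, prepends a notice when lines were cut. The function name `cap_log` and the notice text are hypothetical and not part of the PR.

```shell
#!/usr/bin/env bash
# Sketch: keep the last N lines of a log read from stdin and say so when
# earlier lines were dropped. cap_log is a hypothetical helper.
cap_log() {
  local max_lines="${1:-500}" log total
  log=$(cat)
  total=$(printf '%s\n' "$log" | wc -l | tr -d ' ')
  if (( total > max_lines )); then
    printf '[truncated: showing last %d of %d lines]\n' "$max_lines" "$total"
  fi
  printf '%s\n' "$log" | tail -n "$max_lines"
}
```

Usage would look like `gh run view "$run_id" --log-failed | cap_log 500`, where `run_id` stands for whatever variable holds the failed run's ID.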

🤖 Generated with Claude Code

claude-code-ably-assistant

Review summary

Architecture: solid. Moving from duplicated internal build/lint/test to polling real CI results directly solves the problem from PR #332. The two-job split, concurrency group, and filtering logic are all well-thought-out.

Two correctness bugs to address before merging.

Bug 1: Heredoc delimiter collision (medium risk)

The failure_logs and failure_summary step outputs use fixed heredoc delimiters (ENDOFLOGS, ENDOFFAILURES) to write multi-line values to $GITHUB_OUTPUT. The content of failure_logs comes from raw gh run view --log-failed output. If any CI log line contains exactly ENDOFLOGS, GitHub Actions closes the multiline value early and Claude receives a truncated log missing the actual failure cause.

Fix - use a randomised delimiter that cannot appear in log output:

delimiter="EOF_$(openssl rand -hex 16)"
{ echo "failure_logs<<${delimiter}"; echo "$failure_logs"; echo "${delimiter}"; } >> "$GITHUB_OUTPUT"

Apply the same pattern to failure_summary.
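Since both outputs need the same treatment, the pattern could be wrapped in a small helper. The sketch below is an assumption about how that might look, not code from the PR: the name `write_multiline_output` is hypothetical, and it reads `/dev/urandom` through `od` instead of calling `openssl rand` so that it has no extra dependency.

```shell
#!/usr/bin/env bash
# Sketch: append a multi-line value to the file named by $GITHUB_OUTPUT
# using a random delimiter. write_multiline_output is a hypothetical name.
write_multiline_output() {
  local name="$1" value="$2" delimiter
  # 16 random bytes as 32 hex characters (stand-in for `openssl rand -hex 16`).
  delimiter="EOF_$(od -An -N16 -tx1 /dev/urandom | tr -d ' \n')"
  # Practically impossible, but refuse to write if the value contains it.
  if [[ "$value" == *"$delimiter"* ]]; then
    echo "::error::delimiter collision while writing $name" >&2
    return 1
  fi
  {
    printf '%s<<%s\n' "$name" "$delimiter"
    printf '%s\n' "$value"
    printf '%s\n' "$delimiter"
  } >> "$GITHUB_OUTPUT"
}
```

With this helper, a log line that happens to read ENDOFLOGS is written through unchanged, because the closing delimiter is generated fresh on every call.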

Bug 2: Silent non-action when the API fails during polling

If every API call inside the polling loop fails (transient outage, rate limit, permissions), each iteration takes the continue path and ci_checks is never populated. After the timeout break, the failure-collection code runs against an empty ci_checks variable: failed_count=0, and the step exits 0 with the message "All checks passed! Nothing to fix."

Real CI failures are silently missed - the job exits green without invoking Claude.

Fix - after the loop, fail explicitly when timing out with no check data received:

if [[ $elapsed -ge $MAX_POLL_TIME && -z "$ci_checks" ]]; then
  echo "::error::Timed out waiting for CI checks - no check data received"
  exit 1
fi
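To show where that guard sits relative to the loop, here is a self-contained sketch of the polling flow. It is a simplified illustration under stated assumptions: `fetch_checks` is a hypothetical stand-in for the real check-runs API call, its tab-separated output is invented for the example, and the real workflow's readiness test (expected checks, skip pattern) is reduced to "nothing queued or in progress".

```shell
#!/usr/bin/env bash
# Sketch of the polling loop with an explicit "no data" failure.
# fetch_checks is hypothetical; the defaults mirror the 30s interval and
# 25-minute limit described in the PR.
POLL_INTERVAL="${POLL_INTERVAL:-30}"
MAX_POLL_TIME="${MAX_POLL_TIME:-1500}"

poll_ci_checks() {
  local elapsed=0 response
  ci_checks=""   # deliberately global so later steps can read it
  while (( elapsed < MAX_POLL_TIME )); do
    # A failed or empty response takes the "continue" path from Bug 2.
    if ! response=$(fetch_checks) || [[ -z "$response" ]]; then
      echo "::warning::check-runs request failed or returned no data" >&2
      sleep "$POLL_INTERVAL"
      elapsed=$((elapsed + POLL_INTERVAL))
      continue
    fi
    ci_checks="$response"
    # Done once no check is queued or in progress.
    if ! grep -Eq 'queued|in_progress' <<<"$ci_checks"; then
      return 0
    fi
    sleep "$POLL_INTERVAL"
    elapsed=$((elapsed + POLL_INTERVAL))
  done
  if [[ -z "$ci_checks" ]]; then
    echo "::error::Timed out waiting for CI checks - no check data received" >&2
    return 1
  fi
  echo "::warning::Timed out with checks still pending" >&2
  return 0
}
```

The key point is that "timed out with data" and "timed out with no data" are different outcomes: only the second one means the job has no basis for reporting success.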

Minor (no change needed): The failure_logs prompt expansion is subject to the standard GHA }} sequence issue if logs contain that string, but this was also present in the old workflow.

- Use randomised EOF delimiters for GITHUB_OUTPUT heredocs to prevent collision with raw CI log content truncating the output early
- Fail explicitly (exit 1) when the polling loop times out without ever receiving check data, instead of silently reporting success

- Add pnpm/Node.js setup to fix-failures job so Claude can run build/lint/test commands (critical: was missing entirely)
- Use Vercel.* prefix match in SKIP_PATTERN for resilience
- Add generate-overview fallback to SKIP_PATTERN
- Include cancelled checks in failure detection
- Add run URL to failure logs for manual inspection
- Log pending checks at polling timeout for debugging
- Add SHA context logging
- Default failed_count=0 at step start
- Document EXPECTED_CHECKS source workflows
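As an illustration of counting cancelled checks as failures and defaulting the count to zero, a sketch might look like the following. The function names and the `name<TAB>conclusion` input format are assumptions for the example. Only the `failure` and `cancelled` conclusions named in this PR are treated as failures here; other conclusions could be added to the `case` pattern if needed.

```shell
#!/usr/bin/env bash
# Sketch: classify check conclusions and count failures.
# Function names and the input format are hypothetical.
is_failed_conclusion() {
  case "$1" in
    failure|cancelled) return 0 ;;
    *) return 1 ;;
  esac
}

# Reads "name<TAB>conclusion" lines on stdin and prints the failure count.
count_failed() {
  local count=0 name conclusion   # count starts at 0, so empty input yields 0
  while IFS=$'\t' read -r name conclusion; do
    if is_failed_conclusion "$conclusion"; then
      count=$((count + 1))
    fi
  done
  echo "$count"
}
```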

sacOO7 · 2026-04-15T18:08:58Z

Review feedback:

Add pnpm/Node.js setup to fix-failures job
Replace setup with a real test check in EXPECTED_CHECKS
Make SKIP_PATTERN more resilient with prefix matching

sacOO7

LGTM

claude-code-ably-assistant bot reviewed Apr 15, 2026

umair-ably requested a review from sacOO7 April 15, 2026 18:06

sacOO7 approved these changes Apr 15, 2026

umair-ably merged commit 41e662e into main Apr 15, 2026
8 of 9 checks passed

umair-ably deleted the refactor/dependabot-workflow-rewrite branch April 15, 2026 18:11

umair-ably merged 3 commits into main from refactor/dependabot-workflow-rewrite
