refactor(ci): rewrite fix-dependabot to capture all CI failures #333
umair-ably merged 3 commits into main
Instead of duplicating build/lint/test steps internally, the workflow now polls the GitHub check runs API to wait for all other CI workflows (unit tests, E2E CLI, Web CLI E2E, security audit) to complete, then collects failure logs and passes them to Claude in one shot. This fixes the gap where Playwright E2E failures (e.g., React useState duplication from dependency bumps) were invisible to the Claude fixer.

Structure:
- Job 1 (regen-lockfile): guard + regen pnpm-lock.yaml + push
- Job 2 (fix-failures): poll check runs API, collect failures, invoke Claude

Also adds a concurrency group to prevent duplicate polling when the lockfile push re-triggers the workflow.
Review summary
Architecture: solid. Moving from duplicated internal build/lint/test to polling real CI results directly solves the problem from PR #332. The two-job split, concurrency group, and filtering logic are all well-thought-out.
Two correctness bugs to address before merging.
Bug 1: Heredoc delimiter collision (medium risk)
The failure_logs and failure_summary step outputs use fixed heredoc delimiters (ENDOFLOGS, ENDOFFAILURES) to write multi-line values to $GITHUB_OUTPUT. The content of failure_logs comes from raw gh run view --log-failed output. If any CI log line contains exactly ENDOFLOGS, GitHub Actions closes the multiline value early and Claude receives a truncated log missing the actual failure cause.
Fix - use a randomised delimiter that cannot appear in log output:
delimiter="EOF_$(openssl rand -hex 16)"
{ echo "failure_logs<<${delimiter}"; echo "$failure_logs"; echo "${delimiter}"; } >> "$GITHUB_OUTPUT"
Apply the same pattern to failure_summary.
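The randomised-delimiter fix can be wrapped in a small helper so that both outputs share it. The following is a minimal sketch, not the workflow's actual code: the function name write_multiline_output is hypothetical, and it reads random bytes from /dev/urandom via od rather than openssl purely so that the example has no extra dependency. It also regenerates the delimiter in the unlikely case that the value already contains it.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: append NAME<<DELIM / VALUE / DELIM to the file
# named by $GITHUB_OUTPUT, using a delimiter that cannot collide with VALUE.
write_multiline_output() {
  local name="$1" value="$2" delimiter
  while :; do
    # 16 random bytes rendered as 32 hex characters.
    delimiter="EOF_$(od -An -N16 -tx1 /dev/urandom | tr -d ' \n')"
    # Retry only if the value happens to contain the delimiter.
    if [[ "$value" != *"$delimiter"* ]]; then
      break
    fi
  done
  {
    printf '%s<<%s\n' "$name" "$delimiter"
    printf '%s\n' "$value"
    printf '%s\n' "$delimiter"
  } >> "$GITHUB_OUTPUT"
}
```

With this helper, a log line that reads exactly ENDOFLOGS is written through as ordinary content, because the closing delimiter is different on every call.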
Bug 2: Silent non-action when the API fails during polling
If every API call inside the polling loop fails (transient outage, rate limit, permissions), each iteration takes the continue path and ci_checks is never populated. After the timeout break, the failure-collection code runs against an empty ci_checks variable: failed_count=0, and the step exits 0 with the message "All checks passed! Nothing to fix."
Real CI failures are silently missed - the job exits green without invoking Claude.
Fix - after the loop, fail explicitly when timing out with no check data received:
if [[ $elapsed -ge $MAX_POLL_TIME && -z "$ci_checks" ]]; then
  echo "::error::Timed out waiting for CI checks - no check data received"
  exit 1
fi
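To show where that guard sits relative to the loop, here is a simplified sketch of a polling function under stated assumptions. The name poll_for_checks, the idea of passing the fetch command as arguments, and the one-line-per-check "name status" format are all hypothetical stand-ins; the real workflow queries the check runs API and has its own completion logic.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Usage: poll_for_checks MAX_POLL_TIME INTERVAL FETCH_COMMAND [ARGS...]
# FETCH_COMMAND is assumed to print one "name status" line per check
# and to exit non-zero when the API call fails.
poll_for_checks() {
  local max_poll_time="$1" interval="$2"
  shift 2
  local elapsed=0 ci_checks="" response
  while (( elapsed < max_poll_time )); do
    if response="$("$@")" && [[ -n "$response" ]]; then
      ci_checks="$response"
      # Stop once no line reports a status other than "completed".
      if ! grep -qv ' completed$' <<< "$ci_checks"; then
        break
      fi
    fi
    sleep "$interval"
    elapsed=$(( elapsed + interval ))
  done
  # Guard: never treat "no data at all" as "all checks passed".
  if [[ -z "$ci_checks" ]]; then
    echo "::error::Timed out waiting for CI checks - no check data received" >&2
    return 1
  fi
  printf '%s\n' "$ci_checks"
}
```

If every fetch fails, ci_checks stays empty and the function returns 1 instead of falling through to the "nothing to fix" path.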
Minor (no change needed): The failure_logs prompt expansion is subject to the standard GHA }} sequence issue if logs contain that string, but this was also present in the old workflow.
- Use randomised EOF delimiters for GITHUB_OUTPUT heredocs to prevent collision with raw CI log content truncating the output early
- Fail explicitly (exit 1) when the polling loop times out without ever receiving check data, instead of silently reporting success
- Add pnpm/Node.js setup to fix-failures job so Claude can run build/lint/test commands (critical: was missing entirely)
- Use Vercel.* prefix match in SKIP_PATTERN for resilience
- Add generate-overview fallback to SKIP_PATTERN
- Include cancelled checks in failure detection
- Add run URL to failure logs for manual inspection
- Log pending checks at polling timeout for debugging
- Add SHA context logging
- Default failed_count=0 at step start
- Document EXPECTED_CHECKS source workflows
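The skip and failure-detection items can be illustrated with a short filter. This is a rough sketch only: the SKIP_PATTERN value, the function name collect_failed_checks, and the tab-separated "name, conclusion" input format are assumptions made for the example, not the workflow's real values.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Assumed pattern: skip Vercel-prefixed checks and generate-overview.
SKIP_PATTERN='^(Vercel.*|generate-overview)$'

# Reads "name<TAB>conclusion" lines on stdin (hypothetical format) and
# prints the names of checks that should be treated as failures.
collect_failed_checks() {
  local name conclusion
  while IFS=$'\t' read -r name conclusion; do
    # Checks that are not real CI signals are ignored entirely.
    if [[ "$name" =~ $SKIP_PATTERN ]]; then
      continue
    fi
    case "$conclusion" in
      # Cancelled checks count as failures alongside outright failures.
      failure|cancelled) printf '%s\n' "$name" ;;
    esac
  done
}
```

A caller would start with failed_count=0 and only raise it when this filter prints at least one name, so an empty result never leaves the count undefined.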
Summary
Rewrites the Fix Dependabot PRs workflow from a single job that duplicated build/lint/test internally to a two-job architecture that waits for all CI workflows to complete and captures their failures.

Problem
The previous workflow ran its own build, lint, and unit test steps. When those passed, it assumed everything was fine, but other CI workflows (E2E CLI, Web CLI Playwright E2E, Security Audit) run separately. Their failures were invisible to the Claude fixer.

Example: PR #332 had a React useState duplication bug caught by Playwright E2E tests, but Claude was never invoked because the unit tests passed within the fix-dependabot workflow.

Solution
- Job 1 (regen-lockfile): Same as before: guard for dependabot PRs, regenerate pnpm-lock.yaml, commit + push. Outputs the HEAD SHA.
- Job 2 (fix-failures): Polls the GitHub check runs API on the HEAD SHA every 30s, waiting for all other CI workflows to complete. If any fail, fetches their logs via gh run view --log-failed and passes everything to Claude Code Action in one shot.

Key design decisions
- ... (test, e2e-cli, setup, audit) to appear, then waits for all to complete

Test plan
🤖 Generated with Claude Code