feat(ci): add retry logic and metrics to critical workflows by BrianCLong · Pull Request #17561 · BrianCLong/summit

BrianCLong · 2026-02-01T22:23:13Z

User description

This PR addresses CI reliability issues by adding shell-based retry loops to critical dependency installation and audit steps in GitHub Actions workflows. It also integrates a metrics collection job to track runner performance and queue times.

Key changes:

ci.yml: Added retries to lint, typecheck, unit-tests, soc-controls. Added ci-metrics job.
ci-verify.yml: Added retries to security-scan (install & audit), governance-checks, provenance, schema-validation, compliance-evidence. Added ci-metrics job.
_reusable-ga-readiness.yml: Added retries to pnpm install and npm audit.

PR created automatically by Jules for task 17180110948053763867 started by @BrianCLong

PR Type

Enhancement

Description

Add retry logic (3 attempts, 15s delay) to pnpm install across all workflows
Add retry logic to pnpm audit and npm audit steps for resilience
Integrate ci-metrics job in ci.yml and ci-verify.yml workflows
Improve CI reliability by handling transient network failures
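
The retry behaviour listed above (3 attempts, 15s delay) could also be written as a small shell helper instead of an inline one-liner. The following is a minimal sketch, not the code in this PR: the function name `retry` and its argument order are assumptions made for illustration. It logs each attempt and skips the sleep after the final failure.

```shell
# Hypothetical helper (not part of this PR): retry <attempts> <delay_seconds> <command...>
retry() {
  local attempts="$1" delay="$2" i=1 status=0
  shift 2
  while [ "$i" -le "$attempts" ]; do
    echo "Attempt $i/$attempts: $*" >&2
    if "$@"; then
      return 0
    else
      status=$?   # exit code of the command that just failed
    fi
    if [ "$i" -lt "$attempts" ]; then
      echo "Attempt $i failed (exit $status); retrying in ${delay}s..." >&2
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  echo "Giving up after $attempts attempts: $*" >&2
  return "$status"
}

# Possible use in a workflow step, with the parameters described above:
#   retry 3 15 pnpm install --frozen-lockfile
```

Returning the last exit code, rather than a fixed `exit 1`, keeps the original failure reason visible in the step log.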

Diagram Walkthrough

flowchart LR
  A["Dependency Installation"] -->|"3 retries, 15s delay"| B["pnpm install"]
  C["Security Audits"] -->|"3 retries, 15s delay"| D["pnpm/npm audit"]
  E["CI Workflows"] -->|"collect metrics"| F["ci-metrics job"]
  B --> G["Improved Reliability"]
  D --> G
  F --> G

File Walkthrough

Relevant files

Enhancement

_reusable-ga-readiness.yml: `Add retry logic to dependency and audit steps` (.github/workflows/_reusable-ga-readiness.yml, +2/-2)
- Added retry loop (3 attempts, 15s delay) to `pnpm install --frozen-lockfile`
- Added retry loop (3 attempts, 15s delay) to `npm audit --audit-level=high`
- Improves resilience against transient network failures in GA readiness checks

ci-verify.yml: `Add retries and metrics to verification workflow` (.github/workflows/ci-verify.yml, +17/-10)
- Added retry loop (3 attempts, 15s delay) to `pnpm install --frozen-lockfile` in 5 jobs
- Modified `pnpm audit --audit-level critical` to use retry loop with error handling
- Added `ci-metrics` job that depends on all verification jobs and runs always
- Improves CI reliability for security scanning, governance, provenance, schema validation, and compliance jobs

ci.yml: `Add retries and metrics to main CI workflow` (.github/workflows/ci.yml, +12/-4)
- Added retry loop (3 attempts, 15s delay) to `pnpm install --frozen-lockfile` in 4 jobs
- Added `ci-metrics` job that depends on all main CI jobs and runs always
- Applies retries to lint, typecheck, unit-tests, and soc-controls jobs
- Enhances CI reliability and enables metrics collection for performance tracking

Summary by CodeRabbit

Chores
- Improved CI/CD robustness with automatic retries for dependency installs and security audits across many workflows.
- Added CI metrics collection and consolidated metrics reporting.
- Made steps more resilient (continue-on-error tolerances, retries) and added explicit tooling/version/setup steps for pnpm, Node, and OPA.
- Pinned several action versions, adopted pnpm in more jobs, refined artifact naming, adjusted SBOM/report paths, and added minor debug/test fixture steps.

- Implements retry logic (3 attempts, 15s delay) for `pnpm install` in `ci.yml`, `ci-verify.yml`, and `_reusable-ga-readiness.yml`.
- Implements retry logic for `pnpm/npm audit` steps to mitigate network flakes.
- Adds `ci-metrics` job to `ci.yml` and `ci-verify.yml` utilizing `_reusable-ci-metrics.yml` to capture queue times and performance data.

Co-authored-by: BrianCLong <6404035+BrianCLong@users.noreply.github.com>
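
The commit message mentions capturing queue times but does not show how they are derived. One plausible reading, offered only as an assumption, is the gap between when a job was created and when it started running. The sketch below computes that gap from two ISO 8601 timestamps; the function name `queue_seconds` is hypothetical, the sample timestamps are made up, and the code assumes GNU `date` (the `-d` flag is not portable to BSD/macOS).

```shell
# Sketch only: assumes GNU date. Not taken from _reusable-ci-metrics.yml.
# queue_seconds <created_at> <started_at> -> prints the difference in seconds
queue_seconds() {
  local created started
  created=$(date -u -d "$1" +%s)
  started=$(date -u -d "$2" +%s)
  echo $((started - created))
}

# Made-up sample timestamps, 90 seconds apart:
queue_seconds "2026-01-01T00:00:00Z" "2026-01-01T00:01:30Z"
```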

google-labs-jules · 2026-02-01T22:23:14Z

👋 Jules, reporting for duty! I'm here to lend a hand with this pull request.

When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down.

I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job!

For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with @jules. You can find this option in the Pull Request section of your global Jules UI settings. You can always switch back!

New to Jules? Learn more at jules.google/docs.

For security, I will only act on instructions from the user who triggered this task.

gemini-code-assist · 2026-02-01T22:23:19Z

Note

Gemini is unable to generate a summary for this pull request due to the file types involved not being currently supported.

qodo-code-review · 2026-02-01T22:23:44Z

PR Compliance Guide 🔍

Below is a summary of compliance checks for this PR:

Security Compliance
🟢	No security concerns identified No security vulnerabilities detected by AI analysis. Human verification advised for critical code.
Ticket Compliance
⚪	🎫 No ticket provided Create ticket/issue
Codebase Duplication Compliance
⚪	Codebase context is not defined Follow the guide to enable codebase context checks.
Custom Compliance
🟢	Generic: Comprehensive Audit Trails Objective: To create a detailed and reliable record of critical system actions for security analysis and compliance. Status: Passed Learn more about managing compliance generic rules or creating your own custom rules
	Generic: Meaningful Naming and Self-Documenting Code Objective: Ensure all identifiers clearly express their purpose and intent, making code self-documenting Status: Passed Learn more about managing compliance generic rules or creating your own custom rules
	Generic: Secure Error Handling Objective: To prevent the leakage of sensitive system information through error messages while providing sufficient detail for internal debugging. Status: Passed Learn more about managing compliance generic rules or creating your own custom rules
	Generic: Secure Logging Practices Objective: To ensure logs are useful for debugging and auditing without exposing sensitive information like PII, PHI, or cardholder data. Status: Passed Learn more about managing compliance generic rules or creating your own custom rules
	Generic: Security-First Input Validation and Data Handling Objective: Ensure all data inputs are validated, sanitized, and handled securely to prevent vulnerabilities Status: Passed Learn more about managing compliance generic rules or creating your own custom rules
🔴	Generic: Robust Error Handling and Edge Case Management Objective: Ensure comprehensive error handling that provides meaningful context and graceful degradation Status: Opaque retry failures: The new retry loops for `pnpm install` fail with a generic `exit 1` and no explicit attempt/failure context (e.g., attempt number, final failure message), reducing actionable debugging information when dependency installation repeatedly fails. Referred Code `- run: for i in 1 2 3; do pnpm install --frozen-lockfile && exit 0 || sleep 15; done; exit 1` `- run: pnpm run lint` Learn more about managing compliance generic rules or creating your own custom rules

Compliance status legend

🟢 - Fully Compliant
🟡 - Partial Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label

coderabbitai · 2026-02-01T22:23:48Z

Important

Review skipped

Too many files!

This PR contains 298 files, which is 148 over the limit of 150.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: cda7a23a-772a-4947-ac55-0ee81efdcade

📥 Commits

Reviewing files that changed from the base of the PR and between 28bc734 and bebef8a.

📒 Files selected for processing (298)

.agentic-prompts/ci-ops-runbook.md
.agentic-prompts/task-19016-frontier-closure.md
.archive/v039/client/tsconfig.json
.archive/v039/server/package.json
.archive/v039/server/tsconfig.json
.artifacts/pr/schema.json
.ci/cosign-policy.sh
.ci/detections_unit.py
.ci/evidence_validate.py
.ci/scripts/release/evidence_packager.ts
.ci/supplychain_delta_check.py
.disabled/adc/tsconfig.json
.disabled/afl-store/tsconfig.json
.disabled/atl/tsconfig.json
.disabled/cfa-tdw/tsconfig.json
.dockerignore
.doclinkignore
.env.example
.github/.pre-commit-config.yaml
.github/ISSUE_TEMPLATE/agent-task.yml
.github/ISSUE_TEMPLATE/backlog-item.yml
.github/ISSUE_TEMPLATE/bootcamp-task.yaml
.github/ISSUE_TEMPLATE/bug.yaml
.github/ISSUE_TEMPLATE/bug_report.yml
.github/ISSUE_TEMPLATE/capture-issue.md
.github/ISSUE_TEMPLATE/chore.yml
.github/ISSUE_TEMPLATE/config.yml
.github/ISSUE_TEMPLATE/dev_environment.yml
.github/ISSUE_TEMPLATE/docs_request.yml
.github/ISSUE_TEMPLATE/dsr.yml
.github/ISSUE_TEMPLATE/epic-eclipse-spiderfoot-rf.yml
.github/ISSUE_TEMPLATE/epic.yml
.github/ISSUE_TEMPLATE/feature-case-first-investigation-ux-palette.yml
.github/ISSUE_TEMPLATE/feature-evidence-integrity-gate-antigravity.yml
.github/ISSUE_TEMPLATE/feature-parity-kernel-codex.yml
.github/ISSUE_TEMPLATE/feature.yaml
.github/ISSUE_TEMPLATE/feature_request.yml
.github/ISSUE_TEMPLATE/ga_gates.yml
.github/ISSUE_TEMPLATE/incident.yaml
.github/ISSUE_TEMPLATE/incident.yml
.github/ISSUE_TEMPLATE/postmortem.yml
.github/ISSUE_TEMPLATE/promise-epic.yml
.github/ISSUE_TEMPLATE/promise-feature.yml
.github/ISSUE_TEMPLATE/release_regression.yaml
.github/ISSUE_TEMPLATE/roadmap-prompt.yml
.github/ISSUE_TEMPLATE/security-issue.yml
.github/ISSUE_TEMPLATE/spike.yml
.github/ISSUE_TEMPLATE/translation_request.yml
.github/ISSUE_TEMPLATE/triage.yml
.github/ISSUE_TEMPLATE/user_story.yml
.github/MILESTONES/ai-ethics-ga.yaml
.github/MILESTONES/declarative-pipelines-ga.yml
.github/MILESTONES/ga-cogops.yml
.github/MILESTONES/ga-infra-selfservice.yml
.github/MILESTONES/ga/ai_adoption.required_artifacts.json
.github/MILESTONES/infowar-sitrep-ga.yml
.github/MILESTONES/required_checks.todo.md
.github/MILESTONES/self_flow_ga.yml
.github/MILESTONES/semantic-search-ga.yml
.github/SECURITY.md
.github/actionlint.yaml
.github/actions/abp-build/action.yml
.github/actions/backlog-guard/action.yml
.github/actions/docker-build-push/action.yml
.github/actions/fabric-warm/action.yml
.github/actions/helm-deploy/action.yml
.github/actions/maestro-gate-check/action.yml
.github/actions/maestro-run/action.yml
.github/actions/release-bundle/action.yml
.github/actions/setup-pnpm/action.yml
.github/actions/setup-toolchain/action.yml
.github/actions/setup-turbo/action.yml
.github/actions/setup/action.yml
.github/actions/sigstore-verify/action.yml
.github/actions/verify-workflow-versions/action.yml
.github/actions/verify-workflow-versions/index.cjs
.github/assignees.yml
.github/auto-assign.yml
.github/auto-reviewers.yml
.github/ci-cost-policy.yml
.github/ci/action-pinning-allowlist.yml
.github/ci/docker-compose.ci.yml
.github/ci/permissions-allowlist.yml
.github/codeql/codeql-config.yml
.github/compose/pg_neo.yml
.github/container-structure-test.yaml
.github/copilot-instructions.md
.github/copilot-instructions.yml
.github/ct-helm.yaml
.github/dependabot.yml
.github/flake-registry.json
.github/governance/branch_protection_rules.json
.github/k6/intelgraph-canary-validation.js
.github/k6/rollout-canary.js
.github/kube-linter-config.yaml
.github/labeler.yml
.github/labels.json
.github/labels.yml
.github/merge-engine/README.md
.github/merge-engine/config.yml
.github/milestones.yml
.github/policies/agent-runtime/tool_access_policy.yaml
.github/policies/agent-security.rego
.github/policies/agent_governance.yml
.github/policies/ai-usage.rego
.github/policies/canonical-path-exceptions.json
.github/policies/dependency-cos.yml
.github/policies/dependency-worldmodel.yml
.github/policies/infra/README.md
.github/policies/infra/cost_guardrails.rego
.github/policies/infra/deny-by-default.rego
.github/policies/infra/dependency_allowedlist.rego
.github/policies/infra/environment_scope.rego
.github/policies/infra/resource_naming.rego
.github/policies/jurisdiction.policy.json
.github/policies/media-claims.policy.json
.github/policies/personal-intelligence.policy.md
.github/policies/pipeline-schema.rego
.github/policies/regulatory-early-warning-policy.rego
.github/policies/self_flow_policy.rego
.github/policies/slsa-spdx.rego
.github/policies/supplychain/verify.rego
.github/policies/supplychain/verify_test.rego
.github/policies/task-thread-access.rego
.github/protection-rules.yml
.github/pull_request_template.md
.github/release-drafter.yml
.github/required-checks.yml
.github/required_checks.todo.md
.github/roadmap_calendar.yml
.github/roadmap_mapping.yml
.github/roadmap_seeds.yml
.github/scripts/check-never-log.ts
.github/scripts/evidence-emit.ts
.github/scripts/infra-verify.ts
.github/scripts/issue-queue-bot/__tests__/bot.test.cjs
.github/scripts/issue-queue-bot/__tests__/classifier.test.cjs
.github/scripts/issue-queue-bot/__tests__/queueBot.test.js
.github/scripts/issue-queue-bot/bot.cjs
.github/scripts/issue-queue-bot/classifier.cjs
.github/scripts/issue-queue-bot/index.js
.github/scripts/issue-queue-bot/package.json
.github/scripts/issue-queue-bot/rules.json
.github/scripts/issue-queue-bot/run.cjs
.github/scripts/issue-queue-bot/run.js
.github/scripts/merge-engine/apply_labels.sh
.github/scripts/merge-engine/gh_pr_inventory.sh
.github/scripts/merge-engine/triage_prs.py
.github/scripts/merge-train-autopilot.sh
.github/scripts/never-log-scan.ts
.github/scripts/process-pr-batch.sh
.github/scripts/sigstore/verify.sh
.github/scripts/validate-evidence-schemas.mjs
.github/scripts/validate-evidence.ts
.github/scripts/verify-canonical-structure.cjs
.github/scripts/verify-dependency-delta.ts
.github/scripts/verify-evidence.mjs
.github/scripts/verify-regulatory-ew-evidence.ts
.github/scripts/verify-workflow-graphs.mjs
.github/scripts/verify_evidence_index.ts
.github/scripts/verify_self_flow.ts
.github/security-waivers.yml
.github/settings.yml
.github/stale.yml
.github/summit/README.md
.github/summit/agents/architectureDriftAgent.ts
.github/summit/agents/observabilityRollupAgent.ts
.github/summit/agents/readinessAgent.ts
.github/summit/agents/securityPostureAgent.ts
.github/summit/agents/triageAgent.ts
.github/summit/dashboards/engineering-health.json
.github/summit/dashboards/merge-readiness.json
.github/summit/dashboards/security-posture.json
.github/summit/event-router/routeEvent.ts
.github/summit/lib/artifacts.ts
.github/summit/lib/context.ts
.github/summit/policies/readiness-policy.json
.github/workflows/.archive/_auth-oidc.yml
.github/workflows/.archive/_deploy.yml
.github/workflows/.archive/_reusable-aws.yml
.github/workflows/.archive/_reusable-build.yml
.github/workflows/.archive/_reusable-ci-fast.yml
.github/workflows/.archive/_reusable-ci-metrics.yml
.github/workflows/.archive/_reusable-ci-perf.yml
.github/workflows/.archive/_reusable-ci.yml
.github/workflows/.archive/_reusable-governance-gate.yml
.github/workflows/.archive/_reusable-node-pnpm-setup.yml
.github/workflows/.archive/_reusable-release.yml
.github/workflows/.archive/_reusable-security-compliance.yml
.github/workflows/.archive/_reusable-setup.yml
.github/workflows/.archive/_reusable-slsa-build.yml
.github/workflows/.archive/_reusable-test-suite.yml
.github/workflows/.archive/_reusable-test.yml
.github/workflows/.archive/_reusable-toolchain-setup.yml
.github/workflows/.archive/a11y-lab.yml
.github/workflows/.archive/abac-policy.yml
.github/workflows/.archive/accessibility.yml
.github/workflows/.archive/admin-cli.yml
.github/workflows/.archive/agent-guardrails.yml
.github/workflows/.archive/agentic-lifecycle.yml
.github/workflows/.archive/agentic-plan-gate.yml
.github/workflows/.archive/agentic-policy-check.yml
.github/workflows/.archive/agentic-policy-drift.yml
.github/workflows/.archive/agentic-task-orchestrator.yml
.github/workflows/.archive/ai-assist-gates.yml
.github/workflows/.archive/ai-copilot-canary.yml
.github/workflows/.archive/ai-governance.yml
.github/workflows/.archive/ai-refactor-dryrun.yml
.github/workflows/.archive/airgap-deployment.yml
.github/workflows/.archive/alert-hygiene.yml
.github/workflows/.archive/api-determinism-check.yml
.github/workflows/.archive/api-docs-sync.yml
.github/workflows/.archive/api-docs-validation.yml
.github/workflows/.archive/api-docs.yml
.github/workflows/.archive/api-lint.yml
.github/workflows/.archive/archsim.yml
.github/workflows/.archive/audit-artifacts.yml
.github/workflows/.archive/audit-branch-protections.yml
.github/workflows/.archive/audit-ci.yml
.github/workflows/.archive/audit-exception-expiry.yml
.github/workflows/.archive/audit.strict.nightly.yml
.github/workflows/.archive/auto-approve-prs.yml
.github/workflows/.archive/auto-draft-release.yml
.github/workflows/.archive/auto-enqueue.yml
.github/workflows/.archive/auto-fix-vulnerabilities.yml
.github/workflows/.archive/auto-green.yml
.github/workflows/.archive/auto-remediation.yml
.github/workflows/.archive/auto-resolve-conflicts.yml
.github/workflows/.archive/auto-rollback.yml
.github/workflows/.archive/auto-triage-blockers.yml
.github/workflows/.archive/automated-backups.yml
.github/workflows/.archive/autotriage-ci.yml
.github/workflows/.archive/azure-turin-v7-drift.yml
.github/workflows/.archive/backup-dr.yml
.github/workflows/.archive/backup-restore-validation.yml
.github/workflows/.archive/backup-verify.yml
.github/workflows/.archive/bidirectional-sync.yml
.github/workflows/.archive/branch-lifecycle.yml
.github/workflows/.archive/branch-protection-drift.yml
.github/workflows/.archive/branch-protection-reconcile.yml
.github/workflows/.archive/build-cache.yml
.github/workflows/.archive/build-images.yml
.github/workflows/.archive/build.yml
.github/workflows/.archive/ci-actionlint.yml
.github/workflows/.archive/ci-backbone.yml
.github/workflows/.archive/ci-cd.yml
.github/workflows/.archive/ci-comprehensive.yml
.github/workflows/.archive/ci-core.yml
.github/workflows/.archive/ci-e2e-full.yml
.github/workflows/.archive/ci-e2e-smoke.yml
.github/workflows/.archive/ci-evidence-verify.yml
.github/workflows/.archive/ci-governance.yml
.github/workflows/.archive/ci-health-monitor.yml
.github/workflows/.archive/ci-image.yml
.github/workflows/.archive/ci-intelgraph-server.yml
.github/workflows/.archive/ci-legacy.yml
.github/workflows/.archive/ci-main.yml
.github/workflows/.archive/ci-modernized.yml
.github/workflows/.archive/ci-performance-k6.yml
.github/workflows/.archive/ci-platform.yml
.github/workflows/.archive/ci-post-merge.yml
.github/workflows/.archive/ci-pr-gate.yml
.github/workflows/.archive/ci-pr.yml
.github/workflows/.archive/ci-preflight.yml
.github/workflows/.archive/ci-rdp-gates.yml
.github/workflows/.archive/ci-repo-hygiene.yml
.github/workflows/.archive/ci-runner-drift.yml
.github/workflows/.archive/ci-sanity.yml
.github/workflows/.archive/ci-security.yml
.github/workflows/.archive/ci-sgf.yml
.github/workflows/.archive/ci-sharded-example.yml
.github/workflows/.archive/ci-signal-gate.yml
.github/workflows/.archive/ci-supply-chain.yml
.github/workflows/.archive/ci-template-optimized.yml
.github/workflows/.archive/ci-test.yml
.github/workflows/.archive/ci-trusted.yml
.github/workflows/.archive/ci-workflow-diff.yml
.github/workflows/.archive/ci-zap.yml
.github/workflows/.archive/ci.pr.scoped.yml
.github/workflows/.archive/ci.switchboard.yml
.github/workflows/.archive/ci.unified.yml
.github/workflows/.archive/ci.yml
.github/workflows/.archive/ci_baseline.yml
.github/workflows/.archive/ci_eval.yml
.github/workflows/.archive/ci_governance.yml
.github/workflows/.archive/ci_observability.yml
.github/workflows/.archive/ci_perf.yml
.github/workflows/.archive/ci_policy.yml
.github/workflows/.archive/ci_provenance.yml
.github/workflows/.archive/ci_sdk.yml
.github/workflows/.archive/ci_supplychain_foundation.yml
.github/workflows/.archive/cicd-observer.yml
.github/workflows/.archive/cli.yml
.github/workflows/.archive/client-ci.yml
.github/workflows/.archive/client-typecheck.yml
.github/workflows/.archive/code-quality-gates.yml
.github/workflows/.archive/codedata.yml
.github/workflows/.archive/codeql-analysis.yml

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


Walkthrough

Multiple GitHub Actions workflows were changed: many install/audit steps now use 3-attempt retry loops; several workflows added or wired a ci-metrics job; numerous action pins, pnpm setup/version changes, permission/error-handling tweaks, and a few conditional guards and artifact/name adjustments were applied.

Changes

Cohort / File(s)	Summary
Retry: installs & audits `.github/workflows/_reusable-ga-readiness.yml`, `.github/workflows/ci-verify.yml`, `.github/workflows/ci.yml`, `.github/workflows/golden-path-e2e.yml`, `.github/workflows/schema-diff.yml`	Replaced direct `pnpm install --frozen-lockfile` and some `npm audit`/install calls with 3-iteration retry loops (sleep 15s between attempts; exit on first success, fail after final attempt).
CI metrics integration `.github/workflows/ci-verify.yml`, `.github/workflows/ci.yml`, `.github/workflows/_reusable-ci-metrics.yml`	Added `ci-metrics` job wired to a reusable metrics workflow; adjusted quoting/heredoc and pinned upload-artifact action commit in the reusable workflow.
pnpm setup / versioning `.github/workflows/ga-evidence.yml`, `.github/workflows/security-regressions.yml`, `.github/workflows/post-release-canary.yml`, `.github/workflows/golden-path-e2e.yml`, `.github/workflows/graph-sync.yml`	Added or standardized pnpm setup steps and explicit version pins (e.g., `version: 9.12.0`) and consolidated `with:` inputs for setup steps.
Action pinning & swaps `.github/workflows/ci-actionlint.yml`, `.github/workflows/supply-chain-integrity.yml`, `.github/workflows/ci-security.yml`, `.github/workflows/reusable/canary-rollback.yml`, `.github/workflows/ci-signal-gate.yml`	Pinned several actions to specific commit SHAs and replaced/swapped some actions (e.g., actionlint -> reviewdog), and updated many upload-artifact refs to a commit hash. Review for reproducibility implications.
Error-handling & permissions `.github/workflows/auto-enqueue.yml`, `.github/workflows/ci-signal-gate.yml`	Removed `checks` read permission; made `gh pr checks` tolerant (`
Artifact naming & report logic `.github/workflows/ci-security.yml`, `.github/workflows/schema-diff.yml`, `.github/workflows/pr-quality-gate.yml`	Split and renamed security-report artifacts per tool; updated artifact download patterns; schema-diff expanded PR comment generation and breaking-change gating — inspect PR-comment and breaking-change logic closely.
Conditional / small control changes `.github/workflows/subsumption-bundle-verify.yml`, `.github/workflows/release-policy-tests.yml`, `.agentic-prompts/task-11847-fix-jest-esm.md`	Added file-existence guard for subsumption verification; added PyYAML install and a debug fixtures step; minor TypeScript/Jest mock quoting/style tweaks.
Formatting / minor workflow tweaks `.github/workflows/_reusable-ci-metrics.yml`, `.github/workflows/graph-sync.yml`, `.github/workflows/golden-path-e2e.yml`	Quoting/formatting, heredoc delimiter changes, cron/branch filter quoting normalization, and Playwright install command adjusted to use `pnpm exec`.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 I hopped three times, then waited — neat,
Pipelines retry until they meet.
Metrics hum and artifacts sing,
Tests and reports — a joyous spring! ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Description check	⚠️ Warning	The PR description includes a user-provided summary, PR type, detailed description with key changes, and a mermaid diagram. However, it does not follow the required template structure, particularly missing explicit Risk & Surface, Assumption Ledger, Security Impact, and Green CI Contract Checklist sections.	Complete the PR description using the provided template: add Risk Level and Surface Area selections, Assumption Ledger details, Security Impact assessment, and Green CI Contract Checklist items.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately describes the main objective: adding retry logic and metrics to critical CI workflows, which aligns with the primary changes across multiple workflow files.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch feat/ci-reliability-retry-metrics-17180110948053763867


qodo-code-review · 2026-02-01T22:24:38Z

PR Code Suggestions ✨

Explore these optional code suggestions:

Category	Suggestion	Impact
High-level	Centralize retry logic into reusable action	Medium

Create a reusable GitHub composite action to encapsulate the duplicated shell-based retry logic. This centralizes the retry mechanism, improving maintainability and simplifying workflow files.

Examples:
.github/workflows/ci.yml [51]
- run: for i in 1 2 3; do pnpm install --frozen-lockfile && exit 0 || sleep 15; done; exit 1
.github/workflows/ci-verify.yml [40]
run: for i in 1 2 3; do pnpm install --frozen-lockfile && exit 0 || sleep 15; done; exit 1

Solution Walkthrough:
Before:
# In .github/workflows/ci.yml
- name: Install dependencies
  run: for i in 1 2 3; do pnpm install --frozen-lockfile && exit 0 || sleep 15; done; exit 1
# In .github/workflows/ci-verify.yml
- name: Install dependencies
  run: for i in 1 2 3; do pnpm install --frozen-lockfile && exit 0 || sleep 15; done; exit 1
# ... and in 8 other places
After:
# New file: .github/actions/retry/action.yml
name: 'Retry Step'
inputs:
  run:
    required: true
runs:
  using: "composite"
  steps:
    - shell: bash
      run: |
        for i in 1 2 3; do ${{ inputs.run }} && exit 0 || sleep 15; done; exit 1
# In all workflow files:
- name: Install dependencies
  uses: ./.github/actions/retry
  with:
    run: pnpm install --frozen-lockfile

Suggestion importance[1-10]: 8
Why: The suggestion correctly identifies significant code duplication of the retry logic across multiple workflows and proposes a robust solution using a reusable composite action, which greatly improves maintainability.
Possible issue	Fix misleading error message on audit failure	Medium

Adjust the `pnpm audit` retry logic to only report "Critical CVEs detected" if the command fails after all retries, preventing misleading error messages from transient network issues.

.github/workflows/ci-verify.yml [42-49]
  - name: Dependency audit (critical CVEs only)
    run: |
      echo "=== Scanning for Critical CVEs ==="
-     for i in 1 2 3; do pnpm audit --audit-level critical && exit 0 || { echo "Retry $i..."; sleep 15; }; done
-     echo "::error::Critical CVEs detected - CI BLOCKED"
-     echo "::error::Run 'pnpm audit' locally and document exceptions if needed"
-     exit 1
+     if ! (for i in 1 2 3; do pnpm audit --audit-level critical && exit 0 || { echo "Retry $i..."; sleep 15; }; done; exit 1); then
+       echo "::error::Critical CVEs detected - CI BLOCKED"
+       echo "::error::Run 'pnpm audit' locally and document exceptions if needed"
+       exit 1
+     fi
    continue-on-error: false # BLOCKING: critical CVEs block merge

Suggestion importance[1-10]: 7
Why: The suggestion correctly identifies a logic flaw where a transient error would be misreported as a security vulnerability, improving the accuracy and reliability of the CI feedback.
General	Use exponential backoff for retries	Low

Implement exponential backoff in the `pnpm install` retry logic, increasing the wait time after each failed attempt to better handle transient network issues.

.github/workflows/ci-verify.yml [40]
-for i in 1 2 3; do pnpm install --frozen-lockfile && exit 0 || sleep 15; done; exit 1
+for i in 1 2 3; do
+  echo "Attempt $i/3: pnpm install --frozen-lockfile"
+  pnpm install --frozen-lockfile && exit 0
+  backoff=$((15 * 2**(i-1)))
+  echo "Install failed on attempt $i, retrying in ${backoff}s..."
+  sleep "${backoff}"
+done
+exit 1

`[To ensure code accuracy, apply this suggestion manually]`

Suggestion importance[1-10]: 5
Why: The suggestion proposes using exponential backoff, which is a standard best practice for retry logic to handle transient failures more gracefully and reduce server load.
General	Add logging to retry loops	Low

Add logging to the `pnpm install` retry loop to show the attempt number and a failure message, which will aid in debugging.

.github/workflows/_reusable-ga-readiness.yml [92]
-for i in 1 2 3; do pnpm install --frozen-lockfile && exit 0 || sleep 15; done; exit 1
+for i in 1 2 3; do
+  echo "Attempt $i/3: pnpm install --frozen-lockfile"
+  pnpm install --frozen-lockfile && exit 0
+  echo "Install failed on attempt $i, retrying in 15s..."
+  sleep 15
+done
+exit 1

`[To ensure code accuracy, apply this suggestion manually]`

Suggestion importance[1-10]: 4
Why: The suggestion improves debuggability by adding explicit logging for each retry attempt, which is helpful for diagnosing transient CI failures, though it is a minor enhancement.
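
As written, the exponential backoff suggestion above would wait 15s, 30s, and 60s after attempts 1, 2, and 3, which includes a final 60s sleep after the last failed attempt, when no retry follows. The sketch below is one way to avoid that wasted wait. The function names `backoff_delay` and `retry_with_backoff` are hypothetical and are not part of the PR or of the bot's suggestion.

```shell
# backoff_delay <base_seconds> <attempt> -> prints base * 2^(attempt-1)
backoff_delay() {
  local delay="$1" n="$2"
  while [ "$n" -gt 1 ]; do
    delay=$((delay * 2))
    n=$((n - 1))
  done
  echo "$delay"
}

# retry_with_backoff <attempts> <base_seconds> <command...>
# Sleeps only between attempts, never after the last one.
retry_with_backoff() {
  local attempts="$1" base="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    if [ "$i" -lt "$attempts" ]; then
      sleep "$(backoff_delay "$base" "$i")"
    fi
    i=$((i + 1))
  done
  return 1
}

# Possible use: retry_with_backoff 3 15 pnpm install --frozen-lockfile
```

With three attempts and a 15s base, the total wait before giving up would be 45s (15s + 30s) rather than 105s.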

BrianCLong

Reviewed. Retry loops for pnpm install/audit and new ci-metrics job look fine. Confirmed _reusable-ci-metrics.yml exists. Ready for human approval.

…etry-metrics-17180110948053763867

- Fix: Swap `pnpm/action-setup` and `actions/setup-node` order in `golden-path-e2e.yml` to support caching.
- Feat: Add retry logic (3 attempts, 15s delay) to `pnpm install` in `ci.yml`, `ci-verify.yml`, `golden-path-e2e.yml`, and `_reusable-ga-readiness.yml`.
- Feat: Add retry logic to `pnpm/npm audit` steps.
- Feat: Add `ci-metrics` job to `ci.yml` and `ci-verify.yml` to track runner performance.

Co-authored-by: BrianCLong <6404035+BrianCLong@users.noreply.github.com>

… verification

- Fix: Swap `pnpm/action-setup` and `actions/setup-node` order in `golden-path-e2e.yml` to support caching.
- Feat: Add retry logic (3 attempts, 15s delay) to `pnpm install` in `ci.yml`, `ci-verify.yml`, `golden-path-e2e.yml`, and `_reusable-ga-readiness.yml`.
- Feat: Add retry logic to `pnpm/npm audit` steps.
- Feat: Add `ci-metrics` job to `ci.yml` and `ci-verify.yml` utilizing `_reusable-ci-metrics.yml`.
- Fix: Add missing `.github/workflows/_reusable-ci-metrics.yml` file.
- Fix: Update `scripts/verify_evidence.py` to ignore `ga`, `bundles`, and `ai-influence-ops` directories to prevent false positives in evidence verification.

Co-authored-by: BrianCLong <6404035+BrianCLong@users.noreply.github.com>

TopicalitySummit

LGTM - Bulk approval phase.

coderabbitai

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)

.agentic-prompts/task-11847-fix-jest-esm.md (1)

1-297: ⚠️ Potential issue | 🟡 Minor

Clarify why this documentation file is included in a metrics/retry logic PR.

This file documents the Jest ESM configuration task (#11847), which is an active infrastructure concern in the repo (referenced in ci-legacy.yml and validated in CI workflows). However, the PR objectives focus on CI retry logic and metrics collection (#17561).

The modifications here are minimal formatting changes (primarily quote style in code examples) that don't add meaningful value to the documented task. If this file's inclusion is intentional—perhaps as part of a broader testing infrastructure update—explain the relationship to the PR objectives. Otherwise, consider removing it to keep the PR focused.
.github/workflows/auto-enqueue.yml (1)
32-36: ⚠️ Potential issue | 🟡 Minor

checks variable is captured but never used.

The checks variable is assigned on line 32 but not referenced in the conditional on line 34. This appears to be either dead code or an incomplete implementation where required checks were intended to gate the enqueue.

If the intent is to verify required checks pass before enqueueing:
-         if echo "$labels" | grep -q "queue:ready" && [ "$approvals" -ge 1 ]; then
+         # Verify all required checks passed (no "fail" or "pending" in output)
+         if echo "$labels" | grep -q "queue:ready" && [ "$approvals" -ge 1 ] && ! echo "$checks" | grep -qE 'fail|pending'; then
If checks verification is not needed, remove the unused variable to avoid confusion.
.github/workflows/_reusable-ci-metrics.yml (1)
52-52: ⚠️ Potential issue | 🟠 Major

Job output artifact_name will always be empty due to step configuration error.

The job output at line 52 references steps.upload.outputs.artifact_name, but:

The upload step (lines 164-170) uses actions/upload-artifact which does not output artifact_name — it outputs artifact-id and artifact-url

The step at lines 172-174 writes artifact_name to $GITHUB_OUTPUT but lacks an id, making it inaccessible

Consumers of outputs.metrics_artifact_name (lines 29-31) will receive an empty value.
🐛 Proposed fix: merge the output into the upload step or add an id

Option 1: Add id to the output step and fix the job output reference:
      - name: Output Artifact Name
+       id: artifact-name
        run: |
          echo "artifact_name=ci-metrics-${{ github.run_id }}-${{ github.run_attempt }}" >> $GITHUB_OUTPUT
And update line 52:
-     artifact_name: ${{ steps.upload.outputs.artifact_name }}
+     artifact_name: ${{ steps.artifact-name.outputs.artifact_name }}
Option 2: Remove the extra step and set artifact_name directly in the metrics step:
      - name: Collect Workflow Metrics
        id: metrics
        ...
        run: |
          ...
          echo "duration_minutes=${DURATION_MINUTES}" >> $GITHUB_OUTPUT
          echo "success_rate=${SUCCESS_RATE}" >> $GITHUB_OUTPUT
+         echo "artifact_name=ci-metrics-${{ github.run_id }}-${{ github.run_attempt }}" >> $GITHUB_OUTPUT
And update line 52:
-     artifact_name: ${{ steps.upload.outputs.artifact_name }}
+     artifact_name: ${{ steps.metrics.outputs.artifact_name }}
Also applies to: 164-174
.github/workflows/supply-chain-integrity.yml (1)

45-54: ⚠️ Potential issue | 🔴 Critical

Correct the pinned SHAs: they do not match the claimed versions.

The workflow pins to incorrect commit SHAs. Verification against official GitHub releases shows critical mismatches:

actions/checkout: Pinned SHA 34e114876b0b11c390a56381ad16ebd13914f8d5 does not match v4.1.7's actual SHA 6ccd57f4c5...

actions/setup-node: Pinned SHA 65d868f8d4d85d7d4abb7de0875cde3fcc8798f5 does not match v4.0.3's actual SHA 1e60f620b9541d16bece96c5465dc8ee9832be0b

actions/upload-artifact: Pinned SHA b7c566a772e6b6bfb58ed0dc250532a479d7789f does not match v4.3.3's actual SHA 65462800fd760344b1a7b4382951275a0abb4808 (lines 126, 134, 217, 226, 235)

These mismatches mean the locked commits do not correspond to the tagged versions, undermining the supply chain security goal. Update all pinned SHAs to the correct commit hashes for their claimed versions.
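One way to check a pin before updating it is to compare it with the commit the tag resolves to. The sketch below is illustrative only: `resolve_tag_sha` and `pin_matches_tag` are hypothetical helper names, and the functions read `git ls-remote --tags` output on stdin so they can be exercised without network access.

```shell
#!/usr/bin/env bash
# Sketch: check whether a pinned SHA matches the commit a release tag points to.
# resolve_tag_sha and pin_matches_tag are hypothetical helpers, not part of the repo.

# Reads `git ls-remote --tags` output on stdin and prints the commit SHA for
# the given tag. For annotated tags the peeled entry ("<tag>^{}") holds the
# commit, so it takes precedence over the plain entry when both are present.
resolve_tag_sha() {
  local tag="$1"
  awk -v ref="refs/tags/${tag}" '
    $2 == (ref "^{}") { peeled = $1 }
    $2 == ref         { plain = $1 }
    END {
      if (peeled != "") print peeled
      else if (plain != "") print plain
    }
  '
}

# Usage: pin_matches_tag <pinned-sha> <tag> < ls-remote-output
# Returns 0 when the pin equals the tag's commit, non-zero otherwise
# (including when the tag is not found in the input).
pin_matches_tag() {
  local pinned="$1" tag="$2" actual
  actual="$(resolve_tag_sha "$tag")"
  [ -n "$actual" ] && [ "$pinned" = "$actual" ]
}
```

For example, `git ls-remote --tags https://github.com/actions/checkout | pin_matches_tag "$PINNED_SHA" v4.1.7` exits non-zero when the pin and the tag disagree, where `PINNED_SHA` is a placeholder for the value in the workflow.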

🧹 Nitpick comments (1)

.github/workflows/supply-chain-integrity.yml (1)
44-47: Clarify the comment: SHA pins are immutable, not branch tips.

The comment "(pinned to v4 branch tip)" is misleading. A SHA pin is a fixed, immutable reference to a specific commit—not a branch tip, which moves over time. Consider simplifying to just the version number for consistency with other pinned actions in this file.
📝 Suggested comment fix
       - name: Checkout code
-        uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.1.7 (pinned to v4 branch tip)
+        uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.1.7

coderabbitai

Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

.github/workflows/pr-quality-gate.yml (1)

81-94: ⚠️ Potential issue | 🔴 Critical

SBOM file paths are inconsistent — upload will fail to find the generated file.

The generate-sbom.sh script outputs files matching the pattern {ARTIFACT_NAME}-{service_name}-{VERSION}.cdx.json. With the parameters passed (summit-platform ci-build artifacts/sbom), it generates files like summit-platform-main-ci-build.cdx.json.

However, the upload step references artifacts/sbom/sbom.json, which generate-sbom.sh never creates. This causes:

The upload step will find nothing to upload: with its default if-no-files-found: warn setting, actions/upload-artifact@v4 logs a warning for a missing path instead of failing the job

The SBOM artifact is never retained

Additionally, the hardcoded summit-platform-main-ci-build.cdx.json in the policy-check env var assumes a plain Dockerfile or Dockerfile.main exists in the repo. If the repo structure differs, the SBOM file won't be found and the check will warn.

Fix: Either update the upload path to match what generate-sbom.sh actually produces (e.g., artifacts/sbom/summit-platform-*-ci-build.cdx.json), or modify generate-sbom.sh to also output or symlink to sbom.json. Also verify the Dockerfile naming convention matches the hardcoded main service name, or make SBOM_FILE dynamic.
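A minimal sketch of the "make SBOM_FILE dynamic" option, assuming the `{ARTIFACT_NAME}-{service_name}-{VERSION}.cdx.json` naming described above. `find_sbom` is a hypothetical helper, not something generate-sbom.sh provides.

```shell
#!/usr/bin/env bash
# Sketch: locate the generated SBOM by pattern instead of hardcoding the
# service name. find_sbom is a hypothetical helper.

# Usage: find_sbom <output-dir> <artifact-name> <version>
# Prints the first file matching <artifact-name>-*-<version>.cdx.json.
find_sbom() {
  local dir="$1" artifact="$2" version="$3" f
  for f in "$dir"/"$artifact"-*-"$version".cdx.json; do
    # When nothing matches, the glob stays literal, so confirm the file exists.
    if [ -f "$f" ]; then
      printf '%s\n' "$f"
      return 0
    fi
  done
  echo "no SBOM matching ${artifact}-*-${version}.cdx.json in ${dir}" >&2
  return 1
}
```

In a workflow step this could be used as `SBOM_FILE="$(find_sbom artifacts/sbom summit-platform ci-build)"`, which fails loudly when generation produced nothing rather than passing a path that does not exist.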

🤖 Fix all issues with AI agents

In @.github/workflows/ci-security.yml:
- Line 83: Update the inline version comment on every "uses:
actions/upload-artifact@b7c566a772e6b6bfb58ed0dc250532a479d7789f" occurrence (12
places) from "# v4.1.0" to "# v6.0.0"; then verify the GitHub-hosted
ubuntu-22.04 runners meet the minimum runner version requirement (>= 2.327.1)
for upload-artifact v6.0.0 and confirm there are no incompatibilities between
upload-artifact v6.0.0 and the pinned "download-artifact" action (v4.1.8)
referenced elsewhere.

In @.github/workflows/ci-verify.yml:
- Around line 312-316: The ga-evidence-completeness GitHub Actions job currently
sets cache: 'pnpm' while using actions/setup-node but never configures pnpm
(missing pnpm/action-setup) and never runs pnpm install; fix by either removing
the pnpm cache entry from the ga-evidence-completeness job (preferred since it
doesn't call pnpm) or, if pnpm is actually needed, add a pnpm/action-setup step
before actions/setup-node so the pnpm store path can be resolved; update the job
definition around the cache: 'pnpm' and actions/setup-node entries accordingly.
- Around line 147-151: The mcp-ux-lint and ga-evidence-completeness jobs use
actions/setup-node@v4 with cache: 'pnpm' but are missing the pnpm/action-setup
step; add a step using pnpm/action-setup (e.g., uses: pnpm/action-setup@v2)
immediately before the actions/setup-node@v4 step in both the mcp-ux-lint and
ga-evidence-completeness job definitions so pnpm is installed before the node
setup and pnpm cache action runs.

In @.github/workflows/mvp4-gate.yml:
- Line 59: The comment on the setup-node pin "uses:
actions/setup-node@6044e13b5dc448c55e2357c09f80417699197238" is wrong (it says
"# v6"); update that inline comment to the correct release label used for that
commit (e.g., "# v4" or "# v4.0.3") and decide on a consistent pinning strategy
across workflows (either change the other occurrences in build-lint-strict and
quarantine-tests to the same commit hash or make them all use the same tag like
"@v4") so all three setup-node references use the same strategy for consistency
and reproducibility.

In @.github/workflows/release-policy-tests.yml:
- Around line 61-62: The CI contains a temporary "Debug Fixtures" step that runs
ls -R on scripts/release/tests/fixtures and can fail the job if the directory is
missing; either remove the "Debug Fixtures" step entirely (if it was for
temporary debugging) or make it non-fatal by guarding the command so it only
runs when the directory exists or by marking the step as non-fatal (e.g., use a
test like check for directory existence before listing, or set the step to
continue-on-error) — edit the step named "Debug Fixtures" to implement one of
these fixes.
- Around line 58-59: Replace the loose "pip install PyYAML" invocation with a
pinned version and retry logic: change the "pip install PyYAML" command to
install a specific version (e.g., PyYAML==6.0 to match other workflows) and wrap
it with the same retry/backoff mechanism used in ci-verify.yml so the job
retries transient download failures; locate the line containing "pip install
PyYAML" and update it to use the pinned version and the repository's standard
retry pattern.
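If the "Debug Fixtures" step is kept, the directory guard could look like the sketch below. `list_fixtures` is a hypothetical name; the default path is the one quoted in the comment above.

```shell
#!/usr/bin/env bash
# Sketch: non-fatal fixture listing for a debug step.
# list_fixtures is a hypothetical helper.

# Lists the fixtures directory when it exists; otherwise prints a note and
# still returns 0, so a missing directory cannot fail the job.
list_fixtures() {
  local dir="${1:-scripts/release/tests/fixtures}"
  if [ -d "$dir" ]; then
    ls -R "$dir"
  else
    echo "fixtures directory not found, skipping listing: $dir"
  fi
}
```

Setting `continue-on-error: true` on the step is the alternative named above; the guard has the advantage of keeping the step green without hiding other failures in the same step.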

In @.github/workflows/schema-diff.yml:
- Around line 59-62: The "Install dependencies" step uses plain "pnpm install"
and lacks retry logic and the --frozen-lockfile flag; replace that run block so
the workflow retries the install up to 3 times with a 15s delay and runs "pnpm
install --frozen-lockfile" (and then "pnpm add -g ts-node typescript") inside
the retry loop, mirroring the retry behavior and flag used in ci.yml /
ci-verify.yml / _reusable-ga-readiness.yml to prevent lockfile changes and
transient network failures.
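The retry behavior requested here (up to 3 attempts, 15s apart) can be sketched as a small shell function. This is not the loop the PR actually added, which is not shown on this page; `retry`, `RETRY_ATTEMPTS`, and `RETRY_DELAY` are assumed names.

```shell
#!/usr/bin/env bash
# Sketch: retry a command a fixed number of times with a fixed delay.
# retry, RETRY_ATTEMPTS and RETRY_DELAY are hypothetical names.

# Usage: retry <command> [args...]
# Runs the command until it succeeds or RETRY_ATTEMPTS (default 3) is reached,
# sleeping RETRY_DELAY seconds (default 15) between attempts.
retry() {
  local attempts="${RETRY_ATTEMPTS:-3}" delay="${RETRY_DELAY:-15}" n=1
  until "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      echo "command failed after ${n} attempts: $*" >&2
      return 1
    fi
    echo "attempt ${n}/${attempts} failed, retrying in ${delay}s" >&2
    sleep "$delay"
    n=$((n + 1))
  done
}
```

Inside a `run:` block this would be called as `retry pnpm install --frozen-lockfile`. The same wrapper would cover a pinned `pip install PyYAML==6.0`. A fixed delay is kept to match the 15s figure in the PR description; exponential backoff would be a different design.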

🧹 Nitpick comments (1)

.github/workflows/schema-diff.yml (1)

52-52: Inconsistent action pinning: pnpm/action-setup@v4 uses a mutable tag.

Other actions in this workflow are pinned to commit SHAs (e.g., upload-artifact, download-artifact, github-script), but pnpm/action-setup uses a mutable @v4 tag. For supply-chain consistency, consider pinning to a specific commit hash here as well.

TopicalitySummit

Governed exception sweep: approved for merge queue progression.

BrianCLong · 2026-03-30T14:02:18Z

Temporarily closing to reduce Actions queue saturation and unblock #22241. Reopen after the golden-main convergence PR merges.

google-labs-jules · 2026-03-30T14:09:30Z

Temporarily closing to reduce Actions queue saturation and unblock #22241. Reopen after the golden-main convergence PR merges.

Understood. Acknowledging that this work is temporarily closed to unblock the queue, and I will stop work on this task until it's reopened.

qodo-code-review bot added the Review effort 2/5 label Feb 1, 2026

BrianCLong commented Feb 3, 2026

BrianCLong and others added 6 commits February 3, 2026 11:17

Merge remote-tracking branch 'origin/main' into feat/ci-reliability-retry-metrics-17180110948053763867

b4fb085

Merge main (conflicts resolved by taking main for common files)

c0d890b

Merge main (conflicts resolved by taking main for common files)

b2eead1

Merge main (conflicts resolved by taking main for common files)

025fb34

TopicalitySummit previously approved these changes Feb 4, 2026

TopicalitySummit enabled auto-merge (squash) February 4, 2026 03:02

BrianCLong dismissed TopicalitySummit’s stale review via 059c576 February 4, 2026 23:27

coderabbitai bot reviewed Feb 5, 2026

coderabbitai bot reviewed Feb 7, 2026

BrianCLong force-pushed the feat/ci-reliability-retry-metrics-17180110948053763867 branch from d81508e to 06b6595 Compare February 8, 2026 12:49

BrianCLong added a commit that referenced this pull request Feb 22, 2026

chore(ci): align GA baseline for PR #17561

4a22cb2

TopicalitySummit previously approved these changes Feb 23, 2026

BrianCLong pushed a commit that referenced this pull request Feb 24, 2026

Merge PR #17561 into integration (force-resolved conflicts)

2f16cd8

TopicalitySummit added the queue:conflict label Mar 1, 2026

BrianCLong added this to the v2026.04-ga milestone Mar 5, 2026

BrianCLong added queue:needs-rebase and removed queue:conflict labels Mar 7, 2026

BrianCLong self-assigned this Mar 8, 2026

chore: merge origin/main and resolve conflicts surgically

bebef8a

BrianCLong force-pushed the feat/ci-reliability-retry-metrics-17180110948053763867 branch from 85aa24e to bebef8a Compare March 8, 2026 15:20

BrianCLong added queue:blocked and removed queue:needs-rebase labels Mar 23, 2026

BrianCLong mentioned this pull request Mar 23, 2026

Prepare rebase_prs.sh script for the 6 PRs #21602

Closed

27 tasks

BrianCLong dismissed TopicalitySummit’s stale review via bebef8a March 30, 2026 00:28

BrianCLong closed this Mar 30, 2026

auto-merge was automatically disabled March 30, 2026 14:02
Pull request was closed
