
fix: harden AWS release deployment workflow for go-live execution#22206

Closed
BrianCLong wants to merge 3 commits into main from codex/deploy-release-on-aws

Conversation

@BrianCLong
Owner

@BrianCLong BrianCLong commented Mar 29, 2026

Motivation

  • Make the AWS release path deterministic and safe to run from a published release or manually via workflow_dispatch so go-live promotions are auditable and repeatable.
  • Fail fast on missing deployment prerequisites and capture governance evidence for traceability during production deploys.
  • Reduce operational risk by consolidating build/push, optional infra apply, and k8s rollout verification into a single hardened workflow.

Description

  • Reworked .github/workflows/deploy-aws.yml to trigger on release.published and workflow_dispatch with inputs image_tag and apply_infra and normalized concurrency and env vars such as ECR_REGISTRY, EKS_CLUSTER, K8S_NAMESPACE, and AWS_ROLE_NAME.
  • Added a preflight job that validates presence of secrets.AWS_ACCOUNT_ID, terraform/environments/prod/main.tf, charts/universal-app/Chart.yaml, and scripts/verify-deployment.sh to fail early when prerequisites are missing.
  • Hardened build-and-push with explicit pnpm/node setup, dependency audit per-service (safety for Python and pnpm audit for Node), consistent ECR tag resolution (RELEASE_TAG), and robust docker build/push semantics for maestro, prov-ledger, and policy-lac.
  • Added optional deploy-infra controlled by apply_infra to run Terraform in terraform/environments/prod, and enhanced deploy-k8s to configure Helm/kubectl, update kubeconfig, deploy workloads using the RELEASE_TAG, verify rollouts with kubectl rollout status, run ./scripts/verify-deployment.sh, and upload governance evidence artifacts.
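
The trigger, input, and environment surface described above can be sketched as a rough YAML fragment. Everything beyond the names listed in these bullets (the region, cluster name, role name, and default values) is an assumption, not copied from the actual workflow:

```yaml
name: Deploy AWS Release

on:
  release:
    types: [published]
  workflow_dispatch:
    inputs:
      image_tag:
        description: Image tag to deploy (defaults to the release tag)
        required: false
        type: string
      apply_infra:
        description: Run Terraform in terraform/environments/prod before deploying
        type: boolean
        default: false

# Serialize production deploys instead of letting concurrent runs race.
concurrency:
  group: deploy-aws-production
  cancel-in-progress: false

env:
  AWS_REGION: us-east-1                                    # assumed
  ECR_REGISTRY: ${{ secrets.AWS_ACCOUNT_ID }}.dkr.ecr.us-east-1.amazonaws.com  # assumed layout
  EKS_CLUSTER: summit-prod                                 # assumed
  K8S_NAMESPACE: default
  AWS_ROLE_NAME: github-actions-deploy                     # assumed
```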

Testing

  • Parsed the modified workflow YAML using PyYAML with yaml.safe_load which succeeded (YAML_OK).
  • Verified required referenced files exist via automated checks: terraform/environments/prod/main.tf, charts/universal-app/Chart.yaml, and scripts/verify-deployment.sh (all present).
  • Attempted to run actionlint .github/workflows/deploy-aws.yml but actionlint is not installed in this environment so static GitHub Actions linting was not performed (recommend running actionlint in CI or locally before merge).
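
The parse and file-existence checks above can be reproduced locally with a short script. This is a minimal sketch: the required paths come from this PR's description, while the function name and return shape are illustrative. PyYAML is treated as optional so the file checks still run without it.

```python
# Minimal reproduction of the checks described above: confirm the files the
# workflow references exist, and (when PyYAML is available) that the workflow
# parses with yaml.safe_load. Paths are taken from this PR's description.
from pathlib import Path

try:
    import yaml  # PyYAML; third-party, so treated as optional here
except ImportError:
    yaml = None

REQUIRED_FILES = [
    "terraform/environments/prod/main.tf",
    "charts/universal-app/Chart.yaml",
    "scripts/verify-deployment.sh",
]
WORKFLOW = ".github/workflows/deploy-aws.yml"


def check_workflow(repo_root: str) -> dict:
    """Return which required files are missing and whether the workflow parses."""
    root = Path(repo_root)
    missing = [p for p in REQUIRED_FILES + [WORKFLOW] if not (root / p).is_file()]
    yaml_ok = None  # unknown until the workflow can be read and parsed
    wf = root / WORKFLOW
    if yaml is not None and wf.is_file():
        try:
            yaml.safe_load(wf.read_text())
            yaml_ok = True
        except yaml.YAMLError:
            yaml_ok = False
    return {"missing": missing, "yaml_ok": yaml_ok}
```

Running `check_workflow(".")` from the repository root should report an empty `missing` list and `yaml_ok` as `True` if the branch is in the state this PR describes.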

{"agent":"codex","change_type":"patch","affected_paths":[".github/workflows/deploy-aws.yml"],"ci_checks":["yaml-parse","file-existence","actionlint:skipped"],"rollback":"re-run workflow with previous image_tag or use helm rollback for affected releases","post_deploy_watch":"60m"}


Codex Task

Summary by CodeRabbit

  • Chores
    • Deployment workflow now triggers on releases instead of automatic pushes to main branch.
    • Added manual deployment option with configurable image tag and infrastructure update settings.
    • Enhanced pre-deployment validation and Kubernetes rollout verification processes.
    • Updated compliance and governance evidence tracking.

@BrianCLong added the codex (Codex-owned implementation work) label on Mar 29, 2026 via ChatGPT Codex Connector
@gemini-code-assist
Contributor

Note

Gemini is unable to generate a review for this pull request due to the file types involved not being currently supported.

@coderabbitai

coderabbitai bot commented Mar 29, 2026

Warning

Rate limit exceeded

@BrianCLong has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 3 minutes and 31 seconds before requesting another review.

Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 3 minutes and 31 seconds.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 69eb6520-94cf-4e8b-896d-db8f22998f2e

📥 Commits

Reviewing files that changed from the base of the PR and between 3c28137 and aeed8ad.

📒 Files selected for processing (2)
  • .github/workflows/deploy-aws.yml
  • docker-compose.dev.yaml

Walkthrough

The GitHub Actions workflow for AWS deployment was restructured: renamed to reflect release-based triggers, changed from push-on-main to release events with optional inputs, replaced reusable pre-deploy gate with inline checks, and enhanced infrastructure and Kubernetes deployment stages with conditional execution, improved credential handling, and expanded deployment verification.

Changes

Cohort / File(s) Summary
AWS Deployment Workflow
.github/workflows/deploy-aws.yml
Renamed workflow, changed event triggers from push to release with workflow_dispatch inputs. Replaced reusable pre-deploy gate with inline preflight job, removed Trivy scan. Refactored image tagging from github.sha to computed RELEASE_TAG. Added conditional deploy-infra job with Terraform deployment when apply_infra == true. Updated Kubernetes deployment to use Helm/kubectl setup, explicit kubeconfig generation, and parameterized image tag/namespace. Expanded rollout verification for additional deployments. Updated governance evidence artifact naming and retention policy.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 Hoppy Deployment Days

Releases trigger, no more main push,
Preflight gates in a single rush,
Terraform builds when the choice is made,
Kubernetes rolls with the RELEASE TAG parade,
Evidence logged, safely stored for days!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Description check (⚠️ Warning): The description is comprehensive and covers motivation, implementation details, and testing approach. However, it is missing several required template sections, including Risk & Surface Area, Assumption Ledger, Execution Governor & Customer Impact, Green CI Contract Checklist, and Investigation Trust Doctrine Checklist. Resolution: select Risk Level and Surface Area, fill in the Assumption Ledger, specify Customer Impact and Rollback Plan, and complete the Green CI Contract and Investigation Trust Doctrine checklists.

✅ Passed checks (2 passed)

  • Title check (✅ Passed): The title accurately describes the main change, hardening the AWS release deployment workflow for go-live execution, which aligns with the PR's core objective of making deployments deterministic and safe.
  • Docstring Coverage (✅ Passed): No functions found in the changed files to evaluate docstring coverage; skipping the docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.



@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3c28137fe9

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +173 to +177

The previous invocation (removed by this PR):

```shell
helm upgrade --install maestro charts/universal-app \
  --set image.repository=$ECR_REGISTRY/summit/maestro \
  --set image.tag=${{ github.sha }} \
  --set service.targetPort=8001 \
  --set ingress.enabled=true \
  --set ingress.hosts[0].host=api.summit.internal \
  --set ingress.hosts[0].paths[0].path=/maestro \
  --set ingress.hosts[0].paths[0].pathType=Prefix \
  --namespace default

# Deploy Prov Ledger
```

The new invocation:

```shell
helm upgrade --install maestro charts/universal-app \
  --namespace "$K8S_NAMESPACE" \
  --set image.repository="$ECR_REGISTRY/summit/maestro" \
  --set image.tag="$RELEASE_TAG" \
  --set service.targetPort=8001
```


P1: Preserve Maestro ingress overrides during Helm upgrade

This Helm invocation no longer sets ingress.enabled=true (or the hosts/paths overrides) for maestro; because charts/universal-app/values.yaml defaults ingress.enabled to false, running this workflow will render the ingress manifest absent and Helm will remove the existing ingress on upgrade. In environments that rely on the /maestro ingress route, this change causes an external availability outage immediately after deployment.

Useful? React with 👍 / 👎.
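
One way to address this finding, following the comment, is to carry the ingress overrides forward into the new invocation. A sketch, with the host and path values taken from the previous workflow and the other flags from the new one:

```shell
helm upgrade --install maestro charts/universal-app \
  --namespace "$K8S_NAMESPACE" \
  --set image.repository="$ECR_REGISTRY/summit/maestro" \
  --set image.tag="$RELEASE_TAG" \
  --set service.targetPort=8001 \
  --set ingress.enabled=true \
  --set ingress.hosts[0].host=api.summit.internal \
  --set ingress.hosts[0].paths[0].path=/maestro \
  --set ingress.hosts[0].paths[0].pathType=Prefix
```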

Comment on lines +199 to +203

```yaml
- name: Upload governance evidence
  uses: actions/upload-artifact@v4
  with:
    name: aws-deploy-evidence-${{ github.run_id }}
    path: evidence-artifacts/
```


P2: Upload deployment evidence even when rollout fails

The artifact upload runs only on success by default, so any failure in rollout verification or verify-deployment.sh prevents evidence-artifacts/ from being published. That removes the caller/cluster evidence needed for incident triage and compliance in exactly the failure scenarios where it is most valuable; adding if: always() (or moving upload earlier) keeps evidence available for failed runs.

Useful? React with 👍 / 👎.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

🧹 Nitpick comments (2)
.github/workflows/deploy-aws.yml (2)

199-204: Always upload the evidence collected for failed deploys.

If rollout verification or the smoke check fails, this step is skipped and the caller/cluster snapshots are lost for the exact attempt you need to debug. Make the upload unconditional, and consider including github.run_attempt so reruns keep distinct evidence.

Suggested change

```diff
   - name: Upload governance evidence
+    if: ${{ always() }}
     uses: actions/upload-artifact@v4
     with:
-      name: aws-deploy-evidence-${{ github.run_id }}
+      name: aws-deploy-evidence-${{ github.run_id }}-${{ github.run_attempt }}
       path: evidence-artifacts/
       retention-days: 30
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/deploy-aws.yml around lines 199 - 204, The "Upload
governance evidence" workflow step currently gets skipped on failed deploys;
make the artifact upload unconditional by removing any surrounding if: or
conditional job/step guards so the actions/upload-artifact@v4 step always runs,
and update the artifact name expression to include the run attempt (e.g.,
aws-deploy-evidence-${{ github.run_id }}-attempt-${{ github.run_attempt }}) so
reruns produce distinct evidence; keep the same path (evidence-artifacts/) and
retention-days: 30.

122-136: Make Terraform consume the same region the rest of the workflow uses.

The AWS CLI/actions read AWS_REGION, but terraform/environments/prod/main.tf is wired to var.aws_region. If prod is not already setting that variable via tfvars, this job is either non-deterministic or will fail under -input=false. Export TF_VAR_aws_region here so the provider uses the same region as the rest of the deploy.

Suggested change

```diff
   - name: Terraform apply
+    env:
+      TF_VAR_aws_region: ${{ env.AWS_REGION }}
     working-directory: terraform/environments/prod
     run: |
       set -euo pipefail
       terraform init -input=false
       terraform apply -auto-approve -input=false
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/deploy-aws.yml around lines 122 - 136, The Terraform step
is not receiving the workflow AWS region, causing indeterministic runs; set the
TF_VAR_aws_region environment variable from the workflow's AWS_REGION so
Terraform's var.aws_region matches the actions' region. Update the "Terraform
apply" step (or its environment for that job) to export or define
TF_VAR_aws_region="${{ env.AWS_REGION }}" before running terraform init/apply so
the provider uses the same region as the aws-actions/configure-aws-credentials
step.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/deploy-aws.yml:
- Around line 19-20: The concurrency group currently uses a dynamic key ("group:
deploy-aws-${{ github.ref_name || github.sha }}") which allows separate runs to
race; change the concurrency group to a fixed name (for example
"deploy-aws-production") so all production deployments serialize into the same
group, keeping or adjusting cancel-in-progress as desired; update the
concurrency block that defines group/cancel-in-progress to use the fixed string
instead of the expression.
- Around line 6-16: The workflow currently rebuilds and repushes unconditionally
when workflow_dispatch.image_tag is supplied; change the build-and-push logic so
manual runs that specify image_tag do not overwrite existing tags: modify the
build/push job or steps (the steps that perform the build and ECR push) to run
only when no image_tag was passed (e.g., add an if condition checking
github.event.inputs.image_tag is empty) or, if you must allow a push, first
validate the checked-out ref matches the requested tag (use git to resolve the
ref/tag and compare to github.event.inputs.image_tag) and fail if they differ;
update the conditional on the build/push steps or job to use github.event_name
== 'workflow_dispatch' && github.event.inputs.image_tag == '' (or the equivalent
validation check) so manual deploys with image_tag run in deploy-only mode or
safely validate before pushing.
- Around line 168-196: The rollout checks fail because the chart's templates
render deployments as "<release>-universal-app" while the later kubectl checks
expect "<release>" names; update each Helm invocation (the helm upgrade
--install calls for maestro, prov-ledger, and policy-lac) to set the release
name as the intended deployment basename by adding --set
fullnameOverride=<release-name> (i.e., --set fullnameOverride=maestro for the
maestro release, --set fullnameOverride=prov-ledger for prov-ledger, and --set
fullnameOverride=policy-lac for policy-lac) so the generated Deployment names
match the kubectl rollout status checks.
- Around line 88-97: The Dependency audit step currently runs safety without
using maestro's pinned deps; update the "Dependency audit" step so when
matrix.service == "maestro" you either install maestro's locked deps (e.g., run
python -m pip install --upgrade pip && pip install -r maestro/requirements.txt)
before invoking safety or call safety against the manifest (e.g., safety check
-r maestro/requirements.txt) so safety actually scans maestro's dependency
manifest instead of the environment's default packages.
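
As a rough sketch, the last two fixes above might look like the following. The release name, chart path, and requirements path come from the comments; the remaining flags and the rollout timeout are illustrative:

```shell
# Audit maestro's pinned dependencies directly instead of the ambient environment
safety check -r maestro/requirements.txt

# Pin the rendered Deployment name so the later rollout check matches
helm upgrade --install maestro charts/universal-app \
  --namespace "$K8S_NAMESPACE" \
  --set fullnameOverride=maestro \
  --set image.tag="$RELEASE_TAG"

kubectl rollout status deployment/maestro -n "$K8S_NAMESPACE" --timeout=300s
```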

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: b153727a-feb9-4764-aac2-edc9acfc75f6

📥 Commits

Reviewing files that changed from the base of the PR and between 12cad4a and 3c28137.

📒 Files selected for processing (1)
  • .github/workflows/deploy-aws.yml

@BrianCLong
Owner Author

Queue note: branch-local workflow issues have been repaired and the AWS deploy workflow was tightened in aeed8adba33bd799f7ea7cd16dc8fa8248cbe12a, so current CI is exercising the intended branch state. Remaining DIRTY status is intentionally constrained to the compose base repair in #22220; once that merges, refresh this branch from main and rerun build, evidence, test, and tests.

@BrianCLong
Owner Author

Closing in favor of #22241 for the active governed surface. The remaining non-main delta only targeted the archived deploy workflow path and was intentionally excluded from the convergence train.

@BrianCLong BrianCLong closed this Mar 29, 2026
