
feat(mit-learn-nextjs): remove blue/green EFS deployment, scope Fastly purge #4652

Open

blarghmatey wants to merge 4 commits into main from tmacey/nextjs-deployment-simplification

@blarghmatey
Member

@blarghmatey commented May 20, 2026

What are the relevant tickets?

N/A

Description (What does it do?)

Simplifies the MIT Learn Next.js deployment by removing the blue/green EFS build mechanism and replacing it with a standard rolling update. Accompanies app-side changes in mitodl/mit-learn#3364.

1. Scoped Fastly cache invalidation (pipeline.py)
Changes all three purge_all Concourse task calls (CI, QA, Production) to purge/html-pages so immutable /_next/static/ content-addressed chunks are never invalidated at deploy time. The mit-learn app now tags all HTML routes with Surrogate-Key: html-pages.
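To make the scoping concrete, here is a toy Python model of a surrogate-key cache, not Fastly's implementation: each cached object remembers the surrogate keys it was served with, and a keyed purge evicts only objects carrying that key. The `ToyCdnCache` class and the paths are hypothetical; the sketch assumes, as the description states, that only HTML responses carry `Surrogate-Key: html-pages`.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCdnCache:
    """Hypothetical in-memory stand-in for a CDN cache keyed by URL path."""

    # path -> set of surrogate keys the response was tagged with
    entries: dict[str, set[str]] = field(default_factory=dict)

    def store(self, path: str, surrogate_key_header: str = "") -> None:
        # A Surrogate-Key header value is a space-separated list of keys.
        self.entries[path] = set(surrogate_key_header.split())

    def purge_all(self) -> None:
        # Analogous to POST /service/{id}/purge_all: evict everything.
        self.entries.clear()

    def purge_key(self, key: str) -> None:
        # Analogous to POST /service/{id}/purge/{key}: evict only tagged objects.
        self.entries = {p: keys for p, keys in self.entries.items() if key not in keys}


cache = ToyCdnCache()
cache.store("/", "html-pages")
cache.store("/search", "html-pages")
cache.store("/_next/static/chunks/app-3f9a.js")  # untagged, content-addressed chunk

cache.purge_key("html-pages")
print(sorted(cache.entries))  # ['/_next/static/chunks/app-3f9a.js']
```

With `purge_all` the chunk would be evicted too, which is the deploy-time invalidation that, per the pipeline comments, can cause missing-chunk errors during rolling deployments.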

2. Build baked into Docker image (pipeline.py)
Removes build_target="build_skip_yarn" from the pipeline params. yarn build now runs inside the Docker build, and its output (produced via output: "standalone" in next.config.js) is baked into the image's final runner stage instead of being produced by a Kubernetes Job at deploy time. Without a build_target, the pipeline builds the full Dockerfile through its last stage (runner).
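A minimal sketch of why dropping the parameter changes which stage is built, assuming the build step translates an optional target into `docker build --target`. The `PipelineParams` dataclass and `docker_build_args` helper are hypothetical and are not the real `AppPipelineParams` or Concourse build task; the point is only that when no target is passed, Docker builds through the Dockerfile's last stage.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PipelineParams:
    """Hypothetical subset of pipeline parameters for an app image build."""

    image_name: str
    dockerfile: str = "Dockerfile"
    build_target: Optional[str] = None  # e.g. "build_skip_yarn" before this PR


def docker_build_args(params: PipelineParams, tag: str) -> list[str]:
    """Assemble a docker build argv; --target is only passed when one is set."""
    args = ["docker", "build", "-f", params.dockerfile, "-t", f"{params.image_name}:{tag}"]
    if params.build_target:
        # Stop at the named stage instead of building the whole Dockerfile.
        args += ["--target", params.build_target]
    args.append(".")  # build context
    return args


old = PipelineParams("mit-learn-nextjs", build_target="build_skip_yarn")
new = PipelineParams("mit-learn-nextjs")
print(docker_build_args(old, "abc123"))
print(docker_build_args(new, "abc123"))  # no --target: builds through the final stage
```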

3. Replace blue/green EFS deployment with rolling update (__main__.py)

Removed (~350 lines):

  • Blue/green PVCs (nextjs-build-cache-efs-blue/green) backed by EFS
  • Kubernetes build Job that ran yarn build at deploy time
  • create_deployment_for_color(), create_pvc_for_color(), determine_colors()
  • get_last_active_from_configmap() (live Kubernetes API calls during pulumi up)
  • deployment_state_configmap ConfigMap
  • auto_toggle / last_active / color-toggle logic
  • from kubernetes import client, config import

Added (~100 lines):

  • Single Deployment with RollingUpdate (maxUnavailable: 0, maxSurge: 1)
  • No volumes or volumeMounts — no EFS dependency
  • Static Service selector on k8s_app_labels
  • Single PodDisruptionBudget
  • Simplified exports: domain and image only
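The shape of the added resources can be sketched as plain dicts mirroring the Kubernetes `apps/v1` Deployment, `v1` Service, and `policy/v1` PodDisruptionBudget fields. This is an illustration, not the actual `__main__.py`: the label values, replica count, container name and port, and the PDB's `minAvailable` are assumptions, since the description only specifies the rolling-update parameters, the static selector, and the absence of volumes.

```python
def nextjs_manifests(image: str, labels: dict[str, str], replicas: int = 2) -> dict[str, dict]:
    """Build illustrative Deployment/Service/PDB specs for a single rolling Deployment."""
    deployment = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": "mit-learn-nextjs", "labels": labels},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "strategy": {
                "type": "RollingUpdate",
                # Never drop below the desired count; add at most one extra pod at a time.
                "rollingUpdate": {"maxUnavailable": 0, "maxSurge": 1},
            },
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    # No volumes/volumeMounts: the standalone build ships inside the image.
                    "containers": [
                        {"name": "nextjs", "image": image, "ports": [{"containerPort": 3000}]}
                    ],
                },
            },
        },
    }
    service = {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": "mit-learn-nextjs"},
        # Static selector: no deployment-color label to flip between releases.
        "spec": {"selector": labels, "ports": [{"port": 3000, "targetPort": 3000}]},
    }
    pdb = {
        "apiVersion": "policy/v1",
        "kind": "PodDisruptionBudget",
        "metadata": {"name": "mit-learn-nextjs"},
        "spec": {"minAvailable": 1, "selector": {"matchLabels": labels}},
    }
    return {"deployment": deployment, "service": service, "pdb": pdb}


m = nextjs_manifests("example/mit-learn-nextjs:abc123", {"app": "mit-learn-nextjs"})
strategy = m["deployment"]["spec"]["strategy"]["rollingUpdate"]
replicas = m["deployment"]["spec"]["replicas"]
# Pod-count bounds during a rollout: replicas - maxUnavailable .. replicas + maxSurge
print(replicas - strategy["maxUnavailable"], replicas + strategy["maxSurge"])  # 2 3
```

With `maxUnavailable: 0` the old pods are only removed after a surged replacement is ready, so serving capacity never dips below the desired replica count during a deploy.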

4. Env vars now consumed at runtime (not build time)
Previously, the NEXT_PUBLIC_* env vars set here were passed to the Kubernetes build Job at deploy time, where webpack's DefinePlugin inlined them as literals into the JS bundle. With the standalone build, that Job no longer exists.

The mit-learn app (PR #3364) introduces a PublicEnvScript Server Component that reads process.env at request time and renders a synchronous <script>window.__ENV={...}</script> in <head> before any JS bundle loads. Client code reads env vars via an env() helper that reads window.__ENV instead of using static process.env.NEXT_PUBLIC_* dot-access (which DefinePlugin would inline to empty strings). The NEXT_PUBLIC_* env vars defined in __main__.py are therefore still required and correct — they are now runtime inputs rather than build-time inputs.
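The real `PublicEnvScript` is a React Server Component in mit-learn; the Python function below only mirrors the idea so the data flow is visible: collect `NEXT_PUBLIC_*` variables from the process environment at request time and serialize them into an inline script. The function name is hypothetical, and escaping `<` as `\u003c` is an assumption about safe embedding (it keeps a value containing `</script>` from closing the tag early), not a detail taken from the PR.

```python
import json
import os
from typing import Mapping, Optional


def render_public_env_script(environ: Optional[Mapping[str, str]] = None) -> str:
    """Serialize NEXT_PUBLIC_* variables into an inline <script> for <head>."""
    env = os.environ if environ is None else environ
    # Only the NEXT_PUBLIC_ prefix is exposed to the browser.
    public = {k: v for k, v in env.items() if k.startswith("NEXT_PUBLIC_")}
    # json.dumps yields a valid JS object literal; escape "<" so a value
    # containing "</script>" cannot terminate the tag early.
    payload = json.dumps(public, sort_keys=True).replace("<", "\\u003c")
    return f"<script>window.__ENV={payload}</script>"


print(render_public_env_script({
    "NEXT_PUBLIC_ORIGIN": "https://example.com",
    "DATABASE_URL": "postgres://secret",  # no NEXT_PUBLIC_ prefix: never rendered
}))
# <script>window.__ENV={"NEXT_PUBLIC_ORIGIN": "https://example.com"}</script>
```

Because the values are read per request rather than inlined at build time, the same image can serve CI, QA, and Production with different `NEXT_PUBLIC_*` settings.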

How can this be tested?

  1. Run pulumi preview on the CI stack and confirm the plan shows:
    • 2 PVC deletions (nextjs-build-cache-efs-blue/green)
    • 1 Job deletion (mit-learn-nextjs-build-*)
    • 2 Deployment deletions (mit-learn-nextjs-blue/green)
    • 1 ConfigMap deletion (mit-learn-nextjs-deployment-state)
    • 1 Deployment create (mit-learn-nextjs)
    • 1 PDB create/update
    • Service selector simplified (no deployment-color label)
  2. Confirm ruff check and mypy pass on __main__.py.
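The expected plan in step 1 can be checked mechanically. This sketch assumes the preview is exported with `pulumi preview --json` and that the output contains a `steps` list whose entries carry `op` and `urn` fields; verify that shape against your Pulumi CLI version before relying on it. The stack and project names in the sample are hypothetical, and the resource kind is read from the URN's type segment (`urn:pulumi:<stack>::<project>::<type>::<name>`).

```python
import json
from collections import Counter


def summarize_preview(preview: dict) -> Counter:
    """Count (operation, resource kind) pairs from a `pulumi preview --json` document."""
    counts: Counter = Counter()
    for step in preview.get("steps", []):
        # URN layout: urn:pulumi:<stack>::<project>::<qualified type>::<name>
        qualified_type = step["urn"].split("::")[2]
        # Child resources prefix parent types with "$"; keep the resource's own type,
        # then drop the provider/module prefix, e.g. "kubernetes:apps/v1:Deployment" -> "Deployment".
        kind = qualified_type.split("$")[-1].split(":")[-1]
        counts[(step["op"], kind)] += 1
    return counts


sample = json.loads("""{"steps": [
  {"op": "delete", "urn": "urn:pulumi:ci::mit-learn-nextjs::kubernetes:core/v1:PersistentVolumeClaim::nextjs-build-cache-efs-blue"},
  {"op": "delete", "urn": "urn:pulumi:ci::mit-learn-nextjs::kubernetes:core/v1:PersistentVolumeClaim::nextjs-build-cache-efs-green"},
  {"op": "create", "urn": "urn:pulumi:ci::mit-learn-nextjs::kubernetes:apps/v1:Deployment::mit-learn-nextjs"}
]}""")
print(summarize_preview(sample)[("delete", "PersistentVolumeClaim")])  # 2
```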

Additional Context

Deployment coordination: this PR and mitodl/mit-learn#3364 must be merged and applied together as a single pulumi up. The new standalone Dockerfile runner stage is incompatible with the old EFS volume mount path.

Fastly purge transition: the first deployment after merging will have no cached objects tagged html-pages yet (nothing was tagged before). A one-time manual purge_all can be run after the first deployment to reset the Fastly cache slate if needed.

Checklist:

blarghmatey and others added 2 commits May 20, 2026 16:03
…p_yarn for mit-learn-nextjs

- Change all three purge_all calls to purge/html-pages surrogate key so
  immutable /_next/static/ chunks are never invalidated at deploy time
- Remove build_target="build_skip_yarn" from mit-learn-nextjs AppPipelineParams
  now that next build is baked into the Docker image via standalone output

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…g update

Remove the blue/green deployment mechanism and EFS PVC-based build
approach in favour of a standard Kubernetes rolling update:

- Remove: PVC creation (blue/green EFS volumes), Kubernetes build Job,
  blue and green Deployments, deployment-state ConfigMap,
  get_last_active_from_configmap(), determine_colors(),
  create_deployment_for_color(), auto_toggle logic, and all
  color-dependent exports
- Add: single Deployment with RollingUpdate strategy (maxUnavailable=0,
  maxSurge=1), static Service selector, single PodDisruptionBudget
- Drop kubernetes-client import (no longer reads live cluster state
  during pulumi up)

The next build is now baked into the Docker image (standalone output),
so there is no need for an EFS volume or a build Job at deploy time.

BREAKING: must be deployed together with the corresponding
mit-open Dockerfile change that adds the standalone runner stage.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Contributor

Copilot AI left a comment


Pull request overview

This PR simplifies the MIT Learn Next.js Kubernetes deployment by removing the blue/green EFS-based build/cache mechanism in favor of a standard rolling Deployment, and updates the Concourse pipeline to scope Fastly cache invalidation to HTML pages only.

Changes:

  • Replace blue/green EFS build + dual-Deployment toggle logic with a single rolling Deployment, static Service selector, and one PodDisruptionBudget (mit_learn_nextjs/__main__.py).
  • Remove the pipeline’s special build_skip_yarn build target for mit-learn-nextjs so it uses the default Dockerfile stage (pipeline.py).
  • Change Fastly invalidation from purge_all to purge/html-pages for mit-learn-nextjs deploys (pipeline.py).

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

Files reviewed:

  • src/ol_infrastructure/applications/mit_learn_nextjs/__main__.py: Removes blue/green + EFS build workflow and defines a single rolling Deployment/Service/PDB for Next.js.
  • src/ol_concourse/pipelines/infrastructure/k8s_apps/pipeline.py: Updates mit-learn-nextjs pipeline params and scopes Fastly purges to the html-pages surrogate key.
Comments suppressed due to low confidence (2)

src/ol_concourse/pipelines/infrastructure/k8s_apps/pipeline.py:484

  • The Fastly purge command string ends with an extra empty argument (""). This is interpreted by the shell as an additional (empty) URL argument to curl, which can make the purge task fail. Remove the trailing "" from the command.
                                "-exc",
                                # Purge only HTML pages (tagged with surrogate key "html-pages").
                                # /_next/static/ assets are content-addressed and immutable —
                                # purging them causes missing-chunk errors during rolling deployments.
                                f"""curl -H "Fastly-Key: ((fastly.fastly_api_token))" -H "Accept: application/json" -i -X POST "https://api.fastly.com/service/((fastly.{pipeline_parameters.fastly_service_prefix}service_id_qa))/purge/html-pages" """,
                            ],

src/ol_concourse/pipelines/infrastructure/k8s_apps/pipeline.py:506

  • The Fastly purge command includes a trailing empty-string argument (""). When run via sh -exc, curl receives an extra empty URL argument and may exit non-zero, breaking the deployment pipeline. Drop the trailing "" so the command only contains the intended purge URL.
                                "-exc",
                                # Purge only HTML pages (tagged with surrogate key "html-pages").
                                # /_next/static/ assets are content-addressed and immutable —
                                # purging them causes missing-chunk errors during rolling deployments.
                                f"""curl -H "Fastly-Key: ((fastly.fastly_api_token))" -H "Accept: application/json" -i -X POST "https://api.fastly.com/service/((fastly.{pipeline_parameters.fastly_service_prefix}service_id_production))/purge/html-pages" """,
                            ],
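Whether a trailing space inside the triple-quoted f-string adds an argument can be checked with Python's `shlex`, which follows POSIX shell quoting rules closely (an approximation of how `sh -exc` tokenizes, not a guarantee). The service ID is a placeholder standing in for the Concourse `((fastly...))` interpolation, and the `Fastly-Key` header is omitted.

```python
import shlex

service_id = "SERVICE_ID"  # placeholder for the interpolated ((fastly.*service_id_*)) value
with_trailing_space = f"""curl -H "Accept: application/json" -i -X POST "https://api.fastly.com/service/{service_id}/purge/html-pages" """
with_empty_arg = with_trailing_space + '""'

# Trailing whitespace is only a separator: the URL is still the final argument.
print(shlex.split(with_trailing_space)[-1])
# A literal "" pair, by contrast, would add an empty-string argument after the URL.
print(shlex.split(with_empty_arg)[-1] == "")
```

The excerpts above end in `" """`, i.e. the first form, which matches the later commit's note that the trailing space was benign.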


Comment thread src/ol_concourse/pipelines/infrastructure/k8s_apps/pipeline.py
Comment thread src/ol_concourse/pipelines/infrastructure/k8s_apps/pipeline.py Outdated
- Remove stale phase-coordination comment from mit-learn-nextjs params;
  the Kubernetes build Job is already removed in this PR so the comment
  describing future Phase 3d work no longer applies
- Remove trailing whitespace from curl command strings in all three
  purge-fastly-cache task steps (CI, QA, Production); the trailing space
  in the triple-quoted f-strings was benign but misleading

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@gumaerc self-assigned this May 21, 2026
…rams

Add a new fastly_purge_scope field to AppPipelineParams (default: "purge_all")
that controls which Fastly purge endpoint is called when purge_fastly_cache
is enabled.

- "purge_all" (default) maps to POST /service/{id}/purge_all, preserving
  the existing full-cache purge behaviour for all current consumers
- Any other string maps to POST /service/{id}/purge/{scope}, purging only
  objects tagged with that surrogate key

Update mit-learn-nextjs to explicitly set fastly_purge_scope="html-pages"
rather than hardcoding the surrogate key in comments inside the curl strings.
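The scope-to-endpoint mapping described in this commit can be sketched as a small helper. The function name is hypothetical and the real pipeline assembles a curl command string rather than calling anything like this; the two endpoint shapes (`/service/{id}/purge_all` and `/service/{id}/purge/{key}`) are Fastly's purge-all and surrogate-key purge endpoints, both called with POST.

```python
FASTLY_API = "https://api.fastly.com"


def fastly_purge_url(service_id: str, fastly_purge_scope: str = "purge_all") -> str:
    """Map a purge scope to the Fastly endpoint that the POST request should target."""
    if fastly_purge_scope == "purge_all":
        # Default: purge every object in the service (existing behaviour).
        return f"{FASTLY_API}/service/{service_id}/purge_all"
    # Any other value is treated as a surrogate key; only tagged objects are purged.
    return f"{FASTLY_API}/service/{service_id}/purge/{fastly_purge_scope}"


print(fastly_purge_url("SVC123"))
# https://api.fastly.com/service/SVC123/purge_all
print(fastly_purge_url("SVC123", "html-pages"))
# https://api.fastly.com/service/SVC123/purge/html-pages
```

Keeping `"purge_all"` as the default means existing pipelines that never set the field keep their full-cache purge, while mit-learn-nextjs opts in to the narrower key.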

Labels

None yet

Projects

None yet


4 participants