feat(nextjs): standalone output, deterministic builds, and scoped Fastly purge#3364
blarghmatey wants to merge 5 commits into
Three coordinated improvements to the Next.js deployment pipeline:
1. Scoped Fastly cache invalidation
- Add Surrogate-Key: html-pages response header to HTML routes and
sitemaps (excludes /_next/static/ via the existing file-extension
regex), so purge/html-pages at deploy time no longer invalidates
immutable content-addressed static chunks.
2. Deterministic builds
- Add generateBuildId returning NEXT_PUBLIC_VERSION || GIT_REF || 'dev'
so the build manifest filename is stable across identical builds.
- Override webpack output.filename / output.chunkFilename to use
[contenthash] instead of [chunkhash], making chunk filenames depend
only on content rather than module graph ordering.
3. Standalone output + slimmed Docker image
- Add output: 'standalone' to next.config.js so Next.js emits a
self-contained server with a minimal node_modules tree.
- Remove the build_skip_yarn Docker stage; add a new slim runner stage
that copies .next/standalone/, .next/static/, and public/ from the
build stage into a clean node:24-alpine image. The build is now fully
baked in at image build time — no EFS volume or Kubernetes Job needed
at deploy time.
- Add ARG/ENV GIT_REF to the build stage so the git SHA passed by
Concourse as BUILD_ARG_GIT_REF is available to next build.
BREAKING: this Dockerfile change must be deployed together with the
corresponding ol-infrastructure Pulumi change that removes the blue/green
EFS deployment and Kubernetes build Job.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
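The deterministic-build settings in item 2 above can be sketched as a next.config fragment. This is a hypothetical illustration, not the PR's code. `buildId()`, `withContentHash()` and `nextConfigSketch` are invented names, and the webpack callback is shown without Next's `options` argument. It rewrites any `[chunkhash]` placeholder, including a length suffix such as `[chunkhash:8]`, to `[contenthash]`, and leaves function-valued or hash-free templates untouched.

```typescript
// Hypothetical sketch of the deterministic-build settings; not the PR's code.
type EnvMap = Record<string, string | undefined>

// Same fallback chain as the described generateBuildId.
export function buildId(source: EnvMap = process.env): string {
  return source.NEXT_PUBLIC_VERSION || source.GIT_REF || "dev"
}

// Replace [chunkhash] / [chunkhash:N] with [contenthash] / [contenthash:N].
// Non-string templates (webpack also accepts functions) pass through unchanged.
export function withContentHash(template: unknown): unknown {
  return typeof template === "string"
    ? template.replace(/\[chunkhash(:\d+)?\]/g, "[contenthash$1]")
    : template
}

// Only the fields this sketch touches; a real webpack config has many more.
interface WebpackConfigLike {
  output: { filename?: unknown; chunkFilename?: unknown }
}

export const nextConfigSketch = {
  output: "standalone" as const,
  generateBuildId: async (): Promise<string> => buildId(),
  webpack: (config: WebpackConfigLike): WebpackConfigLike => {
    config.output.filename = withContentHash(config.output.filename)
    config.output.chunkFilename = withContentHash(config.output.chunkFilename)
    return config
  },
}
```

Because the build ID falls back to `GIT_REF`, two builds of the same commit get the same ID. If neither variable is set, every build uses `dev`, which is stable but no longer distinguishes releases.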
OpenAPI Changes: No changes detected. Unexpected changes? Ensure your branch is up-to-date with …
Pull request overview
This PR updates the MIT Learn Next.js frontend build/deploy approach to reduce missing-chunk 404s by scoping CDN purges, making build outputs more deterministic, and shipping a standalone Next.js server bundle in the Docker image.
Changes:
- Add `Surrogate-Key: html-pages` to sitemap and “page” responses to enable targeted Fastly purges without touching immutable `/_next/static/*` assets.
- Make builds more deterministic via a stable `generateBuildId` and by replacing webpack `[chunkhash]` with `[contenthash]` in output filenames.
- Switch the Docker image to a standalone `runner` stage that runs the emitted server via `node` and copies only the standalone output plus static/public assets.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| frontends/main/next.config.js | Adds standalone output, surrogate-key tagging, stable build IDs, and deterministic chunk naming. |
| frontends/main/Dockerfile.web | Reworks the container build into a build stage plus a slim runner stage that runs the standalone server bundle. |
…tern
- Fix misleading comment: the standalone bundle includes a minimal node_modules, not zero node_modules.
- Add NODE_ENV=production to the runner stage; it was previously inherited from the base stage, but the runner uses a fresh FROM, so it must be set explicitly.
- Exclude /healthcheck from the html-pages Surrogate-Key pattern; the healthcheck returns JSON and should not be tagged as an HTML page for Fastly purges.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
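With the `/healthcheck` exclusion above, the tagging rule can be written as a plain predicate to make the intended matching explicit. This is a hypothetical sketch. In `next.config.js` the rule is expressed as `headers()` `source` patterns rather than a function. The sitemap check (paths starting with `/sitemap` and ending in `.xml`) and the file-extension regex here are assumptions, not the PR's actual patterns.

```typescript
// Hypothetical predicate mirroring the described Surrogate-Key rules.
export const HTML_PAGES_KEY = "html-pages"

// Assumed: last path segment ends in ".<ext>", e.g. /_next/static/chunks/main-abc.js.
const FILE_EXTENSION = /\/[^/]*\.[a-z0-9]+$/i

// Assumed sitemap locations such as /sitemap.xml or /sitemaps/courses.xml.
function isSitemap(pathname: string): boolean {
  return pathname.startsWith("/sitemap") && pathname.endsWith(".xml")
}

function isHealthcheck(pathname: string): boolean {
  return pathname === "/healthcheck" || pathname.startsWith("/healthcheck/")
}

// Returns the Surrogate-Key value to attach, or undefined for untagged responses.
export function surrogateKeyFor(pathname: string): string | undefined {
  if (isSitemap(pathname)) return HTML_PAGES_KEY // tagged despite the .xml extension
  if (isHealthcheck(pathname)) return undefined // JSON, not an HTML page
  if (FILE_EXTENSION.test(pathname)) return undefined // static assets, incl. /_next/static/
  return HTML_PAGES_KEY
}
```

One consequence of an extension-based rule: a page route whose last segment contains a dot (for example `/docs/v1.2`) would be treated as an asset and left untagged.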
Replace webpack build-time NEXT_PUBLIC_* inlining with a runtime injection
pattern that allows a single Docker image to be deployed across environments.
Problem: DefinePlugin bakes all process.env.NEXT_PUBLIC_* references as
literal values at build time. When yarn build runs in CI (without per-env
values), all NEXT_PUBLIC_* vars are empty strings in the bundle—even though
the Kubernetes pod has the correct values set.
Solution:
- PublicEnvScript: a Server Component that calls connection() (opts out of
SSG) and renders a synchronous inline <script> setting window.__ENV to
all NEXT_PUBLIC_* values from process.env at request time. Placed in root
<head> so it executes before any JS bundle loads.
- env(): a helper that reads window.__ENV in the browser and uses dynamic
process.env[key] access on the server (dynamic bracket access is NOT
replaced by DefinePlugin). Falls back to process.env[key] in the browser
to support test environments (jsdom) that set process.env directly.
- Migrate all 143 process.env.NEXT_PUBLIC_* usages across 56 source files
to env(). Test files unchanged—existing vi.stubEnv / direct process.env
assignment continues to work via the fallback.
Additional changes:
- next.config.js: gate validateEnv() with NEXT_BUILD_CI to skip at build
time; runtime validation added to instrumentation-node.ts (runs at server
startup via instrumentation.ts register()).
- Dockerfile: set ENV NEXT_BUILD_CI=1 in build stage.
- ConfiguredPostHogProvider: compute PostHog bootstrap feature flags from
window.__ENV at runtime (previously baked via processFeatureFlags() in
next.config.js which returns {} in CI builds). Remove build-time
FEATURE_FLAGS from next.config.js env config.
- instrumentation-node.ts: use dynamic process.env[key] access for
NEXT_PUBLIC_SENTRY_* and NEXT_PUBLIC_VERSION to avoid DefinePlugin
inlining; add required env var check at startup.
- otel-utils.ts: same dynamic access for APP_VERSION.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
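The startup check described for `instrumentation-node.ts` above could look roughly like the following. This is a hypothetical sketch. The PR's list of required variables is not shown here, so the caller passes its own, and the `exit` parameter exists only so the behaviour can be exercised without terminating the process.

```typescript
// Hypothetical sketch of a startup env check; not the PR's code.
type EnvMap = Record<string, string | undefined>

// Computed-key access, so DefinePlugin cannot inline these reads.
export function missingEnv(required: readonly string[], source: EnvMap): string[] {
  return required.filter((key) => !source[key])
}

// Log and exit with code 1 when any required variable is missing or empty.
export function assertRequiredEnv(
  required: readonly string[],
  source: EnvMap = process.env,
  exit: (code: number) => never = (code) => process.exit(code),
): void {
  const missing = missingEnv(required, source)
  if (missing.length > 0) {
    console.error(`Missing required environment variables: ${missing.join(", ")}`)
    exit(1)
  }
}
```

Called from `register()` only when `process.env.NEXT_RUNTIME === "nodejs"`, a missing variable stops the server at startup, which Kubernetes surfaces as a crash loop rather than as a running but misconfigured pod.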
@copilot address the test failures
Agent-Logs-Url: https://github.com/mitodl/mit-learn/sessions/03f963fb-4658-4987-87e4-3a7610eee518
Co-authored-by: blarghmatey <479088+blarghmatey@users.noreply.github.com>
Addressed in 9e82cc7, with a small cleanup in c7c0888. I traced the failing CI run to the …
What are the relevant tickets?
N/A
Description (What does it do?)
Addresses the root causes of missing-chunk 404 errors in the MIT Learn Next.js deployment by removing the EFS/blue-green architecture and adding runtime env injection so a single Docker image works across all environments.
1. Scoped Fastly cache invalidation
Adds a `Surrogate-Key: html-pages` response header to HTML routes and sitemaps. The Concourse pipeline now calls `/service/b3/purge/html-pages` instead of `purge_all`, so immutable `/_next/static/` content-addressed chunks are never evicted during a deploy.

2. Deterministic builds
- `generateBuildId` returns `NEXT_PUBLIC_VERSION || GIT_REF || 'dev'`, which stabilises the build manifest filename across identical builds.
- `output.filename` / `output.chunkFilename` are overridden to use `[contenthash]` instead of `[chunkhash]`, making chunk filenames depend only on file content rather than on module graph ordering.

3. Standalone Docker image (replaces EFS + Kubernetes build Job)
- Adds `output: 'standalone'` to `next.config.js` so Next.js emits a self-contained server bundle.
- Removes the `build_skip_yarn` Docker stage and adds a new slim `runner` stage that copies `.next/standalone/`, `.next/static/`, and `public/` from the build stage into a clean `node:24-alpine` image.
- Adds `ARG GIT_REF` / `ENV GIT_REF` to the build stage so the git SHA passed by Concourse as `BUILD_ARG_GIT_REF` is available to `next build`.

4. Runtime env injection via PublicEnvScript + env() helper
Fixes a critical issue with standalone builds: webpack's `DefinePlugin` inlines all `process.env.NEXT_PUBLIC_*` references as literal strings at build time. Since `yarn build` runs in CI without per-environment values, all `NEXT_PUBLIC_*` vars would be empty strings in the bundle, even though Kubernetes sets the correct values.

The solution decouples env var reads from the build:
- `src/env.ts`: an `env("NEXT_PUBLIC_FOO")` helper. In the browser, it reads `window.__ENV[key]` (set before any JS loads). On the server, it uses dynamic bracket access (`process.env[key]`), which DefinePlugin does not replace. It falls back to `process.env[key]` in the browser so existing `vi.stubEnv()` / direct `process.env` assignments in tests continue to work without modification.
- `src/app/components/PublicEnvScript.tsx`: a Next.js Server Component that calls `connection()` (opting the route out of SSG so env is read at request time) and renders a synchronous inline `<script>window.__ENV={...}</script>` in `<head>` with all `NEXT_PUBLIC_*` env vars serialized as HTML-safe JSON. Because it is a plain `<script>` tag (not `next/script`), it executes synchronously before any JS module, including Sentry's `instrumentation-client.ts`.

56 source files migrated: all `process.env.NEXT_PUBLIC_*` references replaced with `env()` calls; no test files changed.

PostHog bootstrap feature flags (`NEXT_PUBLIC_FEATURE_*`) are now computed at runtime from `window.__ENV` inside `useEffect` in `ConfiguredPostHogProvider`, rather than being baked via `processFeatureFlags()` in `next.config.js`.

Server startup validation: `next.config.js` skips `validateEnv()` during Docker builds (`NEXT_BUILD_CI=1`). Runtime validation was moved to `instrumentation-node.ts` (run via `instrumentation.ts` `register()` at server startup): if required env vars are missing, the pod exits with code 1 and crash-loops, making the misconfiguration immediately visible.

Companion infrastructure PR: mitodl/ol-infrastructure#4652 removes the blue/green EFS deployment and Kubernetes build Job from Pulumi. These two PRs must be merged and deployed together.
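The runtime PostHog bootstrap in section 4 can be pictured as a pure mapping from the injected `NEXT_PUBLIC_FEATURE_*` values to a flag map, which posthog-js accepts through its `bootstrap.featureFlags` init option. The naming rule (strip the prefix, lowercase, `_` to `-`) and the truthiness rule (`"true"` means enabled) are assumptions for illustration. The PR's `processFeatureFlags()` may use different conventions.

```typescript
// Hypothetical mapping from injected env values to PostHog bootstrap flags.
const FEATURE_PREFIX = "NEXT_PUBLIC_FEATURE_"

export function bootstrapFlagsFromEnv(
  publicEnv: Record<string, string | undefined>,
): Record<string, boolean> {
  const flags: Record<string, boolean> = {}
  for (const [key, raw] of Object.entries(publicEnv)) {
    if (!key.startsWith(FEATURE_PREFIX) || raw === undefined) continue
    // Assumed naming rule: NEXT_PUBLIC_FEATURE_NEW_SEARCH -> "new-search".
    const flag = key.slice(FEATURE_PREFIX.length).toLowerCase().replace(/_/g, "-")
    // Assumed truthiness rule: only "true" (any case, trimmed) enables a flag.
    flags[flag] = raw.trim().toLowerCase() === "true"
  }
  return flags
}
```

Inside `useEffect`, the argument would be `window.__ENV`, so the bootstrapped flags reflect the pod's environment rather than whatever was present during `yarn build`.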
How can this be tested?
Standalone image + env injection:
`curl -s http://localhost:3000/ | grep 'window.__ENV'` should show the injected env vars in the HTML `<head>`.

Fastly surrogate key:
Deterministic builds:
Build twice from the same source; confirm chunk filenames are identical.
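One way to run this comparison: build twice with the same `GIT_REF`, copy `.next/static/` aside after each build, and diff the relative file lists. The helper below is a hypothetical sketch using only `node:fs` and `node:path`. Using the same `GIT_REF` matters because the build ID also appears as a directory name under `.next/static/`.

```typescript
import { readdirSync } from "node:fs"
import { join, relative } from "node:path"

// Recursively list regular files under root as sorted, root-relative paths.
export function listFiles(root: string): string[] {
  const files: string[] = []
  const walk = (dir: string): void => {
    for (const entry of readdirSync(dir, { withFileTypes: true })) {
      const full = join(dir, entry.name)
      if (entry.isDirectory()) walk(full)
      else if (entry.isFile()) files.push(relative(root, full))
    }
  }
  walk(root)
  return files.sort()
}

// Files present in one build output but not the other; both empty means identical names.
export function diffBuildOutputs(dirA: string, dirB: string): { onlyInA: string[]; onlyInB: string[] } {
  const a = listFiles(dirA)
  const b = listFiles(dirB)
  const inA = new Set(a)
  const inB = new Set(b)
  return {
    onlyInA: a.filter((f) => !inB.has(f)),
    onlyInB: b.filter((f) => !inA.has(f)),
  }
}
```

For example, `diffBuildOutputs("/tmp/build-a/static", "/tmp/build-b/static")` (placeholder paths) should return two empty lists. This compares names only, but since the names embed content hashes, matching names are a good proxy for matching content.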
Additional Context
Deployment coordination: this PR and mitodl/ol-infrastructure#4652 must be deployed as a single `pulumi up`. Deploying the new Dockerfile without removing the Kubernetes build Job (or vice versa) will break the app.

One-time post-deploy: the first deployment after merging will have no existing `html-pages`-tagged objects in Fastly (nothing was tagged before). A one-time manual `purge_all` should be run immediately after the first deploy to clear stale pages; after that, the scoped purge takes over.
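The two purges mentioned here map onto Fastly's purge API: `POST /service/{service_id}/purge_all` for the one-time cleanup and `POST /service/{service_id}/purge/{surrogate_key}` for the per-deploy purge, both authenticated with a `Fastly-Key` header. The sketch below only builds request descriptions (no network I/O). The service ID and token are placeholders, and the Concourse pipeline's actual invocation may differ.

```typescript
// Sketch of the two Fastly purge requests; builds descriptions only, no I/O.
export interface PurgeRequest {
  method: "POST"
  url: string
  headers: Record<string, string>
}

const FASTLY_API = "https://api.fastly.com"

// Purge every cached object for the service (the one-time post-deploy cleanup).
export function purgeAllRequest(serviceId: string, apiToken: string): PurgeRequest {
  return {
    method: "POST",
    url: `${FASTLY_API}/service/${encodeURIComponent(serviceId)}/purge_all`,
    headers: { "Fastly-Key": apiToken, Accept: "application/json" },
  }
}

// Purge only objects tagged with surrogateKey (the per-deploy scoped purge).
// soft=true sends Fastly-Soft-Purge: 1, marking objects stale instead of evicting them.
export function purgeKeyRequest(
  serviceId: string,
  surrogateKey: string,
  apiToken: string,
  soft = false,
): PurgeRequest {
  const headers: Record<string, string> = { "Fastly-Key": apiToken, Accept: "application/json" }
  if (soft) headers["Fastly-Soft-Purge"] = "1"
  return {
    method: "POST",
    url: `${FASTLY_API}/service/${encodeURIComponent(serviceId)}/purge/${encodeURIComponent(surrogateKey)}`,
    headers,
  }
}
```

Sending either request is a single `fetch(req.url, { method: req.method, headers: req.headers })`. Only objects served with `Surrogate-Key: html-pages` are affected by the scoped call, which is why the first deploy still needs `purge_all`.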