docs: expand golden values update strategy in release process guide #3268
base: main
| Original file line number | Diff line number | Diff line change |
|---|---|---|
@@ -28,7 +28,32 @@ From RC3 onward, RCs are cut **more frequently and as needed**, rather than stri | |
| ## Golden Values | ||
| Golden values are reference outputs used to validate model behavior in CI. | ||
| Golden values are reference outputs used to validate model behavior in CI. They live in the **internal CI repository** and are the baseline for the internal regression tracker — keeping them current and accurate is therefore critical for meaningful signal. | ||
| ### When to update golden values | ||
| Any PR that can affect performance metrics (e.g. changes to model code, training loop, optimizer, or numerical kernels) **must be accompanied by a corresponding internal PR that updates the golden values** before merging. Do not wait until after the PR lands. | ||
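As an illustration of the rule above, a pre-merge check could flag PRs that touch performance-sensitive code. This is a minimal sketch only: the path prefixes and the function name `needs_golden_update` are hypothetical and are not taken from the MBridge repository.

```python
# Hypothetical path prefixes standing in for "model code, training loop,
# optimizer, or numerical kernels". Adjust to the real repository layout.
PERF_SENSITIVE_PREFIXES = (
    "src/models/",
    "src/training/",
    "src/optim/",
    "src/kernels/",
)


def needs_golden_update(changed_files):
    """Return True if any changed file lies under a performance-sensitive prefix."""
    return any(path.startswith(PERF_SENSITIVE_PREFIXES) for path in changed_files)
```

A check like this can only prompt the author; whether a change really moves the metrics still has to be judged from the internal CI run.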
| ### Updating golden values for PRs targeting `main` | ||
| 1. **Rebase the MBridge PR against `main`** so it is at top-of-tree before launching CI. | ||
| 2. **Launch an internal CI run** using: | ||
| - The **latest nightly container** as the base image. | ||
| - The **latest MCore commit** on `main`. | ||
| - The **MBridge PR commit** (the head of your MBridge branch). | ||
| 3. Collect the outputs and open a PR against the **internal CI repository's `main` branch** with the updated golden values. | ||
| 4. The MBridge PR and the internal golden-values PR should be merged together (or the golden-values PR first). | ||
| ### Updating golden values during a release | ||
| When golden values need to be refreshed on the release branch (e.g. at the start of code-freeze or after an accepted regression): | ||
| 1. **Rebase the MBridge PR against the MBridge release branch** so it is at the head of that branch. | ||
| 2. **Launch an internal CI run** using: | ||
| - The **latest internal RC container** for the release. | ||
| - The **MCore commit pinned on the release branch**. | ||
| - The **MBridge PR commit** (head of the MBridge release branch). | ||
| 3. Open a PR against the **internal CI repository's release branch** with the updated golden values. | ||
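The two procedures above differ only in which inputs feed the internal CI run and where the golden-values PR is opened. The following is a minimal sketch of that selection. All names are hypothetical (`GoldenRunInputs`, `select_inputs`), and the descriptive strings stand in for real image tags and commit SHAs. It pins the MBridge commit to the PR head SHA in both cases, which is an assumption that follows the review comment on this section rather than the text as written.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GoldenRunInputs:
    """Inputs for one internal CI run that regenerates golden values."""
    container: str       # base image for the run
    mcore_ref: str       # which MCore commit to use
    mbridge_sha: str     # MBridge commit under test
    golden_pr_base: str  # branch of the internal CI repo that receives the PR


def select_inputs(target, pr_head_sha):
    """Pick run inputs for a PR targeting 'main' or 'release'."""
    if target == "main":
        return GoldenRunInputs(
            container="latest nightly container",
            mcore_ref="latest MCore commit on main",
            mbridge_sha=pr_head_sha,
            golden_pr_base="main",
        )
    if target == "release":
        return GoldenRunInputs(
            container="latest internal RC container",
            mcore_ref="MCore commit pinned on the release branch",
            mbridge_sha=pr_head_sha,
            golden_pr_base="release branch",
        )
    raise ValueError(f"unknown target: {target!r}")
```

Passing the PR head SHA explicitly keeps the run unambiguous when several PRs target the same branch.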
Comment on lines +47 to 57
Contributor
Clarify the commit source on Line 55 to avoid wrong SHA selection. Line 55 says “MBridge PR commit” and then defines it as “head of the MBridge release branch.” That can conflict when multiple PRs target the same release branch. Recommend explicitly saying “the head commit of the PR branch (or exact PR SHA).”
| ### During the RC Phase (before code-freeze) | ||
@@ -41,24 +66,26 @@ This means golden values are not automatically updated with every run — a deli | |
| ### On the Release Branch (during code-freeze) | ||
| When the release branch is created at code-freeze, all golden values are updated **unconditionally**. Whatever the current output is becomes the new reference baseline for the release. | ||
| When the release branch is created at code-freeze, all golden values are updated **unconditionally** — whatever the current output is becomes the new reference baseline for the release. | ||
| In **Week 5**, the last bulk update of golden values is performed. After that point, engineers are individually responsible for updating any remaining golden values on the release branch, reviewing discrepancies and ensuring the suite is clean ahead of the release. | ||
| ----- | ||
| ## Code-Freeze | ||
| Code-freeze lasts **two weeks** and begins when RC3 is cut. This is the **stabilization phase** — no new features are landed. | ||
| ### First Half | ||
| ### First Half (Weeks 3–5) | ||
| - **Release branches are created.** | ||
| - All golden values on the release branch are updated unconditionally (see above). | ||
| - The **last bulk CI run** occurs one week into the code-freeze period. | ||
| - The **last bulk update of golden values** happens in **Week 5**. | ||
| - RCs continue to be cut as needed. | ||
| ### Second Half | ||
| ### Second Half (Weeks 6–7) | ||
| - **Engineers are responsible for updating golden values** on the release branch — reviewing any remaining discrepancies and ensuring the suite is in a clean state ahead of release. | ||
| - **Engineers are individually responsible for updating golden values** on the release branch — reviewing any remaining discrepancies and ensuring the suite is in a clean state ahead of release. | ||
| - RCs continue to be cut as needed. | ||
| ### Release Day | ||
Should there be a statement covering exceptions? For example: "Exceptions can be made on the rare occasion of GPU availability issues, e.g. the cluster is offline or compute availability is low."