Skip to content

docs: expand golden values update strategy in release process guide#3268

Open
ko3n1g wants to merge 2 commits intomainfrom
ko3n1g/docs/golden-values-update-strategy
Open

docs: expand golden values update strategy in release process guide#3268
ko3n1g wants to merge 2 commits intomainfrom
ko3n1g/docs/golden-values-update-strategy

Conversation

@ko3n1g
Copy link
Copy Markdown
Contributor

@ko3n1g ko3n1g commented Apr 10, 2026

Summary

  • Adds a "When to update golden values" rule: any metric-affecting PR must ship with a paired internal PR — no waiting until after merge.
  • Adds a step-by-step guide for PRs targeting main: rebase MBridge against main, launch CI with nightly container + latest MCore commit + MBridge PR commit, open internal PR against internal-ci/main.
  • Adds a step-by-step guide for release updates: rebase MBridge against its release branch, use the latest RC container + release-branch MCore commit, open internal PR against internal-ci/<release-branch>.
  • Clarifies that the internal regression tracker is driven by golden values, making accuracy critical.

Example (what a dev sees now)

When to update golden values

Any PR that can affect performance metrics … must be accompanied by a corresponding internal PR that updates the golden values before merging.

Updating golden values for PRs targeting main

  1. Rebase the MBridge PR against main
  2. Launch an internal CI run using: latest nightly container, latest MCore commit, MBridge PR commit.
  3. Open a PR against the internal CI repository's main branch …

Test plan

  • Doc renders correctly in GitHub
  • Steps reviewed for accuracy against current CI tooling

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Documentation
    • Updated release process guidelines with clearer procedures for managing performance metrics throughout release cycles.
    • Refined release timeline structure with explicit instructions for metric updates during code-freeze phases and release branches.
    • Expanded guidance on coordinating metric updates with code changes during development and release processes.

ko3n1g and others added 2 commits April 10, 2026 10:15
Add explicit step-by-step instructions for when and how to update golden
values: paired internal PRs for metric-affecting changes, exact CI inputs
(container, MCore commit, MBridge commit) for both main and release
branches, and which internal CI repo branch to target in each case.

Signed-off-by: oliver könig <okoenig@nvidia.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: oliver könig <okoenig@nvidia.com>
Specify that the last bulk golden values update happens in Week 5 of
code-freeze; after that, engineers are individually responsible for
keeping golden values current on the release branch.

Signed-off-by: oliver könig <okoenig@nvidia.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: oliver könig <okoenig@nvidia.com>
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Apr 10, 2026

📝 Walkthrough

Walkthrough

Documentation update to the release process guide that expands and clarifies guidance on golden values management, including new rules for PR updates, step-by-step procedures for updating golden values for main and release branches, and revised release-branch code-freeze timelines.

Changes

Cohort / File(s) Summary
Release Process Documentation
docs/releases/release-process.md
Expanded "Golden Values" section with definitions and new guidance; added explicit procedures for updating golden values in PRs targeting main and during releases; clarified code-freeze timeline sections with specific week references (Weeks 3–5 and 6–7) and refined language on golden-value refresh responsibilities.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately and concisely summarizes the main change: expanding documentation on golden values update strategy in the release process guide.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Test Results For Major Changes ✅ Passed PR contains documentation-only changes to release procedures without code modifications, qualifying as minor changes that do not require numerical test results.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch ko3n1g/docs/golden-values-update-strategy

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
docs/releases/release-process.md (2)

33-36: Add explicit requirement to record perf deltas + run context.

Line 35 mandates paired golden-values PRs, which is good. Please also require before/after performance numbers and the exact run configuration/context in the PR description so reviewers can validate why golden values changed.

Based on learnings, if a change could affect performance, the PR description should include before-and-after performance numbers, as well as the configuration and context in which they apply.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/releases/release-process.md` around lines 33 - 36, Update the "When to
update golden values" section to require that any PR triggering a golden-values
update must include before-and-after performance numbers and the exact run
configuration/context in the PR description; specifically, augment the guideline
under the "When to update golden values" header to explicitly state that PRs
must record performance deltas (numeric before/after metrics), the benchmark
commands, dataset/seed/environment details, and any relevant hardware/software
configuration so reviewers can validate the change.

37-46: Minor wording cleanup on Line 45.

“merged together” is a bit wordy; “merged in sync” (or similar) reads cleaner.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/releases/release-process.md` around lines 37 - 46, Update the wording in
the "Updating golden values for PRs targeting `main`" step 4: replace the phrase
"The MBridge PR and the internal golden-values PR should be merged together (or
the golden-values PR first)." with a cleaner alternative such as "The MBridge PR
and the internal golden-values PR should be merged in sync (or the golden-values
PR first)." Modify the text under that header/step to use "merged in sync" (or a
similar concise phrase) instead of "merged together" to improve clarity.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/releases/release-process.md`:
- Around line 47-57: The phrase "MBridge PR commit" defined as "head of the
MBridge release branch" is ambiguous; update the text so it explicitly instructs
to use the head commit of the PR branch (or the exact PR SHA) to avoid picking
the wrong commit when multiple PRs target the same release branch—replace the
current wording "MBridge PR commit (head of the MBridge release branch)" with
something like "MBridge PR commit — the head commit of the PR branch (use the
exact PR branch tip SHA or PR merge SHA)".

---

Nitpick comments:
In `@docs/releases/release-process.md`:
- Around line 33-36: Update the "When to update golden values" section to
require that any PR triggering a golden-values update must include
before-and-after performance numbers and the exact run configuration/context in
the PR description; specifically, augment the guideline under the "When to
update golden values" header to explicitly state that PRs must record
performance deltas (numeric before/after metrics), the benchmark commands,
dataset/seed/environment details, and any relevant hardware/software
configuration so reviewers can validate the change.
- Around line 37-46: Update the wording in the "Updating golden values for PRs
targeting `main`" step 4: replace the phrase "The MBridge PR and the internal
golden-values PR should be merged together (or the golden-values PR first)."
with a cleaner alternative such as "The MBridge PR and the internal
golden-values PR should be merged in sync (or the golden-values PR first)."
Modify the text under that header/step to use "merged in sync" (or a similar
concise phrase) instead of "merged together" to improve clarity.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: d0ed51b4-22f2-4771-8999-699c40b889af

📥 Commits

Reviewing files that changed from the base of the PR and between fad15ab and 5ffc8c1.

📒 Files selected for processing (1)
  • docs/releases/release-process.md

Comment on lines +47 to 57
### Updating golden values during a release

When golden values need to be refreshed on the release branch (e.g. at the start of code-freeze or after an accepted regression):

1. **Rebase the MBridge PR against the MBridge release branch** so it is at the head of that branch.
2. **Launch an internal CI run** using:
- The **latest internal RC container** for the release.
- The **MCore commit pinned on the release branch**.
- The **MBridge PR commit** (head of the MBridge release branch).
3. Open a PR against the **internal CI repository's release branch** with the updated golden values.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Clarify the commit source on Line 55 to avoid wrong SHA selection.

Line 55 says “MBridge PR commit” and then defines it as “head of the MBridge release branch.” That can conflict when multiple PRs target the same release branch. Recommend explicitly saying “the head commit of the PR branch (or exact PR SHA).”

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/releases/release-process.md` around lines 47 - 57, The phrase "MBridge
PR commit" defined as "head of the MBridge release branch" is ambiguous; update
the text so it explicitly instructs to use the head commit of the PR branch (or
the exact PR SHA) to avoid picking the wrong commit when multiple PRs target the
same release branch—replace the current wording "MBridge PR commit (head of the
MBridge release branch)" with something like "MBridge PR commit — the head
commit of the PR branch (use the exact PR branch tip SHA or PR merge SHA)".

### When to update golden values

Any PR that can affect performance metrics (e.g. changes to model code, training loop, optimizer, or numerical kernels) **must be accompanied by a corresponding internal PR that updates the golden values** before merging. Do not wait until after the PR lands.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should there be a statement for exceptions- "Exceptions can be made on rare occasion of issues with GPU availability- cluster is offline, compute availability is low, etc."?

@yaoyu-33 yaoyu-33 added docs-only With great power comes great responsibility. needs-review PR is ready for code review and waiting on a reviewer labels Apr 13, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

docs-only With great power comes great responsibility. needs-review PR is ready for code review and waiting on a reviewer

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants