
Fix Model Sigma Data #1534

Open
albertocarpentieri wants to merge 5 commits into NVIDIA:main from albertocarpentieri:fix/stormcast-sigma-data-precond

Conversation

@albertocarpentieri (Contributor)

PhysicsNeMo Pull Request

Description

  • The EDM preconditioner (EDMPrecond) defaults to sigma_data=0.5, ignoring the value set in training.loss.sigma_data. This causes a mismatch between the loss weighting and the preconditioning coefficients (c_skip, c_out, c_in), leading to systematic bias in diffusion model predictions.
  • Forward training.loss.sigma_data into the model hyperparameters when building the diffusion network, so the preconditioner and loss use the same value.
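The forwarding described above can be sketched with plain dicts; the names `loss_cfg`, `model_hparams`, and `build_model_hparams` are hypothetical stand-ins for the actual config objects in the StormCast trainer, not its real API:

```python
# Minimal sketch (hypothetical config shapes) of forwarding the loss-side
# sigma_data into the model hyperparameters while preserving overrides.

def build_model_hparams(loss_cfg: dict, model_hparams: dict) -> dict:
    """Inject loss_cfg['sigma_data'] unless the user set it explicitly."""
    hparams = dict(model_hparams)  # do not mutate the caller's dict
    if "sigma_data" in loss_cfg:
        # setdefault keeps any explicit model-level override intact
        hparams.setdefault("sigma_data", loss_cfg["sigma_data"])
    return hparams

# No override: the loss value propagates to the preconditioner.
print(build_model_hparams({"sigma_data": 1.0}, {}))
# An explicit model-level override wins.
print(build_model_hparams({"sigma_data": 1.0}, {"sigma_data": 0.5}))
```

The `setdefault` call is what preserves a deliberate mismatch when the user sets `sigma_data` directly in `model.hyperparameters`.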


@greptile-apps (Contributor)

greptile-apps Bot commented Mar 25, 2026

Greptile Summary

This PR fixes a real bug in the StormCast trainer: EDMPrecond was silently defaulting to sigma_data=0.5 regardless of the value configured in training.loss.sigma_data, creating a mismatch between the EDM loss weighting and the preconditioner coefficients (c_skip, c_out, c_in). The fix reads sigma_data from the loss config at model build time and injects it into model_hparams via setdefault, correctly preserving any explicit user override in model.hyperparameters.

Key observations:

  • The scalar sigma_data case is handled correctly and the use of setdefault is appropriate.
  • Per-channel sigma_data: LossConfig.sigma_data can be a list[float]. In _setup_loss this list is converted to a (1, C, 1, 1) tensor for broadcasting. In the new code the raw list is forwarded to EDMPrecond, which declares sigma_data: float and stores torch.tensor(sigma_data) as a buffer — producing a 1D (C,) tensor that will broadcast incorrectly (or error) in the preconditioner's coefficient arithmetic. The per-channel case needs a guard or explicit documentation.
  • The info log fires unconditionally even when a model-level override causes the preconditioner and loss to intentionally use different values, which may be confusing.
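The per-channel broadcasting hazard flagged above can be reproduced with NumPy (shapes and values are illustrative; `EDMPrecond`'s actual buffer handling is in PyTorch, which follows the same broadcasting rules):

```python
import numpy as np

B, C, H, W = 2, 3, 8, 8
x = np.zeros((B, C, H, W))            # a batch in NCHW layout
sigma_list = [0.5, 1.0, 2.0]          # hypothetical per-channel sigma_data

# Loss path: reshaped to (1, C, 1, 1), so it aligns with the channel axis.
per_channel = np.asarray(sigma_list).reshape(1, C, 1, 1)
assert (x + per_channel).shape == (B, C, H, W)

# Preconditioner path: a raw 1-D (C,) buffer aligns with the *last* axis (W).
raw = np.asarray(sigma_list)          # shape (3,)
try:
    x + raw                           # W=8 vs C=3: shape mismatch
except ValueError as e:
    print("broadcast error:", e)

# Worse: if C happens to equal W, the addition succeeds silently, varying
# the values along width instead of channels.
y = np.zeros((B, C, H, C))
wrong = y + raw                       # no error, but wrong semantics
print(wrong.shape)
```

This is why a guard (or an explicit reshape to `(1, C, 1, 1)` before constructing the buffer) is needed for the list case.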

Important Files Changed

examples/weather/stormcast/utils/trainer.py: Forwards training.loss.sigma_data into the model's hyperparameters so the EDMPrecond preconditioner and the EDM loss share the same value. The scalar case is handled correctly; however, when sigma_data is a per-channel list[float], the raw list is passed to EDMPrecond, which expects a float, potentially causing incorrect preconditioner coefficients or a runtime shape error.

Reviews (1): Last reviewed commit: "fix model sigma data"

Comment thread: examples/weather/stormcast/utils/trainer.py
Comment thread: examples/weather/stormcast/utils/trainer.py (outdated)
@jleinonen (Collaborator)

As per the Greptile comment, this doesn't address the case of sigma_data passed as a list. Should we implement it now or leave it for later?

@jleinonen (Collaborator)

Can we add a test that makes sure the loss and the model have the same sigma_data?
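One possible shape for such a test, sketched with stand-in objects: in the real test the loss and model would come from the StormCast trainer's build step, and the attribute names here are assumptions, not the trainer's actual API:

```python
# Stand-ins for the configured loss and preconditioned model; in a real
# test these would be produced by building the trainer from a config.
class FakeLoss:
    def __init__(self, sigma_data):
        self.sigma_data = sigma_data

class FakePrecondModel:
    def __init__(self, sigma_data):
        self.sigma_data = sigma_data

def check_sigma_data_consistency(loss, model):
    # The core assertion: loss weighting and preconditioner coefficients
    # must be derived from the same sigma_data.
    assert loss.sigma_data == model.sigma_data, (
        f"loss sigma_data={loss.sigma_data} != model sigma_data={model.sigma_data}"
    )

check_sigma_data_consistency(FakeLoss(1.0), FakePrecondModel(1.0))
print("sigma_data consistent")
```

A per-channel variant would additionally check that the list has been reshaped consistently on both sides before comparing.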

@jleinonen requested a review from pzharrington on March 25, 2026 19:15
@pzharrington (Collaborator)

pzharrington commented Mar 25, 2026

Did this bug only apply to EDMPrecond (used with songunet) and not EDMPreconditioner? If so, these changes may be superseded fairly soon. I started working on full adoption of the new physicsnemo.diffusion interfaces, thus eliminating use of EDMPrecond; I still need to figure out the specific changes to support channel-wise sigma_data, though.

@jleinonen (Collaborator)

Did this bug only apply to EDMPrecond (used with songunet) and not EDMPreconditioner? If so, these changes may be superseded fairly soon. I started working on full adoption of the new physicsnemo.diffusion interfaces, thus eliminating use of EDMPrecond; I still need to figure out the specific changes to support channel-wise sigma_data, though.

It seems to affect both and this change should fix both, since both have a sigma_data parameter in their call signature.

@pzharrington (Collaborator)

Ah ok, I was just going off the Greptile summary. Regarding the channel-wise sigma_data, how impactful has it been to tune that rather than use a scalar or the default 0.5? I don't think StormScope uses per-channel sigma_data, IIRC.

@jleinonen (Collaborator)

jleinonen commented Mar 25, 2026

Ah ok, I was just going off the Greptile summary. Regarding the channel-wise sigma_data, how impactful has it been to tune that rather than use a scalar or the default 0.5? I don't think StormScope uses per-channel sigma_data, IIRC.

It's mostly relevant for regression-diffusion models. For those the regression net often has very different errors for the channels and then we should choose the RMSE of the regression for each channel as sigma_data. Whereas for pure diffusion models we usually have the data normalized to unit variance, in which case sigma_data = 1.0 should be the most mathematically justified choice.
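The rule of thumb above can be sketched numerically: for a regression-diffusion setup, take the regression net's per-channel residual RMSE as sigma_data. The data below is synthetic and the per-channel error scales are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic targets and regression predictions, shape (N, C, H, W).
target = rng.normal(size=(16, 3, 16, 16))
true_err = np.array([0.2, 0.5, 1.0]).reshape(1, 3, 1, 1)  # per-channel error scale
regression = target + rng.normal(size=target.shape) * true_err

# Per-channel RMSE of the regression residual -> candidate sigma_data, shape (C,).
residual = regression - target
sigma_data = np.sqrt((residual ** 2).mean(axis=(0, 2, 3)))
print(sigma_data)  # roughly recovers the per-channel error scales

# For a pure diffusion model trained on unit-variance data, the scalar
# choice sigma_data = 1.0 is the mathematically justified default.
sigma_data_scalar = 1.0
```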

root and others added 4 commits April 7, 2026 05:37
Signed-off-by: root <root@pool0-01762.cm.cluster>
Signed-off-by: root <root@pool0-01101.cm.cluster>
Signed-off-by: root <root@pool0-01523.cm.cluster>
Signed-off-by: root <root@pool0-01102.cm.cluster>
@albertocarpentieri force-pushed the fix/stormcast-sigma-data-precond branch from f4e3310 to bcb2432 on April 7, 2026 12:40
Signed-off-by: root <root@pool0-01814.cm.cluster>