Add log_std bounds to Gaussian distributions to prevent std underflow crash #190
Draft
kevinzakka wants to merge 1 commit into leggedrobotics:main from
Conversation
The "scalar" std parameterization allows the optimizer to push std negative, causing `RuntimeError: normal expects all elements of std >= 0.0` during training. This adds `log_std_min`/`log_std_max` bounds (applied in log-space) to both `GaussianDistribution` and `HeteroscedasticGaussianDistribution`. Defaults are `log_std_min=-20` (std ≈ 2e-9) and `log_std_max=inf` (no upper bound).
Force-pushed from c7cf612 to 4b203cf
Collaborator
Hey @kevinzakka,
Contributor
Author
Hi @ClemensSchwarke, sorry, that was an earlier implementation, but I switched to something much simpler and force-pushed.
Contributor
Author
BTW, converted to a draft for discussion!
Collaborator
Got it. One more question: Does it make sense to expose
During PPO training, the optimizer steadily reduces the policy's standard deviation as the policy converges. With `std_type="log"`, the learnable parameter is `log_std` and `std = exp(log_std)`. This is mathematically always positive, but in float32, `exp(-104) = 0.0` exactly: the value underflows to zero. When `Normal.sample()` calls `torch.normal()` with `std=0.0`, it raises `RuntimeError: normal expects all elements of std >= 0.0`.
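A quick way to see the underflow (an illustrative snippet, not code from the PR):

```python
import torch

log_std = torch.tensor(-104.0)  # float32 by default
print(log_std.exp())            # tensor(0.): exp(-104) is below the smallest float32 subnormal
```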
The error message is misleading. It says `std >= 0.0`, but `std` is exactly `0.0`, which should satisfy that condition; the actual check in the C++ kernel is a strict `> 0`. This makes it appear that std went negative, leading to the natural but incorrect fix of switching to `std_type="log"`. The problem was never negative std: it was std underflowing to exactly zero in float32, which both parameterizations are susceptible to.
With `std_type="scalar"`, the crash can also occur through a different path: the optimizer can push the raw `std` parameter negative in a single SGD step.

Even before the crash, extremely small `std` causes numerical instability. Log probabilities diverge to `±inf`, importance sampling ratios overflow, and gradients become NaN. None of these failure modes can be caught by gradient clipping or PPO's clip objective.

This PR addresses the problem by adding optional `log_std_min`/`log_std_max` bounds to `GaussianDistribution` and `HeteroscedasticGaussianDistribution`. The clamp is applied in log space inside `update()`, before any downstream computation:
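As a minimal sketch of what such a log-space clamp could look like (the class body, constructor signature, and helper names are assumptions for illustration, not the PR's exact code; only `log_std_min`, `log_std_max`, and `update()` come from the description above):

```python
import math

import torch
from torch import nn
from torch.distributions import Normal


class BoundedGaussianDistribution(nn.Module):
    """Illustrative Gaussian action distribution with bounded log_std."""

    def __init__(self, num_actions, init_std=1.0, log_std_min=-20.0, log_std_max=math.inf):
        super().__init__()
        # Learnable per-action log standard deviation (the std_type="log" parameterization).
        self.log_std = nn.Parameter(torch.full((num_actions,), math.log(init_std)))
        self.log_std_min = log_std_min
        self.log_std_max = log_std_max
        self.distribution = None

    def update(self, mean):
        # Clamp in log space before exponentiating, so std is confined to
        # [exp(log_std_min), exp(log_std_max)] and can neither underflow to 0.0
        # nor go negative, regardless of what the optimizer does to log_std.
        log_std = self.log_std.clamp(self.log_std_min, self.log_std_max)
        std = log_std.exp().expand_as(mean)
        self.distribution = Normal(mean, std)

    def sample(self):
        return self.distribution.sample()
```

One caveat of a hard clamp is that the gradient of `log_std` is zero whenever the parameter sits outside the bounds; that is usually fine for a safety floor, but worth keeping in mind when choosing tight bounds.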
Defaults are conservative: `log_std_min=-20.0` (std ≈ 2e-9) and `log_std_max=inf` (no upper bound). Users experiencing crashes during convergence should set tighter bounds (e.g., `log_std_min=-3.0` for std ≈ 0.05).

This is a well-known stability technique: Ilya Kostrikov's jaxrl uses `LOG_STD_MIN = -10.0` and `LOG_STD_MAX = 2.0` as default bounds on Gaussian policy log standard deviations.
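For instance, with the hypothetical sketch above, a user hitting the crash near convergence could construct the distribution with a tighter floor:

```python
dist = BoundedGaussianDistribution(num_actions=12, log_std_min=-3.0)  # std stays >= exp(-3) ≈ 0.05
```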