
Add log_std bounds to Gaussian distributions to prevent std underflow crash#190

Draft
kevinzakka wants to merge 1 commit into leggedrobotics:main from kevinzakka:fix/distribution-log-std-clamp

Conversation

@kevinzakka
Contributor

@kevinzakka kevinzakka commented Mar 6, 2026

During PPO training, the optimizer steadily reduces the policy's standard deviation as the policy converges. With std_type="log", the learnable parameter is log_std and std = exp(log_std). This is mathematically always positive, but in float32, exp(-104) = 0.0 exactly. The value underflows to zero. When Normal.sample() calls torch.normal() with std=0.0, it raises:

RuntimeError: normal expects all elements of std >= 0.0

The error message is misleading. It says std >= 0.0, but std is exactly 0.0, which should satisfy that condition. The actual check in the C++ kernel is strict > 0. This makes it appear that std went negative, leading to the natural but incorrect fix of switching to std_type="log". The problem was never negative std. It was std underflowing to exactly zero in float32, which both parameterizations are susceptible to.
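The underflow described above can be reproduced without torch, using only the standard library. This is a minimal sketch: the `to_f32` helper round-trips a value through IEEE-754 binary32 via `struct` to mimic float32 storage (the helper name is ours, not from the PR).

```python
import math
import struct

def to_f32(x):
    # round-trip a Python float through IEEE-754 binary32,
    # mimicking torch.float32 storage
    return struct.unpack('f', struct.pack('f', x))[0]

std64 = math.exp(-104.0)  # ~6.2e-46: tiny but still strictly positive in float64
std32 = to_f32(std64)     # below float32's smallest subnormal, rounds to exactly 0.0
print(std64 > 0.0, std32 == 0.0)  # True True
```

The value is positive in float64 but lands below float32's smallest subnormal (~1.4e-45), so it rounds to exactly 0.0, which the strict `> 0` check in the sampling kernel then rejects.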

With std_type="scalar", the crash can also occur through a different path: the optimizer can push the raw std parameter negative in a single SGD step.

Even before the crash, extremely small std causes numerical instability. Log probabilities diverge to ±inf, importance sampling ratios overflow, and gradients become NaN. None of these failure modes can be caught by gradient clipping or PPO's clip objective.
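The log-probability divergence can also be demonstrated with the standard library. This is a sketch under stated assumptions: float32 arithmetic is emulated by rounding every intermediate through `struct` (the `f32` and `gauss_logpdf_f32` helpers are ours), and the std value is chosen so that the quadratic term of the Gaussian log-density overflows float32.

```python
import math
import struct

def f32(x):
    # emulate float32 rounding; struct raises OverflowError where
    # IEEE-754 would round to +/-inf, so map that case explicitly
    try:
        return struct.unpack('f', struct.pack('f', x))[0]
    except OverflowError:
        return math.copysign(math.inf, x)

def gauss_logpdf_f32(x, mu, std):
    # Gaussian log-density with every intermediate rounded to float32
    var = f32(std * std)
    quad = f32((x - mu) ** 2 / (2.0 * var))
    return f32(-math.log(std) - 0.5 * math.log(2.0 * math.pi) - quad)

std = f32(math.exp(-50.0))              # ~1.9e-22, still a valid float32
print(gauss_logpdf_f32(0.1, 0.0, std))  # -inf: the (x - mu)^2 / (2 * var) term overflows
```

Here std is still representable, but var is subnormal, so evaluating the density even a small distance from the mean overflows to -inf; any importance ratio built from such log-probs is then inf or NaN.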

This PR addresses this problem by adding optional log_std_min / log_std_max bounds to GaussianDistribution and HeteroscedasticGaussianDistribution. The clamp is applied in log space inside update(), before any downstream computation:

log_std = self.log_std_param.clamp(min=self.log_std_min, max=self.log_std_max)
std = torch.exp(log_std)

Defaults are conservative: log_std_min=-20.0 (std ~ 2e-9) and log_std_max=inf. Users experiencing crashes during convergence should set tighter bounds (e.g., log_std_min=-3.0 for std ~ 0.05).
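The clamp-then-exponentiate step can be sketched in plain Python (the `bounded_std` helper and constant names below are illustrative stand-ins for the PR's `log_std_param.clamp(...)` call, not its actual API):

```python
import math

# Defaults described in this PR; clamping happens in log space,
# so exp() is never handed a value below log_std_min.
LOG_STD_MIN = -20.0     # std ~ 2.06e-9
LOG_STD_MAX = math.inf  # no upper bound

def bounded_std(log_std, lo=LOG_STD_MIN, hi=LOG_STD_MAX):
    # plain-Python stand-in for log_std_param.clamp(min=lo, max=hi).exp()
    return math.exp(min(max(log_std, lo), hi))

print(bounded_std(-104.0))  # clamped to exp(-20) ~ 2.06e-9 instead of underflowing
print(bounded_std(-3.0))    # ~0.0498, the kind of tighter bound suggested above
```

Because the bound is applied before exponentiation, the resulting std is strictly positive by construction in any floating-point format whose range covers exp(log_std_min).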

This is a well-known stability technique. Ilya Kostrikov's jaxrl uses LOG_STD_MIN = -10.0 and LOG_STD_MAX = 2.0 as default bounds on Gaussian policy log standard deviations.

@kevinzakka kevinzakka changed the title Clamp log-std in Gaussian distributions to prevent negative std crash Add log_std bounds to Gaussian distributions to prevent std underflow crash Mar 6, 2026
The "scalar" std parameterization allows the optimizer to push std
negative, causing `RuntimeError: normal expects all elements of std >= 0.0`
during training. This adds `log_std_min`/`log_std_max` bounds (applied in
log-space) to both `GaussianDistribution` and
`HeteroscedasticGaussianDistribution`. Defaults are `log_std_min=-20`
(std ≈ 2e-9) and `log_std_max=inf` (no upper bound).
@kevinzakka kevinzakka force-pushed the fix/distribution-log-std-clamp branch from c7cf612 to 4b203cf on March 6, 2026 05:21
@ClemensSchwarke
Collaborator

Hey @kevinzakka,
thanks for the PR! I am a bit confused about your statement: "For std_type="scalar", std is converted to log space first, clamped, then exponentiated. This ensures positivity and bounding in one step." I don't see this happening in the code.

@kevinzakka
Contributor Author

Hi @ClemensSchwarke, sorry, that quote was from an earlier implementation. I switched to something much simpler and force-pushed.

@kevinzakka kevinzakka marked this pull request as draft March 6, 2026 16:14
@kevinzakka
Contributor Author

BTW, converted to a draft for discussion!

@ClemensSchwarke
Collaborator

Got it. One more question: Does it make sense to expose std_min and std_max and convert to log instead of the other way around? To me those values would be more intuitive than their log counterparts.
