Add log_std bounds to Gaussian distributions to prevent std underflow crash #190
Draft
kevinzakka wants to merge 1 commit into leggedrobotics:main from
Conversation
The "scalar" std parameterization allows the optimizer to push std negative, causing `RuntimeError: normal expects all elements of std >= 0.0` during training. This adds `log_std_min`/`log_std_max` bounds (applied in log-space) to both `GaussianDistribution` and `HeteroscedasticGaussianDistribution`. Defaults are `log_std_min=-20` (std ≈ 2e-9) and `log_std_max=inf` (no upper bound).
Force-pushed from c7cf612 to 4b203cf
Collaborator
Hey @kevinzakka,
Contributor
Author
Hi @ClemensSchwarke, sorry, that was an earlier implementation, but I switched to something much simpler and force-pushed.
Contributor
Author
BTW, converted to a draft for discussion!
Collaborator
Got it. One more question: Does it make sense to expose
During PPO training, the optimizer steadily reduces the policy's standard deviation as the policy converges. With `std_type="log"`, the learnable parameter is `log_std` and `std = exp(log_std)`. This is mathematically always positive, but in float32, `exp(-104) = 0.0` exactly: the value underflows to zero. When `Normal.sample()` calls `torch.normal()` with `std=0.0`, it raises `RuntimeError: normal expects all elements of std >= 0.0`.
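A quick way to see the underflow (an illustrative snippet, not code from the PR):

```python
import torch

log_std = torch.tensor(-104.0)  # float32 by default
print(log_std.exp())            # tensor(0.): exp(-104) is below the smallest float32 subnormal
```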
The error message is misleading. It says `std >= 0.0`, but `std` is exactly `0.0`, which should satisfy that condition; the actual check in the C++ kernel is a strict `> 0`. This makes it appear that std went negative, leading to the natural but incorrect fix of switching to `std_type="log"`. The problem was never negative std: it was std underflowing to exactly zero in float32, which both parameterizations are susceptible to.
With `std_type="scalar"`, the crash can also occur through a different path: the optimizer can push the raw `std` parameter negative in a single SGD step.

Even before the crash, extremely small `std` causes numerical instability. Log probabilities diverge to `±inf`, importance sampling ratios overflow, and gradients become NaN. None of these failure modes can be caught by gradient clipping or PPO's clip objective.

This PR addresses the problem by adding optional `log_std_min`/`log_std_max` bounds to `GaussianDistribution` and `HeteroscedasticGaussianDistribution`. The clamp is applied in log space inside `update()`, before any downstream computation:
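As a minimal sketch of what such a log-space clamp could look like (the class body, constructor signature, and helper names are assumptions for illustration, not the PR's exact code; only `log_std_min`, `log_std_max`, and `update()` come from the description above):

```python
import math

import torch
from torch import nn
from torch.distributions import Normal


class BoundedGaussianDistribution(nn.Module):
    """Illustrative Gaussian action distribution with bounded log_std."""

    def __init__(self, num_actions, init_std=1.0, log_std_min=-20.0, log_std_max=math.inf):
        super().__init__()
        # Learnable per-action log standard deviation (the std_type="log" parameterization).
        self.log_std = nn.Parameter(torch.full((num_actions,), math.log(init_std)))
        self.log_std_min = log_std_min
        self.log_std_max = log_std_max
        self.distribution = None

    def update(self, mean):
        # Clamp in log space before exponentiating, so std is confined to
        # [exp(log_std_min), exp(log_std_max)] and can neither underflow to 0.0
        # nor go negative, regardless of what the optimizer does to log_std.
        log_std = self.log_std.clamp(self.log_std_min, self.log_std_max)
        std = log_std.exp().expand_as(mean)
        self.distribution = Normal(mean, std)

    def sample(self):
        return self.distribution.sample()
```

One caveat of a hard clamp is that the gradient of `log_std` is zero whenever the parameter sits outside the bounds; that is usually fine for a safety floor, but worth keeping in mind when choosing tight bounds.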
Defaults are conservative: `log_std_min=-20.0` (std ≈ 2e-9) and `log_std_max=inf` (no upper bound). Users experiencing crashes during convergence should set tighter bounds (e.g., `log_std_min=-3.0` for std ≈ 0.05).

This is a well-known stability technique: Ilya Kostrikov's jaxrl uses `LOG_STD_MIN = -10.0` and `LOG_STD_MAX = 2.0` as default bounds on Gaussian policy log standard deviations.
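For instance, with the hypothetical sketch above, a user hitting the crash near convergence could construct the distribution with a tighter floor:

```python
dist = BoundedGaussianDistribution(num_actions=12, log_std_min=-3.0)  # std stays >= exp(-3) ≈ 0.05
```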