Bounded Action Space by AntoineRichard · Pull Request #81 · leggedrobotics/rsl_rl

AntoineRichard · 2025-04-01T11:46:36Z

Hi there!

This PR adds support for bounded action spaces directly into the agent.
The main difference with clipping, is that this ensures actions are sampled within a fixed range and rewards on actions will not be computed on clipped actions.

To accomodate this, two options are provided:

The "SAC style", where a gaussian based policy is bounded to the [-1, 1] range with a tanh on the mean and a tanh on the sampled actions. This is accounted for in the calculation of the action log dist. (Appendix C here: https://arxiv.org/pdf/1801.01290) Or one could look at: https://github.com/DLR-RM/stable-baselines3/blob/ea913a848242b2fca3cbcac255097e1d144207df/stable_baselines3/common/distributions.py#L207 ?
A "beta policy", where rather than sampling on a probability distribution that's unbounded (the normal distribution for instance), we sample actions using a bounded probability distribution (the beta distribution). Original paper here: https://proceedings.mlr.press/v70/chou17a/chou17a.pdf . This is then rescaled to whatever is needed.

To allow for the smooth calculation of the KL distance between two beta distribution, I had to slightly rework the transition to store the distribution parameters rather than just the std and the mean. Hence in the case of the normal distribution, I save mean + std_dev, while for the beta distribution alpha and beta.

Then instead of manually computing the KL distance, I let torch do the heavy lifting.

Configuration wise it could look like this:

Beta

    policy = RslRlPpoActorCriticBetaCfg(
        init_noise_std=1.0,
        actor_hidden_dims=[32, 32],
        critic_hidden_dims=[32, 32],
        activation="elu",
        clip_actions=True, # Note this is useless since it clips all the time regardless.
        clip_actions_range=[-1.0, 1.0],
    )

Normal

    policy = RslRlPpoActorCriticCfg(
        init_noise_std=1.0,
        actor_hidden_dims=[32, 32],
        critic_hidden_dims=[32, 32],
        activation="elu",
        clip_actions=True, # Default to False
        clip_actions_range=[-1.0, 1.0],
    )

I know, this changes significantly the way PPO updates are done, and it's a BREAKING CHANGE, so no I totally understand if the beta policy doesn't make it to main repo! Though having a reliable action clipping mechanism would be nice :).

LMK if you want me to change anything, I'd be happy to!

Best,

Antoine

… naturally bounded action space.

ClemensSchwarke · 2025-05-09T10:46:26Z

Hi Antoine,
thanks for your PR! Distributions are definitely on our list for a future release. We will review your code once we are at that point.
Thanks :)

Mayankm96 · 2025-05-09T14:19:40Z

If you'd like to help add this feature, I think it would make sense to have a general distribution class through which beta or gaussian essentially becomes different options users can configure. Eventually we want to add support for categorical distribution and other types as well.

jingyang-huang · 2025-05-15T03:49:59Z

Hello, have you tested this modification, I try to train it with my env but it fails

AntoineRichard · 2025-05-15T06:40:46Z

Hey! Yes we use it to train all our robots at the University of Luxembourg! Is it crashing? Or just not training?

jingyang-huang · 2025-05-15T07:38:13Z

Hey! Yes we use it to train all our robots at the University of Luxembourg! Is it crashing? Or just not training?

Hey, Antoine, glad to hear from you so fast! I merged your PR into my branch and use the Normal one to train my agent. I keep the env and rewards all the same and just change the policy. However, my training fails and the mean reward did not increase, which is quite abnormal(Previous one works well). I was wondering if we need to adjust some hyper-parameters to adpat to the new policy?
The specific training log is shown below, in which the orange one is the newly modified ppo.

Looking forward to your reply!

AntoineRichard · 2025-05-15T08:33:47Z

And that's the Beta or the Squashed Gaussian?

jingyang-huang · 2025-05-15T08:57:13Z

And that's the Beta or the Squashed Gaussian?

It`s the Squashed Gaussian. Besides, I have also tried Beta, but sadly none of them work for my case😭

AntoineRichard · 2025-05-15T10:54:28Z

Any chances it's just not outputting values in the range that make sense for you? I would recommend looking at the std_dev values

jingyang-huang · 2025-05-15T16:33:45Z

Any chances it's just not outputting values in the range that make sense for you? I would recommend looking at the std_dev values

Okay, thank you for your advice Antoine, I would try to debug and find the problem

jingyang-huang · 2025-05-17T10:47:55Z

Any chances it's just not outputting values in the range that make sense for you? I would recommend looking at the std_dev values

Hi, Antoine, I was wondering how is the performance of Beta and Squashed on your task. I increased num_steps_per_env to enable the training for them, but it turns out that they are not as good as the original one on my task, and could not converge to the optimal. How about you, do they perform better?

Added action clipping SAC style, and created a BetaPolicy which has a…

543ecc7

… naturally bounded action space.

AntoineRichard changed the title ~~Added action clipping SAC style, and created a BetaPolicy which has a…~~ Bounded Action Space Apr 1, 2025

ClemensSchwarke force-pushed the main branch from 3c12abb to 830fa98 Compare July 18, 2025 09:40

Mayankm96 force-pushed the main branch from 1979479 to cf71aa6 Compare September 18, 2025 08:19

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Bounded Action Space#81

Bounded Action Space#81
AntoineRichard wants to merge 1 commit intoleggedrobotics:mainfrom
AntoineRichard:antoine/clipped_action_spaces

AntoineRichard commented Apr 1, 2025

Uh oh!

ClemensSchwarke commented May 9, 2025

Uh oh!

Mayankm96 commented May 9, 2025

Uh oh!

jingyang-huang commented May 15, 2025

Uh oh!

AntoineRichard commented May 15, 2025 •

edited

Loading

Uh oh!

jingyang-huang commented May 15, 2025 •

edited

Loading

Uh oh!

AntoineRichard commented May 15, 2025

Uh oh!

jingyang-huang commented May 15, 2025 •

edited

Loading

Uh oh!

AntoineRichard commented May 15, 2025

Uh oh!

jingyang-huang commented May 15, 2025

Uh oh!

jingyang-huang commented May 17, 2025 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

AntoineRichard commented Apr 1, 2025

Uh oh!

ClemensSchwarke commented May 9, 2025

Uh oh!

Mayankm96 commented May 9, 2025

Uh oh!

jingyang-huang commented May 15, 2025

Uh oh!

AntoineRichard commented May 15, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jingyang-huang commented May 15, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

AntoineRichard commented May 15, 2025

Uh oh!

jingyang-huang commented May 15, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

AntoineRichard commented May 15, 2025

Uh oh!

jingyang-huang commented May 15, 2025

Uh oh!

jingyang-huang commented May 17, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

AntoineRichard commented May 15, 2025 •

edited

Loading

jingyang-huang commented May 15, 2025 •

edited

Loading

jingyang-huang commented May 15, 2025 •

edited

Loading

jingyang-huang commented May 17, 2025 •

edited

Loading