Conversation
… naturally bounded action space.
Hi Antoine,

If you'd like to help add this feature, I think it would make sense to have a general distribution class through which Beta or Gaussian essentially become different options users can configure. Eventually we want to add support for categorical distributions and other types as well.

Hello, have you tested this modification? I tried to train it with my env but it fails.

Hey! Yes we use it to train all our robots at the University of Luxembourg! Is it crashing? Or just not training? |
|
And that's the Beta or the Squashed Gaussian? |
It's the Squashed Gaussian. I have also tried Beta, but sadly neither of them works for my case 😭
|
Any chance it's just not outputting values in a range that makes sense for you? I would recommend looking at the std_dev values.
Okay, thank you for your advice, Antoine. I will try to debug and find the problem.
Hi Antoine, I was wondering how the performance of Beta and Squashed Gaussian is on your task. I increased


Hi there!
This PR adds support for bounded action spaces directly into the agent.
The main difference from clipping is that this ensures actions are sampled within a fixed range, so rewards are never computed on clipped actions.
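The idea can be sketched as follows, assuming hypothetical action bounds and concentration values (none of these numbers come from the PR itself):

```python
import torch
from torch.distributions import Beta

# Hypothetical action bounds -- stand-ins for whatever the env defines.
low = torch.tensor([-1.0, 0.0])
high = torch.tensor([1.0, 2.0])

# Beta samples live in (0, 1), so a fixed affine rescale maps them into
# (low, high) -- no clipping step is ever applied to the sampled action.
dist = Beta(torch.tensor([2.0, 2.0]), torch.tensor([2.0, 2.0]))
raw = dist.sample()                # each element in (0, 1)
action = low + raw * (high - low)  # each element in (low, high)
```

Because the rescale is deterministic, log-probs and entropies of the underlying Beta remain well defined, unlike with post-hoc clipping.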
To accommodate this, two options are provided: a Beta distribution and a squashed Gaussian.
To allow for the smooth calculation of the KL divergence between two Beta distributions, I had to slightly rework the transition to store the distribution parameters rather than just the std and the mean. Hence, in the case of the normal distribution I save mean + std_dev, while for the beta distribution I save alpha and beta.
Then, instead of manually computing the KL divergence, I let torch do the heavy lifting.
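A sketch of what letting torch handle the KL looks like; the parameter values are invented, and in the PR they would come out of the rollout storage:

```python
import torch
from torch.distributions import Beta, kl_divergence

# Hypothetical stored parameters: the storage keeps the raw distribution
# parameters (alpha, beta) instead of just mean/std.
old_dist = Beta(torch.tensor([2.0, 3.0]), torch.tensor([2.0, 5.0]))
new_dist = Beta(torch.tensor([2.5, 3.0]), torch.tensor([1.8, 5.0]))

# torch dispatches to the registered closed-form KL for the (Beta, Beta)
# pair, so no hand-derived formula is needed.
kl = kl_divergence(old_dist, new_dist)
```

The same call works for a pair of Normal distributions, so the PPO update code stays distribution-agnostic.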
Configuration-wise, it could look like this:
Beta
Normal
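The original config snippets are not preserved above, so here is a purely hypothetical sketch of what the two variants might look like; every key name below is invented for illustration and does not reflect the PR's actual options:

```python
# Hypothetical config sketches -- key names are invented, not the PR's.
beta_cfg = {
    "distribution": "beta",    # bounded support, no clipping needed
}
normal_cfg = {
    "distribution": "normal",  # squashed Gaussian variant
    "init_std": 1.0,           # invented parameter name
}
```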
I know this significantly changes the way PPO updates are done, and it's a BREAKING CHANGE, so I totally understand if the Beta policy doesn't make it to the main repo! Though having a reliable action clipping mechanism would be nice :).
LMK if you want me to change anything, I'd be happy to!
Best,
Antoine