DDPG with ou_0.2 noise does not converge in MountainCarContinuous-v0 #482
I haven't tested DDPG specifically on MountainCarContinuous, but I can tell you that this gym poses a serious exploration challenge. The agent earns a negative reward for each action it takes, making it very easy to land in the local optimum of doing nothing. In my project, I was only able to discover the optimal policy if the car made it to the top of the hill within the first episode.
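For context, MountainCarContinuous-v0 penalizes every applied force and only pays out on reaching the goal, which is why doing nothing is a local optimum. A minimal sketch of the per-step reward logic, reconstructed from my reading of the gym source (the constants are to the best of my knowledge, so treat them as an assumption):

```python
def step_reward(action: float, reached_goal: bool) -> float:
    """Per-step reward in MountainCarContinuous-v0: a quadratic penalty
    on the applied force, plus a bonus only when the goal is reached."""
    reward = 100.0 if reached_goal else 0.0
    reward -= 0.1 * action ** 2  # every push costs reward
    return reward

# A lazy policy scores 0.0 per step, while undirected exploration
# racks up negative return without ever seeing the +100 bonus.
print(step_reward(0.0, False))  # 0.0
print(step_reward(1.0, False))  # -0.1
```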
I'm surprised it runs for you guys in the first place; for me it keeps crashing due to NaNs occurring in different places. I tried scaling down the reward function, but still nothing.
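For what it's worth, one common NaN mitigation is to scale rewards before they reach the critic, so the TD targets stay in a small range. A hedged sketch of that idea as a duck-typed wrapper (the class name and scale factor are my own, not something from Baselines):

```python
class ScaledRewardEnv:
    """Duck-typed env wrapper that multiplies every reward by a constant.
    Shrinking reward magnitudes keeps critic targets small, which can
    help against exploding Q-values / NaNs in DDPG."""

    def __init__(self, env, scale=0.01):
        self.env = env
        self.scale = scale

    def reset(self):
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        return obs, reward * self.scale, done, info
```

With `scale=0.01`, the +100 goal bonus becomes +1.0, so critic targets stay near unit scale.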
I didn't have any problems with NaNs. You can check out my code, but I still haven't been able to beat this gym in a satisfying way. I basically had to reinitialize many times until it found the optimal solution early on. It would be very helpful if someone could find a working implementation of this online.
@SamKirkiles thanks for the example. I meant: have you guys tried DDPG, without modifying anything, on different environments, without any problems? Check here: #480.
@kirk86 Baselines' DDPG has noisy parameter updates, but in the original DDPG paper they said they were able to beat MountainCar with only Ornstein–Uhlenbeck noise.
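For reference, Ornstein–Uhlenbeck action noise is a mean-reverting random walk, so consecutive noise samples are temporally correlated rather than independent. A minimal NumPy sketch (the class name is mine; the defaults follow the DDPG paper's theta=0.15, sigma=0.2):

```python
import numpy as np

class OrnsteinUhlenbeckNoise:
    """Mean-reverting noise: dx = theta*(mu - x)*dt + sigma*sqrt(dt)*dW.
    `mu` is an array with the action shape; the state drifts back
    toward `mu` while being perturbed by Gaussian increments."""

    def __init__(self, mu, sigma=0.2, theta=0.15, dt=1e-2, rng=None):
        self.mu, self.sigma, self.theta, self.dt = mu, sigma, theta, dt
        self.rng = rng if rng is not None else np.random.default_rng(0)
        self.reset()

    def reset(self):
        # Start each episode's noise at the mean.
        self.x = np.copy(self.mu)

    def __call__(self):
        self.x = (self.x
                  + self.theta * (self.mu - self.x) * self.dt
                  + self.sigma * np.sqrt(self.dt)
                  * self.rng.standard_normal(self.mu.shape))
        return self.x
```

In DDPG this is added to the deterministic policy's action at every environment step and reset at episode boundaries.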
@aznshodan ok, I haven't read the paper yet. In which line of @SamKirkiles's example is that implemented?
@kirk86 I wasn't referring to @SamKirkiles's example. I was referring to the Baselines DDPG code.
@aznshodan Cool, thanks. Probably a misunderstanding, since in my comment I was referring to @SamKirkiles's example.
I have a similar situation with DDPG in the Humanoid-v2 environment; it doesn't converge. Suggestions would be highly appreciated.
Has anyone gotten DDPG with the ou_0.2 noise parameter to converge in the MountainCarContinuous-v0 environment? The rollout/return_history stays around -10 after 1 million steps. In the DDPG paper, MountainCarContinuous converges to full score well before 1 million steps.
Any suggestions on how to tune it would be great.
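One knob worth trying is a larger OU sigma, so the car gets pushed hard enough to climb the hill at least once early in training; e.g. ou_0.5 or ou_1.0 instead of ou_0.2 (these values are suggestions, not settings I have verified to converge). The effect on exploration magnitude is easy to see by sampling a toy OU process:

```python
import numpy as np

def ou_samples(sigma, n=10000, theta=0.15, dt=1e-2, seed=0):
    """Simulate a 1-D Ornstein–Uhlenbeck process and return its samples."""
    rng = np.random.default_rng(seed)
    x, out = 0.0, []
    for _ in range(n):
        x += theta * (0.0 - x) * dt + sigma * np.sqrt(dt) * rng.standard_normal()
        out.append(x)
    return np.asarray(out)

# With the same seed the process is linear in sigma, so tripling sigma
# triples every perturbation, and the spread of explored actions with it.
print(ou_samples(0.2).std())
print(ou_samples(0.6).std())
```

Since MountainCarContinuous charges a quadratic penalty per action, larger noise does cost more reward per step, but that is the trade you make to escape the do-nothing local optimum.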