
DDPG with ou_0.2 noise does not converge in MountainCarContinuous-v0 #482

Open
ghost opened this issue Jul 27, 2018 · 9 comments

@ghost

ghost commented Jul 27, 2018

Has anyone gotten DDPG with the ou_0.2 noise parameter to converge on the MountainCarContinuous-v0 environment? The rollout/return_history stays around -10 after 1 million steps. In the DDPG paper, MountainCarContinuous converges to full score well before 1 million steps.

Any suggestions on how to tune it would be great.
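
For context, ou_0.2 here means Ornstein–Uhlenbeck action noise with sigma = 0.2. A minimal sketch of that noise process follows; the parameter names and defaults are my assumptions for illustration, not the exact baselines implementation:

```python
import numpy as np

class OrnsteinUhlenbeckNoise:
    """Temporally correlated exploration noise, as used by DDPG.
    sigma=0.2 corresponds to the ou_0.2 setting discussed in this issue."""
    def __init__(self, size, mu=0.0, theta=0.15, sigma=0.2, dt=1.0):
        self.size, self.mu, self.theta, self.sigma, self.dt = size, mu, theta, sigma, dt
        self.reset()

    def reset(self):
        # Restart from the mean at the beginning of each episode.
        self.state = np.full(self.size, self.mu, dtype=np.float64)

    def sample(self):
        # dx = theta * (mu - x) * dt + sigma * sqrt(dt) * N(0, 1)
        dx = (self.theta * (self.mu - self.state) * self.dt
              + self.sigma * np.sqrt(self.dt) * np.random.randn(self.size))
        self.state = self.state + dx
        return self.state
```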

@SamKirkiles

I haven't tested DDPG specifically on MountainCarContinuous, but I can tell you that this gym poses a serious exploration challenge. The agent earns a negative reward for each action it takes, making it very easy to land in the local optimum of doing nothing. In my project, I was only able to discover the optimal policy if the car made it to the top of the hill within the first episode.
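
To make that local optimum concrete, here is a rough sketch (using the old gym reset/step API from around this time) comparing a do-nothing policy with undirected random actions; as far as I can tell, MountainCarContinuous-v0 charges roughly -0.1 * action^2 per step and only pays +100 when the car reaches the flag:

```python
import gym
import numpy as np

env = gym.make("MountainCarContinuous-v0")

def rollout(policy):
    obs, total, done = env.reset(), 0.0, False
    while not done:
        obs, reward, done, _ = env.step(policy(obs))
        total += reward
    return total

# Doing nothing incurs no action penalty, so the lazy policy scores ~0 ...
lazy_return = rollout(lambda obs: np.array([0.0]))
# ... while undirected pushing pays the action penalty every step and,
# unless it happens to reach the flag, finishes well below zero.
random_return = rollout(lambda obs: env.action_space.sample())
print(lazy_return, random_return)
```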

@kirk86

kirk86 commented Jul 27, 2018

I'm surprised it even runs for you guys in the first place; for me it keeps crashing due to NaNs occurring in different places. I tried scaling down the reward function, but still nothing.
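
For what it's worth, by scaling down the reward I mean something like the wrapper below; this is just a sketch of the rescaling (the scale value is arbitrary), not a fix for the NaNs:

```python
import gym

class ScaleReward(gym.RewardWrapper):
    """Multiply every reward by a constant before the agent sees it."""
    def __init__(self, env, scale=0.1):
        super().__init__(env)
        self.scale = scale

    def reward(self, reward):
        return reward * self.scale

env = ScaleReward(gym.make("MountainCarContinuous-v0"), scale=0.1)
```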

@SamKirkiles

I didn't have any problems with NaNs. You can check out my code, but I still haven't been able to beat this gym in a satisfying way. I basically had to reinitialize many times until it found the optimal solution early on. It would be very helpful if someone could find a working implementation of this online.

@kirk86

kirk86 commented Jul 27, 2018

@SamKirkiles Thanks for the example. What I meant was: have you guys tried DDPG, without modifying anything, on different environments without any problems? See #480.
Another thing I've noticed, though I'm not certain whether it's related to the issue of replicating the results, is that in your example you don't have the different kinds of parameter noise they use in their baselines DDPG code. Am I correct?
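
For clarity, by parameter noise I mean something along these lines; this is a paraphrased sketch of baselines' adaptive parameter-space noise (the adaptive-param_0.2 option), not the exact code:

```python
import numpy as np

class AdaptiveParamNoise:
    """Sketch of adaptive parameter-space noise: perturb the actor's weights
    instead of its actions, and adapt the noise scale over time."""
    def __init__(self, initial_stddev=0.1, desired_action_stddev=0.2, adoption_coefficient=1.01):
        self.current_stddev = initial_stddev
        self.desired_action_stddev = desired_action_stddev
        self.adoption_coefficient = adoption_coefficient

    def perturb(self, weights):
        # Add Gaussian noise directly to each weight tensor of the actor.
        return [w + np.random.normal(0.0, self.current_stddev, size=w.shape) for w in weights]

    def adapt(self, action_distance):
        # Shrink the weight noise if the perturbed actor's actions drifted
        # further than desired from the unperturbed actor's, otherwise grow it.
        if action_distance > self.desired_action_stddev:
            self.current_stddev /= self.adoption_coefficient
        else:
            self.current_stddev *= self.adoption_coefficient
```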

@ghost
Author

ghost commented Jul 28, 2018

@kirk86 Baselines' DDPG has noisy parameter updates, but in the original DDPG paper they said they were able to beat MountainCar with only Ornstein–Uhlenbeck noise.
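
In other words, exploration there is just the deterministic action plus a correlated noise sample, something like the snippet below (reusing the OrnsteinUhlenbeckNoise sketch from above; the constant actor is only a placeholder for the trained policy network):

```python
import gym
import numpy as np

env = gym.make("MountainCarContinuous-v0")
noise = OrnsteinUhlenbeckNoise(size=env.action_space.shape[0], sigma=0.2)
actor = lambda obs: np.zeros(env.action_space.shape)  # placeholder for the policy network

obs = env.reset()
action = np.clip(actor(obs) + noise.sample(), env.action_space.low, env.action_space.high)
obs, reward, done, info = env.step(action)
```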

@kirk86

kirk86 commented Jul 28, 2018

@aznshodan OK, I haven't read the paper yet. On which line of @SamKirkiles's example is that implemented?

@ghost
Author

ghost commented Jul 28, 2018

@kirk86 I wasn't referring to @SamKirkiles's example. I was referring to the Baselines DDPG code.

@kirk86

kirk86 commented Jul 28, 2018

@aznshodan Cool, thanks. Probably a misunderstanding; the part of my comment that said

in your example you don't have the different kinds of parameter noise

was referring to @SamKirkiles's example.

@sayomakinwa

I have a similar situation with DDPG on the Humanoid-v2 environment; it doesn't converge. Suggestions would be highly appreciated.
