
DDPG with ou_0.2 noise does not converge in MountainCarContinuous-v0 #482

Open
ghost opened this issue Jul 27, 2018 · 9 comments

@ghost

ghost commented Jul 27, 2018

Has anyone gotten DDPG with the ou_0.2 noise parameter to converge on the MountainCarContinuous-v0 environment? The rollout/return_history stays around -10 after 1 million steps. In the DDPG paper, MountainCarContinuous converges to full score well before 1 million steps.

Any suggestions on how to tune it would be great.
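
For context, ou_0.2 here means Ornstein–Uhlenbeck action noise with sigma = 0.2. A minimal sketch of that noise process follows; the parameter names and defaults are my assumptions for illustration, not the exact baselines implementation:

```python
import numpy as np

class OrnsteinUhlenbeckNoise:
    """Temporally correlated exploration noise, as used by DDPG.
    sigma=0.2 corresponds to the ou_0.2 setting discussed in this issue."""
    def __init__(self, size, mu=0.0, theta=0.15, sigma=0.2, dt=1.0):
        self.size, self.mu, self.theta, self.sigma, self.dt = size, mu, theta, sigma, dt
        self.reset()

    def reset(self):
        # Restart from the mean at the beginning of each episode.
        self.state = np.full(self.size, self.mu, dtype=np.float64)

    def sample(self):
        # dx = theta * (mu - x) * dt + sigma * sqrt(dt) * N(0, 1)
        dx = (self.theta * (self.mu - self.state) * self.dt
              + self.sigma * np.sqrt(self.dt) * np.random.randn(self.size))
        self.state = self.state + dx
        return self.state
```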

@SamKirkiles

I haven't tested DDPG specifically on MountainCarContinuous, but I can tell you that this gym poses a serious exploration challenge. The agent earns a negative reward for each action it takes, making it very easy to land in the local optimum of doing nothing. In my project, I was only able to discover the optimal policy if the car made it to the top of the hill within the first episode.
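
To make that local optimum concrete, here is a rough sketch (using the old gym reset/step API from around this time) comparing a do-nothing policy with undirected random actions; as far as I can tell, MountainCarContinuous-v0 charges roughly -0.1 * action^2 per step and only pays +100 when the car reaches the flag:

```python
import gym
import numpy as np

env = gym.make("MountainCarContinuous-v0")

def rollout(policy):
    obs, total, done = env.reset(), 0.0, False
    while not done:
        obs, reward, done, _ = env.step(policy(obs))
        total += reward
    return total

# Doing nothing incurs no action penalty, so the lazy policy scores ~0 ...
lazy_return = rollout(lambda obs: np.array([0.0]))
# ... while undirected pushing pays the action penalty every step and,
# unless it happens to reach the flag, finishes well below zero.
random_return = rollout(lambda obs: env.action_space.sample())
print(lazy_return, random_return)
```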

@kirk86

kirk86 commented Jul 27, 2018

I'm surprised it even runs for you guys in the first place; for me it keeps crashing due to NaNs occurring in different places. I tried scaling down the reward function, but still nothing.
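
For what it's worth, by scaling down the reward I mean something like the wrapper below; this is just a sketch of the rescaling (the scale value is arbitrary), not a fix for the NaNs:

```python
import gym

class ScaleReward(gym.RewardWrapper):
    """Multiply every reward by a constant before the agent sees it."""
    def __init__(self, env, scale=0.1):
        super().__init__(env)
        self.scale = scale

    def reward(self, reward):
        return reward * self.scale

env = ScaleReward(gym.make("MountainCarContinuous-v0"), scale=0.1)
```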

@SamKirkiles

I didn't have any problems with NaNs. You can check out my code, but I still haven't been able to beat this gym in a satisfying way. I basically had to reinitialize many times until it found the optimal solution early on. It would be very helpful if someone could find a working implementation of this online.

@kirk86

kirk86 commented Jul 27, 2018

@SamKirkiles Thanks for the example. What I meant was: have you guys tried DDPG, without modifying anything, on different environments without any problems? See #480.
Another thing I've noticed, though I'm not certain whether it's related to the issue of replicating the results, is that in your example you don't have the different kinds of parameter noise they use in their baselines DDPG code. Am I correct?
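
For clarity, by parameter noise I mean something along these lines; this is a paraphrased sketch of baselines' adaptive parameter-space noise (the adaptive-param_0.2 option), not the exact code:

```python
import numpy as np

class AdaptiveParamNoise:
    """Sketch of adaptive parameter-space noise: perturb the actor's weights
    instead of its actions, and adapt the noise scale over time."""
    def __init__(self, initial_stddev=0.1, desired_action_stddev=0.2, adoption_coefficient=1.01):
        self.current_stddev = initial_stddev
        self.desired_action_stddev = desired_action_stddev
        self.adoption_coefficient = adoption_coefficient

    def perturb(self, weights):
        # Add Gaussian noise directly to each weight tensor of the actor.
        return [w + np.random.normal(0.0, self.current_stddev, size=w.shape) for w in weights]

    def adapt(self, action_distance):
        # Shrink the weight noise if the perturbed actor's actions drifted
        # further than desired from the unperturbed actor's, otherwise grow it.
        if action_distance > self.desired_action_stddev:
            self.current_stddev /= self.adoption_coefficient
        else:
            self.current_stddev *= self.adoption_coefficient
```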

@ghost
Author

ghost commented Jul 28, 2018

@kirk86 Baselines' DDPG has noisy parameter updates, but in the original DDPG paper they said they were able to beat MountainCar with only Ornstein–Uhlenbeck noise.
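
In other words, exploration there is just the deterministic action plus a correlated noise sample, something like the snippet below (reusing the OrnsteinUhlenbeckNoise sketch from above; the constant actor is only a placeholder for the trained policy network):

```python
import gym
import numpy as np

env = gym.make("MountainCarContinuous-v0")
noise = OrnsteinUhlenbeckNoise(size=env.action_space.shape[0], sigma=0.2)
actor = lambda obs: np.zeros(env.action_space.shape)  # placeholder for the policy network

obs = env.reset()
action = np.clip(actor(obs) + noise.sample(), env.action_space.low, env.action_space.high)
obs, reward, done, info = env.step(action)
```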

@kirk86

kirk86 commented Jul 28, 2018

@aznshodan OK, I haven't read the paper yet. On which line of @SamKirkiles's example is that implemented?

@ghost
Author

ghost commented Jul 28, 2018

@kirk86 I wasn't referring to @SamKirkiles's example. I was referring to the Baselines DDPG code.

@kirk86

kirk86 commented Jul 28, 2018

@aznshodan Cool, thanks. Probably a misunderstanding; the part of my comment that said

in your example you don't have the different kinds of parameter noise

was referring to @SamKirkiles's example.

@sayomakinwa

I have a similar situation with DDPG on the Humanoid-v2 environment; it doesn't converge. Suggestions would be highly appreciated.
