Eval reward is much lower than rollout reward. #2

Closed
Panda-Shawn opened this issue May 31, 2023 · 2 comments

Comments

@Panda-Shawn

When I ran the command below,

python3 -m train --log_dir logs/H+O+S-DPre-1 --prob_id H+O+S --algo_id DPre --seed 1

I found that the eval/mean_reward is much lower than rollout/ep_rew_mean. For example,

-------------------------------------
| rollout/            |             |
|    ep_len_mean      | 913         |
|    ep_rew_mean      | 3011.9375   |
| time/               |             |
|    episodes         | 2372        |
|    fps              | 35          |
|    time_elapsed     | 28224       |
|    total_timesteps  | 998668      |
| train/              |             |
|    action_grad      | 1.7         |
|    actor_loss       | -304        |
|    critic_loss      | 2.14        |
|    grad/mu.0.bias   | 0.28243673  |
|    grad/mu.0.weight | 1.4815665   |
|    grad/mu.2.bias   | 0.110535    |
|    grad/mu.2.weight | 0.4451258   |
|    grad/mu.4.bias   | 0.084684595 |
|    grad/mu.4.weight | 0.28864697  |
|    learning_rate    | 0.0003      |
|    n_updates        | 988667      |
-------------------------------------
Eval num_timesteps=1000000, episode_reward=91.04 +/- 59.98
Episode length: 48.60 +/- 27.00
------------------------------------
| eval/               |            |
|    mean_ep_length   | 48.6       |
|    mean_reward      | 91         |
| time/               |            |
|    total_timesteps  | 1000000    |
| train/              |            |
|    action_grad      | 2.31       |
|    actor_loss       | -301       |
|    critic_loss      | 5.1        |
|    grad/mu.0.bias   | 0.34394315 |
|    grad/mu.0.weight | 1.6371939  |
|    grad/mu.2.bias   | 0.13446647 |
|    grad/mu.2.weight | 0.4613981  |
|    grad/mu.4.bias   | 0.10974046 |
|    grad/mu.4.weight | 0.2964168  |
|    learning_rate    | 0.0003     |
|    n_updates        | 989999     |
------------------------------------

Considering that no action noise is added during evaluation, eval/mean_reward should be a little higher than rollout/ep_rew_mean, but here it is the opposite. I found that some environment wrappers can cause similar issues (DLR-RM/stable-baselines3#181). Could this be related to the ConstraintEnvWrapper, or is there another explanation for this observation?
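For reference, a minimal sketch of how I would make sure the evaluation env is wrapped identically to the training env. This is an assumed setup, not this repository's actual code: the algorithm (TD3), the task id, and the ConstraintEnvWrapper call are placeholders.

# Minimal sketch, not the benchmark's actual code: the eval env must receive
# exactly the same wrappers as the training env, otherwise eval/mean_reward
# can look much worse than rollout/ep_rew_mean.
import gym
from stable_baselines3 import TD3
from stable_baselines3.common.monitor import Monitor
from stable_baselines3.common.callbacks import EvalCallback

def make_env():
    env = gym.make("Hopper-v3")        # placeholder task id
    # env = ConstraintEnvWrapper(env)  # hypothetical: apply the same wrapper for train AND eval
    return Monitor(env)

train_env = make_env()
eval_env = make_env()                   # built identically to the training env

model = TD3("MlpPolicy", train_env, learning_rate=3e-4, verbose=1)
eval_cb = EvalCallback(eval_env, eval_freq=10_000, n_eval_episodes=5)
model.learn(total_timesteps=1_000_000, callback=eval_cb)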

@Hziwara
Collaborator

Hziwara commented Jun 20, 2023

Sorry for the delay in replying.
I could not reproduce this behavior in my environment.
Could you describe in detail the environment in which you ran the program?

@Panda-Shawn
Author

Thanks for your reply.
The results posted above were obtained in an environment created with pip install -r requirements.txt, without using Docker. I have since switched to the Docker-based installation recommended in the README, and everything works now. When I run the same command as above, I get results similar to those in the paper.

-------------------------------------
| rollout/            |             |
|    ep_len_mean      | 968         |
|    ep_rew_mean      | 3056.6426   |
| time/               |             |
|    episodes         | 2428        |
|    fps              | 40          |
|    time_elapsed     | 24363       |
|    total_timesteps  | 997851      |
| train/              |             |
|    action_grad      | 1.8         |
|    actor_loss       | -288        |
|    critic_loss      | 6.28        |
|    grad/mu.0.bias   | 0.26518062  |
|    grad/mu.0.weight | 1.5936807   |
|    grad/mu.2.bias   | 0.089183375 |
|    grad/mu.2.weight | 0.5349882   |
|    grad/mu.4.bias   | 0.07624212  |
|    grad/mu.4.weight | 0.44528466  |
|    learning_rate    | 0.0003      |
|    n_updates        | 986874      |
-------------------------------------
Eval num_timesteps=1000000, episode_reward=3167.99 +/- 3.17
Episode length: 1000.00 +/- 0.00
------------------------------------
| eval/               |            |
|    mean_ep_length   | 1e+03      |
|    mean_reward      | 3.17e+03   |
| time/               |            |
|    total_timesteps  | 1000000    |
| train/              |            |
|    action_grad      | 1.86       |
|    actor_loss       | -288       |
|    critic_loss      | 7.58       |
|    grad/mu.0.bias   | 0.34355706 |
|    grad/mu.0.weight | 1.9284163  |
|    grad/mu.2.bias   | 0.1174118  |
|    grad/mu.2.weight | 0.53956914 |
|    grad/mu.4.bias   | 0.12189353 |
|    grad/mu.4.weight | 0.41779268 |
|    learning_rate    | 0.0003     |
|    n_updates        | 989874     |
------------------------------------

Now eval/mean_reward is slightly higher than rollout/ep_rew_mean. The only difference I can find between the two setups is the numpy version, which was forced by the Python version. Last time, installing with requirements.txt under Python 3.8.16 produced dependency conflicts, so I had to downgrade numpy from 1.23.0 to 1.22.4 to get the installation to finish. This time, under Python 3.8.12 inside the Docker container, requirements.txt installed without any modification. I am still puzzled that a numpy version change could cause such a large performance degradation, and I have not found the root cause of this issue.
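As an aside, here is a small hypothetical snippet (not part of the benchmark) for recording which interpreter and numpy version a run actually used, so results from the pip-based and Docker-based environments can be compared.

# Hypothetical helper: log the versions the training process actually sees.
import sys
import numpy as np

print("python:", sys.version.split()[0])  # e.g. 3.8.12 (Docker) vs 3.8.16 (pip env)
print("numpy :", np.__version__)          # e.g. 1.22.4 vs 1.23.0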
Thank you again, and I look forward to doing my own work on ACRL with this benchmark.

@Hziwara Hziwara closed this as completed Aug 7, 2023