Eval reward is much lower than rollout reward. #2

Closed
Panda-Shawn opened this issue May 31, 2023 · 2 comments

Comments

@Panda-Shawn

When I ran the command below,

python3 -m train --log_dir logs/H+O+S-DPre-1 --prob_id H+O+S --algo_id DPre --seed 1

I found that the eval/mean_reward is much lower than rollout/ep_rew_mean. For example,

-------------------------------------
| rollout/            |             |
|    ep_len_mean      | 913         |
|    ep_rew_mean      | 3011.9375   |
| time/               |             |
|    episodes         | 2372        |
|    fps              | 35          |
|    time_elapsed     | 28224       |
|    total_timesteps  | 998668      |
| train/              |             |
|    action_grad      | 1.7         |
|    actor_loss       | -304        |
|    critic_loss      | 2.14        |
|    grad/mu.0.bias   | 0.28243673  |
|    grad/mu.0.weight | 1.4815665   |
|    grad/mu.2.bias   | 0.110535    |
|    grad/mu.2.weight | 0.4451258   |
|    grad/mu.4.bias   | 0.084684595 |
|    grad/mu.4.weight | 0.28864697  |
|    learning_rate    | 0.0003      |
|    n_updates        | 988667      |
-------------------------------------
Eval num_timesteps=1000000, episode_reward=91.04 +/- 59.98
Episode length: 48.60 +/- 27.00
------------------------------------
| eval/               |            |
|    mean_ep_length   | 48.6       |
|    mean_reward      | 91         |
| time/               |            |
|    total_timesteps  | 1000000    |
| train/              |            |
|    action_grad      | 2.31       |
|    actor_loss       | -301       |
|    critic_loss      | 5.1        |
|    grad/mu.0.bias   | 0.34394315 |
|    grad/mu.0.weight | 1.6371939  |
|    grad/mu.2.bias   | 0.13446647 |
|    grad/mu.2.weight | 0.4613981  |
|    grad/mu.4.bias   | 0.10974046 |
|    grad/mu.4.weight | 0.2964168  |
|    learning_rate    | 0.0003     |
|    n_updates        | 989999     |
------------------------------------

Considering that no action noise is added during evaluation, eval/mean_reward should be a little higher than rollout/ep_rew_mean, but here it is the opposite. I found that some environment wrappers can cause similar issues (DLR-RM/stable-baselines3#181). Could this be related to the ConstraintEnvWrapper, or is there another explanation for this observation?
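For reference, a minimal sketch of how I would make sure the evaluation env is wrapped identically to the training env. This is an assumed setup, not this repository's actual code: the algorithm (TD3), the task id, and the ConstraintEnvWrapper call are placeholders.

# Minimal sketch, not the benchmark's actual code: the eval env must receive
# exactly the same wrappers as the training env, otherwise eval/mean_reward
# can look much worse than rollout/ep_rew_mean.
import gym
from stable_baselines3 import TD3
from stable_baselines3.common.monitor import Monitor
from stable_baselines3.common.callbacks import EvalCallback

def make_env():
    env = gym.make("Hopper-v3")        # placeholder task id
    # env = ConstraintEnvWrapper(env)  # hypothetical: apply the same wrapper for train AND eval
    return Monitor(env)

train_env = make_env()
eval_env = make_env()                   # built identically to the training env

model = TD3("MlpPolicy", train_env, learning_rate=3e-4, verbose=1)
eval_cb = EvalCallback(eval_env, eval_freq=10_000, n_eval_episodes=5)
model.learn(total_timesteps=1_000_000, callback=eval_cb)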

@Hziwara
Collaborator

Hziwara commented Jun 20, 2023

Sorry for the delay in replying.
I could not reproduce this behavior in my environment.
Could you describe in detail the environment in which you ran the program?

@Panda-Shawn
Author

Thanks for your reply.
The results posted above were obtained in an environment created with pip install -r requirements.txt, without using Docker. I have since switched to the Docker-based installation recommended in the README, and everything works now. When I run the same command as above, I get results similar to those in the paper.

-------------------------------------
| rollout/            |             |
|    ep_len_mean      | 968         |
|    ep_rew_mean      | 3056.6426   |
| time/               |             |
|    episodes         | 2428        |
|    fps              | 40          |
|    time_elapsed     | 24363       |
|    total_timesteps  | 997851      |
| train/              |             |
|    action_grad      | 1.8         |
|    actor_loss       | -288        |
|    critic_loss      | 6.28        |
|    grad/mu.0.bias   | 0.26518062  |
|    grad/mu.0.weight | 1.5936807   |
|    grad/mu.2.bias   | 0.089183375 |
|    grad/mu.2.weight | 0.5349882   |
|    grad/mu.4.bias   | 0.07624212  |
|    grad/mu.4.weight | 0.44528466  |
|    learning_rate    | 0.0003      |
|    n_updates        | 986874      |
-------------------------------------
Eval num_timesteps=1000000, episode_reward=3167.99 +/- 3.17
Episode length: 1000.00 +/- 0.00
------------------------------------
| eval/               |            |
|    mean_ep_length   | 1e+03      |
|    mean_reward      | 3.17e+03   |
| time/               |            |
|    total_timesteps  | 1000000    |
| train/              |            |
|    action_grad      | 1.86       |
|    actor_loss       | -288       |
|    critic_loss      | 7.58       |
|    grad/mu.0.bias   | 0.34355706 |
|    grad/mu.0.weight | 1.9284163  |
|    grad/mu.2.bias   | 0.1174118  |
|    grad/mu.2.weight | 0.53956914 |
|    grad/mu.4.bias   | 0.12189353 |
|    grad/mu.4.weight | 0.41779268 |
|    learning_rate    | 0.0003     |
|    n_updates        | 989874     |
------------------------------------

Now eval/mean_reward is slightly higher than rollout/ep_rew_mean. The only difference I can find between the two setups is the numpy version, which was forced by the Python version. Last time, installing with requirements.txt under Python 3.8.16 produced dependency conflicts, so I had to downgrade numpy from 1.23.0 to 1.22.4 to get the installation to finish. This time, under Python 3.8.12 inside the Docker container, requirements.txt installed without any modification. I am still puzzled that a numpy version change could cause such a large performance degradation, and I have not found the root cause of this issue.
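As an aside, here is a small hypothetical snippet (not part of the benchmark) for recording which interpreter and numpy version a run actually used, so results from the pip-based and Docker-based environments can be compared.

# Hypothetical helper: log the versions the training process actually sees.
import sys
import numpy as np

print("python:", sys.version.split()[0])  # e.g. 3.8.12 (Docker) vs 3.8.16 (pip env)
print("numpy :", np.__version__)          # e.g. 1.22.4 vs 1.23.0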
Thank you again, and I look forward to doing my own work on ACRL with this benchmark.

@Hziwara Hziwara closed this as completed Aug 7, 2023