[AIR output] "iteration" is shown in the output for RL users #34918

Open · scottsun94 opened this issue May 1, 2023 · 6 comments

Labels: bug (Something that is supposed to be working; but isn't), P2 (Important issue, but not time-critical), ray-team-created (Ray Team created)

scottsun94 (Contributor)

What happened + What you expected to happen

I ran learning_tests_impala_torch with the new AIR output. It seems that we show "iteration" in the status line. I'm not sure this is a good choice, because RL users may not be familiar with the "iteration" concept.

In the original design, we planned to show something like Finished 1000 timesteps [359 timesteps/s] at 2023-02-24 12:35:39. Running time: 2min 14s (see the sketch after the sample output below).

Training finished iter 13 at 2023-03-29 09:08:13 (running for 00:03:11.52)
agent_timesteps_total: 389000
connector_metrics: {}
counters:
  num_agent_steps_sampled: 389000
  num_agent_steps_trained: 388500
  num_env_steps_sampled: 389000
  num_env_steps_trained: 388500
  num_samples_added_to_queue: 389000
  num_training_step_calls_since_last_synch_worker_weights: 134
  num_weight_broadcasts: 964
custom_metrics: {}
episode_len_mean: 1721.88
episode_media: {}
episode_reward_max: 36.0
episode_reward_mean: 9.77
episode_reward_min: 4.0
episodes_this_iter: 67
episodes_total: 1710
info:
  learner:
    default_policy:
      custom_metrics: {}
      diff_num_grad_updates_vs_sampler_policy: 10.0
      learner_stats:
        cur_lr: 0.0005
        entropy: 1.0906739234924316
        entropy_coeff: 0.01
        policy_loss: -32.83232116699219
        total_loss: -28.335006713867188
        var_gnorm: 16.424108505249023
        vf_explained_var: 0.6170328259468079
        vf_loss: 19.683231353759766
      model: {}
      num_grad_updates_lifetime: 777.0
  learner_queue:
    size_count: 778
    size_mean: 0.0
    size_quantiles: [0.0, 0.0, 0.0, 0.0, 0.0]
    size_std: 0.0
  num_agent_steps_sampled: 389000
  num_agent_steps_trained: 388500
  num_env_steps_sampled: 389000
  num_env_steps_trained: 388500
  num_samples_added_to_queue: 389000
  num_training_step_calls_since_last_synch_worker_weights: 134
  num_weight_broadcasts: 964
  timing_breakdown:
    learner_dequeue_time_ms: 2772.957
    learner_grad_time_ms: 123.634
    learner_load_time_ms: 4.319
    learner_load_wait_time_ms: 47.829
num_agent_steps_sampled: 389000
num_agent_steps_trained: 388500
num_env_steps_sampled: 389000
num_env_steps_sampled_this_iter: 30750
num_env_steps_trained: 388500
num_env_steps_trained_this_iter: 31000
num_faulty_episodes: 0
num_healthy_workers: 10
num_in_flight_async_reqs: 20
num_remote_worker_restarts: 0
num_steps_trained_this_iter: 31000
perf:
  cpu_util_percent: 34.94117647058823
  ram_util_percent: 5.211764705882353
policy_reward_max: {}
policy_reward_mean: {}
policy_reward_min: {}
sampler_perf:
  mean_action_processing_ms: 0.6395067034460499
  mean_env_render_ms: 0.0
  mean_env_wait_ms: 7.840870172264184
  mean_inference_ms: 6.614064184668028
  mean_raw_obs_processing_ms: 2.9528597540097277
sampler_results:
  connector_metrics: {}
  custom_metrics: {}
  episode_len_mean: 1721.88
  episode_media: {}
  episode_reward_max: 36.0
  episode_reward_mean: 9.77
  episode_reward_min: 4.0
  episodes_this_iter: 67
  hist_stats:
    episode_lengths: [1414, 1306, 1641, 1446, 1234, 2026, 1600, 2454, 1359, 1572,
      1411, 1471, 1463, 1269, 1347, 1083, 2344, 1095, 1956, 1603, 1255, 2218, 1208,
      1943, 1483, 1158, 2108, 1073, 1535, 2590, 1804, 1802, 2109, 1783, 1099, 1258,
      1211, 1826, 2480, 1977, 1649, 1159, 1598, 1972, 2280, 2026, 1732, 1167, 1884,
      1599, 1722, 2156, 1723, 1767, 1387, 1849, 2061, 2356, 1875, 1727, 2524, 1620,
      1926, 1507, 1902, 1999, 1914, 1514, 1699, 1095, 2081, 1632, 1520, 1578, 2329,
      985, 1681, 1719, 1836, 1306, 2122, 1726, 1804, 2020, 2076, 1235, 1074, 1970,
      1853, 1836, 1228, 1431, 2112, 1946, 2793, 1822, 2044, 1946, 2200, 1880]
    episode_reward: [9.0, 12.0, 7.0, 7.0, 5.0, 11.0, 7.0, 17.0, 10.0, 11.0, 9.0, 6.0,
      6.0, 6.0, 9.0, 4.0, 13.0, 11.0, 12.0, 7.0, 5.0, 10.0, 5.0, 10.0, 14.0, 4.0,
      10.0, 4.0, 10.0, 15.0, 7.0, 11.0, 9.0, 8.0, 11.0, 5.0, 8.0, 16.0, 13.0, 8.0,
      6.0, 12.0, 6.0, 9.0, 11.0, 13.0, 7.0, 4.0, 11.0, 7.0, 10.0, 15.0, 7.0, 15.0,
      6.0, 8.0, 10.0, 36.0, 8.0, 8.0, 14.0, 9.0, 11.0, 13.0, 15.0, 9.0, 8.0, 5.0,
      8.0, 11.0, 14.0, 11.0, 6.0, 14.0, 20.0, 4.0, 11.0, 8.0, 8.0, 12.0, 14.0, 10.0,
      10.0, 11.0, 10.0, 4.0, 4.0, 9.0, 8.0, 8.0, 4.0, 5.0, 10.0, 11.0, 20.0, 13.0,
      14.0, 8.0, 13.0, 9.0]
  num_faulty_episodes: 0
  policy_reward_max: {}
  policy_reward_mean: {}
  policy_reward_min: {}
  sampler_perf:
    mean_action_processing_ms: 0.6395067034460499
    mean_env_render_ms: 0.0
    mean_env_wait_ms: 7.840870172264184
    mean_inference_ms: 6.614064184668028
    mean_raw_obs_processing_ms: 2.9528597540097277
time_this_iter_s: 11.578344583511353
time_total_s: 149.75555968284607
timers:
  sample_time_ms: 0.242
  synch_weights_time_ms: 0.027
  training_iteration_time_ms: 0.354
timesteps_total: 389000
training_iteration: 13
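
For reference, a minimal sketch of how the proposed timesteps-based status line could be assembled from the fields in this result dict. `format_rl_status` is a hypothetical helper, not part of Ray; the field names (`timesteps_total`, `time_total_s`) are the ones visible in the output above.

```python
import datetime

def format_rl_status(result: dict) -> str:
    # Hypothetical helper: build the proposed status line from an RLlib
    # result dict, using fields visible in the sample output above.
    timesteps = result["timesteps_total"]  # 389000 in the sample
    elapsed_s = result["time_total_s"]     # ~149.76 in the sample
    rate = timesteps / elapsed_s if elapsed_s else 0.0
    now = datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    mins, secs = divmod(int(elapsed_s), 60)
    return (
        f"Training finished {timesteps} timesteps [{rate:.0f} timesteps/s] "
        f"at {now}. Running time: {mins}min {secs}s"
    )

# With the sample values above this prints something like:
# Training finished 389000 timesteps [2598 timesteps/s] at <now>. Running time: 2min 29s
print(format_rl_status({"timesteps_total": 389000, "time_total_s": 149.75555968284607}))
```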

Versions / Dependencies

nightly

Reproduction script

learning_tests_impala_torch

Issue Severity

Low: It annoys or frustrates me.

scottsun94 added the bug (Something that is supposed to be working; but isn't) and P1 (Issue that should be fixed within a few weeks) labels May 1, 2023
scottsun94 added this to the Tune Console Output milestone May 1, 2023
scottsun94 (Contributor, Author)

@kouroshHakha @gjoliver
Do you think this is a P1 issue? (i.e., one we should fix before we expose the new design to users by default in 2.5)

scottsun94 added the ray-team-created (Ray Team created) label May 1, 2023
krfricke (Contributor) commented May 2, 2023

I think we should keep this in, as it's an important (and default) metric for schedulers and checkpoint management.
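
For context on why it's a default: Tune's schedulers measure trial progress in training iterations out of the box, and checkpointing is keyed to them as well. A minimal sketch, assuming Ray ~2.x Tune/AIR APIs; the specific argument values here are only illustrative.

```python
from ray.air import CheckpointConfig
from ray.tune.schedulers import ASHAScheduler

# ASHA measures trial progress in training iterations by default.
scheduler = ASHAScheduler(
    time_attr="training_iteration",  # the metric this issue is about
    metric="episode_reward_mean",
    mode="max",
    max_t=100,        # stop trials after 100 iterations
    grace_period=10,  # run each trial at least 10 iterations first
)

# Checkpointing is also commonly keyed to iterations.
checkpoint_config = CheckpointConfig(
    checkpoint_frequency=5,  # checkpoint every 5 training iterations
    num_to_keep=3,           # keep only the 3 most recent checkpoints
)
```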

scottsun94 (Contributor, Author)

Training finished iter 13 at 2023-03-29 09:08:13 (running for 00:03:11.52)

@krfricke Actually, I'm referring to the sentence before the reported results (quoted above).

In the original design, we planned to use timesteps instead of iterations, something like this:

Training finished 1000 timesteps [359 timesteps/s] at 2023-02-24 12:35:39. Running time: 2min 14s

krfricke (Contributor) commented May 3, 2023

Ah, I see.

I may try to tackle this after #34951 is merged.

krfricke (Contributor) commented May 4, 2023

Actually, if it's OK, I'd like to punt on this for now. We're essentially targeting an RLlib-specific progress reporter here, and it's not easy to shoehorn that functionality in without introducing more advanced context management. I'm pretty sure we'll do this eventually (see also the discussion in #35003), but until that's done, let's deprioritize this. OK?

cc @sven1977 @kouroshHakha
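
As a rough sketch of what an RLlib-aware reporter could look like, using the legacy `tune.CLIReporter` API as a stand-in (the new AIR output engine under discussion here works differently); the column choices are illustrative only.

```python
from ray import tune
from ray.air import RunConfig
from ray.tune import CLIReporter

# Rough stand-in for an RLlib-aware reporter: surface timesteps rather
# than the generic iteration count in the console columns.
reporter = CLIReporter(
    metric_columns={
        "timesteps_total": "timesteps",
        "episode_reward_mean": "reward_mean",
        "time_total_s": "total time (s)",
    },
    max_report_frequency=30,  # print at most once every 30 seconds
)

tuner = tune.Tuner(
    "IMPALA",  # RLlib algorithm registered with Tune
    run_config=RunConfig(progress_reporter=reporter),
)
# tuner.fit()  # would launch training with the custom columns
```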

scottsun94 (Contributor, Author)

SGTM

krfricke added the P2 (Important issue, but not time-critical) label and removed the P1 (Issue that should be fixed within a few weeks) label May 8, 2023