[AIR output] "iteration" is shown in the output for RL users #34918

Open · scottsun94 opened this issue May 1, 2023 · 6 comments

Labels: bug (Something that is supposed to be working; but isn't), P2 (Important issue, but not time-critical), ray-team-created (Ray Team created)

scottsun94 (Contributor)

What happened + What you expected to happen

I ran learning_tests_impala_torch with the new AIR output. It seems that we show "iteration" in the status line. I'm not sure this is a good choice, because RL users may not be familiar with the "iteration" concept.

In the original design, we planned to show something like Finished 1000 timesteps [359 timesteps/s] at 2023-02-24 12:35:39. Running time: 2min 14s (see the sketch after the sample output below).

Training finished iter 13 at 2023-03-29 09:08:13 (running for 00:03:11.52)
agent_timesteps_total: 389000
connector_metrics: {}
counters:
  num_agent_steps_sampled: 389000
  num_agent_steps_trained: 388500
  num_env_steps_sampled: 389000
  num_env_steps_trained: 388500
  num_samples_added_to_queue: 389000
  num_training_step_calls_since_last_synch_worker_weights: 134
  num_weight_broadcasts: 964
custom_metrics: {}
episode_len_mean: 1721.88
episode_media: {}
episode_reward_max: 36.0
episode_reward_mean: 9.77
episode_reward_min: 4.0
episodes_this_iter: 67
episodes_total: 1710
info:
  learner:
    default_policy:
      custom_metrics: {}
      diff_num_grad_updates_vs_sampler_policy: 10.0
      learner_stats:
        cur_lr: 0.0005
        entropy: 1.0906739234924316
        entropy_coeff: 0.01
        policy_loss: -32.83232116699219
        total_loss: -28.335006713867188
        var_gnorm: 16.424108505249023
        vf_explained_var: 0.6170328259468079
        vf_loss: 19.683231353759766
      model: {}
      num_grad_updates_lifetime: 777.0
  learner_queue:
    size_count: 778
    size_mean: 0.0
    size_quantiles: [0.0, 0.0, 0.0, 0.0, 0.0]
    size_std: 0.0
  num_agent_steps_sampled: 389000
  num_agent_steps_trained: 388500
  num_env_steps_sampled: 389000
  num_env_steps_trained: 388500
  num_samples_added_to_queue: 389000
  num_training_step_calls_since_last_synch_worker_weights: 134
  num_weight_broadcasts: 964
  timing_breakdown:
    learner_dequeue_time_ms: 2772.957
    learner_grad_time_ms: 123.634
    learner_load_time_ms: 4.319
    learner_load_wait_time_ms: 47.829
num_agent_steps_sampled: 389000
num_agent_steps_trained: 388500
num_env_steps_sampled: 389000
num_env_steps_sampled_this_iter: 30750
num_env_steps_trained: 388500
num_env_steps_trained_this_iter: 31000
num_faulty_episodes: 0
num_healthy_workers: 10
num_in_flight_async_reqs: 20
num_remote_worker_restarts: 0
num_steps_trained_this_iter: 31000
perf:
  cpu_util_percent: 34.94117647058823
  ram_util_percent: 5.211764705882353
policy_reward_max: {}
policy_reward_mean: {}
policy_reward_min: {}
sampler_perf:
  mean_action_processing_ms: 0.6395067034460499
  mean_env_render_ms: 0.0
  mean_env_wait_ms: 7.840870172264184
  mean_inference_ms: 6.614064184668028
  mean_raw_obs_processing_ms: 2.9528597540097277
sampler_results:
  connector_metrics: {}
  custom_metrics: {}
  episode_len_mean: 1721.88
  episode_media: {}
  episode_reward_max: 36.0
  episode_reward_mean: 9.77
  episode_reward_min: 4.0
  episodes_this_iter: 67
  hist_stats:
    episode_lengths: [1414, 1306, 1641, 1446, 1234, 2026, 1600, 2454, 1359, 1572,
      1411, 1471, 1463, 1269, 1347, 1083, 2344, 1095, 1956, 1603, 1255, 2218, 1208,
      1943, 1483, 1158, 2108, 1073, 1535, 2590, 1804, 1802, 2109, 1783, 1099, 1258,
      1211, 1826, 2480, 1977, 1649, 1159, 1598, 1972, 2280, 2026, 1732, 1167, 1884,
      1599, 1722, 2156, 1723, 1767, 1387, 1849, 2061, 2356, 1875, 1727, 2524, 1620,
      1926, 1507, 1902, 1999, 1914, 1514, 1699, 1095, 2081, 1632, 1520, 1578, 2329,
      985, 1681, 1719, 1836, 1306, 2122, 1726, 1804, 2020, 2076, 1235, 1074, 1970,
      1853, 1836, 1228, 1431, 2112, 1946, 2793, 1822, 2044, 1946, 2200, 1880]
    episode_reward: [9.0, 12.0, 7.0, 7.0, 5.0, 11.0, 7.0, 17.0, 10.0, 11.0, 9.0, 6.0,
      6.0, 6.0, 9.0, 4.0, 13.0, 11.0, 12.0, 7.0, 5.0, 10.0, 5.0, 10.0, 14.0, 4.0,
      10.0, 4.0, 10.0, 15.0, 7.0, 11.0, 9.0, 8.0, 11.0, 5.0, 8.0, 16.0, 13.0, 8.0,
      6.0, 12.0, 6.0, 9.0, 11.0, 13.0, 7.0, 4.0, 11.0, 7.0, 10.0, 15.0, 7.0, 15.0,
      6.0, 8.0, 10.0, 36.0, 8.0, 8.0, 14.0, 9.0, 11.0, 13.0, 15.0, 9.0, 8.0, 5.0,
      8.0, 11.0, 14.0, 11.0, 6.0, 14.0, 20.0, 4.0, 11.0, 8.0, 8.0, 12.0, 14.0, 10.0,
      10.0, 11.0, 10.0, 4.0, 4.0, 9.0, 8.0, 8.0, 4.0, 5.0, 10.0, 11.0, 20.0, 13.0,
      14.0, 8.0, 13.0, 9.0]
  num_faulty_episodes: 0
  policy_reward_max: {}
  policy_reward_mean: {}
  policy_reward_min: {}
  sampler_perf:
    mean_action_processing_ms: 0.6395067034460499
    mean_env_render_ms: 0.0
    mean_env_wait_ms: 7.840870172264184
    mean_inference_ms: 6.614064184668028
    mean_raw_obs_processing_ms: 2.9528597540097277
time_this_iter_s: 11.578344583511353
time_total_s: 149.75555968284607
timers:
  sample_time_ms: 0.242
  synch_weights_time_ms: 0.027
  training_iteration_time_ms: 0.354
timesteps_total: 389000
training_iteration: 13
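
For reference, a minimal sketch of how the proposed timesteps-based status line could be assembled from the fields in this result dict. `format_rl_status` is a hypothetical helper, not part of Ray; the field names (`timesteps_total`, `time_total_s`) are the ones visible in the output above.

```python
import datetime

def format_rl_status(result: dict) -> str:
    # Hypothetical helper: build the proposed status line from an RLlib
    # result dict, using fields visible in the sample output above.
    timesteps = result["timesteps_total"]  # 389000 in the sample
    elapsed_s = result["time_total_s"]     # ~149.76 in the sample
    rate = timesteps / elapsed_s if elapsed_s else 0.0
    now = datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    mins, secs = divmod(int(elapsed_s), 60)
    return (
        f"Training finished {timesteps} timesteps [{rate:.0f} timesteps/s] "
        f"at {now}. Running time: {mins}min {secs}s"
    )

# With the sample values above this prints something like:
# Training finished 389000 timesteps [2598 timesteps/s] at <now>. Running time: 2min 29s
print(format_rl_status({"timesteps_total": 389000, "time_total_s": 149.75555968284607}))
```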

Versions / Dependencies

nightly

Reproduction script

learning_tests_impala_torch

Issue Severity

Low: It annoys or frustrates me.

scottsun94 added the bug (Something that is supposed to be working; but isn't) and P1 (Issue that should be fixed within a few weeks) labels May 1, 2023
scottsun94 added this to the Tune Console Output milestone May 1, 2023
scottsun94 (Contributor, Author)

@kouroshHakha @gjoliver
Do you think this is a P1 issue? (i.e., one we should fix before we expose the new design to users by default in 2.5)

scottsun94 added the ray-team-created (Ray Team created) label May 1, 2023
krfricke (Contributor) commented May 2, 2023

I think we should keep this in, as it's an important (and default) metric for schedulers and checkpoint management.
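
For context on why it's a default: Tune's schedulers measure trial progress in training iterations out of the box, and checkpointing is keyed to them as well. A minimal sketch, assuming Ray ~2.x Tune/AIR APIs; the specific argument values here are only illustrative.

```python
from ray.air import CheckpointConfig
from ray.tune.schedulers import ASHAScheduler

# ASHA measures trial progress in training iterations by default.
scheduler = ASHAScheduler(
    time_attr="training_iteration",  # the metric this issue is about
    metric="episode_reward_mean",
    mode="max",
    max_t=100,        # stop trials after 100 iterations
    grace_period=10,  # run each trial at least 10 iterations first
)

# Checkpointing is also commonly keyed to iterations.
checkpoint_config = CheckpointConfig(
    checkpoint_frequency=5,  # checkpoint every 5 training iterations
    num_to_keep=3,           # keep only the 3 most recent checkpoints
)
```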

scottsun94 (Contributor, Author)

Training finished iter 13 at 2023-03-29 09:08:13 (running for 00:03:11.52)

@krfricke Actually, I'm referring to the sentence before the reported results (quoted above).

In the original design, we planned to use timesteps instead of iterations, something like this:

Training finished 1000 timesteps [359 timesteps/s] at 2023-02-24 12:35:39. Running time: 2min 14s

krfricke (Contributor) commented May 3, 2023

Ah, I see.

I may try to tackle this after #34951 is merged.

krfricke (Contributor) commented May 4, 2023

Actually, if it's OK, I'd like to punt on this for now. We're essentially targeting an RLlib-specific progress reporter here, and it's not easy to shoehorn that functionality in without introducing more advanced context management. I'm pretty sure we'll do this eventually (see also the discussion in #35003), but until that's done, let's deprioritize this. OK?

cc @sven1977 @kouroshHakha
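
As a rough sketch of what an RLlib-aware reporter could look like, using the legacy `tune.CLIReporter` API as a stand-in (the new AIR output engine under discussion here works differently); the column choices are illustrative only.

```python
from ray import tune
from ray.air import RunConfig
from ray.tune import CLIReporter

# Rough stand-in for an RLlib-aware reporter: surface timesteps rather
# than the generic iteration count in the console columns.
reporter = CLIReporter(
    metric_columns={
        "timesteps_total": "timesteps",
        "episode_reward_mean": "reward_mean",
        "time_total_s": "total time (s)",
    },
    max_report_frequency=30,  # print at most once every 30 seconds
)

tuner = tune.Tuner(
    "IMPALA",  # RLlib algorithm registered with Tune
    run_config=RunConfig(progress_reporter=reporter),
)
# tuner.fit()  # would launch training with the custom columns
```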

scottsun94 (Contributor, Author)

SGTM

krfricke added the P2 (Important issue, but not time-critical) label and removed the P1 (Issue that should be fixed within a few weeks) label May 8, 2023