dreamerv3 trouble resuming? freeze? #187

Closed
Disastorm opened this issue Jan 12, 2024 · 26 comments

@Disastorm

Disastorm commented Jan 12, 2024

Hello, sorry, I don't actually know much about the mathematical formulas behind RL and the algorithms; I've previously just been training things with SB3 PPO.
Anyway, I installed sheeprl and implemented a wrapper for Stable Retro, got it started, and I could see the envs running in parallel and the agent doing stuff. However, once it hits the point set by "learning_starts" it stops logging anything, although it's still using my CPU and RAM. It has basically been sitting here for over 30 minutes with no logs. I have no idea whether it's actually doing anything or not, although I suppose I could try again with the visual retro window open so I can check.

Rank-0: policy_step=46848, reward_env_2=110.59998321533203
Rank-0: policy_step=50772, reward_env_3=278.8006286621094
Rank-0: policy_step=50904, reward_env_0=309.19964599609375
Rank-0: policy_step=53848, reward_env_1=126.399658203125
Rank-0: policy_step=55596, reward_env_0=268.4999694824219
Rank-0: policy_step=57212, reward_env_2=240.7994842529297
Rank-0: policy_step=59680, reward_env_3=195.09988403320312
Rank-0: policy_step=62880, reward_env_1=362.70013427734375
Rank-0: policy_step=64500, reward_env_0=532.301513671875

Any ideas as to what the issue could be?

*edit: actually it finally updated, so I guess it's just really slow after learning starts? Is there a way to run this on GPU?

Rank-0: policy_step=64500, reward_env_0=532.301513671875
Rank-0: policy_step=66776, reward_env_2=365.80255126953125
Rank-0: policy_step=66808, reward_env_3=68.799560546875
@Disastorm
Author

It looks like it's faster now when I use the fabric accelerator gpu, so I'll close this for now.

@Disastorm
Author

Disastorm commented Jan 12, 2024

I got this error after 75k steps.

CUDA error: misaligned address

What does this mean? Also, I'm running on Windows, by the way.

More info:

 File "...\sheeprl\sheeprl\algos\dreamer_v3\dreamer_v3.py", line 290, in train
    actor_grads = fabric.clip_gradients(
  File "...\MiniConda3\envs\sheeprl\lib\site-packages\lightning\fabric\fabric.py", line 460, in clip_gradients
    return self.strategy.clip_gradients_norm(
  File "...\MiniConda3\envs\sheeprl\lib\site-packages\lightning\fabric\strategies\strategy.py", line 380, in clip_gradients_norm
    return torch.nn.utils.clip_grad_norm_(
  File "...\MiniConda3\envs\sheeprl\lib\site-packages\torch\nn\utils\clip_grad.py", line 76, in clip_grad_norm_
    torch._foreach_mul_(grads, clip_coef_clamped.to(device))  # type: ignore[call-overload]
RuntimeError: CUDA error: misaligned address
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

Could this be due to my VRAM maxing out? Are there any settings I can change to reduce the usage?

Why does DreamerV3 take so little memory during pretraining and then spike up massively once it starts learning? Is that expected?

@Disastorm Disastorm reopened this Jan 12, 2024
@Disastorm
Author

Disastorm commented Jan 12, 2024

Also, since Stable Retro can only start one emulator per process, training works thanks to the async_vector_env, but when I run sheeprl_eval the env fails because it tries to open two emulators within the same process. Is there any way around this?

@Disastorm
Author

I figured out ways around most of this; the only thing I'd really like to know is whether there is any way to reduce the memory usage, since it takes so much VRAM while training. Otherwise I'll just close this, since it's not really an issue.

@Disastorm
Author

I see there are different DreamerV3 model sizes defined in the original paper that reduce memory and speed up training.

My question is: do transition_model.hidden_size and representation_model.hidden_size change along with dense_units or any of the other params, or should I just leave those at 1024 for all of the model sizes?

@balloch

balloch commented Jan 12, 2024

I feel like you should leave this open; I am encountering the same error with the command:
python sheeprl.py exp=dreamer_v3 env=dmc env.id=walker_walk "algo.cnn_keys.encoder=[rgb]"
It hangs at 64k. I'm 99% sure this isn't a VRAM problem on my end (the A40 barely showed any fill). What was your fix?

@Disastorm
Author

Disastorm commented Jan 13, 2024

@balloch
OK, I'll open it again.
I made it use the GPU; I think it's actually just insanely slow on CPU.

fabric.accelerator=gpu on the command line.

Also, do you happen to know if this is a problem?

site-packages\torchmetrics\utilities\prints.py:43: UserWarning: The ``compute`` method of metric MeanMetric was called before the ``update`` method which may lead to errors, as metric states have not yet been updated.

To speed it up more you can lower the size of the Dreamer model by using a custom yaml in the algo config.
The default is X-Large.
Although if you are using an A40 maybe you don't need to do this. I'm using a 4090, so the X-Large size basically fills up the VRAM completely, at least for my custom gym retro environment.

Note for the configs below: I don't actually know whether the "hidden_size" fields should change with the model size, so it's possible they should just always be left at 1024.

large:

defaults:
  - dreamer_v3
  - _self_

cnn_keys:
  encoder: [rgb]

dense_units: 768
mlp_layers: 4

world_model:
  encoder:
    cnn_channels_multiplier: 64

  recurrent_model:
    recurrent_state_size: 2048

  # Prior
  transition_model:
    hidden_size: 768

  # Posterior
  representation_model:
    hidden_size: 768

medium:

defaults:
  - dreamer_v3
  - _self_

cnn_keys:
  encoder: [rgb]

dense_units: 640
mlp_layers: 3

world_model:
  encoder:
    cnn_channels_multiplier: 48

  recurrent_model:
    recurrent_state_size: 1024

  # Prior
  transition_model:
    hidden_size: 640

  # Posterior
  representation_model:
    hidden_size: 640

@Disastorm Disastorm reopened this Jan 13, 2024
@Disastorm
Author

Disastorm commented Jan 13, 2024

I noticed odd behavior when resuming a checkpoint: it does the pretraining again, even if I change learning_starts=0.
When not resuming, learning_starts works properly, but when resuming I'm not sure what it's doing.

It also looks like when resuming it just ends after 50k steps, regardless of what I've set total_steps to.

@Disastorm Disastorm changed the title dreamerv3 freeze? dreamerv3 trouble resuming? freeze? Jan 13, 2024
@belerico
Member

belerico commented Jan 13, 2024

Hi @Disastorm, thank you for reporting this! I'll have a look ASAP 🤟
In the meantime, could you please post the yaml config of your experiment here? It can be found inside the log folder of the exp, it's called config.yaml, and it should sit at the same level as the checkpoint and memmap folders.
Thanks

@Disastorm
Author

I'm actually just using regular training rather than an experiment. Is regular training not resumable? I actually don't really know the difference between experiments and regular training.

This is what I use for the initial training:

python sheeprl.py exp=dreamer_v3 env=streets_of_rage algo.total_steps=500000 fabric.accelerator=gpu algo=dreamer_v3_large env.num_envs=4

This is what I use for resuming:

python sheeprl.py exp=dreamer_v3 env=streets_of_rage algo.total_steps=700000 fabric.accelerator=gpu algo=dreamer_v3_large env.num_envs=4 checkpoint=streets_of_rage

Here is the yaml of the original training:

num_threads: 1
dry_run: false
seed: 42
torch_deterministic: false
exp_name: dreamer_v3_StreetsOfRage-Genesis
run_name: 2024-01-13_12-19-22_dreamer_v3_StreetsOfRage-Genesis_42
root_dir: dreamer_v3/StreetsOfRage-Genesis
algo:
  name: dreamer_v3
  total_steps: 500000
  per_rank_batch_size: 16
  run_test: true
  cnn_keys:
    encoder:
    - rgb
    decoder:
    - rgb
  mlp_keys:
    encoder: []
    decoder: []
  world_model:
    optimizer:
      _target_: torch.optim.Adam
      lr: 0.0001
      eps: 1.0e-08
      weight_decay: 0
      betas:
      - 0.9
      - 0.999
    discrete_size: 32
    stochastic_size: 32
    kl_dynamic: 0.5
    kl_representation: 0.1
    kl_free_nats: 1.0
    kl_regularizer: 1.0
    continue_scale_factor: 1.0
    clip_gradients: 1000.0
    encoder:
      cnn_channels_multiplier: 64
      cnn_act: torch.nn.SiLU
      dense_act: torch.nn.SiLU
      mlp_layers: 4
      layer_norm: true
      dense_units: 768
    recurrent_model:
      recurrent_state_size: 2048
      layer_norm: true
      dense_units: 768
    transition_model:
      hidden_size: 768
      dense_act: torch.nn.SiLU
      layer_norm: true
    representation_model:
      hidden_size: 768
      dense_act: torch.nn.SiLU
      layer_norm: true
    observation_model:
      cnn_channels_multiplier: 64
      cnn_act: torch.nn.SiLU
      dense_act: torch.nn.SiLU
      mlp_layers: 4
      layer_norm: true
      dense_units: 768
    reward_model:
      dense_act: torch.nn.SiLU
      mlp_layers: 4
      layer_norm: true
      dense_units: 768
      bins: 255
    discount_model:
      learnable: true
      dense_act: torch.nn.SiLU
      mlp_layers: 4
      layer_norm: true
      dense_units: 768
  actor:
    optimizer:
      _target_: torch.optim.Adam
      lr: 8.0e-05
      eps: 1.0e-05
      weight_decay: 0
      betas:
      - 0.9
      - 0.999
    cls: sheeprl.algos.dreamer_v3.agent.Actor
    ent_coef: 0.0003
    min_std: 0.1
    init_std: 0.0
    objective_mix: 1.0
    dense_act: torch.nn.SiLU
    mlp_layers: 4
    layer_norm: true
    dense_units: 768
    clip_gradients: 100.0
    expl_amount: 0.0
    expl_min: 0.0
    expl_decay: false
    max_step_expl_decay: 0
    moments:
      decay: 0.99
      max: 1.0
      percentile:
        low: 0.05
        high: 0.95
  critic:
    optimizer:
      _target_: torch.optim.Adam
      lr: 8.0e-05
      eps: 1.0e-05
      weight_decay: 0
      betas:
      - 0.9
      - 0.999
    dense_act: torch.nn.SiLU
    mlp_layers: 4
    layer_norm: true
    dense_units: 768
    target_network_update_freq: 1
    tau: 0.02
    bins: 255
    clip_gradients: 100.0
  gamma: 0.996996996996997
  lmbda: 0.95
  horizon: 15
  train_every: 16
  learning_starts: 65536
  per_rank_pretrain_steps: 1
  per_rank_gradient_steps: 1
  per_rank_sequence_length: 64
  layer_norm: true
  dense_units: 768
  mlp_layers: 4
  dense_act: torch.nn.SiLU
  cnn_act: torch.nn.SiLU
  unimix: 0.01
  hafner_initialization: true
  player:
    discrete_size: 32
buffer:
  size: 1000000
  memmap: true
  validate_args: false
  from_numpy: false
  checkpoint: false
checkpoint:
  every: 100000
  resume_from: null
  save_last: true
  keep_last: 5
distribution:
  validate_args: false
  type: auto
env:
  id: StreetsOfRage-Genesis
  num_envs: 4
  frame_stack: -1
  sync_env: false
  screen_size: 64
  action_repeat: 1
  grayscale: false
  clip_rewards: false
  capture_video: false
  frame_stack_dilation: 1
  max_episode_steps: null
  reward_as_observation: false
  wrapper:
    _target_: sheeprl.envs.retro.RetroSheepWrapper
    screen_size: 64
    seed: 42
    rom: StreetsOfRage-Genesis
    loadstate: 1Player.Blaze.Round5.Normal
  from_vectors: false
  env:
    id: StreetsOfRage-Genesis
    render_mode: rgb_array
  algo:
    cnn_keys:
      encoder:
      - rgb
    world_model:
      encoder:
        cnn_channels_multiplier: 48
      recurrent_model:
        recurrent_state_size: 2048
      transition_model:
        hidden_size: 512
      representation_model:
        hidden_size: 512
fabric:
  _target_: lightning.fabric.Fabric
  devices: 1
  num_nodes: 1
  strategy: auto
  accelerator: gpu
  precision: 32-true
  callbacks:
  - _target_: sheeprl.utils.callback.CheckpointCallback
    keep_last: 5
metric:
  log_every: 5000
  disable_timer: false
  log_level: 1
  sync_on_compute: false
  aggregator:
    _target_: sheeprl.utils.metric.MetricAggregator
    raise_on_missing: false
    metrics:
      Rewards/rew_avg:
        _target_: torchmetrics.MeanMetric
        sync_on_compute: false
      Game/ep_len_avg:
        _target_: torchmetrics.MeanMetric
        sync_on_compute: false
      Loss/world_model_loss:
        _target_: torchmetrics.MeanMetric
        sync_on_compute: false
      Loss/value_loss:
        _target_: torchmetrics.MeanMetric
        sync_on_compute: false
      Loss/policy_loss:
        _target_: torchmetrics.MeanMetric
        sync_on_compute: false
      Loss/observation_loss:
        _target_: torchmetrics.MeanMetric
        sync_on_compute: false
      Loss/reward_loss:
        _target_: torchmetrics.MeanMetric
        sync_on_compute: false
      Loss/state_loss:
        _target_: torchmetrics.MeanMetric
        sync_on_compute: false
      Loss/continue_loss:
        _target_: torchmetrics.MeanMetric
        sync_on_compute: false
      State/kl:
        _target_: torchmetrics.MeanMetric
        sync_on_compute: false
      State/post_entropy:
        _target_: torchmetrics.MeanMetric
        sync_on_compute: false
      State/prior_entropy:
        _target_: torchmetrics.MeanMetric
        sync_on_compute: false
      Params/exploration_amount:
        _target_: torchmetrics.MeanMetric
        sync_on_compute: false
      Grads/world_model:
        _target_: torchmetrics.MeanMetric
        sync_on_compute: false
      Grads/actor:
        _target_: torchmetrics.MeanMetric
        sync_on_compute: false
      Grads/critic:
        _target_: torchmetrics.MeanMetric
        sync_on_compute: false
  logger:
    _target_: lightning.fabric.loggers.TensorBoardLogger
    name: 2024-01-13_12-19-22_dreamer_v3_StreetsOfRage-Genesis_42
    root_dir: logs\runs\dreamer_v3/StreetsOfRage-Genesis
    version: null
    default_hp_metric: true
    prefix: ''
    sub_dir: null
model_manager:
  disabled: true
  models:
    world_model:
      model_name: dreamer_v3_StreetsOfRage-Genesis_world_model
      description: DreamerV3 World Model used in StreetsOfRage-Genesis Environment
      tags: {}
    actor:
      model_name: dreamer_v3_StreetsOfRage-Genesis_actor
      description: DreamerV3 Actor used in StreetsOfRage-Genesis Environment
      tags: {}
    critic:
      model_name: dreamer_v3_StreetsOfRage-Genesis_critic
      description: DreamerV3 Critic used in StreetsOfRage-Genesis Environment
      tags: {}
    target_critic:
      model_name: dreamer_v3_StreetsOfRage-Genesis_target_critic
      description: DreamerV3 Target Critic used in StreetsOfRage-Genesis Environment
      tags: {}
    moments:
      model_name: dreamer_v3_StreetsOfRage-Genesis_moments
      description: DreamerV3 Moments used in StreetsOfRage-Genesis Environment
      tags: {}

Here is the yaml of the resume training (in this case it didn't even do a single step of training; it just did a "test", returned the reward value, and then stopped).

num_threads: 1
dry_run: false
seed: 42
torch_deterministic: false
exp_name: dreamer_v3_StreetsOfRage-Genesis
run_name: 2024-01-13_21-46-54_dreamer_v3_StreetsOfRage-Genesis_42
root_dir: dreamer_v3/StreetsOfRage-Genesis
algo:
  name: dreamer_v3
  total_steps: 500000
  per_rank_batch_size: 16
  run_test: true
  cnn_keys:
    encoder:
    - rgb
    decoder:
    - rgb
  mlp_keys:
    encoder: []
    decoder: []
  world_model:
    optimizer:
      _target_: torch.optim.Adam
      lr: 0.0001
      eps: 1.0e-08
      weight_decay: 0
      betas:
      - 0.9
      - 0.999
    discrete_size: 32
    stochastic_size: 32
    kl_dynamic: 0.5
    kl_representation: 0.1
    kl_free_nats: 1.0
    kl_regularizer: 1.0
    continue_scale_factor: 1.0
    clip_gradients: 1000.0
    encoder:
      cnn_channels_multiplier: 64
      cnn_act: torch.nn.SiLU
      dense_act: torch.nn.SiLU
      mlp_layers: 4
      layer_norm: true
      dense_units: 768
    recurrent_model:
      recurrent_state_size: 2048
      layer_norm: true
      dense_units: 768
    transition_model:
      hidden_size: 768
      dense_act: torch.nn.SiLU
      layer_norm: true
    representation_model:
      hidden_size: 768
      dense_act: torch.nn.SiLU
      layer_norm: true
    observation_model:
      cnn_channels_multiplier: 64
      cnn_act: torch.nn.SiLU
      dense_act: torch.nn.SiLU
      mlp_layers: 4
      layer_norm: true
      dense_units: 768
    reward_model:
      dense_act: torch.nn.SiLU
      mlp_layers: 4
      layer_norm: true
      dense_units: 768
      bins: 255
    discount_model:
      learnable: true
      dense_act: torch.nn.SiLU
      mlp_layers: 4
      layer_norm: true
      dense_units: 768
  actor:
    optimizer:
      _target_: torch.optim.Adam
      lr: 8.0e-05
      eps: 1.0e-05
      weight_decay: 0
      betas:
      - 0.9
      - 0.999
    cls: sheeprl.algos.dreamer_v3.agent.Actor
    ent_coef: 0.0003
    min_std: 0.1
    init_std: 0.0
    objective_mix: 1.0
    dense_act: torch.nn.SiLU
    mlp_layers: 4
    layer_norm: true
    dense_units: 768
    clip_gradients: 100.0
    expl_amount: 0.0
    expl_min: 0.0
    expl_decay: false
    max_step_expl_decay: 0
    moments:
      decay: 0.99
      max: 1.0
      percentile:
        low: 0.05
        high: 0.95
  critic:
    optimizer:
      _target_: torch.optim.Adam
      lr: 8.0e-05
      eps: 1.0e-05
      weight_decay: 0
      betas:
      - 0.9
      - 0.999
    dense_act: torch.nn.SiLU
    mlp_layers: 4
    layer_norm: true
    dense_units: 768
    target_network_update_freq: 1
    tau: 0.02
    bins: 255
    clip_gradients: 100.0
  gamma: 0.996996996996997
  lmbda: 0.95
  horizon: 15
  train_every: 16
  learning_starts: 65536
  per_rank_pretrain_steps: 1
  per_rank_gradient_steps: 1
  per_rank_sequence_length: 64
  layer_norm: true
  dense_units: 768
  mlp_layers: 4
  dense_act: torch.nn.SiLU
  cnn_act: torch.nn.SiLU
  unimix: 0.01
  hafner_initialization: true
  player:
    discrete_size: 32
buffer:
  size: 1000000
  memmap: true
  validate_args: false
  from_numpy: false
  checkpoint: false
checkpoint:
  every: 100000
  resume_from: H:\aiWorkspace\gymRetro\rl\sheeprl\sheeprl\logs\runs\dreamer_v3\StreetsOfRage-Genesis\2024-01-13_12-19-22_dreamer_v3_StreetsOfRage-Genesis_42\version_0\checkpoint\ckpt_500000_0.ckpt
  save_last: true
  keep_last: 5
distribution:
  validate_args: false
  type: auto
env:
  id: StreetsOfRage-Genesis
  num_envs: 4
  frame_stack: -1
  sync_env: false
  screen_size: 64
  action_repeat: 1
  grayscale: false
  clip_rewards: false
  capture_video: false
  frame_stack_dilation: 1
  max_episode_steps: null
  reward_as_observation: false
  wrapper:
    _target_: sheeprl.envs.retro.RetroSheepWrapper
    screen_size: 64
    seed: 42
    rom: StreetsOfRage-Genesis
    loadstate: 1Player.Blaze.Round5.Normal
  from_vectors: false
  env:
    id: StreetsOfRage-Genesis
    render_mode: rgb_array
  algo:
    cnn_keys:
      encoder:
      - rgb
    world_model:
      encoder:
        cnn_channels_multiplier: 48
      recurrent_model:
        recurrent_state_size: 2048
      transition_model:
        hidden_size: 512
      representation_model:
        hidden_size: 512
fabric:
  _target_: lightning.fabric.Fabric
  devices: 1
  num_nodes: 1
  strategy: auto
  accelerator: gpu
  precision: 32-true
  callbacks:
  - _target_: sheeprl.utils.callback.CheckpointCallback
    keep_last: 5
metric:
  log_every: 5000
  disable_timer: false
  log_level: 1
  sync_on_compute: false
  aggregator:
    _target_: sheeprl.utils.metric.MetricAggregator
    raise_on_missing: false
    metrics:
      Rewards/rew_avg:
        _target_: torchmetrics.MeanMetric
        sync_on_compute: false
      Game/ep_len_avg:
        _target_: torchmetrics.MeanMetric
        sync_on_compute: false
      Loss/world_model_loss:
        _target_: torchmetrics.MeanMetric
        sync_on_compute: false
      Loss/value_loss:
        _target_: torchmetrics.MeanMetric
        sync_on_compute: false
      Loss/policy_loss:
        _target_: torchmetrics.MeanMetric
        sync_on_compute: false
      Loss/observation_loss:
        _target_: torchmetrics.MeanMetric
        sync_on_compute: false
      Loss/reward_loss:
        _target_: torchmetrics.MeanMetric
        sync_on_compute: false
      Loss/state_loss:
        _target_: torchmetrics.MeanMetric
        sync_on_compute: false
      Loss/continue_loss:
        _target_: torchmetrics.MeanMetric
        sync_on_compute: false
      State/kl:
        _target_: torchmetrics.MeanMetric
        sync_on_compute: false
      State/post_entropy:
        _target_: torchmetrics.MeanMetric
        sync_on_compute: false
      State/prior_entropy:
        _target_: torchmetrics.MeanMetric
        sync_on_compute: false
      Params/exploration_amount:
        _target_: torchmetrics.MeanMetric
        sync_on_compute: false
      Grads/world_model:
        _target_: torchmetrics.MeanMetric
        sync_on_compute: false
      Grads/actor:
        _target_: torchmetrics.MeanMetric
        sync_on_compute: false
      Grads/critic:
        _target_: torchmetrics.MeanMetric
        sync_on_compute: false
  logger:
    _target_: lightning.fabric.loggers.TensorBoardLogger
    name: 2024-01-13_21-46-54_dreamer_v3_StreetsOfRage-Genesis_42
    root_dir: logs\runs\dreamer_v3/StreetsOfRage-Genesis
    version: null
    default_hp_metric: true
    prefix: ''
    sub_dir: null
model_manager:
  disabled: true
  models:
    world_model:
      model_name: dreamer_v3_StreetsOfRage-Genesis_world_model
      description: DreamerV3 World Model used in StreetsOfRage-Genesis Environment
      tags: {}
    actor:
      model_name: dreamer_v3_StreetsOfRage-Genesis_actor
      description: DreamerV3 Actor used in StreetsOfRage-Genesis Environment
      tags: {}
    critic:
      model_name: dreamer_v3_StreetsOfRage-Genesis_critic
      description: DreamerV3 Critic used in StreetsOfRage-Genesis Environment
      tags: {}
    target_critic:
      model_name: dreamer_v3_StreetsOfRage-Genesis_target_critic
      description: DreamerV3 Target Critic used in StreetsOfRage-Genesis Environment
      tags: {}
    moments:
      model_name: dreamer_v3_StreetsOfRage-Genesis_moments
      description: DreamerV3 Moments used in StreetsOfRage-Genesis Environment
      tags: {}

@Disastorm
Author

Disastorm commented Jan 13, 2024

Oh, maybe I see the reason: it looks like the resume yaml isn't getting my algo.total_steps param from the command line for some reason. It still has the previous total_steps, which explains why it quits as soon as it starts. Can you tell me whether I should be using an experiment instead, or is this supposed to work? Also, when exactly would I use an experiment vs. regular training? Thanks.

Interestingly enough, the yaml that is printed out in the console actually has the updated total_steps, but the yaml written to the folder (and presumably the yaml that is actually used) seems to be the previous yaml.

@Disastorm
Author

Disastorm commented Jan 13, 2024

Looks like if I modify the values in the old yaml, it works: I change total_steps and learning_starts in the old yaml, and when resuming it picks up those values properly.

@belerico
Member

I feel like you should leave this open; I am encountering the same error with the command: python sheeprl.py exp=dreamer_v3 env=dmc env.id=walker_walk "algo.cnn_keys.encoder=[rgb]" It hangs at 64k. I'm 99% sure this isn't a VRAM problem on my end (the A40 barely showed any fill). What was your fix?

If you have run it just like that, it's possible that you're running the experiment on CPU. To run it on GPU you can use the following command:

python sheeprl.py exp=dreamer_v3 env=dmc env.id=walker_walk "algo.cnn_keys.encoder=[rgb]" fabric.accelerator=gpu

and to reduce the memory footprint you can also try adding fabric.precision=bf16-mixed to train the model with mixed precision.
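Putting the two together, the full command would look roughly like this (a sketch: same experiment as above, just with the precision flag added):

python sheeprl.py exp=dreamer_v3 env=dmc env.id=walker_walk "algo.cnn_keys.encoder=[rgb]" fabric.accelerator=gpu fabric.precision=bf16-mixed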
Could you please try this out and tell us if it solves your problem?

@belerico
Member

I figured out ways around most of this; the only thing I'd really like to know is whether there is any way to reduce the memory usage, since it takes so much VRAM while training. Otherwise I'll just close this, since it's not really an issue.

To reduce the memory footprint you could try to lower the model dimensions: have a look at this config, where the S version of DreamerV3 is used. You could also try to lower algo.per_rank_batch_size or algo.per_rank_sequence_length and see if that helps.
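As a rough sketch (the halved values below are only illustrative, not tuned recommendations), those overrides can be passed straight from the CLI:

# Illustrative only: halve batch size and sequence length from the defaults shown in the config above (16 and 64).
python sheeprl.py exp=dreamer_v3 env=streets_of_rage fabric.accelerator=gpu algo.per_rank_batch_size=8 algo.per_rank_sequence_length=32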

@belerico
Member

Looks like if I modify the values in the old yaml, it works: I change total_steps and learning_starts in the old yaml, and when resuming it picks up those values properly.

This is intended: when we resume from a checkpoint we assume that something has gone wrong, so we load the old config from the checkpoint to be resumed and merge it with the one running now, and everything that you specify from the CLI will be discarded.

Moreover, when you resume from a checkpoint you must (as explained in our how-to) specify the entire path to the .ckpt file containing the checkpoint to be resumed.

When the checkpoint is resumed, start_step will be set to the last update saved in the checkpoint file, so that training can be safely resumed from there; and if you haven't saved the buffer in the checkpoint, the algorithm will collect random data to fill up the new buffer for learning_starts steps before starting the actual training.
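For example, a resume invocation would look roughly like this (a sketch: the checkpoint path below is a placeholder, and checkpoint.resume_from is the key shown in the resumed config above; pass the full path to your own .ckpt file):

# Sketch only: other CLI overrides are discarded on resume, because the config stored with the checkpoint takes precedence.
python sheeprl.py exp=dreamer_v3 checkpoint.resume_from="/full/path/to/version_0/checkpoint/ckpt_500000_0.ckpt"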

@belerico
Member

PS: I also suggest installing the latest version of sheeprl with pip install sheeprl==0.5.2, which is faster and better optimized.

@balloch

balloch commented Jan 13, 2024

fabric.accelerator=gpu on the command line.

I think this is an indicator that I have done something wrong in my setup: fabric.accelerator=gpu doesn't work for me, I have to use fabric.accelerator=cuda. I did notice that eventually it did start going again (like yours), but very slowly. The original TF Dreamer implementation runs about 10x faster, to say nothing of the JAX DreamerV3 implementation (but I'm not trying to compare against that, apples to oranges, that code is lowkey impossible to work with).

@balloch

balloch commented Jan 13, 2024

fabric.precision=bf16-mixed

Great advice with fabric.precision!

I did rerun it with the accelerator set, and it does resume eventually, but it takes a long time. Maybe there is some initialization that Lightning does that I'm not used to, and it will speed up over time (JAX/Flax is like this; the first time operations are run they are much slower)?

@belerico
Member

fabric.accelerator=gpu on the command line.

I think this is an indicator that I have done something wrong in my setup: fabric.accelerator=gpu doesn't work for me, I have to use fabric.accelerator=cuda. I did notice that eventually it did start going again (like yours), but very slowly. The original TF Dreamer implementation runs about 10x faster, to say nothing of the JAX DreamerV3 implementation (but I'm not trying to compare against that, apples to oranges, that code is lowkey impossible to work with).

Could you quantify how slow? Consider that even the smallest model (the S-sized one) used to train on the Atari-100k benchmark takes around 9-10 hours on a single V100, in line with the DV3 authors' own report of simply "< 1 day". Another thing that can speed up training is to set algo.train_every=N with N>1, meaning that the agent will be trained for algo.per_rank_gradient_steps every N policy steps, where a policy step is a single forward of the agent to retrieve the action given an observation from the env: if you have E envs per process and P processes, then a single environment interaction amounts to E * P policy steps.
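For example (a sketch with illustrative values only, not a tuned recommendation), training less frequently than the train_every: 16 shown in the config above would look something like:

# Sketch only: train per_rank_gradient_steps times every 32 policy steps instead of every 16,
# trading some sample efficiency for wall-clock speed.
python sheeprl.py exp=dreamer_v3 env=dmc env.id=walker_walk fabric.accelerator=gpu algo.train_every=32 algo.per_rank_gradient_steps=1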

@Disastorm
Author

Disastorm commented Jan 14, 2024

Looks like if I modify the values in the old yaml, it works: I change total_steps and learning_starts in the old yaml, and when resuming it picks up those values properly.

This is intended: when we resume from a checkpoint we assume that something has gone wrong, so we load the old config from the checkpoint to be resumed and merge it with the one running now, and everything that you specify from the CLI will be discarded.

Moreover, when you resume from a checkpoint you must (as explained in our how-to) specify the entire path to the .ckpt file containing the checkpoint to be resumed.

When the checkpoint is resumed, start_step will be set to the last update saved in the checkpoint file, so that training can be safely resumed from there; and if you haven't saved the buffer in the checkpoint, the algorithm will collect random data to fill up the new buffer for learning_starts steps before starting the actual training.

OK, I see, thanks. So manual resuming is not really intended; it's considered like "something went wrong"? Is there no way to have the buffer read from the previous run's buffer files? Would filling up the buffer with 65k steps at the start of the resume make up for this, or should I just include the buffer in the checkpoint until I feel the model is good enough and then take it out of the checkpoint? I assume the way to do this is to set the experiment's buffer.checkpoint=True?

So my main questions are:

  1. How do I continuously train something until I think it's good, across multiple runs? Do I just do what I mentioned above?
  2. Also, if my config screen_size doesn't match my environment, am I correct in seeing that it transforms the observation to the config screen_size? And the config screen_size only supports a square, you can't use a rectangle? And it has to be a power of 2, so basically 64x64, 128x128, 256x256, etc.?

Otherwise feel free to close this issue, at least as far as my part is concerned.

@belerico
Member

Is there no way to have the buffer read from the previous run's buffer files?

Right now no, this is done only when you're resuming a checkpoint and you have set buffer.checkpoint=True
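So, as a sketch, you would enable buffer checkpointing on the original run (since CLI overrides are discarded when resuming), for example:

# Illustrative only: keep the replay buffer with the checkpoint so that a later
# resume restores it instead of re-collecting learning_starts random steps.
python sheeprl.py exp=dreamer_v3 env=streets_of_rage fabric.accelerator=gpu buffer.checkpoint=True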

So actually manual resuming is not really intended, its considered like "something went wrong"?

Exactly: if you're resuming, we assume that you want to continue a previous experiment, picking up from where you left off.

How do i continuously train something until I think it's good across multiple runs, do I just do what I mentioned above?

Right now you just have to set a higher algo.total_steps and train for longer.

Also, if my config screen_size doesn't match my environment am I correct in seeing that it transforms the observation to the config screen_size? And the config screen_size only supports a square, you can't use a rectangle? And it has to be a power of 2? so basically 64x64, 128x128, 256x256, etc ?

DreamerV3 only works with powers of 2 because of how the CNN encoder is defined, and yes, your observation is resized to match the env.screen_size variable.

@Disastorm
Author

Cool thanks for the answers.

@belerico
Member

@balloch @Disastorm, is it ok for you to close the issue?

@balloch

balloch commented Jan 14, 2024

Yes, I think so, very helpful! By slow I mean the default model took ~24 hours to do 1.2 million steps, which isn't bad (it's about the speed of DreamerV2); I think it just seemed weird because of the long gap of time to get past 64k.

@balloch

balloch commented Jan 14, 2024

@balloch @Disastorm, is it ok for you to close the issue?

Ok with me!

@Disastorm
Author

Disastorm commented Jan 15, 2024

Yeah, closing is fine.

@belerico
Hey, just wondering how this buffer checkpointing works. I have

buffer:
  size: 1000000
  checkpoint: True

And so when resuming it doesn't do the pretraining buffer steps anymore; however, I noticed the buffer files don't ever get updated, the last-modified date is just when the first training started. Is this a problem? The files I'm referring to are the .memmap files; I see it doesn't keep creating them for each run when checkpoint = True, so I assumed it would be using the ones from the previous run, but their modification date isn't changing at all. Is the buffer inside the checkpoint file itself? The file size of the checkpoint still looks pretty similar to when running with checkpoint: False, I think.
