dreamer v3 resuming problem #273
Hi @Disastorm, when you resume you should set the …
Is it memory mapped? Is the buffer checkpoint set to False?
Sorry, I haven't made myself clear in the previous post. The behaviour that I mentioned happens if you checkpoint the buffer and the buffer is memory mapped. If you checkpoint your buffer without memory mapping, nothing happens: you can safely resume your training because the buffer is saved within the checkpoint. If you don't save your buffer in the checkpoint, then the buffer is pre-filled with the policy (agent) from the checkpoint, so as to recreate some sort of history of the buffer from before the experiment was stopped, and that's what's happening: your agent will pre-fill the buffer for at most `learning_starts` steps. Right now there's no way to pre-fill the buffer after resuming from checkpoint.
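To make the three scenarios above concrete, here is a minimal sketch of the resume-time decision, assuming Hydra-style `buffer.checkpoint` / `buffer.memmap` flags; the helper functions are hypothetical and this is not SheepRL's actual code:

```python
# Hypothetical sketch of the three resume scenarios described above,
# not the actual SheepRL implementation.
def restore_buffer(cfg, checkpoint, agent, envs):
    if cfg.buffer.checkpoint and not cfg.buffer.memmap:
        # The buffer content was serialized inside the checkpoint file: reload it.
        return checkpoint["replay_buffer"]
    if cfg.buffer.checkpoint and cfg.buffer.memmap:
        # Only the path to the memory-mapped files lives in the checkpoint:
        # reattach the existing files on disk.
        return attach_memmap_buffer(checkpoint["buffer_path"])
    # Buffer not saved at all: rebuild some history by letting the restored
    # agent act for at most `learning_starts` environment steps.
    buffer = make_empty_buffer(cfg)
    prefill(buffer, agent, envs, max_steps=cfg.algo.learning_starts)
    return buffer
```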
So there are two methods of checkpointing the buffer? The default seems to just have checkpointing disabled. But you're saying you can enable either a memory-mapped version or a version that is stored inside the checkpoint?
This is already done. Suppose that you start an experiment with …
Yes, with …
Hi there,
So far, the buffer is pre-filled only when you do not save the buffer in the checkpoint (with and without memory mapping). @belerico we should add a config for choosing whether or not to prefill the buffer when resuming from the checkpoint.
I set learning_starts to 65k but it's still not prefilling, or at least it doesn't seem like it is, because it's slow from the beginning (it used to be that when it did the prefilling it was super fast).
OK, I think it worked now when I set memmap to false.
Actually it's weird: it started out good and then got worse? I originally stopped the training at around 2k rewards, so the early rewards here are actually correct.
Hi @Disastorm, can you specify exactly what you have done? Maybe share your config for the first training and the one for the resuming.
I used the default Dreamer V3 large config and changed replay_ratio to 0.2 for the initial training. That resulted in my initial post: basically it lost a whole bunch of training, and my current model will drop from 2k rewards down to 700 or something, so it loses multiple hours of training. Then I set learning_starts to 66k and it still didn't help. Your previous version of Dreamer V3 had no problem resuming at all; it worked perfectly: it would do the prefill with 65k steps and resume exactly where it left off. I have no idea what your new Dreamer V3 is doing, but I have not yet been able to get it to resume properly. If I can't get it to resume, I might just revert back to your old Dreamer and use that.
I think I remember you had some issue with Windows before that you fixed (possibly related to memmap or resuming); is it possible your new Dreamer V3 has another issue with Windows?
I've prepared a branch here where you can decide how many prefill steps you want to perform after resuming from the checkpoint. You can specify `algo.learning_starts`. Please remember that: …
One thing that we could add is the possibility to save the buffer in the checkpoint by loading chunks into memory and saving them in the checkpoint file: this would be super slow and would definitely hurt disk usage, in particular if you're working with images and a large buffer.
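A rough sketch of that idea, with hypothetical names and assuming a NumPy memmap-backed buffer, could look like the following; it keeps only one chunk in RAM at a time while writing, but as noted it is slow and duplicates the buffer on disk:

```python
import numpy as np

# Hypothetical sketch of the chunked dump described above, not an actual SheepRL API.
def dump_memmap_buffer(memmap_array: np.memmap, path: str, chunk_size: int = 10_000) -> None:
    """Stream a memory-mapped buffer array into a dump file saved next to the checkpoint."""
    with open(path, "wb") as f:
        for start in range(0, memmap_array.shape[0], chunk_size):
            end = min(start + chunk_size, memmap_array.shape[0])
            chunk = np.asarray(memmap_array[start:end])  # load only this slice into RAM
            chunk.tofile(f)                              # append the raw bytes to the dump
```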
Have the definitions of buffer.checkpoint and buffer.memmap changed? buffer.checkpoint basically meant each run was going to use the same memmap files as one of the previous runs (the actual same files in the same folder from the older run), so that it didn't create new memmap files for each run; the only thing stored in the checkpoint was the path to the files. But it sounds like you're saying buffer.checkpoint now stores the buffer in the checkpoint file, and buffer.memmap does what buffer.checkpoint used to do, or something like that? I can go ahead and try your branch too, but I'm just wondering: why is learning_starts divided by the number of envs?
Nothing changed from there that I know of
This is just what's happening now
This is not what I'm saying. What I wrote to you are the different scenarios that you could encounter.
Because we need to convert those steps into policy steps.
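As a rough illustration of that conversion (an assumption about the bookkeeping, not code taken from the repository): one policy step queries the policy once and advances all the parallel envs together, so a budget expressed in environment steps gets divided by the number of envs:

```python
# Assumed bookkeeping, for illustration only: one policy step advances every
# parallel env once, so an env-step budget maps to env_steps // num_envs policy steps.
def to_policy_steps(env_steps: int, num_envs: int) -> int:
    return env_steps // num_envs

print(to_policy_steps(65_536, num_envs=4))  # 16384 policy steps before learning starts
```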
I see, you're right, your descriptions are actually the same, thanks. The steps that print out while training: are those env steps or policy steps?
Those are policy steps.
It looks like in your branch, learning_starts is already in policy steps. I just tested with 100k and it started trying to learn after 100k policy steps. However, it got an error (the memory-allocation error discussed below).
It only happens if you're using 1 parallel env and 1 process (1 GPU for example), as you can see here.
If you don't memmap your buffer and you don't have 80GB of RAM on your PC, how is it possible to allocate that amount of RAM to hold all the images? We pre-allocate everything in the buffer because sooner or later you need to have that amount of data residing in RAM. Right now I'm a little bit lost on what your issue is here...
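As a back-of-the-envelope check of why a non-memmapped image buffer needs that much RAM (the sizes below are assumptions for illustration, not the exact settings of the experiment discussed here):

```python
# Rough RAM estimate for a fully pre-allocated image replay buffer.
# Assumed sizes: 64x64 RGB uint8 observations, 1M steps per env, 4 parallel envs.
buffer_size = 1_000_000          # steps stored per env slot
num_envs = 4
obs_bytes = 64 * 64 * 3          # one uint8 RGB frame

total_bytes = buffer_size * num_envs * obs_bytes
print(f"{total_bytes / 1024**3:.1f} GiB")  # ~45.8 GiB for the observations alone
```

Larger frames or additional stored keys quickly push this past the 80 GB mentioned above.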
*edit: just reverified the below. Sorry, I didn't mention that I set memmap back to true.
Not commenting on the code, but in terms of testing that wasn't the case. I have 4 environments with 100k learning_starts. It started from step 525k and appeared to error out with the above error at around 625k. You can also see that reward_env0 -> reward_env3 are there.
Yeah, you're right about the learning starts: I got confused! Those are transformed to policy steps as in the link that I shared. Acknowledging that, have you solved your issue with resuming? What is your issue now? If this regards some memory issue, I suggest you open another issue.
Yes, I get that memory issue you saw before; however, it only happened on the last attempt, which used your branch. I have never seen this error before on the main branch, although the main branch's learning_starts doesn't seem to work right either, so I don't know if the error is related to your branch specifically or just to the learning_starts functionality.
I've spotted the memory error. The problem is related to the replay-ratio: since the replay-ratio is the number of gradient steps per policy step (i.e. replay_ratio=0.5 means 1 gradient step every 2 policy steps), when we resume from the checkpoint the replay-ratio loads its state, and if you set learning_starts to something > 0 then the first time learning starts the Ratio class wants to keep maintaining the ratio. That's why you see it's slower the first time it trains after resuming, and that's also the cause of the memory error, since we sample all the needed trajectories at once and loop through them. The memory error should now be fixed in the branch.
Looks like it got past the memory error: I see my GPU VRAM usage went up and my GPU is processing stuff, although I haven't seen a policy_step reward log after the prefill yet, even though it's been almost 30 minutes, which is very strange. Do you have any ideas about this? I'm pretty sure the policy step logs should be showing up within 5 or at most 10 minutes normally with the ratio I'm using. Is it possible the logging broke, or there is some kind of infinite loop, or it has perhaps reverted back to a 1.0 ratio or something?
Still no logs at all, and no further checkpoints have been created either, so I think something's wrong with the training portion even though it passed the memory error, although I'll keep it on for a total of an hour before I cancel it.
As I told you in the previous answer, the slowdown you see is due to the replay ratio. Suppose, for example, that you stopped a training run after 4000 policy steps with replay_ratio=0.5 and you resume it with 2048 pre-fill steps.
When you resume your training, the Ratio class knows that 4000 steps have already been done so far. Now you want to do 2048 pre-fill steps, and to maintain the replay-ratio at 0.5 the Ratio class will return a number of training steps equal to (6048 - 4000) * 0.5 = 2048 * 0.5 = 1024. This means that the first time you resume, to maintain the replay-ratio, you will do 1024 training steps at once. That's why you see a slowdown.
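A minimal sketch of a ratio tracker that behaves this way, using the numbers above (this illustrates the mechanism, it is not the actual Ratio class from the repository):

```python
# Illustration of the behaviour described above, not SheepRL's actual Ratio class.
class RatioSketch:
    def __init__(self, replay_ratio: float, trained_steps: int = 0):
        self.replay_ratio = replay_ratio    # gradient steps per policy step
        self.trained_steps = trained_steps  # restored from the checkpoint on resume

    def __call__(self, total_played_steps: int) -> int:
        # Gradient steps still owed to keep trained/played == replay_ratio.
        owed = max(int(total_played_steps * self.replay_ratio) - self.trained_steps, 0)
        self.trained_steps += owed
        return owed

# 4000 policy steps (2000 gradient steps) done before the checkpoint,
# then 2048 pre-fill steps after resuming:
ratio = RatioSketch(replay_ratio=0.5, trained_steps=2000)
print(ratio(4000 + 2048))  # -> 1024 gradient steps requested in one go
```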
Sorry, I don't really understand the details. Is there a way I can get it to train at the normal speed instead of like 10 or 20 times slower, which I guess it might be doing?
From this branch, can you try to set …
You mean before resuming, right? Or do you want me to resume with a learning_starts value, cancel, and then resume again with it set to 0, or something like that? If I set it to 0, is it going to prefill at all, though?
This is an experiment I've done with the new branch. I've run a training with the following config:

```bash
python sheeprl.py exp=dreamer_v3 \
env=gym env.id=CartPole-v1 \
env.num_envs=4 \
fabric.accelerator=gpu \
fabric.precision=16-mixed \
algo=dreamer_v3_S \
algo.learning_starts=1024 \
algo.cnn_keys.encoder=\[\] \
algo.mlp_keys.encoder=\["vector"\] \
algo.cnn_keys.decoder=\[\] \
algo.mlp_keys.decoder=\["vector"\] \
algo.per_rank_sequence_length=64 \
algo.replay_ratio=0.5 \
algo.world_model.decoupled_rssm=False \
algo.world_model.learnable_initial_recurrent_state=False
```

Then I've stopped the training and resumed it with …
Then I've stopped the training again and resumed it again with …
As you can see, the training resumed perfectly!
Yes, I want you to start a training with a …
I'll try it, but I'm just wondering how the prefill works: does it automatically detect some amount to prefill even if you have it set to 0? You are using checkpoint false, right?
@Disastorm, please join this Google Meet.
@belerico I'll try out the stuff you mentioned in the meeting tomorrow; sorry, I don't have the time right now.
@belerico So resuming with checkpoint: true and memmap: true does work as you've said, although when checkpoint is disabled and pre-filling is attempted, my attempts have always seemed abnormally slow, so I've stopped trying that alternative. I'm just going to stick with checkpoint: true and memmap: true.
@michele-milesi we should decide what to do when we don't checkpoint the buffer and we need to prefill the buffer. The simplest thing is to disable the replay-ratio. Another solution is to dilute the pre-fill steps over the course of the agent training. What do you think?
I don't really know about that, so I can't really help. I guess forcing the replay-ratio to 1 could be an OK solution, but it could be annoying if someone wants a different ratio, and they'll need to reconfigure how often they save a checkpoint and whatnot, since the steps are going to be slower.
@belerico, what if we pre-filled the dataset with the number of policy steps played at the time of the checkpoint and continued training as if nothing had happened? (During the pre-fill phase, we would not increase the policy steps.) For example, if the experiment was interrupted at policy step N, after resuming we would play N pre-fill steps without counting them as policy steps. This way we would (more or less) have the same situation we had at the moment of the checkpoint, and the ratio would not be affected by the pre-fill.
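A sketch of how that proposal could look in the collection loop, with hypothetical names, just to illustrate replaying steps without touching the policy-step counter or the ratio state:

```python
# Hypothetical sketch of the proposal above: refill the buffer with as many steps
# as had been played before the checkpoint, without counting them as policy steps,
# so the replay-ratio bookkeeping is left untouched.
def prefill_after_resume(buffer, agent, envs, steps_played_at_checkpoint: int) -> None:
    steps = 0
    obs = envs.reset()
    while steps < steps_played_at_checkpoint:
        actions = agent.act(obs)                  # restored policy from the checkpoint
        next_obs, rewards, dones, infos = envs.step(actions)
        buffer.add(obs, actions, rewards, dones)  # rebuild the history only
        obs = next_obs
        steps += envs.num_envs
    # Note: the global policy-step counter and the Ratio state are NOT updated here.
```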
I noticed something when resuming. When I had stopped my training initially, the envs were getting like 1k-2k rewards, and now after resuming they are only getting around 700. Did they lose some training or something?