While testing examples/nlp_example.py, I added a small print log at line 175 (https://github.com/huggingface/accelerate/blob/main/examples/nlp_example.py#L145).
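The debug log was roughly of the following form (a sketch, not the exact line I used; it just shows which token ids each rank sees at each step):

```python
# Sketch of the debug print inside the training loop of examples/nlp_example.py,
# where `accelerator`, `model` and `train_dataloader` are the objects already
# defined in training_function():
for step, batch in enumerate(train_dataloader):
    batch.to(accelerator.device)
    # Log which samples this rank sees at this step; the first few token ids
    # are enough to compare the two GPUs.
    print(
        f"rank={accelerator.process_index} step={step} "
        f"input_ids[0][:8]={batch['input_ids'][0][:8].tolist()}"
    )
    outputs = model(**batch)
```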
You can see that different GPUs run different data in the same step, which is as expected.

the default_config.yaml:

```yaml
compute_environment: LOCAL_MACHINE
deepspeed_config: {}
distributed_type: MULTI_GPU
downcast_bf16: 'no'
dynamo_backend: 'NO'
fsdp_config: {}
gpu_ids: all
machine_rank: 0
main_training_function: main
megatron_lm_config: {}
mixed_precision: bf16
num_machines: 1
num_processes: 2
rdzv_backend: static
same_network: true
use_cpu: false
```
But when using DeepSpeed, different GPUs run the same data in the same step. How should I understand this? Are different cards really running the same data?
![Screenshot 2023-03-02 at 6.12.50 PM](https://user-images.githubusercontent.com/23132307/222399118-12c7f5b8-6019-4f19-b7e9-8b1fc0bdfebc.png)
the default_config.yaml:

```yaml
compute_environment: LOCAL_MACHINE
deepspeed_config:
  deepspeed_config_file: ./config_blocklm.json
  zero3_init_flag: false
distributed_type: DEEPSPEED
downcast_bf16: 'no'
dynamo_backend: 'NO'
fsdp_config: {}
machine_rank: 0
main_training_function: main
megatron_lm_config: {}
num_machines: 1
num_processes: 2
rdzv_backend: static
same_network: true
use_cpu: false
```
Hello @Mryangkaitong, can you check if PR #1126 fixes the above issue? Currently, if `train_micro_batch_size_per_gpu` isn't `auto`, dataloaders aren't prepared. The above PR should resolve it.
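As a quick sanity check (a minimal sketch, assuming `./config_blocklm.json` sits in the working directory), you can inspect what the DeepSpeed config file actually sets:

```python
import json

# Diagnostic sketch: per the comment above, the dataloaders are only prepared
# (i.e. sharded across ranks) when train_micro_batch_size_per_gpu is left as
# "auto" in the DeepSpeed config file.
with open("./config_blocklm.json") as f:
    ds_config = json.load(f)

value = ds_config.get("train_micro_batch_size_per_gpu")
print("train_micro_batch_size_per_gpu =", value)
if value != "auto":
    print(
        "Not 'auto': on the current release the dataloader is not prepared, "
        "so every rank iterates over the same batches (see PR #1126)."
    )
```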