
Error in running pretrain because of torch.distributed #26

Open
tinaboya2023 opened this issue Apr 18, 2023 · 0 comments

Hi,
I set up my environment as follows:
Python 3.8
PyTorch with CUDA 11.7, installed via: conda install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia
GPU: 1x GeForce RTX 3090 (24 GB of GPU memory)

I'm trying to run pretraining with the following command:
python -m torch.distributed.launch --nproc_per_node 1 tools/run.py --pretrain --tasks vqa --datasets m4c_textvqa --model m4c_split --seed 13 --config configs/vqa/m4c_textvqa/tap_base_pretrain.yml --save_dir save/m4c_split_pretrain_test training_parameters.distributed True

but I encounter an error related to torch.distributed.

Could you help me resolve this problem?
Is this error caused by using only 1 GPU?
Do I need to change the initial value of some parameter (like local_rank)?
Could the error be due to a lack of GPU memory?
It is very important to me to solve this problem.
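
For context on the local_rank question: when a script is launched with python -m torch.distributed.launch, each worker process receives a --local_rank argument (and the launcher also sets environment variables such as MASTER_ADDR, MASTER_PORT, RANK, and WORLD_SIZE), and the script itself is expected to initialize the process group. The sketch below is not the TAP tools/run.py code; it is a minimal, hypothetical illustration of what that initialization typically looks like for a single-GPU run with --nproc_per_node 1.

# Minimal illustration (not the TAP tools/run.py code) of how a script started
# with torch.distributed.launch typically handles single-GPU distributed setup.
import argparse

import torch
import torch.distributed as dist

parser = argparse.ArgumentParser()
# torch.distributed.launch passes --local_rank to each worker process.
parser.add_argument("--local_rank", type=int, default=0)
args = parser.parse_args()

# With --nproc_per_node 1 there is exactly one process: rank 0, world size 1.
torch.cuda.set_device(args.local_rank)
dist.init_process_group(
    backend="nccl",        # NCCL is the usual backend for CUDA GPUs
    init_method="env://",  # launcher provides MASTER_ADDR/MASTER_PORT, RANK, WORLD_SIZE
)

print(f"rank {dist.get_rank()} / world size {dist.get_world_size()} "
      f"on device cuda:{args.local_rank}")

If the launcher's environment variables are missing, for example when the script is run directly without python -m torch.distributed.launch, init_process_group with init_method="env://" fails; that is a common source of torch.distributed errors on a single machine, independent of GPU memory.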
