
Error in running pretrain because of torch.distributed #26

Open
tinaboya2023 opened this issue Apr 18, 2023 · 0 comments

Hi,
I set up my environment as follows:
Python 3.8
PyTorch with CUDA 11.7, installed via: conda install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia
GPU: 1x GeForce RTX 3090 (24 GB of GPU memory)

I'm trying to run pretraining with the following command:
python -m torch.distributed.launch --nproc_per_node 1 tools/run.py --pretrain --tasks vqa --datasets m4c_textvqa --model m4c_split --seed 13 --config configs/vqa/m4c_textvqa/tap_base_pretrain.yml --save_dir save/m4c_split_pretrain_test training_parameters.distributed True

but I encounter an error related to torch.distributed.

Could you help me resolve this problem?
Is this error caused by using only 1 GPU?
Do I need to change the initial value of some parameter (like local_rank)?
Could the error be due to a lack of GPU memory?
It is very important to me to solve this problem.
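
For context on the local_rank question: when a script is launched with python -m torch.distributed.launch, each worker process receives a --local_rank argument (and the launcher also sets environment variables such as MASTER_ADDR, MASTER_PORT, RANK, and WORLD_SIZE), and the script itself is expected to initialize the process group. The sketch below is not the TAP tools/run.py code; it is a minimal, hypothetical illustration of what that initialization typically looks like for a single-GPU run with --nproc_per_node 1.

# Minimal illustration (not the TAP tools/run.py code) of how a script started
# with torch.distributed.launch typically handles single-GPU distributed setup.
import argparse

import torch
import torch.distributed as dist

parser = argparse.ArgumentParser()
# torch.distributed.launch passes --local_rank to each worker process.
parser.add_argument("--local_rank", type=int, default=0)
args = parser.parse_args()

# With --nproc_per_node 1 there is exactly one process: rank 0, world size 1.
torch.cuda.set_device(args.local_rank)
dist.init_process_group(
    backend="nccl",        # NCCL is the usual backend for CUDA GPUs
    init_method="env://",  # launcher provides MASTER_ADDR/MASTER_PORT, RANK, WORLD_SIZE
)

print(f"rank {dist.get_rank()} / world size {dist.get_world_size()} "
      f"on device cuda:{args.local_rank}")

If the launcher's environment variables are missing, for example when the script is run directly without python -m torch.distributed.launch, init_process_group with init_method="env://" fails; that is a common source of torch.distributed errors on a single machine, independent of GPU memory.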
