
How to Use Multiple GPUs. #67

Closed
inouetaka opened this issue Sep 10, 2019 · 2 comments
Comments

@inouetaka

Hi Author,
Great work, and thanks for sharing.

I tried to use more than one GPU, but it failed.

Command
CUDA_VISIBLE_DEVICES=0,1 python3 train.py \
  --train_data data_lmdb_release/training \
  --valid_data data_lmdb_release/validation \
  --select_data MJ --batch_ratio 1.0 \
  --Transformation None --FeatureExtraction VGG \
  --SequenceModeling BiLSTM --Prediction CTC \
  --data_filtering_off --workers 0

Options
experiment_name: None-VGG-BiLSTM-CTC-Seed1111
train_data: data_lmdb_release/training
valid_data: data_lmdb_release/validation
manualSeed: 1111
workers: 0
batch_size: 192
num_iter: 300000
valInterval: 2000
continue_model:
adam: False
lr: 1
beta1: 0.9
rho: 0.95
eps: 1e-08
grad_clip: 5
select_data: ['MJ']
batch_ratio: ['1.0']
total_data_usage_ratio: 1.0
batch_max_length: 25
imgH: 32
imgW: 100
rgb: False
sensitive: False
PAD: False
data_filtering_off: True
Transformation: None
FeatureExtraction: VGG
SequenceModeling: BiLSTM
Prediction: CTC
num_fiducial: 20
input_channel: 1
output_channel: 512
hidden_size: 256
num_gpu: 2
num_class: 2974

Error
[0/300000] Loss: nan elapsed_time: 14.51815
Traceback (most recent call last):
  File "train.py", line 282, in <module>
    train(opt)
  File "train.py", line 164, in train
    model, criterion, valid_loader, converter, opt)
  File "/home/ubuntu/deep-text-recognition/test.py", line 90, in validation
    preds = model(image, text_for_pred).log_softmax(2)
  File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 152, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 162, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 83, in parallel_apply
    raise output
  File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 59, in _worker
    output = module(*input, **kwargs)
  File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/deep-text-recognition/model.py", line 82, in forward
    contextual_feature = self.SequenceModeling(visual_feature)
  File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/nn/modules/container.py", line 92, in forward
    input = module(input)
  File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/deep-text-recognition/modules/sequence_modeling.py", line 25, in forward
    self.rnn.flatten_parameters()
  File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/nn/modules/rnn.py", line 123, in flatten_parameters
    self.batch_first, bool(self.bidirectional))
RuntimeError: set_storage is not allowed on Tensor created from .data or .detach()

@inouetaka
Author

Hi,
I was able to solve the problem.
Conclusion: upgrading torch to version 1.2.0 fixed it.

pytorch/pytorch#21108
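Since the fix hinges entirely on the PyTorch version, a script could refuse to start multi-GPU training on a version known to hit this bug. A minimal sketch (the `parse_version`, `multi_gpu_supported`, and `MIN_TORCH` names are hypothetical helpers, not part of this repository; pass in `torch.__version__` in real use):

```python
# Hypothetical guard: nn.DataParallel + flatten_parameters() on PyTorch 1.1.x
# raises "set_storage is not allowed on Tensor created from .data or .detach()",
# so require at least 1.2.0 before enabling multi-GPU training.

def parse_version(version: str) -> tuple:
    """Parse a dotted version string like '1.2.0' or '1.2.0+cu92'
    into a comparable tuple of ints, ignoring any local suffix."""
    parts = []
    for piece in version.split("+")[0].split("."):
        digits = "".join(ch for ch in piece if ch.isdigit())
        parts.append(int(digits) if digits else 0)
    return tuple(parts)

MIN_TORCH = (1, 2, 0)  # first version where this DataParallel bug is fixed

def multi_gpu_supported(torch_version: str) -> bool:
    """Return True if this torch version is safe for DataParallel here."""
    return parse_version(torch_version) >= MIN_TORCH

# The version that failed in this issue vs. the version that fixed it:
assert not multi_gpu_supported("1.1.0")
assert multi_gpu_supported("1.2.0")
```

In a training script, this check would run before wrapping the model in `torch.nn.DataParallel`, turning the cryptic runtime error into an actionable message about upgrading.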

@ku21fan
Contributor

ku21fan commented Sep 11, 2019

Good :D
It was a duplicate of #55, though.

@ku21fan ku21fan closed this as completed Sep 11, 2019