CUDA_ERROR_INVALID_PTX with old GPUs #1

Closed
ibeltagy opened this issue Apr 7, 2020 · 3 comments

ibeltagy commented Apr 7, 2020

GPUs with an architecture older than sm_52 are not supported for now (the prebuilt TVM binaries are compiled for arch='sm_52').
https://arnon.dk/matching-sm-architectures-arch-and-gencode-for-various-nvidia-cards/
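
For reference, a quick way to check whether a card is affected (a minimal sketch assuming PyTorch is installed; the sm_52 cutoff is the one mentioned in the reply further down):

```python
import torch

# Print the compute capability of each visible GPU, e.g. a Tesla K80 reports
# (3, 7) while a Tesla V100 reports (7, 0). Anything below sm_52 hits
# CUDA_ERROR_INVALID_PTX with the prebuilt binaries.
for i in range(torch.cuda.device_count()):
    major, minor = torch.cuda.get_device_capability(i)
    print(f"{torch.cuda.get_device_name(i)}: sm_{major}{minor}")
```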

jseale commented Apr 13, 2020

When I run the hello world code, I get the following error:

Loading tvm binary from: ./longformer/lib/lib_diagonaled_mm_float32_cuda.so
---------------------------------------------------------------------------
TVMError                                  Traceback (most recent call last)
<ipython-input-1-2b363e69144a> in <module>
     23                                      # QA: question tokens
     24 
---> 25 output = model(input_ids, attention_mask=attention_mask)[0]

~/anaconda3/envs/longformer/lib/python3.7/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    545             result = self._slow_forward(*input, **kwargs)
    546         else:
--> 547             result = self.forward(*input, **kwargs)
    548         for hook in self._forward_hooks.values():
    549             hook_result = hook(self, input, result)

~/anaconda3/envs/longformer/lib/python3.7/site-packages/transformers/modeling_roberta.py in forward(self, input_ids, attention_mask, token_type_ids, position_ids, head_mask)
    175                                                  token_type_ids=token_type_ids,
    176                                                  position_ids=position_ids,
--> 177                                                  head_mask=head_mask)
    178 
    179 

~/anaconda3/envs/longformer/lib/python3.7/site-packages/transformers/modeling_bert.py in forward(self, input_ids, attention_mask, token_type_ids, position_ids, head_mask)
    623         encoder_outputs = self.encoder(embedding_output,
    624                                        extended_attention_mask,
--> 625                                        head_mask=head_mask)
    626         sequence_output = encoder_outputs[0]
    627         pooled_output = self.pooler(sequence_output)

~/anaconda3/envs/longformer/lib/python3.7/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    545             result = self._slow_forward(*input, **kwargs)
    546         else:
--> 547             result = self.forward(*input, **kwargs)
    548         for hook in self._forward_hooks.values():
    549             hook_result = hook(self, input, result)

~/anaconda3/envs/longformer/lib/python3.7/site-packages/transformers/modeling_bert.py in forward(self, hidden_states, attention_mask, head_mask)
    344                 all_hidden_states = all_hidden_states + (hidden_states,)
    345 
--> 346             layer_outputs = layer_module(hidden_states, attention_mask, head_mask[i])
    347             hidden_states = layer_outputs[0]
    348 

~/anaconda3/envs/longformer/lib/python3.7/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    545             result = self._slow_forward(*input, **kwargs)
    546         else:
--> 547             result = self.forward(*input, **kwargs)
    548         for hook in self._forward_hooks.values():
    549             hook_result = hook(self, input, result)

~/anaconda3/envs/longformer/lib/python3.7/site-packages/transformers/modeling_bert.py in forward(self, hidden_states, attention_mask, head_mask)
    322 
    323     def forward(self, hidden_states, attention_mask=None, head_mask=None):
--> 324         attention_outputs = self.attention(hidden_states, attention_mask, head_mask)
    325         attention_output = attention_outputs[0]
    326         intermediate_output = self.intermediate(attention_output)

~/anaconda3/envs/longformer/lib/python3.7/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    545             result = self._slow_forward(*input, **kwargs)
    546         else:
--> 547             result = self.forward(*input, **kwargs)
    548         for hook in self._forward_hooks.values():
    549             hook_result = hook(self, input, result)

~/anaconda3/envs/longformer/lib/python3.7/site-packages/transformers/modeling_bert.py in forward(self, input_tensor, attention_mask, head_mask)
    279 
    280     def forward(self, input_tensor, attention_mask=None, head_mask=None):
--> 281         self_outputs = self.self(input_tensor, attention_mask, head_mask)
    282         attention_output = self.output(self_outputs[0], input_tensor)
    283         outputs = (attention_output,) + self_outputs[1:]  # add attentions if we output them

~/anaconda3/envs/longformer/lib/python3.7/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    545             result = self._slow_forward(*input, **kwargs)
    546         else:
--> 547             result = self.forward(*input, **kwargs)
    548         for hook in self._forward_hooks.values():
    549             hook_result = hook(self, input, result)

~/longformer/longformer/longformer.py in forward(self, hidden_states, attention_mask, head_mask)
    105         k = k.view(seq_len, bsz, self.num_heads, self.head_dim).transpose(0, 1).contiguous().float()
    106         # attn_weights = (bsz, seq_len, num_heads, window*2+1)
--> 107         attn_weights = diagonaled_mm_tvm(q, k, self.attention_window, self.attention_dilation, False, 0, False)
    108         mask_invalid_locations(attn_weights, self.attention_window, self.attention_dilation, False)
    109         if remove_from_windowed_attention_mask is not None:

~/longformer/longformer/diagonaled_mm_tvm.py in forward(ctx, t1, t2, w, d, is_t1_diagonaled, padding, autoregressive)
    259         t2 = DiagonaledMM._prepare_tensors(t2)
    260         # output = t1.mm(t2)  # what would have been called if this was a regular matmul
--> 261         output = DiagonaledMM._diagonaled_mm(t1, t2, w, d, is_t1_diagonaled=is_t1_diagonaled, padding=padding, autoregressive=autoregressive)
    262         return output
    263 

~/longformer/longformer/diagonaled_mm_tvm.py in _diagonaled_mm(t1, t2, w, d, is_t1_diagonaled, transpose_t1, padding, autoregressive)
    200             print('Error: the hidden dimension {m} shouldn\'t match number of diagonals {c}')
    201             assert False
--> 202         _diagonaled_mm_function(t1, t2, r, d, w, w_upper, padding, transpose_t1, m if is_t1_diagonaled else c)
    203         return r
    204 

~/longformer/tvm/contrib/dlpack.py in _wrapper(*args)
     38         args = tuple(ndarray.from_dlpack(to_dlpack_func(arg))\
     39             if isinstance(arg, tensor_type) else arg for arg in args)
---> 40         return tvm_func(*args)
     41 
     42     return _wrapper

~/longformer/tvm/_ffi/function.py in __call__(self, *args)
    143             return self._entry(*args)
    144         f = self.entry_func
--> 145         return f(*args)
    146 
    147 

~/longformer/tvm/_ffi/_ctypes/function.py in __call__(self, *args)
    208                 self.handle, values, tcodes, ctypes.c_int(num_args),
    209                 ctypes.byref(ret_val), ctypes.byref(ret_tcode)) != 0:
--> 210             raise get_last_ffi_error()
    211         _ = temp_args
    212         _ = args

TVMError: Traceback (most recent call last):
  [bt] (3) /home/ubuntu/longformer/tvm/libtvm_runtime.so(TVMFuncCall+0x61) [0x7fd4f55e6681]
  [bt] (2) /home/ubuntu/longformer/tvm/libtvm_runtime.so(std::_Function_handler<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*), tvm::runtime::detail::PackFuncVoidAddr_<0, tvm::runtime::CUDAWrappedFunc>(tvm::runtime::CUDAWrappedFunc, std::vector<tvm::runtime::detail::ArgConvertCode, std::allocator<tvm::runtime::detail::ArgConvertCode> > const&)::{lambda(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)#1}>::_M_invoke(std::_Any_data const&, tvm::runtime::TVMArgs&&, tvm::runtime::TVMRetValue*&&)+0x14a) [0x7fd4f567f4da]
  [bt] (1) /home/ubuntu/longformer/tvm/libtvm_runtime.so(tvm::runtime::CUDAWrappedFunc::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*, void**) const+0x9e3) [0x7fd4f567eff3]
  [bt] (0) /home/ubuntu/longformer/tvm/libtvm_runtime.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x32) [0x7fd4f55e0492]
  File "/code/tvm/src/runtime/cuda/cuda_module.cc", line 111
  File "/code/tvm/src/runtime/module_util.cc", line 73
CUDAError: Check failed: ret == 0 (-1 vs. 0) : cuModuleLoadData(&(module_[device_id]), data_.c_str()) failed with error: CUDA_ERROR_INVALID_PTX

I'm using Ubuntu and followed the venv instructions. Here's the output from conda list:
markdown 3.2.1 pypi_0 pypi
mkl 2020.0 166
mkl-service 2.3.0 py37he904b0f_0
mkl_fft 1.0.15 py37ha843d7b_0
mkl_random 1.1.0 py37hd6b4f25_0
ncurses 6.2 he6710b0_0
ninja 1.9.0 py37hfd86e86_0
numpy 1.18.2 pypi_0 pypi
numpy-base 1.18.1 py37hde5b4d6_1
oauthlib 3.1.0 pypi_0 pypi
openssl 1.0.2u h7b6447c_0
pandas 1.0.3 pypi_0 pypi
parso 0.6.2 py_0 anaconda
pexpect 4.8.0 py37_0 anaconda
pickleshare 0.7.5 py37_0 anaconda
pillow 7.1.1 pypi_0 pypi
pip 20.0.2 py37_1
prompt-toolkit 3.0.4 py_0 anaconda
prompt_toolkit 3.0.4 0 anaconda
protobuf 3.11.3 pypi_0 pypi
ptyprocess 0.6.0 py37_0 anaconda
pyasn1 0.4.8 pypi_0 pypi
pyasn1-modules 0.2.8 pypi_0 pypi
pycparser 2.20 py_0
pygments 2.6.1 py_0 anaconda
python 3.7.0 h6e4f718_3
python-dateutil 2.8.1 py_0 anaconda
pytorch 1.2.0 cuda100py37h938c94c_0
pytorch-lightning 0.6.0 pypi_0 pypi
pytz 2019.3 pypi_0 pypi
pyzmq 18.1.1 py37he6710b0_0 anaconda
readline 7.0 h7b6447c_5
regex 2020.4.4 pypi_0 pypi
requests 2.23.0 pypi_0 pypi
requests-oauthlib 1.3.0 pypi_0 pypi
rsa 4.0 pypi_0 pypi
s3transfer 0.3.3 pypi_0 pypi
sacremoses 0.0.40 pypi_0 pypi
scikit-learn 0.22.2.post1 pypi_0 pypi
scipy 1.4.1 pypi_0 pypi
sentencepiece 0.1.85 pypi_0 pypi
setuptools 46.1.3 py37_0
six 1.14.0 py37_0 anaconda
sqlite 3.31.1 h7b6447c_0
tensorboard 2.2.0 pypi_0 pypi
tensorboard-plugin-wit 1.6.0.post3 pypi_0 pypi
tensorboardx 2.0 pypi_0 pypi
test-tube 0.7.5 pypi_0 pypi
tk 8.6.8 hbc83047_0
torchvision 0.4.2 pypi_0 pypi
tornado 6.0.4 py37h7b6447c_1 anaconda
tqdm 4.45.0 pypi_0 pypi
traitlets 4.3.3 py37_0 anaconda
transformers 2.0.0 pypi_0 pypi
urllib3 1.25.8 pypi_0 pypi
wcwidth 0.1.9 py_0 anaconda
werkzeug 1.0.1 pypi_0 pypi
wheel 0.34.2 py37_0
xz 5.2.4 h14c3975_4
zeromq 4.3.1 he6710b0_3 anaconda
zlib 1.2.11 h7b6447c_3

ibeltagy commented Apr 13, 2020

Which GPU are you using? The CUDA kernel doesn't work with certain old GPUs for now, but we are planning to fix this in the near future. In the meantime, you can compile your own binaries by following the instructions in cheatsheet.txt, after changing arch='sm_52' here to an older arch (check here for a list of architectures to find the one that matches your GPU). arch='sm_30' will probably work fine, but it is slower on newer GPUs, which is why we didn't use it by default.
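
For anyone rebuilding the binaries, a minimal sketch of how to find the matching arch string (this assumes PyTorch is available; it only prints the value to plug into the arch='sm_52' setting mentioned above):

```python
import torch

# Map the local GPU's compute capability to the NVCC-style arch string to use
# when recompiling the TVM kernel, e.g. sm_37 for a Tesla K80, sm_70 for a V100.
major, minor = torch.cuda.get_device_capability(0)
print(f"Rebuild the diagonaled_mm kernel with arch='sm_{major}{minor}'")
```

Compiling for the exact architecture of your card also avoids the slowdown mentioned above for a blanket sm_30 build.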

jseale commented Apr 14, 2020

Thank you, Iz. I had been using a Tesla K80. I switched to a Tesla V100, and the hello world code now runs without problems.
