CUDA_ERROR_INVALID_PTX with old GPUs #1

Closed
ibeltagy opened this issue Apr 7, 2020 · 3 comments

ibeltagy commented Apr 7, 2020

GPUs with an architecture older than sm_52 are not supported for now (the prebuilt TVM binaries are compiled for arch='sm_52').
https://arnon.dk/matching-sm-architectures-arch-and-gencode-for-various-nvidia-cards/
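
For reference, a quick way to check whether a card is affected (a minimal sketch assuming PyTorch is installed; the sm_52 cutoff is the one mentioned in the reply further down):

```python
import torch

# Print the compute capability of each visible GPU, e.g. a Tesla K80 reports
# (3, 7) while a Tesla V100 reports (7, 0). Anything below sm_52 hits
# CUDA_ERROR_INVALID_PTX with the prebuilt binaries.
for i in range(torch.cuda.device_count()):
    major, minor = torch.cuda.get_device_capability(i)
    print(f"{torch.cuda.get_device_name(i)}: sm_{major}{minor}")
```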

jseale commented Apr 13, 2020

When I run the hello world code, I get the following error:

Loading tvm binary from: ./longformer/lib/lib_diagonaled_mm_float32_cuda.so
---------------------------------------------------------------------------
TVMError                                  Traceback (most recent call last)
<ipython-input-1-2b363e69144a> in <module>
     23                                      # QA: question tokens
     24 
---> 25 output = model(input_ids, attention_mask=attention_mask)[0]

~/anaconda3/envs/longformer/lib/python3.7/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    545             result = self._slow_forward(*input, **kwargs)
    546         else:
--> 547             result = self.forward(*input, **kwargs)
    548         for hook in self._forward_hooks.values():
    549             hook_result = hook(self, input, result)

~/anaconda3/envs/longformer/lib/python3.7/site-packages/transformers/modeling_roberta.py in forward(self, input_ids, attention_mask, token_type_ids, position_ids, head_mask)
    175                                                  token_type_ids=token_type_ids,
    176                                                  position_ids=position_ids,
--> 177                                                  head_mask=head_mask)
    178 
    179 

~/anaconda3/envs/longformer/lib/python3.7/site-packages/transformers/modeling_bert.py in forward(self, input_ids, attention_mask, token_type_ids, position_ids, head_mask)
    623         encoder_outputs = self.encoder(embedding_output,
    624                                        extended_attention_mask,
--> 625                                        head_mask=head_mask)
    626         sequence_output = encoder_outputs[0]
    627         pooled_output = self.pooler(sequence_output)

~/anaconda3/envs/longformer/lib/python3.7/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    545             result = self._slow_forward(*input, **kwargs)
    546         else:
--> 547             result = self.forward(*input, **kwargs)
    548         for hook in self._forward_hooks.values():
    549             hook_result = hook(self, input, result)

~/anaconda3/envs/longformer/lib/python3.7/site-packages/transformers/modeling_bert.py in forward(self, hidden_states, attention_mask, head_mask)
    344                 all_hidden_states = all_hidden_states + (hidden_states,)
    345 
--> 346             layer_outputs = layer_module(hidden_states, attention_mask, head_mask[i])
    347             hidden_states = layer_outputs[0]
    348 

~/anaconda3/envs/longformer/lib/python3.7/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    545             result = self._slow_forward(*input, **kwargs)
    546         else:
--> 547             result = self.forward(*input, **kwargs)
    548         for hook in self._forward_hooks.values():
    549             hook_result = hook(self, input, result)

~/anaconda3/envs/longformer/lib/python3.7/site-packages/transformers/modeling_bert.py in forward(self, hidden_states, attention_mask, head_mask)
    322 
    323     def forward(self, hidden_states, attention_mask=None, head_mask=None):
--> 324         attention_outputs = self.attention(hidden_states, attention_mask, head_mask)
    325         attention_output = attention_outputs[0]
    326         intermediate_output = self.intermediate(attention_output)

~/anaconda3/envs/longformer/lib/python3.7/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    545             result = self._slow_forward(*input, **kwargs)
    546         else:
--> 547             result = self.forward(*input, **kwargs)
    548         for hook in self._forward_hooks.values():
    549             hook_result = hook(self, input, result)

~/anaconda3/envs/longformer/lib/python3.7/site-packages/transformers/modeling_bert.py in forward(self, input_tensor, attention_mask, head_mask)
    279 
    280     def forward(self, input_tensor, attention_mask=None, head_mask=None):
--> 281         self_outputs = self.self(input_tensor, attention_mask, head_mask)
    282         attention_output = self.output(self_outputs[0], input_tensor)
    283         outputs = (attention_output,) + self_outputs[1:]  # add attentions if we output them

~/anaconda3/envs/longformer/lib/python3.7/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    545             result = self._slow_forward(*input, **kwargs)
    546         else:
--> 547             result = self.forward(*input, **kwargs)
    548         for hook in self._forward_hooks.values():
    549             hook_result = hook(self, input, result)

~/longformer/longformer/longformer.py in forward(self, hidden_states, attention_mask, head_mask)
    105         k = k.view(seq_len, bsz, self.num_heads, self.head_dim).transpose(0, 1).contiguous().float()
    106         # attn_weights = (bsz, seq_len, num_heads, window*2+1)
--> 107         attn_weights = diagonaled_mm_tvm(q, k, self.attention_window, self.attention_dilation, False, 0, False)
    108         mask_invalid_locations(attn_weights, self.attention_window, self.attention_dilation, False)
    109         if remove_from_windowed_attention_mask is not None:

~/longformer/longformer/diagonaled_mm_tvm.py in forward(ctx, t1, t2, w, d, is_t1_diagonaled, padding, autoregressive)
    259         t2 = DiagonaledMM._prepare_tensors(t2)
    260         # output = t1.mm(t2)  # what would have been called if this was a regular matmul
--> 261         output = DiagonaledMM._diagonaled_mm(t1, t2, w, d, is_t1_diagonaled=is_t1_diagonaled, padding=padding, autoregressive=autoregressive)
    262         return output
    263 

~/longformer/longformer/diagonaled_mm_tvm.py in _diagonaled_mm(t1, t2, w, d, is_t1_diagonaled, transpose_t1, padding, autoregressive)
    200             print('Error: the hidden dimension {m} shouldn\'t match number of diagonals {c}')
    201             assert False
--> 202         _diagonaled_mm_function(t1, t2, r, d, w, w_upper, padding, transpose_t1, m if is_t1_diagonaled else c)
    203         return r
    204 

~/longformer/tvm/contrib/dlpack.py in _wrapper(*args)
     38         args = tuple(ndarray.from_dlpack(to_dlpack_func(arg))\
     39             if isinstance(arg, tensor_type) else arg for arg in args)
---> 40         return tvm_func(*args)
     41 
     42     return _wrapper

~/longformer/tvm/_ffi/function.py in __call__(self, *args)
    143             return self._entry(*args)
    144         f = self.entry_func
--> 145         return f(*args)
    146 
    147 

~/longformer/tvm/_ffi/_ctypes/function.py in __call__(self, *args)
    208                 self.handle, values, tcodes, ctypes.c_int(num_args),
    209                 ctypes.byref(ret_val), ctypes.byref(ret_tcode)) != 0:
--> 210             raise get_last_ffi_error()
    211         _ = temp_args
    212         _ = args

TVMError: Traceback (most recent call last):
  [bt] (3) /home/ubuntu/longformer/tvm/libtvm_runtime.so(TVMFuncCall+0x61) [0x7fd4f55e6681]
  [bt] (2) /home/ubuntu/longformer/tvm/libtvm_runtime.so(std::_Function_handler<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*), tvm::runtime::detail::PackFuncVoidAddr_<0, tvm::runtime::CUDAWrappedFunc>(tvm::runtime::CUDAWrappedFunc, std::vector<tvm::runtime::detail::ArgConvertCode, std::allocator<tvm::runtime::detail::ArgConvertCode> > const&)::{lambda(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)#1}>::_M_invoke(std::_Any_data const&, tvm::runtime::TVMArgs&&, tvm::runtime::TVMRetValue*&&)+0x14a) [0x7fd4f567f4da]
  [bt] (1) /home/ubuntu/longformer/tvm/libtvm_runtime.so(tvm::runtime::CUDAWrappedFunc::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*, void**) const+0x9e3) [0x7fd4f567eff3]
  [bt] (0) /home/ubuntu/longformer/tvm/libtvm_runtime.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x32) [0x7fd4f55e0492]
  File "/code/tvm/src/runtime/cuda/cuda_module.cc", line 111
  File "/code/tvm/src/runtime/module_util.cc", line 73
CUDAError: Check failed: ret == 0 (-1 vs. 0) : cuModuleLoadData(&(module_[device_id]), data_.c_str()) failed with error: CUDA_ERROR_INVALID_PTX

I'm using Ubuntu and followed the venv instructions. Here's the output from conda list:
markdown 3.2.1 pypi_0 pypi
mkl 2020.0 166
mkl-service 2.3.0 py37he904b0f_0
mkl_fft 1.0.15 py37ha843d7b_0
mkl_random 1.1.0 py37hd6b4f25_0
ncurses 6.2 he6710b0_0
ninja 1.9.0 py37hfd86e86_0
numpy 1.18.2 pypi_0 pypi
numpy-base 1.18.1 py37hde5b4d6_1
oauthlib 3.1.0 pypi_0 pypi
openssl 1.0.2u h7b6447c_0
pandas 1.0.3 pypi_0 pypi
parso 0.6.2 py_0 anaconda
pexpect 4.8.0 py37_0 anaconda
pickleshare 0.7.5 py37_0 anaconda
pillow 7.1.1 pypi_0 pypi
pip 20.0.2 py37_1
prompt-toolkit 3.0.4 py_0 anaconda
prompt_toolkit 3.0.4 0 anaconda
protobuf 3.11.3 pypi_0 pypi
ptyprocess 0.6.0 py37_0 anaconda
pyasn1 0.4.8 pypi_0 pypi
pyasn1-modules 0.2.8 pypi_0 pypi
pycparser 2.20 py_0
pygments 2.6.1 py_0 anaconda
python 3.7.0 h6e4f718_3
python-dateutil 2.8.1 py_0 anaconda
pytorch 1.2.0 cuda100py37h938c94c_0
pytorch-lightning 0.6.0 pypi_0 pypi
pytz 2019.3 pypi_0 pypi
pyzmq 18.1.1 py37he6710b0_0 anaconda
readline 7.0 h7b6447c_5
regex 2020.4.4 pypi_0 pypi
requests 2.23.0 pypi_0 pypi
requests-oauthlib 1.3.0 pypi_0 pypi
rsa 4.0 pypi_0 pypi
s3transfer 0.3.3 pypi_0 pypi
sacremoses 0.0.40 pypi_0 pypi
scikit-learn 0.22.2.post1 pypi_0 pypi
scipy 1.4.1 pypi_0 pypi
sentencepiece 0.1.85 pypi_0 pypi
setuptools 46.1.3 py37_0
six 1.14.0 py37_0 anaconda
sqlite 3.31.1 h7b6447c_0
tensorboard 2.2.0 pypi_0 pypi
tensorboard-plugin-wit 1.6.0.post3 pypi_0 pypi
tensorboardx 2.0 pypi_0 pypi
test-tube 0.7.5 pypi_0 pypi
tk 8.6.8 hbc83047_0
torchvision 0.4.2 pypi_0 pypi
tornado 6.0.4 py37h7b6447c_1 anaconda
tqdm 4.45.0 pypi_0 pypi
traitlets 4.3.3 py37_0 anaconda
transformers 2.0.0 pypi_0 pypi
urllib3 1.25.8 pypi_0 pypi
wcwidth 0.1.9 py_0 anaconda
werkzeug 1.0.1 pypi_0 pypi
wheel 0.34.2 py37_0
xz 5.2.4 h14c3975_4
zeromq 4.3.1 he6710b0_3 anaconda
zlib 1.2.11 h7b6447c_3

ibeltagy commented Apr 13, 2020

Which GPU are you using? The CUDA kernel doesn't work with certain old GPUs for now, but we are planning to fix this in the near future. In the meantime, you can compile your own binaries by following the instructions in cheatsheet.txt, after changing arch='sm_52' here to an older arch (check here for a list of architectures to find the one that matches your GPU). arch='sm_30' will probably work fine, but it is slower on newer GPUs, which is why we didn't use it by default.
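
For anyone rebuilding the binaries, a minimal sketch of how to find the matching arch string (this assumes PyTorch is available; it only prints the value to plug into the arch='sm_52' setting mentioned above):

```python
import torch

# Map the local GPU's compute capability to the NVCC-style arch string to use
# when recompiling the TVM kernel, e.g. sm_37 for a Tesla K80, sm_70 for a V100.
major, minor = torch.cuda.get_device_capability(0)
print(f"Rebuild the diagonaled_mm kernel with arch='sm_{major}{minor}'")
```

Compiling for the exact architecture of your card also avoids the slowdown mentioned above for a blanket sm_30 build.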

jseale commented Apr 14, 2020

Thank you, Iz. I had been using a Tesla K80. I switched to a Tesla V100, and the hello world code now runs without problems.
