RuntimeError: No available kernel. Aborting execution. #11

Open
RealCalumPlays opened this issue Jun 2, 2023 · 7 comments
@RealCalumPlays

RealCalumPlays commented Jun 2, 2023

Any ideas? Full log below:

Traceback (most recent call last):
File "/home/cosmos/miniconda3/envs/ftune/bin/falcontune", line 33, in
sys.exit(load_entry_point('falcontune==0.1.0', 'console_scripts', 'falcontune')())
File "/home/cosmos/miniconda3/envs/ftune/lib/python3.10/site-packages/falcontune-0.1.0-py3.10.egg/falcontune/run.py", line 87, in main
args.func(args)
File "/home/cosmos/miniconda3/envs/ftune/lib/python3.10/site-packages/falcontune-0.1.0-py3.10.egg/falcontune/finetune.py", line 162, in finetune
trainer.train()
File "/home/cosmos/miniconda3/envs/ftune/lib/python3.10/site-packages/transformers/trainer.py", line 1664, in train
return inner_training_loop(
File "/home/cosmos/miniconda3/envs/ftune/lib/python3.10/site-packages/transformers/trainer.py", line 1940, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs)
File "/home/cosmos/miniconda3/envs/ftune/lib/python3.10/site-packages/transformers/trainer.py", line 2735, in training_step
loss = self.compute_loss(model, inputs)
File "/home/cosmos/miniconda3/envs/ftune/lib/python3.10/site-packages/transformers/trainer.py", line 2767, in compute_loss
outputs = model(**inputs)
File "/home/cosmos/miniconda3/envs/ftune/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/cosmos/miniconda3/envs/ftune/lib/python3.10/site-packages/peft/peft_model.py", line 678, in forward
return self.base_model(
File "/home/cosmos/miniconda3/envs/ftune/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/cosmos/miniconda3/envs/ftune/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/home/cosmos/miniconda3/envs/ftune/lib/python3.10/site-packages/falcontune-0.1.0-py3.10.egg/falcontune/model/falcon/model.py", line 1070, in forward
transformer_outputs = self.transformer(
File "/home/cosmos/miniconda3/envs/ftune/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/cosmos/miniconda3/envs/ftune/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/home/cosmos/miniconda3/envs/ftune/lib/python3.10/site-packages/falcontune-0.1.0-py3.10.egg/falcontune/model/falcon/model.py", line 965, in forward
outputs = block(
File "/home/cosmos/miniconda3/envs/ftune/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/cosmos/miniconda3/envs/ftune/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/home/cosmos/miniconda3/envs/ftune/lib/python3.10/site-packages/falcontune-0.1.0-py3.10.egg/falcontune/model/falcon/model.py", line 698, in forward
attn_outputs = self.self_attention(
File "/home/cosmos/miniconda3/envs/ftune/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/cosmos/miniconda3/envs/ftune/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/home/cosmos/miniconda3/envs/ftune/lib/python3.10/site-packages/falcontune-0.1.0-py3.10.egg/falcontune/model/falcon/model.py", line 337, in forward
attn_output = F.scaled_dot_product_attention(
RuntimeError: No available kernel. Aborting execution.

EDIT: CUDA is installed in the kernel modules, on the system, and in the environment, just to rule that out. Using Python 3.10.6.
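
A minimal repro sketch outside falcontune (the standalone script, shapes, and dtype are assumptions, not taken from the log above) shows the same failure mode: falcontune pins SDPA to the flash backend, so the call aborts wherever flash attention has no kernel for the current GPU and dtype.

import torch
import torch.nn.functional as F

# Force the flash backend only, as falcontune does; on a GPU/dtype the
# flash kernel cannot serve, this raises "No available kernel".
q = torch.randn(1, 8, 128, 64, device="cuda", dtype=torch.float16)
with torch.backends.cuda.sdp_kernel(enable_flash=True, enable_math=False, enable_mem_efficient=False):
    out = F.scaled_dot_product_attention(q, q, q, None, 0.0, is_causal=True)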

@itjuba

itjuba commented Jun 4, 2023

Same error here on a Tesla V100-SXM2-32GB.

@rmihaylov
Owner

There is a choice of three kernels:

torch.backends.cuda.sdp_kernel(enable_flash=True, enable_math=False, enable_mem_efficient=False)

Currently, only flash attention is on. Try enabling the other options as well.
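
A sketch of that suggestion at the falcontune call site (the query/key/value names mirror the repo; the tensors here are illustrative stand-ins, not the real activations):

import torch
import torch.nn.functional as F

# Enable all three backends so SDPA can pick whichever kernel the GPU
# actually supports instead of insisting on flash attention.
query_layer_ = torch.randn(1, 8, 128, 64, device="cuda", dtype=torch.float16)
key_layer_ = torch.randn_like(query_layer_)
value_layer_ = torch.randn_like(query_layer_)

with torch.backends.cuda.sdp_kernel(enable_flash=True, enable_math=True, enable_mem_efficient=True):
    attn_output = F.scaled_dot_product_attention(
        query_layer_, key_layer_, value_layer_, None, 0.0, is_causal=True
    )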

@chintan-donda

> Same error here on a Tesla V100-SXM2-32GB

Same issue for me as well on the same machine, with the details below:
OS: Ubuntu 18.04.5 LTS
Libs:

bitsandbytes==0.39.0
transformers==4.29.2
triton==2.0.0
sentencepiece==0.1.99
datasets==2.12.0
peft==0.3.0
torch==2.0.1+cu118
accelerate==0.19.0
safetensors==0.3.1
einops==0.6.1
wandb==0.15.3
scipy==1.10.1

@chintan-donda

> There is a choice of three kernels:
>
> torch.backends.cuda.sdp_kernel(enable_flash=True, enable_math=False, enable_mem_efficient=False)
>
> Currently, only flash attention is on. Try enabling the other options as well.

Doing this gives the error below:

Traceback (most recent call last):
  File "falcontune/run.py", line 93, in <module>
    main()
  File "falcontune/run.py", line 89, in main 
    args.func(args)
  File "/home/users/user/falcontune/venv_falcontune/lib/python3.8/site-packages/falcontune-0.1.0-py3.8.egg/falcontune/finetune.py", line 162, in fin
etune
    trainer.train()
  File "/home/users/user/falcontune/venv_falcontune/lib/python3.8/site-packages/transformers/trainer.py", line 1664, in train
    return inner_training_loop(
  File "/home/users/user/falcontune/venv_falcontune/lib/python3.8/site-packages/transformers/trainer.py", line 1940, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/home/users/user/falcontune/venv_falcontune/lib/python3.8/site-packages/transformers/trainer.py", line 2735, in training_step
    loss = self.compute_loss(model, inputs)  
  File "/home/users/user/falcontune/venv_falcontune/lib/python3.8/site-packages/transformers/trainer.py", line 2767, in compute_loss
    outputs = model(**inputs)
  File "/home/users/user/falcontune/venv_falcontune/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/users/user/falcontune/venv_falcontune/lib/python3.8/site-packages/peft/peft_model.py", line 678, in forward
    return self.base_model(
  File "/home/users/user/falcontune/venv_falcontune/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/users/user/falcontune/venv_falcontune/lib/python3.8/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/users/user/falcontune/venv_falcontune/lib/python3.8/site-packages/falcontune-0.1.0-py3.8.egg/falcontune/model/falcon/model.py", line 1070, in forward
    transformer_outputs = self.transformer(  
  File "/home/users/user/falcontune/venv_falcontune/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/users/user/falcontune/venv_falcontune/lib/python3.8/site-packages/falcontune-0.1.0-py3.8.egg/falcontune/model/falcon/model.py", line 965, in forward
    outputs = block(
  File "/home/users/user/falcontune/venv_falcontune/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/users/user/falcontune/venv_falcontune/lib/python3.8/site-packages/falcontune-0.1.0-py3.8.egg/falcontune/model/falcon/model.py", line 634, in forward
    attn_outputs = self.self_attention(
  File "/home/users/user/falcontune/venv_falcontune/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/users/user/falcontune/venv_falcontune/lib/python3.8/site-packages/falcontune-0.1.0-py3.8.egg/falcontune/model/falcon/model.py", line 486, in forward
    fused_qkv = self.query_key_value(hidden_states)  # [batch_size, seq_length, 3 x hidden_size]
  File "/home/users/user/falcontune/venv_falcontune/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/users/user/falcontune/venv_falcontune/lib/python3.8/site-packages/falcontune-0.1.0-py3.8.egg/falcontune/model/lora.py", line 54, in forward
    result = self.quant_class.forward(self, x)
  File "/home/users/user/falcontune/venv_falcontune/lib/python3.8/site-packages/falcontune-0.1.0-py3.8.egg/falcontune/backend/triton/quantlinear.py", line 13, in forward
    out = AutogradMatmul.apply(
  File "/home/users/user/falcontune/venv_falcontune/lib/python3.8/site-packages/torch/autograd/function.py", line 506, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/home/users/user/falcontune/venv_falcontune/lib/python3.8/site-packages/torch/cuda/amp/autocast_mode.py", line 106, in decorate_fwd
    return fwd(*args, **kwargs)
  File "/home/users/user/falcontune/venv_falcontune/lib/python3.8/site-packages/falcontune-0.1.0-py3.8.egg/falcontune/backend/triton/autograd.py", line 11, in forward
    output = tu.triton_matmul(x, qweight, scales, qzeros, g_idx, bits, maxq)
  File "/home/users/user/falcontune/venv_falcontune/lib/python3.8/site-packages/falcontune-0.1.0-py3.8.egg/falcontune/backend/triton/triton_utils.py", line 246, in triton_matmul
    matmul_248_kernel[grid](input, qweight, output,
  File "/home/users/user/falcontune/venv_falcontune/lib/python3.8/site-packages/falcontune-0.1.0-py3.8.egg/falcontune/backend/triton/custom_autotune.py", line 110, in run
    return self.fn.run(*args, num_warps=config.num_warps, num_stages=config.num_stages, **kwargs, **config.kwargs)
  File "<string>", line 24, in matmul_248_kernel
ValueError: Pointer argument (at 1) cannot be accessed from Triton (cpu tensor?)
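
The "(cpu tensor?)" hint usually means part of the model stayed on the CPU, e.g. because accelerate offloaded some modules. A hypothetical diagnostic (the model name is assumed; falcontune's CLI doesn't expose it directly):

# The Triton matmul kernel can only dereference CUDA pointers, so any
# parameter or buffer printed here would trigger this exact ValueError.
for name, tensor in list(model.named_parameters()) + list(model.named_buffers()):
    if tensor.device.type != "cuda":
        print(name, tensor.device)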

@fpena06

fpena06 commented Jun 15, 2023

I was having this same issue on a Google Colab V100; switching to an A100 fixed it for me.
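
That is consistent with the hardware requirements: PyTorch 2.0's flash-attention kernel does not cover the V100's compute capability 7.0, while the A100 (8.0) is supported. A quick check of what you are running on:

import torch

# V100 reports (7, 0); A100 reports (8, 0). On SM 7.0 only the math and
# mem_efficient backends can serve scaled_dot_product_attention.
print(torch.cuda.get_device_name(0))
print(torch.cuda.get_device_capability(0))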

@chintan-donda

Any fix for this? I'm still getting this issue.

@wyklq

wyklq commented Jun 26, 2023

On the V100, we need to enable the mem_efficient mode; it doesn't support native flash attention.

--- a/falcontune/model/falcon/model.py
+++ b/falcontune/model/falcon/model.py
@@ -523,7 +523,7 @@ class Attention40B(nn.Module):
             key_layer_ = key_layer.reshape(batch_size, self.num_heads, -1, self.head_dim)
             value_layer_ = value_layer.reshape(batch_size, self.num_heads, -1, self.head_dim)

-            with torch.backends.cuda.sdp_kernel(enable_flash=True, enable_math=False, enable_mem_efficient=False):
+            with torch.backends.cuda.sdp_kernel(enable_flash=True, enable_math=False, enable_mem_efficient=True):
                 attn_output = F.scaled_dot_product_attention(
                     query_layer_, key_layer_, value_layer_, None, 0.0, is_causal=True
                 )
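
After that patch, a quick sanity check on a V100 (same illustrative shapes as the repro sketch above, not falcontune's real tensors) should complete instead of aborting:

import torch
import torch.nn.functional as F

# With mem_efficient enabled, SDPA falls back to the memory-efficient
# kernel on GPUs where flash attention is unavailable.
q = torch.randn(1, 8, 128, 64, device="cuda", dtype=torch.float16)
with torch.backends.cuda.sdp_kernel(enable_flash=True, enable_math=False, enable_mem_efficient=True):
    out = F.scaled_dot_product_attention(q, q, q, None, 0.0, is_causal=True)
print(out.shape)  # torch.Size([1, 8, 128, 64])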
