ValueError: FP16 Mixed precision training with AMP or APEX (--fp16) and FP16 half precision evaluation (--fp16_full_eval) can only be used on CUDA devices #24

chintan-donda opened this issue Jun 13, 2023 · 1 comment


chintan-donda commented Jun 13, 2023

I'm getting the error below when trying to finetune the model.

Converted as Half.
trainable params: 8355840 || all params: 1075691520 || trainable%: 0.7767877541695225
Found cached dataset json (/home/users/users/.cache/huggingface/datasets/json/default-7089e4ef944c023b/0.0.0/e347ab1c932092252e717ff3f949105a4dd28b27e842dd53157d2f72e276c2e4)
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 21.48it/s]
Loading cached split indices for dataset at /home/users/users/.cache/huggingface/datasets/json/default-7089e4ef944c023b/0.0.0/e347ab1c932092252e717ff3f949105a4dd28b27e842dd53157d2f72e276c2e4/cache-a03d095090258b35.arrow and /home/users/users/.cache/huggingface/datasets/json/default-7089e4ef944c023b/0.0.0/e347ab1c932092252e717ff3f949105a4dd28b27e842dd53157d2f72e276c2e4/cache-f83f741993333274.arrow
Run eval every 6 steps                                                                                                                                                  
Found safetensors installation, but --save_safetensors=False. Safetensors should be a preferred weights saving format due to security and performance reasons. If your model cannot be saved by safetensors please feel free to open an issue at https://github.com/huggingface/safetensors!
PyTorch: setting up devices

Traceback (most recent call last):
  File "/home/users/users/falcontune/venv_falcontune/bin/falcontune", line 33, in <module>
    sys.exit(load_entry_point('falcontune==0.1.0', 'console_scripts', 'falcontune')())
  File "/home/users/users/falcontune/venv_falcontune/lib/python3.8/site-packages/falcontune-0.1.0-py3.8.egg/falcontune/run.py", line 87, in main
    args.func(args)
  File "/home/users/users/falcontune/venv_falcontune/lib/python3.8/site-packages/falcontune-0.1.0-py3.8.egg/falcontune/finetune.py", line 116, in finetune
    training_arguments = transformers.TrainingArguments(
  File "<string>", line 111, in __init__
  File "/home/users/users/falcontune/venv_falcontune/lib/python3.8/site-packages/transformers/training_args.py", line 1338, in __post_init__
    raise ValueError(
ValueError: FP16 Mixed precision training with AMP or APEX (`--fp16`) and FP16 half precision evaluation (`--fp16_full_eval`) can only be used on CUDA devices.
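For reference, the traceback shows transformers raising this ValueError in TrainingArguments.__post_init__ when --fp16 or --fp16_full_eval is requested but the resolved training device is not CUDA, so a first sanity check is whether PyTorch actually sees the GPU. A minimal diagnostic sketch (plain PyTorch, not falcontune code):

import torch

# Confirm that PyTorch detects the V100 before touching the trainer config;
# the fp16 flags are rejected whenever the resolved device is not CUDA.
print(torch.__version__, torch.version.cuda)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))
    print("Compute capability:", torch.cuda.get_device_capability(0))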

Experimental setup details:
OS: Ubuntu 18.04.5 LTS
GPU: Tesla V100-SXM2-32GB
Libs:

bitsandbytes==0.39.0
transformers==4.29.2
triton==2.0.0
sentencepiece==0.1.99
datasets==2.12.0
peft==0.3.0
torch==2.0.1+cu118
accelerate==0.19.0
safetensors==0.3.1
einops==0.6.1
wandb==0.15.3
scipy==1.10.1

Finetuning command:

falcontune finetune \
    --model="falcon-40b-instruct-4bit" \
    --weights="./gptq_model-4bit--1g.safetensors" \
    --dataset="./alpaca_cleaned.json" \
    --data_type="alpaca" \
    --lora_out_dir="./falcon-40b-instruct-4bit-alpaca/" \
    --mbatch_size=1 \
    --batch_size=2 \
    --epochs=$epochs \
    --lr=3e-4 \
    --cutoff_len=256 \
    --lora_r=8 \
    --lora_alpha=16 \
    --lora_dropout=0.05 \
    --warmup_steps=5 \
    --save_steps=50 \
    --save_total_limit=3 \
    --logging_steps=5 \
    --target_modules='["query_key_value"]' \
    --backend="triton"

Any help please?


wyklq commented Jun 26, 2023

To train on a V100 we need to enable the memory-efficient attention backend (enable_mem_efficient=True); otherwise the above error is shown.

--- a/falcontune/model/falcon/model.py
+++ b/falcontune/model/falcon/model.py
@@ -523,7 +523,7 @@ class Attention40B(nn.Module):
             key_layer_ = key_layer.reshape(batch_size, self.num_heads, -1, self.head_dim)
             value_layer_ = value_layer.reshape(batch_size, self.num_heads, -1, self.head_dim)

-            with torch.backends.cuda.sdp_kernel(enable_flash=True, enable_math=False, enable_mem_efficient=False):
+            with torch.backends.cuda.sdp_kernel(enable_flash=True, enable_math=False, enable_mem_efficient=True):
                 attn_output = F.scaled_dot_product_attention(
                     query_layer_, key_layer_, value_layer_, None, 0.0, is_causal=True
                 )
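A minimal standalone sketch of what the patched context manager allows, assuming the V100 (compute capability 7.0) cannot run the flash kernel and therefore needs the memory-efficient backend as a fallback (toy tensors, not falcontune code):

import torch
import torch.nn.functional as F

# Toy FP16 tensors with shape (batch, heads, seq_len, head_dim).
q = torch.randn(1, 8, 16, 64, device="cuda", dtype=torch.float16)
k = torch.randn(1, 8, 16, 64, device="cuda", dtype=torch.float16)
v = torch.randn(1, 8, 16, 64, device="cuda", dtype=torch.float16)

# With enable_mem_efficient=True, SDPA can fall back to the memory-efficient
# kernel on GPUs where FlashAttention is unavailable.
with torch.backends.cuda.sdp_kernel(enable_flash=True, enable_math=False, enable_mem_efficient=True):
    out = F.scaled_dot_product_attention(q, k, v, None, 0.0, is_causal=True)

print(out.shape)  # torch.Size([1, 8, 16, 64])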
