
QLoRA finetuning of granite-34b-code-base-gptq model fails on fms-hf-tuning 2.6.0 #479

sutaakar opened this issue Feb 25, 2025 · 1 comment


Describe the bug

Running QLoRA finetuning of the granite-34b-code-base-gptq model fails with the following error:

TypeError: GPTBigCodeForCausalLM.forward() got an unexpected keyword argument 'cu_seq_lens_q'

Finetuning configuration:

{
    "model_name_or_path": "/mnt/model/model/granite-34b-code-base-gptq-20241001T150701",
    "training_data_path": "/mnt/scratch/dataset/alpaca_data.json",
    "output_dir": "/mnt/output/model",
    "save_model_dir": "/mnt/output/model",
    "num_train_epochs": 1.0,
    "per_device_train_batch_size": 1,
    "per_device_eval_batch_size": 4,
    "gradient_accumulation_steps": 4,
    "save_strategy": "no",
    "learning_rate": 1e-5,
    "weight_decay": 0.0,
    "lr_scheduler_type": "cosine",
    "include_tokens_per_second": true,
    "response_template": "\n### Response:",
    "dataset_text_field": "output",
    "use_flash_attn": true,
    "peft_method": "lora",
    "target_modules": ["all-linear"],
    "auto_gptq": ["triton_v2"],
    "torch_dtype": "float16",
    "fp16": true,
    "fast_kernels": [true, true, true],
    "fused_lora":  ["auto_gptq", true],
    "padding_free": ["huggingface"]
}
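
For reference, a minimal sketch of how a JSON config like this is typically handed to the fms-hf-tuning entrypoint inside the container. The SFT_TRAINER_CONFIG_JSON_PATH environment variable and the config mount path are illustrative assumptions, not taken from the Pod spec in this issue.

# Hypothetical launch sketch (not from this issue): fms-hf-tuning can read its
# JSON config from the path given in SFT_TRAINER_CONFIG_JSON_PATH and then run
# tuning.sft_trainer (the module that appears in the traceback below).
# The env-var name, config path, and subprocess call are assumptions.
import os
import subprocess

env = dict(os.environ)
env["SFT_TRAINER_CONFIG_JSON_PATH"] = "/mnt/config/finetune_config.json"  # assumed mount path

subprocess.run(["python", "-m", "tuning.sft_trainer"], env=env, check=True)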

Full Pod log:

INFO - Ignoring unknown parameter in the quantization configuration: is_marlin_format.
INFO - `checkpoint_format` is missing from the quantization configuration and is automatically inferred to gptq
INFO - Ignoring unknown parameter in the quantization configuration: is_marlin_format.
INFO - `checkpoint_format` is missing from the quantization configuration and is automatically inferred to gptq
WARNING:modeling.py:The model weights are not tied. Please use the `tie_weights` method before using the `infer_auto_device` function.
WARNING:sft_trainer.py:PAD token set to default, to make it different from eos token
WARNING:modeling.py:The model weights are not tied. Please use the `tie_weights` method before using the `infer_auto_device` function.
WARNING:modeling.py:The model weights are not tied. Please use the `tie_weights` method before using the `infer_auto_device` function.
The new embeddings will be initialized from a multivariate normal distribution that has old embeddings' mean and covariance. As described in this article: https://nlp.stanford.edu/~johnhew/vocab-expansion.html. To disable this, use `mean_resizing=False`
INFO - Compatibility: converting `checkpoint_format` from `gptq` to `gptq_v2`.
WARNING:sft_trainer.py:PAD token set to default, to make it different from eos token
The new embeddings will be initialized from a multivariate normal distribution that has old embeddings' mean and covariance. As described in this article: https://nlp.stanford.edu/~johnhew/vocab-expansion.html. To disable this, use `mean_resizing=False`

Generating train split: 0 examples [00:00, ? examples/s]
Generating train split: 520 examples [00:00, 110142.31 examples/s]

Map (num_proc=80):   0%|          | 0/520 [00:00<?, ? examples/s]
Map (num_proc=80): 100%|██████████| 520/520 [00:02<00:00, 246.16 examples/s]
/home/tuning/.local/lib/python3.12/site-packages/fms_acceleration_foak/models/granite.py:172: UserWarning: Granite Rules: activation is gelu_pytorch_tanh, thus disabling LoRA fused-op for MLP, since only SwiGLU is supported. This only affects quantized-peft.
  warnings.warn(
/home/tuning/.local/lib/python3.12/site-packages/fms_acceleration_foak/models/granite.py:172: UserWarning: Granite Rules: activation is gelu_pytorch_tanh, thus disabling LoRA fused-op for MLP, since only SwiGLU is supported. This only affects quantized-peft.
  warnings.warn(
/home/tuning/.local/lib/python3.12/site-packages/fms_acceleration_foak/models/llama.py:167: UserWarning: LLamaRules: activation is gelu_pytorch_tanh, thus disabling LoRA fused-op for MLP, since only SwiGLU is supported. This only affects quantized-peft.
  warnings.warn(
/home/tuning/.local/lib/python3.12/site-packages/fms_acceleration_foak/models/mistral.py:160: UserWarning: Mistral rules: activation is gelu_pytorch_tanh, thus disabling LoRA fused-op for MLP, since only SwiGLU is supported. This only affects quantized-peft.
  warnings.warn(
/home/tuning/.local/lib/python3.12/site-packages/fms_acceleration_foak/models/llama.py:167: UserWarning: LLamaRules: activation is gelu_pytorch_tanh, thus disabling LoRA fused-op for MLP, since only SwiGLU is supported. This only affects quantized-peft.
  warnings.warn(
/home/tuning/.local/lib/python3.12/site-packages/fms_acceleration_foak/models/mistral.py:160: UserWarning: Mistral rules: activation is gelu_pytorch_tanh, thus disabling LoRA fused-op for MLP, since only SwiGLU is supported. This only affects quantized-peft.
  warnings.warn(
/home/tuning/.local/lib/python3.12/site-packages/fms_acceleration_aadp/framework_plugin_padding_free.py:132: UserWarning: transformers version supports padding free natively in various models.
  warnings.warn(
/home/tuning/.local/lib/python3.12/site-packages/fms_acceleration_aadp/framework_plugin_padding_free.py:132: UserWarning: transformers version supports padding free natively in various models.
  warnings.warn(
/home/tuning/.local/lib/python3.12/site-packages/transformers/training_args.py:2058: FutureWarning: `--push_to_hub_token` is deprecated and will be removed in version 5 of 🤗 Transformers. Use `--hub_token` instead.
  warnings.warn(
/home/tuning/.local/lib/python3.12/site-packages/transformers/training_args.py:2058: FutureWarning: `--push_to_hub_token` is deprecated and will be removed in version 5 of 🤗 Transformers. Use `--hub_token` instead.
  warnings.warn(
/home/tuning/.local/lib/python3.12/site-packages/tuning/sft_trainer.py:364: FutureWarning: `tokenizer` is deprecated and removed starting from version 0.16.0 for `SFTTrainer.__init__`. Use `processing_class` instead.
  trainer = SFTTrainer(
/home/tuning/.local/lib/python3.12/site-packages/tuning/sft_trainer.py:364: FutureWarning: `tokenizer` is deprecated and removed starting from version 0.16.0 for `SFTTrainer.__init__`. Use `processing_class` instead.
  trainer = SFTTrainer(

Map:   0%|          | 0/520 [00:00<?, ? examples/s]
Map: 100%|██████████| 520/520 [00:00<00:00, 3944.25 examples/s]
Map: 100%|██████████| 520/520 [00:00<00:00, 3873.46 examples/s]
/home/tuning/.local/lib/python3.12/site-packages/trl/trainer/sft_trainer.py:300: UserWarning: You passed a processing_class with `padding_side` not equal to `right` to the SFTTrainer. This might lead to some unexpected behaviour due to overflow issues when training a model in half-precision. You might consider adding `processing_class.padding_side = 'right'` to your code.
  warnings.warn(
/home/tuning/.local/lib/python3.12/site-packages/trl/trainer/sft_trainer.py:300: UserWarning: You passed a processing_class with `padding_side` not equal to `right` to the SFTTrainer. This might lead to some unexpected behaviour due to overflow issues when training a model in half-precision. You might consider adding `processing_class.padding_side = 'right'` to your code.
  warnings.warn(

  0%|          | 0/65 [00:00<?, ?it/s]ERROR:sft_trainer.py:Traceback (most recent call last):
  File "/home/tuning/.local/lib/python3.12/site-packages/tuning/sft_trainer.py", line 676, in main
    trainer, additional_train_info = train(
                                     ^^^^^^
  File "/home/tuning/.local/lib/python3.12/site-packages/tuning/sft_trainer.py", line 420, in train
    trainer.train(resume_from_checkpoint)
  File "/home/tuning/.local/lib/python3.12/site-packages/transformers/trainer.py", line 2171, in train
    return inner_training_loop(
           ^^^^^^^^^^^^^^^^^^^^
  File "/home/tuning/.local/lib/python3.12/site-packages/transformers/trainer.py", line 2531, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs, num_items_in_batch)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tuning/.local/lib/python3.12/site-packages/transformers/trainer.py", line 3675, in training_step
    loss = self.compute_loss(model, inputs, num_items_in_batch=num_items_in_batch)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tuning/.local/lib/python3.12/site-packages/transformers/trainer.py", line 3731, in compute_loss
    outputs = model(**inputs)
              ^^^^^^^^^^^^^^^
  File "/home/tuning/.local/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tuning/.local/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tuning/.local/lib/python3.12/site-packages/accelerate/utils/operations.py", line 820, in forward
    return model_forward(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tuning/.local/lib/python3.12/site-packages/accelerate/utils/operations.py", line 808, in __call__
    return convert_to_fp32(self.model_forward(*args, **kwargs))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tuning/.local/lib/python3.12/site-packages/torch/amp/autocast_mode.py", line 43, in decorate_autocast
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/tuning/.local/lib/python3.12/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 863, in forward
    output = self._fsdp_wrapped_module(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tuning/.local/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tuning/.local/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tuning/.local/lib/python3.12/site-packages/accelerate/utils/operations.py", line 820, in forward
    return model_forward(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tuning/.local/lib/python3.12/site-packages/accelerate/utils/operations.py", line 808, in __call__
    return convert_to_fp32(self.model_forward(*args, **kwargs))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tuning/.local/lib/python3.12/site-packages/torch/amp/autocast_mode.py", line 43, in decorate_autocast
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/tuning/.local/lib/python3.12/site-packages/peft/peft_model.py", line 1644, in forward
    return self.base_model(
           ^^^^^^^^^^^^^^^^
  File "/home/tuning/.local/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tuning/.local/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tuning/.local/lib/python3.12/site-packages/peft/tuners/tuners_utils.py", line 197, in forward
    return self.model.forward(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: GPTBigCodeForCausalLM.forward() got an unexpected keyword argument 'cu_seq_lens_q'

Platform

fms-hf-tuning image: quay.io/modh/fms-hf-tuning:v2.6.0
Trained model: granite-34b-code-base-gptq-20241001T150701

Sample Code

Expected behavior

Training of the model passes successfully.

Observed behavior

Training fails; see the description above for logs.

Additional context



anhuong commented Feb 27, 2025

Slack thread discussion

cu_seq_lens_q is passed when padding_free is set, and this error occurs only for the GPTBigCode model. Upon further investigation I saw that the error occurs when padding_free is set but not when fast_kernels is set by itself. My limited testing used a GPTBigCode model with full fine-tuning and LoRA tuning rather than QLoRA, but the same error appears for other tuning types when padding_free is set. I get no error when padding_free is not set. It looks like additional parameters are being passed to GPTBigCodeForCausalLM.forward() that are not handled in transformers v4.48.1; support for them was only added 2 weeks ago, as seen here.
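
A quick local check of that hypothesis is to inspect whether the installed GPTBigCodeForCausalLM.forward() accepts the padding-free kwargs at all. A minimal sketch, assuming the usual padding-free kwarg names (only cu_seq_lens_q appears in the traceback above; the other names are assumptions); on transformers v4.48.1 all four are expected to be reported as missing:

# Sketch (not from the issue): list which padding-free kwargs the installed
# GPTBigCodeForCausalLM.forward() does not accept. Kwarg names other than
# cu_seq_lens_q are assumptions based on the usual padding-free interface.
import inspect

from transformers.models.gpt_bigcode.modeling_gpt_bigcode import GPTBigCodeForCausalLM

accepted = set(inspect.signature(GPTBigCodeForCausalLM.forward).parameters)
padding_free_kwargs = {"cu_seq_lens_q", "cu_seq_lens_k", "max_length_q", "max_length_k"}

missing = sorted(padding_free_kwargs - accepted)
print("kwargs not accepted by GPTBigCodeForCausalLM.forward():", missing)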
