INFO - Ignoring unknown parameter in the quantization configuration: is_marlin_format.
INFO - `checkpoint_format` is missing from the quantization configuration and is automatically inferred to gptq
WARNING:modeling.py:The model weights are not tied. Please use the `tie_weights` method before using the `infer_auto_device` function.
WARNING:sft_trainer.py:PAD token set to default, to make it different from eos token
The new embeddings will be initialized from a multivariate normal distribution that has old embeddings' mean and covariance. As described in this article: https://nlp.stanford.edu/~johnhew/vocab-expansion.html. To disable this, use `mean_resizing=False`
INFO - Compatibility: converting `checkpoint_format` from `gptq` to `gptq_v2`.
Generating train split: 0 examples [00:00, ? examples/s]
Generating train split: 520 examples [00:00, 110142.31 examples/s]
Map (num_proc=80): 0%| | 0/520 [00:00<?, ? examples/s]
Map (num_proc=80): 100%|██████████| 520/520 [00:02<00:00, 246.16 examples/s]
/home/tuning/.local/lib/python3.12/site-packages/fms_acceleration_foak/models/granite.py:172: UserWarning: Granite Rules: activation is gelu_pytorch_tanh, thus disabling LoRA fused-op for MLP, since only SwiGLU is supported. This only affects quantized-peft.
warnings.warn(
/home/tuning/.local/lib/python3.12/site-packages/fms_acceleration_foak/models/llama.py:167: UserWarning: LLamaRules: activation is gelu_pytorch_tanh, thus disabling LoRA fused-op for MLP, since only SwiGLU is supported. This only affects quantized-peft.
warnings.warn(
/home/tuning/.local/lib/python3.12/site-packages/fms_acceleration_foak/models/mistral.py:160: UserWarning: Mistral rules: activation is gelu_pytorch_tanh, thus disabling LoRA fused-op for MLP, since only SwiGLU is supported. This only affects quantized-peft.
warnings.warn(
/home/tuning/.local/lib/python3.12/site-packages/fms_acceleration_aadp/framework_plugin_padding_free.py:132: UserWarning: transformers version supports padding free natively in various models.
warnings.warn(
/home/tuning/.local/lib/python3.12/site-packages/transformers/training_args.py:2058: FutureWarning: `--push_to_hub_token` is deprecated and will be removed in version 5 of 🤗 Transformers. Use `--hub_token` instead.
warnings.warn(
/home/tuning/.local/lib/python3.12/site-packages/tuning/sft_trainer.py:364: FutureWarning: `tokenizer` is deprecated and removed starting from version 0.16.0 for `SFTTrainer.__init__`. Use `processing_class` instead.
trainer = SFTTrainer(
Map: 0%| | 0/520 [00:00<?, ? examples/s]
Map: 100%|██████████| 520/520 [00:00<00:00, 3944.25 examples/s]
Map: 100%|██████████| 520/520 [00:00<00:00, 3873.46 examples/s]
/home/tuning/.local/lib/python3.12/site-packages/trl/trainer/sft_trainer.py:300: UserWarning: You passed a processing_class with `padding_side` not equal to `right` to the SFTTrainer. This might lead to some unexpected behaviour due to overflow issues when training a model in half-precision. You might consider adding `processing_class.padding_side = 'right'` to your code.
warnings.warn(
0%| | 0/65 [00:00<?, ?it/s]
ERROR:sft_trainer.py:Traceback (most recent call last):
File "/home/tuning/.local/lib/python3.12/site-packages/tuning/sft_trainer.py", line 676, in main
trainer, additional_train_info = train(
^^^^^^
File "/home/tuning/.local/lib/python3.12/site-packages/tuning/sft_trainer.py", line 420, in train
trainer.train(resume_from_checkpoint)
File "/home/tuning/.local/lib/python3.12/site-packages/transformers/trainer.py", line 2171, in train
return inner_training_loop(
^^^^^^^^^^^^^^^^^^^^
File "/home/tuning/.local/lib/python3.12/site-packages/transformers/trainer.py", line 2531, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs, num_items_in_batch)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tuning/.local/lib/python3.12/site-packages/transformers/trainer.py", line 3675, in training_step
loss = self.compute_loss(model, inputs, num_items_in_batch=num_items_in_batch)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tuning/.local/lib/python3.12/site-packages/transformers/trainer.py", line 3731, in compute_loss
outputs = model(**inputs)
^^^^^^^^^^^^^^^
File "/home/tuning/.local/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tuning/.local/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tuning/.local/lib/python3.12/site-packages/accelerate/utils/operations.py", line 820, in forward
return model_forward(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tuning/.local/lib/python3.12/site-packages/accelerate/utils/operations.py", line 808, in __call__
return convert_to_fp32(self.model_forward(*args, **kwargs))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tuning/.local/lib/python3.12/site-packages/torch/amp/autocast_mode.py", line 43, in decorate_autocast
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/tuning/.local/lib/python3.12/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 863, in forward
output = self._fsdp_wrapped_module(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tuning/.local/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tuning/.local/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tuning/.local/lib/python3.12/site-packages/accelerate/utils/operations.py", line 820, in forward
return model_forward(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tuning/.local/lib/python3.12/site-packages/accelerate/utils/operations.py", line 808, in __call__
return convert_to_fp32(self.model_forward(*args, **kwargs))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tuning/.local/lib/python3.12/site-packages/torch/amp/autocast_mode.py", line 43, in decorate_autocast
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/tuning/.local/lib/python3.12/site-packages/peft/peft_model.py", line 1644, in forward
return self.base_model(
^^^^^^^^^^^^^^^^
File "/home/tuning/.local/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tuning/.local/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tuning/.local/lib/python3.12/site-packages/peft/tuners/tuners_utils.py", line 197, in forward
return self.model.forward(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: GPTBigCodeForCausalLM.forward() got an unexpected keyword argument 'cu_seq_lens_q'
`cu_seq_lens_q` is passed when `padding_free` is set, and this error occurs only for the GPTBigCode model. Upon further investigation I saw that the error occurs when `padding_free` is set but not when `fast_kernels` is set by itself. My limited testing used a GPTBigCode model running full fine-tuning and LoRA tuning rather than QLoRA, but the same error appears for the other tuning types when `padding_free` is set; I get no error when `padding_free` is not set. It looks like additional parameters are being passed to `GPTBigCodeForCausalLM.forward()` that are not handled in transformers v4.48.1; support for them was added two weeks ago, as seen here.
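For reference, here is a minimal sketch of how the padding-free path can trigger this: the collator flattens sequences and attaches their boundaries as extra keyword arguments, which GPTBigCode's forward() rejects on v4.48.1. Only `cu_seq_lens_q` is confirmed by the traceback above; the other kwarg names and the small GPTBigCode checkpoint id are illustrative assumptions, not taken from this run.

```python
# Repro sketch, assuming transformers==4.48.1. Only `cu_seq_lens_q` is
# confirmed by the traceback above; the other padding-free kwargs and the
# checkpoint id are assumptions for illustration.
import inspect

import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("bigcode/gpt_bigcode-santacoder")

# A padding-free collator flattens sequences and passes their boundaries as
# extra keyword arguments instead of padding every example to a fixed length.
batch = {
    "input_ids": torch.tensor([[1, 2, 3, 4]]),
    "labels": torch.tensor([[1, 2, 3, 4]]),
    "cu_seq_lens_q": torch.tensor([0, 4], dtype=torch.int32),
    "cu_seq_lens_k": torch.tensor([0, 4], dtype=torch.int32),
    "max_length_q": 4,
    "max_length_k": 4,
}

# GPTBigCodeForCausalLM.forward() on v4.48.1 does not declare these kwargs:
print("cu_seq_lens_q accepted:",
      "cu_seq_lens_q" in inspect.signature(model.forward).parameters)

model(**batch)  # raises the TypeError shown in the traceback above
```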
Describe the bug
Running QLoRA finetuning of the granite-34b-code-base-gptq model fails with the following error (full traceback above):
TypeError: GPTBigCodeForCausalLM.forward() got an unexpected keyword argument 'cu_seq_lens_q'
Finetuning configuration:
Full Pod log: see the log output at the top of this issue.
Platform
fms-hf-tuning image:
quay.io/modh/fms-hf-tuning:v2.6.0
Trained model:
granite-34b-code-base-gptq-20241001T150701
Sample Code
Expected behavior
Training of the model passes successfully.
Observed behavior
Training failed, see description for logs.
Additional context
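Until a transformers version that accepts these kwargs for GPTBigCode is picked up in the image, the simplest mitigation is to not set `padding_free` for GPTBigCode models. A hypothetical stopgap (not part of fms-hf-tuning or transformers) would be to drop kwargs the model's `forward()` does not declare, at the cost of discarding the sequence-boundary information:

```python
# Hypothetical stopgap sketch, not from fms-hf-tuning or transformers:
# drop keyword arguments that the wrapped model's forward() does not declare
# (e.g. cu_seq_lens_q for GPTBigCode on transformers 4.48.1).
import inspect

def patch_forward_to_filter_kwargs(model):
    accepted = set(inspect.signature(model.forward).parameters)
    original_forward = model.forward

    def forward(*args, **kwargs):
        filtered = {k: v for k, v in kwargs.items() if k in accepted}
        return original_forward(*args, **filtered)

    # Instance-level override; nn.Module.__call__ resolves self.forward at call time.
    model.forward = forward
    return model
```

Note that this only avoids the crash: with the boundary kwargs discarded, attention is no longer restricted per example, so it is not a substitute for real padding-free support in the GPTBigCode modeling code.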