forward() got an unexpected keyword argument 'num_items_in_batch' #35838

Closed
2 of 4 tasks
Bachstelze opened this issue Jan 22, 2025 · 26 comments · Fixed by #35875

@Bachstelze

System Info

Recent transformers versions can't train encoder-decoder models.
Related issue and pull request: #34575
System info:

  • transformers version: 4.48.1
  • Platform: Linux-6.8.0-36-generic-x86_64-with-glibc2.39
  • Python version: 3.12.8
  • Huggingface_hub version: 0.24.6
  • Safetensors version: 0.4.5
  • Accelerate version: 1.2.1
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.4.1+cu121 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using distributed or parallel set-up in script?: No
  • Using GPU in script?: Yes
  • GPU type: Tesla V100-PCIE-32GB
Traceback (most recent call last):
  File "/home/hilsenbek/workspace/thesis/syntax_transformer/training/train_cross_attention.py", line 110, in <module>
    trainer.train()
  File "/home/hilsenbek/.conda/envs/harness/lib/python3.12/site-packages/transformers/trainer.py", line 2171, in train
    return inner_training_loop(
           ^^^^^^^^^^^^^^^^^^^^
  File "/home/hilsenbek/.conda/envs/harness/lib/python3.12/site-packages/transformers/trainer.py", line 2531, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs, num_items_in_batch)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hilsenbek/.conda/envs/harness/lib/python3.12/site-packages/transformers/trainer.py", line 3675, in training_step
    loss = self.compute_loss(model, inputs, num_items_in_batch=num_items_in_batch)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hilsenbek/.conda/envs/harness/lib/python3.12/site-packages/transformers/trainer.py", line 3731, in compute_loss
    outputs = model(**inputs)
              ^^^^^^^^^^^^^^^
  File "/home/hilsenbek/.conda/envs/harness/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hilsenbek/.conda/envs/harness/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hilsenbek/.conda/envs/harness/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 433, in _fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/home/hilsenbek/.conda/envs/harness/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hilsenbek/.conda/envs/harness/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hilsenbek/.conda/envs/harness/lib/python3.12/site-packages/accelerate/utils/operations.py", line 823, in forward
    return model_forward(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hilsenbek/.conda/envs/harness/lib/python3.12/site-packages/accelerate/utils/operations.py", line 811, in __call__
    return convert_to_fp32(self.model_forward(*args, **kwargs))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hilsenbek/.conda/envs/harness/lib/python3.12/site-packages/torch/amp/autocast_mode.py", line 43, in decorate_autocast
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/hilsenbek/.conda/envs/harness/lib/python3.12/site-packages/transformers/models/encoder_decoder/modeling_encoder_decoder.py", line 603, in forward
    encoder_outputs = self.encoder(
                      ^^^^^^^^^^^^^
  File "/home/hilsenbek/.conda/envs/harness/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hilsenbek/.conda/envs/harness/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: RobertaModel.forward() got an unexpected keyword argument 'num_items_in_batch'

Who can help?

@ArthurZucker
@gheinrich

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Follow the blog post https://huggingface.co/blog/encoder-decoder; a minimal sketch of the failing setup is included below.
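
A minimal sketch of the failing setup (assumptions: roberta-base checkpoints for both encoder and decoder, a toy in-memory dataset, and default Trainer settings; this is not the exact blog script):

# Minimal repro sketch. On transformers 4.48.1 the Trainer merges num_items_in_batch
# into the model inputs, EncoderDecoderModel.forward() passes the unknown key on to
# its encoder, and RobertaModel.forward() rejects it.
from transformers import AutoTokenizer, EncoderDecoderModel, Trainer, TrainingArguments

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = EncoderDecoderModel.from_encoder_decoder_pretrained("roberta-base", "roberta-base")
model.config.decoder_start_token_id = tokenizer.cls_token_id
model.config.pad_token_id = tokenizer.pad_token_id

# Tiny toy dataset: the failure comes from the Trainer call path, not the data.
texts = ["hello world", "encoder decoder training"]
enc = tokenizer(texts, padding=True, return_tensors="pt")
train_dataset = [
    {
        "input_ids": enc["input_ids"][i],
        "attention_mask": enc["attention_mask"][i],
        "labels": enc["input_ids"][i],
    }
    for i in range(len(texts))
]

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="tmp_enc_dec", per_device_train_batch_size=2, num_train_epochs=1),
    train_dataset=train_dataset,
)
trainer.train()  # TypeError: RobertaModel.forward() got an unexpected keyword argument 'num_items_in_batch'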

Expected behavior

Training should work as it did in older transformers versions.

@Bachstelze Bachstelze added the bug label Jan 22, 2025
@Rocketknight1
Member

This seems related to the trainer changes - cc @muellerzr @SunMarc

@shubhamjain0594

Getting the same error for the Gemma model.

@SilverSoldier
Contributor

Same for Bloom, which marks unexpected arguments as deprecated and throws ValueError: Got unexpected arguments: {'num_items_in_batch': 5120}.

These three lines seem to be causing the problem:

loss_kwargs["num_items_in_batch"] = num_items_in_batch
inputs = {**inputs, **loss_kwargs}
outputs = model(**inputs)
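
A possible user-side stopgap while waiting for the patch, sketched under the assumption that compute_loss only merges the kwarg into inputs when it is not None (which matches the conditional visible above): subclass Trainer so the count is never forwarded. Note that this gives up the corrected gradient-accumulation loss scaling discussed later in this thread.

from transformers import Trainer

class DropLossKwargTrainer(Trainer):
    # Stopgap sketch: never pass num_items_in_batch down, so the parent compute_loss
    # skips the kwarg merge and the model forward never sees the extra key.
    def compute_loss(self, model, inputs, return_outputs=False, num_items_in_batch=None):
        return super().compute_loss(model, inputs, return_outputs=return_outputs)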

@ArthurZucker
Collaborator

We'll do a patch as soon as there is a fix!

@SunMarc
Member

SunMarc commented Jan 23, 2025

Can you share the traceback for the Gemma model error, @shubhamjain0594?

For the Bloom error, this can easily be fixed by setting accepts_loss_kwargs = False in the Bloom modeling code. It happens because Bloom's forward allows passing kwargs, hence the issue.

For the encoder-decoder model, this is because we allow passing **kwargs in the forward and kwargs_encoder is not set correctly.

I'll let @muellerzr decide how to fix these. Maybe the easiest fix would be to just set accepts_loss_kwargs = True for models that support it.
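
A rough illustration of the mismatch described above, using plain inspect on the public classes rather than the actual Trainer internals: EncoderDecoderModel.forward() accepts **kwargs, so a signature check concludes it can take loss kwargs, while the wrapped RobertaModel.forward() cannot.

import inspect
from transformers import EncoderDecoderModel, RobertaModel

def forward_accepts_var_kwargs(cls):
    # True if the class's forward() declares a **kwargs catch-all parameter.
    params = inspect.signature(cls.forward).parameters.values()
    return any(p.kind == inspect.Parameter.VAR_KEYWORD for p in params)

print(forward_accepts_var_kwargs(EncoderDecoderModel))  # True: looks like it can take loss kwargs
print(forward_accepts_var_kwargs(RobertaModel))         # False: rejects num_items_in_batch when it is forwarded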

@shubhamjain0594

[Image: screenshot of the Gemma model traceback]

@SunMarc here you go. Does this help?

@SunMarc
Member

SunMarc commented Jan 23, 2025

Yeah, thanks! The issue comes from the attention refactor PR, where we pass **kwargs both in the loss calculation and in the model. cc @muellerzr

@muellerzr muellerzr self-assigned this Jan 23, 2025
@muellerzr
Contributor

@shubhamjain0594 can you post a repr of the model you're using and how the Trainer is configured? I can't recreate this with "google/gemma-2-2b-it"

# End-to-end script running the Hugging Face Trainer
# for causal language modeling. Based on the Tasks documentation
# originally from: https://hf.co/docs/transformers/tasks/language_modeling
from accelerate import PartialState
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

# Constants
model_name = "google/gemma-2-2b-it"
dataset_name = "wikitext"
dataset_config = "wikitext-2-raw-v1"

# Load dataset
print(f"Downloading dataset ({dataset_name})")
dataset = load_dataset(dataset_name, dataset_config, split="train[:500]")
dataset = dataset.train_test_split(test_size=0.2)

# Tokenize the dataset
tokenizer = AutoTokenizer.from_pretrained(model_name)


def tokenize_function(examples):
    return tokenizer(examples["text"])


print(f"Tokenizing dataset for {model_name}...")
tokenized_dataset = dataset.map(tokenize_function, batched=True, remove_columns=dataset["train"].column_names)

# We still need to concatenate our sequences
# and split them into shorter chunks to keep
# RAM usage minimal
block_size = 128


def group_texts(examples):
    # Concatenate all texts.
    concatenated_examples = {k: sum(examples[k], []) for k in examples.keys()}
    total_length = len(concatenated_examples[list(examples.keys())[0]])
    # We drop the small remainder; we could add padding instead of dropping if the model supported it.
    # You can customize this part to your needs.
    if total_length >= block_size:
        total_length = (total_length // block_size) * block_size
    # Split by chunks of block_size.
    result = {
        k: [t[i : i + block_size] for i in range(0, total_length, block_size)]
        for k, t in concatenated_examples.items()
    }
    result["labels"] = result["input_ids"].copy()
    return result


# And apply
tokenized_dataset = tokenized_dataset.map(group_texts, batched=True)

# Create an efficient collator which dynamically pads.
# The end-of-sequence token is used as the padding token, and mlm=False will
# use the inputs as labels, shifted to the right by one element.
tokenizer.pad_token = tokenizer.eos_token
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

print(f"Instantiating model ({model_name})...")
model = AutoModelForCausalLM.from_pretrained(model_name)

# Define the hyperparameters in the TrainingArguments
print("Creating training arguments (weights are stored at `results/causal_language_modeling`)...")
training_args = TrainingArguments(
    output_dir="results/causal_language_modeling",  # Where weights are stored
    learning_rate=2e-5,  # The learning rate during training
    per_device_train_batch_size=1,  # Number of samples per batch during training
    per_device_eval_batch_size=1,  # Number of samples per batch during evaluation
    gradient_accumulation_steps=2,
    num_train_epochs=2,  # How many iterations through the dataloaders should be done
    weight_decay=0.01,  # Regularization penalization
)

# Create the `Trainer`, passing in the model and arguments
# the datasets to train on, how the data should be collated,
# and the method for computing our metrics
print("Creating `Trainer`...")
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset["train"],
    eval_dataset=tokenized_dataset["test"],
    data_collator=data_collator,
)

# Initiate training
print("Training...")
trainer.train()

@muellerzr
Contributor

Or @SilverSoldier or @Bachstelze

@shubhamjain0594

shubhamjain0594 commented Jan 23, 2025

@shubhamjain0594 can you post a repr of the model you're using and how the Trainer is configured? I can't recreate this with "google/gemma-2-2b-it"

I am using google/gemma-1.1-2b-it, maybe that is the difference?

@muellerzr
Contributor

@shubhamjain0594 I can recreate it with that model, thanks! :)

@muellerzr
Contributor

Essentially this stems from certain model forwards not accepting kwargs, which is an issue on our end.

Said problem models:

FAILED tests/models/bamba/test_modeling_bamba.py::BambaModelTest::test_training_gradient_accumulation - TypeError: BambaModel.forward() got an unexpected keyword argument 'num_items_in_batch'
FAILED tests/models/bert_generation/test_modeling_bert_generation.py::BertGenerationEncoderTest::test_training_gradient_accumulation - TypeError: BertGenerationDecoder.forward() got an unexpected keyword argument 'num_items_in_batch'
FAILED tests/models/big_bird/test_modeling_big_bird.py::BigBirdModelTest::test_training_gradient_accumulation - TypeError: BigBirdForCausalLM.forward() got an unexpected keyword argument 'num_items_in_batch'
FAILED tests/models/biogpt/test_modeling_biogpt.py::BioGptModelTest::test_training_gradient_accumulation - TypeError: BioGptForCausalLM.forward() got an unexpected keyword argument 'num_items_in_batch'
FAILED tests/models/bloom/test_modeling_bloom.py::BloomModelTest::test_training_gradient_accumulation - ValueError: Got unexpected arguments: {'num_items_in_batch': 14}
FAILED tests/models/codegen/test_modeling_codegen.py::CodeGenModelTest::test_training_gradient_accumulation - TypeError: CodeGenForCausalLM.forward() got an unexpected keyword argument 'num_items_in_batch'
FAILED tests/models/ctrl/test_modeling_ctrl.py::CTRLModelTest::test_training_gradient_accumulation - TypeError: CTRLLMHeadModel.forward() got an unexpected keyword argument 'num_items_in_batch'
FAILED tests/models/data2vec/test_modeling_data2vec_text.py::Data2VecTextModelTest::test_training_gradient_accumulation - TypeError: Data2VecTextForCausalLM.forward() got an unexpected keyword argument 'num_items_in_batch'
FAILED tests/models/dbrx/test_modeling_dbrx.py::DbrxModelTest::test_training_gradient_accumulation - TypeError: DbrxForCausalLM.forward() got an unexpected keyword argument 'num_items_in_batch'
FAILED tests/models/electra/test_modeling_electra.py::ElectraModelTest::test_training_gradient_accumulation - TypeError: ElectraForCausalLM.forward() got an unexpected keyword argument 'num_items_in_batch'
FAILED tests/models/ernie/test_modeling_ernie.py::ErnieModelTest::test_training_gradient_accumulation - TypeError: ErnieForCausalLM.forward() got an unexpected keyword argument 'num_items_in_batch'
FAILED tests/models/falcon/test_modeling_falcon.py::FalconModelTest::test_training_gradient_accumulation - TypeError: FalconForCausalLM.forward() got an unexpected keyword argument 'num_items_in_batch'
FAILED tests/models/fuyu/test_modeling_fuyu.py::FuyuModelTest::test_training_gradient_accumulation - TypeError: FuyuForCausalLM.forward() got an unexpected keyword argument 'num_items_in_batch'
FAILED tests/models/gemma/test_modeling_gemma.py::GemmaModelTest::test_training_gradient_accumulation - TypeError: GemmaModel.forward() got an unexpected keyword argument 'num_items_in_batch'
FAILED tests/models/gemma2/test_modeling_gemma2.py::GemmaModelTest::test_training_gradient_accumulation - TypeError: GemmaModel.forward() got an unexpected keyword argument 'num_items_in_batch'
FAILED tests/models/git/test_modeling_git.py::GitModelTest::test_training_gradient_accumulation - TypeError: GitForCausalLM.forward() got an unexpected keyword argument 'num_items_in_batch'
FAILED tests/models/gpt2/test_modeling_gpt2.py::GPT2ModelTest::test_training_gradient_accumulation - TypeError: GPT2LMHeadModel.forward() got an unexpected keyword argument 'num_items_in_batch'
FAILED tests/models/gpt_bigcode/test_modeling_gpt_bigcode.py::GPTBigCodeModelTest::test_training_gradient_accumulation - TypeError: GPTBigCodeForCausalLM.forward() got an unexpected keyword argument 'num_items_in_batch'
FAILED tests/models/gpt_bigcode/test_modeling_gpt_bigcode.py::GPTBigCodeMHAModelTest::test_training_gradient_accumulation - TypeError: GPTBigCodeForCausalLM.forward() got an unexpected keyword argument 'num_items_in_batch'
FAILED tests/models/gpt_neo/test_modeling_gpt_neo.py::GPTNeoModelTest::test_training_gradient_accumulation - TypeError: GPTNeoForCausalLM.forward() got an unexpected keyword argument 'num_items_in_batch'
FAILED tests/models/gpt_neox/test_modeling_gpt_neox.py::GPTNeoXModelTest::test_training_gradient_accumulation - TypeError: GPTNeoXForCausalLM.forward() got an unexpected keyword argument 'num_items_in_batch'
FAILED tests/models/gpt_neox_japanese/test_modeling_gpt_neox_japanese.py::GPTNeoXModelJapaneseTest::test_training_gradient_accumulation - TypeError: GPTNeoXJapaneseForCausalLM.forward() got an unexpected keyword argument 'num_items_in_batch'
FAILED tests/models/gptj/test_modeling_gptj.py::GPTJModelTest::test_training_gradient_accumulation - TypeError: GPTJForCausalLM.forward() got an unexpected keyword argument 'num_items_in_batch'
FAILED tests/models/granitemoe/test_modeling_granitemoe.py::GraniteMoeModelTest::test_training_gradient_accumulation - TypeError: GraniteMoeForCausalLM.forward() got an unexpected keyword argument 'num_items_in_batch'
FAILED tests/models/helium/test_modeling_helium.py::GemmaModelTest::test_training_gradient_accumulation - TypeError: GemmaModel.forward() got an unexpected keyword argument 'num_items_in_batch'
FAILED tests/models/jetmoe/test_modeling_jetmoe.py::JetMoeModelTest::test_training_gradient_accumulation - TypeError: JetMoeForCausalLM.forward() got an unexpected keyword argument 'num_items_in_batch'
FAILED tests/models/megatron_bert/test_modeling_megatron_bert.py::MegatronBertModelTest::test_training_gradient_accumulation - TypeError: MegatronBertForCausalLM.forward() got an unexpected keyword argument 'num_items_in_batch'
FAILED tests/models/mllama/test_modeling_mllama.py::MllamaForCausalLMModelTest::test_training_gradient_accumulation - ValueError: Unrecognized configuration class <class 'transformers.models.mllama.configuration_mllama.MllamaTextConfig'> for this kind of AutoModel: AutoModelForCausalLM.
FAILED tests/models/moshi/test_modeling_moshi.py::MoshiDecoderTest::test_training_gradient_accumulation - TypeError: MoshiForCausalLM.forward() got an unexpected keyword argument 'num_items_in_batch'
FAILED tests/models/mpt/test_modeling_mpt.py::MptModelTest::test_training_gradient_accumulation - TypeError: MptForCausalLM.forward() got an unexpected keyword argument 'num_items_in_batch'
FAILED tests/models/musicgen/test_modeling_musicgen.py::MusicgenDecoderTest::test_training_gradient_accumulation - ValueError: Unrecognized configuration class <class 'transformers.models.musicgen.configuration_musicgen.MusicgenDecoderConfig'> for this kind of AutoModel: AutoModelForCausalLM.
FAILED tests/models/musicgen_melody/test_modeling_musicgen_melody.py::MusicgenMelodyDecoderTest::test_training_gradient_accumulation - ValueError: Unrecognized configuration class <class 'transformers.models.musicgen_melody.configuration_musicgen_melody.MusicgenMelodyDecoderConfig'> for this kind of AutoModel: ...
FAILED tests/models/nemotron/test_modeling_nemotron.py::GemmaModelTest::test_training_gradient_accumulation - TypeError: GemmaModel.forward() got an unexpected keyword argument 'num_items_in_batch'
FAILED tests/models/openai/test_modeling_openai.py::OpenAIGPTModelTest::test_training_gradient_accumulation - TypeError: OpenAIGPTLMHeadModel.forward() got an unexpected keyword argument 'num_items_in_batch'
FAILED tests/models/opt/test_modeling_opt.py::OPTModelTest::test_training_gradient_accumulation - TypeError: OPTForCausalLM.forward() got an unexpected keyword argument 'num_items_in_batch'
FAILED tests/models/persimmon/test_modeling_persimmon.py::PersimmonModelTest::test_training_gradient_accumulation - TypeError: PersimmonForCausalLM.forward() got an unexpected keyword argument 'num_items_in_batch'
FAILED tests/models/recurrent_gemma/test_modeling_recurrent_gemma.py::RecurrentGemmaModelTest::test_training_gradient_accumulation - TypeError: RecurrentGemmaForCausalLM.forward() got an unexpected keyword argument 'num_items_in_batch'
FAILED tests/models/reformer/test_modeling_reformer.py::ReformerLocalAttnModelTest::test_training_gradient_accumulation - TypeError: ReformerModelWithLMHead.forward() got an unexpected keyword argument 'num_items_in_batch'
FAILED tests/models/rembert/test_modeling_rembert.py::RemBertModelTest::test_training_gradient_accumulation - TypeError: RemBertForCausalLM.forward() got an unexpected keyword argument 'num_items_in_batch'
FAILED tests/models/roberta/test_modeling_roberta.py::RobertaModelTest::test_training_gradient_accumulation - TypeError: RobertaForCausalLM.forward() got an unexpected keyword argument 'num_items_in_batch'
FAILED tests/models/roberta_prelayernorm/test_modeling_roberta_prelayernorm.py::RobertaPreLayerNormModelTest::test_training_gradient_accumulation - TypeError: RobertaPreLayerNormForCausalLM.forward() got an unexpected keyword argument 'num_items_in_batch'
FAILED tests/models/roc_bert/test_modeling_roc_bert.py::RoCBertModelTest::test_training_gradient_accumulation - TypeError: RoCBertForCausalLM.forward() got an unexpected keyword argument 'num_items_in_batch'
FAILED tests/models/roformer/test_modeling_roformer.py::RoFormerModelTest::test_training_gradient_accumulation - TypeError: RoFormerForCausalLM.forward() got an unexpected keyword argument 'num_items_in_batch'
FAILED tests/models/rwkv/test_modeling_rwkv.py::RwkvModelTest::test_training_gradient_accumulation - TypeError: RwkvForCausalLM.forward() got an unexpected keyword argument 'num_items_in_batch'
FAILED tests/models/stablelm/test_modeling_stablelm.py::StableLmModelTest::test_training_gradient_accumulation - TypeError: StableLmForCausalLM.forward() got an unexpected keyword argument 'num_items_in_batch'
FAILED tests/models/xglm/test_modeling_xglm.py::XGLMModelTest::test_training_gradient_accumulation - TypeError: XGLMForCausalLM.forward() got an unexpected keyword argument 'num_items_in_batch'
FAILED tests/models/xlm/test_modeling_xlm.py::XLMModelTest::test_training_gradient_accumulation - TypeError: XLMWithLMHeadModel.forward() got an unexpected keyword argument 'num_items_in_batch'
FAILED tests/models/xlm_roberta_xl/test_modeling_xlm_roberta_xl.py::XLMRobertaXLModelTest::test_training_gradient_accumulation - TypeError: XLMRobertaXLForCausalLM.forward() got an unexpected keyword argument 'num_items_in_batch'
FAILED tests/models/xmod/test_modeling_xmod.py::XmodModelTest::test_training_gradient_accumulation - TypeError: XmodForCausalLM.forward() got an unexpected keyword argument 'num_items_in_batch'
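
Schematically, the failures above come down to this signature difference (illustrative module names, not actual transformers code):

import torch.nn as nn

class LegacyForward(nn.Module):
    # No catch-all: passing num_items_in_batch raises
    # TypeError: forward() got an unexpected keyword argument 'num_items_in_batch'
    def forward(self, input_ids, labels=None):
        ...

class KwargsAwareForward(nn.Module):
    # Catch-all present: the extra key is tolerated and can feed the loss computation.
    def forward(self, input_ids, labels=None, **kwargs):
        num_items_in_batch = kwargs.get("num_items_in_batch")
        ...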

@muellerzr muellerzr mentioned this issue Jan 24, 2025
@techkang
Contributor

I think the better way to fix this bug is to delete:

**kwargs: Unpack[KwargsForCausalLM],

and

Forcefully enabling variable args but not using num_items_in_batch for the loss calculation will make the training loss gradient_accumulation_steps times larger than before.
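
A toy numeric illustration of that scaling point (simplified mechanics, assuming equal-sized micro-batches and a cross-entropy loss): if per-micro-batch mean losses are simply summed over the accumulated steps instead of sum-reduced losses being divided once by num_items_in_batch, the result is gradient_accumulation_steps times larger.

import torch
import torch.nn.functional as F

torch.manual_seed(0)
grad_accum_steps = 4
logits = [torch.randn(8, 10) for _ in range(grad_accum_steps)]  # 8 tokens, vocab of 10 per micro-batch
labels = [torch.randint(0, 10, (8,)) for _ in range(grad_accum_steps)]
num_items_in_batch = sum(y.numel() for y in labels)             # 32 tokens over the accumulated batch

# Intended scaling: sum-reduced loss divided once by the global token count.
correct = sum(F.cross_entropy(x, y, reduction="sum") for x, y in zip(logits, labels)) / num_items_in_batch

# A model that ignores num_items_in_batch returns per-micro-batch means, which get summed.
inflated = sum(F.cross_entropy(x, y, reduction="mean") for x, y in zip(logits, labels))

print(inflated / correct)  # ratio is exactly grad_accum_steps here, since micro-batches are equal-sized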

@muellerzr
Contributor

@techkang that is not possible, as **kwargs will be needed on models from here on out for different reasons

@techkang
Contributor

techkang commented Jan 27, 2025

@muellerzr I misunderstood your pull requests earlier. Most of the models can handle num_items_in_batch correctly after your PR. However, some models that take **kwargs but don't handle this argument will still hit the problem that the training loss becomes gradient_accumulation_steps times larger than before if the loss is calculated inside the model. I think we should only allow models to use **kwargs when they can handle num_items_in_batch correctly.

@muellerzr
Contributor

muellerzr commented Jan 27, 2025

@techkang I'm updating all models that do this (the old way) to do so as part of that PR. It's a big undertaking, but it's what we're going to be doing going forward.

(I'll also ensure that current models that take **kwargs that need it are supported, thanks for the catch there :) )

@Bachstelze
Author

Since which version does this bug occur? Is there a stable version with ModernBERT?

@Bachstelze
Author

Is it normal that some checks were not successful? Or should the pull request be tested with my own settings?

@Pappasad

Pappasad commented Feb 6, 2025

Has this been fixed yet? I am getting this error with TrOCR.

@Bachstelze
Author

I am also still getting this error with ModernBERT:

TypeError: ModernBertModel.forward() got an unexpected keyword argument 'num_items_in_batch'

@Bachstelze
Author

Why can't I reopen my own issue?
The error is still the same for RoBERTa in an encoder-decoder model:

TypeError                                 Traceback (most recent call last)

<ipython-input-10-a5a4ead42e3b> in <cell line: 0>()
    131     optimizers=(adam, lr_scheduler)
    132 )
--> 133 trainer.train()
    134 print("training finished", flush=True)
    135 #wandb.finish()

8 frames

/usr/local/lib/python3.11/dist-packages/transformers/trainer.py in train(self, resume_from_checkpoint, trial, ignore_keys_for_eval, **kwargs)
   2183                 hf_hub_utils.enable_progress_bars()
   2184         else:
-> 2185             return inner_training_loop(
   2186                 args=args,
   2187                 resume_from_checkpoint=resume_from_checkpoint,

/usr/local/lib/python3.11/dist-packages/transformers/trainer.py in _inner_training_loop(self, batch_size, args, resume_from_checkpoint, trial, ignore_keys_for_eval)
   2489                     )
   2490                     with context():
-> 2491                         tr_loss_step = self.training_step(model, inputs, num_items_in_batch)
   2492 
   2493                     if (

/usr/local/lib/python3.11/dist-packages/transformers/trainer.py in training_step(self, model, inputs, num_items_in_batch)
   3608 
   3609         with self.compute_loss_context_manager():
-> 3610             loss = self.compute_loss(model, inputs, num_items_in_batch=num_items_in_batch)
   3611 
   3612         del inputs

/usr/local/lib/python3.11/dist-packages/transformers/trainer.py in compute_loss(self, model, inputs, return_outputs, num_items_in_batch)
   3669                 loss_kwargs["num_items_in_batch"] = num_items_in_batch
   3670             inputs = {**inputs, **loss_kwargs}
-> 3671         outputs = model(**inputs)
   3672         # Save past state if it exists
   3673         # TODO: this needs to be fixed and made cleaner later.

/usr/local/lib/python3.11/dist-packages/torch/nn/modules/module.py in _wrapped_call_impl(self, *args, **kwargs)
   1734             return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
   1735         else:
-> 1736             return self._call_impl(*args, **kwargs)
   1737 
   1738     # torchrec tests the code consistency with the following code

/usr/local/lib/python3.11/dist-packages/torch/nn/modules/module.py in _call_impl(self, *args, **kwargs)
   1745                 or _global_backward_pre_hooks or _global_backward_hooks
   1746                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1747             return forward_call(*args, **kwargs)
   1748 
   1749         result = None

/usr/local/lib/python3.11/dist-packages/transformers/models/encoder_decoder/modeling_encoder_decoder.py in forward(self, input_ids, attention_mask, decoder_input_ids, decoder_attention_mask, encoder_outputs, past_key_values, inputs_embeds, decoder_inputs_embeds, labels, use_cache, output_attentions, output_hidden_states, return_dict, **kwargs)
    601 
    602         if encoder_outputs is None:
--> 603             encoder_outputs = self.encoder(
    604                 input_ids=input_ids,
    605                 attention_mask=attention_mask,

/usr/local/lib/python3.11/dist-packages/torch/nn/modules/module.py in _wrapped_call_impl(self, *args, **kwargs)
   1734             return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
   1735         else:
-> 1736             return self._call_impl(*args, **kwargs)
   1737 
   1738     # torchrec tests the code consistency with the following code

/usr/local/lib/python3.11/dist-packages/torch/nn/modules/module.py in _call_impl(self, *args, **kwargs)
   1745                 or _global_backward_pre_hooks or _global_backward_hooks
   1746                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1747             return forward_call(*args, **kwargs)
   1748 
   1749         result = None

TypeError: RobertaModel.forward() got an unexpected keyword argument 'num_items_in_batch'

@muellerzr
Contributor

Thanks, looking into it.

@anyaschenikova

for me too:

TypeError: Qwen2VLForConditionalGeneration.forward() got an unexpected keyword argument 'num_items_in_batch'

@ArthurZucker
Collaborator

😾 @muellerzr let's improve test coverage and make sure we fix all of them for the release!

@dcm-kouki-eguchi

I am also still getting this error with GPT2Model.

File /opt/conda/lib/python3.11/site-packages/transformers/trainer.py:2241, in Trainer.train(self, resume_from_checkpoint, trial, ignore_keys_for_eval, **kwargs)
   2239         hf_hub_utils.enable_progress_bars()
   2240 else:
-> 2241     return inner_training_loop(
   2242         args=args,
   2243         resume_from_checkpoint=resume_from_checkpoint,
   2244         trial=trial,
   2245         ignore_keys_for_eval=ignore_keys_for_eval,
   2246     )

File /opt/conda/lib/python3.11/site-packages/transformers/trainer.py:2548, in Trainer._inner_training_loop(self, batch_size, args, resume_from_checkpoint, trial, ignore_keys_for_eval)
   2541 context = (
   2542     functools.partial(self.accelerator.no_sync, model=model)
   2543     if i != len(batch_samples) - 1
   2544     and self.accelerator.distributed_type != DistributedType.DEEPSPEED
   2545     else contextlib.nullcontext
   2546 )
   2547 with context():
-> 2548     tr_loss_step = self.training_step(model, inputs, num_items_in_batch)
   2550 if (
   2551     args.logging_nan_inf_filter
   2552     and not is_torch_xla_available()
   2553     and (torch.isnan(tr_loss_step) or torch.isinf(tr_loss_step))
   2554 ):
   2555     # if loss is nan or inf simply add the average of previous logged losses
   2556     tr_loss = tr_loss + tr_loss / (1 + self.state.global_step - self._globalstep_last_logged)

File /opt/conda/lib/python3.11/site-packages/transformers/trainer.py:3698, in Trainer.training_step(self, model, inputs, num_items_in_batch)
   3695     return loss_mb.reduce_mean().detach().to(self.args.device)
   3697 with self.compute_loss_context_manager():
-> 3698     loss = self.compute_loss(model, inputs, num_items_in_batch=num_items_in_batch)
   3700 del inputs
   3701 if (
   3702     self.args.torch_empty_cache_steps is not None
   3703     and self.state.global_step % self.args.torch_empty_cache_steps == 0
   3704 ):

File /opt/conda/lib/python3.11/site-packages/transformers/trainer.py:3759, in Trainer.compute_loss(self, model, inputs, return_outputs, num_items_in_batch)
   3757         loss_kwargs["num_items_in_batch"] = num_items_in_batch
   3758     inputs = {**inputs, **loss_kwargs}
-> 3759 outputs = model(**inputs)
   3760 # Save past state if it exists
   3761 # TODO: this needs to be fixed and made cleaner later.
   3762 if self.args.past_index >= 0:

File /opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py:1739, in Module._wrapped_call_impl(self, *args, **kwargs)
   1737     return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
   1738 else:
-> 1739     return self._call_impl(*args, **kwargs)

File /opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py:1750, in Module._call_impl(self, *args, **kwargs)
   1745 # If we don't have any hooks, we want to skip the rest of the logic in
   1746 # this function, and just call forward.
   1747 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1748         or _global_backward_pre_hooks or _global_backward_hooks
   1749         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1750     return forward_call(*args, **kwargs)
   1752 result = None
   1753 called_always_called_hooks = set()

File /opt/conda/lib/python3.11/site-packages/transformers/models/encoder_decoder/modeling_encoder_decoder.py:603, in EncoderDecoderModel.forward(self, input_ids, attention_mask, decoder_input_ids, decoder_attention_mask, encoder_outputs, past_key_values, inputs_embeds, decoder_inputs_embeds, labels, use_cache, output_attentions, output_hidden_states, return_dict, **kwargs)
    598 kwargs_decoder = {
    599     argument[len("decoder_") :]: value for argument, value in kwargs.items() if argument.startswith("decoder_")
    600 }
    602 if encoder_outputs is None:
--> 603     encoder_outputs = self.encoder(
    604         input_ids=input_ids,
    605         attention_mask=attention_mask,
    606         inputs_embeds=inputs_embeds,
    607         output_attentions=output_attentions,
    608         output_hidden_states=output_hidden_states,
    609         return_dict=return_dict,
    610         **kwargs_encoder,
    611     )
    612 elif isinstance(encoder_outputs, tuple):
    613     encoder_outputs = BaseModelOutput(*encoder_outputs)

File /opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py:1739, in Module._wrapped_call_impl(self, *args, **kwargs)
   1737     return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
   1738 else:
-> 1739     return self._call_impl(*args, **kwargs)

File /opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py:1750, in Module._call_impl(self, *args, **kwargs)
   1745 # If we don't have any hooks, we want to skip the rest of the logic in
   1746 # this function, and just call forward.
   1747 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1748         or _global_backward_pre_hooks or _global_backward_hooks
   1749         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1750     return forward_call(*args, **kwargs)
   1752 result = None
   1753 called_always_called_hooks = set()

TypeError: GPT2Model.forward() got an unexpected keyword argument 'num_items_in_batch'

@Lagniappe52

Also getting this error with MllamaForConditionalGeneration()

Traceback (most recent call last):
  File "/home/research/Ophthalmology/Ophthal_UNet_model.py", line 203, in <module>
    main()
  File "/home/research/Ophthalmology/Ophthal_UNet_model.py", line 200, in main
    trainer.train()
  File "/home/research/Ophthalmology/.venv/lib64/python3.11/site-packages/transformers/trainer.py", line 2241, in train
    return inner_training_loop(
           ^^^^^^^^^^^^^^^^^^^^
  File "/home/research/Ophthalmology/.venv/lib64/python3.11/site-packages/transformers/trainer.py", line 2548, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs, num_items_in_batch)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/research/Ophthalmology/.venv/lib64/python3.11/site-packages/transformers/trainer.py", line 3698, in training_step
    loss = self.compute_loss(model, inputs, num_items_in_batch=num_items_in_batch)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/research/Ophthalmology/.venv/lib64/python3.11/site-packages/transformers/trainer.py", line 3759, in compute_loss
    outputs = model(**inputs)
              ^^^^^^^^^^^^^^^
  File "/home/research/Ophthalmology/.venv/lib64/python3.11/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/research/Ophthalmology/.venv/lib64/python3.11/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/research/Ophthalmology/.venv/lib64/python3.11/site-packages/torch/_dynamo/eval_frame.py", line 465, in _fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/home/research/Ophthalmology/.venv/lib64/python3.11/site-packages/torch/_dynamo/external_utils.py", line 40, in inner
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/home/research/Ophthalmology/.venv/lib64/python3.11/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/research/Ophthalmology/.venv/lib64/python3.11/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/research/Ophthalmology/.venv/lib64/python3.11/site-packages/torch/nn/parallel/distributed.py", line 1643, in forward
    else self._run_ddp_forward(*inputs, **kwargs)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/research/Ophthalmology/.venv/lib64/python3.11/site-packages/torch/nn/parallel/distributed.py", line 1459, in _run_ddp_forward
    return self.module(*inputs, **kwargs)  # type: ignore[index]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/research/Ophthalmology/.venv/lib64/python3.11/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/research/Ophthalmology/.venv/lib64/python3.11/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/research/Ophthalmology/.venv/lib64/python3.11/site-packages/accelerate/utils/operations.py", line 823, in forward
    return model_forward(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/research/Ophthalmology/.venv/lib64/python3.11/site-packages/accelerate/utils/operations.py", line 811, in __call__
    return convert_to_fp32(self.model_forward(*args, **kwargs))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/research/Ophthalmology/.venv/lib64/python3.11/site-packages/torch/_dynamo/external_utils.py", line 40, in inner
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/home/research/Ophthalmology/.venv/lib64/python3.11/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/research/Ophthalmology/.venv/lib64/python3.11/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/research/Ophthalmology/.venv/lib64/python3.11/site-packages/torch/nn/parallel/distributed.py", line 1643, in forward
    else self._run_ddp_forward(*inputs, **kwargs)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/research/Ophthalmology/.venv/lib64/python3.11/site-packages/torch/nn/parallel/distributed.py", line 1459, in _run_ddp_forward
    return self.module(*inputs, **kwargs)  # type: ignore[index]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/research/Ophthalmology/.venv/lib64/python3.11/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/research/Ophthalmology/.venv/lib64/python3.11/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/research/Ophthalmology/.venv/lib64/python3.11/site-packages/accelerate/utils/operations.py", line 823, in forward
    return model_forward(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/research/Ophthalmology/.venv/lib64/python3.11/site-packages/accelerate/utils/operations.py", line 811, in __call__
    return convert_to_fp32(self.model_forward(*args, **kwargs))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/research/Ophthalmology/.venv/lib64/python3.11/site-packages/torch/amp/autocast_mode.py", line 44, in decorate_autocast
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/research/Ophthalmology/.venv/lib64/python3.11/site-packages/peft/peft_model.py", line 847, in forward
    with self._enable_peft_forward_hooks(*args, **kwargs):
  File "/home/research/Ophthalmology/.venv/lib64/python3.11/site-packages/peft/peft_model.py", line 849, in torch_dynamo_resume_in_forward_at_847
    return self.get_base_model()(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/research/Ophthalmology/.venv/lib64/python3.11/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/research/Ophthalmology/.venv/lib64/python3.11/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/research/Ophthalmology/.venv/lib64/python3.11/site-packages/transformers/utils/deprecation.py", line 172, in wrapped_func
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
TypeError: MllamaForConditionalGeneration.forward() got an unexpected keyword argument 'num_items_in_batch'
