forward() got an unexpected keyword argument 'num_items_in_batch' #35838

Closed
2 of 4 tasks
Bachstelze opened this issue Jan 22, 2025 · 26 comments · Fixed by #35875

@Bachstelze

System Info

Recent transformers versions can't train encoder-decoder models.
Related issue and pull request: #34575
System info:

  • transformers version: 4.48.1
  • Platform: Linux-6.8.0-36-generic-x86_64-with-glibc2.39
  • Python version: 3.12.8
  • Huggingface_hub version: 0.24.6
  • Safetensors version: 0.4.5
  • Accelerate version: 1.2.1
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.4.1+cu121 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using distributed or parallel set-up in script?: No
  • Using GPU in script?: Yes
  • GPU type: Tesla V100-PCIE-32GB
Traceback (most recent call last):
  File "/home/hilsenbek/workspace/thesis/syntax_transformer/training/train_cross_attention.py", line 110, in <module>
    trainer.train()
  File "/home/hilsenbek/.conda/envs/harness/lib/python3.12/site-packages/transformers/trainer.py", line 2171, in train
    return inner_training_loop(
           ^^^^^^^^^^^^^^^^^^^^
  File "/home/hilsenbek/.conda/envs/harness/lib/python3.12/site-packages/transformers/trainer.py", line 2531, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs, num_items_in_batch)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hilsenbek/.conda/envs/harness/lib/python3.12/site-packages/transformers/trainer.py", line 3675, in training_step
    loss = self.compute_loss(model, inputs, num_items_in_batch=num_items_in_batch)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hilsenbek/.conda/envs/harness/lib/python3.12/site-packages/transformers/trainer.py", line 3731, in compute_loss
    outputs = model(**inputs)
              ^^^^^^^^^^^^^^^
  File "/home/hilsenbek/.conda/envs/harness/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hilsenbek/.conda/envs/harness/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hilsenbek/.conda/envs/harness/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 433, in _fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/home/hilsenbek/.conda/envs/harness/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hilsenbek/.conda/envs/harness/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hilsenbek/.conda/envs/harness/lib/python3.12/site-packages/accelerate/utils/operations.py", line 823, in forward
    return model_forward(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hilsenbek/.conda/envs/harness/lib/python3.12/site-packages/accelerate/utils/operations.py", line 811, in __call__
    return convert_to_fp32(self.model_forward(*args, **kwargs))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hilsenbek/.conda/envs/harness/lib/python3.12/site-packages/torch/amp/autocast_mode.py", line 43, in decorate_autocast
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/hilsenbek/.conda/envs/harness/lib/python3.12/site-packages/transformers/models/encoder_decoder/modeling_encoder_decoder.py", line 603, in forward
    encoder_outputs = self.encoder(
                      ^^^^^^^^^^^^^
  File "/home/hilsenbek/.conda/envs/harness/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hilsenbek/.conda/envs/harness/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: RobertaModel.forward() got an unexpected keyword argument 'num_items_in_batch'

Who can help?

@ArthurZucker
@gheinrich

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Follow the blog post https://huggingface.co/blog/encoder-decoder; a minimal sketch of the failing setup is included below.
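
A minimal sketch of the failing setup (assumptions: roberta-base checkpoints for both encoder and decoder, a toy in-memory dataset, and default Trainer settings; this is not the exact blog script):

# Minimal repro sketch. On transformers 4.48.1 the Trainer merges num_items_in_batch
# into the model inputs, EncoderDecoderModel.forward() passes the unknown key on to
# its encoder, and RobertaModel.forward() rejects it.
from transformers import AutoTokenizer, EncoderDecoderModel, Trainer, TrainingArguments

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = EncoderDecoderModel.from_encoder_decoder_pretrained("roberta-base", "roberta-base")
model.config.decoder_start_token_id = tokenizer.cls_token_id
model.config.pad_token_id = tokenizer.pad_token_id

# Tiny toy dataset: the failure comes from the Trainer call path, not the data.
texts = ["hello world", "encoder decoder training"]
enc = tokenizer(texts, padding=True, return_tensors="pt")
train_dataset = [
    {
        "input_ids": enc["input_ids"][i],
        "attention_mask": enc["attention_mask"][i],
        "labels": enc["input_ids"][i],
    }
    for i in range(len(texts))
]

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="tmp_enc_dec", per_device_train_batch_size=2, num_train_epochs=1),
    train_dataset=train_dataset,
)
trainer.train()  # TypeError: RobertaModel.forward() got an unexpected keyword argument 'num_items_in_batch'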

Expected behavior

Training should work as it did in older transformers versions.

@Bachstelze Bachstelze added the bug label Jan 22, 2025
@Rocketknight1
Member

This seems related to the trainer changes - cc @muellerzr @SunMarc

@shubhamjain0594

Getting the same error for the Gemma model.

@SilverSoldier
Contributor

Same for Bloom, which marks unexpected arguments as deprecated and throws ValueError: Got unexpected arguments: {'num_items_in_batch': 5120}.

These three lines seem to be causing the problem:

loss_kwargs["num_items_in_batch"] = num_items_in_batch
inputs = {**inputs, **loss_kwargs}
outputs = model(**inputs)
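
A possible user-side stopgap while waiting for the patch, sketched under the assumption that compute_loss only merges the kwarg into inputs when it is not None (which matches the conditional visible above): subclass Trainer so the count is never forwarded. Note that this gives up the corrected gradient-accumulation loss scaling discussed later in this thread.

from transformers import Trainer

class DropLossKwargTrainer(Trainer):
    # Stopgap sketch: never pass num_items_in_batch down, so the parent compute_loss
    # skips the kwarg merge and the model forward never sees the extra key.
    def compute_loss(self, model, inputs, return_outputs=False, num_items_in_batch=None):
        return super().compute_loss(model, inputs, return_outputs=return_outputs)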

@ArthurZucker
Collaborator

We'll do a patch as soon as there is a fix!

@SunMarc
Member

SunMarc commented Jan 23, 2025

Can you share the traceback for the Gemma model error, @shubhamjain0594?

For the Bloom error, this can easily be fixed by setting accepts_loss_kwargs = False in the Bloom modeling code. It happens because Bloom's forward allows passing kwargs, hence the issue.

For the encoder-decoder model, this is because we allow passing **kwargs in the forward and kwargs_encoder is not set correctly.

I'll let @muellerzr decide how to fix these. Maybe the easiest fix would be to just set accepts_loss_kwargs = True for models that support it.
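
A rough illustration of the mismatch described above, using plain inspect on the public classes rather than the actual Trainer internals: EncoderDecoderModel.forward() accepts **kwargs, so a signature check concludes it can take loss kwargs, while the wrapped RobertaModel.forward() cannot.

import inspect
from transformers import EncoderDecoderModel, RobertaModel

def forward_accepts_var_kwargs(cls):
    # True if the class's forward() declares a **kwargs catch-all parameter.
    params = inspect.signature(cls.forward).parameters.values()
    return any(p.kind == inspect.Parameter.VAR_KEYWORD for p in params)

print(forward_accepts_var_kwargs(EncoderDecoderModel))  # True: looks like it can take loss kwargs
print(forward_accepts_var_kwargs(RobertaModel))         # False: rejects num_items_in_batch when it is forwarded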

@shubhamjain0594

[Image: screenshot of the Gemma model traceback]

@SunMarc here you go. Does this help?

@SunMarc
Member

SunMarc commented Jan 23, 2025

Yeah, thanks! The issue comes from the attention refactor PR, where we pass **kwargs both in the loss calculation and in the model. cc @muellerzr

@muellerzr muellerzr self-assigned this Jan 23, 2025
@muellerzr
Contributor

@shubhamjain0594 can you post a repr of the model you're using and how the Trainer is configured? I can't recreate this with "google/gemma-2-2b-it"

# End-to-end script running the Hugging Face Trainer
# for causal language modeling. Based on the Tasks documentation
# originally from: https://hf.co/docs/transformers/tasks/language_modeling
from accelerate import PartialState
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

# Constants
model_name = "google/gemma-2-2b-it"
dataset_name = "wikitext"
dataset_config = "wikitext-2-raw-v1"

# Load dataset
print(f"Downloading dataset ({dataset_name})")
dataset = load_dataset(dataset_name, dataset_config, split="train[:500]")
dataset = dataset.train_test_split(test_size=0.2)

# Tokenize the dataset
tokenizer = AutoTokenizer.from_pretrained(model_name)


def tokenize_function(examples):
    return tokenizer(examples["text"])


print(f"Tokenizing dataset for {model_name}...")
tokenized_dataset = dataset.map(tokenize_function, batched=True, remove_columns=dataset["train"].column_names)

# We still need to concatenate our sequences
# and split them into shorter chunks to keep
# RAM usage minimal
block_size = 128


def group_texts(examples):
    # Concatenate all texts.
    concatenated_examples = {k: sum(examples[k], []) for k in examples.keys()}
    total_length = len(concatenated_examples[list(examples.keys())[0]])
    # We drop the small remainder; we could add padding instead of dropping if the model supported it.
    # You can customize this part to your needs.
    if total_length >= block_size:
        total_length = (total_length // block_size) * block_size
    # Split by chunks of block_size.
    result = {
        k: [t[i : i + block_size] for i in range(0, total_length, block_size)]
        for k, t in concatenated_examples.items()
    }
    result["labels"] = result["input_ids"].copy()
    return result


# And apply
tokenized_dataset = tokenized_dataset.map(group_texts, batched=True)

# Create an efficient collator which dynamically pads.
# The end-of-sequence token is used as the padding token, and mlm=False will
# use the inputs as labels, shifted to the right by one element.
tokenizer.pad_token = tokenizer.eos_token
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

print(f"Instantiating model ({model_name})...")
model = AutoModelForCausalLM.from_pretrained(model_name)

# Define the hyperparameters in the TrainingArguments
print("Creating training arguments (weights are stored at `results/causal_language_modeling`)...")
training_args = TrainingArguments(
    output_dir="results/causal_language_modeling",  # Where weights are stored
    learning_rate=2e-5,  # The learning rate during training
    per_device_train_batch_size=1,  # Number of samples per batch during training
    per_device_eval_batch_size=1,  # Number of samples per batch during evaluation
    gradient_accumulation_steps=2,
    num_train_epochs=2,  # How many iterations through the dataloaders should be done
    weight_decay=0.01,  # Regularization penalization
)

# Create the `Trainer`, passing in the model and arguments
# the datasets to train on, how the data should be collated,
# and the method for computing our metrics
print("Creating `Trainer`...")
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset["train"],
    eval_dataset=tokenized_dataset["test"],
    data_collator=data_collator,
)

# Initiate training
print("Training...")
trainer.train()

@muellerzr
Contributor

Or @SilverSoldier or @Bachstelze

@shubhamjain0594

shubhamjain0594 commented Jan 23, 2025

@shubhamjain0594 can you post a repr of the model you're using and how the Trainer is configured? I can't recreate this with "google/gemma-2-2b-it"

I am using google/gemma-1.1-2b-it, maybe that is the difference?

@muellerzr
Contributor

@shubhamjain0594 I can recreate it with that model, thanks! :)

@muellerzr
Contributor

Essentially this stems from certain model forwards not accepting kwargs, which is an issue on our end.

Said problem models:

FAILED tests/models/bamba/test_modeling_bamba.py::BambaModelTest::test_training_gradient_accumulation - TypeError: BambaModel.forward() got an unexpected keyword argument 'num_items_in_batch'
FAILED tests/models/bert_generation/test_modeling_bert_generation.py::BertGenerationEncoderTest::test_training_gradient_accumulation - TypeError: BertGenerationDecoder.forward() got an unexpected keyword argument 'num_items_in_batch'
FAILED tests/models/big_bird/test_modeling_big_bird.py::BigBirdModelTest::test_training_gradient_accumulation - TypeError: BigBirdForCausalLM.forward() got an unexpected keyword argument 'num_items_in_batch'
FAILED tests/models/biogpt/test_modeling_biogpt.py::BioGptModelTest::test_training_gradient_accumulation - TypeError: BioGptForCausalLM.forward() got an unexpected keyword argument 'num_items_in_batch'
FAILED tests/models/bloom/test_modeling_bloom.py::BloomModelTest::test_training_gradient_accumulation - ValueError: Got unexpected arguments: {'num_items_in_batch': 14}
FAILED tests/models/codegen/test_modeling_codegen.py::CodeGenModelTest::test_training_gradient_accumulation - TypeError: CodeGenForCausalLM.forward() got an unexpected keyword argument 'num_items_in_batch'
FAILED tests/models/ctrl/test_modeling_ctrl.py::CTRLModelTest::test_training_gradient_accumulation - TypeError: CTRLLMHeadModel.forward() got an unexpected keyword argument 'num_items_in_batch'
FAILED tests/models/data2vec/test_modeling_data2vec_text.py::Data2VecTextModelTest::test_training_gradient_accumulation - TypeError: Data2VecTextForCausalLM.forward() got an unexpected keyword argument 'num_items_in_batch'
FAILED tests/models/dbrx/test_modeling_dbrx.py::DbrxModelTest::test_training_gradient_accumulation - TypeError: DbrxForCausalLM.forward() got an unexpected keyword argument 'num_items_in_batch'
FAILED tests/models/electra/test_modeling_electra.py::ElectraModelTest::test_training_gradient_accumulation - TypeError: ElectraForCausalLM.forward() got an unexpected keyword argument 'num_items_in_batch'
FAILED tests/models/ernie/test_modeling_ernie.py::ErnieModelTest::test_training_gradient_accumulation - TypeError: ErnieForCausalLM.forward() got an unexpected keyword argument 'num_items_in_batch'
FAILED tests/models/falcon/test_modeling_falcon.py::FalconModelTest::test_training_gradient_accumulation - TypeError: FalconForCausalLM.forward() got an unexpected keyword argument 'num_items_in_batch'
FAILED tests/models/fuyu/test_modeling_fuyu.py::FuyuModelTest::test_training_gradient_accumulation - TypeError: FuyuForCausalLM.forward() got an unexpected keyword argument 'num_items_in_batch'
FAILED tests/models/gemma/test_modeling_gemma.py::GemmaModelTest::test_training_gradient_accumulation - TypeError: GemmaModel.forward() got an unexpected keyword argument 'num_items_in_batch'
FAILED tests/models/gemma2/test_modeling_gemma2.py::GemmaModelTest::test_training_gradient_accumulation - TypeError: GemmaModel.forward() got an unexpected keyword argument 'num_items_in_batch'
FAILED tests/models/git/test_modeling_git.py::GitModelTest::test_training_gradient_accumulation - TypeError: GitForCausalLM.forward() got an unexpected keyword argument 'num_items_in_batch'
FAILED tests/models/gpt2/test_modeling_gpt2.py::GPT2ModelTest::test_training_gradient_accumulation - TypeError: GPT2LMHeadModel.forward() got an unexpected keyword argument 'num_items_in_batch'
FAILED tests/models/gpt_bigcode/test_modeling_gpt_bigcode.py::GPTBigCodeModelTest::test_training_gradient_accumulation - TypeError: GPTBigCodeForCausalLM.forward() got an unexpected keyword argument 'num_items_in_batch'
FAILED tests/models/gpt_bigcode/test_modeling_gpt_bigcode.py::GPTBigCodeMHAModelTest::test_training_gradient_accumulation - TypeError: GPTBigCodeForCausalLM.forward() got an unexpected keyword argument 'num_items_in_batch'
FAILED tests/models/gpt_neo/test_modeling_gpt_neo.py::GPTNeoModelTest::test_training_gradient_accumulation - TypeError: GPTNeoForCausalLM.forward() got an unexpected keyword argument 'num_items_in_batch'
FAILED tests/models/gpt_neox/test_modeling_gpt_neox.py::GPTNeoXModelTest::test_training_gradient_accumulation - TypeError: GPTNeoXForCausalLM.forward() got an unexpected keyword argument 'num_items_in_batch'
FAILED tests/models/gpt_neox_japanese/test_modeling_gpt_neox_japanese.py::GPTNeoXModelJapaneseTest::test_training_gradient_accumulation - TypeError: GPTNeoXJapaneseForCausalLM.forward() got an unexpected keyword argument 'num_items_in_batch'
FAILED tests/models/gptj/test_modeling_gptj.py::GPTJModelTest::test_training_gradient_accumulation - TypeError: GPTJForCausalLM.forward() got an unexpected keyword argument 'num_items_in_batch'
FAILED tests/models/granitemoe/test_modeling_granitemoe.py::GraniteMoeModelTest::test_training_gradient_accumulation - TypeError: GraniteMoeForCausalLM.forward() got an unexpected keyword argument 'num_items_in_batch'
FAILED tests/models/helium/test_modeling_helium.py::GemmaModelTest::test_training_gradient_accumulation - TypeError: GemmaModel.forward() got an unexpected keyword argument 'num_items_in_batch'
FAILED tests/models/jetmoe/test_modeling_jetmoe.py::JetMoeModelTest::test_training_gradient_accumulation - TypeError: JetMoeForCausalLM.forward() got an unexpected keyword argument 'num_items_in_batch'
FAILED tests/models/megatron_bert/test_modeling_megatron_bert.py::MegatronBertModelTest::test_training_gradient_accumulation - TypeError: MegatronBertForCausalLM.forward() got an unexpected keyword argument 'num_items_in_batch'
FAILED tests/models/mllama/test_modeling_mllama.py::MllamaForCausalLMModelTest::test_training_gradient_accumulation - ValueError: Unrecognized configuration class <class 'transformers.models.mllama.configuration_mllama.MllamaTextConfig'> for this kind of AutoModel: AutoModelForCausalLM.
FAILED tests/models/moshi/test_modeling_moshi.py::MoshiDecoderTest::test_training_gradient_accumulation - TypeError: MoshiForCausalLM.forward() got an unexpected keyword argument 'num_items_in_batch'
FAILED tests/models/mpt/test_modeling_mpt.py::MptModelTest::test_training_gradient_accumulation - TypeError: MptForCausalLM.forward() got an unexpected keyword argument 'num_items_in_batch'
FAILED tests/models/musicgen/test_modeling_musicgen.py::MusicgenDecoderTest::test_training_gradient_accumulation - ValueError: Unrecognized configuration class <class 'transformers.models.musicgen.configuration_musicgen.MusicgenDecoderConfig'> for this kind of AutoModel: AutoModelForCausalLM.
FAILED tests/models/musicgen_melody/test_modeling_musicgen_melody.py::MusicgenMelodyDecoderTest::test_training_gradient_accumulation - ValueError: Unrecognized configuration class <class 'transformers.models.musicgen_melody.configuration_musicgen_melody.MusicgenMelodyDecoderConfig'> for this kind of AutoModel: ...
FAILED tests/models/nemotron/test_modeling_nemotron.py::GemmaModelTest::test_training_gradient_accumulation - TypeError: GemmaModel.forward() got an unexpected keyword argument 'num_items_in_batch'
FAILED tests/models/openai/test_modeling_openai.py::OpenAIGPTModelTest::test_training_gradient_accumulation - TypeError: OpenAIGPTLMHeadModel.forward() got an unexpected keyword argument 'num_items_in_batch'
FAILED tests/models/opt/test_modeling_opt.py::OPTModelTest::test_training_gradient_accumulation - TypeError: OPTForCausalLM.forward() got an unexpected keyword argument 'num_items_in_batch'
FAILED tests/models/persimmon/test_modeling_persimmon.py::PersimmonModelTest::test_training_gradient_accumulation - TypeError: PersimmonForCausalLM.forward() got an unexpected keyword argument 'num_items_in_batch'
FAILED tests/models/recurrent_gemma/test_modeling_recurrent_gemma.py::RecurrentGemmaModelTest::test_training_gradient_accumulation - TypeError: RecurrentGemmaForCausalLM.forward() got an unexpected keyword argument 'num_items_in_batch'
FAILED tests/models/reformer/test_modeling_reformer.py::ReformerLocalAttnModelTest::test_training_gradient_accumulation - TypeError: ReformerModelWithLMHead.forward() got an unexpected keyword argument 'num_items_in_batch'
FAILED tests/models/rembert/test_modeling_rembert.py::RemBertModelTest::test_training_gradient_accumulation - TypeError: RemBertForCausalLM.forward() got an unexpected keyword argument 'num_items_in_batch'
FAILED tests/models/roberta/test_modeling_roberta.py::RobertaModelTest::test_training_gradient_accumulation - TypeError: RobertaForCausalLM.forward() got an unexpected keyword argument 'num_items_in_batch'
FAILED tests/models/roberta_prelayernorm/test_modeling_roberta_prelayernorm.py::RobertaPreLayerNormModelTest::test_training_gradient_accumulation - TypeError: RobertaPreLayerNormForCausalLM.forward() got an unexpected keyword argument 'num_items_in_batch'
FAILED tests/models/roc_bert/test_modeling_roc_bert.py::RoCBertModelTest::test_training_gradient_accumulation - TypeError: RoCBertForCausalLM.forward() got an unexpected keyword argument 'num_items_in_batch'
FAILED tests/models/roformer/test_modeling_roformer.py::RoFormerModelTest::test_training_gradient_accumulation - TypeError: RoFormerForCausalLM.forward() got an unexpected keyword argument 'num_items_in_batch'
FAILED tests/models/rwkv/test_modeling_rwkv.py::RwkvModelTest::test_training_gradient_accumulation - TypeError: RwkvForCausalLM.forward() got an unexpected keyword argument 'num_items_in_batch'
FAILED tests/models/stablelm/test_modeling_stablelm.py::StableLmModelTest::test_training_gradient_accumulation - TypeError: StableLmForCausalLM.forward() got an unexpected keyword argument 'num_items_in_batch'
FAILED tests/models/xglm/test_modeling_xglm.py::XGLMModelTest::test_training_gradient_accumulation - TypeError: XGLMForCausalLM.forward() got an unexpected keyword argument 'num_items_in_batch'
FAILED tests/models/xlm/test_modeling_xlm.py::XLMModelTest::test_training_gradient_accumulation - TypeError: XLMWithLMHeadModel.forward() got an unexpected keyword argument 'num_items_in_batch'
FAILED tests/models/xlm_roberta_xl/test_modeling_xlm_roberta_xl.py::XLMRobertaXLModelTest::test_training_gradient_accumulation - TypeError: XLMRobertaXLForCausalLM.forward() got an unexpected keyword argument 'num_items_in_batch'
FAILED tests/models/xmod/test_modeling_xmod.py::XmodModelTest::test_training_gradient_accumulation - TypeError: XmodForCausalLM.forward() got an unexpected keyword argument 'num_items_in_batch'
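
Schematically, the failures above come down to this signature difference (illustrative module names, not actual transformers code):

import torch.nn as nn

class LegacyForward(nn.Module):
    # No catch-all: passing num_items_in_batch raises
    # TypeError: forward() got an unexpected keyword argument 'num_items_in_batch'
    def forward(self, input_ids, labels=None):
        ...

class KwargsAwareForward(nn.Module):
    # Catch-all present: the extra key is tolerated and can feed the loss computation.
    def forward(self, input_ids, labels=None, **kwargs):
        num_items_in_batch = kwargs.get("num_items_in_batch")
        ...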

@muellerzr muellerzr mentioned this issue Jan 24, 2025
@techkang
Contributor

I think the better way to fix this bug is to delete:

**kwargs: Unpack[KwargsForCausalLM],

and

Forcefully enabling variable args but not using num_items_in_batch for the loss calculation will make the training loss gradient_accumulation_steps times larger than before.
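
A toy numeric illustration of that scaling point (simplified mechanics, assuming equal-sized micro-batches and a cross-entropy loss): if per-micro-batch mean losses are simply summed over the accumulated steps instead of sum-reduced losses being divided once by num_items_in_batch, the result is gradient_accumulation_steps times larger.

import torch
import torch.nn.functional as F

torch.manual_seed(0)
grad_accum_steps = 4
logits = [torch.randn(8, 10) for _ in range(grad_accum_steps)]  # 8 tokens, vocab of 10 per micro-batch
labels = [torch.randint(0, 10, (8,)) for _ in range(grad_accum_steps)]
num_items_in_batch = sum(y.numel() for y in labels)             # 32 tokens over the accumulated batch

# Intended scaling: sum-reduced loss divided once by the global token count.
correct = sum(F.cross_entropy(x, y, reduction="sum") for x, y in zip(logits, labels)) / num_items_in_batch

# A model that ignores num_items_in_batch returns per-micro-batch means, which get summed.
inflated = sum(F.cross_entropy(x, y, reduction="mean") for x, y in zip(logits, labels))

print(inflated / correct)  # ratio is exactly grad_accum_steps here, since micro-batches are equal-sized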

@muellerzr
Contributor

@techkang that is not possible, as **kwargs will be needed on models from here on out for different reasons

@techkang
Contributor

techkang commented Jan 27, 2025

@muellerzr I misunderstood your pull requests earlier. Most of the models can handle num_items_in_batch correctly after your PR. However, some models that take **kwargs but don't handle this argument will still hit the problem that the training loss becomes gradient_accumulation_steps times larger than before if the loss is calculated inside the model. I think we should only allow models to use **kwargs when they can handle num_items_in_batch correctly.

@muellerzr
Contributor

muellerzr commented Jan 27, 2025

@techkang I'm updating all models that do this (the old way) to do so as part of that PR. It's a big undertaking, but it's what we're going to be doing going forward.

(I'll also ensure that current models that take **kwargs that need it are supported, thanks for the catch there :) )

@Bachstelze
Author

Since which version does this bug occur? Is there a stable version with ModernBERT?

@Bachstelze
Author

Is it normal that some checks were not successful? Or should the pull request be tested with my own settings?

@Pappasad

Pappasad commented Feb 6, 2025

Has this been fixed yet? I am getting this error with TrOCR.

@Bachstelze
Author

I am also still getting this error with ModernBERT:

TypeError: ModernBertModel.forward() got an unexpected keyword argument 'num_items_in_batch'

@Bachstelze
Author

Why can't I reopen my own issue?
The error is still the same for RoBERTa in an encoder-decoder model:

TypeError                                 Traceback (most recent call last)

<ipython-input-10-a5a4ead42e3b> in <cell line: 0>()
    131     optimizers=(adam, lr_scheduler)
    132 )
--> 133 trainer.train()
    134 print("training finished", flush=True)
    135 #wandb.finish()

8 frames

/usr/local/lib/python3.11/dist-packages/transformers/trainer.py in train(self, resume_from_checkpoint, trial, ignore_keys_for_eval, **kwargs)
   2183                 hf_hub_utils.enable_progress_bars()
   2184         else:
-> 2185             return inner_training_loop(
   2186                 args=args,
   2187                 resume_from_checkpoint=resume_from_checkpoint,

/usr/local/lib/python3.11/dist-packages/transformers/trainer.py in _inner_training_loop(self, batch_size, args, resume_from_checkpoint, trial, ignore_keys_for_eval)
   2489                     )
   2490                     with context():
-> 2491                         tr_loss_step = self.training_step(model, inputs, num_items_in_batch)
   2492 
   2493                     if (

/usr/local/lib/python3.11/dist-packages/transformers/trainer.py in training_step(self, model, inputs, num_items_in_batch)
   3608 
   3609         with self.compute_loss_context_manager():
-> 3610             loss = self.compute_loss(model, inputs, num_items_in_batch=num_items_in_batch)
   3611 
   3612         del inputs

/usr/local/lib/python3.11/dist-packages/transformers/trainer.py in compute_loss(self, model, inputs, return_outputs, num_items_in_batch)
   3669                 loss_kwargs["num_items_in_batch"] = num_items_in_batch
   3670             inputs = {**inputs, **loss_kwargs}
-> 3671         outputs = model(**inputs)
   3672         # Save past state if it exists
   3673         # TODO: this needs to be fixed and made cleaner later.

/usr/local/lib/python3.11/dist-packages/torch/nn/modules/module.py in _wrapped_call_impl(self, *args, **kwargs)
   1734             return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
   1735         else:
-> 1736             return self._call_impl(*args, **kwargs)
   1737 
   1738     # torchrec tests the code consistency with the following code

/usr/local/lib/python3.11/dist-packages/torch/nn/modules/module.py in _call_impl(self, *args, **kwargs)
   1745                 or _global_backward_pre_hooks or _global_backward_hooks
   1746                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1747             return forward_call(*args, **kwargs)
   1748 
   1749         result = None

/usr/local/lib/python3.11/dist-packages/transformers/models/encoder_decoder/modeling_encoder_decoder.py in forward(self, input_ids, attention_mask, decoder_input_ids, decoder_attention_mask, encoder_outputs, past_key_values, inputs_embeds, decoder_inputs_embeds, labels, use_cache, output_attentions, output_hidden_states, return_dict, **kwargs)
    601 
    602         if encoder_outputs is None:
--> 603             encoder_outputs = self.encoder(
    604                 input_ids=input_ids,
    605                 attention_mask=attention_mask,

/usr/local/lib/python3.11/dist-packages/torch/nn/modules/module.py in _wrapped_call_impl(self, *args, **kwargs)
   1734             return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
   1735         else:
-> 1736             return self._call_impl(*args, **kwargs)
   1737 
   1738     # torchrec tests the code consistency with the following code

/usr/local/lib/python3.11/dist-packages/torch/nn/modules/module.py in _call_impl(self, *args, **kwargs)
   1745                 or _global_backward_pre_hooks or _global_backward_hooks
   1746                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1747             return forward_call(*args, **kwargs)
   1748 
   1749         result = None

TypeError: RobertaModel.forward() got an unexpected keyword argument 'num_items_in_batch'

@muellerzr
Contributor

Thanks, looking into it.

@anyaschenikova

for me too:

TypeError: Qwen2VLForConditionalGeneration.forward() got an unexpected keyword argument 'num_items_in_batch'

@ArthurZucker
Collaborator

😾 @muellerzr let's improve test coverage and make sure we fix all of them for the release!

@dcm-kouki-eguchi

I am also still getting this error with GPT2Model.

File /opt/conda/lib/python3.11/site-packages/transformers/trainer.py:2241, in Trainer.train(self, resume_from_checkpoint, trial, ignore_keys_for_eval, **kwargs)
   2239         hf_hub_utils.enable_progress_bars()
   2240 else:
-> 2241     return inner_training_loop(
   2242         args=args,
   2243         resume_from_checkpoint=resume_from_checkpoint,
   2244         trial=trial,
   2245         ignore_keys_for_eval=ignore_keys_for_eval,
   2246     )

File /opt/conda/lib/python3.11/site-packages/transformers/trainer.py:2548, in Trainer._inner_training_loop(self, batch_size, args, resume_from_checkpoint, trial, ignore_keys_for_eval)
   2541 context = (
   2542     functools.partial(self.accelerator.no_sync, model=model)
   2543     if i != len(batch_samples) - 1
   2544     and self.accelerator.distributed_type != DistributedType.DEEPSPEED
   2545     else contextlib.nullcontext
   2546 )
   2547 with context():
-> 2548     tr_loss_step = self.training_step(model, inputs, num_items_in_batch)
   2550 if (
   2551     args.logging_nan_inf_filter
   2552     and not is_torch_xla_available()
   2553     and (torch.isnan(tr_loss_step) or torch.isinf(tr_loss_step))
   2554 ):
   2555     # if loss is nan or inf simply add the average of previous logged losses
   2556     tr_loss = tr_loss + tr_loss / (1 + self.state.global_step - self._globalstep_last_logged)

File /opt/conda/lib/python3.11/site-packages/transformers/trainer.py:3698, in Trainer.training_step(self, model, inputs, num_items_in_batch)
   3695     return loss_mb.reduce_mean().detach().to(self.args.device)
   3697 with self.compute_loss_context_manager():
-> 3698     loss = self.compute_loss(model, inputs, num_items_in_batch=num_items_in_batch)
   3700 del inputs
   3701 if (
   3702     self.args.torch_empty_cache_steps is not None
   3703     and self.state.global_step % self.args.torch_empty_cache_steps == 0
   3704 ):

File /opt/conda/lib/python3.11/site-packages/transformers/trainer.py:3759, in Trainer.compute_loss(self, model, inputs, return_outputs, num_items_in_batch)
   3757         loss_kwargs["num_items_in_batch"] = num_items_in_batch
   3758     inputs = {**inputs, **loss_kwargs}
-> 3759 outputs = model(**inputs)
   3760 # Save past state if it exists
   3761 # TODO: this needs to be fixed and made cleaner later.
   3762 if self.args.past_index >= 0:

File /opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py:1739, in Module._wrapped_call_impl(self, *args, **kwargs)
   1737     return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
   1738 else:
-> 1739     return self._call_impl(*args, **kwargs)

File /opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py:1750, in Module._call_impl(self, *args, **kwargs)
   1745 # If we don't have any hooks, we want to skip the rest of the logic in
   1746 # this function, and just call forward.
   1747 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1748         or _global_backward_pre_hooks or _global_backward_hooks
   1749         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1750     return forward_call(*args, **kwargs)
   1752 result = None
   1753 called_always_called_hooks = set()

File /opt/conda/lib/python3.11/site-packages/transformers/models/encoder_decoder/modeling_encoder_decoder.py:603, in EncoderDecoderModel.forward(self, input_ids, attention_mask, decoder_input_ids, decoder_attention_mask, encoder_outputs, past_key_values, inputs_embeds, decoder_inputs_embeds, labels, use_cache, output_attentions, output_hidden_states, return_dict, **kwargs)
    598 kwargs_decoder = {
    599     argument[len("decoder_") :]: value for argument, value in kwargs.items() if argument.startswith("decoder_")
    600 }
    602 if encoder_outputs is None:
--> 603     encoder_outputs = self.encoder(
    604         input_ids=input_ids,
    605         attention_mask=attention_mask,
    606         inputs_embeds=inputs_embeds,
    607         output_attentions=output_attentions,
    608         output_hidden_states=output_hidden_states,
    609         return_dict=return_dict,
    610         **kwargs_encoder,
    611     )
    612 elif isinstance(encoder_outputs, tuple):
    613     encoder_outputs = BaseModelOutput(*encoder_outputs)

File /opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py:1739, in Module._wrapped_call_impl(self, *args, **kwargs)
   1737     return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
   1738 else:
-> 1739     return self._call_impl(*args, **kwargs)

File /opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py:1750, in Module._call_impl(self, *args, **kwargs)
   1745 # If we don't have any hooks, we want to skip the rest of the logic in
   1746 # this function, and just call forward.
   1747 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1748         or _global_backward_pre_hooks or _global_backward_hooks
   1749         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1750     return forward_call(*args, **kwargs)
   1752 result = None
   1753 called_always_called_hooks = set()

TypeError: GPT2Model.forward() got an unexpected keyword argument 'num_items_in_batch'

@Lagniappe52

Also getting this error with MllamaForConditionalGeneration()

Traceback (most recent call last):
  File "/home/research/Ophthalmology/Ophthal_UNet_model.py", line 203, in <module>
    main()
  File "/home/research/Ophthalmology/Ophthal_UNet_model.py", line 200, in main
    trainer.train()
  File "/home/research/Ophthalmology/.venv/lib64/python3.11/site-packages/transformers/trainer.py", line 2241, in train
    return inner_training_loop(
           ^^^^^^^^^^^^^^^^^^^^
  File "/home/research/Ophthalmology/.venv/lib64/python3.11/site-packages/transformers/trainer.py", line 2548, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs, num_items_in_batch)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/research/Ophthalmology/.venv/lib64/python3.11/site-packages/transformers/trainer.py", line 3698, in training_step
    loss = self.compute_loss(model, inputs, num_items_in_batch=num_items_in_batch)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/research/Ophthalmology/.venv/lib64/python3.11/site-packages/transformers/trainer.py", line 3759, in compute_loss
    outputs = model(**inputs)
              ^^^^^^^^^^^^^^^
  File "/home/research/Ophthalmology/.venv/lib64/python3.11/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/research/Ophthalmology/.venv/lib64/python3.11/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/research/Ophthalmology/.venv/lib64/python3.11/site-packages/torch/_dynamo/eval_frame.py", line 465, in _fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/home/research/Ophthalmology/.venv/lib64/python3.11/site-packages/torch/_dynamo/external_utils.py", line 40, in inner
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/home/research/Ophthalmology/.venv/lib64/python3.11/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/research/Ophthalmology/.venv/lib64/python3.11/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/research/Ophthalmology/.venv/lib64/python3.11/site-packages/torch/nn/parallel/distributed.py", line 1643, in forward
    else self._run_ddp_forward(*inputs, **kwargs)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/research/Ophthalmology/.venv/lib64/python3.11/site-packages/torch/nn/parallel/distributed.py", line 1459, in _run_ddp_forward
    return self.module(*inputs, **kwargs)  # type: ignore[index]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/research/Ophthalmology/.venv/lib64/python3.11/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/research/Ophthalmology/.venv/lib64/python3.11/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/research/Ophthalmology/.venv/lib64/python3.11/site-packages/accelerate/utils/operations.py", line 823, in forward
    return model_forward(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/research/Ophthalmology/.venv/lib64/python3.11/site-packages/accelerate/utils/operations.py", line 811, in __call__
    return convert_to_fp32(self.model_forward(*args, **kwargs))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/research/Ophthalmology/.venv/lib64/python3.11/site-packages/torch/_dynamo/external_utils.py", line 40, in inner
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/home/research/Ophthalmology/.venv/lib64/python3.11/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/research/Ophthalmology/.venv/lib64/python3.11/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/research/Ophthalmology/.venv/lib64/python3.11/site-packages/torch/nn/parallel/distributed.py", line 1643, in forward
    else self._run_ddp_forward(*inputs, **kwargs)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/research/Ophthalmology/.venv/lib64/python3.11/site-packages/torch/nn/parallel/distributed.py", line 1459, in _run_ddp_forward
    return self.module(*inputs, **kwargs)  # type: ignore[index]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/research/Ophthalmology/.venv/lib64/python3.11/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/research/Ophthalmology/.venv/lib64/python3.11/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/research/Ophthalmology/.venv/lib64/python3.11/site-packages/accelerate/utils/operations.py", line 823, in forward
    return model_forward(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/research/Ophthalmology/.venv/lib64/python3.11/site-packages/accelerate/utils/operations.py", line 811, in __call__
    return convert_to_fp32(self.model_forward(*args, **kwargs))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/research/Ophthalmology/.venv/lib64/python3.11/site-packages/torch/amp/autocast_mode.py", line 44, in decorate_autocast
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/research/Ophthalmology/.venv/lib64/python3.11/site-packages/peft/peft_model.py", line 847, in forward
    with self._enable_peft_forward_hooks(*args, **kwargs):
  File "/home/research/Ophthalmology/.venv/lib64/python3.11/site-packages/peft/peft_model.py", line 849, in torch_dynamo_resume_in_forward_at_847
    return self.get_base_model()(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/research/Ophthalmology/.venv/lib64/python3.11/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/research/Ophthalmology/.venv/lib64/python3.11/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/research/Ophthalmology/.venv/lib64/python3.11/site-packages/transformers/utils/deprecation.py", line 172, in wrapped_func
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
TypeError: MllamaForConditionalGeneration.forward() got an unexpected keyword argument 'num_items_in_batch'
