IndexError during training with Squad dataset and T5-small model #6973
Comments
add remove_unused_columns=False to training_args
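A minimal sketch of that workaround (keeping the rest of the reproduction script below unchanged); note that the raw SQuAD examples would still need to be tokenized before the collator can batch them:

from transformers import TrainingArguments

# Keep the raw dataset columns instead of letting Trainer drop every column
# that does not match the model's forward() signature (that drop is what
# leaves the dataset with size 0 and triggers the IndexError).
training_args = TrainingArguments(
    output_dir="./results",
    per_device_train_batch_size=2,
    num_train_epochs=1,
    remove_unused_columns=False,  # workaround suggested in this thread
)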
arthasking123 added a commit to arthasking123/transformers that referenced this issue on Jun 18, 2024
amyeroberts pushed a commit to huggingface/transformers that referenced this issue on Jun 19, 2024
* Add valid columns checking in _remove_unused_columns method (huggingface/datasets#6973 (comment), huggingface/datasets#6535, https://discuss.huggingface.co/t/indexerror-invalid-key-16-is-out-of-bounds-for-size-0/14298/25)
* Update modeling_mixtral.py
* Update modeling_mixtral.py
* Update modeling_mixtral.py
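The actual patch lives in transformers; the snippet below is only a rough, hypothetical illustration of the idea behind such a check (the helper name and error message are made up, not the merged code):

import inspect

def check_remaining_columns(dataset, model):
    # Hypothetical helper: fail loudly if dropping the columns that do not
    # match model.forward() would leave the dataset empty, instead of letting
    # the IndexError surface later during batch sampling.
    signature_columns = set(inspect.signature(model.forward).parameters)
    kept = [c for c in dataset.column_names if c in signature_columns]
    if not kept:
        raise ValueError(
            f"No column in {dataset.column_names} matches the model's forward() "
            "arguments; tokenize the dataset or set remove_unused_columns=False."
        )
    return kept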
Closing this issue because it was reported and fixed in transformers.
Describe the bug
I am encountering an IndexError while training a T5-small model on the Squad dataset using the transformers and datasets libraries. The error occurs even with a minimal reproducible example, suggesting a potential bug or incompatibility.
Steps to reproduce the bug
1. Install the required libraries: !pip install transformers datasets
2. Run the following code:
!pip install transformers datasets

import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, TrainingArguments, Trainer, DataCollatorWithPadding

# Load a small, publicly available dataset
from datasets import load_dataset
dataset = load_dataset("squad", split="train[:100]")  # Use a small subset for testing

# Load a pre-trained model and tokenizer
model_name = "t5-small"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Define a basic data collator
data_collator = DataCollatorWithPadding(tokenizer=tokenizer)

# Define training arguments
training_args = TrainingArguments(
    output_dir="./results",
    per_device_train_batch_size=2,
    num_train_epochs=1,
)

# Create a trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    data_collator=data_collator,
)

# Train the model
trainer.train()
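For context: the raw SQuAD columns (id, title, context, question, answers) do not match T5's forward() arguments, so Trainer's column removal leaves an empty formatted dataset. A hedged sketch of preprocessing that would give Trainer usable input_ids/labels columns (the prompt template and max lengths are assumptions, not from the original report):

from transformers import DataCollatorForSeq2Seq

def preprocess(examples):
    # Build "question: ... context: ..." prompts and tokenize the answers as labels.
    inputs = [
        f"question: {q}  context: {c}"
        for q, c in zip(examples["question"], examples["context"])
    ]
    targets = [a["text"][0] if a["text"] else "" for a in examples["answers"]]
    model_inputs = tokenizer(inputs, max_length=384, truncation=True)
    labels = tokenizer(text_target=targets, max_length=32, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized = dataset.map(preprocess, batched=True, remove_columns=dataset.column_names)

# A seq2seq collator also pads the labels, which DataCollatorWithPadding does not.
data_collator = DataCollatorForSeq2Seq(tokenizer, model=model)

Passing tokenized (instead of the raw dataset) as train_dataset would then give Trainer columns it can keep.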
Expected behavior
IndexError Traceback (most recent call last)
in <cell line: 34>()
32
33 # Train the model
---> 34 trainer.train()
10 frames
/usr/local/lib/python3.10/dist-packages/datasets/formatting/formatting.py in _check_valid_index_key(key, size)
427 if isinstance(key, int):
428 if (key < 0 and key + size < 0) or (key >= size):
--> 429 raise IndexError(f"Invalid key: {key} is out of bounds for size {size}")
430 return
431 elif isinstance(key, slice):
IndexError: Invalid key: 42 is out of bounds for size 0
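The "size 0" in the message comes from Trainer having removed every dataset column before sampling. A quick, illustrative check (not part of the original report) that shows the empty overlap:

import inspect

# Trainer keeps only the columns whose names match model.forward() arguments;
# for raw SQuAD there is no overlap, so every column gets dropped.
forward_params = set(inspect.signature(model.forward).parameters)
print(dataset.column_names)                        # ['id', 'title', 'context', 'question', 'answers']
print(forward_params & set(dataset.column_names))  # set()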
Environment info
transformers version: 4.41.2
datasets version: 1.18.4
Python version: 3.10.12