
model_max_length default parameters are missing in transformers>=4.40.0 #30643

Closed
helpmefindaname opened this issue May 3, 2024 · 2 comments

Comments

@helpmefindaname

helpmefindaname commented May 3, 2024

System Info

  • transformers version: 4.40.0
  • Platform: Windows-11-10.0.22631-SP0
  • Python version: 3.12.3
  • Huggingface_hub version: 0.23.0
  • Safetensors version: 0.4.3
  • Accelerate version: 0.29.3
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.3.0+cpu (False)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?: No
  • Using distributed or parallel set-up in script?: No

Who can help?

@ArthurZucker @younesbelkada because of tokenization/text models
@LysandreJik as the bug was introduced by #29112

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

install transformers>=4.40.0:
pip install "transformers>=4.40.0"

run the following Python:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('distilbert-base-uncased')
max_length = tokenizer.model_max_length

print(max_length)

prints 1000000000000000019884624838656, which indicates an effectively unlimited sequence length; however, distilbert-base-uncased has a maximum sequence length of 512
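The printed value is the library's VERY_LARGE_INTEGER sentinel (int(1e30)), which the tokenizer falls back to when no model_max_length is configured. As an interim workaround (not part of the original report), the limit can be passed explicitly when loading the tokenizer:

from transformers import AutoTokenizer

# Workaround sketch: pass model_max_length explicitly so the tokenizer
# does not fall back to the VERY_LARGE_INTEGER sentinel.
tokenizer = AutoTokenizer.from_pretrained(
    "distilbert-base-uncased", model_max_length=512
)
print(tokenizer.model_max_length)  # 512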

Expected behavior

The code above should print 512, as it does in transformers==4.39.3.

Details

This bug is a direct regression of #29112, which refactored the way default configurations are stored. For the PretrainedConfig classes, the config maps were moved and deprecated, but for the Tokenizer classes, the default values were simply removed.

I suppose a quick fix would be to also create deprecated default configurations for the Tokenizers (analogous to the deprecated config maps). I can work on that if you accept this solution.
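As a rough illustration of that idea (a sketch only; the map, the sentinel check, and the helper below are hypothetical and not the actual fix):

from transformers import AutoTokenizer

# Hypothetical deprecated default map, analogous to the pre-4.40
# max_model_input_sizes class attributes on the tokenizer classes.
DEPRECATED_MAX_MODEL_INPUT_SIZES = {
    "distilbert-base-uncased": 512,
}

# Sentinel threshold: transformers uses int(1e30) (VERY_LARGE_INTEGER)
# when no model_max_length is configured.
_UNSET_THRESHOLD = int(1e30)


def resolve_model_max_length(checkpoint: str) -> int:
    """Fall back to the deprecated default map when the Hub config
    does not provide a model_max_length."""
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    if tokenizer.model_max_length >= _UNSET_THRESHOLD:
        return DEPRECATED_MAX_MODEL_INPUT_SIZES.get(
            checkpoint, tokenizer.model_max_length
        )
    return tokenizer.model_max_length


print(resolve_model_max_length("distilbert-base-uncased"))  # 512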

@LysandreJik
Member

Thanks @helpmefindaname, it seems the update to this tokenizer's config file wasn't merged. I just merged it: https://huggingface.co/distilbert/distilbert-base-uncased/discussions/12
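One way to confirm the merged update (a quick check, not from the original thread; it assumes the standard tokenizer_config.json filename on the Hub):

import json

from huggingface_hub import hf_hub_download

# Download the tokenizer config and check that the merged update added
# an explicit model_max_length entry.
path = hf_hub_download(
    repo_id="distilbert/distilbert-base-uncased",
    filename="tokenizer_config.json",
)
with open(path) as f:
    config = json.load(f)

print(config.get("model_max_length"))  # expected: 512 after the update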

@LysandreJik
Member

It seems a few others from distilbert weren't merged. I just merged them. Thanks again for the heads-up!
