System Info
transformers version: 4.40.0

Who can help?
@ArthurZucker @younesbelkada because of tokenization/text models
@LysandreJik as the bug was introduced by #29112
Information
Tasks

An officially supported task in the examples folder (such as GLUE/SQuAD, ...)

Reproduction
install transformers>=4.40.0:

```
pip install "transformers>=4.40.0"
```
run python:
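The original snippet did not survive here; a minimal reproduction, assuming the tokenizer is loaded via AutoTokenizer (the exact original code may differ), would be:

```python
from transformers import AutoTokenizer

# Load the distilbert-base-uncased tokenizer and inspect its maximum input length
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
print(tokenizer.model_max_length)
```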
prints 1000000000000000019884624838656, which indicates an effectively unlimited sequence length; however, distilbert-base-uncased has a max sequence length of 512.

Expected behavior
The code above should print 512, as it does in transformers==4.39.3.
Details
This bug is a direct regression of #29112, which refactored the way default configurations are stored. For the PretrainedConfig classes, the config maps were moved and deprecated, but for the Tokenizer classes, the default values were simply removed.

I suppose a quick fix would be to also create deprecated default configurations for the Tokenizers (analogous to the deprecated config maps); a rough sketch of what that could look like follows. I can work on that if you accept this solution.
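Purely as an illustration (the map contents, the names _DEPRECATED_MAX_MODEL_INPUT_SIZES and PRETRAINED_MAX_MODEL_INPUT_SIZES, and the warning text are all assumptions, not the actual transformers internals), the deprecation shim could look roughly like this:

```python
# Hypothetical sketch only: restore the removed per-checkpoint defaults behind a
# deprecation shim, mirroring how #29112 handled the deprecated config maps.
import warnings

# Per-checkpoint max sequence lengths previously stored on the tokenizer classes
# (illustrative subset, assumed names):
_DEPRECATED_MAX_MODEL_INPUT_SIZES = {
    "distilbert-base-uncased": 512,
    "distilbert-base-cased": 512,
}

def __getattr__(name):
    # Module-level __getattr__ (PEP 562): warn when the deprecated map is
    # accessed, then return it so old lookup paths keep working.
    if name == "PRETRAINED_MAX_MODEL_INPUT_SIZES":
        warnings.warn(
            "PRETRAINED_MAX_MODEL_INPUT_SIZES is deprecated; max lengths should "
            "come from tokenizer_config.json on the Hub instead.",
            FutureWarning,
        )
        return _DEPRECATED_MAX_MODEL_INPUT_SIZES
    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
```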