❓ Questions & Help

Hi there,

when I load the pretrained CamemBERT model and tokenizer via

model = CamembertForMaskedLM.from_pretrained('camembert-base')
tokenizer = CamembertTokenizer.from_pretrained('camembert-base')

the length of the tokenizer is 32004, but the vocab_size of the model is 32005:

print(len(tokenizer))           # 32004
print(model.config.vocab_size)  # 32005

This throws an "index out of range" error when I try to adapt the lm_finetuning example, because of

model.resize_token_embeddings(len(tokenizer))

It runs when I comment out this line. So my question is: is this the intended behaviour, and if so, what is the reason for the mismatch between the tokenizer length and the model's vocab_size?
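For completeness, a minimal sketch of how I reproduce this (assuming a transformers version where both classes are importable from the top-level package; the comments reflect my reading of what happens, not the documentation):

from transformers import CamembertForMaskedLM, CamembertTokenizer

# Load the pretrained CamemBERT masked-LM model and its tokenizer
model = CamembertForMaskedLM.from_pretrained('camembert-base')
tokenizer = CamembertTokenizer.from_pretrained('camembert-base')

# The two sizes disagree by one
print(len(tokenizer))           # 32004
print(model.config.vocab_size)  # 32005

# Taken from the lm_finetuning example: resizing to len(tokenizer) shrinks the
# embedding matrix by one row, and this is where the index-out-of-range error
# shows up for me
model.resize_token_embeddings(len(tokenizer))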
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
❓ Questions & Help
Hi there,
when I load the pretrained Camenbert model and tokenizer via
model = CamembertForMaskedLM.from_pretrained('camembert-base') tokenizer = CamembertTokenizer.from_pretrained('camembert-base')
the length of the tokenizer is 32004 but the vocab_size of the model is 32005.
print(len(tokenizer))
'print(model.config.vocab_size'
This throws me an error
when I try to adapt the lm_finetuning example because of
model.resize_token_embeddings(len(tokenizer))
It runs when I comment out this line. So my question is, is this the intended behaviour resp. what's the reason for the unevenness between the tokenizer and the model vocab_size?
The text was updated successfully, but these errors were encountered: