
AttributeError: 'NllbTokenizerFast' object has no attribute 'lang_code_to_id' #31348

Closed

rajanish4 opened this issue Jun 10, 2024 · 16 comments

Comments


rajanish4 commented Jun 10, 2024

System Info

  • transformers version: 4.42.0.dev0
  • Platform: Windows-10-10.0.20348-SP0
  • Python version: 3.9.7
  • Huggingface_hub version: 0.23.3
  • Safetensors version: 0.4.3
  • Accelerate version: not installed
  • Accelerate config: not found
  • PyTorch version (GPU?): 1.13.0 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using distributed or parallel set-up in script?:
  • Using GPU in script?:
  • GPU type: NVIDIA RTX A6000

Who can help?

@ArthurZucker

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/nllb-200-distilled-600M", src_lang="ron_Latn",
token=token)
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/nllb-200-distilled-600M", token=token)

article = "Şeful ONU spune că nu există o soluţie militară în Siria"
inputs = tokenizer(article, return_tensors="pt")
translated_tokens = model.generate(**inputs, forced_bos_token_id=tokenizer.lang_code_to_id["deu_Latn"], max_length=30)
tokenizer.batch_decode(translated_tokens, skip_special_tokens=True)[0]

Expected behavior

It should output translated text: UN-Chef sagt, es gibt keine militärische Lösung in Syrien

Complete error:

translated_tokens = model.generate(**inputs, forced_bos_token_id=tokenizer.lang_code_to_id["deu_Latn"], max_length=30)
AttributeError: 'NllbTokenizerFast' object has no attribute 'lang_code_to_id'
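For anyone hitting this on a newer transformers release: since the language codes are ordinary tokens in the NLLB vocabulary, a small compatibility shim can cover both old and new versions. This is a sketch with a hypothetical helper name (`get_lang_id` is not part of transformers), assuming only that the tokenizer either still has `lang_code_to_id` or supports `convert_tokens_to_ids`:

```python
def get_lang_id(tokenizer, lang_code):
    """Return the token id for an NLLB language code such as "deu_Latn".

    Older transformers releases exposed tokenizer.lang_code_to_id; newer
    ones removed it, but convert_tokens_to_ids works either way because
    the language codes are regular entries in the NLLB vocabulary.
    """
    mapping = getattr(tokenizer, "lang_code_to_id", None)
    if mapping is not None and lang_code in mapping:
        return mapping[lang_code]
    return tokenizer.convert_tokens_to_ids(lang_code)
```

Usage would then be `model.generate(**inputs, forced_bos_token_id=get_lang_id(tokenizer, "deu_Latn"), max_length=30)`.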

@ArthurZucker
Collaborator

Yes, we had a deprecation cycle and this attribute was removed 😉

@rajanish4
Author

Thanks, but then how can I provide the language code for translation?

@ArthurZucker
Collaborator

You should simply do tokenizer.encode("deu_Latn")[0]

@tokenizer-decode
Contributor

Then why does the doc say otherwise? This is v4.42.0.
I also don't understand how to use tokenizer.encode("deu_Latn")[0]. What's the keyword? Is this a positional argument? @ArthurZucker

@fe1ixxu

fe1ixxu commented Jul 2, 2024

It seems there is an error: whichever language code I give to the NLLB tokenizer, it always outputs the English token id. My version is v4.42.3. @ArthurZucker


@ShayekhBinIslam

ShayekhBinIslam commented Jul 2, 2024

I think, tokenizer.encode("deu_Latn")[0] is the regular BOS token, tokenizer.encode("deu_Latn")[1] is the expected token. @ArthurZucker
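To see why indexing into encode output is fragile: the NLLB fast tokenizer builds sequences as source-language token, then the text pieces, then </s>. So encoding the bare string "deu_Latn" puts the configured source-language code at index 0 and the code you actually asked for at index 1. A toy model of that layout (pure Python, not the real tokenizer; the ids below are illustrative, not guaranteed to match the real vocabulary):

```python
def toy_nllb_encode(src_lang_id, piece_ids, eos_id=2):
    """Mimic the NLLB fast tokenizer's sequence layout:
    [src_lang_code] <pieces...> </s>
    """
    return [src_lang_id, *piece_ids, eos_id]

# Encoding just "deu_Latn" tokenizes it to its single language-code token,
# so with src_lang="eng_Latn" the result has the shape
# [eng_Latn_id, deu_Latn_id, eos_id] -- index 0 is the *source* code.
ENG_LATN_ID, DEU_LATN_ID = 256047, 256042  # illustrative ids only
ids = toy_nllb_encode(ENG_LATN_ID, [DEU_LATN_ID])
```

This is also why every language code appeared to "encode to the English token id" above: index 0 is always the source-language token, eng_Latn by default.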

@ArthurZucker
Collaborator

Yes! You should use convert_tokens_to_ids rather than encode, sorry 😉

@tnitn

tnitn commented Jul 12, 2024

What worked for me is
translated_tokens = model.generate(**inputs, forced_bos_token_id=tokenizer.convert_tokens_to_ids("deu_Latn"), max_length=30)

@ArthurZucker
Collaborator

Yep, this is what we expect!

blackmesataiwan added a commit to blackmesataiwan/OneRingTranslator that referenced this issue Jul 19, 2024
@LahadMbacke

It works for me:
FR_CODE = tokenizer.convert_tokens_to_ids("fra_Latn")
WO_CODE = tokenizer.convert_tokens_to_ids("wol_Latn")
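One caveat with this approach: convert_tokens_to_ids silently maps unknown codes to the unk token instead of raising, so a typo (e.g. "fr_Latn" instead of NLLB's "fra_Latn") produces garbage translations rather than an error. A small guard can fail loudly instead; this is a hypothetical helper, assuming only that the tokenizer exposes unk_token_id:

```python
def resolve_lang_code(tokenizer, code):
    """Convert an NLLB language code to its token id, failing loudly on typos."""
    token_id = tokenizer.convert_tokens_to_ids(code)
    if token_id == tokenizer.unk_token_id:
        raise ValueError(
            f"{code!r} is not in this tokenizer's vocabulary; "
            "NLLB uses codes like 'fra_Latn' or 'wol_Latn'"
        )
    return token_id
```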


This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@brauliobo

I'm getting this error using the nllb-serve binary from the latest git version:

set 30 15:23:42 xeon1 nllb-serve[10089]:  * Running on http://10.0.0.6:6060
set 30 15:23:42 xeon1 nllb-serve[10089]: INFO:werkzeug:Press CTRL+C to quit
set 30 15:23:57 xeon1 nllb-serve[10089]: INFO:root:Loading tokenizer for facebook/nllb-200-distilled-600M; src_lang=eng_Latn ...
set 30 15:23:58 xeon1 nllb-serve[10089]: ERROR:nllb_serve.app:Exception on /translate [POST]
set 30 15:23:58 xeon1 nllb-serve[10089]: Traceback (most recent call last):
set 30 15:23:58 xeon1 nllb-serve[10089]:   File "/home/nllb-serve/nllb-serve/env/lib/python3.12/site-packages/flask/app.py", line 1463, in wsgi_app
set 30 15:23:58 xeon1 nllb-serve[10089]:     response = self.full_dispatch_request()
set 30 15:23:58 xeon1 nllb-serve[10089]:                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
set 30 15:23:58 xeon1 nllb-serve[10089]:   File "/home/nllb-serve/nllb-serve/env/lib/python3.12/site-packages/flask/app.py", line 872, in full_dispatch_request
set 30 15:23:58 xeon1 nllb-serve[10089]:     rv = self.handle_user_exception(e)
set 30 15:23:58 xeon1 nllb-serve[10089]:          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
set 30 15:23:58 xeon1 nllb-serve[10089]:   File "/home/nllb-serve/nllb-serve/env/lib/python3.12/site-packages/flask/app.py", line 870, in full_dispatch_request
set 30 15:23:58 xeon1 nllb-serve[10089]:     rv = self.dispatch_request()
set 30 15:23:58 xeon1 nllb-serve[10089]:          ^^^^^^^^^^^^^^^^^^^^^^^
set 30 15:23:58 xeon1 nllb-serve[10089]:   File "/home/nllb-serve/nllb-serve/env/lib/python3.12/site-packages/flask/app.py", line 855, in dispatch_request
set 30 15:23:58 xeon1 nllb-serve[10089]:     return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)  # type: ignore[no-any-return]
set 30 15:23:58 xeon1 nllb-serve[10089]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
set 30 15:23:58 xeon1 nllb-serve[10089]:   File "/home/nllb-serve/nllb-serve/nllb_serve/app.py", line 145, in translate
set 30 15:23:58 xeon1 nllb-serve[10089]:     **inputs, forced_bos_token_id=tokenizer.lang_code_to_id[tgt_lang],
set 30 15:23:58 xeon1 nllb-serve[10089]:                                   ^^^^^^^^^^^^^^^^^^^^^^^^^
set 30 15:23:58 xeon1 nllb-serve[10089]: AttributeError: 'NllbTokenizerFast' object has no attribute 'lang_code_to_id'

@brauliobo

Downgrading transformers from 4.45.1 to 4.37.0 fixed the issue for me. Found it here: https://drsuneamer.tistory.com/250
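If you depend on third-party code (like nllb-serve) that still reads lang_code_to_id, pinning a pre-removal transformers release is a stopgap until the downstream code switches to convert_tokens_to_ids; 4.37.0 is one version reported to work in this thread:

```shell
# Stopgap: pin a transformers release that still has lang_code_to_id
pip install "transformers==4.37.0"
```

Note this only hides the incompatibility; the long-term fix is updating the calling code.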

@thomas-ferraz

I need to edit a tokenizer to add a new language code; how can I do it on NllbTokenizer? The code I'm using looks like this:

old_len = len(tokenizer) - int(new_lang in tokenizer.added_tokens_encoder)
tokenizer.lang_code_to_id[new_lang] = old_len-1
tokenizer.id_to_lang_code[old_len-1] = new_lang

@thomas-ferraz

thomas-ferraz commented Dec 1, 2024


Just to provide an answer to my own question: this code worked for me
(maybe it's not the best way; we could consider adding new functions for this).

from transformers.tokenization_utils import AddedToken
old_len = len(tokenizer) - int(new_lang in tokenizer.added_tokens_encoder)
tokenizer._added_tokens_encoder[new_lang] = old_len-1
tokenizer._added_tokens_decoder[old_len-1] = AddedToken(new_lang, normalized=False, special=True)

if new_lang not in tokenizer._additional_special_tokens:
    tokenizer._additional_special_tokens.append(new_lang)

@ArthurZucker
Collaborator

There is a function that does exactly that 😉 tokenizer.add_tokens(AddedToken(new_lang, normalized=False, special=True), special_tokens=True).
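Sketching that with the public API, and also resizing the model embeddings (which the private-attribute approach above skips, and which is needed before the new id can be generated or embedded). `register_language` is a hypothetical wrapper name; `add_tokens(..., special_tokens=True)` and `resize_token_embeddings` are the public transformers methods:

```python
def register_language(tokenizer, model, new_lang):
    """Add a new language code as a special token and grow the model
    embeddings to match the enlarged vocabulary.

    add_tokens returns how many tokens were actually added (0 if the
    token already exists), so we only resize when something changed.
    """
    added = tokenizer.add_tokens([new_lang], special_tokens=True)
    if added:
        model.resize_token_embeddings(len(tokenizer))
    return tokenizer.convert_tokens_to_ids(new_lang)
```

The new embedding row is randomly initialized, so some fine-tuning on the new language is still needed before the code is useful for generation.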

9 participants