Please update the blog to fix: "Due to a bug fix in https://github.com/huggingface/transformers/pull/28687 transcription" #1948

d2a-raudenaerde · 2024-04-01T09:06:32Z

I'm trying to follow "https://huggingface.co/blog/fine-tune-whisper", so I set up a GPU-enabled Jupyter notebook.

However, at the first evaluation (after 1000 steps), I get this error:

Due to a bug fix in https://github.com/huggingface/transformers/pull/28687 transcription using a multilingual Whisper will default to language detection followed by transcription instead of translation to English.This might be a breaking change for your use case. If you want to instead always translate your audio to English, make sure to pass language='en'.

I'm sure there is a workaround, but could you please update the blog with these new settings?

d2a-raudenaerde · 2024-04-01T09:07:19Z

OK, I was too fast. Maybe this is already fixed in #1944?

pcuenca · 2024-04-01T09:43:06Z

Yes, that PR was just merged :) Can you give a try and see if it works for you?

d2a-raudenaerde · 2024-04-01T10:00:17Z

ValueError: Unsupported language: ('hindi',). Language should be one of: ['english', 'chinese', 'german', 'spanish', 'russian', 'korean', 'french', 'japanese', 'portuguese', 'turkish', 'polish', 'catalan', 'dutch', 'arabic', 'swedish', 'italian', 'indonesian', 'hindi', 'finnish', 'vietnamese', 'hebrew', 'ukrainian', 'greek', 'malay', 'czech', 'romanian', 'danish', 'hungarian', 'tamil', 'norwegian', 'thai', 'urdu', 'croatian', 'bulgarian', 'lithuanian', 'latin', 'maori', 'malayalam', 'welsh', 'slovak', 'telugu', 'persian', 'latvian', 'bengali', 'serbian', 'azerbaijani', 'slovenian', 'kannada', 'estonian', 'macedonian', 'breton', 'basque', 'icelandic', 'armenian', 'nepali', 'mongolian', 'bosnian', 'kazakh', 'albanian', 'swahili', 'galician', 'marathi', 'punjabi', 'sinhala', 'khmer', 'shona', 'yoruba', 'somali', 'afrikaans', 'occitan', 'georgian', 'belarusian', 'tajik', 'sindhi', 'gujarati', 'amharic', 'yiddish', 'lao', 'uzbek', 'faroese', 'haitian creole', 'pashto', 'turkmen', 'nynorsk', 'maltese', 'sanskrit', 'luxembourgish', 'myanmar', 'tibetan', 'tagalog', 'malagasy', 'assamese', 'tatar', 'hawaiian', 'lingala', 'hausa', 'bashkir', 'javanese', 'sundanese', 'cantonese', 'burmese', 'valencian', 'flemish', 'haitian', 'letzeburgesch', 'pushto', 'panjabi', 'moldavian', 'moldovan', 'sinhalese', 'castilian', 'mandarin'].

d2a-raudenaerde · 2024-04-01T10:01:54Z

I only copied the model config part (I searched for 'hindi' in the source):

model.generation_config.language = "hindi",
model.generation_config.task = "transcribe",
model.generation_config.forced_decoder_ids = None

Maybe I missed something somewhere. Will check!

d2a-raudenaerde · 2024-04-01T10:04:42Z

Ah, the error shows that the value is a tuple, ('hindi',), while the list contains 'hindi' as a plain string.
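The root cause is a general Python pitfall: a trailing comma after the right-hand side of an assignment builds a one-element tuple, so `"hindi",` is `('hindi',)`, not `"hindi"`. Below is a minimal sketch, assuming `types.SimpleNamespace` as a stand-in for `model.generation_config` (so it runs without transformers installed) and a hypothetical `check_language` helper that only mimics the membership check implied by the error message, not the library's actual code:

```python
from types import SimpleNamespace

# Stand-in for model.generation_config (assumption: plain attribute
# assignment on the real config object behaves the same way).
generation_config = SimpleNamespace()

# Buggy version: the trailing comma turns the value into a 1-tuple.
generation_config.language = "hindi",
print(type(generation_config.language), generation_config.language)
# <class 'tuple'> ('hindi',)

# Fixed version: no trailing commas, so each value keeps its intended type.
generation_config.language = "hindi"
generation_config.task = "transcribe"
generation_config.forced_decoder_ids = None
print(type(generation_config.language), generation_config.language)
# <class 'str'> hindi


def check_language(language, supported=("english", "hindi")):
    """Hypothetical validator: rejects any value not in the supported list.

    A tuple like ('hindi',) is never equal to the string 'hindi', so the
    membership test fails and the error message shows the tuple.
    """
    if language not in supported:
        raise ValueError(f"Unsupported language: {language}.")
    return language
```

Calling `check_language(("hindi",))` raises `ValueError: Unsupported language: ('hindi',).`, while `check_language("hindi")` returns `"hindi"`.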

pcuenca · 2024-04-01T10:37:22Z

Yes, I think those trailing commas in your code snippet should not be there. This is how the blog shows up for me:

d2a-raudenaerde · 2024-04-01T10:40:34Z

Oh, I see, it's a copy-paste error :S

d2a-raudenaerde · 2024-04-01T10:46:48Z

OK, now it gets further. I reduced the evaluation steps to 100 to hit any error sooner. However, the output seems empty?

d2a-raudenaerde · 2024-04-01T11:05:41Z

OK, I see this is expected: the second progress bar is the evaluation process. It seems to work, as I now get a first evaluation after 100 steps. (Not recommended, though, as evaluation takes about 10 minutes on my system.)

d2a-raudenaerde closed this as completed Apr 1, 2024
