Hi @sanchit-gandhi, first of all, thanks for such nicely curated transformers examples and fine-tuning tutorials. I would be completely lost without them.
I was comparing your latest Whisper fine-tuning notebook (huggingface/blog#1944) with the end-to-end run_speech_recognition_seq2seq.py script available in the examples, and also the latest bug-fix PRs such as #29938.
I still found some discrepancies, but maybe they are intentional, so I am hesitant to open any PRs.
In the seq2seq script, the DataCollatorSpeechSeq2SeqWithPadding class has:
    # if bos token is appended in previous tokenization step,
    # cut bos token here as it's append later anyways
    if (labels[:, 0] == self.decoder_start_token_id).all().cpu().item():
        labels = labels[:, 1:]
whereas the fine-tuning notebook has:
    if (labels[:, 0] == self.processor.tokenizer.bos_token_id).all().cpu().item():
        labels = labels[:, 1:]
I get that this is meant to make the script generic to any model, not only Whisper. But when we look into the Whisper config.json, the two attributes correspond to different values (see https://huggingface.co/openai/whisper-large-v3/blob/main/added_tokens.json). Is this the intended state?
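For reference, here is a quick way to compare the two ids side by side (just a sanity check on my end, assuming the openai/whisper-large-v3 checkpoint; other checkpoints may differ):

    from transformers import AutoConfig, WhisperTokenizer

    config = AutoConfig.from_pretrained("openai/whisper-large-v3")
    tokenizer = WhisperTokenizer.from_pretrained("openai/whisper-large-v3")

    # the id the notebook keys on vs. the id the script keys on
    print("bos_token_id:          ", tokenizer.bos_token_id)
    print("decoder_start_token_id:", config.decoder_start_token_id)

    # for Whisper these resolve to different special tokens
    # (<|endoftext|> vs <|startoftranscript|>)
    print(tokenizer.convert_ids_to_tokens(
        [tokenizer.bos_token_id, config.decoder_start_token_id]
    ))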
Also, a second thing: in your new fine-tuning Jupyter notebook you are no longer manually emptying forced_decoder_ids and suppress_tokens on model.config, but on model.generation_config. So shouldn't this line also be changed in run_speech_recognition_seq2seq.py?
I was getting slightly different WERs, but only in the range of 1-2%, so that could just be run-to-run variation (I didn't set any seed), so no biggie.
Have a nice day!
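PS: for concreteness, this is roughly the contrast I mean (a sketch only, not the exact notebook or script code; the checkpoint name and clearing suppress_tokens on the generation config are my own assumptions):

    from transformers import WhisperForConditionalGeneration

    model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-large-v3")

    # older pattern: silence the legacy attributes on model.config
    model.config.forced_decoder_ids = None
    model.config.suppress_tokens = []

    # newer notebook pattern: work on the generation config instead,
    # which is what model.generate() actually reads at inference time
    model.generation_config.forced_decoder_ids = None
    model.generation_config.suppress_tokens = []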
Hey @MarhyCZ - thanks for these astute remarks! Regarding the discrepancy in the data collator, you're absolutely right: we should be using the decoder_start_token_id in both cases. I've fixed the blog post in this PR: huggingface/blog#1949
For the forced/suppressed token ids, I've fixed the fine-tuning example script in this commit: a20a96f.
I'm running both the blog post and script side-by-side to check whether I get equivalent results. I'll report back in ~3 hours when these are finished.
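For readers following along, the collator with the consistent check looks roughly like this (a sketch modelled on the example script, with the decoder_start_token_id passed in explicitly; the exact code in the repo may differ slightly):

    from dataclasses import dataclass
    from typing import Any, Dict, List, Union

    import torch

    @dataclass
    class DataCollatorSpeechSeq2SeqWithPadding:
        processor: Any
        decoder_start_token_id: int

        def __call__(self, features: List[Dict[str, Union[List[int], torch.Tensor]]]) -> Dict[str, torch.Tensor]:
            # pad the log-mel input features and the tokenized labels separately,
            # since they have different lengths and need different padding methods
            input_features = [{"input_features": feature["input_features"]} for feature in features]
            batch = self.processor.feature_extractor.pad(input_features, return_tensors="pt")

            label_features = [{"input_ids": feature["labels"]} for feature in features]
            labels_batch = self.processor.tokenizer.pad(label_features, return_tensors="pt")

            # replace padding with -100 so those positions are ignored by the loss
            labels = labels_batch["input_ids"].masked_fill(labels_batch.attention_mask.ne(1), -100)

            # if the decoder start token was prepended during tokenization,
            # cut it here as the model prepends it again during training
            if (labels[:, 0] == self.decoder_start_token_id).all().cpu().item():
                labels = labels[:, 1:]

            batch["labels"] = labels
            return batch

It would then be constructed with something like DataCollatorSpeechSeq2SeqWithPadding(processor=processor, decoder_start_token_id=model.config.decoder_start_token_id).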
@sanchit-gandhi You are amazing, thanks for such a quick review!
I made those edits yesterday and I am getting almost the same WER now, so I think we are good. Let me know what you find, out of curiosity :)
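(Side note for anyone comparing runs like this: one way to rule out run-to-run variation is to fix the seed before training, e.g. with transformers' set_seed helper; the value 42 below is just an arbitrary choice.)

    from transformers import set_seed

    # makes weight init, dataloader shuffling, dropout masks, etc. reproducible
    set_seed(42)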