Add AutoFeatureExtractor support to Wav2Vec2ProcessorWithLM #28706
Merged: ylacombe merged 5 commits into huggingface:main from ylacombe:add-other-fe-support-to-processor-with-lm on May 20, 2024.
Commits (5 total; the diff below shows changes from the first commit only):
- 2c40d44: Add AutoFeatureExtractor support to Wav2Vec2ProcessorWithLM (ylacombe)
- ce50dfd: update with a type filter (ylacombe)
- c4c7dfb: Merge branch 'huggingface:main' into add-other-fe-support-to-processo… (ylacombe)
- bce629b: add raises error test (ylacombe)
- 387f49c: fix added test (ylacombe)
Diff of the test file (from the first commit):

```diff
@@ -25,7 +25,7 @@
 from datasets import load_dataset
 from parameterized import parameterized
 
-from transformers import AutoProcessor
+from transformers import AutoFeatureExtractor, AutoProcessor
 from transformers.models.wav2vec2 import Wav2Vec2CTCTokenizer, Wav2Vec2FeatureExtractor
 from transformers.models.wav2vec2.tokenization_wav2vec2 import VOCAB_FILES_NAMES
 from transformers.testing_utils import require_pyctcdecode, require_torch, require_torchaudio, slow
@@ -157,6 +157,27 @@ def test_feature_extractor(self):
         for key in input_feat_extract.keys():
             self.assertAlmostEqual(input_feat_extract[key].sum(), input_processor[key].sum(), delta=1e-2)
 
+    def test_another_feature_extractor(self):
+        feature_extractor = AutoFeatureExtractor.from_pretrained("facebook/w2v-bert-2.0")
+        tokenizer = self.get_tokenizer()
+        decoder = self.get_decoder()
+
+        processor = Wav2Vec2ProcessorWithLM(tokenizer=tokenizer, feature_extractor=feature_extractor, decoder=decoder)
+
+        raw_speech = floats_list((3, 1000))
+
+        input_feat_extract = feature_extractor(raw_speech, return_tensors="np")
+        input_processor = processor(raw_speech, return_tensors="np")
+
+        for key in input_feat_extract.keys():
+            self.assertAlmostEqual(input_feat_extract[key].sum(), input_processor[key].sum(), delta=1e-2)
+
+        self.assertListEqual(
+            processor.model_input_names,
+            feature_extractor.model_input_names,
+            msg="`processor` and `feature_extractor` model input names do not match",
+        )
+
     def test_tokenizer(self):
         feature_extractor = self.get_feature_extractor()
         tokenizer = self.get_tokenizer()
```

Review comment (on `def test_another_feature_extractor`): We should also test that a feature extractor that isn't SeamlessM4T or Wav2Vec2 raises an error on instantiation.
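This request was addressed in the later `add raises error test` commit. A minimal sketch of such a negative test, assuming the processor raises a `ValueError` for unsupported feature extractor types; the `openai/whisper-tiny` checkpoint is only an illustrative source of an unsupported extractor, and the merged test may differ:

```python
def test_wrong_feature_extractor_raises_error(self):
    # Whisper's feature extractor stands in for any type that is neither
    # Wav2Vec2 nor SeamlessM4T (hypothetical choice for illustration).
    feature_extractor = AutoFeatureExtractor.from_pretrained("openai/whisper-tiny")
    tokenizer = self.get_tokenizer()
    decoder = self.get_decoder()

    # Instantiation should fail before the processor is ever usable.
    with self.assertRaises(ValueError):
        Wav2Vec2ProcessorWithLM(tokenizer=tokenizer, feature_extractor=feature_extractor, decoder=decoder)
```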
Review comment: Perhaps explicitly defining the valid feature extractor classes is less vulnerable to unexpected behaviour? Or do you think we should accept all feature extractors as valid attributes of the Wav2Vec2 processor? I would be in favour of the former.
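This suggestion corresponds to the later `update with a type filter` commit. A sketch of what such an explicit filter in `Wav2Vec2ProcessorWithLM.__init__` could look like, assuming only the Wav2Vec2 and SeamlessM4T feature extractors are meant to be supported (the exact check in the merged commit may differ):

```python
from transformers import SeamlessM4TFeatureExtractor, Wav2Vec2FeatureExtractor
from transformers.processing_utils import ProcessorMixin


class Wav2Vec2ProcessorWithLM(ProcessorMixin):
    def __init__(self, feature_extractor, tokenizer, decoder):
        # Restrict the processor to the two known-compatible feature
        # extractors rather than anything AutoFeatureExtractor can load.
        if not isinstance(feature_extractor, (Wav2Vec2FeatureExtractor, SeamlessM4TFeatureExtractor)):
            raise ValueError(
                "`feature_extractor` must be a `Wav2Vec2FeatureExtractor` or a "
                f"`SeamlessM4TFeatureExtractor`, but got {type(feature_extractor)}."
            )
        ...
```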
Review comment: Because of the tokenizer behaviour, when passing a tuple to `..._class`, the second class is chosen by default if `use_fast=True` (and the first one otherwise). So in that case, it won't work. This is the first time we have had to deal with a processor that allows two possible classes. I'd be happy to modify the `ProcessorMixin` behaviour, but I believe passing `AutoFeatureExtractor` is an easier workaround, especially since I don't yet see a case in which a user would pass another feature extractor! cc @amyeroberts and @ArthurZucker, what do you think of this?
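For context, a sketch of the declaration pattern under discussion: `ProcessorMixin` subclasses name their expected attribute classes as strings, and a tuple is reserved for (slow, fast) tokenizer pairs selected via `use_fast`, which is why a tuple cannot express two unrelated feature extractors. The lines below illustrate the shape of the change in this PR, not its exact text:

```python
from transformers.processing_utils import ProcessorMixin


class Wav2Vec2ProcessorWithLM(ProcessorMixin):
    # Previously pinned to a single class:
    # feature_extractor_class = "Wav2Vec2FeatureExtractor"
    # The workaround discussed here delegates resolution to the auto class:
    feature_extractor_class = "AutoFeatureExtractor"
    # A tuple here would mean (slow, fast) variants of one tokenizer,
    # not two alternative classes:
    tokenizer_class = "Wav2Vec2CTCTokenizer"
```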
Review comment: I see. Good with me to use `AutoFeatureExtractor` in this case, since changing the `ProcessorMixin` would add unnecessary complexity to handle this unique example.
Review comment: This isn't the first processor to handle the case with two objects (that's InstructBLIP), although it's slightly different, as that has two tokenizers rather than accepting one of two tokenizer classes. I'm not a fan of using `AutoFeatureExtractor` here, as the processor doesn't accept just any feature extractor, so the type is misleading. Ideally the processor should be adapted so that, e.g., tokenizers can accept a list of lists, e.g. `[[TokenizerA, TokenizerAFast], [TokenizerB, TokenizerBFast]]`, and the other types can accept lists of objects, e.g. `[FeatureExtractorA, FeatureExtractorB, FeatureExtractorC]`. At the very least, the processor doc should indicate that only these two types are accepted and that type verification happens in the init.
Review comment: I am hitting the same issue at #29414; however, I don't really see the problem with using auto classes in the processors, I would just like it to be consistent (use auto classes for both the feature extractor and the tokenizer, or not at all). If auto classes are used, it would indeed be great to explicitly check for the supported classes. Because this PR currently would use `AutoFeatureExtractor` for the feature extractor but a specific tokenizer class for the tokenizer, I've opened #29414 to make the discussion more general; feel free to comment there!