Add Descript-Audio-Codec model #31494

kamilakesbi · 2024-06-19T13:16:26Z

What does this PR do?

This PR aims to add the Descript-Audio-Codec (DAC) model, a high-fidelity, general neural audio codec, to the Transformers library.

This model is composed of 3 components:

An Encoder model.
A ResidualVectorQuantizer model, which is used with the encoder to obtain the quantized latent codes of the audio.
A Decoder model, used to reconstruct the audio after compression.
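To make the role of the middle component concrete, here is a minimal pure-Python sketch of residual vector quantization. It is only an illustration of the general technique: the codebooks, the latent vector and the helper names (`nearest`, `rvq_encode`, `rvq_decode`) are made up for this example and are not taken from modeling_dac.py.

```python
# Toy residual vector quantization (RVQ). All values are illustrative.

def nearest(codebook, vector):
    """Return the index of the codebook entry closest to `vector` (squared L2)."""
    def dist(entry):
        return sum((e - v) ** 2 for e, v in zip(entry, vector))
    return min(range(len(codebook)), key=lambda i: dist(codebook[i]))


def rvq_encode(codebooks, latent):
    """Quantize stage by stage; each stage encodes the residual left by the previous one."""
    residual = list(latent)
    codes = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        codes.append(idx)
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return codes


def rvq_decode(codebooks, codes):
    """Sum the selected entry of every codebook to rebuild the quantized latent."""
    out = [0.0] * len(codebooks[0][0])
    for codebook, idx in zip(codebooks, codes):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


codebooks = [
    [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]],    # coarse stage
    [[0.0, 0.0], [0.25, 0.0], [0.0, -0.25]],  # fine stage
]
latent = [1.3, 0.9]  # stands in for one frame of encoder output
codes = rvq_encode(codebooks, latent)
reconstruction = rvq_decode(codebooks, codes)
```

In a real codec the encoder would produce `latent` and the decoder would consume `reconstruction`; the integer `codes` are what gets stored or transmitted.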

This is still a draft PR. Here's what I've done for now:

Adapted the model to the Transformers format in modeling_dac.py.
Added the checkpoint conversion scripts and pushed the 3 converted models (16, 24 and 44 kHz) to the Hub here.
Made sure the forward pass gives the same output as the original model.
Added a Feature Extractor (very similar to the Encodec FeatureExtractor).
Started iterating on tests.

Who can review?

cc @sanchit-gandhi and @ArthurZucker
cc @ylacombe for visibility

src/transformers/models/dac/modeling_dac.py

src/transformers/models/dac/configuration_dac.py

src/transformers/models/dac/feature_extraction_dac.py

src/transformers/models/encodec/convert_encodec_checkpoint_to_pytorch.py

kamilakesbi · 2024-06-26T10:10:13Z

They do indeed weight the different losses during training (see the original codebase). I'll add weight attributes to the config file.

Note that the current code only returns the commitment_loss and codebook_loss, but other losses are also used to train the model (mel_loss and GAN losses).
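For readers unfamiliar with these two terms, the sketch below follows the usual VQ-VAE formulation, which is assumed (not confirmed by this thread) to be what the model uses. The weights and the `vq_losses` / `weighted_total` helpers are hypothetical placeholders, not values or functions from the config or the code.

```python
# Sketch of the two quantizer losses and of a weighted sum over them.

def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def vq_losses(encoder_out, quantized):
    # The forward values are identical. In an autograd framework the two terms
    # differ only in which side is detached:
    #   commitment loss: gradient reaches the encoder (quantized is detached)
    #   codebook loss:   gradient reaches the codebook (encoder_out is detached)
    return {
        "commitment_loss": mse(encoder_out, quantized),
        "codebook_loss": mse(quantized, encoder_out),
    }


def weighted_total(losses, weights):
    """Combine named losses with per-loss weights, e.g. read from a config."""
    return sum(weights[name] * value for name, value in losses.items())


encoder_out = [1.0, 2.0, 3.0, 4.0]
quantized = [1.0, 2.5, 3.0, 3.5]
losses = vq_losses(encoder_out, quantized)
# Placeholder weights chosen for the example only.
total = weighted_total(losses, {"commitment_loss": 0.25, "codebook_loss": 1.0})
```

A full training objective would add the mel and GAN terms mentioned above to the same weighted sum.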

kamilakesbi · 2024-06-27T12:54:48Z

I addressed all of Sanchit's review comments and added integration tests. @ylacombe this should be ready for review!

Note that there's still one failing test, which reports:

1 failed because AssertionError -> <class 'transformers.models.dac.modeling_dac.DacModel'> is too big for the common tests (74175906)! It should have 1M max.

Should we override this common test in the DAC test file?

cc @sanchit-gandhi

HuggingFaceDocBuilderDev · 2024-06-27T13:04:17Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

tests/models/dac/test_modeling_dac.py

ylacombe

Hey @kamilakesbi,
Thanks for this great PR!

I've left a few comments, the main ones being:

We definitely should have a method to decode from audio codebooks. We could maybe make the decode method compatible with both audio codebooks and the quantized representation, WDYT?
I'm not sure the losses should be computed by default, especially since these losses alone are not enough to train the model: if I remember correctly, a few more losses are needed.
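The first suggestion could look roughly like the sketch below: a single decode entry point that accepts either integer codes or an already quantized latent. The function and argument names here are hypothetical and the decoder is a toy stand-in; this is not the signature that ended up in modeling_dac.py.

```python
# Sketch: one decode() that accepts either audio codes or a quantized latent.

def codes_to_latent(codebooks, audio_codes):
    """Look up one entry per codebook and sum them (residual quantization)."""
    latent = [0.0] * len(codebooks[0][0])
    for codebook, idx in zip(codebooks, audio_codes):
        latent = [l + c for l, c in zip(latent, codebook[idx])]
    return latent


def decode(decoder, codebooks, quantized_representation=None, audio_codes=None):
    # Require exactly one input so the caller's intent is unambiguous.
    if (quantized_representation is None) == (audio_codes is None):
        raise ValueError("Pass exactly one of quantized_representation or audio_codes.")
    if audio_codes is not None:
        quantized_representation = codes_to_latent(codebooks, audio_codes)
    return decoder(quantized_representation)


def toy_decoder(latent):
    """Stand-in for the real decoder network: just scales the latent."""
    return [2.0 * x for x in latent]


toy_codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[0.0, 0.0], [0.25, 0.0]],
]
from_codes = decode(toy_decoder, toy_codebooks, audio_codes=[1, 1])
from_latent = decode(toy_decoder, toy_codebooks, quantized_representation=[1.25, 1.0])
```

Both calls go through the same decoder, so decoding from codes is just a lookup placed in front of the existing path.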

Let me know if you have any further questions, but again congrats on the PR, it's looking really great!

src/transformers/models/auto/configuration_auto.py

src/transformers/models/dac/configuration_dac.py

src/transformers/models/dac/feature_extraction_dac.py

src/transformers/models/dac/modeling_dac.py

docs/source/en/model_doc/dac.md

kamilakesbi · 2024-07-03T11:12:30Z

Thank you for your review @ylacombe!

I have taken your feedback into account and updated the code. I've also added the ability to decode from audio codebooks.

Regarding the loss, I agree with you that we should probably not return the encoder loss by default. Normally, the loss is returned when the labels argument is passed to the model's forward pass, but here I'm not sure it makes sense to add a labels argument, as the loss is computed in an unsupervised manner. Should I add a return_loss arg instead?

Otherwise, I think this is ready for a final review @amyeroberts :) The failing tests are unrelated to this PR, I think.
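As a rough illustration of the return_loss idea, the sketch below only computes a loss when the caller asks for it, since there are no labels to trigger it. Everything here (`ToyOutput`, the rounding "quantizer", the loss itself) is a made-up stand-in, not the model's actual forward pass.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ToyOutput:
    audio_values: List[float]
    loss: Optional[float] = None  # stays None unless explicitly requested


def forward(input_values, return_loss=False):
    # Stand-in for encoder + quantizer: snap each value to a 0.25 grid.
    quantized = [round(v * 4) / 4 for v in input_values]
    loss = None
    if return_loss:
        # Unsupervised: the loss compares the input with its own quantized version.
        loss = sum((q - v) ** 2 for q, v in zip(quantized, input_values)) / len(input_values)
    return ToyOutput(audio_values=quantized, loss=loss)


default_out = forward([0.1, 0.6])
train_out = forward([0.1, 0.6], return_loss=True)
```

With this shape, inference callers pay nothing for the loss, and training code opts in with a single flag.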

sanchit-gandhi

Looks in great shape - just some minor style nits from me!

src/transformers/models/dac/modeling_dac.py

src/transformers/models/dac/feature_extraction_dac.py

tests/models/dac/test_feature_extraction_dac.py

tests/models/dac/test_modeling_dac.py

ylacombe

Thanks for iterating @kamilakesbi, LGTM!

Gentle ping to @amyeroberts and @ArthurZucker for a review!

docs/source/en/model_doc/dac.md

amyeroberts

Thanks for adding this model!

There are a few things here and there, mainly the weight_norm logic, but overall it looks really good and clean 🤗

src/transformers/models/dac/modeling_dac.py

tests/models/dac/test_modeling_dac.py

src/transformers/models/dac/modeling_dac.py

docs/source/en/model_doc/dac.md

kamilakesbi · 2024-07-10T13:16:47Z

Thanks for the reviews @amyeroberts and @ylacombe!

We should be close to merging this model!

The last change would be to transfer the weights from my personal Hugging Face page to the Descript organisation. I'm waiting for the members of the organisation to add me.

kamilakesbi · 2024-07-11T14:42:38Z

@amyeroberts the checkpoints have been transferred to the Descript organisation.

We can merge this PR if everything is OK for you :)

kamilakesbi · 2024-07-15T08:50:55Z

Gentle ping @amyeroberts

amyeroberts · 2024-07-15T09:22:57Z

@kamilakesbi There are still failing tests on the CI - these should be resolved before final review and merge. You may need to rebase on main to include upstream changes, or trigger a re-run of the CI if the issues are related to the environment or other libraries.

kamilakesbi · 2024-07-15T10:44:41Z

@amyeroberts I've rebased, but there are still failing tests which I think are unrelated to this PR. The failing tests report the following message:

1 failed because huggingface_hub.utils._errors.RepositoryNotFoundError: 404 Client Error. (Request ID -> Root=1-6694ef78-6ec395936d885db72ebacd65;2c9bbc0d-6ecd-49c4-b28e-df134be7bd4a)

kamilakesbi · 2024-07-18T11:52:42Z

@amyeroberts after rebasing, all tests pass on the CI :)

If I get your approval I can merge!

amyeroberts

Thanks for adding and iterating!

Just part of the feature extractor and the docs left to update.

src/transformers/models/dac/feature_extraction_dac.py

docs/source/en/model_doc/dac.md

src/transformers/models/dac/convert_dac_checkpoint.py

Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>

amyeroberts · 2024-08-19T10:15:54Z

@kamilakesbi Why was this merged when there were failing slow tests?

This reverts commit 8260cb3.

kamilakesbi changed the title ~~Add dac~~ [WIP] - Add Descript-Audio-Codec model Jun 19, 2024

kamilakesbi requested a review from sanchit-gandhi June 19, 2024 14:20

kamilakesbi added the Audio label Jun 19, 2024

sanchit-gandhi reviewed Jun 21, 2024

ylacombe mentioned this pull request Jun 26, 2024

Stable Audio integration huggingface/diffusers#8716

Merged

5 tasks

kamilakesbi requested a review from ylacombe June 27, 2024 12:55

sanchit-gandhi reviewed Jun 28, 2024

tests/models/dac/test_modeling_dac.py (outdated)

ylacombe reviewed Jul 1, 2024

kamilakesbi requested a review from amyeroberts July 3, 2024 11:12

sanchit-gandhi approved these changes Jul 3, 2024

kamilakesbi requested a review from ArthurZucker July 4, 2024 09:29

kamilakesbi changed the title ~~[WIP] - Add Descript-Audio-Codec model~~ Add Descript-Audio-Codec model Jul 4, 2024

ylacombe approved these changes Jul 8, 2024

docs/source/en/model_doc/dac.md (outdated)

docs/source/en/model_doc/dac.md (outdated)

amyeroberts reviewed Jul 9, 2024

kamilakesbi force-pushed the add_dac branch from c63d7a7 to fe237ed Compare July 15, 2024 09:32

kamilakesbi force-pushed the add_dac branch from fe237ed to 550f1d6 Compare July 18, 2024 09:41

kamilakesbi requested a review from amyeroberts July 19, 2024 11:16

amyeroberts reviewed Jul 19, 2024

kamilakesbi requested a review from amyeroberts July 22, 2024 09:16

amyeroberts added the run-slow label Jul 22, 2024

kamilakesbi and others added 23 commits August 19, 2024 10:52

iterate on design and tests

3bc40c6

add integration tests

01511b7

feature extractor tests

5cdf0ae

make style

167cb8f

all tests pass

a4d1261

make style

1fd2496

fixup

09ec8b5

apply review suggestions

a5ac7c6

fix-copies

284c75b

apply review suggestions

7512886

apply review suggestions

dc2e85c

Update docs/source/en/model_doc/dac.md

c7318d5

Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>

Update docs/source/en/model_doc/dac.md

fdb8ced

Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>

anticipate transfer weights to descript

5388663

up

fac14fd

make style

e088e0d

apply review suggestions

bfaef5e

update slow test values

a473975

update slow tests

2be0f36

update test values

c13180e

update with CI values

8c72cda

update with vorace values

89b7143

update test with slice

5b02249

kamilakesbi force-pushed the add_dac branch from f90ce78 to 5b02249 Compare August 19, 2024 08:52

make style

1671917

kamilakesbi merged commit 8260cb3 into huggingface:main Aug 19, 2024
23 of 25 checks passed

kamilakesbi added a commit that referenced this pull request Aug 19, 2024

Revert "Add Descript-Audio-Codec model (#31494)"

ffc0f6b

This reverts commit 8260cb3.

This was referenced Aug 19, 2024

Revert "Add Descript-Audio-Codec model" #32876

Closed

fix DAC slow test #32879

Closed
