-
Notifications
You must be signed in to change notification settings - Fork 28.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Descript-Audio-Codec model #31494
Conversation
src/transformers/models/encodec/convert_encodec_checkpoint_to_pytorch.py
Outdated
Show resolved
Hide resolved
They indeed use weights with the different losses during training (see original codebase). I'll add weight attributes in the config file. Note that in the current code, we only return the |
I took all of Sanchit's reviews and added integration tests. @ylacombe this should be ready for review!
Should we overwrite this common test in the dac test file ? |
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @kamilakesbi,
Thanks for this great PR !!
I've left a few comments, the main ones being:
- we definitely should have a method to decode from audio codebooks. We could maybe make the
decode
method compatible with both audio codebooks and quantized representation, WDYT ? - I'm not quite sure that we should have the losses being computed by default, especially since these losses are alone not enough to train the model - we need a few more losses to train the model if I remember correctly !
Let me know if you got any further questions, but again congrats on the PR, it's looking really great!
Thank you for your review @ylacombe! I have taken your feedback into account and updated the code. I've also added the ability to decode from audio codebooks. Regarding the loss, I agree with you that we should probably not return the encoder loss by default. Normally, the Otherwise I think this is ready for a final review @amyeroberts :) failling tests are unrelated to this PR I think. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks in great shape - just some minor style nits from me!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for iterating @kamilakesbi, LGTM!
gentle ping to @amyeroberts and @ArthurZucker for a review!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding this model!
There's a few things here and there, mainly the weight_norm logic, but overall looks really good and clean 🤗
Thanks for the reviews @amyeroberts and @ylacombe! We should be close from merging this model! The last change would be to transfer the weights from my personal hugging face page to the descript organisation. I'm waiting for the members of the organisation to add me. |
@amyeroberts the checkpoints have been transferred to the We can merge this PR if everything if ok for you :) |
Gentle ping @amyeroberts |
@kamilakesbi There's still failing tests on the CI - these should be resolved before final review and merge. You may need to rebase on main to include upstream changes or trigger a re-run of the CI if the issues are relating to the environment or other libraries. |
@amyeroberts I've rebased but there are still failing tests which I think are unrelated to this PR. The failing tests indicate the following message:
|
@amyeroberts after rebasing all tests pass on the CI :) If I get your approval I can merge! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding and iterating!
Just part of the feature extractor and docs to update
Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>
Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>
@kamilakesbi Why was this merged when there were failing slow tests? |
This reverts commit 8260cb3.
What does this PR do?
This PR aims at adding Descript-Audio-Codec model, a high fidelity general neural audio codec, to the Transformers library.
This model is composed of 3 components:
This is still a draft PR. Here's what I've done for now:
modeling_dac.py
.Who can review ?
cc @sanchit-gandhi and @ArthurZucker
cc @ylacombe for visibility