Add Moonshine #34784
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Force-pushed from bac9c9f to d0ed917
Force-pushed from d4864c7 to 22dbaae
(just some notes in the meantime).
Review thread on src/transformers/models/moonshine/convert_usefulsensors_to_hf.py (outdated, resolved)
Co-authored-by: Joshua Lochner <admin@xenova.com>
Amazing work @eustlb and team! 🤗
* config draft
* full encoder forward
* full decoder forward
* fix sdpa and FA2
* fix sdpa and FA2
* moonshine model
* moonshine model forward
* fix attention with past_key_values
* add MoonshineForConditionalGeneration
* fix cache handling and causality for cross attention
* no causal attention mask for the encoder
* model addition (imports etc)
* small nit
* nits
* Update src/transformers/models/moonshine/convert_usefulsensors_to_hf.py (Co-authored-by: Joshua Lochner <admin@xenova.com>)
* add rope_theta
* nits
* model doc
* Update src/transformers/models/auto/configuration_auto.py (Co-authored-by: Joshua Lochner <admin@xenova.com>)
* imports
* add MODEL_FOR_SPEECH_SEQ_2_SEQ_MAPPING_NAMES
* updates modular
* make
* make fix-copies
* ruff check examples fix
* fix check_modular_conversion
* nit
* nits
* nits
* copied from -> imports
* imports fix
* integrate attention refacto
* modular edge case
* remove encoder
* convolutions params in config
* run modular_model_converter
* make
* Update docs/source/en/model_doc/moonshine.md (Co-authored-by: Joshua Lochner <admin@xenova.com>)
* MoonshineModelTest
* correct typo
* make style
* integration tests
* make
* modular convert
* name conversion update (up_proj -> fc1 etc)
* update config
* update MLP
* update attention
* update encoder layer
* update decoder layer
* update convolutions parameters
* update encoder
* remove INPUTS_DOCSTRING
* update decoder
* update conditional generation
* update pretrained model
* imports
* modular converted
* update doc
* fix
* typo
* update doc
* update license
* update init
* split config in file
* two classes for MLP
* attention from GLM
* from GlmRotaryEmbedding
* split MLP
* apply arthur's review suggestions
* apply arthur's review suggestions
* apply arthur's review suggestions
* auto feature extractor
* convert modular
* fix + make
* convert modular
* make
* unsplit config
* use correct checkpoint
* wrap generate
* update tests
* typos
* make
* typo
* update doc

---------

Co-authored-by: Joshua Lochner <admin@xenova.com>
Please check usefulsensors/moonshine#81; there is an error when using Hugging Face to load Moonshine.
What does this PR do?
This PR adds support for Moonshine to the Transformers library.
Moonshine builds on top of Whisper’s architecture to overcome some of its limitations, primarily the restriction to a fixed 30-second audio window.
Key improvements in Moonshine’s architecture:
1. It uses SwiGLU activation instead of GELU in the decoder layers.
2. Most importantly, it replaces absolute position embeddings with Rotary Position Embeddings (RoPE), enabling Moonshine to process audio inputs of any length—unlike Whisper, which is limited to fixed 30-second windows.
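As a rough illustration of point 1, here is a minimal PyTorch sketch of the SwiGLU gating idea (a SiLU-gated MLP). It is not copied from this PR; the class and attribute names are placeholders:

```python
import torch
import torch.nn as nn

class SwiGLUMLP(nn.Module):
    """Illustrative SwiGLU block: fused gate/up projection, SiLU gating, then a down projection."""
    def __init__(self, hidden_size: int, intermediate_size: int):
        super().__init__()
        self.fc1 = nn.Linear(hidden_size, intermediate_size * 2)  # gate and up projections fused
        self.fc2 = nn.Linear(intermediate_size, hidden_size)      # down projection
        self.act = nn.SiLU()

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        gate, up = self.fc1(hidden_states).chunk(2, dim=-1)
        return self.fc2(self.act(gate) * up)
```

And a minimal usage sketch of the model class this PR adds. The checkpoint id and the dummy dataset are assumptions for illustration, not taken from the PR itself:

```python
import torch
from datasets import load_dataset
from transformers import AutoProcessor, MoonshineForConditionalGeneration

# Assumed checkpoint id; substitute whichever Moonshine checkpoint you actually use.
checkpoint = "UsefulSensors/moonshine-tiny"
processor = AutoProcessor.from_pretrained(checkpoint)
model = MoonshineForConditionalGeneration.from_pretrained(checkpoint)

# Any 16 kHz mono waveform works; a small dummy ASR dataset is used here for illustration.
ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
inputs = processor(ds[0]["audio"]["array"], sampling_rate=16_000, return_tensors="pt")

with torch.no_grad():
    generated_ids = model.generate(**inputs)
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])
```

Because of the RoPE-based positions mentioned in point 2, the input is not padded or truncated to a fixed 30-second window before being passed to the model.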
Who can review?
@ArthurZucker
TODO