
Add Moonshine #34784

Merged · 93 commits · Jan 10, 2025
Conversation

eustlb (Contributor) commented Nov 18, 2024

What does this PR do?

This PR adds support for Moonshine to the Transformers library.

Moonshine builds on top of Whisper’s architecture to overcome some of its limitations, primarily its restriction to a fixed 30-second audio window.

Key improvements in Moonshine’s architecture:
1. It uses SwiGLU activation instead of GELU in the decoder layers.
2. Most importantly, it replaces absolute position embeddings with Rotary Position Embeddings (RoPE), removing the fixed-window constraint and enabling Moonshine to process audio inputs of arbitrary length.
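The two changes above can be sketched conceptually in plain Python. This is a minimal illustration of SwiGLU gating and rotary embeddings, not the actual Transformers implementation; the helper names are made up:

```python
import math

def silu(x: float) -> float:
    # SiLU (swish) activation: x * sigmoid(x)
    return x / (1.0 + math.exp(-x))

def swiglu(gate: list, up: list) -> list:
    # SwiGLU gating as used in the decoder MLP: SiLU(gate) * up,
    # where `gate` and `up` come from two separate linear projections.
    return [silu(g) * u for g, u in zip(gate, up)]

def rope(vec: list, pos: int, base: float = 10000.0) -> list:
    # Rotary position embedding: rotate each (even, odd) feature pair
    # by an angle proportional to the token position, so relative
    # positions are encoded without a fixed maximum sequence length.
    d = len(vec)
    out = []
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out
```

Because RoPE encodes only relative rotation angles, nothing in the position encoding ties the model to a 30-second window, which is what lifts Whisper’s fixed-length constraint.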

Who can review?

@ArthurZucker

TODO

  • update UsefulSensors model repos
  • run benchmarks

@eustlb changed the title from "Add Moonshine" to "[WIP] Add Moonshine" on Nov 18, 2024
@HuggingFaceDocBuilderDev commented:

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

xenova (Contributor) left a comment:


(just some notes in the meantime).

@eustlb eustlb merged commit 5f087d1 into huggingface:main Jan 10, 2025
25 checks passed
xenova (Contributor) commented Jan 10, 2025

Amazing work @eustlb and team! 🤗

ArthurZucker pushed a commit that referenced this pull request Jan 10, 2025
* config draft
* full encoder forward
* full decoder forward
* fix sdpa and FA2
* fix sdpa and FA2
* moonshine model
* moonshine model forward
* fix attention with past_key_values
* add MoonshineForConditionalGeneration
* fix cache handling and causality for cross attention
* no causal attention mask for the encoder
* model addition (imports etc)
* small nit
* nits
* Update src/transformers/models/moonshine/convert_usefulsensors_to_hf.py
  Co-authored-by: Joshua Lochner <admin@xenova.com>
* add rope_theta
* nits
* model doc
* Update src/transformers/models/auto/configuration_auto.py
  Co-authored-by: Joshua Lochner <admin@xenova.com>
* imports
* add MODEL_FOR_SPEECH_SEQ_2_SEQ_MAPPING_NAMES
* updates modular
* make
* make fix-copies
* ruff check examples fix
* fix check_modular_conversion
* nit
* nits
* nits
* copied from -> imports
* imports fix
* integrate attention refacto
* modular edge case
* remove encoder
* convolutions params in config
* run modular_model_converter
* make
* Update docs/source/en/model_doc/moonshine.md
  Co-authored-by: Joshua Lochner <admin@xenova.com>
* MoonshineModelTest
* correct typo
* make style
* integration tests
* make
* modular convert
* name conversion update (up_proj -> fc1 etc)
* update config
* update MLP
* update attention
* update encoder layer
* update decoder layer
* update convolutions parameters
* update encoder
* remove INPUTS_DOCSTRING
* update decoder
* update conditional generation
* update pretrained model
* imports
* modular converted
* update doc
* fix
* typo
* update doc
* update license
* update init
* split config in file
* two classes for MLP
* attention from GLM
* from GlmRotaryEmbedding
* split MLP
* apply arthur's review suggestions
* apply arthur's review suggestions
* apply arthur's review suggestions
* auto feature extractor
* convert modular
* fix + make
* convert modular
* make
* unsplit config
* use correct checkpoint
* wrap generate
* update tests
* typos
* make
* typo
* update doc

---------

Co-authored-by: Joshua Lochner <admin@xenova.com>
alexmil2019 commented Jan 10, 2025

Please check usefulsensors/moonshine#81: there is an error when using Hugging Face to load Moonshine.
