[Model] Add support for GraniteMoeShared models #13313

tjohnson31415 · 2025-02-15T00:21:00Z

Adds support for the granitemoeshared model type, which is based on granitemoe with the addition of a shared experts layer. A preview model with this architecture can be found at ibm-research/moe-7b-1b-active-shared-experts.

Support for GraniteMoeShared was recently merged into transformers; the model requires transformers >= v4.49.0.
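To check the version requirement above before loading the model, something like the following sketch could work. It is not part of this PR: the helper names (`parse_release`, `supports_granitemoeshared`, `installed_transformers_ok`) are made up for illustration, and the parser only looks at the leading numeric release, so a pre-release such as `4.49.0.dev0` is treated the same as `4.49.0`.

```python
from importlib.metadata import PackageNotFoundError, version

# Minimum transformers release stated in the PR description.
MIN_TRANSFORMERS = (4, 49, 0)


def parse_release(v: str) -> tuple:
    """Return the leading numeric release as a 3-tuple, e.g. '4.49.0.dev0' -> (4, 49, 0).

    Simplification (assumption): pre-release and dev suffixes are ignored.
    """
    parts = []
    for piece in v.lstrip("v").split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
        if len(digits) != len(piece):
            break  # e.g. "0rc1": keep the 0, stop at the suffix
    # Pad so that "4.49" compares equal to "4.49.0".
    parts = parts[:3] + [0] * (3 - len(parts[:3]))
    return tuple(parts)


def supports_granitemoeshared(installed: str) -> bool:
    """True if the given transformers version string meets the stated minimum."""
    return parse_release(installed) >= MIN_TRANSFORMERS


def installed_transformers_ok() -> bool:
    """Check the transformers package in the current environment, if any."""
    try:
        return supports_granitemoeshared(version("transformers"))
    except PackageNotFoundError:
        return False
```

For strict PEP 440 ordering (including pre-releases), a dedicated version-parsing library would be the safer choice; the tuple comparison here is only meant to illustrate the check.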

github-actions · 2025-02-15T00:21:11Z

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small, essential subset of CI tests to catch errors quickly. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

jeejeelee · 2025-02-15T02:28:52Z

vllm/model_executor/models/granitemoeshared.py

+
+        self.input_size = config.hidden_size
+        self.hidden_size = config.shared_intermediate_size
+        self.input_linear = MergedColumnParallelLinear(


QQ: why doesn't input_linear support LoRA?

Thanks for taking a look!

Honestly, I don't know what is required for a layer to support LoRA... I presume that there is no reason for a simple linear layer not to, but do please let me know if there are reasons I would need to investigate 😅

I added input_linear and output_linear to the supported_lora_modules.
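For context on the layers discussed in this thread, here is a rough, framework-free sketch of what a shared-experts MLP with a merged input projection might compute. It is not the code from this PR: plain Python lists stand in for tensors, `SharedExpertMLP` and `combine` are hypothetical names, and both the gated SiLU activation and the additive combination with the routed MoE output are assumptions about the architecture rather than something stated in this conversation.

```python
import math


def silu(x: float) -> float:
    """SiLU activation: x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))


def matvec(weight, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weight]


class SharedExpertMLP:
    """Toy stand-in for a shared-experts layer (hypothetical, for illustration)."""

    def __init__(self, input_weight, output_weight):
        # input_weight plays the role of the merged input_linear: it has
        # 2 * shared_intermediate_size rows (gate rows first, then up rows),
        # each of length hidden_size.
        self.input_weight = input_weight
        # output_weight plays the role of output_linear: hidden_size rows,
        # each of length shared_intermediate_size.
        self.output_weight = output_weight
        self.shared_size = len(input_weight) // 2

    def __call__(self, hidden):
        merged = matvec(self.input_weight, hidden)
        # Split the merged projection back into its gate and up halves.
        gate = merged[: self.shared_size]
        up = merged[self.shared_size:]
        activated = [silu(g) * u for g, u in zip(gate, up)]
        return matvec(self.output_weight, activated)


def combine(moe_out, shared_out):
    """Assumed combination: add the shared-experts output to the routed MoE output."""
    return [m + s for m, s in zip(moe_out, shared_out)]
```

Merging the gate and up projections into a single matrix means one matrix multiply instead of two, which is presumably why the diff uses `MergedColumnParallelLinear` for `input_linear`.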


tjohnson31415 · 2025-02-28T17:18:09Z

docs/source/models/supported_models.md

@@ -280,7 +280,12 @@ See [this page](#generative-models) for more information on how to use generativ
  * Granite 3.0 MoE, PowerMoE
  * `ibm-granite/granite-3.0-1b-a400m-base`, `ibm-granite/granite-3.0-3b-a800m-instruct`, `ibm/PowerMoE-3b`, etc.
  * ✅︎
+  *


I did a quick test, and pipeline parallelism (PP) doesn't seem to work for the GraniteMoe model either.
I can look into that as a follow-on.

DarkLight1337

Since it's a test model, we can merge this as long as you can use the model on your end.

tests/models/registry.py

jeejeelee reviewed Feb 15, 2025

vllm/model_executor/models/granitemoeshared.py (resolved)

jeejeelee reviewed Feb 25, 2025

vllm/model_executor/models/granitemoeshared.py (outdated, resolved)

jeejeelee reviewed Feb 25, 2025

vllm/model_executor/models/granitemoeshared.py (resolved)

jeejeelee reviewed Feb 25, 2025

vllm/model_executor/models/granitemoeshared.py (resolved)

jeejeelee reviewed Feb 25, 2025

vllm/model_executor/models/registry.py (resolved)

tjohnson31415 added 7 commits February 28, 2025 09:22

first draft with shared experts support

75835e6

Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>

fix: use getattr and delete tensor early

d1ad12d

Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>

fix: use MergedColumnParallelLinear

e422449

Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>

refactor: move impl to dedicated granitesharedmoe.py file

d005fae

Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>

update supported_lora_modules

33959a8

Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>

linting

f7558e3

Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>

review: updates from code review

614e1a9

Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>

tjohnson31415 force-pushed the granite-shared-experts branch from f156ee9 to 3a9f19e on February 28, 2025 17:14

tjohnson31415 requested review from DarkLight1337 and ywang96 as code owners February 28, 2025 17:14

mergify bot added the documentation label Feb 28, 2025

tjohnson31415 commented Feb 28, 2025

docs: update docs for granitemoeshared

b2d9e45

Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>

tjohnson31415 force-pushed the granite-shared-experts branch from 3a9f19e to b2d9e45 on February 28, 2025 17:23

DarkLight1337 approved these changes Mar 1, 2025

DarkLight1337 enabled auto-merge (squash) March 1, 2025 06:27

github-actions bot added the ready label Mar 1, 2025

DarkLight1337 reviewed Mar 1, 2025

tests/models/registry.py (outdated, resolved)

Update tests/models/registry.py

4b0edeb
