add shared experts for upcoming Granite 4.0 language models (#35894)
* Modular GraniteMoE with shared Experts.

Signed-off-by: Shawn Tan <shawntan@ibm.com>

* Modified

* Import order.

* Modified for style

* Fix space.

* Test

* Remove extra granitemoe file.

* New converted file and tests

* Modified __init__ files.

* Formatting.

* Dummy PT objects

* register granitemoe shared model

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* fix linting of a file

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* fix import in modeling file

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* update generated modeling file

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* add documentation

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* update docstrings

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* update generated modeling file

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* fix docstrings in config class

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* merge main

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

---------

Signed-off-by: Shawn Tan <shawntan@ibm.com>
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
Co-authored-by: Shawn Tan <shawntan@ibm.com>
Co-authored-by: Shawn Tan <shawn@wtf.sg>
Co-authored-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
Co-authored-by: Sukriti Sharma <Ssukriti@users.noreply.github.com>
5 people authored and ArthurZucker committed Feb 17, 2025
1 parent bcfc9d7 commit 8018e0e
Showing 15 changed files with 2,517 additions and 0 deletions.
2 changes: 2 additions & 0 deletions docs/source/en/_toctree.yml
@@ -461,6 +461,8 @@
      title: Granite
    - local: model_doc/granitemoe
      title: GraniteMoe
    - local: model_doc/granitemoeshared
      title: GraniteMoeShared
    - local: model_doc/granitevision
      title: GraniteVision
    - local: model_doc/helium
1 change: 1 addition & 0 deletions docs/source/en/index.md
@@ -173,6 +173,7 @@ Flax), PyTorch, and/or TensorFlow.
| [GPTSAN-japanese](model_doc/gptsan-japanese) | ✅ | ❌ | ❌ |
| [Granite](model_doc/granite) | ✅ | ❌ | ❌ |
| [GraniteMoeMoe](model_doc/granitemoe) | ✅ | ❌ | ❌ |
| [GraniteMoeSharedMoe](model_doc/granitemoeshared) | ✅ | ❌ | ❌ |
| [Graphormer](model_doc/graphormer) | ✅ | ❌ | ❌ |
| [Grounding DINO](model_doc/grounding-dino) | ✅ | ❌ | ❌ |
| [GroupViT](model_doc/groupvit) | ✅ | ✅ | ❌ |
66 changes: 66 additions & 0 deletions docs/source/en/model_doc/granitemoeshared.md
@@ -0,0 +1,66 @@
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->

# GraniteMoeShared

## Overview


GraniteMoeShared builds on the GraniteMoe model, which was proposed in [Power Scheduler: A Batch Size and Token Number Agnostic Learning Rate Scheduler](https://arxiv.org/abs/2408.13359) by Yikang Shen, Matthew Stallone, Mayank Mishra, Gaoyuan Zhang, Shawn Tan, Aditya Prasad, Adriana Meza Soria, David D. Cox and Rameswar Panda.

Additionally, the GraniteMoeSharedModel class adds shared experts to the MoE layers: an always-active shared expert processes every token alongside the routed experts, and the two contributions are summed (see the sketch after the usage example below).

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "ibm-research/moe-7b-1b-active-shared-experts"
tokenizer = AutoTokenizer.from_pretrained(model_path)

# drop device_map if running on CPU
model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto")
model.eval()

# change input text as desired
prompt = "Write a code to find the maximum value in a list of numbers."

# tokenize the text
input_tokens = tokenizer(prompt, return_tensors="pt")
# generate output tokens
output = model.generate(**input_tokens, max_new_tokens=100)
# decode output tokens into text
output = tokenizer.batch_decode(output)
# loop over the batch to print, in this example the batch size is 1
for i in output:
    print(i)
```
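
The shared-expert mechanism is simple enough to sketch. The snippet below is an illustrative sketch only, not the exact `transformers` implementation; the class name `SharedExpertMoESketch` and the `routed_moe` argument are hypothetical, and `shared_intermediate_size` is assumed to set the shared expert's width:

```python
import torch
import torch.nn as nn


class SharedExpertMoESketch(nn.Module):
    """Illustrative sketch: routed MoE output plus an always-active shared expert."""

    def __init__(self, hidden_size: int, shared_intermediate_size: int, routed_moe: nn.Module):
        super().__init__()
        self.routed_moe = routed_moe  # any MoE block mapping (batch, seq, hidden) -> same shape
        # the shared expert is a plain gated MLP that sees every token, with no routing
        self.input_linear = nn.Linear(hidden_size, 2 * shared_intermediate_size, bias=False)
        self.output_linear = nn.Linear(shared_intermediate_size, hidden_size, bias=False)
        self.act = nn.SiLU()

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        routed_out = self.routed_moe(hidden_states)  # sparse, token-routed experts
        gate, up = self.input_linear(hidden_states).chunk(2, dim=-1)
        shared_out = self.output_linear(self.act(gate) * up)  # dense shared expert
        return routed_out + shared_out  # the two contributions are summed
```

The intuition is that the always-active shared expert can absorb computation useful to every token, leaving the routed experts free to specialize.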

This HF implementation was contributed by [Mayank Mishra](https://huggingface.co/mayank-mishra), [Shawn Tan](https://huggingface.co/shawntan) and [Sukriti Sharma](https://huggingface.co/SukritiSharma).


## GraniteMoeSharedConfig

[[autodoc]] GraniteMoeSharedConfig
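
A minimal construction for smoke-testing, assuming the configuration exposes the usual GraniteMoe fields plus a `shared_intermediate_size` controlling the shared expert (the sizes below are arbitrary toy values):

```python
from transformers import GraniteMoeSharedConfig, GraniteMoeSharedModel

config = GraniteMoeSharedConfig(
    vocab_size=1024,
    hidden_size=64,
    intermediate_size=128,
    num_hidden_layers=2,
    num_attention_heads=4,
    num_local_experts=4,
    num_experts_per_tok=2,
    shared_intermediate_size=128,  # width of the always-active shared expert
)
model = GraniteMoeSharedModel(config)
print(f"{sum(p.numel() for p in model.parameters()):,} parameters")
```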

## GraniteMoeSharedModel

[[autodoc]] GraniteMoeSharedModel
- forward

## GraniteMoeSharedForCausalLM

[[autodoc]] GraniteMoeSharedForCausalLM
- forward
2 changes: 2 additions & 0 deletions docs/source/en/perf_infer_gpu_one.md
@@ -60,6 +60,7 @@ FlashAttention-2 is currently supported for the following architectures:
* [GPT-J](https://huggingface.co/docs/transformers/model_doc/gptj#transformers.GPTJModel)
* [Granite](https://huggingface.co/docs/transformers/model_doc/granite#transformers.GraniteModel)
* [GraniteMoe](https://huggingface.co/docs/transformers/model_doc/granitemoe#transformers.GraniteMoeModel)
* [GraniteMoeShared](https://huggingface.co/docs/transformers/model_doc/granitemoeshared#transformers.GraniteMoeSharedModel)
* [Idefics2](https://huggingface.co/docs/transformers/model_doc/idefics2#transformers.Idefics2Model)
* [Idefics3](https://huggingface.co/docs/transformers/model_doc/idefics3#transformers.Idefics3Model)
* [Falcon](https://huggingface.co/docs/transformers/model_doc/falcon#transformers.FalconModel)
@@ -266,6 +267,7 @@ For now, Transformers supports SDPA inference and training for the following architectures:
* [Idefics3](https://huggingface.co/docs/transformers/model_doc/idefics3#transformers.Idefics3Model)
* [I-JEPA](https://huggingface.co/docs/transformers/model_doc/ijepa#transformers.IJepaModel)
* [GraniteMoe](https://huggingface.co/docs/transformers/model_doc/granitemoe#transformers.GraniteMoeModel)
* [GraniteMoeShared](https://huggingface.co/docs/transformers/model_doc/granitemoeshared#transformers.GraniteMoeSharedModel)
* [JetMoe](https://huggingface.co/docs/transformers/model_doc/jetmoe#transformers.JetMoeModel)
* [Jamba](https://huggingface.co/docs/transformers/model_doc/jamba#transformers.JambaModel)
* [Llama](https://huggingface.co/docs/transformers/model_doc/llama#transformers.LlamaModel)
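With GraniteMoeShared in both lists, either attention backend can be requested at load time. A minimal sketch, assuming a supported GPU (and the `flash-attn` package for the FlashAttention-2 path); the checkpoint is the one used in the model documentation above:

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "ibm-research/moe-7b-1b-active-shared-experts",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    attn_implementation="flash_attention_2",  # or "sdpa"
)
```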
15 changes: 15 additions & 0 deletions src/transformers/__init__.py
@@ -496,6 +496,7 @@
"models.gptj": ["GPTJConfig"],
"models.granite": ["GraniteConfig"],
"models.granitemoe": ["GraniteMoeConfig"],
"models.granitemoeshared": ["GraniteMoeSharedConfig"],
"models.grounding_dino": [
"GroundingDinoConfig",
"GroundingDinoProcessor",
@@ -2539,6 +2540,14 @@
"GraniteMoePreTrainedModel",
]
)
_import_structure["models.granitemoeshared"].extend(
[
"GraniteMoeSharedForCausalLM",
"GraniteMoeSharedModel",
"GraniteMoeSharedPreTrainedModel",
]
)

_import_structure["models.grounding_dino"].extend(
[
"GroundingDinoForObjectDetection",
@@ -5605,6 +5614,7 @@
    from .models.gptj import GPTJConfig
    from .models.granite import GraniteConfig
    from .models.granitemoe import GraniteMoeConfig
    from .models.granitemoeshared import GraniteMoeSharedConfig
    from .models.grounding_dino import (
        GroundingDinoConfig,
        GroundingDinoProcessor,
@@ -7479,6 +7489,11 @@
        GraniteMoeModel,
        GraniteMoePreTrainedModel,
    )
    from .models.granitemoeshared import (
        GraniteMoeSharedForCausalLM,
        GraniteMoeSharedModel,
        GraniteMoeSharedPreTrainedModel,
    )
    from .models.grounding_dino import (
        GroundingDinoForObjectDetection,
        GroundingDinoModel,
1 change: 1 addition & 0 deletions src/transformers/models/__init__.py
@@ -118,6 +118,7 @@
    gptj,
    granite,
    granitemoe,
    granitemoeshared,
    grounding_dino,
    groupvit,
    helium,
2 changes: 2 additions & 0 deletions src/transformers/models/auto/configuration_auto.py
@@ -137,6 +137,7 @@
("gptsan-japanese", "GPTSanJapaneseConfig"),
("granite", "GraniteConfig"),
("granitemoe", "GraniteMoeConfig"),
("granitemoeshared", "GraniteMoeSharedConfig"),
("granitevision", "LlavaNextConfig"),
("graphormer", "GraphormerConfig"),
("grounding-dino", "GroundingDinoConfig"),
@@ -467,6 +468,7 @@
("gptsan-japanese", "GPTSAN-japanese"),
("granite", "Granite"),
("granitemoe", "GraniteMoeMoe"),
("granitemoeshared", "GraniteMoeSharedMoe"),
("granitevision", "LLaVA-NeXT"),
("graphormer", "Graphormer"),
("grounding-dino", "Grounding DINO"),
2 changes: 2 additions & 0 deletions src/transformers/models/auto/modeling_auto.py
@@ -132,6 +132,7 @@
("gptsan-japanese", "GPTSanJapaneseForConditionalGeneration"),
("granite", "GraniteModel"),
("granitemoe", "GraniteMoeModel"),
("granitemoeshared", "GraniteMoeSharedModel"),
("graphormer", "GraphormerModel"),
("grounding-dino", "GroundingDinoModel"),
("groupvit", "GroupViTModel"),
@@ -526,6 +527,7 @@
("gptj", "GPTJForCausalLM"),
("granite", "GraniteForCausalLM"),
("granitemoe", "GraniteMoeForCausalLM"),
("granitemoeshared", "GraniteMoeSharedForCausalLM"),
("helium", "HeliumForCausalLM"),
("jamba", "JambaForCausalLM"),
("jetmoe", "JetMoeForCausalLM"),
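These mappings are what let the `Auto` classes resolve the new model type from its registered name alone. A small sketch (the size overrides are arbitrary toy values so the randomly initialized model stays cheap to build):

```python
from transformers import AutoConfig, AutoModelForCausalLM

# resolve the registered model-type string to its config class
config = AutoConfig.for_model(
    "granitemoeshared",
    vocab_size=1024,
    hidden_size=64,
    intermediate_size=128,
    num_hidden_layers=2,
    num_attention_heads=4,
    shared_intermediate_size=128,
)
print(type(config).__name__)  # GraniteMoeSharedConfig

# instantiate a randomly initialized model from that config
model = AutoModelForCausalLM.from_config(config)
print(type(model).__name__)  # GraniteMoeSharedForCausalLM
```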
27 changes: 27 additions & 0 deletions src/transformers/models/granitemoeshared/__init__.py
@@ -0,0 +1,27 @@
# Copyright 2024 The HuggingFace Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from typing import TYPE_CHECKING

from ...utils import _LazyModule
from ...utils.import_utils import define_import_structure


if TYPE_CHECKING:
    from .configuration_granitemoeshared import *
    from .modeling_granitemoeshared import *
else:
    import sys

    _file = globals()["__file__"]
    sys.modules[__name__] = _LazyModule(__name__, _file, define_import_structure(_file), module_spec=__spec__)
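
The `TYPE_CHECKING` branch gives static type checkers the real symbols, while at runtime `_LazyModule` defers importing the torch-heavy modeling code until a symbol is first accessed, keeping `import transformers` itself fast. Roughly:

```python
import transformers  # cheap: modeling_granitemoeshared has not been executed yet

# first attribute access materializes the submodule
cls = transformers.GraniteMoeSharedForCausalLM
print(cls.__module__)  # transformers.models.granitemoeshared.modeling_granitemoeshared
```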