
Add DBRX Model #29921

Merged: 137 commits, Apr 18, 2024

Changes from 3 commits (the diffs below show an early 3-commit snapshot of the branch, not the final merged state).

Commits (137)
7042915
wip
abhi-databricks Mar 27, 2024
c7dda8c
fix __init__.py
abhi-databricks Mar 27, 2024
18495d0
add docs
abhi-databricks Mar 27, 2024
292836b
Apply suggestions from code review
abhi-mosaic Mar 28, 2024
a27c69a
address comments 1
abhi-databricks Mar 28, 2024
5417623
work on make fixup
abhi-databricks Mar 28, 2024
46b45c1
pass configs down
abhi-databricks Mar 28, 2024
76c2e9c
add sdpa attention
abhi-databricks Mar 28, 2024
4e74661
remove DbrxBlock
Mar 29, 2024
120df40
add to configuration_auto
Mar 29, 2024
56d841e
docstring now passes formatting test
Mar 29, 2024
450ae2d
fix style
Mar 29, 2024
cec7356
update READMEs
Mar 29, 2024
b5d4a6e
add dbrx to modeling_auto
Mar 29, 2024
3d9fd16
make fix-copies generated this
Mar 29, 2024
2bff6b9
add DBRX_PRETRAINED_CONFIG_ARCHIVE_MAP
Mar 29, 2024
ea940a6
config docstring passes formatting test
Mar 29, 2024
990f196
rename moe_loss_weight to router_aux_loss_coef
Mar 29, 2024
4a6f47a
add to flash-attn documentation
Mar 29, 2024
9268388
fix model-path in tests
Mar 29, 2024
54d98a4
Explicitly make `"suli"` the default `ffn_act_fn`
eitanturok Mar 31, 2024
370f578
default to using router_aux_loss_coef over ffn_config[moe_loss_weight]
Mar 31, 2024
7aba29f
fix _flash_attn_uses_top_left_mask and is_causal
Mar 31, 2024
9475675
fix tests path
Mar 31, 2024
3450fd1
don't use token type IDs
Mar 31, 2024
46c9547
follow Llama and remove token_type_ids from test
Mar 31, 2024
0ed3675
init ConfigTester differently so tests pass
Mar 31, 2024
a08b27d
remove multiple choice test
Mar 31, 2024
c98d9f2
remove question + answer test
Mar 31, 2024
598c9a0
remove sequence classification test
Mar 31, 2024
c73c590
remove token classification test
Mar 31, 2024
e58f1b2
copy Llama tests and remove token_type_ids from test inputs
Mar 31, 2024
32ceb87
do not test pruning or headmasking; style code
Mar 31, 2024
daabaec
add _tied_weights_keys parameter to pass test
Mar 31, 2024
dabcca0
add type hints
Mar 31, 2024
191ec1e
fix type check
Mar 31, 2024
3dad3bd
update config tester
Mar 31, 2024
5c837c9
remove masked_lm test
Mar 31, 2024
58a4f15
remove encoder tests
Mar 31, 2024
60662cb
initialize DbrxModelTester with correct params
Apr 1, 2024
e829922
style
Apr 1, 2024
7cca86a
torch_dtype does not rely on torch
Apr 1, 2024
1e21729
run make fixup, fix-copies
Apr 1, 2024
7e4b7fd
use https://huggingface.co/v2ray/dbrx-base-fixed/blob/main/modeling_d…
Apr 1, 2024
4d9da54
add copyright info
Apr 1, 2024
9b8f912
fix imports and DbrxRotaryEmbedding
Apr 1, 2024
3692a90
update DbrxModel docstring
Apr 1, 2024
f050499
use copies
Apr 1, 2024
a075df2
change model path in docstring
Apr 1, 2024
1dc3073
use config in DbrxFFN
Apr 1, 2024
7df8369
fix flashattention2, sdpaattention
Apr 1, 2024
aa8c55d
input config to DbrXAttention, DbrxNormAttentionNorm
Apr 1, 2024
4b01cdc
more fixes
Apr 1, 2024
1c5816e
fix
Apr 1, 2024
09f601e
fix again!
Apr 1, 2024
5a52bb9
add informative comment
Apr 1, 2024
cc6e5d8
fix ruff?
Apr 1, 2024
4c5e127
remove print statement + style
Apr 1, 2024
0f562aa
change doc-test
Apr 1, 2024
62a512e
fix doc-test
Apr 1, 2024
aae8045
fix docstring
Apr 1, 2024
c3870bc
delete commented out text
Apr 1, 2024
efd10b8
make defaults match dbrx-instruct
Apr 1, 2024
ea836a8
replace `router_aux_loss_coef` with `moe_loss_weight`
Apr 1, 2024
c46e06b
is_decoder=True
Apr 1, 2024
aab6fd6
remove is_decoder from configtester
Apr 1, 2024
179834b
implement sdpa properly
Apr 1, 2024
f053b7b
make is_decoder pass tests
Apr 1, 2024
cdea470
start on the GenerationTesterMixin tests
Apr 1, 2024
b7dafdd
add dbrx to sdpa documentation
Apr 1, 2024
351bff2
skip weight typing test
Apr 1, 2024
fca26d4
style
Apr 1, 2024
cfef3ec
initialize smaller model
eitanturok Apr 2, 2024
d0f7bef
Add DBRX to toctree
Rocketknight1 Apr 2, 2024
99dcef7
skip test_new_cache_format
Apr 2, 2024
fb5ed67
make config defaults smaller again
Apr 2, 2024
24b28b5
add pad_token_id
Apr 2, 2024
f57e672
remove pad_token_id from config
Apr 2, 2024
6b6655d
Remove all references to DBRX_PRETRAINED_CONFIG_ARCHIVE_MAP
Rocketknight1 Apr 3, 2024
88f350f
Update src/transformers/models/dbrx/__init__.py
eitanturok Apr 3, 2024
a6c21eb
Update src/transformers/models/dbrx/modeling_dbrx.py
eitanturok Apr 3, 2024
a91e45f
Update docs/source/en/model_doc/dbrx.md
eitanturok Apr 3, 2024
5397d42
Update src/transformers/models/dbrx/configuration_dbrx.py
eitanturok Apr 4, 2024
ea571d1
Update docs/source/en/model_doc/dbrx.md
eitanturok Apr 4, 2024
2e12f34
fix typo
Apr 10, 2024
a5bebcb
Apply suggestions from code review
abhi-mosaic Apr 10, 2024
331db58
update docs, fix configuration_auto.py
abhi-databricks Apr 10, 2024
ce758df
address pr comments
abhi-databricks Apr 10, 2024
8df05d9
remove is_decoder flag
Apr 11, 2024
0c80857
Merge branch 'main' into dbrx
abhi-databricks Apr 11, 2024
c89f1a1
slice
Apr 11, 2024
dbd8b14
fix requires grad
Apr 11, 2024
a7ee563
remove grad
Apr 12, 2024
2e3bd86
disconnect differently
Apr 12, 2024
826947d
remove grad
Apr 12, 2024
35aca3a
enable grads
Apr 12, 2024
7ffb9f8
patch
Apr 13, 2024
a8237bd
detach expert
Apr 13, 2024
99eba88
nissan al ghaib
Apr 14, 2024
2e774b5
Merge branch 'dbrx' into mvpatel2000/dbrx-chunk
mvpatel2000 Apr 14, 2024
ab9d85f
Update modeling_dbrx.py
mvpatel2000 Apr 15, 2024
8c320f1
Merge pull request #4 from mvpatel2000/mvpatel2000/dbrx-chunk
mvpatel2000 Apr 15, 2024
43976c0
Update src/transformers/models/dbrx/modeling_dbrx.py
eitanturok Apr 15, 2024
2980330
replace "Gemma" with "Dbrx"
Apr 15, 2024
8e28942
remove # type: ignore
Apr 15, 2024
b265e23
don't hardcode vocab_size
Apr 15, 2024
2ab56d3
remove ToDo
Apr 15, 2024
8ea3258
Merge branch 'main' into dbrx
eitanturok Apr 15, 2024
dc30f2c
Re-add removed idefics2 line
Rocketknight1 Apr 16, 2024
3771843
Update test to use tiny-random!
Rocketknight1 Apr 17, 2024
661bf9d
Remove TODO
Rocketknight1 Apr 17, 2024
8dd5de7
Remove one more case of loading the entire dbrx-instruct in the tests
Rocketknight1 Apr 17, 2024
12cd8c8
Update src/transformers/models/dbrx/modeling_dbrx.py
eitanturok Apr 18, 2024
834801f
address some comments
Apr 18, 2024
e281916
small model
Apr 18, 2024
63b3db8
add dbrx to tokenization_auto
Apr 18, 2024
236d815
More docstrings with add_start_docstrings
Apr 18, 2024
2dc5445
Dbrx for now
Apr 18, 2024
745dc47
add PipelineTesterMixin
Apr 18, 2024
d115cb4
Update src/transformers/models/dbrx/configuration_dbrx.py
eitanturok Apr 18, 2024
15fb1eb
remove flash-attn2 import error
Apr 18, 2024
7c3cc3b
Merge branch 'dbrx' of https://github.com/abhi-mosaic/transformers in…
Apr 18, 2024
29c3e4d
fix docstring
eitanturok Apr 18, 2024
9608197
add useage example
Apr 18, 2024
93920d0
put on one line
eitanturok Apr 18, 2024
cad0b9d
fix ffn_act_fn
eitanturok Apr 18, 2024
49bcacc
change "dbrx" to "DBRX" for display purposes.
Apr 18, 2024
9e26850
fix __init__.py?
eitanturok Apr 18, 2024
d714986
fix __init__.py
eitanturok Apr 18, 2024
cac26a1
fix README
Apr 18, 2024
fe12d2a
return the aux_loss
Apr 18, 2024
58c8342
remove extra spaces
Apr 18, 2024
d04c870
fix configuration_auto.py
eitanturok Apr 18, 2024
22804bf
fix format in tokenization_auto
eitanturok Apr 18, 2024
95b327f
remove new line
Apr 18, 2024
c6cbbda
add more useage examples
Apr 18, 2024
8ee48c9
Merge branch 'main' into dbrx
eitanturok Apr 18, 2024
docs/source/en/model_doc/dbrx.md: 52 additions, 0 deletions
@@ -0,0 +1,52 @@
<!--Copyright 2022 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# DBRX

## Overview

DBRX is a [transformer-based](https://www.isattentionallyouneed.com/) decoder-only large language model (LLM) that was trained using next-token prediction.
It uses a *fine-grained* mixture-of-experts (MoE) architecture with 132B total parameters of which 36B parameters are active on any input.
It was pre-trained on 12T tokens of text and code data.
Compared to other open MoE models like Mixtral-8x7B and Grok-1, DBRX is fine-grained, meaning it uses a larger number of smaller experts. DBRX has 16 experts and chooses 4, while Mixtral-8x7B and Grok-1 have 8 experts and choose 2.
This provides 65x more possible combinations of experts, and we found that this improves model quality.
DBRX uses rotary position encodings (RoPE), gated linear units (GLU), and grouped query attention (GQA).
It uses the GPT-4 tokenizer as described in the [tiktoken](https://github.com/openai/tiktoken) repository.
We made these choices based on exhaustive evaluation and scaling experiments.
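
The 65x figure is straightforward binomial arithmetic: choosing 4 of 16 experts gives C(16, 4) = 1820 routing combinations per layer, versus C(8, 2) = 28 for an 8-expert top-2 model, and 1820 / 28 = 65. A quick check in Python:

```python
import math

# Ways to route a token when `top_k` of `n_experts` are active in a layer.
dbrx_combos = math.comb(16, 4)    # 16 experts, 4 active -> 1820
mixtral_combos = math.comb(8, 2)  # 8 experts, 2 active  -> 28

print(dbrx_combos / mixtral_combos)  # 65.0
```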

DBRX was pretrained on 12T tokens of carefully curated data with a maximum context length of 32K tokens.
We estimate that this data is at least 2x better token-for-token than the data we used to pretrain the MPT family of models.
This new dataset was developed using the full suite of Databricks tools, including Apache Spark™ and Databricks notebooks for data processing, and Unity Catalog for data management and governance.
We used curriculum learning for pretraining, changing the data mix during training in ways we found to substantially improve model quality.


More detailed information about DBRX Instruct and DBRX Base can be found in our [technical blog post](https://www.databricks.com/blog/introducing-dbrx-new-state-art-open-llm).


This model was contributed by [abhi-db](https://huggingface.co/abhi-db). The original code can be found [here](https://github.com/databricks/dbrx).

> **Review comment (Collaborator):** If the model can use SDPA and flash attention, this should be added to the model page alongside a graph of expected speedups, e.g. like here.
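
Both backends do land in this PR (see the "add sdpa attention" and flash-attention commits above). A minimal loading sketch using the standard `attn_implementation` argument; treat the exact flags as illustrative rather than as the final documented usage:

```python
import torch
from transformers import AutoModelForCausalLM

# Select the attention backend at load time: "sdpa" (PyTorch scaled-dot-product
# attention) or "flash_attention_2" (requires the flash-attn package).
model = AutoModelForCausalLM.from_pretrained(
    "databricks/dbrx-instruct",
    torch_dtype=torch.bfloat16,
    attn_implementation="sdpa",
    device_map="auto",
)
```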

## DbrxConfig

[[autodoc]] DbrxConfig
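
A minimal configuration sketch. The argument names below (`d_model`, `n_heads`, `n_layers`, `max_seq_len`) follow the MPT-style naming this PR uses in `configuration_dbrx.py`, and the tiny values are hypothetical, chosen for smoke-testing rather than matching any released checkpoint:

```python
from transformers import DbrxConfig, DbrxModel

# Hypothetical toy-sized config; the real DBRX has 132B total / 36B active parameters.
config = DbrxConfig(d_model=256, n_heads=4, n_layers=2, max_seq_len=512)
model = DbrxModel(config)
print(f"{sum(p.numel() for p in model.parameters()):,} parameters")
```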


## DbrxModel

[[autodoc]] DbrxModel
- forward


## DbrxForCausalLM

[[autodoc]] DbrxForCausalLM
- forward
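
A minimal generation sketch in the spirit of the usage examples added near the end of this PR (the "add useage example" commits). The full checkpoint is very large, so `device_map="auto"` and substantial GPU memory are assumed:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("databricks/dbrx-instruct")
model = AutoModelForCausalLM.from_pretrained(
    "databricks/dbrx-instruct", torch_dtype=torch.bfloat16, device_map="auto"
)

inputs = tokenizer("What does it take to build a great LLM?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```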

src/transformers/__init__.py: 17 additions, 0 deletions
@@ -131,6 +131,7 @@
],
"models": [],
# Models
"models.dbrx": ["DbrxConfig"],
"models.albert": ["ALBERT_PRETRAINED_CONFIG_ARCHIVE_MAP", "AlbertConfig"],
"models.align": [
"ALIGN_PRETRAINED_CONFIG_ARCHIVE_MAP",
@@ -1442,6 +1443,15 @@

# PyTorch models structure

_import_structure["models.dbrx"].extend(
[
"DbrxForCausalLM",
"DbrxBlock",
"DbrxModel",
"DbrxPreTrainedModel",
]
)

_import_structure["models.albert"].extend(
[
"ALBERT_PRETRAINED_MODEL_ARCHIVE_LIST",
@@ -7786,6 +7796,13 @@
)

# PyTorch model imports

from .models.dbrx import (
DbrxForCausalLM,
DbrxBlock,
DbrxModel,
DbrxPreTrainedModel,
)
from .models.seamless_m4t import (
SEAMLESS_M4T_PRETRAINED_MODEL_ARCHIVE_LIST,
SeamlessM4TCodeHifiGan,
src/transformers/models/dbrx/__init__.py: 65 additions, 0 deletions
@@ -0,0 +1,65 @@
# Copyright 2020 The HuggingFace Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from typing import TYPE_CHECKING

from ...utils import _LazyModule, OptionalDependencyNotAvailable
from ...utils import is_torch_available




_import_structure = {
"configuration_dbrx": ["DBRX_PRETRAINED_CONFIG_ARCHIVE_MAP", "DbrxConfig"],
}

try:
if not is_torch_available():
raise OptionalDependencyNotAvailable()
except OptionalDependencyNotAvailable:
pass
else:
_import_structure["modeling_dbrx"] = [
"DBRX_PRETRAINED_MODEL_ARCHIVE_LIST",
"DbrxForCausalLM",
"DbrxBlock",
"DbrxModel",
"DbrxPreTrainedModel",
]




if TYPE_CHECKING:
from .configuration_dbrx import DBRX_PRETRAINED_CONFIG_ARCHIVE_MAP, DbrxConfig

try:
if not is_torch_available():
raise OptionalDependencyNotAvailable()
except OptionalDependencyNotAvailable:
pass
else:
from .modeling_dbrx import (
DBRX_PRETRAINED_MODEL_ARCHIVE_LIST,
DbrxForCausalLM,
DbrxBlock,
DbrxModel,
DbrxPreTrainedModel,
)



else:
import sys

sys.modules[__name__] = _LazyModule(__name__, globals()["__file__"], _import_structure, module_spec=__spec__)
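
This follows the standard transformers lazy-module pattern: `_LazyModule` defers the real imports until a name is first accessed, so `import transformers` stays cheap, and the torch-only model classes are simply not registered when torch is absent. A sketch of the resulting import surface as of this commit (note that `DbrxBlock` is removed later in the PR):

```python
# Resolved lazily through _LazyModule; no torch required for the config.
from transformers.models.dbrx import DbrxConfig

# Importing a model class triggers the actual modeling_dbrx import,
# which fails with an ImportError if torch is unavailable.
from transformers.models.dbrx import DbrxModel, DbrxForCausalLM
```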