Add DBRX Model #29921
Merged
Commits
137 commits
7042915
wip
abhi-databricks c7dda8c
fix __init__.py
abhi-databricks 18495d0
add docs
abhi-databricks 292836b
Apply suggestions from code review
abhi-mosaic a27c69a
address comments 1
abhi-databricks 5417623
work on make fixup
abhi-databricks 46b45c1
pass configs down
abhi-databricks 76c2e9c
add sdpa attention
abhi-databricks 4e74661
remove DbrxBlock
120df40
add to configuration_auto
56d841e
docstring now passes formatting test
450ae2d
fix style
cec7356
update READMEs
b5d4a6e
add dbrx to modeling_auto
3d9fd16
make fix-copies generated this
2bff6b9
add DBRX_PRETRAINED_CONFIG_ARCHIVE_MAP
ea940a6
config docstring passes formatting test
990f196
rename moe_loss_weight to router_aux_loss_coef
4a6f47a
add to flash-attn documentation
9268388
fix model-path in tests
54d98a4
Explicitly make `"suli"` the default `ffn_act_fn`
eitanturok 370f578
default to using router_aux_loss_coef over ffn_config[moe_loss_weight]
7aba29f
fix _flash_attn_uses_top_left_mask and is_causal
9475675
fix tests path
3450fd1
don't use token type IDs
46c9547
follow Llama and remove token_type_ids from test
0ed3675
init ConfigTester differently so tests pass
a08b27d
remove multiple choice test
c98d9f2
remove question + answer test
598c9a0
remove sequence classification test
c73c590
remove token classification test
e58f1b2
copy Llama tests and remove token_type_ids from test inputs
32ceb87
do not test pruning or headmasking; style code
daabaec
add _tied_weights_keys parameter to pass test
dabcca0
add type hints
191ec1e
fix type check
3dad3bd
update config tester
5c837c9
remove masked_lm test
58a4f15
remove encoder tests
60662cb
initialize DbrxModelTester with correct params
e829922
style
7cca86a
torch_dtype does not rely on torch
1e21729
run make fixup, fix-copies
7e4b7fd
use https://huggingface.co/v2ray/dbrx-base-fixed/blob/main/modeling_d…
4d9da54
add copyright info
9b8f912
fix imports and DbrxRotaryEmbedding
3692a90
update DbrxModel docstring
f050499
use copies
a075df2
change model path in docstring
1dc3073
use config in DbrxFFN
7df8369
fix flashattention2, sdpaattention
aa8c55d
input config to DbrXAttention, DbrxNormAttentionNorm
4b01cdc
more fixes
1c5816e
fix
09f601e
fix again!
5a52bb9
add informative comment
cc6e5d8
fix ruff?
4c5e127
remove print statement + style
0f562aa
change doc-test
62a512e
fix doc-test
aae8045
fix docstring
c3870bc
delete commented out text
efd10b8
make defaults match dbrx-instruct
ea836a8
replace `router_aux_loss_coef` with `moe_loss_weight`
c46e06b
is_decoder=True
aab6fd6
remove is_decoder from configtester
179834b
implement sdpa properly
f053b7b
make is_decoder pass tests
cdea470
start on the GenerationTesterMixin tests
b7dafdd
add dbrx to sdpa documentation
351bff2
skip weight typing test
fca26d4
style
cfef3ec
initialize smaller model
eitanturok d0f7bef
Add DBRX to toctree
Rocketknight1 99dcef7
skip test_new_cache_format
fb5ed67
make config defaults smaller again
24b28b5
add pad_token_id
f57e672
remove pad_token_id from config
6b6655d
Remove all references to DBRX_PRETRAINED_CONFIG_ARCHIVE_MAP
Rocketknight1 88f350f
Update src/transformers/models/dbrx/__init__.py
eitanturok a6c21eb
Update src/transformers/models/dbrx/modeling_dbrx.py
eitanturok a91e45f
Update docs/source/en/model_doc/dbrx.md
eitanturok 5397d42
Update src/transformers/models/dbrx/configuration_dbrx.py
eitanturok ea571d1
Update docs/source/en/model_doc/dbrx.md
eitanturok 2e12f34
fix typo
a5bebcb
Apply suggestions from code review
abhi-mosaic 331db58
update docs, fix configuration_auto.py
abhi-databricks ce758df
address pr comments
abhi-databricks 8df05d9
remove is_decoder flag
0c80857
Merge branch 'main' into dbrx
abhi-databricks c89f1a1
slice
dbd8b14
fix requires grad
a7ee563
remove grad
2e3bd86
disconnect differently
826947d
remove grad
35aca3a
enable grads
7ffb9f8
patch
a8237bd
detach expert
99eba88
nissan al ghaib
2e774b5
Merge branch 'dbrx' into mvpatel2000/dbrx-chunk
mvpatel2000 ab9d85f
Update modeling_dbrx.py
mvpatel2000 8c320f1
Merge pull request #4 from mvpatel2000/mvpatel2000/dbrx-chunk
mvpatel2000 43976c0
Update src/transformers/models/dbrx/modeling_dbrx.py
eitanturok 2980330
replace "Gemma" with "Dbrx"
8e28942
remove # type: ignore
b265e23
don't hardcode vocab_size
2ab56d3
remove ToDo
8ea3258
Merge branch 'main' into dbrx
eitanturok dc30f2c
Re-add removed idefics2 line
Rocketknight1 3771843
Update test to use tiny-random!
Rocketknight1 661bf9d
Remove TODO
Rocketknight1 8dd5de7
Remove one more case of loading the entire dbrx-instruct in the tests
Rocketknight1 12cd8c8
Update src/transformers/models/dbrx/modeling_dbrx.py
eitanturok 834801f
address some comments
e281916
small model
63b3db8
add dbrx to tokenization_auto
236d815
More docstrings with add_start_docstrings
2dc5445
Dbrx for now
745dc47
add PipelineTesterMixin
d115cb4
Update src/transformers/models/dbrx/configuration_dbrx.py
eitanturok 15fb1eb
remove flash-attn2 import error
7c3cc3b
Merge branch 'dbrx' of https://github.com/abhi-mosaic/transformers in…
29c3e4d
fix docstring
eitanturok 9608197
add useage example
93920d0
put on one line
eitanturok cad0b9d
fix ffn_act_fn
eitanturok 49bcacc
change "dbrx" to "DBRX" for display purposes.
9e26850
fix __init__.py?
eitanturok d714986
fix __init__.py
eitanturok cac26a1
fix README
fe12d2a
return the aux_loss
58c8342
remove extra spaces
d04c870
fix configuration_auto.py
eitanturok 22804bf
fix format in tokenization_auto
eitanturok 95b327f
remove new line
c6cbbda
add more useage examples
8ee48c9
Merge branch 'main' into dbrx
eitanturok
If the model in the paper is named DBRX, it should be capitalized here too. The camel casing is for the model classes.
Good catch! I fixed this.