Add DBRX Model #29921

Merged: 137 commits, Apr 18, 2024

Commits
7042915
wip
abhi-databricks Mar 27, 2024
c7dda8c
fix __init__.py
abhi-databricks Mar 27, 2024
18495d0
add docs
abhi-databricks Mar 27, 2024
292836b
Apply suggestions from code review
abhi-mosaic Mar 28, 2024
a27c69a
address comments 1
abhi-databricks Mar 28, 2024
5417623
work on make fixup
abhi-databricks Mar 28, 2024
46b45c1
pass configs down
abhi-databricks Mar 28, 2024
76c2e9c
add sdpa attention
abhi-databricks Mar 28, 2024
4e74661
remove DbrxBlock
Mar 29, 2024
120df40
add to configuration_auto
Mar 29, 2024
56d841e
docstring now passes formatting test
Mar 29, 2024
450ae2d
fix style
Mar 29, 2024
cec7356
update READMEs
Mar 29, 2024
b5d4a6e
add dbrx to modeling_auto
Mar 29, 2024
3d9fd16
make fix-copies generated this
Mar 29, 2024
2bff6b9
add DBRX_PRETRAINED_CONFIG_ARCHIVE_MAP
Mar 29, 2024
ea940a6
config docstring passes formatting test
Mar 29, 2024
990f196
rename moe_loss_weight to router_aux_loss_coef
Mar 29, 2024
4a6f47a
add to flash-attn documentation
Mar 29, 2024
9268388
fix model-path in tests
Mar 29, 2024
54d98a4
Explicitly make `"suli"` the default `ffn_act_fn`
eitanturok Mar 31, 2024
370f578
default to using router_aux_loss_coef over ffn_config[moe_loss_weight]
Mar 31, 2024
7aba29f
fix _flash_attn_uses_top_left_mask and is_causal
Mar 31, 2024
9475675
fix tests path
Mar 31, 2024
3450fd1
don't use token type IDs
Mar 31, 2024
46c9547
follow Llama and remove token_type_ids from test
Mar 31, 2024
0ed3675
init ConfigTester differently so tests pass
Mar 31, 2024
a08b27d
remove multiple choice test
Mar 31, 2024
c98d9f2
remove question + answer test
Mar 31, 2024
598c9a0
remove sequence classification test
Mar 31, 2024
c73c590
remove token classification test
Mar 31, 2024
e58f1b2
copy Llama tests and remove token_type_ids from test inputs
Mar 31, 2024
32ceb87
do not test pruning or headmasking; style code
Mar 31, 2024
daabaec
add _tied_weights_keys parameter to pass test
Mar 31, 2024
dabcca0
add type hints
Mar 31, 2024
191ec1e
fix type check
Mar 31, 2024
3dad3bd
update config tester
Mar 31, 2024
5c837c9
remove masked_lm test
Mar 31, 2024
58a4f15
remove encoder tests
Mar 31, 2024
60662cb
initialize DbrxModelTester with correct params
Apr 1, 2024
e829922
style
Apr 1, 2024
7cca86a
torch_dtype does not rely on torch
Apr 1, 2024
1e21729
run make fixup, fix-copies
Apr 1, 2024
7e4b7fd
use https://huggingface.co/v2ray/dbrx-base-fixed/blob/main/modeling_d…
Apr 1, 2024
4d9da54
add copyright info
Apr 1, 2024
9b8f912
fix imports and DbrxRotaryEmbedding
Apr 1, 2024
3692a90
update DbrxModel docstring
Apr 1, 2024
f050499
use copies
Apr 1, 2024
a075df2
change model path in docstring
Apr 1, 2024
1dc3073
use config in DbrxFFN
Apr 1, 2024
7df8369
fix flashattention2, sdpaattention
Apr 1, 2024
aa8c55d
input config to DbrXAttention, DbrxNormAttentionNorm
Apr 1, 2024
4b01cdc
more fixes
Apr 1, 2024
1c5816e
fix
Apr 1, 2024
09f601e
fix again!
Apr 1, 2024
5a52bb9
add informative comment
Apr 1, 2024
cc6e5d8
fix ruff?
Apr 1, 2024
4c5e127
remove print statement + style
Apr 1, 2024
0f562aa
change doc-test
Apr 1, 2024
62a512e
fix doc-test
Apr 1, 2024
aae8045
fix docstring
Apr 1, 2024
c3870bc
delete commented out text
Apr 1, 2024
efd10b8
make defaults match dbrx-instruct
Apr 1, 2024
ea836a8
replace `router_aux_loss_coef` with `moe_loss_weight`
Apr 1, 2024
c46e06b
is_decoder=True
Apr 1, 2024
aab6fd6
remove is_decoder from configtester
Apr 1, 2024
179834b
implement sdpa properly
Apr 1, 2024
f053b7b
make is_decoder pass tests
Apr 1, 2024
cdea470
start on the GenerationTesterMixin tests
Apr 1, 2024
b7dafdd
add dbrx to sdpa documentation
Apr 1, 2024
351bff2
skip weight typing test
Apr 1, 2024
fca26d4
style
Apr 1, 2024
cfef3ec
initialize smaller model
eitanturok Apr 2, 2024
d0f7bef
Add DBRX to toctree
Rocketknight1 Apr 2, 2024
99dcef7
skip test_new_cache_format
Apr 2, 2024
fb5ed67
make config defaults smaller again
Apr 2, 2024
24b28b5
add pad_token_id
Apr 2, 2024
f57e672
remove pad_token_id from config
Apr 2, 2024
6b6655d
Remove all references to DBRX_PRETRAINED_CONFIG_ARCHIVE_MAP
Rocketknight1 Apr 3, 2024
88f350f
Update src/transformers/models/dbrx/__init__.py
eitanturok Apr 3, 2024
a6c21eb
Update src/transformers/models/dbrx/modeling_dbrx.py
eitanturok Apr 3, 2024
a91e45f
Update docs/source/en/model_doc/dbrx.md
eitanturok Apr 3, 2024
5397d42
Update src/transformers/models/dbrx/configuration_dbrx.py
eitanturok Apr 4, 2024
ea571d1
Update docs/source/en/model_doc/dbrx.md
eitanturok Apr 4, 2024
2e12f34
fix typo
Apr 10, 2024
a5bebcb
Apply suggestions from code review
abhi-mosaic Apr 10, 2024
331db58
update docs, fix configuration_auto.py
abhi-databricks Apr 10, 2024
ce758df
address pr comments
abhi-databricks Apr 10, 2024
8df05d9
remove is_decoder flag
Apr 11, 2024
0c80857
Merge branch 'main' into dbrx
abhi-databricks Apr 11, 2024
c89f1a1
slice
Apr 11, 2024
dbd8b14
fix requires grad
Apr 11, 2024
a7ee563
remove grad
Apr 12, 2024
2e3bd86
disconnect differently
Apr 12, 2024
826947d
remove grad
Apr 12, 2024
35aca3a
enable grads
Apr 12, 2024
7ffb9f8
patch
Apr 13, 2024
a8237bd
detach expert
Apr 13, 2024
99eba88
nissan al ghaib
Apr 14, 2024
2e774b5
Merge branch 'dbrx' into mvpatel2000/dbrx-chunk
mvpatel2000 Apr 14, 2024
ab9d85f
Update modeling_dbrx.py
mvpatel2000 Apr 15, 2024
8c320f1
Merge pull request #4 from mvpatel2000/mvpatel2000/dbrx-chunk
mvpatel2000 Apr 15, 2024
43976c0
Update src/transformers/models/dbrx/modeling_dbrx.py
eitanturok Apr 15, 2024
2980330
replace "Gemma" with "Dbrx"
Apr 15, 2024
8e28942
remove # type: ignore
Apr 15, 2024
b265e23
don't hardcode vocab_size
Apr 15, 2024
2ab56d3
remove ToDo
Apr 15, 2024
8ea3258
Merge branch 'main' into dbrx
eitanturok Apr 15, 2024
dc30f2c
Re-add removed idefics2 line
Rocketknight1 Apr 16, 2024
3771843
Update test to use tiny-random!
Rocketknight1 Apr 17, 2024
661bf9d
Remove TODO
Rocketknight1 Apr 17, 2024
8dd5de7
Remove one more case of loading the entire dbrx-instruct in the tests
Rocketknight1 Apr 17, 2024
12cd8c8
Update src/transformers/models/dbrx/modeling_dbrx.py
eitanturok Apr 18, 2024
834801f
address some comments
Apr 18, 2024
e281916
small model
Apr 18, 2024
63b3db8
add dbrx to tokenization_auto
Apr 18, 2024
236d815
More docstrings with add_start_docstrings
Apr 18, 2024
2dc5445
Dbrx for now
Apr 18, 2024
745dc47
add PipelineTesterMixin
Apr 18, 2024
d115cb4
Update src/transformers/models/dbrx/configuration_dbrx.py
eitanturok Apr 18, 2024
15fb1eb
remove flash-attn2 import error
Apr 18, 2024
7c3cc3b
Merge branch 'dbrx' of https://github.com/abhi-mosaic/transformers in…
Apr 18, 2024
29c3e4d
fix docstring
eitanturok Apr 18, 2024
9608197
add usage example
Apr 18, 2024
93920d0
put on one line
eitanturok Apr 18, 2024
cad0b9d
fix ffn_act_fn
eitanturok Apr 18, 2024
49bcacc
change "dbrx" to "DBRX" for display purposes.
Apr 18, 2024
9e26850
fix __init__.py?
eitanturok Apr 18, 2024
d714986
fix __init__.py
eitanturok Apr 18, 2024
cac26a1
fix README
Apr 18, 2024
fe12d2a
return the aux_loss
Apr 18, 2024
58c8342
remove extra spaces
Apr 18, 2024
d04c870
fix configuration_auto.py
eitanturok Apr 18, 2024
22804bf
fix format in tokenization_auto
eitanturok Apr 18, 2024
95b327f
remove new line
Apr 18, 2024
c6cbbda
add more usage examples
Apr 18, 2024
8ee48c9
Merge branch 'main' into dbrx
eitanturok Apr 18, 2024
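The commit log above repeatedly touches the mixture-of-experts router loss (`rename moe_loss_weight to router_aux_loss_coef`, `default to using router_aux_loss_coef over ffn_config[moe_loss_weight]`, `return the aux_loss`). For orientation, a Switch-Transformer-style load-balancing auxiliary loss — a sketch of the general technique, not necessarily this PR's exact implementation — can be written in plain numpy:

```python
import numpy as np

def load_balancing_aux_loss(router_logits: np.ndarray, top_k: int) -> float:
    """Switch-Transformer-style auxiliary load-balancing loss (illustrative).

    router_logits: (num_tokens, num_experts) raw router scores.
    Returns num_experts * sum_i(f_i * P_i), where f_i is the fraction of
    token-slots dispatched to expert i under top-k routing and P_i is the
    mean router probability assigned to expert i.
    """
    num_tokens, num_experts = router_logits.shape
    # Softmax over the expert dimension.
    z = router_logits - router_logits.max(axis=-1, keepdims=True)
    probs = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    # Top-k expert selection per token.
    chosen = np.argsort(-router_logits, axis=-1)[:, :top_k]
    # f_i: fraction of dispatched token-slots that went to expert i.
    counts = np.bincount(chosen.reshape(-1), minlength=num_experts)
    f = counts / (num_tokens * top_k)
    # P_i: mean router probability for expert i.
    p = probs.mean(axis=0)
    return float(num_experts * np.sum(f * p))

# A perfectly balanced router attains the minimum value of 1.0.
print(load_balancing_aux_loss(np.zeros((8, 4)), top_k=1))  # → 1.0
```

The coefficient the commits call `moe_loss_weight` (inside `ffn_config`) would scale a term like this before it is added to the language-modeling loss, discouraging the router from collapsing onto a few experts.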
1 change: 1 addition & 0 deletions README.md
@@ -341,6 +341,7 @@ Current number of checkpoints: ![](https://img.shields.io/endpoint?url=https://h
1. **[CTRL](https://huggingface.co/docs/transformers/model_doc/ctrl)** (from Salesforce) released with the paper [CTRL: A Conditional Transformer Language Model for Controllable Generation](https://arxiv.org/abs/1909.05858) by Nitish Shirish Keskar*, Bryan McCann*, Lav R. Varshney, Caiming Xiong and Richard Socher.
1. **[CvT](https://huggingface.co/docs/transformers/model_doc/cvt)** (from Microsoft) released with the paper [CvT: Introducing Convolutions to Vision Transformers](https://arxiv.org/abs/2103.15808) by Haiping Wu, Bin Xiao, Noel Codella, Mengchen Liu, Xiyang Dai, Lu Yuan, Lei Zhang.
1. **[Data2Vec](https://huggingface.co/docs/transformers/model_doc/data2vec)** (from Facebook) released with the paper [Data2Vec: A General Framework for Self-supervised Learning in Speech, Vision and Language](https://arxiv.org/abs/2202.03555) by Alexei Baevski, Wei-Ning Hsu, Qiantong Xu, Arun Babu, Jiatao Gu, Michael Auli.
+ 1. **[DBRX](https://huggingface.co/docs/transformers/main/model_doc/dbrx)** (from Databricks) released with the paper [Introducing DBRX: A New State-of-the-Art Open LLM](https://www.databricks.com/blog/introducing-dbrx-new-state-art-open-llm) by the Mosaic Research Team.
1. **[DeBERTa](https://huggingface.co/docs/transformers/model_doc/deberta)** (from Microsoft) released with the paper [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
1. **[DeBERTa-v2](https://huggingface.co/docs/transformers/model_doc/deberta-v2)** (from Microsoft) released with the paper [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
1. **[Decision Transformer](https://huggingface.co/docs/transformers/model_doc/decision_transformer)** (from Berkeley/Facebook/Google) released with the paper [Decision Transformer: Reinforcement Learning via Sequence Modeling](https://arxiv.org/abs/2106.01345) by Lili Chen, Kevin Lu, Aravind Rajeswaran, Kimin Lee, Aditya Grover, Michael Laskin, Pieter Abbeel, Aravind Srinivas, Igor Mordatch.
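Several commits in the log add an SDPA attention path for DBRX (`add sdpa attention`, `implement sdpa properly`, `add dbrx to sdpa documentation`). As a refresher on the primitive involved — in `transformers`, the SDPA path dispatches to `torch.nn.functional.scaled_dot_product_attention` — causal scaled dot-product attention for a single head can be sketched in numpy; this is an illustrative sketch, not the PR's implementation:

```python
import numpy as np

def causal_sdpa(q: np.ndarray, k: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Causal scaled dot-product attention for one head (illustrative).

    q, k, v: (seq_len, head_dim) arrays.
    """
    seq_len, head_dim = q.shape
    scores = q @ k.T / np.sqrt(head_dim)                    # (seq_len, seq_len)
    # Causal mask: position i may only attend to positions <= i.
    mask = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    scores = np.where(mask, -np.inf, scores)
    scores -= scores.max(axis=-1, keepdims=True)            # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v                                      # (seq_len, head_dim)

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((5, 8)) for _ in range(3))
out = causal_sdpa(q, k, v)
# The first output row can attend only to position 0, so it equals v[0].
print(np.allclose(out[0], v[0]))  # → True
```

The fused kernels behind `scaled_dot_product_attention` compute the same quantity without materializing the full score matrix, which is why the commits wire DBRX into that dispatch path.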
1 change: 1 addition & 0 deletions README_de.md
@@ -337,6 +337,7 @@ Aktuelle Anzahl der Checkpoints: ![](https://img.shields.io/endpoint?url=https:/
1. **[CTRL](https://huggingface.co/docs/transformers/model_doc/ctrl)** (from Salesforce) released with the paper [CTRL: A Conditional Transformer Language Model for Controllable Generation](https://arxiv.org/abs/1909.05858) by Nitish Shirish Keskar*, Bryan McCann*, Lav R. Varshney, Caiming Xiong and Richard Socher.
1. **[CvT](https://huggingface.co/docs/transformers/model_doc/cvt)** (from Microsoft) released with the paper [CvT: Introducing Convolutions to Vision Transformers](https://arxiv.org/abs/2103.15808) by Haiping Wu, Bin Xiao, Noel Codella, Mengchen Liu, Xiyang Dai, Lu Yuan, Lei Zhang.
1. **[Data2Vec](https://huggingface.co/docs/transformers/model_doc/data2vec)** (from Facebook) released with the paper [Data2Vec: A General Framework for Self-supervised Learning in Speech, Vision and Language](https://arxiv.org/abs/2202.03555) by Alexei Baevski, Wei-Ning Hsu, Qiantong Xu, Arun Babu, Jiatao Gu, Michael Auli.
+ 1. **[DBRX](https://huggingface.co/docs/transformers/main/model_doc/dbrx)** (from Databricks) released with the paper [Introducing DBRX: A New State-of-the-Art Open LLM](https://www.databricks.com/blog/introducing-dbrx-new-state-art-open-llm) by the Mosaic Research Team.
1. **[DeBERTa](https://huggingface.co/docs/transformers/model_doc/deberta)** (from Microsoft) released with the paper [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
1. **[DeBERTa-v2](https://huggingface.co/docs/transformers/model_doc/deberta-v2)** (from Microsoft) released with the paper [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
1. **[Decision Transformer](https://huggingface.co/docs/transformers/model_doc/decision_transformer)** (from Berkeley/Facebook/Google) released with the paper [Decision Transformer: Reinforcement Learning via Sequence Modeling](https://arxiv.org/abs/2106.01345) by Lili Chen, Kevin Lu, Aravind Rajeswaran, Kimin Lee, Aditya Grover, Michael Laskin, Pieter Abbeel, Aravind Srinivas, Igor Mordatch.
3 changes: 2 additions & 1 deletion README_es.md
@@ -314,6 +314,7 @@ Número actual de puntos de control: ![](https://img.shields.io/endpoint?url=htt
1. **[CTRL](https://huggingface.co/docs/transformers/model_doc/ctrl)** (from Salesforce) released with the paper [CTRL: A Conditional Transformer Language Model for Controllable Generation](https://arxiv.org/abs/1909.05858) by Nitish Shirish Keskar*, Bryan McCann*, Lav R. Varshney, Caiming Xiong and Richard Socher.
1. **[CvT](https://huggingface.co/docs/transformers/model_doc/cvt)** (from Microsoft) released with the paper [CvT: Introducing Convolutions to Vision Transformers](https://arxiv.org/abs/2103.15808) by Haiping Wu, Bin Xiao, Noel Codella, Mengchen Liu, Xiyang Dai, Lu Yuan, Lei Zhang.
1. **[Data2Vec](https://huggingface.co/docs/transformers/model_doc/data2vec)** (from Facebook) released with the paper [Data2Vec: A General Framework for Self-supervised Learning in Speech, Vision and Language](https://arxiv.org/abs/2202.03555) by Alexei Baevski, Wei-Ning Hsu, Qiantong Xu, Arun Babu, Jiatao Gu, Michael Auli.
+ 1. **[DBRX](https://huggingface.co/docs/transformers/main/model_doc/dbrx)** (from Databricks) released with the paper [Introducing DBRX: A New State-of-the-Art Open LLM](https://www.databricks.com/blog/introducing-dbrx-new-state-art-open-llm) by the Mosaic Research Team.
1. **[DeBERTa](https://huggingface.co/docs/transformers/model_doc/deberta)** (from Microsoft) released with the paper [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
1. **[DeBERTa-v2](https://huggingface.co/docs/transformers/model_doc/deberta-v2)** (from Microsoft) released with the paper [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
1. **[Decision Transformer](https://huggingface.co/docs/transformers/model_doc/decision_transformer)** (from Berkeley/Facebook/Google) released with the paper [Decision Transformer: Reinforcement Learning via Sequence Modeling](https://arxiv.org/abs/2106.01345) by Lili Chen, Kevin Lu, Aravind Rajeswaran, Kimin Lee, Aditya Grover, Michael Laskin, Pieter Abbeel, Aravind Srinivas, Igor Mordatch.
@@ -477,7 +478,7 @@ Número actual de puntos de control: ![](https://img.shields.io/endpoint?url=htt
1. **[Splinter](https://huggingface.co/docs/transformers/model_doc/splinter)** (from Tel Aviv University), released together with the paper [Few-Shot Question Answering by Pretraining Span Selection](https://arxiv.org/abs/2101.00438) by Ori Ram, Yuval Kirstain, Jonathan Berant, Amir Globerson, Omer Levy.
1. **[SqueezeBERT](https://huggingface.co/docs/transformers/model_doc/squeezebert)** (from Berkeley) released with the paper [SqueezeBERT: What can computer vision teach NLP about efficient neural networks?](https://arxiv.org/abs/2006.11316) by Forrest N. Iandola, Albert E. Shaw, Ravi Krishna, and Kurt W. Keutzer.
1. **[StableLm](https://huggingface.co/docs/transformers/model_doc/stablelm)** (from Stability AI) released with the paper [StableLM 3B 4E1T (Technical Report)](https://stability.wandb.io/stability-llm/stable-lm/reports/StableLM-3B-4E1T--VmlldzoyMjU4?accessToken=u3zujipenkx5g7rtcj9qojjgxpconyjktjkli2po09nffrffdhhchq045vp0wyfo) by Jonathan Tow, Marco Bellagente, Dakota Mahan, Carlos Riquelme Ruiz, Duy Phung, Maksym Zhuravinskyi, Nathan Cooper, Nikhil Pinnaparaju, Reshinth Adithyan, and James Baicoianu.
- 1. **[Starcoder2](https://huggingface.co/docs/transformers/model_doc/starcoder2)** (from BigCode team) released with a coming soon paper.
+ 1. **[Starcoder2](https://huggingface.co/docs/transformers/model_doc/starcoder2)** (from BigCode team) released with the paper [StarCoder 2 and The Stack v2: The Next Generation](https://arxiv.org/abs/2402.19173) by Anton Lozhkov, Raymond Li, Loubna Ben Allal, Federico Cassano, Joel Lamy-Poirier, Nouamane Tazi, Ao Tang, Dmytro Pykhtar, Jiawei Liu, Yuxiang Wei, Tianyang Liu, Max Tian, Denis Kocetkov, Arthur Zucker, Younes Belkada, Zijian Wang, Qian Liu, Dmitry Abulkhanov, Indraneil Paul, Zhuang Li, Wen-Ding Li, Megan Risdal, Jia Li, Jian Zhu, Terry Yue Zhuo, Evgenii Zheltonozhskii, Nii Osae Osae Dade, Wenhao Yu, Lucas Krauß, Naman Jain, Yixuan Su, Xuanli He, Manan Dey, Edoardo Abati, Yekun Chai, Niklas Muennighoff, Xiangru Tang, Muhtasham Oblokulov, Christopher Akiki, Marc Marone, Chenghao Mou, Mayank Mishra, Alex Gu, Binyuan Hui, Tri Dao, Armel Zebaze, Olivier Dehaene, Nicolas Patry, Canwen Xu, Julian McAuley, Han Hu, Torsten Scholak, Sebastien Paquet, Jennifer Robinson, Carolyn Jane Anderson, Nicolas Chapados, Mostofa Patwary, Nima Tajbakhsh, Yacine Jernite, Carlos Muñoz Ferrandis, Lingming Zhang, Sean Hughes, Thomas Wolf, Arjun Guha, Leandro von Werra, and Harm de Vries.
1. **[SuperPoint](https://huggingface.co/docs/transformers/model_doc/superpoint)** (from MagicLeap) released with the paper [SuperPoint: Self-Supervised Interest Point Detection and Description](https://arxiv.org/abs/1712.07629) by Daniel DeTone, Tomasz Malisiewicz and Andrew Rabinovich.
1. **[SwiftFormer](https://huggingface.co/docs/transformers/model_doc/swiftformer)** (from MBZUAI) released with the paper [SwiftFormer: Efficient Additive Attention for Transformer-based Real-time Mobile Vision Applications](https://arxiv.org/abs/2303.15446) by Abdelrahman Shaker, Muhammad Maaz, Hanoona Rasheed, Salman Khan, Ming-Hsuan Yang, Fahad Shahbaz Khan.
1. **[Swin Transformer](https://huggingface.co/docs/transformers/model_doc/swin)** (from Microsoft) released with the paper [Swin Transformer: Hierarchical Vision Transformer using Shifted Windows](https://arxiv.org/abs/2103.14030) by Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, Baining Guo.
1 change: 1 addition & 0 deletions README_fr.md
@@ -335,6 +335,7 @@ Nombre actuel de points de contrôle : ![](https://img.shields.io/endpoint?url=h
1. **[CTRL](https://huggingface.co/docs/transformers/model_doc/ctrl)** (de Salesforce) publié dans l'article [CTRL: A Conditional Transformer Language Model for Controllable Generation](https://arxiv.org/abs/1909.05858) par Nitish Shirish Keskar*, Bryan McCann*, Lav R. Varshney, Caiming Xiong et Richard Socher.
1. **[CvT](https://huggingface.co/docs/transformers/model_doc/cvt)** (de Microsoft) publié dans l'article [CvT: Introducing Convolutions to Vision Transformers](https://arxiv.org/abs/2103.15808) par Haiping Wu, Bin Xiao, Noel Codella, Mengchen Liu, Xiyang Dai, Lu Yuan, Lei Zhang.
1. **[Data2Vec](https://huggingface.co/docs/transformers/model_doc/data2vec)** (de Facebook) publié dans l'article [Data2Vec: A General Framework for Self-supervised Learning in Speech, Vision and Language](https://arxiv.org/abs/2202.03555) par Alexei Baevski, Wei-Ning Hsu, Qiantong Xu, Arun Babu, Jiatao Gu, Michael Auli.
+ 1. **[DBRX](https://huggingface.co/docs/transformers/main/model_doc/dbrx)** (from Databricks) released with the paper [Introducing DBRX: A New State-of-the-Art Open LLM](https://www.databricks.com/blog/introducing-dbrx-new-state-art-open-llm) by the Mosaic Research Team.
1. **[DeBERTa](https://huggingface.co/docs/transformers/model_doc/deberta)** (de Microsoft) publié dans l'article [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) par Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
1. **[DeBERTa-v2](https://huggingface.co/docs/transformers/model_doc/deberta-v2)** (de Microsoft) publié dans l'article [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) par Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
1. **[Decision Transformer](https://huggingface.co/docs/transformers/model_doc/decision_transformer)** (de Berkeley/Facebook/Google) publié dans l'article [Decision Transformer: Reinforcement Learning via Sequence Modeling](https://arxiv.org/abs/2106.01345) par Lili Chen, Kevin Lu, Aravind Rajeswaran, Kimin Lee, Aditya Grover, Michael Laskin, Pieter Abbeel, Aravind Srinivas, Igor Mordatch.