[WIP] Add support for Mistral-Nemo by supporting head_dim through config #2254

shaltielshmid · 2024-07-19T10:05:16Z

What does this PR do?

Added support for Mistral-Nemo by allowing the head dimension to be specified in the configuration file. The changes included here mimic the behavior done in the transformers package by Mistral here.

As discussed in the issue (#2252) with @ErikKaum - this PR allows the model to be loaded successfully and launched, but when sending a request I'm getting gibberish output. I tested running the same prompt with installing transformers from source, and the results were great (even with default temperature). I'm not familiar with how TGI works internally, so would appreciate any advice.

How to replicate:

Launch the TGI:

git checkout add-mistral-nemo
docker build . -t tgi
docker run --gpus all --shm-size 1g -e HF_TOKEN=<hf_token> -p 8080:80 -v $(pwd)/data:/data tgi --model-id mistralai/Mistral-Nemo-Base-2407

And then pinging it:

curl 127.0.0.1:8080/generate \
    -X POST \
    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
    -H 'Content-Type: application/json'

The output I'm getting is:

{"generated_text":" Deep zweigeschossy Bildholespy skulpt house Schriftstellerin\n\n في Bildholespy skulpt house Schriftstellerin\n\n في Bildh"}

Fixes # (issue)

#2252: Add support for Mistral-Nemo

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline,
Pull Request section?
Was this discussed/approved via a Github issue or the forum? Please add a link
to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the
documentation guidelines, and
here are tips on formatting docstrings.
Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@ErikKaum

ErikKaum · 2024-07-19T15:08:07Z

Thank you @shaltielshmid for a nice initiative 🙌

This might also be a good PR to look at from transformers

shaltielshmid · 2024-07-19T15:20:34Z

Thanks @ErikKaum - I see that the changes done in the PR you mentioned is already done in TGI here.

Narsil · 2024-07-23T09:31:06Z

@shaltielshmid thanks for the PR.

I created another smaller PR (other fixes have happened more generally regarding head_dim in other places.
#2283

Happy to take your if you take it out of draft and use the same code (we don't want to pass head_dim, as it can safely be inferred from the config, and we have to handle the case where it doesn't exist because old configs do not define head_dim.

shaltielshmid · 2024-07-23T09:46:39Z

Done. Took it out of draft - feel free to take over the PR.
I will note one change that I kept: In a recent commit you added a check if hasattr(config, 'head_dim') - which I removed, since I define the property in the config constructor and therefore it will always exist.
But feel free to undo it or just take your PR - either way. Do you know if TGI works now for mistral-nemo?

Narsil · 2024-07-23T12:53:20Z

Your solution would be better if we were still using MistralConfig but we are not. We removed a lot of complexity by relying more on transformers config directly as it simplifies code here quite a bit (with some cost regarding legacy, like the definition of head_dim which is non standard.)

key in mistralConfig (as defined in transformers).

…fig (#2254) * Support passing head_dim through config * Using `head_dim` as a fallback is necessary since it's a non standard key in mistralConfig (as defined in transformers). * Shorter diff. --------- Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>

…uggingface#2254) * Support passing head_dim through config * Using `head_dim` as a fallback is necessary since it's a non standard key in mistralConfig (as defined in transformers). * Shorter diff. --------- Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>

Support passing head_dim through config

db7c519

shaltielshmid marked this pull request as ready for review July 23, 2024 09:43

Merge branch 'main' into add-mistral-nemo

f0a5cb6

Narsil added 2 commits July 23, 2024 12:56

Using head_dim as a fallback is necessary since it's a non standard

ab62312

key in mistralConfig (as defined in transformers).

Shorter diff.

32cc60f

Narsil approved these changes Jul 23, 2024

View reviewed changes

Narsil merged commit 3961e32 into huggingface:main Jul 23, 2024
1 of 5 checks passed

Narsil mentioned this pull request Jul 23, 2024

Chore: Fix naming issues regarding head_size, there can only be one. #2283

Closed

5 tasks

ErikKaum mentioned this pull request Jul 29, 2024

Add support for Mistral-Nemo #2252

Open

2 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[WIP] Add support for Mistral-Nemo by supporting head_dim through config #2254

[WIP] Add support for Mistral-Nemo by supporting head_dim through config #2254

shaltielshmid commented Jul 19, 2024

ErikKaum commented Jul 19, 2024 •

edited

Loading

shaltielshmid commented Jul 19, 2024

Narsil commented Jul 23, 2024

shaltielshmid commented Jul 23, 2024 •

edited

Loading

Narsil commented Jul 23, 2024 •

edited

Loading

[WIP] Add support for Mistral-Nemo by supporting head_dim through config #2254

[WIP] Add support for Mistral-Nemo by supporting head_dim through config #2254

Conversation

shaltielshmid commented Jul 19, 2024

What does this PR do?

Before submitting

Who can review?

ErikKaum commented Jul 19, 2024 • edited Loading

shaltielshmid commented Jul 19, 2024

Narsil commented Jul 23, 2024

shaltielshmid commented Jul 23, 2024 • edited Loading

Narsil commented Jul 23, 2024 • edited Loading

ErikKaum commented Jul 19, 2024 •

edited

Loading

shaltielshmid commented Jul 23, 2024 •

edited

Loading

Narsil commented Jul 23, 2024 •

edited

Loading