Add missing inference support for GPTNeoXForCausalLM (Pythia and GPT-NeoX base models) #7461
Conversation
Tested with https://huggingface.co/EleutherAI/pythia-1.4b/tree/main and it seems to work. Measured PPL on wiki.test with `./perplexity -m models/pythia-1b/ggml-model-f16.gguf -f build/wikitext-2-raw/wiki.test.raw`; I guess it's normal for a 1.4B model that is a year old. Thanks for implementing this!

It seems the perplexity is a little higher than with the HF transformers implementation, because there are differences in tokenization output between llama.cpp and GPTNeoXTokenizerFast.
The tokenization differences, from `diff ./build/wikitext-2-raw/wiki.test.raw.tok ./build/wikitext-2-raw/wiki.test.raw.tokcpp`:

```diff
245413,245414c245413,245414
< 50276
< 6285
---
> 209
> 20589
245440,245441c245440,245441
< 50276
< 6285
---
> 209
> 20589
246660,246661c246660,246661
< 50276
< 6285
---
> 209
> 20589
246687,246688c246687,246688
< 50276
< 6285
---
> 209
> 20589
```

Likely the perplexity computation used in HF transformers differs from the one in llama.cpp. For Pythia 2.8b I get PPL
Feel free to merge this when ready - I think it works.
…EOX - didn't notice it was already present
Thank you for this, @fairydreaming! I have wanted it for so long!
…NeoX base models) (ggerganov#7461)

* convert-hf : add conversion of bloom-style qkv tensor to gpt-style qkv (code borrowed from BloomModel)
* llama : add inference support for LLM_ARCH_GPTNEOX
* llama : add model types for every Pythia variant and GPT-NeoX

Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>
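The first commit regroups the fused qkv projection from the Bloom-style layout (q/k/v rows interleaved per head) into the gpt-style layout (all q rows, then all k, then all v). A rough NumPy sketch of that regrouping, with hypothetical dimensions; the real conversion lives in the `convert-hf` script:

```python
import numpy as np

# Hypothetical model dimensions for illustration
n_head, head_dim = 4, 8
n_embd = n_head * head_dim

# Bloom-style fused qkv weight: per head, the q, k and v row blocks
# are interleaved, giving shape (n_head * 3 * head_dim, n_embd)
qkv = np.arange(3 * n_embd * n_embd, dtype=np.float32).reshape(3 * n_embd, n_embd)

# View as (n_head, 3, head_dim, n_embd), then gather each projection
# across all heads and stack them back-to-back: q rows, k rows, v rows
qkv_heads = qkv.reshape(n_head, 3, head_dim, n_embd)
q = qkv_heads[:, 0].reshape(n_embd, n_embd)
k = qkv_heads[:, 1].reshape(n_embd, n_embd)
v = qkv_heads[:, 2].reshape(n_embd, n_embd)
gpt_qkv = np.concatenate([q, k, v], axis=0)
```

The same reshape works for the bias vector by treating it as a `(n_head * 3 * head_dim,)` array.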
This pull request adds the missing pieces to support inference for GPT-NeoX-based models, such as GPT-NeoX itself and the Pythia family. Fixes #742. It also adds model types for all Pythia model sizes.

The added `use_par_res` hparams field corresponds to the `use_parallel_residual` parameter from `config.json`.
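For context, `use_parallel_residual` selects between the two residual layouts a GPT-NeoX block can use. A hedged sketch of the two dataflows; the callables and names here are illustrative, not the llama.cpp implementation:

```python
def neox_block(x, attn, mlp, ln1, ln2, use_parallel_residual=True):
    """One transformer block; attn/mlp/ln1/ln2 are stand-in callables."""
    if use_parallel_residual:
        # Parallel residual (the GPT-NeoX / Pythia default):
        # attention and MLP both read the block input x
        return x + attn(ln1(x)) + mlp(ln2(x))
    # Sequential residual (classic GPT layout):
    # the MLP reads the attention output
    h = x + attn(ln1(x))
    return h + mlp(ln2(h))
```

Since the two layouts produce different activations, the converter has to record this flag so inference matches the checkpoint's training-time configuration.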