GPT-NeoX has only minimal inference support #3293
Comments
Even if you add dummy scores and token types in the conversion script, it fails here: Line 2288 in bc9d3e3
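The dummy-metadata workaround mentioned above can be sketched as follows. GGUF vocabularies carry per-token score and token-type arrays alongside the token strings; a converter whose tokenizer only provides the strings can pad the other two arrays with placeholders. This is a hypothetical illustration in plain Python (the function name and the `TOKEN_TYPE_NORMAL` value are assumptions, not the actual conversion-script code):

```python
# Hypothetical sketch of padding per-token metadata for a GGUF vocabulary.
# The token-type constant mirrors the NORMAL type used by the gguf-py
# package; its numeric value (1) is an assumption here.
TOKEN_TYPE_NORMAL = 1

def pad_vocab_metadata(tokens):
    """Return (tokens, scores, toktypes) with dummy scores and types."""
    scores = [0.0] * len(tokens)              # no real merge scores available
    toktypes = [TOKEN_TYPE_NORMAL] * len(tokens)
    return tokens, scores, toktypes

tokens, scores, toktypes = pad_vocab_metadata(["<|endoftext|>", "hello", "world"])
```

As the comment above notes, padding like this satisfies the writer but can still trip validation further down the pipeline, which is the failure being reported.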
Was GPT-NeoX ever even implemented in GGUF?
Yes, example inference code exists here: https://github.com/ggerganov/llama.cpp/blob/master/examples/gptneox-wip/gptneox-main.cpp
Oh, it has a separate implementation, so I can't currently use it with any third-party software that uses the llama.cpp API. Edit: this file is not listed in either of the build scripts, and it doesn't seem to have GPU acceleration. It seems like that could be improved.
Yeah, there's just a PoC implementation. We should add it in.
The situation now is that we have code in this repository to "successfully" convert and quantize a GPT-NeoX model, but no way to run the resulting models.
I'd still like to bring this up again:
I believe you can run this using https://github.com/ggerganov/ggml/blob/master/examples/gpt-neox/main.cpp |
Oh, I already compiled that example for testing. It seems to expect the old ggml .bin files, which can be created using the convert program in the same example directory. It doesn't run the GGUF files that are built using the
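The old ggml `.bin` files and GGUF files can be told apart by their first four bytes, which is why the example above rejects GGUF output. A minimal format sniffer is sketched below; the magic values are taken from llama.cpp's historical loaders and should be treated as assumptions rather than a definitive list:

```python
# First-four-byte magics as they appear on disk in little-endian files
# (values assumed from llama.cpp's historical loaders):
#   b"GGUF"                      -> current GGUF format
#   b"lmgg" / b"fmgg" / b"tjgg"  -> legacy ggml / ggmf / ggjt .bin formats
MAGICS = {
    b"GGUF": "gguf",
    b"lmgg": "ggml (legacy, unversioned)",
    b"fmgg": "ggmf (legacy, versioned)",
    b"tjgg": "ggjt (legacy, mmap-able)",
}

def sniff_model_format(data: bytes) -> str:
    """Classify a model file by its 4-byte magic; 'unknown' otherwise."""
    return MAGICS.get(bytes(data[:4]), "unknown")
```

A loader built against only one of these headers will, as described above, simply refuse files in the other family.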
It's much easier to add new arches to |
This issue was closed because it has been inactive for 14 days since being marked as stale. |
I'm still interested in this. |
This issue was closed because it has been inactive for 14 days since being marked as stale. |