
GPT-NeoX has only minimal inference support #3293

Closed
cebtenzzre opened this issue Sep 21, 2023 · 12 comments · Fixed by #7461
Labels
enhancement New feature or request

Comments

@cebtenzzre
Collaborator

Steps to reproduce:

  1. Download https://huggingface.co/EleutherAI/gpt-neox-20b
  2. Convert the model and attempt to use it:
$ TMPDIR=/var/tmp ./convert-gptneox-hf-to-gguf.py gpt-neox-20b 1 --outfile gpt-neox-20b.f16.gguf
$ ./main -m gpt-neox-20b.f16.gguf
<snip>
llama_model_loader: - type  f32:  354 tensors
llama_model_loader: - type  f16:  178 tensors
error loading model: cannot find tokenizer scores in model file

llama_load_model_from_file: failed to load model
llama_init_from_gpt_params: error: failed to load model 'gpt-neox-20b.f16.gguf'
main: error: unable to load model
@cebtenzzre
Collaborator Author

cebtenzzre commented Sep 21, 2023

Even if you add dummy scores and token types in the conversion script, it fails here:

throw std::runtime_error("unknown architecture");

Was GPT-NeoX ever even implemented in GGUF?
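For reference, a minimal sketch of the "dummy scores and token types" workaround mentioned above, assuming gguf-py's GGUFWriter API (the three-token vocabulary is a stand-in for the one the conversion script actually builds from the HF tokenizer):

# Sketch only: write placeholder scores and token types alongside the token
# list so the loader's "cannot find tokenizer scores" check passes.
import gguf

vocab = ["<|endoftext|>", "hello", "world"]  # stand-in for the real vocabulary

writer = gguf.GGUFWriter("gpt-neox-20b.f16.gguf", "gptneox")
writer.add_token_list([t.encode("utf-8") for t in vocab])
writer.add_token_scores([0.0] * len(vocab))                   # dummy scores (BPE has none)
writer.add_token_types([gguf.TokenType.NORMAL] * len(vocab))  # dummy token types

Note this only satisfies the tokenizer-metadata check; as described above, loading then still fails at the architecture check.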

@cebtenzzre cebtenzzre changed the title GPT-NeoX : cannot find tokenizer scores in model file GPT-NeoX has a conversion script but cannot be loaded or used for inference Sep 21, 2023
@cebtenzzre cebtenzzre added the enhancement New feature or request label Sep 21, 2023
@Jacoby1218

Was GPT-NeoX ever even implemented in GGUF?

Yes, example inference code exists here: https://github.com/ggerganov/llama.cpp/blob/master/examples/gptneox-wip/gptneox-main.cpp

@cebtenzzre
Collaborator Author

cebtenzzre commented Sep 22, 2023

Oh, it has a separate implementation. So I can't currently use it with any third-party software that uses the llama.cpp API.

edit: This file is not listed in either of the build scripts. It doesn't seem to have GPU acceleration. It seems like that could be improved.

@cebtenzzre cebtenzzre changed the title GPT-NeoX has a conversion script but cannot be loaded or used for inference GPT-NeoX has only minimal inference support Sep 22, 2023
@ggerganov
Owner

Yeah, there's just a PoC implementation. We should add it to llama.cpp eventually

@maddes8cht
Contributor

The situation now is that we have code in this repository to "successfully" convert and quantize a GPT-NeoX model, but no way to run the result.
https://github.com/ggerganov/ggml/tree/master/examples/gpt-neox has its own convert script, so the GPT-NeoX conversion here in convert-hf-to-gguf.py does not seem to serve any purpose at all.

@maddes8cht
Contributor

I would like to bring this up again:
There has been code to "successfully" convert GPT-NeoX models to GGUF in convert-hf-to-gguf.py since its first release in #3838, and before that in the separate convert-gptneox-hf-to-gguf.py script. But there is no code whatsoever to run inference on a model that the convert script labels as gptneox.

@Galunid
Collaborator

Galunid commented Nov 22, 2023

I believe you can run this using https://github.com/ggerganov/ggml/blob/master/examples/gpt-neox/main.cpp; it's just not in llama.cpp. It'll be supported eventually.

@maddes8cht
Contributor

maddes8cht commented Nov 22, 2023

Oh, I already compiled that example for testing. It expects the old ggml .bin files, which can be created using the convert script in the same example directory. It does not run the GGUF files built by the convert-hf-to-gguf.py script. Right now, there is no code that can run GGUF files converted from GPT-NeoX models.
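For anyone unsure which container they have, the two formats are easy to tell apart by the 4-byte magic at the start of the file. A small sketch (the legacy ggml magic 0x67676d6c appears on disk as the bytes "lmgg"; GGUF files start with the ASCII bytes "GGUF"):

# Sketch: distinguish a GGUF file from a legacy ggml .bin file by its magic.
import sys

with open(sys.argv[1], "rb") as f:
    magic = f.read(4)

if magic == b"GGUF":
    print("GGUF (llama.cpp format; the ggml gpt-neox example cannot read this)")
elif magic == b"lmgg":  # 0x67676d6c little-endian, the legacy ggml magic
    print("legacy ggml .bin (what the ggml examples/gpt-neox binaries expect)")
else:
    print("unknown format")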

@ggerganov
Owner

It's much easier to add new arches to llama.cpp now (I hope) - PRs welcome

@github-actions github-actions bot added the stale label Mar 20, 2024
Contributor

github-actions bot commented Apr 3, 2024

This issue was closed because it has been inactive for 14 days since being marked as stale.

@cebtenzzre
Collaborator Author

I'm still interested in this.

@cebtenzzre cebtenzzre reopened this Apr 3, 2024
@github-actions github-actions bot added the stale label May 4, 2024
Contributor

This issue was closed because it has been inactive for 14 days since being marked as stale.
