py : switch to snake_case #8305

Merged: 6 commits, Jul 5, 2024

README.md (8 changes: 4 additions & 4 deletions)
@@ -26,7 +26,7 @@ Inference of Meta's [LLaMA](https://arxiv.org/abs/2302.13971) model (and others)

### Hot topics

- **`convert.py` has been deprecated and moved to `examples/convert-legacy-llama.py`, please use `convert-hf-to-gguf.py`** https://github.com/ggerganov/llama.cpp/pull/7430
- **`convert.py` has been deprecated and moved to `examples/convert_legacy_llama.py`, please use `convert_hf_to_gguf.py`** https://github.com/ggerganov/llama.cpp/pull/7430
- Initial Flash-Attention support: https://github.com/ggerganov/llama.cpp/pull/5021
- BPE pre-tokenization support has been added: https://github.com/ggerganov/llama.cpp/pull/6920
- MoE memory layout has been updated - reconvert models for `mmap` support and regenerate `imatrix` https://github.com/ggerganov/llama.cpp/pull/6387
@@ -636,8 +636,8 @@ Building the program with BLAS support may lead to some performance improvements

To obtain the official LLaMA 2 weights please see the <a href="#obtaining-and-using-the-facebook-llama-2-model">Obtaining and using the Facebook LLaMA 2 model</a> section. There is also a large selection of pre-quantized `gguf` models available on Hugging Face.

Note: `convert.py` has been moved to `examples/convert-legacy-llama.py` and shouldn't be used for anything other than `Llama/Llama2/Mistral` models and their derivatives.
It does not support LLaMA 3, you can use `convert-hf-to-gguf.py` with LLaMA 3 downloaded from Hugging Face.
Note: `convert.py` has been moved to `examples/convert_legacy_llama.py` and shouldn't be used for anything other than `Llama/Llama2/Mistral` models and their derivatives.
It does not support LLaMA 3, you can use `convert_hf_to_gguf.py` with LLaMA 3 downloaded from Hugging Face.

```bash
# obtain the official LLaMA model weights and place them in ./models
@@ -654,7 +654,7 @@ ls ./models
python3 -m pip install -r requirements.txt

# convert the model to ggml FP16 format
python3 convert-hf-to-gguf.py models/mymodel/
python3 convert_hf_to_gguf.py models/mymodel/

# quantize the model to 4-bits (using Q4_K_M method)
./llama-quantize ./models/mymodel/ggml-model-f16.gguf ./models/mymodel/ggml-model-Q4_K_M.gguf Q4_K_M

ci/run.sh (6 changes: 3 additions & 3 deletions)
@@ -287,7 +287,7 @@ function gg_run_open_llama_7b_v2 {
(time cmake -DCMAKE_BUILD_TYPE=Release ${CMAKE_EXTRA} -DGGML_CUDA=1 .. ) 2>&1 | tee -a $OUT/${ci}-cmake.log
(time make -j ) 2>&1 | tee -a $OUT/${ci}-make.log

python3 ../examples/convert-legacy-llama.py ${path_models} --outfile ${path_models}/ggml-model-f16.gguf
python3 ../examples/convert_legacy_llama.py ${path_models} --outfile ${path_models}/ggml-model-f16.gguf

model_f16="${path_models}/ggml-model-f16.gguf"
model_q8_0="${path_models}/ggml-model-q8_0.gguf"
@@ -421,7 +421,7 @@ function gg_run_pythia_1_4b {
(time cmake -DCMAKE_BUILD_TYPE=Release ${CMAKE_EXTRA} .. ) 2>&1 | tee -a $OUT/${ci}-cmake.log
(time make -j ) 2>&1 | tee -a $OUT/${ci}-make.log

python3 ../convert-hf-to-gguf.py ${path_models} --outfile ${path_models}/ggml-model-f16.gguf
python3 ../convert_hf_to_gguf.py ${path_models} --outfile ${path_models}/ggml-model-f16.gguf

model_f16="${path_models}/ggml-model-f16.gguf"
model_q8_0="${path_models}/ggml-model-q8_0.gguf"
@@ -553,7 +553,7 @@ function gg_run_pythia_2_8b {
(time cmake -DCMAKE_BUILD_TYPE=Release ${CMAKE_EXTRA} -DGGML_CUDA=1 .. ) 2>&1 | tee -a $OUT/${ci}-cmake.log
(time make -j ) 2>&1 | tee -a $OUT/${ci}-make.log

python3 ../convert-hf-to-gguf.py ${path_models} --outfile ${path_models}/ggml-model-f16.gguf
python3 ../convert_hf_to_gguf.py ${path_models} --outfile ${path_models}/ggml-model-f16.gguf

model_f16="${path_models}/ggml-model-f16.gguf"
model_q8_0="${path_models}/ggml-model-q8_0.gguf"

convert_hf_to_gguf.py (8 changes: 4 additions & 4 deletions)
@@ -404,7 +404,7 @@ def get_vocab_base(self) -> tuple[list[str], list[int], str]:

return tokens, toktypes, tokpre

# NOTE: this function is generated by convert-hf-to-gguf-update.py
# NOTE: this function is generated by convert_hf_to_gguf_update.py
# do not modify it manually!
# ref: https://github.com/ggerganov/llama.cpp/pull/6920
# Marker: Start get_vocab_base_pre
@@ -424,7 +424,7 @@ def get_vocab_base_pre(self, tokenizer) -> str:

res = None

# NOTE: if you get an error here, you need to update the convert-hf-to-gguf-update.py script
# NOTE: if you get an error here, you need to update the convert_hf_to_gguf_update.py script
# or pull the latest version of the model from Huggingface
# don't edit the hashes manually!
if chkhsh == "0ef9807a4087ebef797fc749390439009c3b9eda9ad1a097abbe738f486c01e5":
@@ -499,9 +499,9 @@ def get_vocab_base_pre(self, tokenizer) -> str:
logger.warning("**************************************************************************************")
logger.warning("** WARNING: The BPE pre-tokenizer was not recognized!")
logger.warning("** There are 2 possible reasons for this:")
logger.warning("** - the model has not been added to convert-hf-to-gguf-update.py yet")
logger.warning("** - the model has not been added to convert_hf_to_gguf_update.py yet")
logger.warning("** - the pre-tokenization config has changed upstream")
logger.warning("** Check your model files and convert-hf-to-gguf-update.py and update them accordingly.")
logger.warning("** Check your model files and convert_hf_to_gguf_update.py and update them accordingly.")
logger.warning("** ref: https://github.com/ggerganov/llama.cpp/pull/6920")
logger.warning("**")
logger.warning(f"** chkhsh: {chkhsh}")

convert_hf_to_gguf_update.py (26 changes: 13 additions & 13 deletions)
@@ -2,7 +2,7 @@
# -*- coding: utf-8 -*-

# This script downloads the tokenizer models of the specified models from Huggingface and
# generates the get_vocab_base_pre() function for convert-hf-to-gguf.py
# generates the get_vocab_base_pre() function for convert_hf_to_gguf.py
#
# This is necessary in order to analyze the type of pre-tokenizer used by the model and
# provide the necessary information to llama.cpp via the GGUF header in order to implement
@@ -15,9 +15,9 @@
# - Add a new model to the "models" list
# - Run the script with your huggingface token:
#
# python3 convert-hf-to-gguf-update.py <huggingface_token>
# python3 convert_hf_to_gguf_update.py <huggingface_token>
#
# - Copy-paste the generated get_vocab_base_pre() function into convert-hf-to-gguf.py
# - Copy-paste the generated get_vocab_base_pre() function into convert_hf_to_gguf.py
# - Update llama.cpp with the new pre-tokenizer if necessary
#
# TODO: generate tokenizer tests for llama.cpp
@@ -37,7 +37,7 @@
from transformers import AutoTokenizer

logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger("convert-hf-to-gguf-update")
logger = logging.getLogger("convert_hf_to_gguf_update")
sess = requests.Session()


@@ -56,10 +56,10 @@ class TOKENIZER_TYPE(IntEnum):
token = sys.argv[1]
if not token.startswith("hf_"):
logger.info("Huggingface token seems invalid")
logger.info("Usage: python convert-hf-to-gguf-update.py <huggingface_token>")
logger.info("Usage: python convert_hf_to_gguf_update.py <huggingface_token>")
sys.exit(1)
else:
logger.info("Usage: python convert-hf-to-gguf-update.py <huggingface_token>")
logger.info("Usage: python convert_hf_to_gguf_update.py <huggingface_token>")
sys.exit(1)

# TODO: add models here, base models preferred
@@ -134,7 +134,7 @@ def download_model(model):
logger.error(f"Failed to download model {model['name']}. Error: {e}")


# generate the source code for the convert-hf-to-gguf.py:get_vocab_base_pre() function:
# generate the source code for the convert_hf_to_gguf.py:get_vocab_base_pre() function:

src_ifs = ""
for model in models:
@@ -201,7 +201,7 @@ def get_vocab_base_pre(self, tokenizer) -> str:

res = None

# NOTE: if you get an error here, you need to update the convert-hf-to-gguf-update.py script
# NOTE: if you get an error here, you need to update the convert_hf_to_gguf_update.py script
# or pull the latest version of the model from Huggingface
# don't edit the hashes manually!
{src_ifs}
@@ -210,9 +210,9 @@ def get_vocab_base_pre(self, tokenizer) -> str:
logger.warning("**************************************************************************************")
logger.warning("** WARNING: The BPE pre-tokenizer was not recognized!")
logger.warning("** There are 2 possible reasons for this:")
logger.warning("** - the model has not been added to convert-hf-to-gguf-update.py yet")
logger.warning("** - the model has not been added to convert_hf_to_gguf_update.py yet")
logger.warning("** - the pre-tokenization config has changed upstream")
logger.warning("** Check your model files and convert-hf-to-gguf-update.py and update them accordingly.")
logger.warning("** Check your model files and convert_hf_to_gguf_update.py and update them accordingly.")
logger.warning("** ref: https://github.com/ggerganov/llama.cpp/pull/6920")
logger.warning("**")
logger.warning(f"** chkhsh: {{chkhsh}}")
@@ -226,7 +226,7 @@ def get_vocab_base_pre(self, tokenizer) -> str:
return res
"""

convert_py_pth = pathlib.Path("convert-hf-to-gguf.py")
convert_py_pth = pathlib.Path("convert_hf_to_gguf.py")
convert_py = convert_py_pth.read_text(encoding="utf-8")
convert_py = re.sub(
r"(# Marker: Start get_vocab_base_pre)(.+?)( +# Marker: End get_vocab_base_pre)",
@@ -237,7 +237,7 @@ def get_vocab_base_pre(self, tokenizer) -> str:

convert_py_pth.write_text(convert_py, encoding="utf-8")

logger.info("+++ convert-hf-to-gguf.py was updated")
logger.info("+++ convert_hf_to_gguf.py was updated")

# generate tests for each tokenizer model

@@ -343,6 +343,6 @@ def get_vocab_base_pre(self, tokenizer) -> str:
for model in models:
name = model["name"]

print(f"python3 convert-hf-to-gguf.py models/tokenizers/{name}/ --outfile models/ggml-vocab-{name}.gguf --vocab-only") # noqa: NP100
print(f"python3 convert_hf_to_gguf.py models/tokenizers/{name}/ --outfile models/ggml-vocab-{name}.gguf --vocab-only") # noqa: NP100

logger.info("\n")

docs/HOWTO-add-model.md (2 changes: 1 addition & 1 deletion)
@@ -17,7 +17,7 @@ Also, it is important to check that the examples and main ggml backends (CUDA, M
### 1. Convert the model to GGUF

This step is done in python with a `convert` script using the [gguf](https://pypi.org/project/gguf/) library.
Depending on the model architecture, you can use either [convert-hf-to-gguf.py](../convert-hf-to-gguf.py) or [examples/convert-legacy-llama.py](../examples/convert-legacy-llama.py) (for `llama/llama2` models in `.pth` format).
Depending on the model architecture, you can use either [convert_hf_to_gguf.py](../convert_hf_to_gguf.py) or [examples/convert_legacy_llama.py](../examples/convert_legacy_llama.py) (for `llama/llama2` models in `.pth` format).

The convert script reads the model configuration, tokenizer, tensor names+data and converts them to GGUF metadata and tensors.

File renamed without changes.
@@ -1,7 +1,7 @@
# Usage:
#! ./llama-server -m some-model.gguf &
#! pip install pydantic
#! python json-schema-pydantic-example.py
#! python json_schema_pydantic_example.py

from pydantic import BaseModel, Extra, TypeAdapter
from annotated_types import MinLen

examples/llava/MobileVLM-README.md (14 changes: 7 additions & 7 deletions)
@@ -30,34 +30,34 @@ git clone https://huggingface.co/mtgv/MobileVLM-1.7B
git clone https://huggingface.co/openai/clip-vit-large-patch14-336
```

2. Use `llava-surgery.py` to split the LLaVA model to LLaMA and multimodel projector constituents:
2. Use `llava_surgery.py` to split the LLaVA model to LLaMA and multimodel projector constituents:

```sh
python ./examples/llava/llava-surgery.py -m path/to/MobileVLM-1.7B
python ./examples/llava/llava_surgery.py -m path/to/MobileVLM-1.7B
```

3. Use `convert-image-encoder-to-gguf.py` with `--projector-type ldp` (for **V2** please use `--projector-type ldpv2`) to convert the LLaVA image encoder to GGUF:
3. Use `convert_image_encoder_to_gguf.py` with `--projector-type ldp` (for **V2** please use `--projector-type ldpv2`) to convert the LLaVA image encoder to GGUF:

```sh
python ./examples/llava/convert-image-encoder-to-gguf \
python ./examples/llava/convert_image_encoder_to_gguf \
-m path/to/clip-vit-large-patch14-336 \
--llava-projector path/to/MobileVLM-1.7B/llava.projector \
--output-dir path/to/MobileVLM-1.7B \
--projector-type ldp
```

```sh
python ./examples/llava/convert-image-encoder-to-gguf \
python ./examples/llava/convert_image_encoder_to_gguf \
-m path/to/clip-vit-large-patch14-336 \
--llava-projector path/to/MobileVLM-1.7B_V2/llava.projector \
--output-dir path/to/MobileVLM-1.7B_V2 \
--projector-type ldpv2
```

4. Use `examples/convert-legacy-llama.py` to convert the LLaMA part of LLaVA to GGUF:
4. Use `examples/convert_legacy_llama.py` to convert the LLaMA part of LLaVA to GGUF:

```sh
python ./examples/convert-legacy-llama.py path/to/MobileVLM-1.7B
python ./examples/convert_legacy_llama.py path/to/MobileVLM-1.7B
```

5. Use `quantize` to convert LLaMA part's DataType from `fp16` to `q4_k`

examples/llava/README.md (20 changes: 10 additions & 10 deletions)
@@ -38,22 +38,22 @@ git clone https://huggingface.co/openai/clip-vit-large-patch14-336
pip install -r examples/llava/requirements.txt
```

3. Use `llava-surgery.py` to split the LLaVA model to LLaMA and multimodel projector constituents:
3. Use `llava_surgery.py` to split the LLaVA model to LLaMA and multimodel projector constituents:

```sh
python ./examples/llava/llava-surgery.py -m ../llava-v1.5-7b
python ./examples/llava/llava_surgery.py -m ../llava-v1.5-7b
```

4. Use `convert-image-encoder-to-gguf.py` to convert the LLaVA image encoder to GGUF:
4. Use `convert_image_encoder_to_gguf.py` to convert the LLaVA image encoder to GGUF:

```sh
python ./examples/llava/convert-image-encoder-to-gguf.py -m ../clip-vit-large-patch14-336 --llava-projector ../llava-v1.5-7b/llava.projector --output-dir ../llava-v1.5-7b
python ./examples/llava/convert_image_encoder_to_gguf.py -m ../clip-vit-large-patch14-336 --llava-projector ../llava-v1.5-7b/llava.projector --output-dir ../llava-v1.5-7b
```

5. Use `examples/convert-legacy-llama.py` to convert the LLaMA part of LLaVA to GGUF:
5. Use `examples/convert_legacy_llama.py` to convert the LLaMA part of LLaVA to GGUF:

```sh
python ./examples/convert-legacy-llama.py ../llava-v1.5-7b --skip-unknown
python ./examples/convert_legacy_llama.py ../llava-v1.5-7b --skip-unknown
```

Now both the LLaMA part and the image encoder are in the `llava-v1.5-7b` directory.
@@ -70,9 +70,9 @@ git clone https://huggingface.co/liuhaotian/llava-v1.6-vicuna-7b
pip install -r examples/llava/requirements.txt
```

3) Use `llava-surgery-v2.py` which also supports llava-1.5 variants pytorch as well as safetensor models:
3) Use `llava_surgery_v2.py` which also supports llava-1.5 variants pytorch as well as safetensor models:
```console
python examples/llava/llava-surgery-v2.py -C -m ../llava-v1.6-vicuna-7b/
python examples/llava/llava_surgery_v2.py -C -m ../llava-v1.6-vicuna-7b/
```
- you will find a llava.projector and a llava.clip file in your model directory

@@ -86,13 +86,13 @@ curl -s -q https://huggingface.co/cmp-nct/llava-1.6-gguf/raw/main/config_vit.jso

5) Create the visual gguf model:
```console
python ./examples/llava/convert-image-encoder-to-gguf.py -m vit --llava-projector vit/llava.projector --output-dir vit --clip-model-is-vision
python ./examples/llava/convert_image_encoder_to_gguf.py -m vit --llava-projector vit/llava.projector --output-dir vit --clip-model-is-vision
```
- This is similar to llava-1.5, the difference is that we tell the encoder that we are working with the pure vision model part of CLIP

6) Then convert the model to gguf format:
```console
python ./examples/convert-legacy-llama.py ../llava-v1.6-vicuna-7b/ --skip-unknown
python ./examples/convert_legacy_llama.py ../llava-v1.6-vicuna-7b/ --skip-unknown
```

7) And finally we can run the llava cli using the 1.6 model version:
File renamed without changes.
File renamed without changes.

examples/llava/requirements.txt (2 changes: 1 addition & 1 deletion)
@@ -1,3 +1,3 @@
-r ../../requirements/requirements-convert-legacy-llama.txt
-r ../../requirements/requirements-convert_legacy_llama.txt
pillow~=10.2.0
torch~=2.2.1
File renamed without changes.
File renamed without changes.

gguf-py/README.md (10 changes: 5 additions & 5 deletions)
@@ -3,7 +3,7 @@
This is a Python package for writing binary files in the [GGUF](https://github.com/ggerganov/ggml/pull/302)
(GGML Universal File) format.

See [convert-llama-hf-to-gguf.py](https://github.com/ggerganov/llama.cpp/blob/master/convert-hf-to-gguf.py)
See [convert_hf_to_gguf.py](https://github.com/ggerganov/llama.cpp/blob/master/convert_hf_to_gguf.py)
as an example for its usage.

## Installation
@@ -15,13 +15,13 @@ pip install gguf

[examples/writer.py](https://github.com/ggerganov/llama.cpp/blob/master/gguf-py/examples/writer.py) — Generates `example.gguf` in the current directory to demonstrate generating a GGUF file. Note that this file cannot be used as a model.
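
A rough sketch of what such a writer script can look like, assuming the `GGUFWriter` interface used by `examples/writer.py` (the metadata keys and tensor values below are placeholders, and the output is not a usable model):

```python
# Minimal GGUF-writing sketch using the gguf package.
# Keys, values, and the tensor contents are illustrative only.
import numpy as np
import gguf

writer = gguf.GGUFWriter("example.gguf", "llama")  # output path, architecture name
writer.add_block_count(12)                         # example model metadata
writer.add_uint32("answer", 42)                    # arbitrary custom key/value
writer.add_tensor("tensor1", np.ones((32,), dtype=np.float32))

writer.write_header_to_file()                      # header, then KV data, then tensors
writer.write_kv_data_to_file()
writer.write_tensors_to_file()
writer.close()
```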

[scripts/gguf-dump.py](https://github.com/ggerganov/llama.cpp/blob/master/gguf-py/scripts/gguf-dump.py) — Dumps a GGUF file's metadata to the console.
[scripts/gguf_dump.py](https://github.com/ggerganov/llama.cpp/blob/master/gguf-py/scripts/gguf_dump.py) — Dumps a GGUF file's metadata to the console.

[scripts/gguf-set-metadata.py](https://github.com/ggerganov/llama.cpp/blob/master/gguf-py/scripts/gguf-set-metadata.py) — Allows changing simple metadata values in a GGUF file by key.
[scripts/gguf_set_metadata.py](https://github.com/ggerganov/llama.cpp/blob/master/gguf-py/scripts/gguf_set_metadata.py) — Allows changing simple metadata values in a GGUF file by key.

[scripts/gguf-convert-endian.py](https://github.com/ggerganov/llama.cpp/blob/master/gguf-py/scripts/gguf-convert-endian.py) — Allows converting the endianness of GGUF files.
[scripts/gguf_convert_endian.py](https://github.com/ggerganov/llama.cpp/blob/master/gguf-py/scripts/gguf_convert_endian.py) — Allows converting the endianness of GGUF files.

[scripts/gguf-new-metadata.py](https://github.com/ggerganov/llama.cpp/blob/master/gguf-py/scripts/gguf-new-metadata.py) — Copies a GGUF file with added/modified/removed metadata values.
[scripts/gguf_new_metadata.py](https://github.com/ggerganov/llama.cpp/blob/master/gguf-py/scripts/gguf_new_metadata.py) — Copies a GGUF file with added/modified/removed metadata values.

## Development
Maintainers who participate in development of this package are advised to install it in editable mode:

gguf-py/scripts/__init__.py (17 changes: 4 additions & 13 deletions)
@@ -1,13 +1,4 @@
import os

from importlib import import_module


os.environ["NO_LOCAL_GGUF"] = "TRUE"

gguf_convert_endian_entrypoint = import_module("scripts.gguf-convert-endian").main
gguf_dump_entrypoint = import_module("scripts.gguf-dump").main
gguf_set_metadata_entrypoint = import_module("scripts.gguf-set-metadata").main
gguf_new_metadata_entrypoint = import_module("scripts.gguf-new-metadata").main

del import_module, os
from .gguf_convert_endian import main as gguf_convert_endian_entrypoint
from .gguf_dump import main as gguf_dump_entrypoint
from .gguf_set_metadata import main as gguf_set_metadata_entrypoint
from .gguf_new_metadata import main as gguf_new_metadata_entrypoint
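
A rough usage sketch of the re-exported entrypoints (assuming the `gguf-py` directory is on `sys.path` so the `scripts` package resolves, and that the entrypoint parses its arguments from `sys.argv`; the `.gguf` path is a placeholder):

```python
# Sketch: call the gguf_dump entrypoint programmatically now that the
# snake_case module names allow plain imports (no importlib indirection).
import sys
from scripts import gguf_dump_entrypoint

sys.argv = ["gguf_dump", "path/to/model.gguf"]  # arguments for the entrypoint's parser
gguf_dump_entrypoint()
```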
File renamed without changes.

requirements.txt (2 changes: 1 addition & 1 deletion)
@@ -4,7 +4,7 @@
# Package versions must stay compatible across all top-level python scripts.
#

-r ./requirements/requirements-convert-legacy-llama.txt
-r ./requirements/requirements-convert_legacy_llama.txt

-r ./requirements/requirements-convert_hf_to_gguf.txt
-r ./requirements/requirements-convert_hf_to_gguf_update.txt

requirements/requirements-convert_hf_to_gguf.txt (2 changes: 1 addition & 1 deletion)
@@ -1,2 +1,2 @@
-r ./requirements-convert-legacy-llama.txt
-r ./requirements-convert_legacy_llama.txt
torch~=2.2.1

requirements/requirements-convert_hf_to_gguf_update.txt (2 changes: 1 addition & 1 deletion)
@@ -1,2 +1,2 @@
-r ./requirements-convert-legacy-llama.txt
-r ./requirements-convert_legacy_llama.txt
torch~=2.2.1

requirements/requirements-convert_llama_ggml_to_gguf.txt (2 changes: 1 addition & 1 deletion)
@@ -1 +1 @@
-r ./requirements-convert-legacy-llama.txt
-r ./requirements-convert_legacy_llama.txt