support to export gguf q4_0 and q4_1 format #393

Merged
merged 30 commits into from
Jan 8, 2025

Conversation

n1ck-guo
Contributor

@n1ck-guo n1ck-guo commented Dec 24, 2024

  • export function
  • q4_0
  • q4_1
  • q4_k

Tested the q4_0 and q4_1 quantized files with llama.cpp (llama-cli); both work well.

#288
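
For readers unfamiliar with the target format, here is a minimal sketch of the q4_0 block layout as ggml's reference quantizer defines it: 32 weights share one fp16 scale, and each weight is stored as a 4-bit value offset by 8. The function names are illustrative and are not taken from this PR; the sketch uses plain round-to-nearest and does not attempt to reproduce how auto-round derives its scales.

```python
import struct

QK4_0 = 32  # number of weights in one q4_0 block


def quantize_q4_0_block(values):
    """Pack 32 floats into 18 bytes: one fp16 scale followed by 16 nibble pairs."""
    if len(values) != QK4_0:
        raise ValueError(f"a q4_0 block holds exactly {QK4_0} values")
    # The scale is derived from the value with the largest magnitude (sign kept),
    # so that this value maps to the lowest 4-bit code.
    extreme = max(values, key=abs)
    d = extreme / -8.0
    inv_d = 1.0 / d if d != 0.0 else 0.0
    half = QK4_0 // 2
    packed = bytearray(half)
    for j in range(half):
        # Element j goes in the low nibble, element j + 16 in the high nibble.
        lo = min(15, int(values[j] * inv_d + 8.5))
        hi = min(15, int(values[j + half] * inv_d + 8.5))
        packed[j] = lo | (hi << 4)
    return struct.pack("<e", d) + bytes(packed)


def dequantize_q4_0_block(block):
    """Inverse of quantize_q4_0_block: returns 32 floats."""
    (d,) = struct.unpack("<e", block[:2])
    qs = block[2:]
    half = QK4_0 // 2
    out = [0.0] * QK4_0
    for j in range(half):
        out[j] = ((qs[j] & 0x0F) - 8) * d
        out[j + half] = ((qs[j] >> 4) - 8) * d
    return out
```

q4_1 differs in that each block also stores a minimum, so a weight is reconstructed as `d * q + m` rather than `d * (q - 8)`. The fixed block size of 32 is the reason the export path discussed later in this thread checks `group_size`.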

Signed-off-by: n1ck-guo <heng.guo@intel.com>
pre-commit-ci bot and others added 11 commits December 24, 2024 08:06
auto_round/__main__.py (outdated, resolved)
auto_round/__main__.py (outdated, resolved)
endianness: gguf.GGUFEndian
use_temp_file: bool
lazy: bool
part_names: list[str]
Contributor

Why not call the gguf code directly?

Contributor Author

We cannot. Different model series use different classes to write the gguf file.

@wenhuach21 wenhuach21 changed the title from "[WIP] support to export gguf format" to "support to export gguf q4_0 and q4_1 format" Jan 7, 2025
@wenhuach21
Contributor

wenhuach21 commented Jan 7, 2025

Remember to add a check in save_quantized, and later add a warning when this is combined with fp_layers.

@wenhuach21 wenhuach21 self-requested a review January 7, 2025 07:55
auto_round/script/llm.py (outdated, resolved)
for format in formats:
    if format not in supported_formats:
        raise ValueError(f"{format} is not supported, we only support {supported_formats}")
    if format in ["gguf:q4_0", "gguf:q4_1"]:
Contributor

Support gguf later, if we can infer the exact type from the quantization config.
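
The check under review can be sketched as a small helper that rejects unknown formats and splits a `family:type` string such as `gguf:q4_0` into its two parts. The helper name `parse_export_format` and the example format list are hypothetical, not code from this PR; only the error message mirrors the snippet above.

```python
def parse_export_format(fmt, supported_formats):
    """Validate one export format and split it into (family, quant_type).

    quant_type is None for formats without a ':' suffix.
    """
    if fmt not in supported_formats:
        raise ValueError(f"{fmt} is not supported, we only support {supported_formats}")
    family, _, quant_type = fmt.partition(":")
    return family, (quant_type or None)


# Hypothetical list for illustration only; the real list lives in the project.
example_supported = ["auto_round", "gguf:q4_0", "gguf:q4_1"]
print(parse_export_format("gguf:q4_1", example_supported))
```

With a list like this, a bare `gguf` is rejected until the exact type can be inferred from the quantization config, which matches the suggestion in the comment.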

auto_round/script/llm.py (outdated, resolved)
Contributor

@wenhuach21 wenhuach21 left a comment

Typo: change iterx_xpu to itrex_xpu.

@@ -1267,6 +1267,14 @@ def save_quantized(self, output_dir=None, format="auto_round", inplace=True, **k
            if processor is not None:
                processor.save_pretrained(output_dir)
            return
        if format in ["gguf:q4_0", "gguf:q4_1"]:
            if self.group_size != 32:
Contributor

It would also be better to check bits here.
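
A combined check along the lines suggested here might look like the sketch below. The helper name `check_gguf_q4_settings` is hypothetical, and raising an error (rather than logging a warning) is an assumption; the thread does not say which the merged code does. The expected values follow from the q4_0/q4_1 block layout: 4-bit weights in groups of 32.

```python
GGUF_Q4_FORMATS = ("gguf:q4_0", "gguf:q4_1")


def check_gguf_q4_settings(fmt, bits, group_size):
    """Raise ValueError if the quantization settings cannot be exported as q4_0/q4_1."""
    if fmt not in GGUF_Q4_FORMATS:
        return  # other formats are not constrained by this check
    problems = []
    if bits != 4:
        problems.append(f"bits must be 4, got {bits}")
    if group_size != 32:
        problems.append(f"group_size must be 32, got {group_size}")
    if problems:
        raise ValueError(f"{fmt} export is not possible: " + "; ".join(problems))
```

Collecting both problems before raising means a user who has set both options incorrectly sees everything to fix in one message.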

@wenhuach21 wenhuach21 self-requested a review January 7, 2025 08:33
@wenhuach21 wenhuach21 merged commit 86767b0 into main Jan 8, 2025
8 checks passed
@wenhuach21 wenhuach21 deleted the hengguo/gguf branch January 8, 2025 01:36