support to export gguf q4_0 and q4_1 format #393
Merged
30 commits
8355347 export gguf (n1ck-guo)
dd55003 [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
f67219b q4_0/1 port c++ to python (n1ck-guo)
611c4c1 Merge branch 'hengguo/gguf' of https://github.com/intel/auto-round in… (n1ck-guo)
ce1c48e [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
7ab730b change to llama.cpp stype and add uint8 store (n1ck-guo)
287b5af abstract (n1ck-guo)
49d95a8 merge (n1ck-guo)
113532a update (n1ck-guo)
ee66c47 [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
d395c6b fix (n1ck-guo)
8b13f1f Merge branch 'hengguo/gguf' of https://github.com/intel/auto-round in… (n1ck-guo)
ce2c346 update (n1ck-guo)
8bceb3f default sequence eval (n1ck-guo)
722a1d8 modify by comments (n1ck-guo)
8712170 update (n1ck-guo)
1aa979a pylint (n1ck-guo)
515160d clean (n1ck-guo)
a064c44 pylint (n1ck-guo)
fa2328d fix (n1ck-guo)
7906284 update (n1ck-guo)
4261191 Merge branch 'main' into hengguo/gguf (n1ck-guo)
e525f97 add ut (n1ck-guo)
b0f96a0 add cuda ut (n1ck-guo)
c7ec3a5 add requirements (n1ck-guo)
79c5c5a format (n1ck-guo)
2720287 code scane (n1ck-guo)
db15354 update (n1ck-guo)
24a68a9 merge main (n1ck-guo)
cb67c1a update (n1ck-guo)
@@ -269,23 +269,26 @@ def tune(args):
  if args.format is None:
  args.format = "auto_round"
  supported_formats = ["auto_round", "auto_gptq", "auto_awq", "auto_round:auto_gptq", "auto_round:auto_awq",
- "auto_gptq:marlin", "gguf:q4_0", "gguf:q4_1", "itrex", "iterx_xpu", "fake"]
+ "auto_gptq:marlin", "gguf:q4_0", "gguf:q4_1", "itrex", "itrex_xpu", "fake"]
  formats = args.format.lower().replace(' ', '').split(",")
  for format in formats:
  if format not in supported_formats:
  raise ValueError(f"{format} is not supported, we only support {supported_formats}")
  if format in ["gguf:q4_0", "gguf:q4_1"]:
Review comment: support gguf later if we could inference the exact type by the quantization config
  args.bits = 4
- if args.act_bits <= 8:
- logger.warning(f"{args.format} not support for activation quantization.")
- if args.group_size != 32:
- logger.warning(f"{args.format} not support for group_size: {args.group_size}. "
- "Reset group_size to 32.")
- args.group_size = 32
- if args.format.endswith("_0"):
- args.asym = False
- if args.format.endswith("_1"):
- args.asym = True
+ if args.act_bits <= 8:
+ logger.warning(f"{args.format} not support for activation quantization.")
+ if args.group_size != 32:
+ logger.warning(f"{args.format} not support for group_size: {args.group_size}. "
+ "Reset group_size to 32.")
+ args.group_size = 32
+ if args.format.endswith("_0") and args.asym:
+ logger.warning(f"{args.format} not support for asymmetric quantization, will reset to sym.")
+ args.asym = False
+ if args.format.endswith("_1") and not args.asym:
+ logger.warning(f"{args.format} not support for symmetric quantization, will reset to asym.")
+ args.asym = True
+ logger.info(f"export format {format}, sym = {not args.asym}, group_size = {args.group_size}")

  if "auto_gptq" in args.format and args.asym is True:
  print(
also better check bits
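
For illustration only, here is a minimal sketch of how the GGUF argument checks in the hunk above could look as a standalone helper, with the bits check this comment asks for added. The helper name `normalize_gguf_args`, the `GGUF_FORMATS` tuple, and the use of `SimpleNamespace` in place of the parsed CLI arguments are assumptions made for this example and are not part of the PR. The sketch also tests the per-format string `fmt` rather than `args.format`, which is a deliberate difference from the diff.

```python
import logging
from types import SimpleNamespace

logger = logging.getLogger(__name__)

# Formats handled by this sketch (taken from the supported_formats list in the diff).
GGUF_FORMATS = ("gguf:q4_0", "gguf:q4_1")


def normalize_gguf_args(args, fmt):
    """Hypothetical helper: coerce args to what a gguf:q4_0 / gguf:q4_1 export expects."""
    if fmt not in GGUF_FORMATS:
        return args  # other formats are left untouched

    # Assumption based on the review comment: warn before overriding bits,
    # instead of silently setting args.bits = 4.
    if args.bits != 4:
        logger.warning("%s only supports 4-bit weights, resetting bits from %s to 4.", fmt, args.bits)
        args.bits = 4

    if args.act_bits <= 8:
        logger.warning("%s does not support activation quantization.", fmt)

    if args.group_size != 32:
        logger.warning("%s does not support group_size %s, resetting to 32.", fmt, args.group_size)
        args.group_size = 32

    # As in the diff: q4_0 is exported as symmetric, q4_1 as asymmetric.
    want_asym = fmt.endswith("_1")
    if args.asym != want_asym:
        logger.warning("%s requires %s quantization, resetting.", fmt, "asym" if want_asym else "sym")
        args.asym = want_asym

    logger.info("export format %s, sym = %s, group_size = %s", fmt, not args.asym, args.group_size)
    return args


if __name__ == "__main__":
    demo = SimpleNamespace(bits=2, act_bits=16, group_size=128, asym=True)
    normalize_gguf_args(demo, "gguf:q4_0")
    print(demo.bits, demo.group_size, demo.asym)
```

Keeping the checks in one function that takes the individual format string would make them easy to unit test and avoids relying on `args.format` when several comma-separated formats are requested.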