
[FIX] Batching of calibration data in .quantize() #70

Merged 2 commits into main from fix-bath-size on Jun 26, 2024
Conversation

@LRL-ModelCloud (Collaborator) commented on Jun 26, 2024

The original code had broken batching support, causing quantization with large calibration datasets to take much longer than necessary.
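
For context, here is a minimal sketch (not the repository's actual code) of how variable-length calibration samples can be padded into batches rather than naively joined with `torch.cat`, with a guard for tokenizers that define no pad token. The function name `batch_calibration_data` and the tokenizer/sample shapes are assumptions for illustration.

```python
import torch

def batch_calibration_data(samples, tokenizer, batch_size=4):
    # Hypothetical sketch: pad samples to a common length per batch instead of
    # concatenating raw tensors, which fails when sequence lengths differ.
    pad_token_id = tokenizer.pad_token_id
    if pad_token_id is None:
        # pad_token "protect": fall back to eos when no pad token is defined
        pad_token_id = tokenizer.eos_token_id

    batches = []
    for start in range(0, len(samples), batch_size):
        chunk = samples[start:start + batch_size]
        max_len = max(s["input_ids"].shape[-1] for s in chunk)

        input_ids, attention_masks = [], []
        for s in chunk:
            ids = s["input_ids"].view(-1)
            pad_len = max_len - ids.shape[-1]
            # Right-pad with pad_token_id and mask out the padded positions.
            input_ids.append(
                torch.cat([ids, torch.full((pad_len,), pad_token_id, dtype=ids.dtype)])
            )
            attention_masks.append(
                torch.cat([torch.ones_like(ids), torch.zeros(pad_len, dtype=ids.dtype)])
            )

        batches.append({
            "input_ids": torch.stack(input_ids),
            "attention_mask": torch.stack(attention_masks),
        })
    return batches
```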

@Qubitium Qubitium changed the title [BUG] fix the incorrect concate by torch.cat, and add pad_token protect [FIX] Incorrect concate by torch.cat, and add pad_token protect Jun 26, 2024
@Qubitium Qubitium changed the title [FIX] Incorrect concate by torch.cat, and add pad_token protect [FIX] Batching of calibration data in .quantize() Jun 26, 2024
@Qubitium Qubitium merged commit 2936e08 into main Jun 26, 2024
1 of 2 checks passed
@Qubitium Qubitium deleted the fix-bath-size branch June 26, 2024 10:22
DeJoker pushed a commit to DeJoker/GPTQModel that referenced this pull request Jul 19, 2024
* Fix test_serialization  test_quant_formats.py::TestQuantization::test_quantize_2

* Update test_quality.yml

* Update qlinear_marlin.py
DeJoker pushed a commit to DeJoker/GPTQModel that referenced this pull request Jul 19, 2024
* fix wrong torch.cat, and add pad_token protect

* Update base.py

---------

Co-authored-by: LRL-ModelCloud <lrl@modelcloud.ai>
Co-authored-by: Qubitium-ModelCloud <qubitium@modelcloud.ai>