
[FIX] Batching of calibration data in .quantize() #70

Merged 2 commits into main from fix-bath-size on Jun 26, 2024
Conversation

@LRL-ModelCloud (Collaborator) commented on Jun 26, 2024

The original code had broken batching support, causing quantization with large calibration datasets to take much longer than necessary.
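
For context, here is a minimal sketch (not the repository's actual code) of how variable-length calibration samples can be padded into batches rather than naively joined with `torch.cat`, with a guard for tokenizers that define no pad token. The function name `batch_calibration_data` and the tokenizer/sample shapes are assumptions for illustration.

```python
import torch

def batch_calibration_data(samples, tokenizer, batch_size=4):
    # Hypothetical sketch: pad samples to a common length per batch instead of
    # concatenating raw tensors, which fails when sequence lengths differ.
    pad_token_id = tokenizer.pad_token_id
    if pad_token_id is None:
        # pad_token "protect": fall back to eos when no pad token is defined
        pad_token_id = tokenizer.eos_token_id

    batches = []
    for start in range(0, len(samples), batch_size):
        chunk = samples[start:start + batch_size]
        max_len = max(s["input_ids"].shape[-1] for s in chunk)

        input_ids, attention_masks = [], []
        for s in chunk:
            ids = s["input_ids"].view(-1)
            pad_len = max_len - ids.shape[-1]
            # Right-pad with pad_token_id and mask out the padded positions.
            input_ids.append(
                torch.cat([ids, torch.full((pad_len,), pad_token_id, dtype=ids.dtype)])
            )
            attention_masks.append(
                torch.cat([torch.ones_like(ids), torch.zeros(pad_len, dtype=ids.dtype)])
            )

        batches.append({
            "input_ids": torch.stack(input_ids),
            "attention_mask": torch.stack(attention_masks),
        })
    return batches
```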

@Qubitium Qubitium changed the title [BUG] fix the incorrect concate by torch.cat, and add pad_token protect [FIX] Incorrect concate by torch.cat, and add pad_token protect Jun 26, 2024
@Qubitium Qubitium changed the title [FIX] Incorrect concate by torch.cat, and add pad_token protect [FIX] Batching of calibration data in .quantize() Jun 26, 2024
@Qubitium Qubitium merged commit 2936e08 into main Jun 26, 2024
1 of 2 checks passed
@Qubitium Qubitium deleted the fix-bath-size branch June 26, 2024 10:22
DeJoker pushed a commit to DeJoker/GPTQModel that referenced this pull request Jul 19, 2024
* Fix test_serialization  test_quant_formats.py::TestQuantization::test_quantize_2

* Update test_quality.yml

* Update qlinear_marlin.py
DeJoker pushed a commit to DeJoker/GPTQModel that referenced this pull request Jul 19, 2024
* fix wrong torch.cat, and add pad_token protect

* Update base.py

---------

Co-authored-by: LRL-ModelCloud <lrl@modelcloud.ai>
Co-authored-by: Qubitium-ModelCloud <qubitium@modelcloud.ai>