GPTQModel v0.9.1
What's Changed
v0.9.1 is a big release: three new models are supported, plus new BitBLAS format/kernel support from Microsoft. Batching in `.quantize()` has been fixed, making quantization more than 50% faster when batching is enabled over a large number of calibration examples. Sharded saving of quantized models has also been added, with optional hash security checking of weight files on model load.
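A minimal load-time sketch of the new `backend` and `verify_hash` options described below (the `Backend` import location, enum value, and hash-string format are assumptions, not verified API; see the repo docs):

```python
from gptqmodel import GPTQModel, Backend  # Backend import path is an assumption

# Load a quantized checkpoint with a single explicit kernel backend
# (replacing the old use_xxx/disable_xxx flags) and verify the weight
# files against stored hashes before loading.
model = GPTQModel.from_quantized(
    "ModelCloud/example-4bit-model",              # hypothetical model id
    backend=Backend.BITBLAS,                      # new BitBLAS kernel
    verify_hash="sha256:<expected hex digest>",   # assumed "algo:digest" format
)
```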
- ✨ [FEATURE + New FORMAT] Add BitBLAS Format/Kernel Support by @LeiWang1999 @ZX-ModelCloud @Qubitium in #39
- ✨ [FEATURE] Sharded model save support by @LaaZa @CSY-ModelCloud @PZS-ModelCloud in #40 #69
- ✨ [FEATURE/SECURITY] Add `verify_hash` to validate model weights via stored hashes by @PZS-ModelCloud in #50
- 🚀 [CORE/REFACTOR] Consolidate 6+ passive `use_xxx` and `disable_xxx` args into a single explicit `backend` arg by @ZX-ModelCloud in #68
- 🚀 [MODEL] DeepSeek-V2 support by @LRL-ModelCloud in #51
- 🚀 [MODEL] DeepSeek-V2-Lite support by @LRL-ModelCloud in #74
- 🚀 [MODEL] DBRX Converted support by @Qubitium @LRL-ModelCloud in #38
- 👾 [FIX] Batching of calibration data in `.quantize()` by @LRL-ModelCloud in #70 (see the sketch after this list)
- 👾 [FIX] "Cannot pickle 'module' object" for 8-bit (fixes #47) by @CSY-ModelCloud in #49
- 👾 [FIX] Format load check by @Qubitium in #53
- 👾 [FIX] `save_quantized()` using wrong model to obtain `state_dict()` by @LRL-ModelCloud in #54
- 👾 [FIX] Rename exllama_kernels class name to fix import/ext conflicts with autogptq by @CSY-ModelCloud in #71
- 🤖 [CI] Speed up unit tests by @CSY-ModelCloud in #37, #41, #46, #55
- 🤖 [CI] Improve unit tests by @ZYC-ModelCloud in #58 #72
- 🤖 👾 [CI] Fix Marlin format: `desc_act` must be False by @LRL-ModelCloud in #57
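Below is the batching sketch referenced in the fix list above. It assumes `batch_size` on `.quantize()` and `max_shard_size` on `save_quantized()` as the knobs for the batching and sharded-save features; treat both names as illustrative rather than confirmed signatures.

```python
from transformers import AutoTokenizer
from gptqmodel import GPTQModel, QuantizeConfig

model_id = "facebook/opt-125m"  # small model, purely for illustration
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Pre-tokenized calibration examples; the batching fix (#70) makes
# .quantize() more than 50% faster when many examples are processed per batch.
calibration = [tokenizer("gptqmodel is an LLM quantization toolkit.") for _ in range(256)]

model = GPTQModel.from_pretrained(model_id, QuantizeConfig(bits=4, group_size=128))
model.quantize(calibration, batch_size=8)                     # assumed kwarg
model.save_quantized("opt-125m-4bit", max_shard_size="2GB")   # assumed kwarg
```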
New Contributors
- @LeiWang1999 in #39
- @LaaZa in #40
- @PZS-ModelCloud in #50
- @ZYC-ModelCloud in #58
Full Changelog: v0.9.0...v0.9.1