GPTQModel v0.9.1
What's Changed
v0.9.1 is a big release: three new models are supported, plus new BitBLAS format/kernel support from Microsoft. Batching in `.quantize()` has been fixed, making quantization more than 50% faster when batching is enabled over a large number of calibration examples. Sharded saving of quantized models has also been added, with optional hash security checking of weight files on model load.
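A minimal load-time sketch of the new `backend` and `verify_hash` options described below (the `Backend` import location, enum value, and hash-string format are assumptions, not verified API; see the repo docs):

```python
from gptqmodel import GPTQModel, Backend  # Backend import path is an assumption

# Load a quantized checkpoint with a single explicit kernel backend
# (replacing the old use_xxx/disable_xxx flags) and verify the weight
# files against stored hashes before loading.
model = GPTQModel.from_quantized(
    "ModelCloud/example-4bit-model",              # hypothetical model id
    backend=Backend.BITBLAS,                      # new BitBLAS kernel
    verify_hash="sha256:<expected hex digest>",   # assumed "algo:digest" format
)
```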
- ✨ [FEATURE + New FORMAT] Add BitBLAS Format/Kernel Support by @LeiWang1999 @ZX-ModelCloud @Qubitium in #39
- ✨ [FEATURE] Sharded model save support by @LaaZa @CSY-ModelCloud @PZS-ModelCloud in #40 #69
- ✨ [FEATURE/SECURITY] Add `verify_hash` to validate model weights via stored hashes by @PZS-ModelCloud in #50
- 🚀 [CORE/REFACTOR] Consolidate 6+ passive `use_xxx` and `disable_xxx` args into a single explicit `backend` arg by @ZX-ModelCloud in #68
- 🚀 [MODEL] DeepSeek-V2 support by @LRL-ModelCloud in #51
- 🚀 [MODEL] DeepSeek-V2-Lite support by @LRL-ModelCloud in #74
- 🚀 [MODEL] DBRX Converted support by @Qubitium @LRL-ModelCloud in #38
- 👾 [FIX] Batching of calibration data in `.quantize()` by @LRL-ModelCloud in #70 (see the sketch after this list)
- 👾 [FIX] "Cannot pickle 'module' object" for 8-bit (fixes #47) by @CSY-ModelCloud in #49
- 👾 [FIX] Format load check by @Qubitium in #53
- 👾 [FIX] `save_quantized()` using wrong model to obtain `state_dict()` by @LRL-ModelCloud in #54
- 👾 [FIX] Rename exllama_kernels class name to fix import/ext conflicts with autogptq by @CSY-ModelCloud in #71
- 🤖 [CI] Speed up unit tests by @CSY-ModelCloud in #37, #41, #46, #55
- 🤖 [CI] Improve unit tests by @ZYC-ModelCloud in #58 #72
- 🤖 👾 [CI] Fix Marlin format: `desc_act` must be False by @LRL-ModelCloud in #57
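Below is the batching sketch referenced in the fix list above. It assumes `batch_size` on `.quantize()` and `max_shard_size` on `save_quantized()` as the knobs for the batching and sharded-save features; treat both names as illustrative rather than confirmed signatures.

```python
from transformers import AutoTokenizer
from gptqmodel import GPTQModel, QuantizeConfig

model_id = "facebook/opt-125m"  # small model, purely for illustration
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Pre-tokenized calibration examples; the batching fix (#70) makes
# .quantize() more than 50% faster when many examples are processed per batch.
calibration = [tokenizer("gptqmodel is an LLM quantization toolkit.") for _ in range(256)]

model = GPTQModel.from_pretrained(model_id, QuantizeConfig(bits=4, group_size=128))
model.quantize(calibration, batch_size=8)                     # assumed kwarg
model.save_quantized("opt-125m-4bit", max_shard_size="2GB")   # assumed kwarg
```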
New Contributors
- @LeiWang1999 in #39
- @LaaZa in #40
- @PZS-ModelCloud in #50
- @ZYC-ModelCloud in #58
Full Changelog: v0.9.0...v0.9.1