[DOC] prepare for 0.9.5 (#176)
Qubitium authored Jul 5, 2024
1 parent 50aa90a commit f0a1ee8
Showing 2 changed files with 7 additions and 3 deletions.
8 changes: 6 additions & 2 deletions README.md
@@ -10,6 +10,10 @@
</p>

## News
* 07/05/2024 🚀🚀 [v0.9.5](https://github.com/ModelCloud/GPTQModel/releases/tag/v0.9.5): Intel QBits support added for [2,3,4,8] bit quantization/inference on CPU (see the sketch after this list). CUDA kernels have been fully deprecated in favor of Exllama (v1/v2)/Marlin/Triton.

* 07/03/2024 🚀 [v0.9.4](https://github.com/ModelCloud/GPTQModel/releases/tag/v0.9.4): Added HF Transformers integration and fixed a bug in Gemma 2 support.

* 07/02/2024 🚀 [v0.9.3](https://github.com/ModelCloud/GPTQModel/releases/tag/v0.9.3): Added Gemma 2 support, faster PPL calculations on GPU, and more code/arg refactoring.

* 06/30/2024 🚀 [v0.9.2](https://github.com/ModelCloud/GPTQModel/releases/tag/v0.9.2): Added auto-padding of model in/out-features for exllama and exllama v2.
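
To make the v0.9.5 QBits item above concrete, here is a minimal sketch of CPU inference with a pre-quantized checkpoint. It assumes the loader keeps the `GPTQModel.from_quantized` shape used elsewhere in the README, that `device="cpu"` routes to the Intel QBits kernel, and that the model id is purely illustrative:

```python
from transformers import AutoTokenizer
from gptqmodel import GPTQModel

# Hypothetical 4-bit GPTQ checkpoint; substitute any quantized repo id.
model_id = "ModelCloud/opt-125m-gptq-4bit"

tokenizer = AutoTokenizer.from_pretrained(model_id)

# Assumption: loading on CPU selects the Intel QBits kernel, which per the
# v0.9.5 notes covers 2/3/4/8-bit quantized inference.
model = GPTQModel.from_quantized(model_id, device="cpu")

inputs = tokenizer("gptqmodel is", return_tensors="pt")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=32)[0]))
```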
@@ -65,7 +69,7 @@ We will backport bug fixes to AutoGPTQ on a case-by-case basis.

* `lm_head` quantization support by integrating with Intel/AutoRound.
* Customizable callback in Per-Layer quantization.
- * Add Qbits (cpu inference) support from Intel/Qbits.
+ * <del>Add Qbits (cpu inference) support from Intel/Qbits.</del> Completed
* Add back ROCM/AMD support once everything is validated.
* Store quant loss stat and apply diffs to new quant for quality control.
* Add Tests for every single supported model.
@@ -266,7 +270,7 @@ Currently, `gptqmodel` supports: `LanguageModelingTask`, `SequenceClassification

### Which kernel is used by default?

- GPTQModel will use the Marlin, Exllama v2, and Triton/CUDA kernels, in that order, for maximum inference performance.
+ GPTQModel will use the Marlin, Exllama v2, and Triton kernels, in that order, for maximum inference performance.
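
When the automatic fall-through is not wanted, the kernel can presumably be pinned at load time. A minimal sketch, assuming the loader exposes a `backend` argument and a `Backend` enum; the exact import path and member names are assumptions, not verified against this commit:

```python
from gptqmodel import GPTQModel
from gptqmodel.utils import Backend  # assumption: enum name/location may differ

# Default order is Marlin -> Exllama v2 -> Triton; passing a backend
# explicitly skips the fall-through and forces a single kernel.
model = GPTQModel.from_quantized(
    "ModelCloud/opt-125m-gptq-4bit",  # hypothetical checkpoint
    device="cuda:0",
    backend=Backend.MARLIN,  # assumption: e.g. MARLIN, EXLLAMA_V2, TRITON
)
```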

# Acknowledgements

2 changes: 1 addition & 1 deletion gptqmodel/version.py
@@ -1 +1 @@
__version__ = "0.9.5-dev0"
__version__ = "0.9.5"
