Replies: 1 comment
-
It looks like there is an upstream PR, already at an advanced stage, to make MMQ the default when the optimization is available.
-
I recently ran experiments across all my NVIDIA cards and got better results with lower memory usage when using MMQ.
With MMQ I can now offload more layers, and even when offloading the same number of layers I still see a slight performance improvement.
I believe this improvement comes from the upstream PR.
Have you had a similar experience? Do you think it's time to update the MMQ documentation?
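
For reference, this is roughly the kind of invocation I mean. A minimal sketch assuming a llama.cpp-style CLI: the flag names `--mul-mat-q`, `-ngl`, and the build option `LLAMA_CUDA_FORCE_MMQ` are taken from llama.cpp versions where MMQ was still opt-in, and the exact names may differ in your build or in this project. Check `--help` on your binary.

```sh
# Hypothetical example; flag names are assumptions based on older llama.cpp builds.
# -ngl / --n-gpu-layers: number of transformer layers to offload to the GPU
# --mul-mat-q: use the quantized matmul (MMQ) kernels instead of cuBLAS
./main -m models/model.gguf -ngl 40 --mul-mat-q -p "Hello"

# On builds where the choice is compile-time only, MMQ can be forced at build time:
#   make LLAMA_CUBLAS=1 LLAMA_CUDA_FORCE_MMQ=1
```

The practical upside discussed above is that MMQ's lower VRAM footprint can leave room for a higher `-ngl` value on the same card.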