Replies: 1 comment
-
It looks like there is an upstream PR, already at an advanced stage, to make MMQ the default when the optimization is available.
-
I recently ran experiments across all my NVIDIA cards and got better results with lower memory usage when using MMQ.
With MMQ I can now offload more layers, and even when offloading the same number of layers I still see a slight performance improvement.
I believe this improvement comes from the upstream PR.
Have you had a similar experience? Do you think it's time to update the MMQ documentation?
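
For reference, this is roughly the kind of invocation I mean. A minimal sketch assuming a llama.cpp-style CLI: the flag names `--mul-mat-q`, `-ngl`, and the build option `LLAMA_CUDA_FORCE_MMQ` are taken from llama.cpp versions where MMQ was still opt-in, and the exact names may differ in your build or in this project. Check `--help` on your binary.

```sh
# Hypothetical example; flag names are assumptions based on older llama.cpp builds.
# -ngl / --n-gpu-layers: number of transformer layers to offload to the GPU
# --mul-mat-q: use the quantized matmul (MMQ) kernels instead of cuBLAS
./main -m models/model.gguf -ngl 40 --mul-mat-q -p "Hello"

# On builds where the choice is compile-time only, MMQ can be forced at build time:
#   make LLAMA_CUBLAS=1 LLAMA_CUDA_FORCE_MMQ=1
```

The practical upside discussed above is that MMQ's lower VRAM footprint can leave room for a higher `-ngl` value on the same card.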