CUDA: fix mul_mat_q not used for output tensor #3127

JohannesGaessler · 2023-09-11T20:01:49Z

As pointed out in #3110 (comment), PR #3110 increased VRAM usage. The cause is a condition that I added at some point for debugging purposes, which controls whether mul_mat_q is used instead of cuBLAS, and which I forgot to remove again. This PR removes that condition, which fixes the increased VRAM usage for the output tensor.
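To illustrate why the choice of kernel matters for VRAM, here is a minimal sketch, not the actual ggml-cuda code. It assumes that the cuBLAS path needs a temporary dequantized copy of a quantized weight matrix while mul_mat_q works on the quantized data directly. `WeightInfo`, `choose_kernel`, `temp_buffer_bytes`, and the `leftover_debug_check` flag are all hypothetical names; the real condition that was removed is not shown in this PR text.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a weight tensor; not ggml's actual struct.
struct WeightInfo {
    bool        quantized;  // weights stored in a quantized format
    bool        is_output;  // final output projection
    std::size_t rows;
    std::size_t cols;
};

enum class Kernel { MulMatQ, CuBLAS };

// Hypothetical dispatch. `leftover_debug_check` models the stray debugging
// condition described above; its real form is an assumption.
inline Kernel choose_kernel(const WeightInfo & w, bool leftover_debug_check) {
    if (!w.quantized) {
        return Kernel::CuBLAS;   // mul_mat_q only applies to quantized weights
    }
    if (leftover_debug_check && w.is_output) {
        return Kernel::CuBLAS;   // assumed behavior before this PR
    }
    return Kernel::MulMatQ;      // behavior after this PR
}

// Extra bytes for the temporary dequantized copy assumed for the cuBLAS path.
inline std::size_t temp_buffer_bytes(const WeightInfo & w, Kernel k,
                                     std::size_t bytes_per_element) {
    assert(bytes_per_element > 0);
    if (k == Kernel::MulMatQ || !w.quantized) {
        return 0;                // no dequantized copy needed
    }
    return w.rows * w.cols * bytes_per_element;
}
```

With illustrative dimensions of 32000 x 4096 for the output tensor and 2 bytes per dequantized element, `temp_buffer_bytes` returns 262144000 bytes (250 MiB) on the cuBLAS path and 0 on the mul_mat_q path. These numbers are only an example, not a measurement from this PR.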

CUDA: fix mul_mat_q not used for output tensor

7bb87ed

slaren approved these changes Sep 11, 2023


JohannesGaessler merged commit 89e8959 into ggerganov:master Sep 11, 2023
