ggml: fix zero division in ‘dne’ calculation in CUDA COUNT_EQUAL operator when ‘ne’ is small #10213

Merged: 1 commit, Nov 9, 2024


SongXiaoXi
Contributor

When the total number of elements ne is smaller than 4 * nsm, the integer division ne / (4 * nsm) in the COUNT_EQUAL operator evaluates to zero. As a result dne, which determines the size of the data chunk handled by each thread block, is also zero. The CUDA kernel for COUNT_EQUAL then does not process the input correctly and produces incorrect results.
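The arithmetic behind the failure can be reproduced on the host without a GPU. The following is a minimal sketch, not the code from the patch: the helper names `pad_up`, `dne_truncating`, and `dne_ceiling`, the constant `kChunkSize = 128`, and the sample values are all assumptions made for illustration. It contrasts truncating division with ceiling division, which is one common way to guarantee a non-zero chunk size for any non-empty input.

```cpp
#include <cstdint>

// Hypothetical stand-in for the chunk granularity used by the kernel.
constexpr int64_t kChunkSize = 128;

// Round x up to the next multiple of n (n must be a power of two).
constexpr int64_t pad_up(int64_t x, int64_t n) {
    return (x + n - 1) & ~(n - 1);
}

// Truncating division: the quotient is 0 whenever ne < 4 * nsm,
// and padding 0 up to a multiple of the chunk size still gives 0.
constexpr int64_t dne_truncating(int64_t ne, int64_t nsm) {
    return pad_up(ne / (4 * nsm), kChunkSize);
}

// Ceiling division: the quotient is at least 1 for any ne > 0,
// so the padded result is at least one full chunk.
constexpr int64_t dne_ceiling(int64_t ne, int64_t nsm) {
    return pad_up((ne + 4 * nsm - 1) / (4 * nsm), kChunkSize);
}

// Illustrative values: 500 elements on a device with 128 SMs.
static_assert(dne_truncating(500, 128) == 0,   "truncation collapses to zero");
static_assert(dne_ceiling(500, 128)    == 128, "ceiling keeps one chunk");
```

With these assumed values, 4 * nsm is 512, so 500 / 512 truncates to 0 while the ceiling form yields 1, which pads to 128. On a device with 125 SMs or fewer the truncating form would still return a non-zero value for the same ne, which is why the problem only shows up on larger GPUs.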

github-actions bot added the "Nvidia GPU" label (Issues specific to Nvidia GPUs) on Nov 8, 2024
Collaborator

@JohannesGaessler JohannesGaessler left a comment


How did you come to notice this?

@SongXiaoXi
Contributor Author

test-backend-ops failed with CUDA enabled, which made the problem obvious.

Nexesenex added a commit to Nexesenex/croco.cpp that referenced this pull request Nov 9, 2024
@JohannesGaessler
Collaborator

Sorry, I thought this bug would only show up in the test on a GPU with more than 500 streaming multiprocessors (far more than any NVIDIA GPU currently available), but the threshold is actually 125, which is fewer than, for example, an RTX 4090 has.

Thanks for the patch.

@JohannesGaessler JohannesGaessler merged commit 5b359bb into ggerganov:master Nov 9, 2024
53 checks passed
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Nov 15, 2024
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Nov 18, 2024