
CUDA/HIP: fix tests/test-backend-ops #8896

Merged
merged 1 commit into ggerganov:master on Aug 7, 2024

Conversation

JohannesGaessler (Collaborator)

Fixes #8864.
The problem is simply that I forgot to adapt the HIP test logic at some point.

Fixes #8863.
The problem is that some test cases are (according to the corresponding supports_op functions) supported by the CUDA backend but not by the CPU backend. As a consequence, when running the tests normally these cases are skipped because there are no CPU results to compare the CUDA results against. In performance mode they are executed, however, and trigger asserts. Frankly, without a way to assert that the produced results are actually correct, such performance numbers would be useless anyway. So I edited the test code in such a way that performance is only evaluated for those ops where correctness can also be tested.
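
As a minimal sketch of the gating described above (illustrative, not the exact PR diff; the helper name is made up, but `ggml_backend_supports_op` is the public ggml API):

```cpp
#include "ggml-backend.h"

// Run the perf measurement for an op only if both the backend under test
// and the reference (CPU) backend claim to support it, mirroring the check
// already done in correctness mode.
static bool perf_testable(ggml_backend_t backend1, ggml_backend_t backend2,
                          const struct ggml_tensor * op) {
    return ggml_backend_supports_op(backend1, op) &&
           ggml_backend_supports_op(backend2, op);
}
```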

github-actions bot added the testing (Everything test related) label on Aug 6, 2024
```diff
@@ -539,7 +539,7 @@ struct test_case {
         return false;
     }

-    bool eval_perf(ggml_backend_t backend, const char * op_name) {
+    bool eval_perf(ggml_backend_t backend1, ggml_backend_t backend2, const char * op_name) {
```
slaren (Collaborator) commented on the diff:
I don't understand the purpose of this change; perf mode only uses one backend.

slaren (Collaborator) commented on Aug 6, 2024

> So I edited the test code in such a way that performance is only evaluated for those ops where correctness can also be tested.

This should not be necessary.

slaren (Collaborator) commented on Aug 6, 2024

It seems that the issue is that the CUDA backend does not correctly report in its supports_op function which kinds of flash attention and mul_mat operations it can handle. The solution is to fix that, not to also check the CPU backend.
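
As a hypothetical sketch of that direction (the function name and conditions below are made up, not the actual ggml-cuda code): the backend's supports_op callback would encode which flash attention and mul_mat configurations actually have kernels, so the test harness could rely on it instead of also consulting the CPU backend.

```cpp
#include "ggml.h"

// Hypothetical, stricter supports_op: report support only for configurations
// that are assumed to have a dedicated kernel. The specific conditions are
// illustrative, not the real CUDA backend's checks.
static bool example_supports_op(const struct ggml_tensor * op) {
    switch (op->op) {
        case GGML_OP_FLASH_ATTN_EXT:
            // assume kernels exist only for head sizes 64 and 128
            return op->src[0]->ne[0] == 64 || op->src[0]->ne[0] == 128;
        case GGML_OP_MUL_MAT:
            // assume only F32 sources are supported (illustrative)
            return op->src[0]->type == GGML_TYPE_F32 &&
                   op->src[1]->type == GGML_TYPE_F32;
        default:
            return true; // remaining checks omitted for brevity
    }
}
```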

JohannesGaessler (Collaborator, Author)

In my opinion, both the inaccurate reporting of supported ops and the execution of performance tests for cases where correctness cannot be asserted are issues. But since the whole point of the tests is to reduce the time developers need to spend on quality assurance, I don't want to discuss this at length.

JohannesGaessler merged commit a8dbc6f into ggerganov:master on Aug 7, 2024
53 checks passed
arthw pushed a commit to arthw/llama.cpp that referenced this pull request on Aug 7, 2024