
SYCL: Kernel function refactor #11515

Closed
wants to merge 40 commits into from

Conversation

qnixsynapse
Contributor

@qnixsynapse qnixsynapse commented Jan 30, 2025

This PR simplifies the SYCL backend by removing the ggml_sycl_op_flatten function and integrating its responsibilities directly into the kernel functions.

The current implementation of ggml_sycl_op_flatten appears to have a misleading name, as its functionality does not align with the typical meaning of "flatten" in machine learning (i.e., collapsing dimensions). It currently performs the following operations:

  1. Checks whether dst->src[1] (if available) and dst use non-split buffer types.
  2. Unconditionally casts input and dst tensors to float32 before passing them to the kernel function.
  3. Executes the kernel function with the modified tensors.

Additionally, some unused sycl_pool_alloc objects are present in the code.

The unconditional casting to float32 may be a significant concern, as it can lead to unnecessary and potentially lossy type conversions. For instance, an int32 tensor is cast to float32 in ggml_sycl_op_flatten and then back to int32 within one of the kernels, which IMO is not numerically safe.

This PR tries to achieve a few things:

  1. Ensure the tensors passed to the kernels keep their original type, with no intermediate type conversions
  2. Introduce flexibility for supporting additional data types and make it easier to add new kernels
  3. Remove unused/duplicate variables (GGML_UNUSED leftovers) and unsafe type conversions

TODOs:

  • Remove ggml_sycl_op_flatten function
  • Add back split buffer type checks
  • Add try-catch blocks for catching sycl::exception(s) (almost done)
  • Sort includes
  • Remove added comments and finalize

@qnixsynapse qnixsynapse marked this pull request as draft January 30, 2025 16:12
@github-actions github-actions bot added ggml changes relating to the ggml tensor library for machine learning SYCL https://en.wikipedia.org/wiki/SYCL - GPU programming language labels Jan 30, 2025
@qnixsynapse qnixsynapse force-pushed the refactor_kernels branch 5 times, most recently from 8b6a0d2 to e32b19a Compare February 2, 2025 13:50
@qnixsynapse
Contributor Author

qnixsynapse commented Feb 3, 2025

Unfortunately, the glibc 2.41 update has broken the SYCL compiler. ... I will try to fix it in the meantime.
edit: reverted to glibc 2.40 in the meantime.

@qnixsynapse qnixsynapse marked this pull request as ready for review February 4, 2025 07:15
@qnixsynapse qnixsynapse marked this pull request as draft February 5, 2025 06:59
@qnixsynapse
Contributor Author

Looks like non-contiguous norm tests were added in #11659. Our current kernels do not support them, so I need to disable them in the backend_supports_op function.

@qnixsynapse qnixsynapse marked this pull request as ready for review February 5, 2025 08:05
@NeoZhangJianyu
Collaborator

@qnixsynapse
Thank you for such hard work!
It's hard work for the reviewers too. :)

I see some of the code changes just move code to the right place. I think those are safe and need no further testing.

But some changes could impact performance, such as adding try-catch blocks. Adding them to more sub-functions could impact performance - that's my guess.
Changes like that need careful review, but they are hidden among the large amount of moved code.

So much code change makes testing hard too.
I don't think the existing test cases can cover all the changed code.

I know you are an active developer,
but so much code change in one PR is not good for reviewers and maintainers.

  1. Moving code to the right place.
    So much code change makes troubleshooting more complex: when we diff the code across 2+ PRs, it's hard to spot the key code.
  2. Functional or performance changes.
    These are useful to the project and bring value to llama.cpp users. We need to review and test them carefully.
    But I can't find them quickly. :<

So, I suggest restructuring this PR, even though this PR is meant to refactor the SYCL backend:

  1. Keep the functional or performance changes as the highest priority.
  2. Refactor the code/files that are impacted by step 1.
  3. Don't refactor other files that are not impacted by step 1.
  4. If possible, use a small PR for each single functional or performance change.

There is still no real CI for the SYCL backend, so it's very hard to control quality.
It's better to keep changes to the SYCL backend minimal.

@qnixsynapse
Contributor Author

qnixsynapse commented Feb 5, 2025

@NeoZhangJianyu Thank you for your comments.

Most of the changes in this PR just move the kernels from ggml-sycl.cpp into their own files, like the CUDA backend does.

Try-catch exception handling is still done once, as before, and I did not notice any performance regression on my system.
Previously it was done in the wrapper function; with this change, it is done in the kernel function.

The rest of the changes are just the removal of intermediate type conversions and adding back the split buffer type checks.

Apart from this, the GGML_SYCL_DEBUG macro was fixed and added in more places so that non-tech-savvy people can help us debug more easily.

I know that there is no CI for SYCL, which is why I tested at every point of this change.

If this change is really too big, I will close it and submit smaller changes.

@Alcpz
Collaborator

Alcpz commented Feb 5, 2025

I agree with @NeoZhangJianyu here. @qnixsynapse, you've been putting a lot of effort into this PR and it contributes a lot of good things to the backend, but the scale of these changes makes it a bit harder for others to provide feedback.

Since you refactor functions into different files along with other changes, some smaller modifications that require verification or understanding could get lost in the PR. Breaking these down might really help everyone else to follow along.

@qnixsynapse
Contributor Author

@Alcpz Hmm... I've decided to close this PR and submit smaller changes.

@qnixsynapse qnixsynapse closed this Feb 5, 2025
@qnixsynapse qnixsynapse deleted the refactor_kernels branch February 5, 2025 12:28
@NeoZhangJianyu
Collaborator

@qnixsynapse
It's great!

@qnixsynapse
Contributor Author


Our backend is lagging behind others in OP support, so increased collaboration from the teams would be greatly appreciated. :)
