
SYCL: Kernel function refactor #11515

Closed
wants to merge 40 commits into from

Conversation

qnixsynapse
Contributor

@qnixsynapse qnixsynapse commented Jan 30, 2025

This PR simplifies the SYCL backend by removing the ggml_sycl_op_flatten function and integrating its responsibilities directly into the kernel functions.

The current implementation of ggml_sycl_op_flatten appears to have a misleading name, as its functionality does not align with the typical meaning of "flatten" in machine learning (i.e., collapsing dimensions). It currently performs the following operations:

  1. Checks whether dst->src[1] (if available) and dst use non-split buffer types.
  2. Unconditionally casts input and dst tensors to float32 before passing them to the kernel function.
  3. Executes the kernel function with the modified tensors.

Additionally, some unused sycl_pool_alloc objects are present in the code.

The unconditional casting to float32 may be a significant concern, as it can lead to unnecessary and potentially lossy type conversions. For instance, an int32 tensor is cast to float32 in ggml_sycl_op_flatten and then back to int32 within one of the kernels, which IMO is not numerically safe.

This PR tries to achieve a few things:

  1. Ensure the tensors passed to the kernels keep their original type, with no intermediate type conversions
  2. Introduce flexibility for supporting additional data types and make it easier to add new kernels
  3. Remove unused/duplicate variables (GGML_UNUSED leftovers) and unsafe type conversions

TODOs:

  • Remove ggml_sycl_op_flatten function
  • Add back split buffer type checks
  • Add try-catch blocks for catching sycl::exception(s) (almost done)
  • Sort includes
  • Remove added comments and finalize

@qnixsynapse qnixsynapse marked this pull request as draft January 30, 2025 16:12
@github-actions github-actions bot added ggml changes relating to the ggml tensor library for machine learning SYCL https://en.wikipedia.org/wiki/SYCL - GPU programming language labels Jan 30, 2025
@qnixsynapse qnixsynapse force-pushed the refactor_kernels branch 5 times, most recently from 8b6a0d2 to e32b19a Compare February 2, 2025 13:50
@qnixsynapse
Contributor Author

qnixsynapse commented Feb 3, 2025

Unfortunately, the glibc 2.41 update has broken the SYCL compiler. ... I will try to fix it in the meantime.
edit: reverted to glibc 2.40 in the meantime.

@qnixsynapse qnixsynapse marked this pull request as ready for review February 4, 2025 07:15
@qnixsynapse qnixsynapse marked this pull request as draft February 5, 2025 06:59
@qnixsynapse
Contributor Author

Looks like non-contiguous norm tests were added in #11659. Our current kernels do not support them, so I need to disable them in the backend_supports_op function.

@qnixsynapse qnixsynapse marked this pull request as ready for review February 5, 2025 08:05
@NeoZhangJianyu
Collaborator

@qnixsynapse
Thank you for such hard work!
It's hard work for the reviewers too. :)

I see some of the code changes just move code to the right place. I think those are safe and need no further testing.

But some changes could impact performance, such as adding try-catch blocks. Adding them to more sub-functions could impact performance - that's my guess.
Changes like that need careful review, but they are hidden among the large amount of moved code.

So much code change makes testing hard too.
I don't think the existing test cases can cover all the changed code.

I know you are an active developer,
but so much code change in one PR is not good for reviewers and maintainers.

  1. Moving code to the right place.
    So much code change makes troubleshooting more complex: when we diff the code across 2+ PRs, it's hard to spot the key code.
  2. Functional or performance changes.
    These are useful to the project and bring value to llama.cpp users. We need to review and test them carefully.
    But I can't find them quickly. :<

So, I suggest restructuring this PR, even though this PR is meant to refactor the SYCL backend:

  1. Keep the functional or performance changes as the highest priority.
  2. Refactor the code/files that are impacted by step 1.
  3. Don't refactor other files that are not impacted by step 1.
  4. If possible, use a small PR for each single functional or performance change.

There is still no real CI for the SYCL backend, so it's very hard to control quality.
It's better to keep changes to the SYCL backend minimal.

@qnixsynapse
Contributor Author

qnixsynapse commented Feb 5, 2025

@NeoZhangJianyu Thank you for your comments.

Most of the changes in this PR just move the kernels from ggml-sycl.cpp into their own files, like the CUDA backend does.

Try-catch exception handling is still done once, as before, and I did not notice any performance regression on my system.
Previously it was done in the wrapper function; with this change, it is done in the kernel function.

The rest of the changes are just the removal of intermediate type conversions and adding back the split buffer type checks.

Apart from this, the GGML_SYCL_DEBUG macro was fixed and added in more places so that non-tech-savvy people can help us debug more easily.

I know that there is no CI for SYCL, which is why I tested at every point of this change.

If this change is really too big, I will close it and submit smaller changes.

@Alcpz
Collaborator

Alcpz commented Feb 5, 2025

I agree with @NeoZhangJianyu here. @qnixsynapse, you've been putting a lot of effort into this PR and it contributes a lot of good things to the backend, but the scale of these changes makes it a bit harder for others to provide feedback.

Since you refactor functions into different files along with other changes, some smaller modifications that require verification or understanding could get lost in the PR. Breaking these down might really help everyone else to follow along.

@qnixsynapse
Contributor Author

@Alcpz Hmm... I've decided to close this PR and submit smaller changes.

@qnixsynapse qnixsynapse closed this Feb 5, 2025
@qnixsynapse qnixsynapse deleted the refactor_kernels branch February 5, 2025 12:28
@NeoZhangJianyu
Collaborator

@qnixsynapse
It's great!

@qnixsynapse
Contributor Author


Our backend is lagging behind others in OP support, so increased collaboration from the teams would be greatly appreciated. :)
