sycl: Use syclcompat::dp4a #10267

Rbiessy · 2024-11-12T14:41:35Z

Using the syclcompat version allows the compiler to optimize the operation with native functions. This was tested with #10266.
Setting sm_80 with this change improves TG throughput (t/s) by 28% with the 70B model. I have not measured any performance difference on PVC.
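
For readers unfamiliar with the operation, here is a minimal plain C++ sketch of what a signed dp4a computes: a dot product of four packed 8-bit lanes added to a 32-bit accumulator. The names `dp4a_reference` and `pack4` are hypothetical and used only for illustration. This is not the code from `helper.hpp` or from syclcompat, and it covers only the signed case.

```cpp
#include <cstdint>

// Reference (software) semantics of a signed dp4a, for illustration only:
// treat each 32-bit operand as four packed signed 8-bit lanes, multiply
// lane-wise, and add the four products to the accumulator c.
inline std::int32_t dp4a_reference(std::int32_t a, std::int32_t b, std::int32_t c) {
    const auto ua = static_cast<std::uint32_t>(a);
    const auto ub = static_cast<std::uint32_t>(b);
    for (int lane = 0; lane < 4; ++lane) {
        // Extract one byte and reinterpret it as a signed 8-bit value.
        const auto x = static_cast<std::int8_t>((ua >> (8 * lane)) & 0xFF);
        const auto y = static_cast<std::int8_t>((ub >> (8 * lane)) & 0xFF);
        c += static_cast<std::int32_t>(x) * static_cast<std::int32_t>(y);
    }
    return c;
}

// Hypothetical helper: pack four signed bytes into one 32-bit word,
// with b0 in the lowest byte.
inline std::int32_t pack4(std::int8_t b0, std::int8_t b1, std::int8_t b2, std::int8_t b3) {
    const std::uint32_t packed =
        static_cast<std::uint32_t>(static_cast<std::uint8_t>(b0)) |
        (static_cast<std::uint32_t>(static_cast<std::uint8_t>(b1)) << 8) |
        (static_cast<std::uint32_t>(static_cast<std::uint8_t>(b2)) << 16) |
        (static_cast<std::uint32_t>(static_cast<std::uint8_t>(b3)) << 24);
    return static_cast<std::int32_t>(packed);
}
```

A loop like this costs several shifts, masks and multiplies per call, which is why a single native instruction, where the hardware has one, can make a measurable difference.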

I have read the contributing guidelines
Self-reported review complexity:
- Low
- Medium
- High

* Using the syclcompat version allow the compiler to optimize the operation with native function

Alcpz · 2024-11-12T16:00:27Z

I think it could be useful to add a note in the SYCL readme pointing out that syclcompat is only supported from the compiler 2025.0 release. Maybe an entry in the news or in the required release section.
The changes LGTM wrt the Nvidia backend.

@airMeng / @NeoZhangJianyu, Are we maintaining support for earlier oneAPI versions? I see that the CI is configured for 2024.1, so I wonder if we have to provide support for earlier versions at least for a bit or if we have to update the compiler version of the runner.
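
One possible way to make the compiler requirement explicit in code is a compile-time version check. The sketch below assumes that the Intel compiler exposes its version through the `__INTEL_LLVM_COMPILER` macro in VVVVMMUU form (for example 20250000 for 2025.0.0); that encoding is an assumption here and should be checked against the compiler documentation. The names `version_at_least`, `kCompilerVersion` and `kCompilerIsNewEnough` are hypothetical and are not part of this PR.

```cpp
#include <cstdint>

// Decode a version number assumed to be in VVVVMMUU form
// (e.g. 20250000 for 2025.0.0) and compare it with a required major.minor.
constexpr bool version_at_least(std::uint32_t encoded, std::uint32_t major, std::uint32_t minor) {
    const std::uint32_t v_major = encoded / 10000;
    const std::uint32_t v_minor = (encoded / 100) % 100;
    return v_major > major || (v_major == major && v_minor >= minor);
}

// Fall back to 0 when the macro is not defined (i.e. another compiler).
#if defined(__INTEL_LLVM_COMPILER)
constexpr std::uint32_t kCompilerVersion = __INTEL_LLVM_COMPILER;
#else
constexpr std::uint32_t kCompilerVersion = 0;
#endif

// True only when building with a compiler at or above the 2025.0 release,
// which is the release the comment above names as the first to support syclcompat.
constexpr bool kCompilerIsNewEnough = version_at_least(kCompilerVersion, 2025, 0);
```

A `static_assert(kCompilerIsNewEnough, "...")` in the SYCL-only sources would then fail early with a readable message instead of a missing-header error.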

NeoZhangJianyu · 2024-11-13T07:49:04Z

oneAPI has been updated to 2025.0 in the official release.
I don't think we support or maintain older oneAPI versions.

Could you update the SYCL CI yaml file to use oneAPI 2025.0,
so that the SYCL CI passes?

Thank you!

NeoZhangJianyu · 2024-11-13T09:01:07Z

ggml/src/ggml-sycl/dpct/helper.hpp

-                           uint32_t, int32_t>;
-
-    template <typename T1, typename T2, typename T3>
-    inline auto dp4a(T1 a, T2 b, T3 c)


I suggest replacing the dp4a() implementation with syclcompat::dp4a().

That way no code changes are needed in other modules.

It would also be easy to optimize for different cases in the future if needed.

We tried this approach some time ago in a different PR, but it was closed because faster implementations require asm and intrinsics for every backend, and we agreed to limit ourselves to pure SYCL code. Right now, there is no way to get visibility of int intrinsics (dp4a equivalents), and the syclcompat layer shipped as part of oneAPI is trying to bridge that (and other gaps) until they are made available through SYCL or an extension. With this approach, backend-specific improvements are removed from the app itself.

Do you think we could use this PR to agree on what to do with regard to syclcompat? The main problem is that dp4a is a major performance gap with other backends due to the software implementation.

I think I didn't explain my idea clearly.
I mean that dpct::dp4a() should call syclcompat::dp4a() directly.
Other modules would still call dpct::dp4a(), but the call would be forwarded to syclcompat::dp4a().

There is no test data for Intel GPUs yet. If the performance turns out to be bad, we can add a code branch in dpct::dp4a() that uses the old code for Intel GPUs.

If all modules call syclcompat::dp4a() directly, as in this PR, adding such branches later becomes more complex.

We have to be careful of branching inside dp4a though, as we would introduce branching inside the kernels. Thanks for the clarification!

As long as we don't add any branching I'm fine with wrapping syclcompat::dp4a inside dpct::dp4a. This is done in 3eff3c3. I hope this is what you meant.
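
The forwarding idea discussed in this thread can be sketched as follows. To keep the example compilable without oneAPI, the namespace `compat_stub` stands in for syclcompat and `dpct_sketch` stands in for dpct; both names, and the stub body, are hypothetical. This is an illustration of the pattern under those assumptions, not the code from commit 3eff3c3.

```cpp
#include <cstdint>

// Stand-in for the vendor-provided layer. In the PR this role is played by
// syclcompat::dp4a; the stub exists only so the sketch builds without oneAPI.
namespace compat_stub {
inline std::int32_t dp4a(std::int32_t a, std::int32_t b, std::int32_t c) {
    const auto ua = static_cast<std::uint32_t>(a);
    const auto ub = static_cast<std::uint32_t>(b);
    for (int lane = 0; lane < 4; ++lane) {
        // Multiply matching signed 8-bit lanes and accumulate into c.
        const auto x = static_cast<std::int8_t>((ua >> (8 * lane)) & 0xFF);
        const auto y = static_cast<std::int8_t>((ub >> (8 * lane)) & 0xFF);
        c += static_cast<std::int32_t>(x) * static_cast<std::int32_t>(y);
    }
    return c;
}
}  // namespace compat_stub

// Stand-in for the project's helper namespace. Callers keep using this entry
// point; the body forwards unconditionally, so no runtime branch is
// introduced on the kernel path.
namespace dpct_sketch {
template <typename T1, typename T2, typename T3>
inline auto dp4a(T1 a, T2 b, T3 c) {
    return compat_stub::dp4a(a, b, c);
}
}  // namespace dpct_sketch
```

Because every call site goes through the single wrapper, a later change of implementation would touch only this one function. Any device-specific variant would ideally be selected at compile time, to respect the concern about branching inside kernels.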

Rbiessy · 2024-11-13T17:35:51Z

> I think it could be useful to add a note in the SYCL readme pointing out that syclcompat is only supported from the compiler 2025.0 release. Maybe an entry in the news or in the required release section. The changes LGTM wrt the Nvidia backend.

Thanks, I have updated the news section in b665ffd. The "recommended release" section above looks a bit outdated now, but I'm not confident enough to update it as part of this PR.

> Could you update the SYCL CI yaml file to use oneAPI 2025.0?

Done in ee76375, hopefully this is the right link.

NeoZhangJianyu · 2024-11-14T02:43:29Z

docs/backend/SYCL.md

@@ -41,6 +41,8 @@ The following release is verified with good quality:

 ## News

+- 2024.11
+  - Use syclcompat to improve the performance on some backends. This requires to use oneAPI 2025.0 or more recent.


"some backends" is strange to SYCL backend.
Maybe "some platforms" or "some GPUs".

"more recent" -> "newer"

Reworded this in 1c16516

This reverts commit 90cb61d.

NeoZhangJianyu

I tested it on Intel Arc 770.
Performance is not impacted.

I think this PR is a good first step towards using the new "syclcompat" library.
We hope all platforms will benefit from it in the future.

oneAPI 2025.0 is now also used in the CI.
The SYCL backend now targets 2025.0; older oneAPI versions are no longer recommended.

Thank you!

ggerganov · 2024-11-15T09:13:47Z

Should fix the CI: https://github.com/ggerganov/llama.cpp/actions/runs/11851871832/job/33029080834#step:8:352

Rbiessy · 2024-11-15T09:41:55Z

> Should fix the CI: https://github.com/ggerganov/llama.cpp/actions/runs/11851871832/job/33029080834#step:8:352

Sorry about that, it seems the docker images also need to be updated to use DPC++ 2025.0. I will look into that.

* sycl: Use syclcompat::dp4a
* Using the syclcompat version allow the compiler to optimize the operation with native function
* Update news section
* Update CI Windows oneAPI version to 2025.0
* Reword doc
* Call syclcompat::dp4a inside dpct::dp4a

This reverts commit 90cb61d.

sycl: Use syclcompat::dp4a

90cb61d

* Using the syclcompat version allow the compiler to optimize the operation with native function

github-actions bot added the SYCL https://en.wikipedia.org/wiki/SYCL - GPU programming language label Nov 12, 2024

Rbiessy mentioned this pull request Nov 12, 2024

sycl: Add option to set the SYCL architecture for all targets #10266

Merged

4 tasks

NeoZhangJianyu reviewed Nov 13, 2024

Rbiessy added 2 commits November 13, 2024 17:28

Update news section

b665ffd

Update CI Windows oneAPI version to 2025.0

ee76375

github-actions bot added documentation Improvements or additions to documentation devops improvements to build systems and github actions labels Nov 13, 2024

NeoZhangJianyu reviewed Nov 14, 2024

Rbiessy added 2 commits November 14, 2024 12:23

Reword doc

1c16516

Call syclcompat::dp4a inside dpct::dp4a

3eff3c3

This reverts commit 90cb61d.

NeoZhangJianyu approved these changes Nov 15, 2024

airMeng merged commit 5a54af4 into ggerganov:master Nov 15, 2024
53 checks passed

Rbiessy deleted the romain/use_syclcompat_dp4a branch November 15, 2024 09:41

Rbiessy mentioned this pull request Nov 15, 2024

sycl: Update Intel docker images to use DPC++ 2025.0 #10305

Merged

4 tasks

slaren mentioned this pull request Nov 19, 2024

sync : llama.cpp ggerganov/ggml#1016

Merged
