Optimize RWKV6 Operator Naming and Implement Multi-core CPU/ SYCL Acceleration #10133

uniartisan · 2024-11-02T05:37:12Z

Overview

This update focuses on two major optimizations for RWKV6 operators:

- Standardize operator naming for better code readability
- Implement CPU multi-core parallel acceleration to improve inference performance
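
As a rough illustration of the multi-core item above, the sketch below shows one common way to divide independent work units (here, attention heads) among CPU threads so that each thread handles a contiguous, non-overlapping range. The names `head_range`, `split_heads`, and `scale_heads_worker` are hypothetical and are not taken from the PR; `ith` and `nth` stand for the thread index and the thread count. The per-head work is a placeholder, not the WKV6 computation itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type: half-open range [start, end) of head indices. */
typedef struct {
    int start;
    int end;
} head_range;

/* Assign thread `ith` of `nth` a contiguous slice of `n_heads` heads.
 * Integer division keeps the slices disjoint and covers every head
 * exactly once; a slice may be empty when nth > n_heads. */
static head_range split_heads(int n_heads, int ith, int nth) {
    assert(nth > 0 && ith >= 0 && ith < nth);
    head_range r;
    r.start = (n_heads * ith) / nth;
    r.end   = (n_heads * (ith + 1)) / nth;
    return r;
}

/* Placeholder per-thread worker (an assumption for illustration):
 * scales the elements belonging to this thread's heads. Each thread
 * writes only its own slice of `dst`, so no locking is needed. */
static void scale_heads_worker(float *dst, const float *src,
                               int n_heads, int head_size,
                               float scale, int ith, int nth) {
    const head_range r = split_heads(n_heads, ith, nth);
    for (int h = r.start; h < r.end; ++h) {
        for (int i = 0; i < head_size; ++i) {
            const size_t idx = (size_t) h * head_size + i;
            dst[idx] = src[idx] * scale;
        }
    }
}
```

With 10 heads and 4 threads, this split yields the ranges [0, 2), [2, 5), [5, 7), and [7, 10). Because the ranges are disjoint, every thread can run the same function on shared buffers and only the final results need to be synchronized.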

I have read the contributing guidelines
Self-reported review complexity:
- Low
- Medium
- High

uniartisan · 2024-11-02T13:02:07Z

The SYCL backend for WKV6 is still being tested and may be pushed in the near future.


ggerganov

@airMeng Can someone on your team review the SYCL changes?


airMeng

Overall LGTM, except for some minor comments.


NeoZhangJianyu

@uniartisan
Great work, including the refactoring of the SYCL backend.
I tested the code with the basic cases, and they passed.

Thank you!

* Fixes broken build for the SYCL CUDA backend caused by non-explicit gemm call in outprod (merged in with RWKV6 in Optimize RWKV6 Operator Naming and Implement Multi-core CPU/ SYCL Acceleration #10133)
* Marks permuted MUL_MAT as unsupported to be able to run test-backend-ops
* Fixes asserts in norm to fix debug builds.

…eleration (ggerganov#10133)

* rwkv6: rename to wkv6
* rwkv6: support avx2 avx512 armv8 armv9
* rwkv6: update cuda file name
* rwkv6: rename params
* wkv on sycl
* sycl: add some ops
* sycl: Enhance OP support judgment
* wkv6: drop armv9 and tranfer to GGML style ggml-ci
* sync : ggml
* update the function to use appropriate types
* fix define error
* Update ggml/src/ggml-cpu.c
* add appropriate asserts
* move element-wise functions outside
* put the declaration outside the loop
* rewrite to be more inline with the common pattern for distributing threads
* use recommended way GGML_TENSOR_LOCALS

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: Diego Devesa <slarengh@gmail.com>
Co-authored-by: Plamen Minev <pacominev@gmail.com>
Co-authored-by: Yuri Khrustalev <ykhrustalev@users.noreply.github.com>
Co-authored-by: Meng, Hengyu <airdldl@163.com>


github-actions bot added testing Everything test related Nvidia GPU Issues specific to Nvidia GPUs ggml changes relating to the ggml tensor library for machine learning labels Nov 2, 2024

uniartisan force-pushed the master branch from c6f4aef to 7febf6e Compare November 2, 2024 12:47

github-actions bot added the SYCL https://en.wikipedia.org/wiki/SYCL - GPU programming language label Nov 2, 2024

uniartisan force-pushed the master branch from f0158fa to 7febf6e Compare November 2, 2024 13:01

github-actions bot added the documentation Improvements or additions to documentation label Nov 2, 2024

uniartisan mentioned this pull request Nov 2, 2024

ggml : add GPU support for Mamba models #6758

Open

uniartisan changed the title ~~Optimize RWKV6 Operator Naming and Implement Multi-core CPU Acceleration~~ Optimize RWKV6 Operator Naming and Implement Multi-core CPU/ SYCL Acceleration Nov 2, 2024

uniartisan force-pushed the master branch from 1643e28 to 2011dda Compare November 2, 2024 17:55

uniartisan added 7 commits November 3, 2024 16:43

rwkv6: rename to wkv6

f66c75a

rwkv6: support avx2 avx512 armv8 armv9

b4254c5

rwkv6: update cuda file name

e198f7b

rwkv6: rename params

3f75f12

wkv on sycl

2fc42b6

sycl: add some ops

bee1cec

sycl: Enhance OP support judgment

1c58096

uniartisan force-pushed the master branch from 563153a to 1c58096 Compare November 3, 2024 05:43

uniartisan and others added 2 commits November 3, 2024 17:30

Merge branch 'ggerganov:master' into master

042c3e0

wkv6: drop armv9 and tranfer to GGML style

811aa87

uniartisan force-pushed the master branch from e16e2f3 to 811aa87 Compare November 3, 2024 12:55

ggerganov and others added 8 commits November 4, 2024 22:09

flake.lock: Update (ggerganov#10146)

4d26631

metal : minor fixup in FA kernel (ggerganov#10143)

b189630

* metal : minor fixup in FA kernel ggml-ci
* metal : use the unrolled loop variable
* metal : remove unused var

ggml : move CPU backend to a separate file (ggerganov#10144)

89812b1

metal : fix minor string leaks (ggml/1004)

8050d02

cmake : make it possible linking ggml as external lib (ggml/1003)

eb5711c

sync : ggml

153251f

Merge branch 'ggerganov:master' into master

5f79214

fix: update changes to upstream

61c665b

fix: add defualt

9ea34a7

ggerganov reviewed Nov 4, 2024

ggml/src/ggml-cpu.c (outdated, resolved)

ggml/src/ggml-sycl/outprod.cpp (outdated, resolved)

ggml/src/ggml-sycl/wkv6.cpp (outdated, resolved)

uniartisan and others added 3 commits November 5, 2024 00:42

Update ggml/src/ggml-sycl/outprod.cpp

8c7b4ec

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

Update ggml/src/ggml-sycl/wkv6.cpp

bb0685f

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

update the function to use appropriate types

81cb301

uniartisan requested a review from ggerganov November 4, 2024 13:58

fix define error

a878502

ggerganov reviewed Nov 4, 2024

ggml/src/ggml-cpu.c (outdated, resolved)

ggml/src/ggml-cpu.c (resolved)

ggml/src/ggml-cpu.c (outdated, resolved)

Update ggml/src/ggml-cpu.c

b816024

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

airMeng approved these changes Nov 4, 2024

ggml/src/ggml-sycl/concat.cpp (outdated, resolved)

ggml/src/ggml-sycl.cpp (outdated, resolved)

ggml/src/ggml-sycl/outprod.cpp (outdated, resolved)

ggml/src/ggml-sycl/wkv6.cpp (outdated, resolved)

uniartisan and others added 6 commits November 5, 2024 01:20

add appropriate asserts

72e4432

move element-wise functions outside

35a1a2d

Update ggml/src/ggml-sycl/concat.cpp

6a1e977

Co-authored-by: Meng, Hengyu <airdldl@163.com>

put the declaration outside the loop

a749ba7

rewrite to be more inline with the common pattern for distributing threads

4693b46

use recommended way GGML_TENSOR_LOCALS

4574795

uniartisan requested a review from ggerganov November 4, 2024 15:57

uniartisan and others added 2 commits November 5, 2024 02:58

Merge branch 'ggerganov:master' into master

acb1b9d

remove some codes

e264c35

NeoZhangJianyu approved these changes Nov 5, 2024

uniartisan and others added 2 commits November 5, 2024 13:31

update lint

623db3b

Merge branch 'ggerganov:master' into master

98e070c

ggerganov approved these changes Nov 5, 2024

airMeng merged commit 3bcd40b into ggerganov:master Nov 7, 2024
53 checks passed

Alcpz mentioned this pull request Nov 11, 2024

sycl : Fixes broken build and test-backend-ops #10257

Merged

4 tasks

Alcpz mentioned this pull request Nov 20, 2024

sycl : offload of get_rows set to false #10432

Merged

4 tasks
