llama: Add support for RWKV v7 architecture #11452

Open
wants to merge 19 commits into master
Conversation

@MollySophia (Collaborator) commented Jan 27, 2025

@BlinkDL's explanation of RWKV v7:
RWKV-7 as a meta-in-context learner
There are also plenty of test results for trained models (currently 0.1B and 0.4B) posted on his X account. Larger models are coming in the next several days.

Currently available RWKV v7 model repos in HF format:
https://huggingface.co/SmerkyG/RWKV7-Goose-0.1B-World2.8-HF (not an officially published one; tensor names are expected to change in the future)
https://huggingface.co/mollysama/rwkv-7-world-0b4-hf
https://huggingface.co/mollysama/rwkv-7-world-1b5-hf
https://huggingface.co/RWKV-Red-Team/ARWKV-7B-Preview-0.1 (hybrid model with RWKV v7 "attn" and Qwen2.5 7B's MLP, distilled from Qwen2.5)

This PR contains:

  • GGML_OP_L2_NORM, which applies PyTorch-style L2 normalization along the rows. Tested with the CPU, CUDA, SYCL, Vulkan, and Metal backends (a reference sketch follows this list).
  • GGML_OP_RWKV_WKV7, the core of the RWKV v7 architecture. The naive recurrent wkv7 kernel is implemented for CPU, CUDA, SYCL, Vulkan, and Metal (a recurrence sketch also follows below).
  • Support for inference of RWKV7 and ARWKV7 models.
  • A simple Metal kernel for the old WKV6.
  • Skipping unused tokens in the last layer's FFN computation for RWKV models (8000 t/s -> 8100 t/s prefill for the 7B v7 model; see the sketch after this list).
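For illustration, a minimal sketch of what PyTorch-style L2 normalization along the rows computes (mirroring torch.nn.functional.normalize with p=2 over the last dimension). The function name and the eps handling are assumptions for the sketch, not the actual ggml kernel:

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>

// Normalize each row of an [n_rows x n_cols] fp32 matrix to unit L2 norm,
// PyTorch-style: x / max(||x||_2, eps). Hypothetical reference loop.
void l2_norm_rows(float * data, std::size_t n_rows, std::size_t n_cols,
                  float eps = 1e-12f) {
    for (std::size_t r = 0; r < n_rows; ++r) {
        float * row = data + r * n_cols;
        float sumsq = 0.0f;
        for (std::size_t c = 0; c < n_cols; ++c) {
            sumsq += row[c] * row[c];
        }
        const float scale = 1.0f / std::max(std::sqrt(sumsq), eps);
        for (std::size_t c = 0; c < n_cols; ++c) {
            row[c] *= scale;
        }
    }
}
```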
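Likewise, a sketch of one token step of the naive wkv7 recurrence for a single head of size N, with r, w, k, v, a, b named after the op's inputs. This illustrates the recurrence under those assumptions; it is not the per-backend kernel code:

```cpp
#include <vector>

// One token of the naive wkv7 recurrence, single head of size N.
// state is an N x N matrix (value dim i, key dim j), updated in place.
void wkv7_step(std::vector<float> & state,        // N*N floats, row-major
               const float * r, const float * w,  // each of length N
               const float * k, const float * v,
               const float * a, const float * b,
               float * y, int N) {                // y: output, length N
    for (int i = 0; i < N; ++i) {
        float * s = state.data() + i * N;

        // project the current state row onto a (the in-context "removal" term)
        float sa = 0.0f;
        for (int j = 0; j < N; ++j) {
            sa += s[j] * a[j];
        }

        // per-channel decay w, rank-1 state correction via b,
        // and the new (k, v) outer-product update
        float yi = 0.0f;
        for (int j = 0; j < N; ++j) {
            s[j] = s[j] * w[j] + sa * b[j] + v[i] * k[j];
            yi  += s[j] * r[j];  // read out with the receptance r
        }
        y[i] = yi;
    }
}
```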
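And the last bullet's optimization is the usual trick of running the final layer's FFN only on rows whose outputs are consumed (during prefill, typically just the last token). A hypothetical, self-contained sketch of the idea; the names are illustrative, not the PR's actual graph code:

```cpp
#include <cstdio>
#include <vector>

// Stand-in for the real FFN; its cost scales with n_rows.
static void ffn_forward(const float * x, int n_rows, int n_embd) {
    (void) x;
    std::printf("FFN over %d rows of width %d\n", n_rows, n_embd);
}

int main() {
    const int n_tokens = 512, n_embd = 4096;
    std::vector<float> hidden((std::size_t) n_tokens * n_embd, 0.0f);

    // naive: FFN over every prompt token in the last layer
    // ffn_forward(hidden.data(), n_tokens, n_embd);

    // optimized: only the last token's logits are needed during prefill
    const int n_needed = 1;
    ffn_forward(hidden.data() + (std::size_t)(n_tokens - n_needed) * n_embd,
                n_needed, n_embd);
}
```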

TODO:

  • (within this PR or in the future) Implement chunkwise wkv7 (and possibly wkv6 as well), following flash-linear-attention's implementation.

Note: Current benchmark of ARWKV7-7B f16

# molly @ molly-workstation in ~/llama.cpp on git:rwkv-v7 x [9:49:42] 
$ ./build-test/bin/llama-bench -m ../ARWKV-7B-Preview-0_1-NoG/ARWKV-7B-Preview-0_1-NoG-F16.gguf -ngl 99
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 4090, compute capability 8.9, VMM: yes
| model                          |       size |     params | backend    | ngl |          test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------------: | -------------------: |
| arwkv7 7B F16                  |  15.42 GiB |     8.27 B | CUDA       |  99 |         pp512 |      8105.20 ± 15.34 |
| arwkv7 7B F16                  |  15.42 GiB |     8.27 B | CUDA       |  99 |         tg128 |         50.62 ± 0.01 |

build: 76219859 (4579)

This is much faster than RWKV v6 7B at prefill (though still a bit slower than Qwen2.5 7B).

MollySophia and others added 11 commits January 27, 2025 12:22
There isn't much performance gain, though. Just for more op coverage.

@MollySophia MollySophia marked this pull request as ready for review January 27, 2025 13:33
@MollySophia MollySophia marked this pull request as draft January 27, 2025 14:09
@MollySophia MollySophia marked this pull request as ready for review January 28, 2025 09:10
@MollySophia (Collaborator, Author)

Update: added support for fla-hub's rwkv7 HF model format (https://huggingface.co/fla-hub/rwkv7-1.5B-world).

@ggerganov (Owner)

Just a heads up, this will likely take some time to merge - I want to finish #11213 first and then figure out how to fit RWKV in the new code, likely with its own implementation of llama_context.

@MollySophia (Collaborator, Author)

> Just a heads up, this will likely take some time to merge - I want to finish #11213 first and then figure out how to fit RWKV in the new code, likely with its own implementation of llama_context.

That’s great! I can help with that too

@ggerganov (Owner)

Great, keep an eye on the #11213 PR. It's still very messy, but I hope it will soon start to make sense.

They pass on my M2 and M4 devices :|
