
ggml : unify rope norm/neox #7634

Merged: 9 commits merged into master from gg/rope-refactor on Jun 5, 2024
Conversation

@ggerganov (Owner) commented May 30, 2024

The RoPE modes NORM and NEOX compute the same rotation; they simply operate on different pairs of dimensions within each head:

# norm
(x[2*i + 0], x[2*i + 1])

# neox
(x[i], x[i + n_dims/2])
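For concreteness, here is a minimal NumPy sketch of the rotation under both pairing schemes (an illustrative reference, not the ggml implementation; the function name `rope_ref` is hypothetical):

```python
import numpy as np

def rope_ref(x, pos, mode="norm", base=10000.0):
    # Rotate element pairs of one head vector x by the angles
    # theta_i = pos * base^(-2i / n_dims); only the pairing differs.
    n_dims = x.shape[-1]
    out = x.astype(np.float64)
    for i in range(n_dims // 2):
        theta = pos * base ** (-2.0 * i / n_dims)
        c, s = np.cos(theta), np.sin(theta)
        if mode == "norm":
            i0, i1 = 2 * i, 2 * i + 1      # adjacent pairs
        else:  # "neox"
            i0, i1 = i, i + n_dims // 2    # split-half pairs
        x0, x1 = out[i0], out[i1]
        out[i0] = x0 * c - x1 * s
        out[i1] = x0 * s + x1 * c
    return out
```

Both modes apply the same n_dims/2 two-dimensional rotations; they differ only by a fixed permutation of the elements, which is what makes a single unified kernel possible.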

However, on master the two implementations have diverged for legacy reasons:

  • NORM does not support partial rotation, while NEOX does
  • the CPU NORM path uses cached rope values, while NEOX does not
  • NEOX supports frequency factors, while NORM does not
  • etc.

This PR unifies the implementation of the two modes to make future changes easier.

We also remove support for xPos RoPE (ggerganov/ggml#442) since it does not appear to be used.

I've also considered removing the GLM mode, but it seems to be used by ChatGLM (ggerganov/ggml#477)
@li-plus Could you confirm if GLM RoPE is still relevant today?

TODO

  • Remove xPos mode
  • Remove GLM mode and the n_ctx argument
  • NORM RoPE: support freq_factors
  • NORM RoPE: support the n_dims argument for partial rotation (see the sketch after this list)
  • CPU
  • Metal
  • CUDA
  • Vulkan (see 814d57d)
  • Kompute (see 13c6267)
  • SYCL (see 814d57d)
  • n_orig_ctx -> n_ctx_orig
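For context on the two NORM checklist items, here is a hedged sketch of the intended semantics, extending rope_ref above: partial rotation rotates only the first n_dims elements of the head and passes the rest through unchanged, while freq_factors rescale each pair's angle (the exact scaling convention below is an assumption):

```python
def rope_ref_ext(x, pos, n_dims, mode="norm", freq_factors=None, base=10000.0):
    # Hedged sketch: rotate only the first n_dims elements of the head
    # (partial rotation); elements past n_dims pass through unchanged.
    out = x.astype(np.float64)
    for i in range(n_dims // 2):
        theta = pos * base ** (-2.0 * i / n_dims)
        if freq_factors is not None:
            # assumed convention: divide the angle by the per-pair factor
            theta /= freq_factors[i]
        c, s = np.cos(theta), np.sin(theta)
        if mode == "norm":
            i0, i1 = 2 * i, 2 * i + 1
        else:  # "neox"
            i0, i1 = i, i + n_dims // 2
        x0, x1 = out[i0], out[i1]
        out[i0] = x0 * c - x1 * s
        out[i1] = x0 * s + x1 * c
    return out
```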

@github-actions bot added the ggml label (changes relating to the ggml tensor library for machine learning) May 30, 2024
@li-plus (Contributor) commented May 30, 2024

> Could you confirm if GLM RoPE is still relevant today?

No. ChatGLM now uses NEOX-style RoPE with explicitly specified position ids. The mode & 4 branch is no longer used and can be removed entirely. The n_ctx argument is also no longer needed.

github-actions bot commented May 30, 2024

📈 llama.cpp server for bench-server-baseline on Standard_NC4as_T4_v3 for phi-2-q4_0: 521 iterations 🚀

  • Concurrent users: 8, duration: 10m
  • HTTP request : avg=8953.42ms p(95)=22457.77ms fails=, finish reason: stop=470 truncated=51
  • Prompt processing (pp): avg=107.74tk/s p(95)=488.54tk/s
  • Token generation (tg): avg=33.71tk/s p(95)=46.88tk/s
  • ggml-org/models/phi-2/ggml-model-q4_0.gguf parallel=8 ctx-size=16384 ngl=33 batch-size=2048 ubatch-size=256 pp=1024 pp+tg=2048 branch=gg/rope-refactor commit=ddac1ef6813132eb9e817460ef389bf7fe3c12a3

[charts omitted: prompt_tokens_seconds, predicted_tokens_seconds, kv_cache_usage_ratio, requests_processing; each titled "llama.cpp bench-server-baseline on Standard_NC4as_T4_v3, duration=10m, 521 iterations"]

@ggerganov changed the title from "ggml : unify rope norm/neox (CPU)" to "ggml : unify rope norm/neox" May 30, 2024
@github-actions bot added the testing, Nvidia GPU, Vulkan, examples, and SYCL labels May 30, 2024
@ggerganov marked this pull request as ready for review May 30, 2024 11:30
@github-actions bot added the python and Kompute labels May 30, 2024
@mofosyne added the Review Complexity : High and refactoring labels May 30, 2024
@mofosyne requested review from slaren and xaedes May 30, 2024 12:21
Review threads on ggml-cuda/rope.cu (outdated) and ggml.c were resolved.
@ggerganov merged commit 2b33896 into master Jun 5, 2024
83 checks passed
@ggerganov deleted the gg/rope-refactor branch June 5, 2024 08:29
joeatodd added a commit that referenced this pull request Jun 13, 2024
As per: #7634

Signed-off-by: Joe Todd <joe.todd@codeplay.com>
@joeatodd mentioned this pull request Jun 13, 2024