
llama_supports_rpc() function #7647

Closed

Conversation

martindevans
Contributor

Added a llama_supports_rpc() function to test for RPC support at runtime. This is useful for libraries such as LLamaSharp, which need to check what the binaries were compiled with before trying to use certain features.
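For context, a minimal caller-side sketch of how a wrapper might use such a probe, assuming the function follows the existing llama_supports_* helpers in llama.h (no arguments, bool return; the exact signature is defined by the PR diff):

```c
#include <stdio.h>
#include "llama.h"

int main(void) {
    // Probe the loaded binary for RPC support before exposing
    // RPC-dependent features to users of a wrapper library.
    if (llama_supports_rpc()) {
        printf("RPC backend compiled in\n");
    } else {
        printf("RPC backend not available in this build\n");
    }
    return 0;
}
```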

@mofosyne added the Review Complexity : Low label May 30, 2024
@mofosyne added the merge ready label May 30, 2024
@slaren
Collaborator

slaren commented May 30, 2024

After #7640 we should consider including the RPC backend in all the llama.cpp builds by default. Then this function wouldn't be necessary.

llama.h (review comment: outdated, resolved)
@rgerganov
Collaborator

We already have ggml_cpu_has_rpc() in ggml.h; does that work for you?
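A usage sketch of that existing helper, assuming it follows the ggml_cpu_has_* convention of taking no arguments and returning an int flag (0 or 1):

```c
#include <stdio.h>
#include "ggml.h"

int main(void) {
    // ggml_cpu_has_* helpers report compile-time feature flags.
    printf("RPC: %d\n", ggml_cpu_has_rpc());
    return 0;
}
```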

@martindevans
Contributor Author

Ah, I hadn't noticed that GGML function. If it's equivalent to what I've added here, that should be fine. LLamaSharp (and, I'd guess, other wrappers) doesn't usually expose GGML functions, but if this is only temporary anyway then it's fine.

@mofosyne
Collaborator

mofosyne commented Jun 9, 2024

@martindevans just following up on this PR, as it was marked merge ready. Is the CI failure due to your code change? If not, resync against the last master commit with a working CI.

github-actions bot added the build, script, testing, Nvidia GPU, nix, Vulkan, examples, python, devops, server, ggml, SYCL, Apple Metal, and Kompute labels Jun 9, 2024
@martindevans
Contributor Author

martindevans commented Jun 9, 2024

Well I managed to make a mess of that merge (fixed now I think)! 😨

Is there still interest in merging this? The majority of the feedback seemed to be that this was going to be temporary, and wasn't even really needed since there's already an equivalent.

@martindevans force-pushed the feature/llama_supports_rpc branch from f87e6ac to 9b15621 on June 9, 2024 15:21
martindevans and others added 2 commits June 9, 2024 16:22
@martindevans force-pushed the feature/llama_supports_rpc branch from 9b15621 to a79da45 on June 9, 2024 15:22
Contributor

📈 llama.cpp server for bench-server-baseline on Standard_NC4as_T4_v3 for phi-2-q4_0: 542 iterations 🚀

Details (performance-related PRs only):
  • Concurrent users: 8, duration: 10m
  • HTTP request: avg=8645.62ms p(95)=21623.48ms fails=, finish reason: stop=489 truncated=53
  • Prompt processing (pp): avg=97.76tk/s p(95)=387.62tk/s
  • Token generation (tg): avg=36.07tk/s p(95)=48.38tk/s
  • ggml-org/models/phi-2/ggml-model-q4_0.gguf parallel=8 ctx-size=16384 ngl=33 batch-size=2048 ubatch-size=256 pp=1024 pp+tg=2048 branch=feature/llama_supports_rpc commit=a79da45dca532bb7c539e5f147302d075c9b106f

[Benchmark charts omitted: llamacpp:prompt_tokens_seconds, llamacpp:predicted_tokens_seconds, llamacpp:kv_cache_usage_ratio, and llamacpp:requests_processing, each plotted over the 10-minute run of 542 iterations.]

@rgerganov
Collaborator

The majority of the feedback seemed to be that this was going to be temporary, and wasn't even really needed since there's already an equivalent.

I don't think we need this function unless you have some good reasons for not using the existing ggml_cpu_has_rpc().

@mofosyne removed the build label Jun 10, 2024
@mofosyne added the server label and removed the script, testing, Nvidia GPU, nix, Vulkan, examples, Review Complexity : Low, python, devops, server, ggml, SYCL, Apple Metal, Kompute, and merge ready labels Jun 10, 2024
@martindevans
Contributor Author

unless you have some good reasons for not using the existing ggml_cpu_has_rpc().

Just checking - is it exported in the C API like all of the llama methods? If so, I think it'll work just fine for LLamaSharp to use that, and I'll close this PR :)

@rgerganov
Collaborator

Yes, it is exported:

$ nm -D libllama.so | grep has_rpc
000000000018c130 T ggml_cpu_has_rpc
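For a wrapper that loads prebuilt binaries, the same check can be done without headers by resolving the exported symbol at runtime. A minimal POSIX sketch (assumptions: libllama.so is on the loader path and the symbol returns an int flag; link with -ldl):

```c
#include <dlfcn.h>
#include <stdio.h>

int main(void) {
    void *lib = dlopen("libllama.so", RTLD_NOW);
    if (!lib) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return 1;
    }
    // Resolve the exported feature-detection symbol by name.
    int (*has_rpc)(void) = (int (*)(void)) dlsym(lib, "ggml_cpu_has_rpc");
    if (has_rpc) {
        printf("RPC compiled in: %s\n", has_rpc() ? "yes" : "no");
    } else {
        printf("symbol not found (build may predate RPC support)\n");
    }
    dlclose(lib);
    return 0;
}
```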

@martindevans
Contributor Author

Thanks for confirming, in that case I'll close this PR 👍
