
llama_supports_rpc() function #7647

Closed

Conversation

martindevans
Contributor

Added a llama_supports_rpc() function to test for RPC support at runtime. This is useful for libraries such as LLamaSharp, which need to check what the binaries were compiled with before trying to use certain features.
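For context, a minimal caller-side sketch of how a wrapper might use such a probe, assuming the function follows the existing llama_supports_* helpers in llama.h (no arguments, bool return; the exact signature is defined by the PR diff):

```c
#include <stdio.h>
#include "llama.h"

int main(void) {
    // Probe the loaded binary for RPC support before exposing
    // RPC-dependent features to users of a wrapper library.
    if (llama_supports_rpc()) {
        printf("RPC backend compiled in\n");
    } else {
        printf("RPC backend not available in this build\n");
    }
    return 0;
}
```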

@mofosyne added the Review Complexity : Low label May 30, 2024
@mofosyne added the merge ready label May 30, 2024
@slaren
Collaborator

slaren commented May 30, 2024

After #7640 we should consider including the RPC backend in all the llama.cpp builds by default. Then this function wouldn't be necessary.

llama.h (review comment: outdated, resolved)
@rgerganov
Collaborator

We already have ggml_cpu_has_rpc() in ggml.h; does that work for you?
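A usage sketch of that existing helper, assuming it follows the ggml_cpu_has_* convention of taking no arguments and returning an int flag (0 or 1):

```c
#include <stdio.h>
#include "ggml.h"

int main(void) {
    // ggml_cpu_has_* helpers report compile-time feature flags.
    printf("RPC: %d\n", ggml_cpu_has_rpc());
    return 0;
}
```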

@martindevans
Contributor Author

Ah, I hadn't noticed that GGML function. If it's equivalent to what I've added here, that should be fine. LLamaSharp (and, I'd guess, other wrappers) doesn't usually expose GGML functions, but if this is only temporary anyway then it's fine.

@mofosyne
Collaborator

mofosyne commented Jun 9, 2024

@martindevans just following up on this PR, as it was marked merge ready. Is the CI failure due to your code change? If not, resync against the last master commit with a working CI.

github-actions bot added the build, script, testing, Nvidia GPU, nix, Vulkan, examples, python, devops, server, ggml, SYCL, Apple Metal, and Kompute labels Jun 9, 2024
@martindevans
Contributor Author

martindevans commented Jun 9, 2024

Well I managed to make a mess of that merge (fixed now I think)! 😨

Is there still interest in merging this? The majority of the feedback seemed to be that this was going to be temporary, and wasn't even really needed since there's already an equivalent.

@martindevans force-pushed the feature/llama_supports_rpc branch from f87e6ac to 9b15621 on June 9, 2024 15:21
martindevans and others added 2 commits June 9, 2024 16:22
@martindevans force-pushed the feature/llama_supports_rpc branch from 9b15621 to a79da45 on June 9, 2024 15:22
Contributor

📈 llama.cpp server for bench-server-baseline on Standard_NC4as_T4_v3 for phi-2-q4_0: 542 iterations 🚀

Details (performance-related PRs only):
  • Concurrent users: 8, duration: 10m
  • HTTP request: avg=8645.62ms p(95)=21623.48ms fails=, finish reason: stop=489 truncated=53
  • Prompt processing (pp): avg=97.76tk/s p(95)=387.62tk/s
  • Token generation (tg): avg=36.07tk/s p(95)=48.38tk/s
  • ggml-org/models/phi-2/ggml-model-q4_0.gguf parallel=8 ctx-size=16384 ngl=33 batch-size=2048 ubatch-size=256 pp=1024 pp+tg=2048 branch=feature/llama_supports_rpc commit=a79da45dca532bb7c539e5f147302d075c9b106f

[Benchmark charts omitted: llamacpp:prompt_tokens_seconds, llamacpp:predicted_tokens_seconds, llamacpp:kv_cache_usage_ratio, and llamacpp:requests_processing, each plotted over the 10-minute run of 542 iterations.]

@rgerganov
Collaborator

The majority of the feedback seemed to be that this was going to be temporary, and wasn't even really needed since there's already an equivalent.

I don't think we need this function unless you have some good reasons for not using the existing ggml_cpu_has_rpc().

@mofosyne removed the build label Jun 10, 2024
@mofosyne added the server label and removed the script, testing, Nvidia GPU, nix, Vulkan, examples, Review Complexity : Low, python, devops, server, ggml, SYCL, Apple Metal, Kompute, and merge ready labels Jun 10, 2024
@martindevans
Contributor Author

unless you have some good reasons for not using the existing ggml_cpu_has_rpc().

Just checking - is it exported in the C API like all of the llama methods? If so, I think it'll work just fine for LLamaSharp to use that, and I'll close this PR :)

@rgerganov
Collaborator

Yes, it is exported:

$ nm -D libllama.so | grep has_rpc
000000000018c130 T ggml_cpu_has_rpc
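For a wrapper that loads prebuilt binaries, the same check can be done without headers by resolving the exported symbol at runtime. A minimal POSIX sketch (assumptions: libllama.so is on the loader path and the symbol returns an int flag; link with -ldl):

```c
#include <dlfcn.h>
#include <stdio.h>

int main(void) {
    void *lib = dlopen("libllama.so", RTLD_NOW);
    if (!lib) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return 1;
    }
    // Resolve the exported feature-detection symbol by name.
    int (*has_rpc)(void) = (int (*)(void)) dlsym(lib, "ggml_cpu_has_rpc");
    if (has_rpc) {
        printf("RPC compiled in: %s\n", has_rpc() ? "yes" : "no");
    } else {
        printf("symbol not found (build may predate RPC support)\n");
    }
    dlclose(lib);
    return 0;
}
```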

@martindevans
Contributor Author

Thanks for confirming, in that case I'll close this PR 👍
