metal : use residency sets #11427

Merged
merged 5 commits into from
Jan 26, 2025


@ggerganov ggerganov commented Jan 26, 2025

fix #10119

Using residency sets keeps the allocated memory wired and almost completely eliminates the overhead observed in #10119. For example, on M2 Ultra with a 7B Q8_0 model, requests are ~250ms faster thanks to this change. It seems that attaching the residency sets to the command queue and buffers is not necessary, so the change is rather simple. For each buffer, we create an associated MTLResidencySet and add the MTLBuffer objects to it. After that, we commit it and request residency:

for (int i = 0; i < ctx->n_buffers; i++) {
    [ctx->rset addAllocation:ctx->buffers[i].metal];
}
[ctx->rset commit];
[ctx->rset requestResidency];

build: b9126fe (4561)

| Model | Test | t/s master | t/s gg/metal-residency-sets | Speedup |
| --- | --- | ---: | ---: | ---: |
| llama 3B F16 | pp512 | 3289.51 | 3286.29 | 1.00 |
| llama 3B F16 | tg128 | 73.28 | 73.35 | 1.00 |
| llama 3B Q4_0 | pp512 | 2999.71 | 3002.93 | 1.00 |
| llama 3B Q4_0 | tg128 | 165.83 | 166.03 | 1.00 |
| llama 3B Q8_0 | pp512 | 2958.32 | 2960.69 | 1.00 |
| llama 3B Q8_0 | tg128 | 123.61 | 123.96 | 1.00 |

Metal backend changes

The Metal backend checks the environment variable GGML_METAL_NO_RESIDENCY. If it is set, no residency sets are created, which allows the OS to collect the GPU memory after 1 second of inactivity. This should rarely be needed, since it hurts the performance of the application, but support is kept just in case.
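
As a rough illustration of the opt-out described above, the check can be sketched in plain C (which is also valid inside an Objective-C source file). The helper name `use_residency_sets` is hypothetical and chosen for this example; only the environment variable name comes from the PR description. The sketch assumes that the mere presence of the variable, regardless of its value, disables residency sets.

```c
#include <stdbool.h>
#include <stdlib.h>

// Hypothetical helper (not taken from the PR): returns true when
// residency sets should be created, i.e. when the environment variable
// GGML_METAL_NO_RESIDENCY is not set.
// Assumption: any value, including an empty string, counts as "set".
static bool use_residency_sets(void) {
    return getenv("GGML_METAL_NO_RESIDENCY") == NULL;
}
```

With a helper like this, the backend would skip creating the MTLResidencySet when the function returns false, for example when the application is launched with `GGML_METAL_NO_RESIDENCY=1` in its environment.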

@ggerganov ggerganov force-pushed the gg/metal-residency-sets branch from febb813 to 4dad9fa Compare January 26, 2025 10:39
@github-actions github-actions bot added the ggml and Apple Metal labels Jan 26, 2025

ggerganov commented Jan 26, 2025

Great news - this change finally resolves the annoying overhead that I was observing. The only remaining question is how to implement this to be compatible with macOS < 15.0.

Any suggestions?

Edit: resolved

@ggerganov ggerganov force-pushed the gg/metal-residency-sets branch from 21850f6 to 2674f02 Compare January 26, 2025 14:27
@github-actions github-actions bot added the build label Jan 26, 2025
@ggerganov ggerganov changed the base branch from gg/idle to master January 26, 2025 14:30
@ggerganov ggerganov marked this pull request as ready for review January 26, 2025 14:41
@ggerganov ggerganov merged commit 178a7eb into master Jan 26, 2025
51 checks passed
@ggerganov ggerganov deleted the gg/metal-residency-sets branch January 26, 2025 18:06
Animaxx added a commit to Animaxx/llama.cpp that referenced this pull request Jan 28, 2025