
llama : initial ggml-backend integration #4520

Merged · 24 commits · Dec 21, 2023
Changes from 1 commit

Commits (24)
8e6735e
llama : initial ggml-backend integration
slaren Dec 17, 2023
0808aa5
add ggml-metal
slaren Dec 19, 2023
9450791
Merge remote-tracking branch 'origin/master' into sl/ggml-backend-int
slaren Dec 19, 2023
0c5ee7c
cuda backend can be used through ggml-backend with LLAMA_GGML_BACKEND_…
slaren Dec 19, 2023
1ac01fb
add ggml_backend_buffer_clear
slaren Dec 19, 2023
c8bd5d8
add ggml_backend_buffer_is_host, used to avoid copies if possible when…
slaren Dec 19, 2023
72a0c96
disable gpu backends with ngl 0
slaren Dec 20, 2023
d3e7242
more accurate mlock
slaren Dec 20, 2023
c3678ca
unmap offloaded part of the model
slaren Dec 20, 2023
5241045
use posix_fadvise64(.., POSIX_FADV_SEQUENTIAL) to improve performance with mmap
slaren Dec 20, 2023
bcd87ca
update quantize and lora
slaren Dec 20, 2023
24cc321
update session copy/set to use ggml-backend
slaren Dec 20, 2023
f70f94d
use posix_fadvise instead of posix_fadvise64
slaren Dec 20, 2023
6c045a8
ggml_backend_alloc_ctx_tensors_from_buft : remove old print
slaren Dec 20, 2023
5834a25
llama_mmap::align_offset : use pointers instead of references for out…
slaren Dec 20, 2023
ecb23d4
restore progress_callback behavior
slaren Dec 20, 2023
8ed2a8e
move final progress_callback call to load_all_data
slaren Dec 20, 2023
a4e191f
cuda : fix fprintf format string (minor)
ggerganov Dec 21, 2023
a74b1a8
do not offload scales
slaren Dec 21, 2023
6a72c7f
Merge remote-tracking branch 'origin/master' into sl/ggml-backend-int
slaren Dec 21, 2023
cd4167b
llama_mmap : avoid unmapping the same fragments again in the destructor
slaren Dec 21, 2023
16582cd
Merge remote-tracking branch 'origin/master' into sl/ggml-backend-int
slaren Dec 21, 2023
323881e
remove unnecessary unmap
slaren Dec 21, 2023
f4d884f
metal : add default log function that prints to stderr, cleanup code
slaren Dec 21, 2023
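Several of the commits above add small ggml-backend buffer helpers (ggml_backend_buffer_clear, ggml_backend_buffer_is_host, ggml_backend_alloc_ctx_tensors_from_buft). A minimal sketch of how these could fit together during model loading, not the PR's actual code, with the CPU buffer type assumed for simplicity:

```cpp
#include <cstdio>
#include "ggml.h"
#include "ggml-alloc.h"
#include "ggml-backend.h"

// Sketch: allocate all tensors of a no_alloc ggml context in a backend buffer.
static ggml_backend_buffer_t alloc_weights(struct ggml_context * ctx) {
    // Assumption: CPU buffer type; an offloading build would pass a
    // CUDA/Metal buffer type here instead.
    ggml_backend_buffer_type_t buft = ggml_backend_cpu_buffer_type();
    ggml_backend_buffer_t buf = ggml_backend_alloc_ctx_tensors_from_buft(ctx, buft);

    // ggml_backend_buffer_is_host: if the buffer is plain host memory, file
    // data can be read directly into the tensors; otherwise the loader has
    // to stage the copies through ggml_backend_tensor_set.
    if (!ggml_backend_buffer_is_host(buf)) {
        fprintf(stderr, "non-host buffer: loading via staged copies\n");
    }

    // ggml_backend_buffer_clear zero-fills the buffer, e.g. so any padding
    // bytes of quantized tensors start out deterministic.
    ggml_backend_buffer_clear(buf, 0);
    return buf;
}
```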
use posix_fadvise64(.., POSIX_FADV_SEQUENTIAL) to improve performance with mmap
slaren committed Dec 20, 2023
commit 524104581994c3a546733dee3ab7632448003cf2
9 changes: 8 additions & 1 deletion llama.cpp
@@ -33,6 +33,7 @@
 #include <unistd.h>
 #if defined(_POSIX_MAPPED_FILES)
 #include <sys/mman.h>
+#include <fcntl.h>
 #endif
 #if defined(_POSIX_MEMLOCK_RANGE)
 #include <sys/resource.h>
@@ -840,6 +841,10 @@ struct llama_mmap {
         // prefetch/readahead impairs performance on NUMA systems
         if (numa) { prefetch = 0; }
 #ifdef __linux__
+        if (posix_fadvise64(fd, 0, file->size, POSIX_FADV_SEQUENTIAL)) {
+            fprintf(stderr, "warning: fadvise(.., POSIX_FADV_SEQUENTIAL) failed: %s\n",
+                    strerror(errno));
+        }
         if (prefetch) { flags |= MAP_POPULATE; }
 #endif
         addr = mmap(NULL, file->size, PROT_READ, flags, fd, 0);
@@ -2314,7 +2319,9 @@ struct llama_model_loader {
         }
         */
         // prefetch the whole file - all the data is needed anyway
-        mapping.reset(new llama_mmap(&file, -1, ggml_is_numa()));
+        if (use_mmap) {
+            mapping.reset(new llama_mmap(&file, -1, ggml_is_numa()));
+        }
     }

     // for backwards compatibility only
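For context, a minimal standalone sketch of the pattern this commit introduces: advise the kernel that the mapped model file will be read sequentially so readahead is more aggressive. This is not the PR's code; the helper name is hypothetical, and it uses the portable posix_fadvise that a later commit in this PR (f70f94d) switches to.

```cpp
#include <cerrno>
#include <cstdio>
#include <cstring>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Sketch: map a model file read-only and hint sequential access to the kernel.
static void * map_model_file(const char * path, size_t * size_out) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) { return nullptr; }

    struct stat st{};
    if (fstat(fd, &st) != 0) { close(fd); return nullptr; }
    *size_out = (size_t) st.st_size;

    // posix_fadvise returns the error number directly (it does not set errno);
    // a failure here only loses an optimization, it is not fatal.
    int err = posix_fadvise(fd, 0, st.st_size, POSIX_FADV_SEQUENTIAL);
    if (err != 0) {
        fprintf(stderr, "warning: posix_fadvise(POSIX_FADV_SEQUENTIAL) failed: %s\n",
                strerror(err));
    }

    void * addr = mmap(nullptr, *size_out, PROT_READ, MAP_SHARED, fd, 0);
    close(fd); // the mapping remains valid after the descriptor is closed
    return addr == MAP_FAILED ? nullptr : addr;
}
```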