merge upstream #44

Merged 51 commits on Nov 5, 2024.

Commits
8841ce3  llama : switch KQ multiplication to F32 precision by default (#10015)  (ggerganov, Oct 27, 2024)
8125e6c  server : don't overfill the batch during infill (#10018)  (ggerganov, Oct 28, 2024)
524afee  musa: workaround for Guilty Lockup in cleaning src0 (#10042)  (yeahdongcn, Oct 28, 2024)
07028f9  flake.lock: Update (#10063)  (ggerganov, Oct 28, 2024)
61715d5  llama : Add IBM granite template (#10013)  (arch-btw, Oct 28, 2024)
8d8ff71  llama : remove Tail-Free sampling (#10071)  (ggerganov, Oct 29, 2024)
8f275a7  ggml: Add POOL2D OP for GPU acceleration to the Vulkan backend in the…  (cyzero-kim, Oct 29, 2024)
c5b0f4b  llama : refactor model loader with backend registry (#10026)  (slaren, Oct 30, 2024)
fc83a9e  ggml : add Q4_0_8_8 RISC-V GEMV and GEMM kernels (#10029)  (xctan, Oct 30, 2024)
79a2bc0  convert : more detailed convert lora usage docs (#10065)  (richdougherty, Oct 30, 2024)
6763f71  readme : more lora detail in main example readme (#10064)  (richdougherty, Oct 30, 2024)
b9e02e8  ggml : fix memory leaks when loading invalid gguf files (#10094)  (slaren, Oct 30, 2024)
61408e7  kompute: add backend registry / device interfaces (#10045)  (slp, Oct 30, 2024)
1329c0a  kompute: add mul_mat_q4_k shader (#10097)  (slp, Oct 31, 2024)
dea5e86  ggml : check tensor name lengths in gguf files (#10100)  (slaren, Oct 31, 2024)
0a683e8  server : include scheme when printing URL (#10106)  (bakkot, Oct 31, 2024)
ab3d71f  loader: refactor tensor weights storage (#9935)  (kylo5aby, Oct 31, 2024)
c02e5ab  llama : fix buffer checks for mamba and rwk (#10111)  (slaren, Oct 31, 2024)
1e9f949  quantize : fix --keep-split (#10114)  (slaren, Oct 31, 2024)
85679d3  llama : improve output buffer type selection (#10098)  (slaren, Oct 31, 2024)
e597e50  build: fix build error in Windows env with OneAPI setup (#10107)  (kylo5aby, Nov 1, 2024)
f221d56  ggml : alloc ggml_contexts on the heap (whisper/2525)  (ggerganov, Nov 1, 2024)
815fe72  sync : ggml  (ggerganov, Nov 1, 2024)
1804adb  ggml : remove ggml_scratch (#10121)  (ggerganov, Nov 1, 2024)
d865d14  server : fix smart selection of available slot (#10120)  (sasha0552, Nov 1, 2024)
ba6f62e  readme : update hot topics  (ggerganov, Nov 1, 2024)
418f5ee  vulkan : improve ggml_vk_create_buffer error handling (#9898)  (FanShupei, Nov 1, 2024)
e991e31  llama : use smart pointers for ggml resources (#10117)  (slaren, Nov 1, 2024)
a6744e4  llama : add simple-chat example (#10124)  (slaren, Nov 1, 2024)
7554aa4  convert-lora : make `--base` optional (#10110)  (ngxson, Nov 2, 2024)
b634f8a  simple-chat : only add bos on first prompt (#10129)  (slaren, Nov 2, 2024)
1926d6e  llama : adjust default context size + print warnings (#10136)  (ggerganov, Nov 2, 2024)
4595041  server : fix endpoint checks (#10135)  (ggerganov, Nov 2, 2024)
42cadc7  server : fix slot selection by lru (#10126)  (sasha0552, Nov 2, 2024)
9830b69  Add apple arm to presets (#10134)  (kohnech, Nov 2, 2024)
1839f69  flake.lock: Update (#10146)  (ggerganov, Nov 3, 2024)
08828a6  metal : minor fixup in FA kernel (#10143)  (ggerganov, Nov 3, 2024)
9f40989  ggml : move CPU backend to a separate file (#10144)  (slaren, Nov 3, 2024)
e2292aa  metal : fix minor string leaks (ggml/1004)  (pminev, Nov 1, 2024)
284e5b0  cmake : make it possible linking ggml as external lib (ggml/1003)  (ykhrustalev, Nov 2, 2024)
ce027ad  sync : ggml  (ggerganov, Nov 4, 2024)
329ed91  CANN: adjust backend registry refactor. (#10158)  (leo-pony, Nov 4, 2024)
f8e5813  metal : move dequantize templates to beginning of MSL source (#0)  (ggerganov, Nov 4, 2024)
05697f6  metal : simplify f16 and f32 dequant kernels (#0)  (ggerganov, Nov 4, 2024)
ea02c75  cuda : clear error after changing peer access (#10153)  (slaren, Nov 4, 2024)
6a066b9  fix build break on arm64 linux (#10166)  (snadampal, Nov 4, 2024)
9e0ecfb  server : clarify /slots endpoint, add is_processing (#10162)  (ngxson, Nov 4, 2024)
401558b  ggml : fix q4xx mat mul, increase ggml_aligned_malloc alignment (#10167)  (slaren, Nov 4, 2024)
d5a409e  ggml : fix gelu tables initialization (#10172)  (slaren, Nov 4, 2024)
3407364  Q6_K AVX improvements (#10118)  (netrunnereve, Nov 4, 2024)
a9e8a9a  ggml : fix arch check in bf16_to_fp32 (#10164)  (slaren, Nov 4, 2024)
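Of note for server users: commit 9e0ecfb reworks the /slots endpoint and, per its title, adds an is_processing field to each slot entry. A minimal sketch of checking it against a locally running llama-server (host and port are placeholders):

    # assumes llama-server is listening on 127.0.0.1:8080
    curl -s http://127.0.0.1:8080/slots
    # each returned slot object should now include "is_processing" (per #10162)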
Files changed
.github/workflows/build.yml (1 addition, 1 deletion)

@@ -92,7 +92,7 @@ jobs:
           name: llama-bin-macos-arm64.zip
 
   macOS-latest-cmake-x64:
-    runs-on: macos-12
+    runs-on: macos-13
 
     steps:
       - name: Clone
CMakePresets.json (13 additions, 0 deletions)

@@ -48,10 +48,23 @@
         }
     },
 
+    {
+        "name": "arm64-apple-clang", "hidden": true,
+        "architecture": { "value": "arm64", "strategy": "external" },
+        "toolset": { "value": "host=x64", "strategy": "external" },
+        "cacheVariables": {
+            "CMAKE_TOOLCHAIN_FILE": "${sourceDir}/cmake/arm64-apple-clang.cmake"
+        }
+    },
+
     { "name": "arm64-windows-llvm-debug" , "inherits": [ "base", "arm64-windows-llvm", "debug" ] },
     { "name": "arm64-windows-llvm-release", "inherits": [ "base", "arm64-windows-llvm", "reldbg" ] },
     { "name": "arm64-windows-llvm+static-release", "inherits": [ "base", "arm64-windows-llvm", "reldbg", "static" ] },
 
+    { "name": "arm64-apple-clang-debug" , "inherits": [ "base", "arm64-apple-clang", "debug" ] },
+    { "name": "arm64-apple-clang-release" , "inherits": [ "base", "arm64-apple-clang", "reldbg" ] },
+    { "name": "arm64-apple-clang+static-release" , "inherits": [ "base", "arm64-apple-clang", "reldbg", "static" ] },
+
     { "name": "arm64-windows-msvc-debug" , "inherits": [ "base", "arm64-windows-msvc", "debug" ] },
     { "name": "arm64-windows-msvc-release", "inherits": [ "base", "arm64-windows-msvc", "reldbg" ] },
     { "name": "arm64-windows-msvc+static-release", "inherits": [ "base", "arm64-windows-msvc", "reldbg", "static" ] },
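Presets declared this way are selected by name at configure time. A minimal usage sketch for the new preset (the build directory below assumes the repository's base preset maps binaryDir to build-${presetName}; adjust if your checkout differs):

    cmake --preset arm64-apple-clang-release
    cmake --build build-arm64-apple-clang-release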
Makefile (14 additions, 13 deletions)

@@ -1,7 +1,6 @@
 # Define the default target now so that it is always the first target
 BUILD_TARGETS = \
     libllava.a \
-    llama-baby-llama \
     llama-batched \
     llama-batched-bench \
     llama-bench \
@@ -34,6 +33,7 @@ BUILD_TARGETS = \
     llama-save-load-state \
     llama-server \
     llama-simple \
+    llama-simple-chat \
     llama-speculative \
     llama-tokenize \
     llama-vdot \
@@ -55,14 +55,14 @@ TEST_TARGETS = \
     tests/test-llama-grammar \
     tests/test-log \
     tests/test-model-load-cancel \
-    tests/test-opt \
     tests/test-quantize-fns \
     tests/test-quantize-perf \
     tests/test-rope \
     tests/test-sampling \
     tests/test-tokenizer-0 \
     tests/test-tokenizer-1-bpe \
     tests/test-tokenizer-1-spm
+    # tests/test-opt \
 
 # Legacy build targets that were renamed in #7809, but should still be removed when the project is cleaned
 LEGACY_TARGETS_CLEAN = main quantize quantize-stats perplexity imatrix embedding vdot q8dot convert-llama2c-to-ggml \
@@ -915,6 +915,7 @@ endif # GGML_METAL
 
 OBJ_GGML += \
     ggml/src/ggml.o \
+    ggml/src/ggml-cpu.o \
     ggml/src/ggml-alloc.o \
     ggml/src/ggml-backend.o \
     ggml/src/ggml-quants.o \
@@ -935,7 +936,6 @@ OBJ_COMMON = \
     common/console.o \
     common/ngram-cache.o \
     common/sampling.o \
-    common/train.o \
     common/build-info.o \
     common/json-schema-to-grammar.o
 
@@ -1047,6 +1047,12 @@ ggml/src/ggml.o: \
     ggml/include/ggml.h
     $(CC) $(CFLAGS) -c $< -o $@
 
+ggml/src/ggml-cpu.o: \
+    ggml/src/ggml-cpu.c \
+    ggml/include/ggml.h \
+    ggml/src/ggml-common.h
+    $(CC) $(CFLAGS) -c $< -o $@
+
 ggml/src/ggml-alloc.o: \
     ggml/src/ggml-alloc.c \
     ggml/include/ggml.h \
@@ -1212,11 +1218,6 @@ common/json-schema-to-grammar.o: \
     common/json-schema-to-grammar.h
     $(CXX) $(CXXFLAGS) -c $< -o $@
 
-common/train.o: \
-    common/train.cpp \
-    common/train.h
-    $(CXX) $(CXXFLAGS) -c $< -o $@
-
 common/ngram-cache.o: \
     common/ngram-cache.cpp \
     common/ngram-cache.h
@@ -1287,6 +1288,11 @@ llama-simple: examples/simple/simple.cpp \
     $(CXX) $(CXXFLAGS) -c $< -o $(call GET_OBJ_FILE, $<)
     $(CXX) $(CXXFLAGS) $(filter-out %.h $<,$^) $(call GET_OBJ_FILE, $<) -o $@ $(LDFLAGS)
 
+llama-simple-chat: examples/simple-chat/simple-chat.cpp \
+    $(OBJ_ALL)
+    $(CXX) $(CXXFLAGS) -c $< -o $(call GET_OBJ_FILE, $<)
+    $(CXX) $(CXXFLAGS) $(filter-out %.h $<,$^) $(call GET_OBJ_FILE, $<) -o $@ $(LDFLAGS)
+
 llama-tokenize: examples/tokenize/tokenize.cpp \
     $(OBJ_ALL)
     $(CXX) $(CXXFLAGS) -c $< -o $(call GET_OBJ_FILE, $<)
@@ -1384,11 +1390,6 @@ llama-bench: examples/llama-bench/llama-bench.cpp \
     $(CXX) $(CXXFLAGS) -c $< -o $(call GET_OBJ_FILE, $<)
     $(CXX) $(CXXFLAGS) $(filter-out %.h $<,$^) $(call GET_OBJ_FILE, $<) -o $@ $(LDFLAGS)
 
-llama-baby-llama: examples/baby-llama/baby-llama.cpp \
-    $(OBJ_ALL)
-    $(CXX) $(CXXFLAGS) -c $< -o $(call GET_OBJ_FILE, $<)
-    $(CXX) $(CXXFLAGS) $(filter-out %.h $<,$^) $(call GET_OBJ_FILE, $<) -o $@ $(LDFLAGS)
-
 llama-export-lora: examples/export-lora/export-lora.cpp \
     $(OBJ_ALL)
     $(CXX) $(CXXFLAGS) -c $< -o $(call GET_OBJ_FILE, $<)
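The new llama-simple-chat target builds the example introduced in commit a6744e4. A minimal build-and-run sketch (the model path is a placeholder, and the -c/-ngl flags follow the example's usage string, so treat the exact options as an assumption):

    make llama-simple-chat
    ./llama-simple-chat -m ./models/model.gguf -c 2048 -ngl 99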
Package.swift (1 addition, 0 deletions)

@@ -10,6 +10,7 @@ var sources = [
         "src/unicode.cpp",
         "src/unicode-data.cpp",
         "ggml/src/ggml.c",
+        "ggml/src/ggml-cpu.c",
         "ggml/src/ggml-alloc.c",
         "ggml/src/ggml-backend.cpp",
         "ggml/src/ggml-quants.c",
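This keeps the Swift package in sync with the CPU backend file split out in commit 9f40989; without it, the package would likely fail to link the CPU backend symbols. A quick check, assuming a recent Swift toolchain:

    # from the repository root
    swift build -c release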
README.md (2 additions, 1 deletion)

@@ -17,7 +17,8 @@ Inference of Meta's [LLaMA](https://arxiv.org/abs/2302.13971) model (and others)
 
 ## Hot topics
 
-- **Hugging Face Inference Endpoints now support GGUF out of the box! https://github.com/ggerganov/llama.cpp/discussions/9669**
+- **Introducing GGUF-my-LoRA** https://github.com/ggerganov/llama.cpp/discussions/10123
+- Hugging Face Inference Endpoints now support GGUF out of the box! https://github.com/ggerganov/llama.cpp/discussions/9669
 - Hugging Face GGUF editor: [discussion](https://github.com/ggerganov/llama.cpp/discussions/9268) | [tool](https://huggingface.co/spaces/CISCai/gguf-editor)
 
 ----