
merge upstream #34

Merged: 70 commits, Aug 28, 2024
Changes from 1 commit

Commits (70):
5ef07e2
server : handle models with missing EOS token (#8997)
ggerganov Aug 12, 2024
d3ae0ee
py : fix requirements check '==' -> '~=' (#8982)
ggerganov Aug 12, 2024
2589292
Fix a spelling mistake (#9001)
Septa2112 Aug 12, 2024
df5478f
ggml: fix div-by-zero (#9003)
DavidKorczynski Aug 12, 2024
1262e7e
grammar-parser : fix possible null-deref (#9004)
DavidKorczynski Aug 12, 2024
84eb2f4
docs: introduce gpustack and gguf-parser (#8873)
thxCode Aug 12, 2024
0fd93cd
llama : model-based max number of graph nodes calculation (#8970)
nicoboss Aug 12, 2024
1f67436
ci : enable RPC in all of the released builds (#9006)
rgerganov Aug 12, 2024
fc4ca27
ci : fix github workflow vulnerable to script injection (#9008)
diogoteles08 Aug 12, 2024
828d6ff
export-lora : throw error if lora is quantized (#9002)
ngxson Aug 13, 2024
06943a6
ggml : move rope type enum to ggml.h (#8949)
danbev Aug 13, 2024
43bdd3c
cmake : remove unused option GGML_CURL (#9011)
ggerganov Aug 14, 2024
98a532d
server : fix segfault on long system prompt (#8987)
compilade Aug 14, 2024
5fd89a7
Vulkan Optimizations and Fixes (#8959)
0cc4m Aug 14, 2024
234b306
server : init stop and error fields of the result struct (#9026)
jpodivin Aug 15, 2024
d5492f0
ci : disable bench workflow (#9010)
ggerganov Aug 15, 2024
6bda7ce
llama : add pre-tokenizer regexes for BLOOM and gpt3-finnish (#8850)
Exploder98 Aug 15, 2024
4af8420
common : remove duplicate function llama_should_add_bos_token (#8778)
kylo5aby Aug 15, 2024
37501d9
server : fix duplicated n_predict key in the generation_settings (#8994)
snowyu Aug 15, 2024
4b9afbb
retrieval : fix memory leak in retrieval query handling (#8955)
gtygo Aug 15, 2024
e3f6fd5
ggml : dynamic ggml_sched_max_splits based on graph_size (#9047)
nicoboss Aug 16, 2024
2a24c8c
Add Nemotron/Minitron GGUF Conversion & Inference Support (#8922)
suhara Aug 16, 2024
fb487bb
common : add support for cpu_get_num_physical_cores() on Windows (#8771)
Septa2112 Aug 16, 2024
c679e0c
llama : add EXAONE model support (#9025)
mscheong01 Aug 16, 2024
23fd453
gguf-py : bump version from 0.9.1 to 0.10.0 (#9051)
compilade Aug 16, 2024
c8ddce8
Fix inference example lacks required parameters (#9035)
Aisuko Aug 16, 2024
ee2984b
py : fix wrong input type for raw_dtype in ggml to gguf scripts (#8928)
farbodbj Aug 16, 2024
d565bb2
llava : support MiniCPM-V-2.6 (#8967)
tc-mb Aug 16, 2024
8b3befc
server : refactor middleware and /health endpoint (#9056)
ngxson Aug 16, 2024
2fb9267
Fix incorrect use of ctx_split for bias tensors (#9063)
suhara Aug 17, 2024
2339a0b
tests : add integration test for lora adapters (#8957)
ltoniazzi Aug 18, 2024
554b049
flake.lock: Update (#9068)
ggerganov Aug 18, 2024
18eaf29
rpc : prevent crashes on invalid input (#9040)
rgerganov Aug 19, 2024
1b6ff90
rpc : print error message when failed to connect endpoint (#9042)
rgerganov Aug 19, 2024
cfac111
cann: add doc for cann backend (#8867)
wangshuai09 Aug 19, 2024
90db814
tests : add missing comma in grammar integration tests (#9099)
fairydreaming Aug 20, 2024
4f8d19f
[SYCL] Fix SYCL `im2col` and `convert` Overflow with Large Dims (#9052)
zhentaoyu Aug 20, 2024
50addec
[SYCL] fallback mmvq (#9088)
airMeng Aug 20, 2024
2f3c146
llava: Add ACC OP for GPU acceleration to the Vulkan backend in the L…
cyzero-kim Aug 20, 2024
8455340
llama : std::move llm_bigram_bpe from work_queue (#9062)
danbev Aug 21, 2024
f63f603
llava : zero-initialize clip_ctx structure fields with aggregate init…
fairydreaming Aug 21, 2024
b40eb84
llama : support for `falcon-mamba` architecture (#9074)
younesbelkada Aug 21, 2024
fc54ef0
server : support reading arguments from environment variables (#9105)
ngxson Aug 21, 2024
a1631e5
llama : simplify Mamba with advanced batch splits (#8526)
compilade Aug 21, 2024
1731d42
[SYCL] Add oneDNN primitive support (#9091)
luoyu-intel Aug 22, 2024
11b84eb
[SYCL] Add a space to supress a cmake warning (#9133)
qnixsynapse Aug 22, 2024
a07c32e
llama : use F32 precision in GLM4 attention and no FA (#9130)
piDack Aug 23, 2024
3ba780e
lora : fix llama conversion script with ROPE_FREQS (#9117)
ngxson Aug 23, 2024
8f824ff
quantize : fix typo in usage help of `quantize.cpp` (#9145)
joaodinissf Aug 24, 2024
e11bd85
CPU/CUDA: Gemma 2 FlashAttention support (#8542)
JohannesGaessler Aug 24, 2024
f91fc56
CUDA: fix Gemma 2 numerical issues for FA (#9166)
JohannesGaessler Aug 25, 2024
93bc383
common: fixed not working find argument --n-gpu-layers-draft (#9175)
GermanAizek Aug 25, 2024
436787f
llama : fix time complexity of string replacement (#9163)
jart Aug 26, 2024
f12ceac
ggml-ci : try to improve build time (#9160)
slaren Aug 26, 2024
0c41e03
metal : gemma2 flash attention support (#9159)
slaren Aug 26, 2024
e5edb21
server : update deps (#9183)
ggerganov Aug 26, 2024
7a3df79
ci : add VULKAN support to ggml-ci (#9055)
ggerganov Aug 26, 2024
879275a
tests : fix compile warnings for unreachable code (#9185)
ggerganov Aug 26, 2024
fc18425
ggml : add SSM Metal kernels (#8546)
ggerganov Aug 26, 2024
06658ad
metal : separate scale and mask from QKT in FA kernel (#9189)
ggerganov Aug 26, 2024
7d787ed
ggml : do not crash when quantizing q4_x_x with an imatrix (#9192)
slaren Aug 26, 2024
ad76569
common : Update stb_image.h to latest version (#9161)
arch-btw Aug 27, 2024
75e1dbb
llama : fix llama3.1 rope_freqs not respecting custom head_dim (#9141)
nyxkrage Aug 27, 2024
2e59d61
llama : fix ChatGLM4 wrong shape (#9194)
CausalLM Aug 27, 2024
a77feb5
server : add some missing env variables (#9116)
ngxson Aug 27, 2024
78eb487
llama : fix qs.n_attention_wv for DeepSeek-V2 (#9156)
compilade Aug 27, 2024
3246fe8
Fix minicpm example directory (#9111)
xyb Aug 27, 2024
231cff5
sync : ggml
ggerganov Aug 27, 2024
20f1789
vulkan : fix build (#0)
ggerganov Aug 27, 2024
23e298c
Merge branch 'ggerganov:master' into master
l3utterfly Aug 28, 2024
quantize : fix typo in usage help of quantize.cpp (ggml-org#9145)
joaodinissf authored Aug 24, 2024
commit 8f824ffe8ee1feadd14428f1dda1283fa3b933be
2 changes: 1 addition & 1 deletion examples/quantize/quantize.cpp
@@ -104,7 +104,7 @@ static void usage(const char * executable) {
     printf(" --exclude-weights tensor_name: use importance matrix for this/these tensor(s)\n");
     printf(" --output-tensor-type ggml_type: use this ggml_type for the output.weight tensor\n");
     printf(" --token-embedding-type ggml_type: use this ggml_type for the token embeddings tensor\n");
-    printf(" --keep-split: will generate quatized model in the same shards as input");
+    printf(" --keep-split: will generate quantized model in the same shards as input\n");
     printf(" --override-kv KEY=TYPE:VALUE\n");
     printf("     Advanced option to override model metadata by key in the quantized model. May be specified multiple times.\n");
     printf("Note: --include-weights and --exclude-weights cannot be used together\n");