
merge from upstream #20

Merged: 59 commits, May 22, 2024
Changes from 1 commit

Commits (59)
24ecb58
Revert "server bench: fix bench not waiting for model load (#7284)" (…
phymbert May 16, 2024
9c4fdcb
[Server] Added --verbose option to README [no ci] (#7335)
reuank May 17, 2024
934266c
ggml : rewrite silu and softmax for cpu (#7154)
jart May 17, 2024
ee94172
server : add support for the RPC backend (#7305)
rgerganov May 17, 2024
e18bc6a
convert : fix Qwen/Qwen-7b conversion (#7308)
amd-lalithnc May 17, 2024
359cbe3
ggml-quants, llama : removed excess checks (#7274)
GermanAizek May 17, 2024
29c60d8
tokenization: add warning for double BOS (#7332)
JohannesGaessler May 17, 2024
27b0406
llama : use n_embd_head_v when reshaping kqv (#7327)
fairydreaming May 17, 2024
d273c14
py : convert-hf-to-gguf-update improvements (#7340)
akx May 17, 2024
51e9d02
Added a single test function script and fix debug-test.sh to be more …
mofosyne May 17, 2024
f4bd8b3
rpc : set SO_REUSEADDR for the server socket (#7320)
rgerganov May 17, 2024
82ca83d
ROCm: use native CMake HIP support (#5966)
GZGavinZhao May 17, 2024
0fc1e82
CUDA: faster large batch FA without tensor cores (#7314)
JohannesGaessler May 17, 2024
b43272a
Unicode codepoint flags for custom regexs (#7245)
jaime-m-p May 17, 2024
ef277de
cmake : fix typo in AMDGPU_TARGETS (#7356)
Engininja2 May 18, 2024
0583484
ggml : fix quants nans when all the group weights are very close to z…
slaren May 18, 2024
b49a13d
convert : fix set_vocab_sentencepiece (#6866)
ggerganov May 18, 2024
de73196
github-actions-labeler: initial commit (#7330)
mofosyne May 18, 2024
c1b295e
Update and fix Vulkan soft_max and argsort implementations (#7237)
0cc4m May 18, 2024
ca57e0f
perplexity : ndot progress and show stats with < 100 tasks (#7348)
strawberrymelonpanda May 18, 2024
0f98acf
llama : add support for larger Granite Code Models (20B, 34B) (#7324)
sroecker May 18, 2024
d233b50
cuda : add half2 __shfl_xor() for ROCm 5.5 (#7263)
Engininja2 May 18, 2024
cb42c29
server: correct --threads documentation [no ci] (#7362)
JohannesGaessler May 18, 2024
133d99c
CUDA: deduplicate FlashAttention code (#7352)
JohannesGaessler May 18, 2024
511182e
android : use "ci-android" branch for CI (#7341)
ggerganov May 18, 2024
059031b
ci : re-enable sanitizer runs (#7358)
ggerganov May 18, 2024
f5bf761
Capture CUDA logging output (#7298)
fraxy-v May 18, 2024
854d365
cmake : update android comments (#7341)
ggerganov May 19, 2024
e23b974
labeler.yml: Use settings from ggerganov/llama.cpp [no ci] (#7363)
mofosyne May 19, 2024
ab33f7a
cuda : clear error after buffer allocation failure (#7376)
slaren May 19, 2024
6aade19
Add StableLM2 pre-tokenizer (#7349)
aahouzi May 19, 2024
4185839
server: fix seed being reported back (#7382)
JohannesGaessler May 19, 2024
1b01f06
server: add test for token probs (#7347)
JohannesGaessler May 19, 2024
5ca49cb
ggml: implement quantized KV cache for FA (#7372)
JohannesGaessler May 19, 2024
e4e6f67
ggml : fix another case of quants nans (#7387)
slaren May 19, 2024
f030ec1
Vulkan Embedding Fix (#7360)
0cc4m May 19, 2024
1ea2a00
quantize : fix --keep-split check (#7374)
fredlas May 19, 2024
d359f30
llama : remove MPI backend (#7395)
slaren May 19, 2024
33c8d50
Add provisions for windows support for BF16 code including CMake prov…
Srihari-mcw May 20, 2024
2789baf
tests : fix --keep_split -> --keep-split (#7374)
ggerganov May 20, 2024
e932094
server : return error on too large embedding input (#7389)
ggerganov May 20, 2024
1cc0155
server : tuning tests (#7388)
ggerganov May 20, 2024
65c5820
ggml : add loongarch lsx and lasx support (#6454)
junchao-loongson May 20, 2024
213e90e
ggml-opencl, llama: using reserve() if count already known (#7272)
GermanAizek May 20, 2024
26cd423
Update README.md (#7410)
binganao May 20, 2024
6bf9b66
[SYCL] Update SYCL upscale operation (#7321)
AidanBeltonS May 20, 2024
3bc10cb
server : fix temperature + disable some tests (#7409)
ggerganov May 20, 2024
db10f01
rpc : track allocated buffers (#7411)
rgerganov May 20, 2024
20385ce
perplexity: update README FP16 results [no ci] (#7413)
JohannesGaessler May 20, 2024
fabf30b
llama : remove Persimmon (#7408)
ggerganov May 20, 2024
917dc8c
Tokenizer SPM fixes for phi-3 and llama-spm (#7375)
jaime-m-p May 20, 2024
d7e852c
Tokenizer SPM fixes for phi-3 and llama-spm (bugfix) (#7425)
jaime-m-p May 21, 2024
d8ee902
CUDA: deduplicate mmq code (#7397)
JohannesGaessler May 21, 2024
11474e7
examples: cache hf model when --model not provided (#7353)
amirzia May 21, 2024
c3f8d58
tests : test-tokenizer-0.sh print more info (#7402)
ggerganov May 21, 2024
fcf6538
CUDA: fix unused warning in mmq.cu (#7442)
JohannesGaessler May 21, 2024
e402de3
`grammars`: fix resampling logic regression (#7424)
ochafik May 21, 2024
6369bf0
metal : handle F16 inf values, fix FA partial offload (#7434)
ggerganov May 21, 2024
201cc11
llama : add phi3 128K model support (#7225)
liuwei-git May 21, 2024
ggml : fix another case of quants nans (ggml-org#7387)
slaren authored May 19, 2024

commit e4e6f67be6a8a697f5f89a28c98934e53c99c359
ggml-quants.c: 2 changes (1 addition, 1 deletion)

@@ -1149,7 +1149,7 @@ static float make_qx_quants(int n, int nmax, const float * restrict x, int8_t *
         sumlx += w*x[i]*l;
         suml2 += w*l*l;
     }
-    float scale = sumlx/suml2;
+    float scale = suml2 ? sumlx/suml2 : 0.0f;
     if (return_early) return suml2 > 0 ? 0.5f*(scale + 1/iscale) : 1/iscale;
     float best = scale * sumlx;
     for (int is = -9; is <= 9; ++is) {
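
The one-line change guards the per-group scale against a division by zero. When every weight in a quantization group is zero (the situation the earlier commit 0583484, "fix quants nans when all the group weights are very close to zero", also addresses), suml2 stays exactly 0, and the unguarded sumlx/suml2 evaluates to 0.0f/0.0f, which is NaN under IEEE 754 and then poisons everything computed from the scale. Below is a minimal standalone C sketch of the failure mode and the guard; group_scale is a hypothetical reduced helper for illustration, not the actual ggml function, which also takes an iscale and searches over candidate scales.

#include <math.h>
#include <stdio.h>

// Hypothetical reduced version of the scale computation in make_qx_quants:
// a weighted least-squares scale for a group of quantized values l[i].
static float group_scale(const float *w, const float *x, const int *l, int n) {
    float sumlx = 0.0f;
    float suml2 = 0.0f;
    for (int i = 0; i < n; ++i) {
        sumlx += w[i] * x[i] * l[i];
        suml2 += w[i] * l[i] * l[i];
    }
    // The fix from this commit: fall back to 0.0f instead of dividing by
    // zero, which would produce 0.0f/0.0f == NaN when all weights are zero.
    return suml2 ? sumlx / suml2 : 0.0f;
}

int main(void) {
    const float w[4] = {0};              // all group weights are zero
    const float x[4] = {1, 2, 3, 4};
    const int   l[4] = {1, 1, 1, 1};
    float scale = group_scale(w, x, l, 4);
    printf("scale = %f (isnan = %d)\n", scale, isnan(scale));
    // Without the guard this typically prints: scale = nan (isnan = 1)
    return 0;
}

Compiled with, say, cc -std=c11 demo.c && ./a.out, the guarded version prints scale = 0.000000 (isnan = 0); removing the ternary and dividing unconditionally reproduces the NaN that this commit fixes.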