merge from upstream #45

Merged
merged 66 commits into from
Nov 16, 2024
66 commits
b8deef0
llama : add <|tool_call|> formatting to Granite template (#10177)
gabe-l-hart Nov 5, 2024
a1eaf6a
metal : add quantized FA support (#10149)
ggerganov Nov 6, 2024
1dc04b2
ggml : adjust is_first_call init value (#10193)
ggerganov Nov 6, 2024
94d8cb8
metal : fix from ptr buffer name (#10189)
slaren Nov 6, 2024
b11f9ba
server : remove hack for extra parallel slot (#10187)
ggerganov Nov 6, 2024
5c333e0
metal : add BF16 support (#8439)
ggerganov Nov 6, 2024
3bcd40b
Optimize RWKV6 Operator Naming and Implement Multi-core CPU/ SYCL Acc…
uniartisan Nov 7, 2024
2319126
fix q4_0_8_8 format for corrupted tokens issue (#10198)
snadampal Nov 7, 2024
5107e8c
DRY: Fixes clone functionality (#10192)
wwoodsTM Nov 7, 2024
60e17ce
Remove identical wte/etw logic for jais (#10203)
fmz Nov 7, 2024
97404c4
ggml : add ggml-cpu.h to the public headers (#10204)
slaren Nov 7, 2024
a2c6fd7
scripts : sync update
ggerganov Nov 7, 2024
3b08828
sync : ggml
ggerganov Nov 7, 2024
eec4d71
scripts : add amx to sync-ggml.sh [no ci]
ggerganov Nov 7, 2024
a71d81c
server : revamp chat UI with vuejs and daisyui (#10175)
ngxson Nov 7, 2024
76c6e7f
server : minor UI fix (#10207)
ngxson Nov 7, 2024
d05b312
swift : exclude ggml-metal-embed.metal (#10211)
jhen0409 Nov 8, 2024
841f27a
metal : optimize FA kernels (#10171)
ggerganov Nov 8, 2024
695ad75
metal : improve clarity (minor) (#10171)
ggerganov Nov 8, 2024
ec450d3
metal : opt-in compile flag for BF16 (#10218)
ggerganov Nov 8, 2024
8fc393f
scripts : fix pattern and get n_tokens in one go (#10221)
lhpqaq Nov 9, 2024
e892134
ggml : optimize llamafile cpu matrix multiplication for ppc64le (#10156)
amritahs-ibm Nov 9, 2024
5b359bb
ggml: fix zero division in ‘dne’ calculation in CUDA COUNT_EQUAL oper…
SongXiaoXi Nov 9, 2024
46323fa
metal : hide debug messages from normal log
ggerganov Nov 9, 2024
f018acb
llama : fix Qwen model type strings
ggerganov Nov 9, 2024
bb38cdd
metal : fix F32 accumulation in FA vec kernel (#10232)
ggerganov Nov 9, 2024
39a334a
metal : fix build and some more comments (#10229)
ggerganov Nov 9, 2024
6423c65
metal : reorder write loop in mul mat kernel + style (#10231)
ggerganov Nov 9, 2024
160687b
vulkan: Fix newly added tests for permuted mul_mat and 1D im2col (#10…
jeffbolznv Nov 10, 2024
505f332
server : (web UI) Add back sampler settings (#10239)
MaggotHATE Nov 10, 2024
4b3a921
flake.lock: Update (#10243)
ggerganov Nov 10, 2024
b141e5f
server : enable KV cache defrag by default (#10233)
ggerganov Nov 11, 2024
b0cefea
metal : more precise Q*K in FA vec kernel (#10247)
ggerganov Nov 11, 2024
54ef9cf
vulkan: Throttle the number of shader compiles during the build step.…
jeffbolznv Nov 11, 2024
80dd7ff
vulkan: Optimize contiguous copies (#10254)
jeffbolznv Nov 13, 2024
2e82ffa
sycl : Fixes to broken builds and test-backend-ops (#10257)
Alcpz Nov 13, 2024
a0ec17b
metadata: Detailed Dataset Authorship Metadata (#8875)
mofosyne Nov 13, 2024
0e712a5
server : fix incorrect res in validate_model_chat_template (#10272)
jhen0409 Nov 13, 2024
ff7fb67
server : add missing docs (#10269)
z80maniac Nov 13, 2024
1ee9eea
docs : update bindings list (#10261)
xuegao-tzx Nov 13, 2024
5ea926d
sync : ggml
ggerganov Nov 13, 2024
fb4a0ec
llama : propagate the results of `graph_compute` (#9525)
Xarbirus Nov 13, 2024
66798e4
vulkan: Use macros to make the mat mul pipeline creation more concise…
jeffbolznv Nov 13, 2024
af148c9
vulkan: Optimize binary ops (#10270)
jeffbolznv Nov 14, 2024
2a82891
speculative : fix out-of-bounds access (#10289)
ggerganov Nov 14, 2024
4a8ccb3
CUDA: no -sm row for very small matrices (#10185)
JohannesGaessler Nov 14, 2024
ae8de6d
ggml : build backends as libraries (#10256)
slaren Nov 14, 2024
1607a5e
backend cpu: add online flow for aarch64 Q4_0 GEMV/GEMM kernels (#9921)
chaxu01 Nov 15, 2024
5a54af4
sycl: Use syclcompat::dp4a (#10267)
Rbiessy Nov 15, 2024
4802ad3
scripts : fix regex in sync [no ci]
ggerganov Nov 15, 2024
231f936
cann: dockerfile and doc adjustment (#10302)
noemotiovon Nov 15, 2024
9901068
server : (web UI) add copy button for code block, fix api key (#10242)
ngxson Nov 15, 2024
57f8355
sycl: Update Intel docker images to use DPC++ 2025.0 (#10305)
Rbiessy Nov 15, 2024
f0204a0
ci: build test musa with cmake (#10298)
yeahdongcn Nov 15, 2024
1842922
AVX BF16 and single scale quant optimizations (#10212)
netrunnereve Nov 15, 2024
cbf5541
sync : ggml
ggerganov Nov 15, 2024
3225008
ggml : vulkan logs (whisper/2547)
thewh1teagle Nov 15, 2024
09ecbcb
cmake : fix ppc64 check (whisper/0)
ggerganov Nov 15, 2024
883d206
ggml : fix some build issues
slaren Nov 15, 2024
4047be7
scripts: update compare-llama-bench.py (#10319)
JohannesGaessler Nov 15, 2024
74d73dc
Make updates to fix issues with clang-cl builds while using AVX512 fl…
Srihari-mcw Nov 15, 2024
89e4caa
llama : save number of parameters and the size in llama_model (#10286)
FirstTimeEZ Nov 16, 2024
1e58ee1
ggml : optimize Q4_0 into Q4_0_X_Y repack (#10324)
eddnjjn Nov 16, 2024
dd3a6ce
vulkan : add cmake preset debug/release (#10306)
FirstTimeEZ Nov 16, 2024
772703c
vulkan: Optimize some mat-vec mul quant shaders (#10296)
jeffbolznv Nov 16, 2024
bce287c
Merge branch 'layla-build' into merge
l3utterfly Nov 16, 2024
sycl: Use syclcompat::dp4a (ggml-org#10267)
* sycl: Use syclcompat::dp4a

* Using the syclcompat version allows the compiler to optimize the
  operation with a native function

* Update news section

* Update CI Windows oneAPI version to 2025.0

* Reword doc

* Call syclcompat::dp4a inside dpct::dp4a

This reverts commit 90cb61d.
Rbiessy authored Nov 15, 2024

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
commit 5a54af4d4f588f109f31e456483fdf77096399d9
2 changes: 1 addition & 1 deletion .github/workflows/build.yml
@@ -930,7 +930,7 @@ jobs:
shell: bash

env:
- WINDOWS_BASEKIT_URL: https://registrationcenter-download.intel.com/akdlm/IRC_NAS/7dff44ba-e3af-4448-841c-0d616c8da6e7/w_BaseKit_p_2024.1.0.595_offline.exe
+ WINDOWS_BASEKIT_URL: https://registrationcenter-download.intel.com/akdlm/IRC_NAS/b380d914-366b-4b77-a74a-05e3c38b3514/intel-oneapi-base-toolkit-2025.0.0.882_offline.exe
WINDOWS_DPCPP_MKL: intel.oneapi.win.cpp-dpcpp-common:intel.oneapi.win.mkl.devel
ONEAPI_ROOT: "C:/Program Files (x86)/Intel/oneAPI"
steps:
2 changes: 2 additions & 0 deletions docs/backend/SYCL.md
@@ -41,6 +41,8 @@ The following release is verified with good quality:

## News

+ - 2024.11
+ - Use syclcompat to improve the performance on some platforms. This requires to use oneAPI 2025.0 or newer.

- 2024.8
- Use oneDNN as the default GEMM library, improve the compatibility for new Intel GPUs.
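The news entry above ties the syclcompat path to oneAPI 2025.0 or newer. As a minimal sketch of how a build could check that at compile time, the snippet below reads the `__INTEL_LLVM_COMPILER` macro that the Intel DPC++ compiler defines. It assumes the macro encodes the release as a number such as `20250000` for 2025.0.0. Both function names are hypothetical and are not part of llama.cpp, dpct, or syclcompat.

```cpp
// Hypothetical helper (not from the repository): compares a compiler
// version number against the 2025.0.0 release.
// Assumption: the number follows the encoding of __INTEL_LLVM_COMPILER,
// e.g. 20240100 for 2024.1.0 and 20250000 for 2025.0.0.
constexpr bool is_oneapi_2025_or_newer(long version) {
    return version >= 20250000L;
}

// Hypothetical helper: reports whether the current translation unit is
// being compiled by an Intel DPC++ compiler from oneAPI 2025.0 or newer.
// With any other compiler the macro is undefined and this returns false.
constexpr bool built_with_oneapi_2025_or_newer() {
#if defined(__INTEL_LLVM_COMPILER)
    return is_oneapi_2025_or_newer(__INTEL_LLVM_COMPILER);
#else
    return false;
#endif
}
```

A `static_assert(built_with_oneapi_2025_or_newer(), "...")` placed next to the `#include <syclcompat/math.hpp>` line would turn an unsupported toolchain into a readable build error, although the commit shown here does not add such a check.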
24 changes: 2 additions & 22 deletions ggml/src/ggml-sycl/dpct/helper.hpp
@@ -15,6 +15,7 @@

#include <sycl/sycl.hpp>
#include <sycl/half_type.hpp>
+ #include <syclcompat/math.hpp>
#include <oneapi/mkl.hpp>
#include <map>

@@ -1830,31 +1831,10 @@ namespace dpct
: id);
}

- template <typename T>
- sycl::vec<T, 4> extract_and_sign_or_zero_extend4(T val)
- {
- return sycl::vec<T, 1>(val)
- .template as<sycl::vec<
- std::conditional_t<std::is_signed_v<T>, int8_t, uint8_t>, 4>>()
- .template convert<T>();
- }
-
- template <typename T1, typename T2>
- using dot_product_acc_t =
- std::conditional_t<std::is_unsigned_v<T1> && std::is_unsigned_v<T2>,
- uint32_t, int32_t>;
-
template <typename T1, typename T2, typename T3>
inline auto dp4a(T1 a, T2 b, T3 c)
{
- dot_product_acc_t<T1, T2> res = c;
- auto va = extract_and_sign_or_zero_extend4(a);
- auto vb = extract_and_sign_or_zero_extend4(b);
- res += va[0] * vb[0];
- res += va[1] * vb[1];
- res += va[2] * vb[2];
- res += va[3] * vb[3];
- return res;
+ return syclcompat::dp4a(a, b, c);
}

struct sub_sat
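The lines removed from `dpct::dp4a` above spell out what the operation computes: each 32-bit operand is treated as four 8-bit lanes, the lanes are multiplied pairwise, and the four products are added to the accumulator `c`. A minimal host-side sketch of that arithmetic, covering only the signed case, could look like the following. The names `unpack_i8x4`, `pack_i8x4` and `dp4a_reference` are hypothetical and exist only for illustration; this is not the syclcompat implementation.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: split a 32-bit word into four signed 8-bit lanes,
// in memory order.
inline void unpack_i8x4(int32_t packed, int8_t out[4]) {
    std::memcpy(out, &packed, sizeof(packed));
}

// Hypothetical helper: pack four signed bytes into one 32-bit word,
// using the same memory order as unpack_i8x4.
inline int32_t pack_i8x4(int8_t x0, int8_t x1, int8_t x2, int8_t x3) {
    const int8_t lanes[4] = {x0, x1, x2, x3};
    int32_t packed;
    std::memcpy(&packed, lanes, sizeof(packed));
    return packed;
}

// Reference semantics for signed operands: c + sum over i of a[i] * b[i].
inline int32_t dp4a_reference(int32_t a, int32_t b, int32_t c) {
    int8_t va[4];
    int8_t vb[4];
    unpack_i8x4(a, va);
    unpack_i8x4(b, vb);
    int32_t res = c;
    for (int i = 0; i < 4; ++i) {
        // Widen each lane before multiplying so the product is a 32-bit value.
        res += static_cast<int32_t>(va[i]) * static_cast<int32_t>(vb[i]);
    }
    return res;
}
```

For example, `dp4a_reference(pack_i8x4(1, 2, 3, 4), pack_i8x4(5, 6, 7, 8), 10)` adds 5 + 12 + 21 + 32 to the accumulator. Per the commit message, delegating to `syclcompat::dp4a` lets the compiler replace this kind of lane-by-lane code with a native function where one is available.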
8 changes: 4 additions & 4 deletions ggml/src/ggml-sycl/vecdotq.hpp
@@ -968,8 +968,8 @@ vec_dot_iq3_xxs_q8_1(const void *__restrict__ vbq,
grid1[0] ^ signs[0], signs[0], std::minus<>());
const int grid_h = dpct::vectorized_binary<sycl::uchar4>(
grid2[0] ^ signs[1], signs[1], std::minus<>());
- sumi = dpct::dp4a(grid_l, *((int *)q8 + 0), sumi);
- sumi = dpct::dp4a(grid_h, *((int *)q8 + 1), sumi);
+ sumi = dpct::dp4a(grid_l, *((const int *)q8 + 0), sumi);
+ sumi = dpct::dp4a(grid_h, *((const int *)q8 + 1), sumi);
q8 += 8;
aux32 >>= 7;
}
@@ -1009,8 +1009,8 @@ vec_dot_iq3_s_q8_1(const void *__restrict__ vbq,
grid1[0] ^ signs0, signs0, std::minus<>());
const int grid_h = dpct::vectorized_binary<sycl::uchar4>(
grid2[0] ^ signs1, signs1, std::minus<>());
- sumi = dpct::dp4a(grid_l, *((int *)q8 + 0), sumi);
- sumi = dpct::dp4a(grid_h, *((int *)q8 + 1), sumi);
+ sumi = dpct::dp4a(grid_l, *((const int *)q8 + 0), sumi);
+ sumi = dpct::dp4a(grid_h, *((const int *)q8 + 1), sumi);
q8 += 8;
}
const float d =
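Both `vecdotq.hpp` hunks read four consecutive 8-bit values from `q8` as one `int` and only change the cast so that it keeps the `const` qualifier. The sketch below shows the same read with `std::memcpy` instead of a pointer cast. This is a swapped-in technique chosen for the illustration, not what the kernels do, and `load_int_from_i8` is a hypothetical name. It assumes `q8` points to `const int8_t` data and that `int` is 32 bits wide.

```cpp
#include <cstdint>
#include <cstring>

static_assert(sizeof(int) == 4, "sketch assumes a 32-bit int");

// Hypothetical helper: read the idx-th group of four int8 values from a
// read-only buffer as a single int, without casting away const.
inline int load_int_from_i8(const int8_t *q8, int idx) {
    int packed;
    std::memcpy(&packed, q8 + 4 * idx, sizeof(packed));
    return packed;
}
```

With this helper, the expression `*((const int *)q8 + 1)` from the diff corresponds to `load_int_from_i8(q8, 1)`; the value is then passed as the second operand of `dpct::dp4a`.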