
[pull] master from ggerganov:master #158

Closed · wants to merge 73 commits

Commits (73)
0c39f44
ggml-cpu: replace AArch64 NEON assembly with intrinsics in ggml_gemv_…
angt Nov 30, 2024
43957ef
build: update Makefile comments for C++ version change (#10598)
wangqin0 Dec 1, 2024
6acce39
readme : update the usage section with examples (#10596)
ggerganov Dec 1, 2024
86dc11c
server : bind to any port when specified (#10590)
alek3y Dec 1, 2024
3420909
ggml : automatic selection of best CPU backend (#10606)
slaren Dec 1, 2024
5c7a5aa
ci: add error handling for Python venv creation in run.sh (#10608)
wangqin0 Dec 1, 2024
5e1ed95
grammars : add English-only grammar (#10612)
ggerganov Dec 1, 2024
917786f
Add `mistral-v1`, `mistral-v3`, `mistral-v3-tekken` and `mistral-v7` …
jukofyork Dec 1, 2024
4cb003d
contrib : refresh (#10593)
ggerganov Dec 2, 2024
991f8aa
SYCL: Fix and switch to GGML_LOG system instead of fprintf (#10579)
qnixsynapse Dec 2, 2024
64ed209
server: Add "tokens per second" information in the backend (#10548)
lhpqaq Dec 2, 2024
8648c52
make : deprecate (#10514)
ggerganov Dec 2, 2024
642330a
llama : add enum for built-in chat templates (#10623)
ngxson Dec 2, 2024
70b98fa
server : fix default draft model parameters (#10586)
ggerganov Dec 3, 2024
844e2e1
github : minify link [no ci]
ggerganov Dec 3, 2024
515d4e5
github : minify link [no ci] (revert)
ggerganov Dec 3, 2024
0115df2
metal : small-batch mat-mul kernels (#10581)
ggerganov Dec 3, 2024
82bca22
readme : add option, update default value, fix formatting (#10271)
pothitos Dec 3, 2024
3b4f2e3
llama : add missing LLAMA_API for llama_chat_builtin_templates (#10636)
ngxson Dec 3, 2024
667d70d
metal : add `GGML_OP_CONV_TRANSPOSE_1D` kernels (ggml/1026)
PABannier Nov 28, 2024
efb6ae9
feat: add `GGML_UNARY_OP_ARGMAX` Metal kernel (ggml/1019)
PABannier Dec 2, 2024
e9e661b
CUDA: remove unnecessary warp reduce in FA (ggml/1032)
mahorozte Dec 3, 2024
c505471
sync : ggml
ggerganov Dec 3, 2024
1cd3df4
scripts : remove amx sync
ggerganov Dec 3, 2024
91c36c2
server : (web ui) Various improvements, now use vite as bundler (#10599)
ngxson Dec 3, 2024
cc98896
vulkan: optimize and reenable split_k (#10637)
jeffbolznv Dec 3, 2024
01e6d9b
clip : add sycl support (#10574)
piDack Dec 4, 2024
da6aac9
Add docs for creating a static build (#10268) (#10630)
mostlygeek Dec 4, 2024
cd2f37b
Avoid using __fp16 on ARM with old nvcc (#10616)
frankier Dec 4, 2024
98036d5
fix typo of README.md (#10605)
WrRan Dec 4, 2024
40c6d79
SYCL : Move to compile time oneMKL interface backend selection for NV…
s-Nick Dec 4, 2024
2759916
vulkan: Implement "fast divide" (mul+shift) for unary ops like copy (…
jeffbolznv Dec 4, 2024
8d0cfd5
llama: Support MiniCPM-1B (with & w/o longrope) (#10559)
JFLFY2255 Dec 4, 2024
253b7fd
Fix HF repo commit to clone lora test models (#10649)
ltoniazzi Dec 4, 2024
2803540
ggml-cpu : fix HWCAP2_I8MM value (#10646)
slaren Dec 4, 2024
59f4db1
ggml : add predefined list of CPU backend variants to build (#10626)
slaren Dec 4, 2024
1da7b76
server : fix speculative decoding with context shift (#10641)
ggerganov Dec 4, 2024
f112d19
Update deprecation-warning.cpp (#10619)
aryantandon01 Dec 4, 2024
d405804
py : update outdated copy-paste instructions [no ci] (#10667)
danbev Dec 5, 2024
c2082d9
ggml : add `GGML_PAD_REFLECT_1D` operation (ggml/1034)
PABannier Dec 3, 2024
a8cbab2
ggml: add `GGML_SET` Metal kernel + i32 CPU kernel (ggml/1037)
PABannier Dec 4, 2024
0cd182e
sync : ggml
ggerganov Dec 5, 2024
6fe6247
llama : add Minerva 7B model support (#10673)
Riccorl Dec 5, 2024
c9c6e01
vulkan: Add VK_NV_cooperative_matrix2 support for mul_mat and flash a…
jeffbolznv Dec 5, 2024
7736837
fix(server) : not show alert when DONE is received (#10674)
pminev Dec 5, 2024
6c5bc06
server : (refactoring) do not rely on JSON internally (#10643)
ngxson Dec 6, 2024
f162d45
common : bring back --no-warmup to server (#10686)
ngxson Dec 6, 2024
c5ede38
convert : add custom attention mapping
ggerganov Dec 6, 2024
784a14a
convert : add support for Roberta embeddings (#10695)
Ssukriti Dec 7, 2024
86a1934
metal : Extend how Llama.cpp locates metal resources (#10676)
ormandi Dec 7, 2024
3df784b
Vulkan: VK_KHR_cooperative_matrix support to speed up prompt processi…
0cc4m Dec 7, 2024
c2a16c0
server : fix free of spec context and batch (#10651)
ggerganov Dec 7, 2024
19d8762
ggml : refactor online repacking (#10446)
Djip007 Dec 7, 2024
ce4a7b8
server : various fixes (#10704)
ggerganov Dec 7, 2024
d9c3ba2
ggml : disable iq4_nl interleave size 8 (#10709)
ggerganov Dec 7, 2024
3573fa8
server : (refactor) no more json in server_task input (#10691)
ngxson Dec 7, 2024
62e84d9
llama : add 128k yarn context for Qwen (#10698)
robbiemu Dec 7, 2024
ecc93d0
vulkan: compile a test shader in cmake to check for coopmat2 support …
jeffbolznv Dec 8, 2024
43ed389
llama : use cmake for swift build (#10525)
slaren Dec 8, 2024
06d7014
Vulkan: fix NaN in tanh.comp with AMD proprietary driver on Windows (…
stduhpf Dec 8, 2024
e52522b
server : bring back info of final chunk in stream mode (#10722)
ngxson Dec 8, 2024
ce8784b
server : fix format_infill (#10724)
ngxson Dec 8, 2024
1a05004
cmake : simplify msvc charsets (#10672)
iboB Dec 9, 2024
3d98b4c
vulkan: fix compile warnings (#10731)
jeffbolznv Dec 9, 2024
c37fb4c
Changes to CMakePresets.json to add ninja clang target on windows (#1…
Srihari-mcw Dec 9, 2024
26a8406
CUDA: fix shared memory access condition for mmv (#10740)
JohannesGaessler Dec 9, 2024
a05e2af
vulkan: disable spirv-opt for coopmat shaders (#10763)
jeffbolznv Dec 10, 2024
a86ad84
server : add flag to disable the web-ui (#10762) (#10751)
eugeniosegala Dec 10, 2024
750cb3e
CUDA: rename macros to avoid conflicts with WinAPI (#10736)
aendk Dec 10, 2024
ae4b922
imatrix : Add imatrix to --no-context-shift (#10766)
bartowski1182 Dec 10, 2024
dafae66
vulkan: dynamic subgroup size for the remaining k quants (#10745)
netrunnereve Dec 10, 2024
b685daf
vulkan: request round-to-even for fp16 in im2col/rope_head (#10767)
jeffbolznv Dec 10, 2024
43041d2
ggml: load all backends from a user-provided search path (#10699)
giladgd Dec 11, 2024
31 changes: 22 additions & 9 deletions .devops/full.Dockerfile
@@ -3,23 +3,36 @@
FROM ubuntu:$UBUNTU_VERSION AS build

RUN apt-get update && \
apt-get install -y build-essential python3 python3-pip git libcurl4-openssl-dev libgomp1
apt-get install -y build-essential git cmake libcurl4-openssl-dev

WORKDIR /app

COPY . .

COPY requirements.txt requirements.txt
COPY requirements requirements
RUN cmake -S . -B build -DGGML_BACKEND_DL=ON -DGGML_NATIVE=OFF -DGGML_CPU_ALL_VARIANTS=ON -DLLAMA_CURL=ON -DCMAKE_BUILD_TYPE=Release && \
cmake --build build -j $(nproc) && \
mkdir -p /app/lib && \
find build -name "*.so" -exec cp {} /app/lib/ \;

RUN pip install --upgrade pip setuptools wheel \
&& pip install -r requirements.txt
FROM ubuntu:$UBUNTU_VERSION as runtime

CI annotation (GitHub Actions / Push Docker image to Docker Hub) on line 17 in .devops/full.Dockerfile — FromAsCasing: 'as' and 'FROM' keywords' casing do not match. More info: https://docs.docker.com/go/dockerfile/rule/from-as-casing/


WORKDIR /app

COPY . .
RUN apt-get update && \
apt-get install -y build-essential python3 python3-pip git libcurl4-openssl-dev libgomp1

ENV LLAMA_CURL=1
COPY requirements.txt /app/requirements.txt
COPY requirements /app/requirements
COPY .devops/tools.sh /app/tools.sh

RUN pip install --upgrade pip setuptools wheel && \
pip install -r /app/requirements.txt

RUN make -j$(nproc)
COPY --from=build /app/build/bin/ /app/
COPY --from=build /app/lib/ /app/
COPY --from=build /app/convert_hf_to_gguf.py /app/
COPY --from=build /app/gguf-py /app/gguf-py

ENV LC_ALL=C.utf8

ENTRYPOINT ["/app/.devops/tools.sh"]
ENTRYPOINT ["/app/tools.sh"]
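The updated full.Dockerfile builds with CMake and `GGML_BACKEND_DL`/`GGML_CPU_ALL_VARIANTS`, so a single image ships several CPU backend variants as shared libraries and selects the best one at runtime. A rough sketch of building and running it — the image tag and model path are illustrative, not names used by the project's CI, and the `--run` flag is only what tools.sh has historically accepted (check the script for current flags):

```shell
# Build the "full" image from the repository root.
docker build -t llama-cpp-full -f .devops/full.Dockerfile .

# The entrypoint is the copied tools.sh dispatcher.
docker run -v "$PWD/models:/models" llama-cpp-full \
    --run -m /models/model.gguf -p "Hello"
```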
16 changes: 11 additions & 5 deletions .devops/llama-cli.Dockerfile
@@ -3,21 +3,27 @@ ARG UBUNTU_VERSION=22.04
FROM ubuntu:$UBUNTU_VERSION AS build

RUN apt-get update && \
apt-get install -y build-essential git
apt-get install -y build-essential git cmake libcurl4-openssl-dev

WORKDIR /app

COPY . .

RUN make -j$(nproc) llama-cli
RUN cmake -S . -B build -DGGML_BACKEND_DL=ON -DGGML_NATIVE=OFF -DGGML_CPU_ALL_VARIANTS=ON -DLLAMA_CURL=ON -DCMAKE_BUILD_TYPE=Release && \
cmake --build build -j $(nproc) && \
mkdir -p /app/lib && \
find build -name "*.so" -exec cp {} /app/lib/ \;

FROM ubuntu:$UBUNTU_VERSION AS runtime

WORKDIR /app

RUN apt-get update && \
apt-get install -y libgomp1
apt-get install -y libcurl4-openssl-dev libgomp1 curl

COPY --from=build /app/llama-cli /llama-cli
COPY --from=build /app/build/bin/llama-cli /app/
COPY --from=build /app/lib/ /app/

ENV LC_ALL=C.utf8

ENTRYPOINT [ "/llama-cli" ]
ENTRYPOINT [ "/app/llama-cli" ]
16 changes: 10 additions & 6 deletions .devops/llama-server.Dockerfile
@@ -3,27 +3,31 @@ ARG UBUNTU_VERSION=22.04
FROM ubuntu:$UBUNTU_VERSION AS build

RUN apt-get update && \
apt-get install -y build-essential git libcurl4-openssl-dev
apt-get install -y build-essential git cmake libcurl4-openssl-dev

WORKDIR /app

COPY . .

ENV LLAMA_CURL=1

RUN make -j$(nproc) llama-server
RUN cmake -S . -B build -DGGML_BACKEND_DL=ON -DGGML_NATIVE=OFF -DGGML_CPU_ALL_VARIANTS=ON -DLLAMA_CURL=ON -DCMAKE_BUILD_TYPE=Release && \
cmake --build build -j $(nproc) && \
mkdir -p /app/lib && \
find build -name "*.so" -exec cp {} /app/lib/ \;

FROM ubuntu:$UBUNTU_VERSION AS runtime

WORKDIR /app

RUN apt-get update && \
apt-get install -y libcurl4-openssl-dev libgomp1 curl

COPY --from=build /app/llama-server /llama-server
COPY --from=build /app/build/bin/llama-server /app/
COPY --from=build /app/lib/ /app/

ENV LC_ALL=C.utf8
# Must be set to 0.0.0.0 so it can listen to requests from host machine
ENV LLAMA_ARG_HOST=0.0.0.0

HEALTHCHECK CMD [ "curl", "-f", "http://localhost:8080/health" ]

ENTRYPOINT [ "/llama-server" ]
ENTRYPOINT [ "/app/llama-server" ]
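The server image now copies the dynamically loadable backends next to the binary and sets `LLAMA_ARG_HOST=0.0.0.0` so the container listens on all interfaces. A hedged run sketch — the image tag and model path are assumptions, not project conventions:

```shell
# Build and start the server image from the diff above.
docker build -t llama-cpp-server -f .devops/llama-server.Dockerfile .
docker run -d -p 8080:8080 -v "$PWD/models:/models" llama-cpp-server \
    -m /models/model.gguf

# Because LLAMA_ARG_HOST=0.0.0.0, the endpoint the image's HEALTHCHECK
# probes is also reachable from the host:
curl -f http://localhost:8080/health
```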
8 changes: 1 addition & 7 deletions .github/pull_request_template.md
@@ -1,7 +1 @@


- [x] I have read the [contributing guidelines](https://github.com/ggerganov/llama.cpp/blob/master/CONTRIBUTING.md)
- Self-reported review complexity:
- [ ] Low
- [ ] Medium
- [ ] High
*Make sure to read the [contributing guidelines](https://github.com/ggerganov/llama.cpp/blob/master/CONTRIBUTING.md) before submitting a PR*
170 changes: 40 additions & 130 deletions .github/workflows/build.yml
@@ -160,66 +160,6 @@ jobs:
path: llama-${{ steps.tag.outputs.name }}-bin-macos-x64.zip
name: llama-bin-macos-x64.zip

ubuntu-focal-make:
runs-on: ubuntu-20.04
env:
LLAMA_NODE_AVAILABLE: true
LLAMA_PYTHON_AVAILABLE: true

steps:
- name: Clone
id: checkout
uses: actions/checkout@v4

- name: Dependencies
id: depends
run: |
sudo apt-get update
sudo apt-get install build-essential gcc-8

- uses: actions/setup-node@v4
with:
node-version: "20"

- uses: actions/setup-python@v5
with:
python-version: "3.11"

- name: Build
id: make_build
env:
LLAMA_FATAL_WARNINGS: 1
run: |
CC=gcc-8 make -j $(nproc)

- name: Test
id: make_test
run: |
CC=gcc-8 make tests -j $(nproc)
make test -j $(nproc)

ubuntu-focal-make-curl:
runs-on: ubuntu-20.04

steps:
- name: Clone
id: checkout
uses: actions/checkout@v4

- name: Dependencies
id: depends
run: |
sudo apt-get update
sudo apt-get install build-essential gcc-8 libcurl4-openssl-dev

- name: Build
id: make_build
env:
LLAMA_FATAL_WARNINGS: 1
LLAMA_CURL: 1
run: |
CC=gcc-8 make -j $(nproc)

ubuntu-latest-cmake:
runs-on: ubuntu-latest

@@ -517,36 +457,6 @@ jobs:
cmake -DGGML_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DGGML_SYCL_F16=ON ..
cmake --build . --config Release -j $(nproc)

# TODO: build with GGML_NO_METAL because test-backend-ops fail on "Apple Paravirtual device" and I don't know
# how to debug it.
# ref: https://github.com/ggerganov/llama.cpp/actions/runs/7131777249/job/19420981052#step:5:1124
macOS-latest-make:
runs-on: macos-latest

steps:
- name: Clone
id: checkout
uses: actions/checkout@v4

- name: Dependencies
id: depends
continue-on-error: true
run: |
brew update

- name: Build
id: make_build
env:
LLAMA_FATAL_WARNINGS: 1
run: |
GGML_NO_METAL=1 make -j $(sysctl -n hw.logicalcpu)

- name: Test
id: make_test
run: |
GGML_NO_METAL=1 make tests -j $(sysctl -n hw.logicalcpu)
GGML_NO_METAL=1 make test -j $(sysctl -n hw.logicalcpu)

# TODO: build with GGML_METAL=OFF because test-backend-ops fail on "Apple Paravirtual device" and I don't know
# how to debug it.
# ref: https://github.com/ggerganov/llama.cpp/actions/runs/7132125951/job/19422043567?pr=4359#step:5:6584
@@ -660,15 +570,26 @@ jobs:
run: |
brew update

- name: xcodebuild for swift package
id: xcodebuild
- name: Build llama.cpp with CMake
id: cmake_build
run: |
xcodebuild -scheme llama -destination "${{ matrix.destination }}"
sysctl -a
mkdir build
cd build
cmake -G Xcode .. \
-DGGML_METAL_USE_BF16=ON \
-DGGML_METAL_EMBED_LIBRARY=ON \
-DLLAMA_BUILD_EXAMPLES=OFF \
-DLLAMA_BUILD_TESTS=OFF \
-DLLAMA_BUILD_SERVER=OFF \
-DCMAKE_OSX_ARCHITECTURES="arm64;x86_64"
cmake --build . --config Release -j $(sysctl -n hw.logicalcpu)
sudo cmake --install . --config Release

- name: Build Swift Example
id: make_build_swift_example
- name: xcodebuild for swift package
id: xcodebuild
run: |
make swift
xcodebuild -scheme llama-Package -destination "${{ matrix.destination }}"

windows-msys2:
runs-on: windows-latest
@@ -695,21 +616,6 @@
mingw-w64-${{matrix.env}}-cmake
mingw-w64-${{matrix.env}}-openblas

- name: Build using make
shell: msys2 {0}
run: |
make -j $(nproc)

- name: Clean after building using make
shell: msys2 {0}
run: |
make clean

- name: Build using make w/ OpenBLAS
shell: msys2 {0}
run: |
make GGML_OPENBLAS=1 -j $(nproc)

- name: Build using CMake
shell: msys2 {0}
run: |
@@ -1207,6 +1113,29 @@ jobs:
- name: Checkout code
uses: actions/checkout@v4

- name: Build
id: cmake_build
run: |
sysctl -a
mkdir build
cd build
cmake -G Xcode .. \
-DGGML_METAL_USE_BF16=ON \
-DGGML_METAL_EMBED_LIBRARY=ON \
-DLLAMA_BUILD_EXAMPLES=OFF \
-DLLAMA_BUILD_TESTS=OFF \
-DLLAMA_BUILD_SERVER=OFF \
-DCMAKE_SYSTEM_NAME=iOS \
-DCMAKE_OSX_DEPLOYMENT_TARGET=14.0 \
-DCMAKE_XCODE_ATTRIBUTE_DEVELOPMENT_TEAM=ggml
cmake --build . --config Release -j $(sysctl -n hw.logicalcpu) -- CODE_SIGNING_ALLOWED=NO
sudo cmake --install . --config Release

- name: xcodebuild for swift package
id: xcodebuild
run: |
xcodebuild -scheme llama-Package -destination 'generic/platform=iOS'

- name: Build Xcode project
run: xcodebuild -project examples/llama.swiftui/llama.swiftui.xcodeproj -scheme llama.swiftui -sdk iphoneos CODE_SIGNING_REQUIRED=NO CODE_SIGN_IDENTITY= -destination 'generic/platform=iOS' build

@@ -1234,32 +1163,13 @@

./gradlew build --no-daemon

# freeBSD-latest:
# runs-on: macos-12
# steps:
# - name: Clone
# uses: actions/checkout@v4
#
# - name: Build
# uses: cross-platform-actions/action@v0.19.0
# with:
# operating_system: freebsd
# version: '13.2'
# hypervisor: 'qemu'
# run: |
# sudo pkg update
# sudo pkg install -y gmake automake autoconf pkgconf llvm15 openblas
# gmake CC=/usr/local/bin/clang15 CXX=/usr/local/bin/clang++15 -j `sysctl -n hw.ncpu`

release:
if: ${{ ( github.event_name == 'push' && github.ref == 'refs/heads/master' ) || github.event.inputs.create_release == 'true' }}

runs-on: ubuntu-latest

needs:
- ubuntu-focal-make
- ubuntu-latest-cmake
- macOS-latest-make
- macOS-latest-cmake
- windows-latest-cmake
- windows-2019-cmake-cuda
26 changes: 16 additions & 10 deletions .github/workflows/server.yml
@@ -76,20 +76,26 @@ jobs:
run: |
pip install -r examples/server/tests/requirements.txt

- name: Verify server deps
id: verify_server_deps
# Setup nodejs (to be used for verifying bundled index.html)
- uses: actions/setup-node@v4
with:
node-version: 22

- name: Verify bundled index.html
id: verify_server_index_html
run: |
git config --global --add safe.directory $(realpath .)
cd examples/server
git ls-files --others --modified
cd examples/server/webui
git status
./deps.sh
npm ci
npm run build
git status
not_ignored_files="$(git ls-files --others --modified)"
echo "Modified files: ${not_ignored_files}"
if [ -n "${not_ignored_files}" ]; then
echo "Repository is dirty or server deps are not built as expected"
echo "${not_ignored_files}"
modified_files="$(git status -s)"
echo "Modified files: ${modified_files}"
if [ -n "${modified_files}" ]; then
echo "Repository is dirty or server/webui is not built as expected"
echo "Hint: You may need to follow Web UI build guide in server/README.md"
echo "${modified_files}"
exit 1
fi
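The reworked CI step asserts that the committed web-UI bundle matches a fresh vite build. It can be reproduced locally roughly like this, with the paths taken from the workflow above (it assumes a Node version matching the setup-node step, i.e. Node 22):

```shell
cd examples/server/webui
npm ci           # install exact dependencies from package-lock.json
npm run build    # regenerate the bundled assets with vite
git status -s    # empty output means the committed bundle is up to date
```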

4 changes: 4 additions & 0 deletions .gitignore
@@ -104,6 +104,10 @@ examples/server/*.mjs.hpp
!examples/sycl/*.bat
!examples/sycl/*.sh

# Server Web UI temporary files
node_modules
examples/server/webui/dist

# Python

/.venv