CUDA backend #2310

Merged May 15, 2024 (52 commits)

Commits
dc03f81
remove lingering references to SOM license
cebtenzzre Apr 29, 2024
c482bae
backend: bring some upstream changes into llama.cpp.cmake
cebtenzzre Apr 29, 2024
1684f7b
backend: update llama.cpp and list of supported models
cebtenzzre Apr 30, 2024
dc8cd8c
backend: bring more upstream changes into llama.cpp.cmake
cebtenzzre Apr 30, 2024
75d05c5
llamamodel: remove dependency on internal llama_token_to_piece
cebtenzzre Apr 30, 2024
1ebd3cf
backend: initial port to Occam's Vulkan backend
cebtenzzre Apr 30, 2024
9f92731
vulkan: improve the implementation
cebtenzzre May 2, 2024
ce2164e
backend: also build CUDA backend by default
cebtenzzre May 2, 2024
011935e
backend: make CUDA build useful
cebtenzzre May 2, 2024
9e457bf
backend: add option to build ROCm backend
cebtenzzre May 2, 2024
c47900e
cuda: implement device enumeration
cebtenzzre May 6, 2024
fef041c
cmake: don't build CPU variant on Linux/Windows
cebtenzzre May 6, 2024
bbe5cc0
llmodel: add a backend field to LLModel::GPUDevice
cebtenzzre May 6, 2024
dc334ae
llmodel: select a backend, not a build variant
cebtenzzre May 6, 2024
b54151f
llmodel: list GPU devices from all backends
cebtenzzre May 6, 2024
d4feaeb
python: implement selectable GPU backend
cebtenzzre May 6, 2024
1a10587
chat: implement basic UI backend selection
cebtenzzre May 6, 2024
a9ffe5a
cmake: set RUNPATH of llamamodel-mainline-cuda correctly
cebtenzzre May 7, 2024
d30d5b2
backend: fix kompute-avxonly build
cebtenzzre May 7, 2024
16170f4
cuda: fix dependency bundling on Windows
cebtenzzre May 7, 2024
2417105
ci: install CUDA toolkit
cebtenzzre May 7, 2024
9e8f7c3
cuda: ignore libcuda.so.* on Linux
cebtenzzre May 7, 2024
9e54277
ci: install additional deps to make linuxdeployqt happy
cebtenzzre May 7, 2024
335b129
cmake: do not ship static llama library
cebtenzzre May 8, 2024
b14992c
llama.cpp: do not install kompute or fmt
cebtenzzre May 8, 2024
2b5d6d3
cmake: let linuxdeployqt handle CUDA deps on Linux
cebtenzzre May 8, 2024
831a146
cmake: install llmodel to lib/ on Linux and bin/ on Windows
cebtenzzre May 8, 2024
eec9ec9
cmake: do not install import libraries on Windows
cebtenzzre May 8, 2024
9eaa326
kompute: use slightly newer Vulkan headers to avoid installation
cebtenzzre May 8, 2024
004a262
llama.cpp: rebase onto latest master
cebtenzzre May 8, 2024
f1558d7
chat: specify backend of reported device
cebtenzzre May 8, 2024
fe78377
Merge branch 'main' into add-cuda-support
cebtenzzre May 8, 2024
0e3d90a
cmake: fix upstream spelling error
cebtenzzre May 8, 2024
530b224
chat: give the "force metal" option a chance of working
cebtenzzre May 8, 2024
2e0e4fc
chat: make device selection meaningful on macOS
cebtenzzre May 8, 2024
bb0402b
cmake: remove an old reference to libbert-*.dylib
cebtenzzre May 8, 2024
0069196
llama.cpp: sync with upstream for CUDA graphs
cebtenzzre May 9, 2024
ad0c3ea
Merge branch 'main' into add-cuda-support
cebtenzzre May 9, 2024
6394494
backend: use "cpu" impl if "kompute" is not found
cebtenzzre May 9, 2024
553fc89
python: fix documentation for device parameter
cebtenzzre May 9, 2024
035ea59
cmake: fix chat install if built without Kompute or CUDA
cebtenzzre May 9, 2024
85b9e2f
cmake: clearly indicate how to build without CUDA/Kompute on failure
cebtenzzre May 9, 2024
c9b6732
llmodel: fix compile errors
cebtenzzre May 9, 2024
41ce930
metal: copy default.metallib, not ggml-metal.metal
cebtenzzre May 9, 2024
14588bc
llamamodel: update model whitelist
cebtenzzre May 10, 2024
21bc8c8
build_and_run: mention compiler and CUDA
cebtenzzre May 11, 2024
0cbca27
settings: prefix vk devices with "Vulkan: ", update old names
cebtenzzre May 13, 2024
8dbe93d
kompute: fix device name leaks
cebtenzzre May 13, 2024
dbf38b2
chat: bump version to 2.8.0
cebtenzzre May 15, 2024
1c09b6d
Merge branch 'main' into add-cuda-support
cebtenzzre May 15, 2024
c7f8c93
python: update README to reflect CUDA build dependency
cebtenzzre May 15, 2024
5875d83
python: bump version for CUDA support
cebtenzzre May 15, 2024
Changes from 1 commit:

commit 1a1058730024bf1458f9d9ad8af75a994be39cc6
chat: implement basic UI backend selection
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
cebtenzzre committed May 7, 2024
gpt4all-backend/llmodel.h (4 additions, 0 deletions)

@@ -10,6 +10,8 @@
 #include <string_view>
 #include <vector>
 
+using namespace std::string_literals;
+
 #define LLMODEL_MAX_PROMPT_BATCH 128
 
 class Dlhandle;
@@ -51,6 +53,8 @@ class LLModel {
         GPUDevice(const char *backend, int index, int type, size_t heapSize, std::string name, std::string vendor):
             backend(backend), index(index), type(type), heapSize(heapSize), name(std::move(name)),
             vendor(std::move(vendor)) {}
+
+        std::string uiName() const { return backend == "cuda"s ? "CUDA: " + name : name; }
     };
 
     class Implementation {
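
The new uiName() helper is what determines the label a device gets in the settings UI: devices enumerated by the CUDA backend are prefixed with "CUDA: ", while Kompute (Vulkan) devices keep their raw names at this point in the PR. A minimal standalone sketch of that mapping, using a simplified Device struct rather than the real LLModel::GPUDevice:

```cpp
// Illustrative sketch only; Device here is a stand-in for LLModel::GPUDevice.
#include <iostream>
#include <string>

using namespace std::string_literals;

struct Device {
    std::string backend; // "cuda" or "kompute"
    std::string name;    // raw device name reported by the backend

    // CUDA devices get a "CUDA: " prefix so they are distinguishable in the dropdown.
    std::string uiName() const { return backend == "cuda"s ? "CUDA: " + name : name; }
};

int main()
{
    Device cuda   {"cuda",    "NVIDIA GeForce RTX 3060"};
    Device vulkan {"kompute", "NVIDIA GeForce RTX 3060"};
    std::cout << cuda.uiName()   << '\n'; // CUDA: NVIDIA GeForce RTX 3060
    std::cout << vulkan.uiName() << '\n'; // NVIDIA GeForce RTX 3060
    return 0;
}
```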

gpt4all-chat/chatllm.cpp (10 additions, 6 deletions)

@@ -302,19 +302,23 @@ bool ChatLLM::loadModel(const ModelInfo &modelInfo)
     QElapsedTimer modelLoadTimer;
     modelLoadTimer.start();
 
+    auto requestedDevice = MySettings::globalInstance()->device();
     auto n_ctx = MySettings::globalInstance()->modelContextLength(modelInfo);
     m_ctx.n_ctx = n_ctx;
     auto ngl = MySettings::globalInstance()->modelGpuLayers(modelInfo);
 
-    std::string buildVariant = "auto";
-#if defined(Q_OS_MAC) && defined(__arm__)
+    std::string backend = "auto";
+#if !defined(Q_OS_MAC)
+    if (requestedDevice.startsWith("CUDA: "))
+        backend = "cuda";
+#elif defined(__arm__)
     if (m_forceMetal)
-        buildVariant = "metal";
+        backend = "metal";
 #endif
     QString constructError;
     m_llModelInfo.model = nullptr;
     try {
-        m_llModelInfo.model = LLModel::Implementation::construct(filePath.toStdString(), buildVariant, n_ctx);
+        m_llModelInfo.model = LLModel::Implementation::construct(filePath.toStdString(), backend, n_ctx);
     } catch (const LLModel::MissingImplementationError &e) {
         modelLoadProps.insert("error", "missing_model_impl");
         constructError = e.what();
@@ -346,19 +350,19 @@ bool ChatLLM::loadModel(const ModelInfo &modelInfo)
 
         // Pick the best match for the device
         QString actualDevice = m_llModelInfo.model->implementation().buildVariant() == "metal" ? "Metal" : "CPU";
-        const QString requestedDevice = MySettings::globalInstance()->device();
         if (requestedDevice == "CPU") {
            emit reportFallbackReason(""); // fallback not applicable
         } else {
             const size_t requiredMemory = m_llModelInfo.model->requiredMem(filePath.toStdString(), n_ctx, ngl);
             std::vector<LLModel::GPUDevice> availableDevices = m_llModelInfo.model->availableGPUDevices(requiredMemory);
             LLModel::GPUDevice *device = nullptr;
 
+            // NB: relies on the fact that Kompute devices are listed first
             if (!availableDevices.empty() && requestedDevice == "Auto" && availableDevices.front().type == 2 /*a discrete gpu*/) {
                 device = &availableDevices.front();
             } else {
                 for (LLModel::GPUDevice &d : availableDevices) {
-                    if (QString::fromStdString(d.name) == requestedDevice) {
+                    if (QString::fromStdString(d.uiName()) == requestedDevice) {
                         device = &d;
                         break;
                     }
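
On the loading side, the UI label round-trips back into a backend choice: on non-macOS platforms a selected device whose name starts with "CUDA: " requests the "cuda" implementation, anything else ("Auto", "CPU", or a bare Vulkan device name) stays on "auto", and on ARM macOS builds the only explicit override remains Metal. A sketch of just that decision, pulled out into a hypothetical free function (in the PR the logic sits inline in ChatLLM::loadModel):

```cpp
// Hypothetical helper for illustration; chooseBackend() does not exist in the PR.
#include <QString>
#include <string>

std::string chooseBackend(const QString &requestedDevice, bool forceMetal)
{
    std::string backend = "auto";
#if !defined(Q_OS_MAC)
    (void)forceMetal;
    // A device picked from the CUDA-prefixed entries forces the CUDA implementation.
    if (requestedDevice.startsWith("CUDA: "))
        backend = "cuda";
#elif defined(__arm__)
    (void)requestedDevice;
    // On ARM macOS builds the "force Metal" setting selects the Metal implementation.
    if (forceMetal)
        backend = "metal";
#endif
    return backend;
}
```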

gpt4all-chat/mysettings.cpp (1 addition, 1 deletion)

@@ -68,7 +68,7 @@ MySettings::MySettings()
     std::vector<LLModel::GPUDevice> devices = LLModel::Implementation::availableGPUDevices();
     QVector<QString> deviceList{ "Auto" };
     for (LLModel::GPUDevice &d : devices)
-        deviceList << QString::fromStdString(d.name);
+        deviceList << QString::fromStdString(d.uiName());
     deviceList << "CPU";
     setDeviceList(deviceList);
 }
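
One consequence of switching mysettings.cpp to uiName() is that a GPU visible to both backends now appears twice in the device dropdown, distinguishable only by the CUDA prefix; Vulkan entries stay unprefixed until commit 0cbca27 later in this PR. A sketch of what the list might contain on such a machine (the device name is hypothetical):

```cpp
// Illustrative only: what setDeviceList() might receive for one NVIDIA GPU
// that both the Kompute (Vulkan) and CUDA backends can see.
#include <QString>
#include <QVector>

QVector<QString> exampleDeviceList()
{
    QVector<QString> deviceList{ "Auto" };
    deviceList << "NVIDIA GeForce RTX 3060";       // Kompute device, bare name at this commit
    deviceList << "CUDA: NVIDIA GeForce RTX 3060"; // CUDA device, prefixed by uiName()
    deviceList << "CPU";
    return deviceList;
}
```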