
Discussion: Cortex.cpp Engine Dependencies Architecture (e.g. CUDA Toolkit) #1046

Closed · 2 tasks done
namchuai opened this issue Aug 29, 2024 · 10 comments
Labels: type: planning (Opening up a discussion)

@namchuai (Contributor) commented Aug 29, 2024

Motivation

  1. Do we package the CUDA toolkit with the engine?
    Yes? Then we will have to do the same for llamacpp, tensorrt-llm, and onnx.
    No? Then we will download it separately.

  2. Folder structure (e.g. if a user has llamacpp and tensorrt installed at the same time)?

Resources
Llamacpp release
Currently we are downloading the CUDA toolkit dependency via https://catalog.jan.ai/dist/cuda-dependencies/<version>/<platform>/cuda.tar.gz
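For illustration, a hypothetical instantiation of that template (the version and platform tokens shown here are assumptions, not confirmed values):

https://catalog.jan.ai/dist/cuda-dependencies/12.4/windows-amd64/cuda.tar.gz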

cc @vansangpfiev @nguyenhoangthuan99 @dan-homebrew

Update sub-tasks:

@vansangpfiev (Contributor) commented Aug 29, 2024

From my perspective, we should download the CUDA toolkit separately. We support multiple engines: cortex.llamacpp and cortex.tensorrt-llm, and both need the CUDA toolkit to run. CUDA is backward compatible, so we only need the latest CUDA toolkit version supported by the installed nvidia-driver version.
For example:

  • Nvidia driver: "527.41" compatible with CUDA 12.4
  • cortex.llamacpp: depends on CUDA 12.2
  • cortex.tensorrt-llm: depends on CUDA 12.4
    We only need to download CUDA 12.4 to support both engines.

Edit: I just checked the CUDA compatibility matrix, and it is not correct that CUDA is always backward compatible.
[image: CUDA / nvidia-driver compatibility matrix]

Related ticket: https://github.com/janhq/cortex/issues/1047

Edit 2: The above image shows forward compatibility between the CUDA toolkit and the nvidia-driver version.
[image: CUDA minor version compatibility table]

From CUDA 11 onwards, applications compiled with a CUDA Toolkit release from within a CUDA major release family can run […]

So yes, CUDA is backward compatible within a CUDA major release.
Reference: https://docs.nvidia.com/deploy/cuda-compatibility/#minor-version-compatibility

@nguyenhoangthuan99 (Contributor) commented:

I also think we need to download the CUDA toolkit separately. Both tensorrt-llm and llamacpp require CUDA. In addition, the tensorrt-llm package (~1 GB) does not include the CUDA toolkit libraries (cublas, cusparse, ..., which are quite heavy, ~400 MB), so if we decide to pack everything into one package for both tensorrt-llm and llamacpp, the size will increase.
[screenshot: 2024-08-29 10-46-02]

@namchuai (Contributor, Author) commented:

I'm referring to this table to check the compatibility between the driver and the toolkit:
https://docs.nvidia.com/deeplearning/cudnn/latest/reference/support-matrix.html#gpu-cuda-toolkit-and-cuda-driver-requirements
[screenshot: cuDNN support matrix, 2024-08-29]

@dan-menlo (Contributor) commented Aug 29, 2024

Can I verify my understanding of the issue:

Decision
For Nvidia GPU users, the different engines have CUDA dependencies that are large (200-400 MB) downloads. The options are:

  1. Per-engine CUDA dependencies (i.e. install separately)
  2. Download 1 CUDA Toolkit for all engines

My initial thoughts

  • I think per-engine CUDA dependencies are the more sustainable architecture
  • It is less efficient, but easier to manage long-term
  • i.e. llama.cpp packages the cudart files that it has verified
  • i.e. TensorRT-LLM packages its own CUDA dependencies (this may change in the future)

This will be disk-space inefficient. However, the alternative seems to be dependency hell, which I think is even worse.

Folder Structure

  • My ideal outcome for Cortex is for each engine to be its own submodule that manages its own folder structure.
  • We have invested in making cortex.llamacpp a separate module; it should ideally be independent and packaged with its dependencies:
/cortex
    /engines
        /llama.cpp-extension
            /deps                               # CUDA dlls
        /tensorrt-llm-extension
            /deps                               # CUDA dlls

That said, I am open to all ideas, especially @vansangpfiev's.

@vansangpfiev (Contributor) commented:

If disk-space inefficiency is acceptable, I think we can go with option 1.
Please note that we will have some blockers for this option:

  • Dynamic library search path: we will have two paths, one for llamacpp and one for tensorrt-llm, and a potential issue can occur when we mix them together. For Ubuntu and macOS, I think we can solve that issue by compiling with the rpath flag. For Windows, I have created a related issue (a possible approach is sketched after this list).

  • CI changes for cortex.llamacpp and cortex.tensorrt-llm to pack CUDA dependencies
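
A minimal sketch of one possible approach for the Windows blocker, assuming the per-engine layout discussed above (engines/<engine>/deps); the helper function name and folder layout are hypothetical, but SetDefaultDllDirectories and AddDllDirectory are standard Win32 APIs:

#include <windows.h>
#include <string>

// Hypothetical helper: register an engine's deps folder with the Windows loader
// so its CUDA DLLs (cudart, cublas, ...) can be found when the engine is loaded.
bool AddEngineDepsToDllSearchPath(const std::wstring& engine_dir) {
    // Opt in to the "default directories" search order; directories registered
    // via AddDllDirectory are then included in DLL resolution.
    if (!SetDefaultDllDirectories(LOAD_LIBRARY_SEARCH_DEFAULT_DIRS)) {
        return false;
    }
    const std::wstring deps_dir = engine_dir + L"\\deps";
    return AddDllDirectory(deps_dir.c_str()) != nullptr;
}

This would have to run before the engine DLL itself is loaded, so that its CUDA imports resolve against the deps folder rather than only the executable's directory.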

@namchuai (Contributor, Author) commented Aug 30, 2024

Thanks @vansangpfiev and @dan-homebrew

I'm confirming that we agree on:
Question 1: Package the CUDA toolkit dependencies into the corresponding engine.
Caveats:

  • CI changes for cortex.llamacpp and cortex.tensorrt-llm to pack CUDA dependencies

Question 2: Store CUDA dependencies under the corresponding engine:

/cortex
    /engines
        /cortex.llamacpp
            /deps                               # CUDA dlls
        /cortex.tensorrt-llm
            /deps                               # CUDA dlls

Caveats:

  • Dynamic library search path on Windows

Additional thought
@vansangpfiev, when we change the CI for the engines, could we also generate a file that contains the engine version and info about its dependencies? This will help with the engine list command in the future.
wdyt? cc @nguyenhoangthuan99

@imtuyethan imtuyethan moved this to Planning in Jan & Cortex Sep 2, 2024
@imtuyethan imtuyethan added the type: planning Opening up a discussion label Sep 2, 2024
@dan-menlo dan-menlo self-assigned this Sep 3, 2024
@dan-menlo dan-menlo moved this from Planning to Scheduled in Jan & Cortex Sep 3, 2024
@dan-menlo dan-menlo moved this from Scheduled to Need Investigation in Jan & Cortex Sep 3, 2024
@dan-menlo dan-menlo changed the title Discussion: Cuda toolkit and engines Discussion: Cortex Engine Dependencies Architecture (e.g. CUDA Toolkit) Sep 3, 2024
@dan-menlo dan-menlo changed the title Discussion: Cortex Engine Dependencies Architecture (e.g. CUDA Toolkit) Discussion: Cortex.cpp Engine Dependencies Architecture (e.g. CUDA Toolkit) Sep 4, 2024
@freelerobot (Contributor) commented Sep 4, 2024

  1. What if llamacpp vs. tensorrt-llm dependencies start to conflict?

  2. Do we care about engine portability? And does adding a dynamic library search path on Windows affect portability?

  3. How will we do maintenance and updates? i.e.
    • a cortex update requires dependency bumps
    • a cortex update doesn't require a dependency bump (easier)

  4. Is this a dumb idea: store CUDA dependencies in a central location, such as a separate deps directory at the project root, and then use symbolic links or environment variables to point to the engine-specific dependencies?

/.cortex
    /deps
        /cuda
            cuda-11.5 or whatever versioning
    /engines
        /cortex.llamacpp
            /bin
        /cortex.tensorrt-llm
            /bin

  5. Are there dependency mgmt tools we can use to manage this better?

@namchuai (Contributor, Author) commented Sep 4, 2024

@0xSage, here are my thoughts. Please correct me if I'm wrong @nguyenhoangthuan99 @vansangpfiev

  1. What if llamacpp vs. tensorrt-llm dependencies start to conflict?
  • Yeah, that's why we are separating the dependencies for cortex.llamacpp and cortex.tensorrt-llm:

    /engines
        /cortex.llamacpp
            /deps
        /cortex.tensorrt-llm
            /deps

  2. Do we care about engine portability? And does adding a dynamic library search path on Windows affect portability?
  • Hmm, I might not fully get what you mean by portability. Regarding the dynamic library search path: on Windows, the program searches for DLLs in the current path (IIRC, the same path as the executable). Since we are about to put the dependencies under cortex.llamacpp/deps and cortex.tensorrt-llm/deps, we need to tell the OS where to look for the DLLs.

  3. How will we do maintenance and updates?
    I think this is a good question. We haven't decided on this yet. WDYT? @vansangpfiev @dan-homebrew @nguyenhoangthuan99 @hiento09

  4. Separating CUDA dependencies into a central location?
    I think this is a good idea, but separate DLLs might cost us some effort to handle properly? I'm not sure. WDYT? @vansangpfiev @dan-homebrew @nguyenhoangthuan99

  5. Are there dependency mgmt tools we can use to manage this better?
    I think not. Currently, we only have cortexcpp, and the default behavior of install is to replace.

@vansangpfiev (Contributor) commented:

For 3, I think we can do maintenance and updates by versioning: generate a file (for example version.txt) for each release, which has metadata for the engine version and CUDA version. We will update the CUDA dependencies if needed.
For 4, I think it is easier for us to locate all CUDA dependencies in the same folder as the engine, because then we don't need to track which CUDA version is used by which engine version.
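
For illustration, a hypothetical version.txt along these lines (the file name comes from the suggestion above; the fields and values are assumptions, not a decided format):

# engines/cortex.llamacpp/version.txt (hypothetical)
engine: cortex.llamacpp
engine_version: 0.1.34
cuda_version: 12.4

The engine list command mentioned earlier could then read this file per engine instead of probing the packaged DLLs.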

@dan-menlo (Contributor) commented Sep 4, 2024

@vansangpfiev @namchuai @0xSage Quick responses:

Per-Engine Dependencies

Is this a dumb idea: store CUDA dependencies in a central location, such as a separate deps directory at the project root, and then use symbolic links or environment variables to point to the engine-specific dependencies.

  • From an architecture perspective, I would like for us to approach this from the standpoint of "each engine manages its own dependencies"
  • I would like to optimize for architectural simplicity at this stage of our library
  • I would like to re-use llama.cpp's bundled CUDA dependencies
  • Shared dependencies can be a subsequent optimization that we tackle down the road as llama.cpp and TensorRT-LLM stabilize. From my POV, they are still changing very rapidly.

I also agree with @vansangpfiev: let's co-locate all CUDA dependencies with the engine folder.

Simple > Complex, especially since model files are >4 GB.

Updating Engines

For 3, I think we can do the maintenance and updates by versioning: generate a file (for example version.txt) for each release, which has metadata for engine version and cuda version. We will update cuda dependencies if needed.

I also think we need to think through the CLI and API commands:

cortex engines update tensorrt-llm
PUT <API URL>? 

Naming

I wonder whether it is better for us to have clearer naming for Cortex engines:

  • llamacpp-engine
  • onnx-engine
  • tensorrt-llm-engine

This articulates the concept of Cortex engines more clearly. Hopefully, with a clear API, the community can also step in to help build backends.

We would need to reason through cortex.python separately.

  • I think engine extensions can be in C++ or in Python
  • cortex.python might be more of a "Python base template", where an Engine Extension dev can define the Python version to bundle

@janhq janhq locked and limited conversation to collaborators Sep 5, 2024
@dan-menlo dan-menlo converted this issue into discussion #1117 Sep 5, 2024
@github-project-automation github-project-automation bot moved this from Need Investigation to Completed in Jan & Cortex Sep 5, 2024
@dan-menlo dan-menlo moved this from Completed to Discontinued in Jan & Cortex Sep 6, 2024

