bug: [0.5.12] Cannot disable GPU offloading #4369
Comments
@pguyennet Thanks for reporting this! Let me explain what's happening: This makes sense technically, but I totally agree the UX could be clearer!
Thanks for your answer @imtuyethan! Can you tell me how to truly disable GPU acceleration, then? Because in my screenshot it is disabled, but the option is still there? If you mean that this option doesn't do anything when GPU acceleration is disabled, can you tell me why it affects my inference speed? Here are the figures I am talking about: ngl = 1 -> 17.8 tok/s. Note: with Ollama's AVX-512 CPU runner (ngl = 0!) I get 32 tok/s. Thanks!
@pguyennet Could you please share the log files and the settings.json file located in the app data folder? We'll investigate then.
Hey, sure, here are the files as requested: Thanks again! I love your work; the sole thing that prevents me from switching is the lower inference speed compared to Ollama.
Hi @pguyennet, there's another log file named cortex.log; could you share that one as well?
Hi @louis-jan, here you go: cortex.log
@pguyennet Can you help me find the model yml file in the app data folder (models/source/author/repo..) and remove the ngl line? What quantized version of the model are you using, and what inference parameters (context_length, cpu_threads, etc.) are set on both sides? It seems you don't have AVX-512 support, only AVX2 (though yes, it's backward compatible), cmiiw.
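For anyone following along, a rough sketch of what the relevant part of a model.yml might look like; only the ngl field is confirmed by this thread, the other field names and values are illustrative and your actual file may differ:

```yaml
# Hypothetical model.yml excerpt -- only ngl is confirmed in this thread.
name: granite-moe-3b   # illustrative
engine: llama-cpp      # illustrative
ctx_len: 4096          # illustrative context length
ngl: 33                # layers offloaded to GPU; deleting this line lets Jan fall back to its default
```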
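As a quick way to verify the AVX point above, on Linux you can list the CPU's AVX-family flags (a generic check, not specific to Jan):

```sh
# Print the unique AVX-family flags; avx512* entries appear only on AVX-512 capable CPUs.
grep -o 'avx[0-9a-z_]*' /proc/cpuinfo | sort -u
```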
Thanks!
Hey @louis-jan, hope you had a nice weekend! So I removed the ngl line in the model.yml file and then created a new thread. The ngl option is still there, but at 0! Output rate is still around 17-18 tokens/s, but the big change is that when I modify the ngl value it doesn't affect the output rate anymore. I tried deleting and reimporting the model, but I can't reproduce the affected output rate problem. Also, Jan auto-updated to 0.5.13, so I reinstalled 0.5.12 to try again, and I still can't reproduce the problem. I don't know what changed, but it seems like there was a misconfiguration somewhere. I should have started with that; sorry for taking your time. Do you still want the model info and inference parameters? Anyway, thanks for your time!
Hi @pguyennet, I'd like to close this, but could you share some details about the model quantization version you use when running on Jan and the model you use with Ollama? I'd like to reproduce it myself here.
Hey, sure @louis-jan, the model is granite-moe-3b at q8_0, from here: huggingface. I have a Ryzen 7 6850U CPU (8C/16T) and 16 GB of RAM. Here are my Ollama settings (ollama show info):
And here are my Jan settings (idk how to export settings, so here is a screenshot): Hope this helps!
Awesome, thanks @pguyennet
Jan version
0.5.12
Describe the Bug
Can you help me disable GPU offloading? I am talking about this setting:
I want to set it to 0.
In settings, GPU is disabled:
Thanks!
Steps to Reproduce
1. Load a model from GGUF
2. Locate the ngl slider (bottom right, in the model tab)
3. Try to disable it / set it to 0 (see the sketch below)
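For context, this slider corresponds to the number of GPU-offloaded layers in the llama.cpp-based engine Jan runs on. A minimal sketch of the equivalent with a llama.cpp CLI build, assuming a local GGUF file at ./model.gguf (path and prompt are placeholders):

```sh
# -ngl 0 (--n-gpu-layers 0) keeps every layer on the CPU, i.e. GPU offloading fully disabled.
./llama-cli -m ./model.gguf -ngl 0 -p "Hello"
```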
Screenshots / Logs
No response
What is your OS?