
bug: [0.5.12] Cannot disable GPU offloading #4369

Closed
pguyennet opened this issue Dec 30, 2024 · 12 comments
Labels: category: hardware, type: bug (Something isn't working)

Comments

@pguyennet

Jan version

0.5.12

Describe the Bug

Can you help me disable GPU offloading? I am talking about this setting:

screenshot-2024-12-30-21-19-12

I want to set it to 0.

In settings, GPU acceleration is disabled:

screenshot-2024-12-30-21-21-04

Thanks!

Steps to Reproduce

1. Load a model from GGUF.
2. Locate the ngl slider (bottom right, in the model tab).
3. Try to disable it / set it to 0.

Screenshots / Logs

No response

What is your OS?

  • MacOS
  • Windows
  • Linux
@pguyennet pguyennet added the type: bug Something isn't working label Dec 30, 2024
@github-project-automation github-project-automation bot moved this to Investigating in Jan & Cortex Dec 30, 2024
@imtuyethan
Contributor

@pguyennet Thanks for reporting this! Let me explain what's happening:
The NGL (Number of GPU Layers) slider depends on GPU acceleration being enabled; that's why you can't set it to 0 when GPU is disabled. If you want to run fully on CPU, you only need to disable GPU acceleration, and NGL has no effect in that case.

This makes sense technically, but I totally agree the UX could be clearer!
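For context on what the slider controls: ngl partitions the model's transformer layers between GPU and CPU, and everything not offloaded runs on the CPU. A minimal sketch of that partitioning logic (illustrative only, not Jan's actual code; `split_layers` is a hypothetical helper):

```python
# Illustrative sketch only: how an "ngl" (number of GPU layers)
# setting typically partitions a model's layers.
def split_layers(total_layers: int, ngl: int) -> tuple[int, int]:
    """Return (gpu_layers, cpu_layers) for a given ngl value."""
    gpu = min(max(ngl, 0), total_layers)  # clamp ngl to [0, total_layers]
    return gpu, total_layers - gpu

print(split_layers(32, 0))   # (0, 32): everything stays on the CPU
print(split_layers(32, 50))  # (32, 0): clamped, fully offloaded
```

With GPU acceleration disabled, the effective split is always the ngl = 0 case, which is why the slider is expected to be a no-op there.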

@pguyennet
Author

Thanks for your answer @imtuyethan! Can you tell me how to truly disable GPU acceleration, then? Because in my screenshot it is disabled, yet the option is still there.

If you mean that this option doesn't do anything when GPU acceleration is disabled, can you tell me why it affects my inference speed? Here are the figures I am talking about:

ngl = 1 -> 17.8 tok/s
ngl = 50 -> 13.7 tok/s
ngl = 100 -> refuses to answer

Note: on Ollama's avx-512 CPU runner (ngl = 0!) I get 32 tok/s. Thanks!

@louis-jan
Contributor

@pguyennet Could you please share the log files and the settings.json file located in the app data folder? We'll investigate from there.

CleanShot 2025-01-02 at 09 02 58@2x

@pguyennet
Author

Hey, sure, here are the files as requested:

settings.json
app.log

Thanks again! I love your work; the only thing preventing me from switching is the lower inference speed compared to Ollama.

@louis-jan
Contributor

Hi @pguyennet, there's another log file named cortex.log in ~/.config/Jan/data/logs. Could you please upload it too?

@imtuyethan imtuyethan added this to the v0.5.14 milestone Jan 3, 2025
@imtuyethan imtuyethan changed the title bug: Cannot disable GPU offloading bug: [0.5.12] Cannot disable GPU offloading Jan 3, 2025
@pguyennet
Author

Hi @louis-jan, here you go:

cortex.log
Thanks!

@louis-jan
Contributor

louis-jan commented Jan 4, 2025

@pguyennet Can you help me find the model yml file in the app data folder (models/source/author/repo..) and remove the ngl: line? Then create a new thread to see if the setting is gone.

Also, which quantized version of the model are you using, and what are your inference parameters (context_length, cpu_threads) on both sides? It seems your CPU doesn't support avx-512, only avx2 (which is backward compatible), correct me if I'm wrong.

> avx-512 cpu runner (ngl = 0 !) I've got 32 tok/s
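For anyone following along, the line to remove would look roughly like this in the model yml file (illustrative excerpt; the surrounding field names and values below are assumptions, so check your own file):

```yaml
# Hypothetical excerpt of a model yml in the app data folder
# (models/source/author/repo..); exact fields may differ.
ngl: 100        # number of GPU layers; this is the line to delete
ctx_len: 2048   # other settings stay as they are
```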

@louis-jan
Contributor

> Hi @louis-jan here you go :
>
> cortex.log Thanks !

Thanks!

@pguyennet
Author

> @pguyennet Can you help me find the model yml file in the app data folder (models/source/author/repo..) and remove the ngl: line? Create a new thread to see if it's removed.
>
> What quantized version of the model are you using and inference parameters such as context_length, cpu_threads? on both sides. It seems you don't have avx-512 support but avx2 (but ye it backward compatible), cmiiw.
>
> avx-512 cpu runner (ngl = 0 !) I've got 32 tok/s

Hey @louis-jan, hope you had a nice weekend!

So I removed the ngl line in the model.yml file and then created a new thread. The ngl option is still there, but at 0! The output rate is still around 17-18 tokens/s, but the big change is that modifying the ngl value no longer affects the output rate.

I tried deleting and reimporting the model, but I can't reproduce the affected output rate problem.

Also, Jan auto-updated to 0.5.13, so I reinstalled 0.5.12 to try again, and I still can't reproduce the problem. I don't know what changed, but it seems like there was a misconfiguration somewhere. I should have started with that; sorry for taking your time.

Do you still want the model info and inference parameters?
Or do you want to close this issue, since I can't reproduce my problem anymore?
(It seems like the problem isn't the ngl option now; it's just that the Jan engine's inference speed is lower than Ollama's.)

Anyway, thanks for your time!

@louis-jan
Contributor

Hi @pguyennet, I'd like to close this, but could you share some details about the model quantization you use when running on Jan and the model you use with Ollama? I'd like to reproduce it myself here.

@louis-jan louis-jan closed this as not planned (won't fix, can't repro, duplicate, stale) Jan 6, 2025
@github-project-automation github-project-automation bot moved this from Investigating to QA in Jan & Cortex Jan 6, 2025
@pguyennet
Author

Hey, sure @louis-jan, the model is granite-moe-3b at q8_0, from here: huggingface. I have a Ryzen 7 6850U CPU (8C/16T) and 16 GB of RAM.

Here are my Ollama settings (ollama show output):

  Model
    architecture        granitemoe
    parameters          3.3B
    context length      131072
    embedding length    1536
    quantization        Q8_0

  Parameters
    num_ctx       2048
    num_thread    8

And here are my Jan settings (I don't know how to export settings, so here is a screenshot):
GPU acceleration is disabled.

screenshot-2025-01-06-17-13-04

Hope this helps!

@louis-jan
Contributor

> Hey sure @louis-jan the model is granite-moe-3b at q8_0 from here : huggingface. [...]
>
> Hope this helps !

Awesome, thanks @pguyennet

Status: QA · No branches or pull requests · 3 participants