[Ollama] certain models are not loaded with correct n_ctx #58

Open
blakkd opened this issue Jan 5, 2025 · 10 comments

Comments

@blakkd

blakkd commented Jan 5, 2025

I have 2 daily driver models: qwen2.5-coder:32b-instruct-q4_K_M and qwq:32b-preview-q4_K_M.
Both are used in other applications.

[Screenshot]

The issue is that the Ollama num_ctx parameter is not always respected.
I don't know what triggers this behavior.

~ ❯❯❯ ollama show --modelfile qwq:32b-preview-q4_K_M_16k_flash_fullgpu_step_0.4                (base) 
# Modelfile generated by "ollama show"
# To build a new Modelfile based on this, replace FROM with:
# FROM qwq:32b-preview-q4_K_M_16k_flash_fullgpu_step_0.4

[...]

SYSTEM You should think step by step.
PARAMETER num_gpu 65
PARAMETER stop <|im_start|>
PARAMETER stop <|im_end|>
PARAMETER temperature 0.4
PARAMETER num_ctx 16384

This one loads correctly with n_ctx = 16384 (ollama serve logs):

[...]
llama_new_context_with_model: n_ctx         = 16384
[...]

BUT

~ ❯❯❯ ollama show --modelfile qwen2.5-coder:32b-instruct-q4_K_M_16k_flash_fullgpu_0.6          (base) 
# Modelfile generated by "ollama show"
# To build a new Modelfile based on this, replace FROM with:
# FROM qwen2.5-coder:32b-instruct-q4_K_M_16k_flash_fullgpu_0.6

[...]

PARAMETER mirostat 0
PARAMETER num_ctx 16384
PARAMETER num_gpu 65
PARAMETER temperature 0.6

This one loads with n_ctx = 8192 instead (ollama serve logs):

[...]
llama_new_context_with_model: n_ctx         = 8192
[...]
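For reference, a minimal sketch of how a derived model like the ones above is typically built (the Modelfile path is illustrative):

# Build a derived model from a local Modelfile that pins num_ctx and other parameters
ollama create qwen2.5-coder:32b-instruct-q4_K_M_16k_flash_fullgpu_0.6 -f ./Modelfile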
@lee88688
Owner

lee88688 commented Jan 7, 2025

Aider can configure the context size, but this plugin doesn't expose that setting to the user, so I think this may be what causes the issue. Does this affect your results?

@blakkd
Author

blakkd commented Jan 7, 2025

Yes, I think this is the cause, because I just noticed I have the same issue with aider alone.
Same thing, specifically with qwen2.5-32b-coder; I don't know what's going on.
That said, in aider itself, setting num_ctx in .aider.model.settings.yml works as intended:

- name: aider/extra_params
  extra_params:
    extra_headers:
      Custom-Header: value
    num_ctx: 16384

This correctly results in all my models being loaded with a 16384-token window.

So yes, we should have a way to set this in aider-composer too, for better control.
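Presumably, those extra_params end up as Ollama's per-request options: Ollama's native chat endpoint accepts options.num_ctx on each request, which would explain why this setting wins over the Modelfile value. A minimal sketch (model name and prompt are illustrative):

# Per-request context override through Ollama's native chat endpoint
curl http://localhost:11434/api/chat -d '{
  "model": "qwen2.5-coder:32b-instruct-q4_K_M",
  "messages": [{"role": "user", "content": "hi"}],
  "stream": false,
  "options": {"num_ctx": 16384}
}'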

@blakkd
Author

blakkd commented Jan 7, 2025

And thanks for such a quick response!

@lee88688
Owner

lee88688 commented Jan 8, 2025

> Yes, I think this is the cause, because I just noticed I have the same issue with aider alone. Same thing, specifically with qwen2.5-32b-coder; I don't know what's going on. That said, in aider itself, setting num_ctx in .aider.model.settings.yml works as intended:
>
>     - name: aider/extra_params
>       extra_params:
>         extra_headers:
>           Custom-Header: value
>         num_ctx: 16384
>
> This correctly results in all my models being loaded with a 16384-token window.
>
> So yes, we should have a way to set this in aider-composer too, for better control.

This is good advice, but I am currently working on other features, and how to design the settings may be a big problem.
By the way, are there other ways to solve this issue?
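For completeness, Ollama's interactive REPL offers another way to bake num_ctx into a saved model, though if the client overrides it per request this would presumably be overridden too. A minimal sketch (the saved model name is illustrative):

ollama run qwen2.5-coder:32b-instruct-q4_K_M
>>> /set parameter num_ctx 16384
>>> /save qwen2.5-coder-16k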

@blakkd
Author

blakkd commented Jan 8, 2025

Are you foreseeing problems because there are 2 separate models?
What about adding a global num_ctx field (which would set it for every model), e.g. between Model and API Key in the sidebar? Or hiding it in the VSCode settings, though maybe that's a bit dirty.
Unfortunately, I don't see other options. That said, hopefully only a few models are affected, so I think there is no urgency ;)

@lee88688
Owner

lee88688 commented Jan 9, 2025

Thanks. Since this may not be urgent, there is more time; maybe others will have a better solution.

@blakkd blakkd closed this as not planned Jan 9, 2025
@blakkd
Author

blakkd commented Jan 9, 2025

Maybe I've been a bit hasty closing this; I'll leave it up to you.

@blakkd blakkd reopened this Jan 9, 2025
@blakkd
Author

blakkd commented Jan 9, 2025

@lee88688 By the way, are you reachable somewhere through PM? I've got a little something I want to share related to your project.

@lee88688
Owner

lee88688 commented Jan 9, 2025

@blakkd You can contact me via the email in my profile.

@blakkd
Author

blakkd commented Jan 9, 2025 via email
