
/props endpoint: provide context size through default_generation_settings #1237

Merged

Conversation

kallewoof

It seems like a good idea to be able to pull the context size directly from the backend, rather than relying on users to manually keep both sides in sync.

This adds the default_generation_settings dictionary with a single (for now) n_ctx entry to the /props endpoint response, which allows clients to update their context size to match the max context size used by koboldcpp.

It may be useful to include other default generation settings as well, but I am keeping it minimal for now.
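
For illustration, a minimal client-side sketch of consuming the new field; the host/port and the exact response shape are assumptions based on this PR's description and llama.cpp's /props format:

```python
import requests

# Assumption: koboldcpp serving on its default port, 5001.
props = requests.get("http://localhost:5001/props").json()

# With this PR, /props carries a default_generation_settings dict
# alongside the existing chat_template and total_slots fields.
n_ctx = props["default_generation_settings"]["n_ctx"]
print(f"backend max context size: {n_ctx}")
```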

@LostRuins (Owner) commented Nov 25, 2024

Does this adhere to the format for the /props endpoint set by llama.cpp?

Because otherwise you can already use the koboldcpp endpoints /api/extra/true_max_context_length and /api/latest/config/max_context_length.

If nobody else is using it, I would prefer not to create redundant endpoints.
https://lite.koboldai.net/koboldcpp_api#/api%2Fv1/get_api_v1_config_max_context_length
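
For reference, a sketch against those pre-existing endpoints; the `{"value": N}` response shape and the default port are assumptions taken from the linked API documentation:

```python
import requests

base = "http://localhost:5001"  # assumed default koboldcpp port

# Both endpoints are assumed to return {"value": <int>} per the
# linked koboldcpp API documentation.
true_max = requests.get(f"{base}/api/extra/true_max_context_length").json()
cfg_max = requests.get(f"{base}/api/latest/config/max_context_length").json()
print(true_max["value"], cfg_max["value"])
```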

@kallewoof (Author) commented Nov 25, 2024

> Does this adhere to the format for the /props endpoint set by llama.cpp?

Yes. Adding this field to the pre-existing /props endpoint makes the two compatible in terms of fetching the backend's max context size.

> Because otherwise you can already use the koboldcpp endpoints /api/extra/true_max_context_length and /api/latest/config/max_context_length

Didn't realize that.

> If nobody else is using it, I would prefer not to create redundant endpoints. https://lite.koboldai.net/koboldcpp_api#/api%2Fv1/get_api_v1_config_max_context_length

  1. This actually extends an existing endpoint (which currently returns chat_template and total_slots). In a way it adds the missing third field that llama.cpp provides, default_generation_settings, but limits its contents (for now) to the context size.
  2. Using the pre-existing koboldcpp endpoints would achieve the same goal, so I am neutral about merging this.
  3. The benefit of merging is that it makes the /props endpoint complete in terms of "root" fields and makes fetching the max context size completely transparent for clients (if we disregard version handling); see the sketch below.
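
A hypothetical client sketch of that transparency: prefer the llama.cpp-compatible /props field and fall back to the koboldcpp-specific endpoint on builds without this PR. fetch_max_context is an illustrative helper, not part of either API, and the `{"value": N}` fallback shape is an assumption from the linked docs:

```python
import requests

def fetch_max_context(base_url: str) -> int:
    # Preferred path: the field this PR adds, matching llama.cpp's /props.
    props = requests.get(f"{base_url}/props").json()
    settings = props.get("default_generation_settings")
    if settings and "n_ctx" in settings:
        return settings["n_ctx"]
    # Fallback: koboldcpp-specific endpoint (assumed {"value": N} shape).
    resp = requests.get(f"{base_url}/api/extra/true_max_context_length").json()
    return resp["value"]

print(fetch_max_context("http://localhost:5001"))  # assumed default port
```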

@LostRuins (Owner) left a comment

alright then

@LostRuins merged commit fd320f6 into LostRuins:concedo_experimental on Nov 26, 2024
@kallewoof deleted the 202411-props-gensettings branch on Nov 26, 2024