Server: Update /props endpoint to correctly return default server parameters #8418
In #8402, we added the ability to set default request parameters on the command line.

One shortcoming of that PR is that the author failed to update the `/props` endpoint, so it was returning bogus information. For example, with a non-default `grammar` and `n_ctx` set on the command line, the endpoint (incorrectly) returns 2048 for `n_ctx` and a blank string for `grammar`.

There were a few possible ways to fix this, but the lowest-friction method was to initialize each slot's sampling parameters during `init()` by copying from the global context's sampling parameters. This is similar to the one-liner method that we used in #8402, but while that one operates at runtime (when the jobs are fired off), this one operates at initialization.

The first slot is then chosen, and its default parameters are serialized to JSON and stored in `default_generation_settings_for_props`, the same as before. It's nice to have this serialized and saved this way, because even if the slot's parameters are overwritten by a later request, the value stored in `default_generation_settings_for_props` will always represent the defaults.

With the fix in place, querying `/props` now returns the correct values of `n_ctx` = 1024 and our non-blank `grammar`. Success!

This solution does not increase memory usage, and I can't think of any edge cases where it falls down. Yesterday when I first tried to fix this, I got wrapped around the axle with an overly complicated approach. I'm glad I slept on it for a day, because I think that today's solution is much more elegant.
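
For reviewers who want the shape of the change without reading the diff, here is a minimal sketch of the init-time copy described above. It is not the actual patch: `gen_params`, `global_params`, the `params` field, and the `to_json` helper are simplified, hypothetical stand-ins, and the hand-built string only approximates the JSON object the real server produces (no escaping is done).

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for the per-request defaults.
struct gen_params {
    int         n_ctx = 2048; // built-in fallback value
    std::string grammar;      // blank unless set on the command line
};

// Hypothetical, simplified slot: only the fields needed for this sketch.
struct server_slot {
    int        id = 0;
    gen_params params;
};

struct server_context {
    gen_params               global_params; // assumed to be filled from the command line
    std::vector<server_slot> slots;
    std::string              default_generation_settings_for_props;

    // Illustrative serializer only; it does not escape special characters.
    static std::string to_json(const gen_params & p) {
        return "{\"n_ctx\":" + std::to_string(p.n_ctx) +
               ",\"grammar\":\"" + p.grammar + "\"}";
    }

    void init(int n_slots) {
        slots.clear();
        for (int i = 0; i < n_slots; ++i) {
            server_slot slot;
            slot.id     = i;
            slot.params = global_params; // copy the defaults once, at initialization
            slots.push_back(slot);
        }
        if (!slots.empty()) {
            // Snapshot the first slot's defaults; later requests that overwrite
            // slot.params do not touch this stored string.
            default_generation_settings_for_props = to_json(slots.front().params);
        }
    }
};
```

In this sketch, setting `global_params.n_ctx = 1024` before calling `init()` means every slot starts with 1024, and the stored snapshot keeps reporting 1024 even after a request changes a slot's own `params`.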
Tagging @ngxson in particular for review on this one.
Thank you!