server : fix templates for llama2, llama3 and zephyr in new UI #8196
Conversation
Just noticed that there is plenty of discussion about the future direction of templating in the server in #4216.
I'm not familiar with the template system in JS, but it seems like there are still some issues. For example, the Llama 3 template never has newlines before the EOT token (
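The point about newlines can be sketched as follows. This is a hedged illustration of the Llama 3 chat format being discussed, under the assumption that the `<|eot_id|>` token directly follows the message content; the helper name is hypothetical.

```python
# Hypothetical sketch (not llama.cpp code): in the assumed Llama 3 format,
# <|eot_id|> comes directly after the message content, with no newline
# in between. The header is followed by a blank line before the content.

def format_llama3_turn(role: str, content: str) -> str:
    # No "\n" is emitted before <|eot_id|> -- the issue the comment raises.
    return f"<|start_header_id|>{role}<|end_header_id|>\n\n{content}<|eot_id|>"

prompt = (
    format_llama3_turn("user", "Hello")
    + format_llama3_turn("assistant", "Hi there")
)
```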
Yes, it would be nice to do so. Currently there is no task to add such a feature. Probably we can extend
@ngxson Thanks - I have removed the extra newlines in llama3. As for the llama2 template, the Making the chat example available through the
I am closing this PR to be able to update my repo without causing too much "noise" (as I accidentally made it from "master"), and to wait for the outcome of the investigation for #8694.
closing for now
This change makes some adjustments to the pre-defined chat templates in the new server UI, which in my interpretation bring them in line with the recommended versions. I have done this for the following templates:

- llama2
- llama3
- zephyr

as these are the models I have some experience with. There might be further similar discrepancies in other templates, but I have not checked those. For the Llama models, I have also removed the start-of-text tokens at the beginning, as they are automatically added by the server, and their duplication leads to a warning message.
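The start-of-text point can be illustrated with a small sketch. This is an assumption-laden example, not server code: the token string and helper name are illustrative, and the only claim taken from the PR is that the server adds the token itself, so the UI template should not repeat it.

```python
# Hedged sketch: if the server prepends the begin-of-sequence token itself,
# a UI template that also starts with it produces a duplicate (and, per the
# PR, a warning). Token string "<s>" and this helper are illustrative only.

def strip_leading_bos(template: str, bos: str = "<s>") -> str:
    # Drop a duplicated BOS at the start of a UI template string, if present.
    return template[len(bos):] if template.startswith(bos) else template

llama2_ui_template = "<s>[INST] {prompt} [/INST]"
fixed = strip_leading_bos(llama2_ui_template)
```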
It would of course be nicer to connect this to the `llama_chat_apply_template()` implementation, so that there is only one set of templates to maintain and test in the codebase, for example by making the server UI use the chat endpoint rather than the completion one. Is anyone already working on this?
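The suggested direction could look roughly like this. A minimal sketch, assuming the UI would switch to the server's OpenAI-compatible chat endpoint, where the client sends structured messages and the server applies the model's template via `llama_chat_apply_template()`; the endpoint path and helper are assumptions about shape, not the final design.

```python
# Hypothetical sketch of the proposed direction: the UI sends structured
# messages to the chat endpoint (assumed to be /v1/chat/completions) and
# keeps no template strings of its own -- templating stays server-side.

CHAT_ENDPOINT = "/v1/chat/completions"  # assumed path

def build_chat_request(messages: list, stream: bool = True) -> dict:
    # The client no longer renders a prompt string; it only ships the
    # role/content message list plus options.
    return {"messages": messages, "stream": stream}

payload = build_chat_request([{"role": "user", "content": "Hello"}])
```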