server : remove system prompt support #9811
Comments
I am using system_prompt along with parallel slots and the built-in chat template of each model. What is the preferred way to keep the current behaviour, a conditioned chatbot for all my users?
What does your context currently look like? Is it:
Kind of. For llama3.1 it looks like <|start_header_id|>system<|end_header_id|>. My understanding is that the {system prompt} is updated at launch or through the API endpoint, and that to replicate the current behaviour I would need to update the template myself.
There are 2 types of system prompts: the server-level system prompt set at launch (or via the endpoint), which is simply prefixed to the context, and the chat-template system message, which is passed with the system role in the request.
@GuillaumeBruand I'm afraid your context likely looks like this:

{system prompt here}
<|start_header_id|>system<|end_header_id|>
<|eot_id|><|start_header_id|>user<|end_header_id|>
Hello<|eot_id|><|start_header_id|>assistant<|end_header_id|>
Hi there<|eot_id|><|start_header_id|>user<|end_header_id|>

This is technically incorrect. The system prompt configured at launch or passed through the API is not applied inside the chat template - this is stated in the server documentation.
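For contrast, a correctly templated llama3.1 context would place the system prompt inside the template's system section rather than in front of it. A sketch of the standard Llama 3.1 chat format (exact whitespace and BOS token omitted):

<|start_header_id|>system<|end_header_id|>
{system prompt here}<|eot_id|><|start_header_id|>user<|end_header_id|>
Hello<|eot_id|><|start_header_id|>assistant<|end_header_id|>
Hi there<|eot_id|><|start_header_id|>user<|end_header_id|>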
@ggerganov Thanks for the clarification. I will update my workflow according to your recommendations so that I no longer rely on this (deprecated?) feature.
@ggerganov I just noticed that the system_prompt feature was removed when I downloaded the new version and now I'm a bit confused. What I understand from the discussion above is that 'system_prompt' never really worked well together with the chat template and actually put the prompt in front of the context formatted with the template. I did not know that and I'm surprised that it worked so well in my setup 😅. Now that this has been removed, does this mean I have to submit the system prompt with every request to my server? I really liked the 'system_prompt' option, because I'm always spinning up the server to use it with the same, rather long, system prompt 😞.
Yes, your understanding is correct. Although it wasn't used properly, I can imagine that it still helped in the intended way when prefixing the system prompt at the very beginning. But the main reason to remove this was to simplify the logic in the server code a bit.
Yes. We can probably think about reintroducing the option and using the CLI system prompt as the default value for the chat template system prompt when it is not passed by the client. But somebody with more experience in chat templates would have to implement this.
Thanks for clarifying! I think it would be really great to reintroduce the system prompt option in some way to "pre-condition" the server. One more question. Do I have to use 'n_keep' in combination with the system prompt if I send it with every request (and the whole chat history)?
No, just send the requests with cache_prompt enabled and the full message history. Generally, 'n_keep' is not needed in this case.
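To make this concrete, here is a minimal client-side sketch that resends the system message and the full chat history with every request. It assumes a llama-server instance on localhost:8080 and its OpenAI-compatible /v1/chat/completions endpoint; the cache_prompt flag is included on the assumption that the server reuses the cached common prefix for it:

import requests

SERVER_URL = "http://localhost:8080/v1/chat/completions"  # default llama-server address (assumed)
SYSTEM_PROMPT = "You are a helpful assistant."             # the long prompt previously set via system_prompt

# The system message always stays at the start of the history.
history = [{"role": "system", "content": SYSTEM_PROMPT}]

def chat(user_message: str) -> str:
    """Append the user turn, send the whole history, and store the reply."""
    history.append({"role": "user", "content": user_message})
    response = requests.post(SERVER_URL, json={
        "messages": history,
        "cache_prompt": True,  # let the server reuse the common prefix (assumed supported)
    })
    response.raise_for_status()
    reply = response.json()["choices"][0]["message"]["content"]
    history.append({"role": "assistant", "content": reply})
    return reply

print(chat("Hello"))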
Awesome, thanks a lot!
Sorry for the slight necromancy, but I would love some clarity on how to handle the system prompt correctly.
What is the built-in llama-server GUI doing when it comes to the system prompt? It handles it perfectly: it responds quickly after the initial response (the prompt is not being re-sent every turn) and it adheres to the prompt with truly zero deviation, which is very important for my application. So I want to know what the llama-server GUI is doing, and I want to replicate exactly that with my own calls to the API. EDIT: never mind, I inspected the network traffic for the payload, but if there is anything else someone can recommend, input is welcome.
The web ui is simply filling the correct chat template. For example, using Qwen:

{
"messages": [
{
"role": "system",
"content": "You are a helpful assistant."
},
{
"id": 1736669686387,
"role": "user",
"content": "Hello"
},
{
"id": 1736669686390,
"role": "assistant",
"content": "Hello! How can I assist you today?",
"timings": {
"prompt_n": 20,
"prompt_ms": 696.321,
"predicted_n": 13,
"predicted_ms": 670.689
}
},
{
"id": 1736669772617,
"role": "user",
"content": "Just showing an example of system prompt usage."
}
],
"stream": true,
"cache_prompt": true,
...
}

The system role is the message type that contains the system prompt. Just make sure to add it to the start of your request and it should work as expected. Add -lv 1 to llama-server to inspect the received requests from the web ui and understand better what data is being sent.

p.s. @ngxson While writing this answer, I noticed that the web ui sends back "timings" information for previous messages. Think we should remove these from the requests.
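For instance, a launch command with verbose logging might look like the following (the model path and port are placeholders for illustration):

llama-server -m ./models/qwen2.5-7b-instruct-q4_k_m.gguf --port 8080 -lv 1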
Let me give an example of several scenarios I want to implement. Let's assume the startup prompt is 'a', the user system prompt is 'b', and the user prompt is 'c'.
1. When the user does not input a system prompt: the total input will be 'a' + 'c'.
2. When the user inputs a system prompt: the total input will be 'b' + 'c' or 'a' + 'b' + 'c'.
The purpose of this approach is to enhance my friends' experience with open-source AI and make it more enjoyable. Teaching them how to set up prompts and other configurations is often too cumbersome, and they tend to give up quickly. Using a startup prompt effectively simplifies the process and improves the overall user experience.
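One way to get this behaviour with the current server is to do the defaulting on the client side. Below is a minimal sketch under that assumption: DEFAULT_SYSTEM_PROMPT plays the role of the startup prompt 'a' and is only used when the caller does not provide their own system prompt 'b' (scenario 1, and the 'b' + 'c' variant of scenario 2); the server address and helper name are illustrative, not part of llama.cpp:

import requests

SERVER_URL = "http://localhost:8080/v1/chat/completions"  # assumed llama-server address
DEFAULT_SYSTEM_PROMPT = "a"  # the 'startup prompt' baked into the client

def ask(user_prompt: str, user_system_prompt: str | None = None) -> str:
    """Build a single-turn request, falling back to the default system prompt when none is given."""
    system_prompt = user_system_prompt if user_system_prompt else DEFAULT_SYSTEM_PROMPT
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]
    response = requests.post(SERVER_URL, json={"messages": messages})
    response.raise_for_status()
    return response.json()["choices"][0]["message"]["content"]

print(ask("c"))                          # scenario 1: default startup prompt 'a' + user prompt 'c'
print(ask("c", user_system_prompt="b"))  # scenario 2: user system prompt 'b' + user prompt 'c'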
The "system_prompt" related functionality is quite outdated and is introducing unnecessary complexity. It only sort of makes sense for non-finetuned models in order to save the computation of a common prefix when there are multiple parallel slots. But in practice, only finetuned models are utilized for this use case and they always require a chat template, which is incompatible with the current implementation of the system prompt. So in order to simplify the code a bit, we should remove the system prompt related functionality from the server.