
Native API hangs when providing system_prompt #3766

Closed
chiefMarlin opened this issue Oct 24, 2023 · 4 comments
Labels
bug Something isn't working

Comments


chiefMarlin commented Oct 24, 2023

Hi,
I am playing around with the native API and it works well when just using the basic example:

curl --request POST \
  --url http://localhost:8080/completion \
  --header "Content-Type: application/json" \
  --data '{"prompt": "Building a website can be done in 10 simple steps:", "n_predict": 128}'

However, if I add the system_prompt parameter to the request, it hangs indefinitely: nothing is printed on the server side and no GPU load is visible in nvtop.

Server command

llama-server --host 0.0.0.0 -m /models/mistral-7b-instruct-v0.1.Q5_K_M.gguf -c 8000 -ngl 100

Server output

llama server listening at http://0.0.0.0:8080
{"timestamp":1698170944,"level":"INFO","function":"main","line":2499,"message":"HTTP server listening","hostname":"0.0.0.0","port":8080}
all slots are idle and system prompt is empty, clear the KV cache
Adding -v to llama-server makes no difference in the output.

This query hangs when system_prompt is used

Query

curl --request POST \
  --url http://127.0.0.1:8080/completion \
  --header "Content-Type: application/json" \
  --data '{
    "prompt": "User: What is your name ?\nAssistant:",
    "system_prompt": {
      "anti_prompt": "User:",
      "assistant_name": "Assistant:",
      "prompt": "You are an angry assistant that swears alot and your name is Bob\n"
    },
    "temperature": 0.8
  }'

Any ideas what I am missing here? What I am trying to achieve is to give the model some context.

What's even stranger is that after trying the above query, simple queries no longer work either; they hang in the same way until the server is restarted.

chiefMarlin added the bug label Oct 24, 2023

chiefMarlin commented Oct 24, 2023

I tried using this instead and it seems to work:

{
	"prompt": "You are an angry assistant named Bob that swears alot\nUser: What is your name ?",
	"anti_prompt": "User:",
	"assistant_name": "Assistant:",
	"temperature": 0.7
}
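
Sent, for example, as (a minimal curl sketch; payload.json is a hypothetical local file holding the JSON body above):

curl --request POST \
  --url http://127.0.0.1:8080/completion \
  --header "Content-Type: application/json" \
  --data @payload.json   # curl's @file syntax reads the request body from payload.json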

According to https://github.com/ggerganov/llama.cpp/blob/master/examples/server/README.md, should it be in this format instead?

{
    "system_prompt": {
        "prompt": "Transcript of a never ending dialog, where the User interacts with an Assistant.\nThe Assistant is helpful, kind, honest, good at writing, and never fails to answer the User's requests immediately and with precision.\nUser: Recommend.........g \"Surely You're Joking, Mr. Feynman!\" and \"What Do You Care What Other People Think?\".\nUser:",
        "anti_prompt": "User:",
        "assistant_name": "Assistant:"
    }
}

If I try to load the system prompt via the -spf argument, it generates some errors:
llama-server -m /models/mistral-7b-instruct-v0.1.Q5_K_M.gguf -c 8000 -ngl 100 -spf prompt.json --host 0.0.0.0

prompt.json

{
  "system_prompt": {
    "prompt": "Transcript of a never ending dialog, where the User interacts with an Assistant.\nThe Assistant is helpful, kind, honest, good at writing, and never fails to answer the User's requests immediately and with precision.\nUser: Recommend a nice restaurant in the area.\nAssistant: I recommend the restaurant \"The Golden Duck\". It is a 5 star restaurant with a great view of the city. The food is delicious and the service is excellent. The prices are reasonable and the portions are generous. The restaurant is located at 123 Main Street, New York, NY 10001. The phone number is (212) 555-1234. The hours are Monday through Friday from 11:00 am to 10:00 pm. The restaurant is closed on Saturdays and Sundays.\nUser: Who is Richard Feynman?\nAssistant: Richard Feynman was an American physicist who is best known for his work in quantum mechanics and particle physics. He was awarded the Nobel Prize in Physics in 1965 for his contributions to the development of quantum electrodynamics. He was a popular lecturer and author, and he wrote several books, including \"Surely You're Joking, Mr. Feynman!\" and \"What Do You Care What Other People Think?\".\nUser:",
    "anti_prompt": "User:",
    "assistant_name": "Assistant:"
  }
}

Error

llama server listening at http://0.0.0.0:8080

{"timestamp":1698173160,"level":"INFO","function":"main","line":2499,"message":"HTTP server listening","hostname":"0.0.0.0","port":8080}
updating system prompt
llama_decode_internal: n_tokens == 0
llama_decode: failed to decode, ret = -1
update_system_prompt: llama_decode() failed
(the four lines above repeat on every system prompt update attempt)

ggerganov (Owner) commented

I'm working on a fix in #3767

So far it no longer blocks on the system prompt update, but somehow the anti-prompt does not seem to work.
This is the first time I'm using the API, so I might be missing something too.

chiefMarlin (Author) commented

Thanks for the update

chiefMarlin (Author) commented

The fix-server-system branch seems to fix the issue; this request now produces a proper output. 👍

{
	"prompt": "User: Tell me about yourself ?",
	"system_prompt": {
		"anti_prompt": "User:",
		"assistant_name": "Assistant:",
		"prompt": "You are an angry assistant that swears alot, your name is Bob\n"
	},
	"temperature": 0.1
}
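
For anyone wanting to verify, a rough sketch of testing the branch (assumes a local llama.cpp checkout with the fix branch available on the remote):

# switch to the fix branch (branch name from the comment above)
git fetch origin
git checkout fix-server-system
# rebuild the server with your usual build command, then restart it as before:
llama-server --host 0.0.0.0 -m /models/mistral-7b-instruct-v0.1.Q5_K_M.gguf -c 8000 -ngl 100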
