
Native API hangs when providing system_prompt #3766

Closed
chiefMarlin opened this issue Oct 24, 2023 · 4 comments
Labels
bug Something isn't working

Comments


chiefMarlin commented Oct 24, 2023

Hi,
I am playing around with the native API and it works well when just using the basic example:

curl --request POST \
  --url http://localhost:8080/completion \
  --header "Content-Type: application/json" \
  --data '{"prompt": "Building a website can be done in 10 simple steps:", "n_predict": 128}'

However, if I add the system_prompt parameter to the request, it hangs indefinitely: nothing is printed on the server side and no GPU load is visible in nvtop.

Server command

llama-server --host 0.0.0.0 -m /models/mistral-7b-instruct-v0.1.Q5_K_M.gguf -c 8000 -ngl 100

Server output

llama server listening at http://0.0.0.0:8080
{"timestamp":1698170944,"level":"INFO","function":"main","line":2499,"message":"HTTP server listening","hostname":"0.0.0.0","port":8080}
all slots are idle and system prompt is empty, clear the KV cache
Adding -v to llama-server makes no difference in the output.

This query hangs when system_prompt is used

Query

curl --request POST \
  --url http://127.0.0.1:8080/completion \
  --header "Content-Type: application/json" \
  --data '{
    "prompt": "User: What is your name ?\nAssistant:",
    "system_prompt": {
      "anti_prompt": "User:",
      "assistant_name": "Assistant:",
      "prompt": "You are an angry assistant that swears alot and your name is Bob\n"
    },
    "temperature": 0.8
  }'

Any ideas what I am missing here? What I am trying to achieve is to give the model some context.

What's even stranger is that after trying the above query, simple queries no longer work either; they hang in the same way until the server is restarted.

chiefMarlin added the bug label Oct 24, 2023

chiefMarlin commented Oct 24, 2023

I tried using this instead and it seems to work:

{
	"prompt": "You are an angry assistant named Bob that swears alot\nUser: What is your name ?",
	"anti_prompt": "User:",
	"assistant_name": "Assistant:",
	"temperature": 0.7
}
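
Sent, for example, as (a minimal curl sketch; payload.json is a hypothetical local file holding the JSON body above):

curl --request POST \
  --url http://127.0.0.1:8080/completion \
  --header "Content-Type: application/json" \
  --data @payload.json   # curl's @file syntax reads the request body from payload.json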

According to https://github.com/ggerganov/llama.cpp/blob/master/examples/server/README.md, should it be in this format instead?

{
    "system_prompt": {
        "prompt": "Transcript of a never ending dialog, where the User interacts with an Assistant.\nThe Assistant is helpful, kind, honest, good at writing, and never fails to answer the User's requests immediately and with precision.\nUser: Recommend.........g \"Surely You're Joking, Mr. Feynman!\" and \"What Do You Care What Other People Think?\".\nUser:",
        "anti_prompt": "User:",
        "assistant_name": "Assistant:"
    }
}

If I try to load the system prompt via the -spf argument, it generates some errors:
llama-server -m /models/mistral-7b-instruct-v0.1.Q5_K_M.gguf -c 8000 -ngl 100 -spf prompt.json --host 0.0.0.0

prompt.json

{
  "system_prompt": {
    "prompt": "Transcript of a never ending dialog, where the User interacts with an Assistant.\nThe Assistant is helpful, kind, honest, good at writing, and never fails to answer the User's requests immediately and with precision.\nUser: Recommend a nice restaurant in the area.\nAssistant: I recommend the restaurant \"The Golden Duck\". It is a 5 star restaurant with a great view of the city. The food is delicious and the service is excellent. The prices are reasonable and the portions are generous. The restaurant is located at 123 Main Street, New York, NY 10001. The phone number is (212) 555-1234. The hours are Monday through Friday from 11:00 am to 10:00 pm. The restaurant is closed on Saturdays and Sundays.\nUser: Who is Richard Feynman?\nAssistant: Richard Feynman was an American physicist who is best known for his work in quantum mechanics and particle physics. He was awarded the Nobel Prize in Physics in 1965 for his contributions to the development of quantum electrodynamics. He was a popular lecturer and author, and he wrote several books, including \"Surely You're Joking, Mr. Feynman!\" and \"What Do You Care What Other People Think?\".\nUser:",
    "anti_prompt": "User:",
    "assistant_name": "Assistant:"
  }
}

Error

llama server listening at http://0.0.0.0:8080

{"timestamp":1698173160,"level":"INFO","function":"main","line":2499,"message":"HTTP server listening","hostname":"0.0.0.0","port":8080}
updating system prompt
llama_decode_internal: n_tokens == 0
llama_decode: failed to decode, ret = -1
update_system_prompt: llama_decode() failed
(the four lines above repeat on every system prompt update attempt)

ggerganov (Owner) commented

I'm working on a fix in #3767

So far it no longer blocks on the system prompt update, but somehow the anti-prompt does not seem to work.
This is the first time I'm using the API, so I might be missing something too.

chiefMarlin (Author) commented

Thanks for the update

chiefMarlin (Author) commented

The fix-server-system branch seems to fix the issue; this request now produces a proper output. 👍

{
	"prompt": "User: Tell me about yourself ?",
	"system_prompt": {
		"anti_prompt": "User:",
		"assistant_name": "Assistant:",
		"prompt": "You are an angry assistant that swears alot, your name is Bob\n"
	},
	"temperature": 0.1
}
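
For anyone wanting to verify, a rough sketch of testing the branch (assumes a local llama.cpp checkout with the fix branch available on the remote):

# switch to the fix branch (branch name from the comment above)
git fetch origin
git checkout fix-server-system
# rebuild the server with your usual build command, then restart it as before:
llama-server --host 0.0.0.0 -m /models/mistral-7b-instruct-v0.1.Q5_K_M.gguf -c 8000 -ngl 100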
