server: phi-3 end token not handled? #6903

Closed
infozzdatalabs opened this issue Apr 25, 2024 · 13 comments

Comments

@infozzdatalabs

The Phi-3 4k model includes the end token "<|end|>" in every response.

I'm using https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-gguf with the latest version of the llama.cpp CUDA server Docker image.

Thanks in advance.

@x4080

x4080 commented Apr 25, 2024

I can confirm this, and the Llama 3 template is affected too. It seems a change in llama.cpp means utils.hpp no longer includes the stop token.

@thecivilizedgamer

Seeing the same issue with both Phi 3 and Llama 3, using the server with the latest changes. I had to roll back to an older commit to get Llama 3 working properly again.

@thecivilizedgamer

thecivilizedgamer commented Apr 25, 2024


Did you try Llama 3 with the latest commit? I was just made aware that it should have been fixed by PR #6860.

I pulled the latest changes and tried again just now, and Llama 3 is working again for me. But Phi 3 still has issues with the stop token in the server, at least for chat completions:

Hello there! It's great to interact with you. How can I assist you today?<|end|>

Edit: I didn't try a newer quant, so it might be an issue with the specific model file I'm using.

@teleprint-me
Contributor

teleprint-me commented Apr 26, 2024

I've seen this with every model I've used so far and will have to test further. I've been busy with work, but passing the stop tokens through the stop option makes the problem go away, e.g. Mistral </s>, Llama 3 <|eot_id|>, Phi-3 <|end|>, etc.

@x4080

x4080 commented Apr 26, 2024

My temporary workaround is to re-add the following lines to utils.hpp:

    llama_params["stop"].push_back("<|eot_id|>"); // Llama 3 end-of-turn token
    llama_params["stop"].push_back("<|end|>");    // Phi-3 end token
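The same workaround can be sketched as a standalone helper that appends the Llama 3 and Phi-3 end tokens to whatever stop list a request already carries, skipping any the client already supplied. This is an illustration under assumptions: `with_default_stops` is a hypothetical name, not part of llama.cpp, and the two token strings are the ones listed in this thread.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper (not part of llama.cpp): add the Llama 3 and Phi-3
// end-of-turn strings to a stop list, without duplicating entries the
// client already provided.
std::vector<std::string> with_default_stops(std::vector<std::string> stops) {
    const std::vector<std::string> defaults = {
        "<|eot_id|>", // Llama 3
        "<|end|>",    // Phi-3
    };
    for (const auto & s : defaults) {
        if (std::find(stops.begin(), stops.end(), s) == stops.end()) {
            stops.push_back(s);
        }
    }
    return stops;
}
```

Deduplicating keeps the list clean when a client already sends `<|end|>` in its own `stop` array, which is the per-request approach suggested elsewhere in the thread.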

@phymbert
Collaborator

You can pass the stop tokens in the payload:

https://github.com/ggerganov/llama.cpp/pull/6916/files

@teleprint-me
Contributor

It's in the API for both server and main examples.

stop: Specify a JSON array of stopping strings. These words will not be included in the completion, so make sure to add them to the prompt for the next iteration. Default: []

REST API Docs
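The documented `stop` behavior (generation ends at a stopping string, and that string is not included in the completion) amounts to truncating the text at the earliest match. Below is a minimal sketch, assuming the complete output is available at once. `truncate_at_stop` is a hypothetical name, not the server's actual implementation.

```cpp
#include <string>
#include <vector>

// Hypothetical sketch: cut the text at the earliest occurrence of any stop
// string and drop the stop string itself, as the REST docs describe.
std::string truncate_at_stop(const std::string & text,
                             const std::vector<std::string> & stops) {
    std::size_t cut = std::string::npos;  // npos = "no stop string found yet"
    for (const auto & s : stops) {
        if (s.empty()) continue;          // an empty stop would match at 0
        const std::size_t pos = text.find(s);
        if (pos < cut) cut = pos;         // npos is the largest size_t
    }
    return cut == std::string::npos ? text : text.substr(0, cut);
}
```

Applied to the Phi-3 chat output reported earlier, `truncate_at_stop(text, {"<|end|>"})` drops the trailing `<|end|>`. A streaming server also has to hold back text that could be the beginning of a stop string split across chunks, and this sketch ignores that case.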

@Any-Winter-4079

Any-Winter-4079 commented Apr 27, 2024

I am running into this issue, but with ./main:

./main -m models/Phi-3-mini-4k-instruct-gguf/Phi-3-mini-4k-instruct-q4.gguf -n 1024 -e -c 4096 -ngl -1 -r "['<|end|>']" -p '1-2='

1-2=0<|end|><|assistant|> The equation 1-2=0 is incorrect. The correct result of 1-2 is -1. Therefore, the equation 1-2 ≠ 0 is true.<|end|><|assistant|> Yes, that is correct. The equation 1-2 does not equal 0; instead, it equals -1. The equation 1-2 = 0 is false.<|end|><|assistant|> Absolutely, you've got it right. The expression 1-2 indeed equals -1, not 0. So, the equation 1-2 = 0 is not true.<|end|><|endoftext|> [end of text]

Commit: 8a56075

On commit 928e0b7 I get gibberish, which is probably related to #6944.

It looks like -r is not being handled correctly, even when the end token is added manually?

Another example, using the chat template:

./main -m models/Phi-3-mini-4k-instruct-gguf/Phi-3-mini-4k-instruct-fp16.gguf -n 256 -e -c 4096 -ngl -1 -r "[<|end|>]" -p '<|user|>Result of 1-2:<|end|>\n<|assistant|>'

<|user|>Tell me the final result of 1-2:<|end|>
<|assistant|> The result of 1 - 2 is -1.<|end|><|assistant|> When you subtract 2 from 1, you get -1. Here's the calculation:

1 - 2 = -1

Subtraction is the operation of removing one quantity from another. So if you have 1 item and remove 2 items, you cannot do so directly; instead, you end up with a deficit, represented by -1 in this context.<|end|><|endoftext|> [end of text]

Or am I doing something wrong here?
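One possible factor is worth checking. If `-r` takes a single literal string per occurrence and does not parse a JSON-style list, the bracketed values used above (`"['<|end|>']"` and `"[<|end|>]"`) would include the brackets and quotes in the pattern, so they could never match output ending in `<|end|>`. The hypothetical helper below illustrates that literal comparison. It is an assumption about the matching rule, not main's actual code, which may for example search a window of recent output rather than only the exact end.

```cpp
#include <string>

// Hypothetical check, assuming the reverse prompt is compared literally
// against the end of the generated text (no list or JSON parsing).
bool ends_with_reverse_prompt(const std::string & output,
                              const std::string & reverse_prompt) {
    return output.size() >= reverse_prompt.size() &&
           output.compare(output.size() - reverse_prompt.size(),
                          reverse_prompt.size(), reverse_prompt) == 0;
}
```

Under this assumption, passing the bare token, e.g. `-r '<|end|>'`, is what would produce a match.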

@infozzdatalabs
Author

Yes, I noticed that as well. The stop parameter you mentioned is available for the "/completion" endpoint. However, with the OpenAI-compatible API the endpoint is "/v1/chat/completions".

@victorlwchen

I also ran into this issue. I found that the <|end|> token is classified as a USER_DEFINED type instead of a CONTROL type, so it is output as ordinary text when token IDs are converted back to text.

For reference, vllm-project/vllm#4182 additionally handles generation_config.json. Apparently, for the Phi-3 model, <|end|> should be treated as a CONTROL-type token.

However, when converting Phi-3, convert-hf-to-gguf.py categorizes the token IDs listed in added_tokens.json as USER_DEFINED.
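The effect described above can be illustrated with a toy detokenizer. It assumes a simplified rule: CONTROL tokens are hidden when converting IDs to text, and everything else, including USER_DEFINED tokens, is printed. Under that rule the same `<|end|>` piece appears or disappears depending only on its type. `TokenType`, `Token`, and `detokenize` are hypothetical names that loosely mirror the token types mentioned here. They are not llama.cpp's API.

```cpp
#include <string>
#include <vector>

// Hypothetical token categories, loosely mirroring the GGUF token types
// discussed above (not llama.cpp's actual enum).
enum class TokenType { Normal, Control, UserDefined };

struct Token {
    std::string piece;  // text the token decodes to
    TokenType   type;
};

// Simplified rule: skip CONTROL tokens, render everything else as text.
std::string detokenize(const std::vector<Token> & tokens) {
    std::string out;
    for (const auto & t : tokens) {
        if (t.type == TokenType::Control) continue;  // e.g. end-of-turn markers
        out += t.piece;
    }
    return out;
}
```

Hiding a token from the text does not by itself stop generation. Whether `<|end|>` ends the turn also depends on it being treated as an end-of-generation token or listed in `stop`.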

@maziyarpanahi

Out of curiosity, could the llama_params["stop"] values be stored in the GGUF metadata? That way I could edit GGUF models before shipping them and add the proper stop strings instead of asking users to do it.

Contributor

This issue was closed because it has been inactive for 14 days since being marked as stale.

@mosujiba

When I run Phi-3 with llama-cli -cnv and the default chat template (#8068), it still emits the end token in every response.

9 participants