Fix func call tokens for internlm2 #8506
Conversation
if foken_data.get("special") and not foken_data["content"] in func_call_tokens:
    toktypes[token_id] = SentencePieceTokenTypes.CONTROL
I'm not sure these tokens should be marked as USER_DEFINED; this would mean they would always be specially pre-tokenized even if parse_special is false, making it impossible to avoid injections from text containing these tokens when not desired. This is related to #8228.
If this is simply a display issue, then it might be more appropriate to revisit whether to detokenize the control tokens output by the model.
These may be relevant:
llama.cpp/examples/main/main.cpp, line 858 (at 5e116e8):
assistant_ss << llama_token_to_piece(ctx, id, false);

llama.cpp/examples/server/server.cpp, line 1185 (at 5e116e8):
const std::string token_str = llama_token_to_piece(ctx, result.tok, false);
@compilade hi, these special tokens are used to parse function content in the output string, meaning they should be pre-tokenized even if parse_special is false. The following is an example. Now llama-cli works with --special, but llama-server does not work with --special. Ideally, only these function-call related special tokens should be shown.
tool_calls = None
if request.tool_choice != 'none' and '<|plugin|>' in text:
    if final_res.finish_reason == 'stop':
        final_res.finish_reason = 'tool_calls'
    # TODO may move to generate function
    text, action = text.split('<|action_start|><|plugin|>')
    action = action.split('<|action_end|>')[0].strip()
    action = action[action.find('{'):]
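For context, the action string extracted above is a JSON payload describing the tool call; a minimal sketch of the decoding step (this is not part of the snippet above, and the helper name and output shape are assumptions for illustration):

import json

def parse_action(action: str) -> dict:
    # hypothetical helper (not from the snippet above): decode the JSON emitted
    # between <|action_start|><|plugin|> and <|action_end|>, e.g.
    # '{"name": "get_current_weather", "parameters": {"location": "Shanghai"}}'
    call = json.loads(action)
    return {"name": call.get("name"), "arguments": call.get("parameters", {})}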
@RunningLeon Thanks for giving an example.

> these special tokens are used to parse function content in the output string, meaning they should be pre-tokenized even if parse_special is false.
If I understand correctly, what you want is to get the function call tokens to render in the output, right? Pre-tokenization is about the input. If these tokens were pre-tokenized even when parse_special is false, it would be impossible to include <|plugin|> in some non-special text without the model seeing it as the <|plugin|> token.
For example:
$ ./bin/llama-tokenize --log-disable -m ../models/internlm2_5-vocab.gguf -p "What is a <|plugin|>?"
1 -> '<s>'
3993 -> 'What'
505 -> ' is'
395 -> ' a'
262 -> ' '
92538 -> '<|plugin|>'
345 -> '?'
$ ./bin/llama-tokenize --log-disable -m ../models/internlm2_5-vocab.gguf -p "What is a <|plugin|>?" --no-parse-special
1 -> '<s>'
3993 -> 'What'
505 -> ' is'
395 -> ' a'
497 -> ' <'
352 -> '|'
9267 -> 'plugin'
352 -> '|'
46973 -> '>?'
If the problem is about the output of llama-server, this should be fixable by changing how it calls the llama_token_to_piece function.
> Ideally, only these function-call related special tokens should be shown.
If you want to hide control tokens while still showing these ones, then... hmm. This seems complicated to do with the current token attributes (USER_DEFINED and CONTROL), given that USER_DEFINED is intended for always pre-tokenized tokens like the multi-space tokens in GPT-NeoX, while CONTROL is intended for tokens with special meaning, like <|im_start|>, and in my opinion the function call tokens fit the intention for CONTROL tokens.
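To make the distinction concrete, here is a rough sketch of the two attributes from a converter's point of view; the enum values mirror SentencePiece's token types, and the marker assignments in the comments just restate the argument above, they are not from any particular implementation:

from enum import IntEnum

class SentencePieceTokenTypes(IntEnum):
    # values mirror SentencePiece's ModelProto token types
    NORMAL = 1
    UNKNOWN = 2
    CONTROL = 3        # split out as one token only when parse_special is true
    USER_DEFINED = 4   # always split out, even from untrusted user text
    UNUSED = 5
    BYTE = 6

# The question in this thread, in code form:
#   '<|im_start|>'         -> CONTROL       (chat structure)
#   multi-space (GPT-NeoX) -> USER_DEFINED  (pure pre-tokenization helper)
#   '<|plugin|>' etc.      -> CONTROL per the argument above; USER_DEFINED would
#                             make them match unconditionally, hence the injection concern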
Why not show all special tokens, like llama-cli --special does?

Would this work?
diff --git a/examples/server/server.cpp b/examples/server/server.cpp
index badeb912..7813a295 100644
--- a/examples/server/server.cpp
+++ b/examples/server/server.cpp
@@ -1182,7 +1182,7 @@ struct server_context {
     bool process_token(completion_token_output & result, server_slot & slot) {
         // remember which tokens were sampled - used for repetition penalties during sampling
-        const std::string token_str = llama_token_to_piece(ctx, result.tok, false);
+        const std::string token_str = llama_token_to_piece(ctx, result.tok, params.special);
         slot.sampled = result.tok;
         // search stop word and delete it
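For what it's worth, with a change like that and --special, the text streamed by the server keeps the function-call markers, so client-side parsing like the earlier snippet can split on them. A toy illustration (not llama.cpp code; the model output here is made up):

text = ('Let me check that for you.'
        '<|action_start|><|plugin|>{"name": "get_current_weather", '
        '"parameters": {"location": "Shanghai"}}<|action_end|>')

reply, rest = text.split('<|action_start|><|plugin|>')
action = rest.split('<|action_end|>')[0]
print(reply)   # natural-language part shown to the user
print(action)  # JSON payload handed to the tool-call parser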
> Now llama-cli works with --special, but llama-server does not work with --special.

This is because llama-cli handles it here:

llama.cpp/examples/main/main.cpp, line 766 (at 5e116e8):
const std::string token_str = llama_token_to_piece(ctx, id, params.special);
@compilade thanks for your quick response. Yes, ideally we want to hide control tokens and show the function-call related tokens. But since <|plugin|> can also appear in the input, there might be a problem. So llama-server with --special works for me. @apresence hi, as the actual user, what do you think?
Yes, I believe that works! I'm happy to test it once a fix is available.
@compilade @ggerganov hi guys. What's a good way to include special tokens in the input when using llama-cli and llama-server? I found something interesting. The system prompt
is '<|im_start|>system\nYou are InternLM2-Chat, a harmless AI assistant.<|im_end|>\n<|im_start|>system name=<|plugin|>[{"name": "get_current_weather", "parameters": {"required": ["location"], "type": "object", "properties": {"location": {"type": "string", "description": "The city and state, e.g. San Francisco, CA"}, "unit": {"type": "string"}}}, "description": "Get the current weather in a given location"}]<|im_end|>\n'
which contains special tokens. It does not work when using --prompt to pass the system prompt when starting the service with llama-server, but it works with --system-prompt-file when the prompt is put in a local file. Besides, it also does not work when special tokens are put in messages, like this:
from openai import OpenAI

client = OpenAI(
    api_key='YOUR_API_KEY',
    base_url='http://localhost:8080/v1'
)
model_name = client.models.list().data[0].id
response = client.chat.completions.create(
    model=model_name,
    messages=[
        {"role": "system", "content": 'You are InternLM2-Chat, a harmless AI assistant.<|im_end|>\n<|im_start|>system name=<|plugin|>[{"name": "get_current_weather", "parameters": {"required": ["location"], "type": "object", "properties": {"location": {"type": "string", "description": "The city and state, e.g. San Francisco, CA"}, "unit": {"type": "string"}}}, "description": "Get the current weather in a given location"}]'},
        {"role": "user", "content": "I want to know today's weather in Shanghai"},
    ],
    temperature=0.8,
    top_p=0.8
)
print(response)
Fixed in #8553
Fix function call tokens not being shown when calling llama-server for internlm2.
Related issue: #8405