
Fix func call tokens for internlm2 #8506

Closed
RunningLeon wants to merge 1 commit

Conversation

RunningLeon (Contributor)

Fixes the issue where function call tokens are not shown when calling llama-server for internlm2.
Related issue: #8405

github-actions bot added the python (python script changes) label on Jul 16, 2024
Comment on lines +2233 to 2234
if foken_data.get("special") and not foken_data["content"] in func_call_tokens:
toktypes[token_id] = SentencePieceTokenTypes.CONTROL
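
For context, the check under review roughly amounts to the following (a hedged sketch, not the converter's actual code: the surrounding loop, the added_tokens dict, and the exact contents of func_call_tokens are assumptions based on this diff and the discussion below):

# Hedged sketch of the reviewed check; only the two-line condition above comes
# from the actual diff, everything else here is illustrative scaffolding.
from enum import IntEnum

class SentencePieceTokenTypes(IntEnum):  # stand-in for the converter's enum
    NORMAL = 1
    CONTROL = 3
    USER_DEFINED = 4

# Function-call markers mentioned in this thread (assumed list).
func_call_tokens = ("<|plugin|>", "<|action_start|>", "<|action_end|>")

# token_id -> entry from tokenizer_config.json (illustrative ids and values).
added_tokens = {
    92538: {"content": "<|plugin|>", "special": True},
    92543: {"content": "<|im_start|>", "special": True},
}

toktypes: dict[int, SentencePieceTokenTypes] = {}
for token_id, foken_data in added_tokens.items():
    if foken_data.get("special") and foken_data["content"] not in func_call_tokens:
        # Ordinary special tokens (e.g. <|im_start|>) are marked CONTROL,
        # so they are hidden when the output is detokenized ...
        toktypes[token_id] = SentencePieceTokenTypes.CONTROL
    else:
        # ... while the function-call markers keep a renderable type
        # (USER_DEFINED), which is what the review below pushes back on.
        toktypes[token_id] = SentencePieceTokenTypes.USER_DEFINED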
compilade (Collaborator)

I'm not sure these tokens should be marked as USER_DEFINED; this would mean they would always be specially pre-tokenized even if parse_special is false, making it impossible to avoid injections from text containing these tokens when not desired. This is related to #8228.

If this is simply a display issue, then it might be more appropriate to revisit whether to detokenize the control tokens output by the model.

These may be relevant:

assistant_ss << llama_token_to_piece(ctx, id, false);

const std::string token_str = llama_token_to_piece(ctx, result.tok, false);

RunningLeon (Contributor, Author)

@compilade hi, these special tokens are used to parse function content in the output string, meaning they should be pre-tokenized even if parse_special is false. The following is an example of how the function call is parsed out of the generated text. Now llama-cli works with --special, but llama-server does not work with --special. Ideally, it's desired only to show these function-call related special tokens.

    tool_calls = None
    if request.tool_choice != 'none' and '<|plugin|>' in text:
        if final_res.finish_reason == 'stop':
            final_res.finish_reason = 'tool_calls'
        # TODO may move to generate function
        text, action = text.split('<|action_start|><|plugin|>')
        action = action.split('<|action_end|>'.strip())[0]
        action = action[action.find('{'):]

compilade (Collaborator)

@RunningLeon Thanks for giving an example.

> these special tokens are used to parse function content in the output string, meaning they should be pre-tokenized even if parse_special is false.

If I understand correctly, what you want is to get the function call tokens to render in the output, right? Pre-tokenization is about the input. If these tokens were pre-tokenized even when parse_special is false, this means it would be impossible to include <|plugin|> in some non-special text without the model seeing it as the <|plugin|> token.

For example:

$ ./bin/llama-tokenize --log-disable -m ../models/internlm2_5-vocab.gguf -p "What is a <|plugin|>?"
     1 -> '<s>'
  3993 -> 'What'
   505 -> ' is'
   395 -> ' a'
   262 -> ' '
 92538 -> '<|plugin|>'
   345 -> '?'

$ ./bin/llama-tokenize --log-disable -m ../models/internlm2_5-vocab.gguf -p "What is a <|plugin|>?" --no-parse-special
     1 -> '<s>'
  3993 -> 'What'
   505 -> ' is'
   395 -> ' a'
   497 -> ' <'
   352 -> '|'
  9267 -> 'plugin'
   352 -> '|'
 46973 -> '>?'

If the problem is about the output of llama-server, this should be fixable by changing how it calls the llama_token_to_piece function.

> Ideally, it's desired only to show these function-call related special tokens.

If you want to hide control tokens, while still showing these ones, then... hmm. This seems complicated to do with the current token attributes (USER_DEFINED and CONTROL), given that USER_DEFINED is intended for always pre-tokenized tokens like the multi-space tokens in GPT-NeoX, while CONTROL is intended for tokens with special meaning, like <|im_start|>, and in my opinion the function call tokens fit the intention for CONTROL tokens.
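
To make the distinction concrete, here is a toy sketch of the rule described above (illustrative only, not llama.cpp's actual tokenizer code; the attribute assigned to <|plugin|> mirrors what this PR would do):

# Toy model of the two token attributes as described in this thread:
# USER_DEFINED tokens are always matched in the input and always rendered in
# the output, while CONTROL tokens are only matched when parse_special is on
# and only rendered when special is on.
import re

TOKEN_ATTRS = {
    "<|plugin|>":   "USER_DEFINED",  # the effect of this PR (assumed)
    "<|im_start|>": "CONTROL",
    "<|im_end|>":   "CONTROL",
}

def split_input(text: str, parse_special: bool) -> list[str]:
    """Split the input on the special tokens that are active for this call."""
    active = [t for t, attr in TOKEN_ATTRS.items()
              if attr == "USER_DEFINED" or parse_special]
    pattern = "(" + "|".join(re.escape(t) for t in active) + ")"
    return [p for p in re.split(pattern, text) if p]

def render_output(pieces: list[str], special: bool) -> str:
    """Detokenize: hide CONTROL tokens unless the caller asked for them."""
    return "".join(p for p in pieces
                   if special or TOKEN_ATTRS.get(p) != "CONTROL")

# The injection concern: with <|plugin|> marked USER_DEFINED it is split out
# of user text even with parse_special=False.
print(split_input("What is a <|plugin|>?", parse_special=False))
# -> ['What is a ', '<|plugin|>', '?']

# The display question: a CONTROL token only shows up with special=True,
# which is what llama-cli --special (and the server patch below) toggles.
print(render_output(["<|im_start|>", "assistant", "<|plugin|>"], special=False))
# -> 'assistant<|plugin|>'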

Why not show all special tokens, like llama-cli --special does?

Would this work?

diff --git a/examples/server/server.cpp b/examples/server/server.cpp
index badeb912..7813a295 100644
--- a/examples/server/server.cpp
+++ b/examples/server/server.cpp
@@ -1182,7 +1182,7 @@ struct server_context {
 
     bool process_token(completion_token_output & result, server_slot & slot) {
         // remember which tokens were sampled - used for repetition penalties during sampling
-        const std::string token_str = llama_token_to_piece(ctx, result.tok, false);
+        const std::string token_str = llama_token_to_piece(ctx, result.tok, params.special);
         slot.sampled = result.tok;
 
         // search stop word and delete it

> Now llama-cli works with --special, but llama-server does not work with --special.

This is because llama-cli handles it here:

const std::string token_str = llama_token_to_piece(ctx, id, params.special);

RunningLeon (Contributor, Author)

@compilade thanks for your quick response. Yes, ideally, we want to hide control tokens and show the function-call related tokens. But since <|plugin|> can also appear in the input, that could be a problem, so llama-server with --special works for me. @apresence, hi, as the actual user, what do you think?

apresence

Yes, I believe that works! I'm happy to test it once a fix is available.

RunningLeon (Contributor, Author), Jul 18, 2024

@compilade @ggerganov hi, guys. What is a good way to include special tokens in the input when using llama-cli and llama-server? I found something interesting.
The sys prompt is '<|im_start|>system\nYou are InternLM2-Chat, a harmless AI assistant.<|im_end|>\n<|im_start|>system name=<|plugin|>[{"name": "get_current_weather", "parameters": {"required": ["location"], "type": "object", "properties": {"location": {"type": "string", "description": "The city and state, e.g. San Francisco, CA"}, "unit": {"type": "string"}}}, "description": "Get the current weather in a given location"}]<|im_end|>\n'
which has special tokens.
It does not work when the sys prompt is passed with --prompt when starting llama-server, but it does work with --system-prompt-file when the prompt is put in a local file. Besides, it also does not work when special tokens are put in the messages, like this:

from openai import OpenAI
client = OpenAI(
    api_key='YOUR_API_KEY',
    base_url='http://localhost:8080/v1'
)
model_name = client.models.list().data[0].id
response = client.chat.completions.create(
  model=model_name,
  messages=[
    {"role": "system", "content": 'You are InternLM2-Chat, a harmless AI assistant.<|im_end|>\n<|im_start|>system name=<|plugin|>[{"name": "get_current_weather", "parameters": {"required": ["location"], "type": "object", "properties": {"location": {"type": "string", "description": "The city and state, e.g. San Francisco, CA"}, "unit": {"type": "string"}}}, "description": "Get the current weather in a given location"}]'},
    {"role": "user", "content": "I want to know today's weather in Shanghai"},
  ],
  temperature=0.8,
  top_p=0.8
)
print(response)
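
One way to check what the server actually receives is its /tokenize endpoint (a hedged example; the endpoint and its content field exist in llama-server, though whether special tokens are parsed there depends on the server version and flags):

# Hedged example: ask a running llama-server (default http://localhost:8080)
# how it tokenizes a string that contains special tokens.
# Requires: pip install requests
import requests

text = "<|im_start|>system name=<|plugin|>[...]<|im_end|>"
resp = requests.post("http://localhost:8080/tokenize", json={"content": text})
resp.raise_for_status()
tokens = resp.json()["tokens"]
print(tokens)
# If <|plugin|> is treated as a special token it appears as a single id
# (92538 in the llama-tokenize output earlier in this thread); otherwise it
# is split into several ordinary text tokens.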

ggerganov (Owner)

Fixed in #8553

ggerganov closed this on Jul 18, 2024