Fix func call tokens for internlm2 #8506

Closed · wants to merge 1 commit
convert_hf_to_gguf.py (6 changes: 3 additions & 3 deletions)
@@ -2213,7 +2213,7 @@ def set_vocab(self):

chat_eos_token = '<|im_end|>'
chat_eos_token_id = None

+func_call_tokens = ('<|plugin|>', '<|interpreter|>', '<|action_end|>', '<|action_start|>')
tokenizer_config_file = self.dir_model / 'tokenizer_config.json'
if tokenizer_config_file.is_file():
with open(tokenizer_config_file, "r", encoding="utf-8") as f:
@@ -2230,7 +2230,7 @@ def set_vocab(self):
tokens[token_id] = token
scores[token_id] = -1000.0
toktypes[token_id] = SentencePieceTokenTypes.USER_DEFINED
-if foken_data.get("special"):
+if foken_data.get("special") and not foken_data["content"] in func_call_tokens:
toktypes[token_id] = SentencePieceTokenTypes.CONTROL
Comment on lines +2233 to 2234
compilade (Collaborator) commented:

I'm not sure these tokens should be marked as USER_DEFINED; this would mean they would always be specially pre-tokenized even if parse_special is false, making it impossible to avoid injections from text containing these tokens when not desired. This is related to #8228.

If this is simply a display issue, then it might be more appropriate to revisit whether to detokenize the control tokens output by the model.

These may be relevant:

assistant_ss << llama_token_to_piece(ctx, id, false);

const std::string token_str = llama_token_to_piece(ctx, result.tok, false);
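
As a toy illustration (not part of the original comment) of what that boolean argument controls: VOCAB and toy_token_to_piece below are made up, the real function is llama_token_to_piece in llama.cpp, and the token ids are borrowed from the tokenizer example further down.

# Toy model of the `special` argument of llama_token_to_piece(ctx, token, special):
# CONTROL tokens are rendered only when `special` is true, which is why they
# disappear from llama-server output (it passes false), while NORMAL and
# USER_DEFINED tokens are always rendered.
VOCAB = {
    1:     ("<s>",        "CONTROL"),
    92538: ("<|plugin|>", "CONTROL"),   # attribute before this PR
    9267:  ("plugin",     "NORMAL"),
}

def toy_token_to_piece(token_id: int, special: bool) -> str:
    piece, attr = VOCAB[token_id]
    if attr == "CONTROL" and not special:
        return ""                       # hidden unless special rendering is requested
    return piece

print(repr(toy_token_to_piece(92538, special=False)))  # ''  (invisible in output)
print(repr(toy_token_to_piece(92538, special=True)))   # '<|plugin|>'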

RunningLeon (Contributor, Author) commented:

@compilade hi, these special tokens are used to parse function content in the output string, meaning they should be pre-tokenized even if parse_special is false. The following is an example. Now llama-cli works with --special, but llama-server does not work with --special. Ideally, it's desired only to show these function-call related special tokens.

    tool_calls = None
    if request.tool_choice != 'none' and '<|plugin|>' in text:
        if final_res.finish_reason == 'stop':
            final_res.finish_reason = 'tool_calls'
        # TODO may move to generate function
        text, action = text.split('<|action_start|><|plugin|>')
        action = action.split('<|action_end|>'.strip())[0]
        action = action[action.find('{'):]

compilade (Collaborator) commented:

@RunningLeon Thanks for giving an example.

> these special tokens are used to parse function content in the output string, meaning they should be pre-tokenized even if parse_special is false.

If I understand correctly, what you want is to get the function call tokens to render in the output, right? Pre-tokenization is about the input. If these tokens were pre-tokenized even when parse_special is false, this means it would be impossible to include <|plugin|> in some non-special text without the model seeing it as the <|plugin|> token.

For example:

$ ./bin/llama-tokenize --log-disable -m ../models/internlm2_5-vocab.gguf -p "What is a <|plugin|>?"
     1 -> '<s>'
  3993 -> 'What'
   505 -> ' is'
   395 -> ' a'
   262 -> ' '
 92538 -> '<|plugin|>'
   345 -> '?'

$ ./bin/llama-tokenize --log-disable -m ../models/internlm2_5-vocab.gguf -p "What is a <|plugin|>?" --no-parse-special
     1 -> '<s>'
  3993 -> 'What'
   505 -> ' is'
   395 -> ' a'
   497 -> ' <'
   352 -> '|'
  9267 -> 'plugin'
   352 -> '|'
 46973 -> '>?'

If the problem is about the output of llama-server, this should be fixable by changing how it calls the llama_token_to_piece function.

> Ideally, it's desired only to show these function-call related special tokens.

If you want to hide control tokens while still showing these ones, then... hmm. That seems hard to do with the current token attributes (USER_DEFINED and CONTROL): USER_DEFINED is intended for tokens that are always pre-tokenized, like the multi-space tokens in GPT-NeoX, while CONTROL is intended for tokens with special meaning, like <|im_start|>, and in my opinion the function-call tokens fit the intent of CONTROL tokens.
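
A toy sketch of that distinction (toy_tokenize and split_keep are made-up helpers standing in for the real tokenizer; the token sets are illustrative only):

# USER_DEFINED-style tokens are split out of the text unconditionally,
# CONTROL-style tokens only when parse_special is true.
USER_DEFINED = {"  "}                                  # e.g. multi-space tokens
CONTROL = {"<|im_start|>", "<|im_end|>", "<|plugin|>"}

def split_keep(chunk: str, tok: str) -> list[str]:
    # split `chunk` on `tok`, keeping `tok` itself as a separate piece
    out: list[str] = []
    for i, part in enumerate(chunk.split(tok)):
        if i:
            out.append(tok)
        if part:
            out.append(part)
    return out

def toy_tokenize(text: str, parse_special: bool) -> list[str]:
    specials = USER_DEFINED | (CONTROL if parse_special else set())
    pieces = [text]
    for tok in specials:
        pieces = [p for chunk in pieces for p in split_keep(chunk, tok)]
    return pieces

print(toy_tokenize("What is a <|plugin|>?", parse_special=True))
# ['What is a ', '<|plugin|>', '?']
print(toy_tokenize("What is a <|plugin|>?", parse_special=False))
# ['What is a <|plugin|>?']   (the text is never treated as the special token)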

Why not show all special tokens, like llama-cli --special does?

Would this work?

diff --git a/examples/server/server.cpp b/examples/server/server.cpp
index badeb912..7813a295 100644
--- a/examples/server/server.cpp
+++ b/examples/server/server.cpp
@@ -1182,7 +1182,7 @@ struct server_context {
 
     bool process_token(completion_token_output & result, server_slot & slot) {
         // remember which tokens were sampled - used for repetition penalties during sampling
-        const std::string token_str = llama_token_to_piece(ctx, result.tok, false);
+        const std::string token_str = llama_token_to_piece(ctx, result.tok, params.special);
         slot.sampled = result.tok;
 
         // search stop word and delete it

> Now llama-cli works with --special, but llama-server does not work with --special.

This is because llama-cli handles it here:

const std::string token_str = llama_token_to_piece(ctx, id, params.special);

RunningLeon (Contributor, Author) commented:

@compilade thanks for your quick response. Yes, ideally we want to hide control tokens and show the function-call-related tokens. But since <|plugin|> can also appear in the input, there might be a problem, so llama-server with --special works for me. @apresence hi, what do you think of it, as the actual user?

apresence commented:

Yes, I believe that works! I'm happy to test it once a fix is available.

RunningLeon (Contributor, Author) commented on Jul 18, 2024:

@compilade @ggerganov hi guys, what's a good way to include special tokens in the input when using llama-cli and llama-server? I found something interesting.
The sys prompt is '<|im_start|>system\nYou are InternLM2-Chat, a harmless AI assistant.<|im_end|>\n<|im_start|>system name=<|plugin|>[{"name": "get_current_weather", "parameters": {"required": ["location"], "type": "object", "properties": {"location": {"type": "string", "description": "The city and state, e.g. San Francisco, CA"}, "unit": {"type": "string"}}}, "description": "Get the current weather in a given location"}]<|im_end|>\n'
which contains special tokens.
Passing this system prompt with --prompt when starting llama-server does not work, but it does work with --system-prompt-file when the prompt is put in a local file. It also does not work when special tokens are placed in the chat messages, like this:

from openai import OpenAI
client = OpenAI(
    api_key='YOUR_API_KEY',
    base_url='http://localhost:8080/v1'
)
model_name = client.models.list().data[0].id
response = client.chat.completions.create(
  model=model_name,
  messages=[
    {"role": "system", "content": 'You are InternLM2-Chat, a harmless AI assistant.<|im_end|>\n<|im_start|>system name=<|plugin|>[{"name": "get_current_weather", "parameters": {"required": ["location"], "type": "object", "properties": {"location": {"type": "string", "description": "The city and state, e.g. San Francisco, CA"}, "unit": {"type": "string"}}}, "description": "Get the current weather in a given location"}]'},
    {"role": "user", "content": "I want to know today's weather in Shanghai"},
  ],
  temperature=0.8,
  top_p=0.8
)
print(response)


tokenizer_file = self.dir_model / 'tokenizer.json'
@@ -2249,7 +2249,7 @@ def set_vocab(self):
tokens[token_id] = token
scores[token_id] = -1000.0
toktypes[token_id] = SentencePieceTokenTypes.USER_DEFINED
-if foken_data.get("special"):
+if foken_data.get("special") and not foken_data["content"] in func_call_tokens:
toktypes[token_id] = SentencePieceTokenTypes.CONTROL

self.gguf_writer.add_tokenizer_model("llama")
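
To make the effect of the changed condition concrete, here is a small self-contained sketch of the classification rule. The names foken_data, func_call_tokens and SentencePieceTokenTypes mirror convert_hf_to_gguf.py, the enum is stubbed out, and the sample entries are made up in the shape of tokenizer_config.json added_tokens_decoder values.

# Sketch of the rule this PR introduces: "special" added tokens become CONTROL,
# except the InternLM2 function-call tokens, which stay USER_DEFINED.
from enum import IntEnum

class SentencePieceTokenTypes(IntEnum):   # stand-in for the gguf-py enum
    NORMAL = 1
    CONTROL = 3
    USER_DEFINED = 4

func_call_tokens = ('<|plugin|>', '<|interpreter|>', '<|action_end|>', '<|action_start|>')

def classify(foken_data: dict) -> SentencePieceTokenTypes:
    # every added token starts out as USER_DEFINED ...
    toktype = SentencePieceTokenTypes.USER_DEFINED
    # ... and is promoted to CONTROL only if it is special and not a function-call token
    if foken_data.get("special") and not foken_data["content"] in func_call_tokens:
        toktype = SentencePieceTokenTypes.CONTROL
    return toktype

print(classify({"content": "<|im_end|>", "special": True}))   # CONTROL
print(classify({"content": "<|plugin|>", "special": True}))   # USER_DEFINED
print(classify({"content": "hello", "special": False}))       # USER_DEFINED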