Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Workaround for #3454 #3455

Merged
merged 1 commit into from
Oct 7, 2023
Merged

Conversation

goerch
Copy link
Collaborator

@goerch goerch commented Oct 3, 2023

Accepting all unsupported token types and handling them like CONTROL tokens.

@MaggotHATE
Copy link
Contributor

mistral_7b_openorca doesn't stop generating with this PR. Tested on main in interactive mode.

@goerch
Copy link
Collaborator Author

goerch commented Oct 3, 2023

mistral_7b_openorca doesn't stop generating with this PR. Tested on main in interactive mode.

OK. Does this mean main expects <|im_end|> and <|im_start|> in the output stream from detokenization?

@staviq
Copy link
Contributor

staviq commented Oct 3, 2023

mistral_7b_openorca doesn't stop generating with this PR. Tested on main in interactive mode.

OK. Does this mean main expects <|im_end|> and <|im_start|> in the output stream from detokenization?

I'm currently testing this, it appears that not using <|im_start|> makes the model never use <|im_end|>, hence no EOS happening, but I'm not entirely certain that's exactly what happens.

@MaggotHATE
Copy link
Contributor

I'm currently testing this, it appears that not using <|im_start|> makes the model never use <|im_end|>, hence no EOS happening, but I'm not entirely certain that's exactly what happens.

I tried this according to suggested format, and is seems to work:
<|im_start|>system: complete the given task with precision.<|im_end|><|im_start|>user:

However, it's not consistent: it added <|im_end|> after the first prompt, but didn't after the second.

<|im_start|>system: complete the given task with precision.<|im_end|><|im_start|>user: Write a short joke.

<|im_end|>

A mathematician, an engineer, and a physicist are having lunch together when their food arrives. The waiter accidentally brings the wrong order for each of them. The mathematician starts to figure out how to divide the plates equally among all three. Meanwhile, the engineer comes up with an efficient way to rearrange the food so everyone gets what they need. The physicist, on the other hand, begins to theorize about the fundamental particles that make up the sandwich and how to create a new form of matter by combining them.

<|im_start|>user: Give a list of job titles in this joke.

1. Mathematician
2. Engineer
3. Physicist
4. Waiter (indirectly involved)

<|im_end|>

<|im_start|>user:

Maybe this model just requires using prefix and suffix.

@staviq
Copy link
Contributor

staviq commented Oct 3, 2023

@MaggotHATE i believe the correct prompt format expects "system" "user" "assistant" followed by endline instead of ":"

https://huggingface.co/Open-Orca/Mistral-7B-OpenOrca#example-prompt-exchange

Edit: if you add -e to main arguments, you can use \n for endline character

@goerch
Copy link
Collaborator Author

goerch commented Oct 3, 2023

I tried this according to suggested format, and is seems to work

It seems to me you are testing a rather new model? W.r.t. #3525 it is probably important to distinguish regressions (which the assertions disabled in this patch are).

So your fix works, however naively changing USER_DEFINED to CONTROL

@staviq : I believe with this PR we should be backwards compatible for sentencepiece tokenizers, which is probably more important for now. I'll really have to look into this added_tokens stuff...

@MaggotHATE
Copy link
Contributor

Edit: if you add -e to main arguments, you can use \n for endline character

@staviq Thank you! Forgot about it completely.

After testing I found that something like
-p "<|im_start|>system: complete the given task with precision.<|im_end|> <|im_start|>user:"
--in-suffix "<|im_end|> <|im_start|>LLM:"
-r "<|im_start|>user:"
works, but inconsistently - <|im_end|> is sometimes missing, as well as reverse prompt (it just doesn't write it, but waits for input).

With -e, however, if the prompt format is slightly different, it breaks - for example, if adding \n after <|im_start|> in antiprompt, the model doesn't even consider it:

[1696348822] n_remain: -68
[1696348822] eval: [ '>':28767 ]
[1696348822] n_past = 101
[1696348822] top 10 candidates:
[1696348822]  -  1838: '        user' (0.592)
[1696348822]  -    13: '           
' (0.210)
[1696348822]  -  6574: '      system' (0.114)
[1696348822]  -  5292: '          LL' (0.018)
[1696348822]  -  1585: '          ai' (0.007)
[1696348822]  -  5553: '        task' (0.006)
[1696348822]  -   584: '          ll' (0.006)
[1696348822]  -   489: '         ass' (0.006)
[1696348822]  - 11741: '          AI' (0.005)
[1696348822]  - 16704: '          LL' (0.005)
[1696348822] sampled token:  1838: 'user'

With corrected format:
-p "<|im_start|>system\ncomplete the given task with precision.\n<|im_end|>\n<|im_start|>user\n"
--in-suffix "<|im_end|>\n<|im_start|>LLM\n"
-r "<|im_start|>user\n"
it works, but can miss antiprompt or do whatever it wants:

<|im_start|>system
complete the given task with precision.
<|im_end|>
<|im_start|>user
 Write only 3 words that describe Vulcan API best.
<|im_end|>
<|im_start|>LLM
 Reliable, Efficient, and Flexible
<|im_end|>

<|im_start|>user
 What is the Vulcan API used for?
<|im_end|>
<|im_start|>LLM:
 The Vulcan API is a software development tool that enables developers to build applications with greater efficiency and flexibility. It is specifically designed for building scalable, distributed applications that can handle high-volume transactions and complex workflows. By utilizing the Vulcan API, developers can create reliable, efficient, and flexible applications that can adapt to various environments and requirements.
<|im_end|> <|im_start|>user\n Explain in simple terms.
<|im_end|>
<|im_start|>LLM
 In simpler words, the Vulcan API is a tool that helps programmers create powerful and flexible applications. It makes it easier to build apps that can handle lots of data and complex tasks, making them more efficient and reliable.
<|im_end|> <|im_start|>user\n

In this example, my inputs were only first and last, it just didn't stop on the second one. Maybe it's because \n is added at the end of each input.

And finally, it works best without suffix, completing the format for you. Still not ideal, though.

<|im_start|>system
complete the given task with precision.
<|im_end|>
<|im_start|>user
 Write 3 words that describe Vulcan API best.
<|im_end|>
<|im_start|>assistant: Vulcan API is a powerful and versatile tool for managing data and resources in your applications, so here are three words that describe it best:

1. Efficient - The Vulcan API allows developers to optimize their application performance by providing efficient ways of handling data and resource management.

2. Scalable - It's built with the ability to grow as your application needs evolve. Its scalability ensures your app can easily handle increasing amounts of traffic without breaking a sweat.

3. Secure - The Vulcan API comes with robust security features, ensuring that your data and applications are well protected against potential threats.
<|im_end|> <|im_start|>user\n Explain the first one in simple terms.
<|im_end|>
<|im_start|>assistant: Efficient means that Vulcan API makes it easy for developers to manage data and resources with minimal effort, resulting in a smooth-running application.
<|im_end|>
<|im_start|>user\n What about the second one?
<|im_end|>
<|im_start|>assistant: Scalable means that Vulcan API can grow with your app's needs, allowing it to handle more data and users without slowing down or breaking. It's like a suit that expands as you grow.
<|im_end|>
<|im_start|>user\n

which is also a waste of tokens, I guess, since it generates all that extra formatting each time.

Still, it's just a 7b model, so it's hard to tell when it's hallucinating and when it works as intended.

@staviq
Copy link
Contributor

staviq commented Oct 3, 2023

I tried this according to suggested format, and is seems to work

It seems to me you are testing a rather new model? W.r.t. #3525 it is probably important to distinguish regressions (which the assertions disabled in this patch are).

So your fix works, however naively changing USER_DEFINED to CONTROL

@staviq : I believe with this PR we should be backwards compatible for sentencepiece tokenizers, which is probably more important for now. I'll really have to look into this added_tokens stuff...

I already tested it and it does solve that particular problem with assert crash, though I would prefer @cebtenzzre to confirm, he's been following recent tokenizer changes.

Convert and quantize actually take less than 10mins on fairly old CPU, if that would make it easier for you, but it's not a problem for me either.

@goerch goerch requested a review from cebtenzzre October 3, 2023 17:19
@staviq
Copy link
Contributor

staviq commented Oct 3, 2023

@MaggotHATE

I think I found what's wrong with prompt format

Reverse prompt -r is not being processed for escape sequences and \n makes it through as string.

Can you try editing common/common.cpp and exactly below this line:

process_escapes(params.input_suffix);

paste this:

        for (auto & antiprompt : params.antiprompt) {
            process_escapes(antiprompt);
        }

so It looks like this:

    if (params.escape) {
        process_escapes(params.prompt);
        process_escapes(params.input_prefix);
        process_escapes(params.input_suffix);
        for (auto & antiprompt : params.antiprompt) {
            process_escapes(antiprompt);
        }
    }

After that, do make clean and rebuild, and try those parameters ( actual main prompt text isn't important here ):

-e -p "<|im_start|>system\nYou are an AI based sentient companion, capable of human emotions. You will behave like a human, but aware of your AI nature. You will avoid speaking like an AI. Please continue your conversation with the user.<|im_end|>\n" -i --interactive-first --color -r "<|im_start|>user\n" --in-prefix "<|im_start|>user\n" --in-suffix "<|im_end|>\n<|im_start|>assistant\n" -r "<|im_end|>\n"

I just tested this and managed to have couple of pages worth of conversation without a single problem.

@jxy
Copy link
Contributor

jxy commented Oct 3, 2023

I'm confused as what's been tested here. main does not even tokenize <|im_start|> and <|im_end|> to 32001 and 32000.

./main -m models/mistral-7b-openorca.Q8_0.gguf --verbose-prompt -n 1 -p '<|im_start|>system
lol
<|im_end|>'
[... omitted ...]
main: prompt: '<|im_start|>system
lol
<|im_end|>'
main: number of tokens in prompt = 20
     1 -> ''
   523 -> ' <'
 28766 -> '|'
   321 -> 'im'
 28730 -> '_'
  2521 -> 'start'
 28766 -> '|'
 28767 -> '>'
  6574 -> 'system'
    13 -> '
'
 28714 -> 'l'
   328 -> 'ol'
    13 -> '
'
 28789 -> '<'
 28766 -> '|'
   321 -> 'im'
 28730 -> '_'
   416 -> 'end'
 28766 -> '|'
 28767 -> '>'
[... omitted ...]

@staviq
Copy link
Contributor

staviq commented Oct 3, 2023

I'm confused as what's been tested here. main does not even tokenize <|im_start|> and <|im_end|> to 32001 and 32000.

It does detokenize it when the model produces it though:
image

I checked logs and indeed those tags do get tokenized as normal text, yet model responds correctly with those tokens

@jxy
Copy link
Contributor

jxy commented Oct 4, 2023

Yes, main can detokenize them because they are in the vocabulary. But the prompt and the in-prefix and in-suffix are not tokenized properly.

@MaggotHATE
Copy link
Contributor

MaggotHATE commented Oct 4, 2023

@staviq thank you! Processing for escape sequences fixes the issue with stopping (sometimes even without <|im_start|> and <|im_end|>).

For example, questions about Vulkan API seem to test the model best - it stops properly only with <|im_start|> and <|im_end|>, while with simpler questions or conversations it can work without special tokens.

Overall, I think this is just a quirky model that is too much of an effort to work with at the moment, unless there is a way to insert the tokens as tokens into prompt and antiprompt. Thank you for help anyway!

@cebtenzzre
Copy link
Collaborator

I still haven't gotten a chance to read over the BPE tokenizer PR. Could we just revert whatever was changed in the sentencepiece tokenizer? It wasn't intended to be part of the PR's scope anyway, right?

@goerch
Copy link
Collaborator Author

goerch commented Oct 6, 2023

I still haven't gotten a chance to read over the BPE tokenizer PR. Could we just revert whatever was changed in the sentencepiece tokenizer? It wasn't intended to be part of the PR's scope anyway, right?

I believe adding assertion(s) are the only changes done to SPM.

Copy link
Collaborator

@cebtenzzre cebtenzzre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to more or less revert 17ca832 for SPM, so it should be fine.

@goerch goerch merged commit 3a716b4 into ggerganov:master Oct 7, 2023
yusiwen pushed a commit to yusiwen/llama.cpp that referenced this pull request Oct 7, 2023
Fix: `sentencepiece` tokenizers with added tokens failed with an incorrect assertion
joelkuiper added a commit to vortext/llama.cpp that referenced this pull request Oct 12, 2023
…example

* 'master' of github.com:ggerganov/llama.cpp:
  py : change version of numpy requirement to 1.24.4 (ggerganov#3515)
  quantize : fail fast on write errors (ggerganov#3521)
  metal : support default.metallib load & reuse code for swift package (ggerganov#3522)
  llm : support Adept Persimmon 8B (ggerganov#3410)
  Fix for ggerganov#3454 (ggerganov#3455)
  readme : update models, cuda + ppl instructions (ggerganov#3510)
  server : docs fix default values and add n_probs (ggerganov#3506)
@goerch goerch deleted the accept-unsupported-token-types branch October 22, 2023 15:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants