Tokenizer SPM fixes for phi-3 and llama-spm #7375
The file 'added_tokens.json' does not exist for phi-3 or llama-spm, so the added tokens are read from 'tokenizer_config.json' first, then from 'tokenizer.json'.
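The lookup order described above can be sketched roughly as follows. This is not the code from this PR: the function name `load_added_tokens` is hypothetical, and the merge rule (entries from 'tokenizer_config.json' win over 'tokenizer.json' for the same id) is an assumption. It relies on the usual Hugging Face layouts, where 'tokenizer_config.json' carries an `added_tokens_decoder` mapping and 'tokenizer.json' carries an `added_tokens` list.

```python
import json
from pathlib import Path


def load_added_tokens(model_dir):
    """Collect added tokens as {token_id: content}.

    Hypothetical helper: reads tokenizer_config.json first, then
    tokenizer.json, and keeps the first content seen for each id.
    """
    model_dir = Path(model_dir)
    tokens = {}

    # tokenizer_config.json: {"added_tokens_decoder": {"32000": {"content": ...}}}
    config_path = model_dir / "tokenizer_config.json"
    if config_path.is_file():
        with open(config_path, encoding="utf-8") as f:
            config = json.load(f)
        for token_id, entry in config.get("added_tokens_decoder", {}).items():
            tokens[int(token_id)] = entry["content"]

    # tokenizer.json: {"added_tokens": [{"id": 32000, "content": ...}, ...]}
    tokenizer_path = model_dir / "tokenizer.json"
    if tokenizer_path.is_file():
        with open(tokenizer_path, encoding="utf-8") as f:
            data = json.load(f)
        for entry in data.get("added_tokens", []):
            # Assumption: do not overwrite ids already found in the config.
            tokens.setdefault(int(entry["id"]), entry["content"])

    return tokens
```

Missing files are simply skipped, so a model directory with only one of the two files still yields whatever tokens that file declares.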
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
The server tests are now failing:
It seems the tokenizer fixes have changed the contents of the generated text, so the regex might need some adjusting.
Force-pushed from a0704bf to 7fb66eb.
Oh, sorry. I found the problem.
I'm fixing it now.
Modifications to make SPM tokenizer match AutoTokenizer.
Tested with vocabs from models phi-3 and llama-spm.
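One way to check that two tokenizers agree is to encode the same texts with both and collect the differences. The sketch below is only an illustration of that idea, not the test harness used for this PR: `find_mismatches` is a hypothetical name, and the two encoders are passed in as plain callables so that any pair (for example an SPM implementation and AutoTokenizer) could be plugged in by the caller.

```python
def find_mismatches(encode_a, encode_b, texts):
    """Return (text, ids_a, ids_b) for every text the two encoders disagree on.

    encode_a and encode_b are any callables mapping a string to a
    sequence of token ids (hypothetical interface for this sketch).
    """
    mismatches = []
    for text in texts:
        ids_a = list(encode_a(text))
        ids_b = list(encode_b(text))
        if ids_a != ids_b:
            mismatches.append((text, ids_a, ids_b))
    return mismatches


if __name__ == "__main__":
    # Toy encoders standing in for real tokenizers.
    by_char = lambda s: [ord(c) for c in s]
    by_char_no_space = lambda s: [ord(c) for c in s if c != " "]
    for text, a, b in find_mismatches(by_char, by_char_no_space, ["abc", "a b"]):
        print(repr(text), a, b)
```

An empty result over a representative set of texts means the two encoders matched on everything that was tried, which is evidence of agreement rather than proof.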