Mistral-7B does not load due to missing pretokenizer #62

MatthewChang · 2024-07-04T04:12:23Z

Running examples/generate_json.py (and other Mistral-based examples) fails with:

Traceback (most recent call last):
  File "/private/home/matthewchang/work/transformers-CFG/examples/generate_json_array.py", line 24, in <module>
    grammar = IncrementalGrammarConstraint(grammar_str, "root", tokenizer)
  File "/private/home/matthewchang/work/transformers-CFG/transformers_cfg/grammar_utils.py", line 13, in __init__
    super().__init__(*args, **kwargs)
  File "/private/home/matthewchang/work/transformers-CFG/transformers_cfg/token_grammar_recognizer.py", line 167, in __init__
    super().__init__(grammar_str, tokenizer, start_rule_name, unicode)
  File "/private/home/matthewchang/work/transformers-CFG/transformers_cfg/token_grammar_recognizer.py", line 36, in __init__
    self.unicode_trie = ByteTrie.from_tokenizer(tokenizer, unicode=unicode)
  File "/private/home/matthewchang/work/transformers-CFG/transformers_cfg/tokenization/trie.py", line 60, in from_tokenizer
    mapping = get_mapping(tokenizer, unicode=unicode)
  File "/private/home/matthewchang/work/transformers-CFG/transformers_cfg/tokenization/mapping.py", line 12, in get_mapping
    log.debug(f"tokenizer model type: {get_tokenizer_model_type(tokenizer)}")
  File "/private/home/matthewchang/work/transformers-CFG/transformers_cfg/utils.py", line 90, in get_tokenizer_model_type
    or tokenizer_json["pre_tokenizer"]["pretokenizers"][1]["type"]
KeyError: 'pretokenizers'

I am able to reproduce this on main with a clean conda environment. It is fixed by PR #61.
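For context, the KeyError suggests that this tokenizer's serialized `pre_tokenizer` entry is not a `Sequence`, so it has no `"pretokenizers"` list to index into. Below is a minimal sketch of a defensive lookup, assuming the usual `tokenizer.json` layout from Hugging Face tokenizers (a `Sequence` carries a `"pretokenizers"` list, a single pre-tokenizer carries only `"type"`, and the entry may be null). The function name `pre_tokenizer_types` and the sample dicts are hypothetical, and this is not necessarily what PR #61 does.

```python
from typing import Any, Dict, List


def pre_tokenizer_types(tokenizer_json: Dict[str, Any]) -> List[str]:
    """Return the pre-tokenizer type names from a tokenizer.json-style dict.

    Hypothetical helper: handles a Sequence, a single pre-tokenizer,
    and a missing/null pre_tokenizer without raising KeyError.
    """
    pre = tokenizer_json.get("pre_tokenizer")
    if not pre:
        # No pre-tokenizer configured (key absent or null).
        return []
    if "pretokenizers" in pre:
        # Sequence: collect the type of each nested pre-tokenizer.
        return [p.get("type", "") for p in pre["pretokenizers"]]
    # Single pre-tokenizer: only its own type is available.
    return [pre.get("type", "")]


# Illustrative inputs (made up for this sketch, not read from a real model).
sequence_cfg = {
    "pre_tokenizer": {
        "type": "Sequence",
        "pretokenizers": [{"type": "Split"}, {"type": "ByteLevel"}],
    }
}
single_cfg = {"pre_tokenizer": {"type": "Metaspace"}}
missing_cfg = {"pre_tokenizer": None}

print(pre_tokenizer_types(sequence_cfg))
print(pre_tokenizer_types(single_cfg))
print(pre_tokenizer_types(missing_cfg))
```

A caller can then test membership (for example, whether `"ByteLevel"` is in the returned list) instead of indexing a fixed position such as `["pretokenizers"][1]`.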

Saibo-creator · 2024-07-06T01:45:15Z

Thanks for contributing, @MatthewChang!
I merged your PR :)

FYI, I refactored the tokenizer interface to reduce some complexity in #65, which gets rid of the dependency on this key.

Saibo-creator closed this as completed Jul 6, 2024
