[Tokenizer][bug] LLAVA 1.6 tokenizer problem #31901

lanking520 · 2024-07-11T05:46:26Z

System Info

Any OS that can run transformers

related issue: vllm-project/vllm#6224

Who can help?

No response

Information

The official example scripts
My own modified scripts

Tasks

An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
My own task or dataset (give details below)

Reproduction

With transformers 4.42.3, the following script reproduces the issue:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("llava-hf/llava-v1.6-34b-hf")
print(tokenizer.encode("<image>"))

print(tokenizer.vocab_size)

Output:

[64003]
64000

I can confirm that transformers 4.40.1 does not have this issue. The same script prints:

[64000]
64000

Expected behavior

Shouldn't the <image> token ID be the same as in 4.40.1?

[64000]
64000


DarkLight1337 · 2024-07-11T06:03:36Z

After trying this code on a few different versions, it looks like this got changed between 4.41.2 and 4.42.0.

DarkLight1337 · 2024-07-11T06:06:12Z

Also, this issue only occurs for llava-hf/llava-v1.6-34b-hf. It works fine for llava-hf/llava-v1.6-mistral-7b-hf and llava-hf/llava-v1.6-vicuna-7b-hf.

Update: I think this is the same issue as #31713.
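
For anyone who wants to repeat the comparison across the three checkpoints, here is a minimal sketch. The helper names `image_token_report` and `compare_checkpoints` are made up for illustration and are not part of transformers. It relies on `vocab_size` reporting the base vocabulary without added tokens, while `len(tokenizer)` includes them, so an ID at or above `vocab_size` points to an added token.

```python
def image_token_report(tokenizer, token="<image>"):
    """Return the ID(s) for `token` together with both vocabulary sizes.

    Hypothetical helper: works with any object that exposes `encode`,
    `vocab_size` and `__len__`, such as a transformers tokenizer.
    """
    # Skip BOS/EOS so that only the ID(s) of the token itself are returned.
    ids = tokenizer.encode(token, add_special_tokens=False)
    return {
        "ids": ids,
        # Size of the base vocabulary, added tokens excluded.
        "vocab_size": tokenizer.vocab_size,
        # Size of the base vocabulary plus added tokens.
        "len": len(tokenizer),
        # True when every ID lies outside the base vocabulary.
        "is_added_token": bool(ids) and all(i >= tokenizer.vocab_size for i in ids),
    }


def compare_checkpoints(names=(
    "llava-hf/llava-v1.6-34b-hf",
    "llava-hf/llava-v1.6-mistral-7b-hf",
    "llava-hf/llava-v1.6-vicuna-7b-hf",
)):
    """Print the report for each checkpoint (downloads tokenizer files)."""
    # Imported lazily so the helper above can be used without transformers.
    from transformers import AutoTokenizer

    for name in names:
        tokenizer = AutoTokenizer.from_pretrained(name)
        print(name, image_token_report(tokenizer))
```

Calling `compare_checkpoints()` once under 4.41.2 and once under 4.42.0 should show which checkpoint changes its `<image>` ID between the two releases.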

zucchini-nlp · 2024-07-11T06:40:11Z

Answered in #31713 (comment)

zucchini-nlp · 2024-07-12T04:25:02Z

Closing because #31902 was merged

zucchini-nlp closed this as completed Jul 12, 2024
