
Inconsistent behaviors of generate() between versions before and after 4.41.* on mistralai/Mistral-7B-Instruct-v0.2 #31251

Closed
bzhangj13zzz opened this issue Jun 5, 2024 · 3 comments · Fixed by #31254

Comments


bzhangj13zzz commented Jun 5, 2024

System Info

  • transformers version: 4.41.1
  • Platform: Linux-5.15.0-84-generic-x86_64-with-glibc2.31
  • Python version: 3.9.19
  • Huggingface_hub version: 0.23.1
  • Safetensors version: 0.4.3
  • Accelerate version: 0.30.1
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.3.0+cu121 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?: Yes
  • Using distributed or parallel set-up in script?: No

Who can help?

@ArthurZucker @gante

Reproduction

The following code behaves differently before and after v4.41.* with mistralai/Mistral-7B-Instruct-v0.2. It may affect more models, but I have only tested this one.

from importlib.metadata import version
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig

model = AutoModelForCausalLM.from_pretrained('mistralai/Mistral-7B-Instruct-v0.2', device_map="auto")
tokenizer = AutoTokenizer.from_pretrained('mistralai/Mistral-7B-Instruct-v0.2')
messages = [{"role": "user", "content": "This is a test message"}]
messages = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
tokenizer.pad_token = tokenizer.eos_token
model_inputs = tokenizer(messages, return_tensors="pt", padding=True, add_special_tokens=False).to('cuda')

generation_config = GenerationConfig(do_sample=False, max_new_tokens=256, pad_token_id=tokenizer.eos_token_id)
generated_ids = model.generate(
    **model_inputs, generation_config=generation_config)[:, model_inputs["input_ids"].shape[1] :]

generated_texts = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)

Furthermore, when I provide the generation parameters directly as keyword arguments to generate(), as shown below, the output in v4.41.* is not consistent with the first approach:

from transformers import GenerationConfig, AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained('mistralai/Mistral-7B-Instruct-v0.2', device_map="auto")
tokenizer = AutoTokenizer.from_pretrained('mistralai/Mistral-7B-Instruct-v0.2')
messages = [{"role": "user", "content": "This is a test message"}]
messages = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
tokenizer.pad_token = tokenizer.eos_token
model_inputs = tokenizer(messages, return_tensors="pt", padding=True, add_special_tokens=False).to('cuda')

# Differs here
generated_ids = model.generate(
    **model_inputs, max_new_tokens=256, do_sample=False, pad_token_id=tokenizer.eos_token_id)[:, model_inputs["input_ids"].shape[1] :]

generated_texts = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)

Expected behavior

In previous versions (e.g. 4.39.3), both ways of providing the configuration parameters lead to the same generation:

I see. If you have any specific question or request, feel free to ask and I'll do my best to help you out. Otherwise, this message will be considered as a test and no further action will be taken. Let me know if you need anything else.

But in v4.41.* (e.g. 4.41.1), the first way generates:

I see. If you have any specific question or request, feel free to ask and I'll do my best to help you out. Otherwise, this message will be considered as a test and no further action will be taken. Let me know if you need anything else.\n\nHere are some examples of questions or requests I can help with:\n\n* "What's the weather like in New York City today?"\n* "Can you help me find a recipe for chocolate chip cookies?"\n* "How do I set up a new email account?"\n* "What's the best way to get from JFK airport to Times Square?"\n* "Can you recommend a good book to read?"\n\nLet me know if you have any other question or request, and I'll be happy to help. Otherwise, have a great day!\n\nIf you have any other question or request, feel free to ask and I'll do my best to help you out. Otherwise, this message will be considered as a test and no further action will be taken. Let me know if you need anything else.\n\nHere are some examples of questions or requests I can help with:\n\n* "What's the weather

while the second way generates:

I see. If you have any specific question or request, feel free to ask and I'll do my best to help you out. Otherwise, this message will be considered as a test and no further action will be taken. Let me know if you need anything else.

which is consistent with the older versions (I suppose this is the expected behavior).

The behavior of generate() seems to be inconsistent both across versions and within 4.41.*. Or maybe I have misunderstood the documentation on how to provide the configuration parameters. Any help would be much appreciated!

ArthurZucker (Collaborator) commented

cc @zucchini-nlp

exs-dmiketa commented

Came across this exact same error, I believe, but for a different model. It feels like passing e.g. eos_token_id=None inside GenerationConfig is now interpreted not as "do not change the EOS token" but as "set the EOS token to None".
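
A minimal sketch of that interpretation (illustrative only; this assumes the suspected behavior rather than a confirmed root cause): a GenerationConfig built from scratch leaves eos_token_id unset, so if generate() takes that None literally instead of falling back to the model's own default, decoding can only stop at max_new_tokens.

from transformers import GenerationConfig

# A freshly constructed GenerationConfig only carries the fields passed to it;
# eos_token_id stays None unless it is set explicitly.
cfg = GenerationConfig(do_sample=False, max_new_tokens=256)
print(cfg.eos_token_id)  # None

# If generate() treats this None as "no EOS token" rather than "use the model's
# default EOS token", generation cannot stop early and runs to max_new_tokens,
# which would explain the long output in the report above.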

bzhangj13zzz (Author) commented

That does seem to be the issue: if I add eos_token_id=tokenizer.eos_token_id to the generation_config, it works as expected.
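
For completeness, a sketch of that workaround, reusing model, tokenizer, and model_inputs from the reproduction above (illustrative, not a confirmed fix):

generation_config = GenerationConfig(
    do_sample=False,
    max_new_tokens=256,
    pad_token_id=tokenizer.eos_token_id,
    eos_token_id=tokenizer.eos_token_id,  # set explicitly so decoding stops at EOS
)
generated_ids = model.generate(**model_inputs, generation_config=generation_config)[
    :, model_inputs["input_ids"].shape[1]:
]
generated_texts = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)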
