
Inconsistent behaviors of generate() between versions before and after 4.41.* on mistralai/Mistral-7B-Instruct-v0.2 #31251

Closed
bzhangj13zzz opened this issue Jun 5, 2024 · 3 comments · Fixed by #31254

Comments


bzhangj13zzz commented Jun 5, 2024

System Info

  • transformers version: 4.41.1
  • Platform: Linux-5.15.0-84-generic-x86_64-with-glibc2.31
  • Python version: 3.9.19
  • Huggingface_hub version: 0.23.1
  • Safetensors version: 0.4.3
  • Accelerate version: 0.30.1
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.3.0+cu121 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?: Yes
  • Using distributed or parallel set-up in script?: No

Who can help?

@ArthurZucker @gante

Reproduction

The following code behaves differently before and after v4.41.* with mistralai/Mistral-7B-Instruct-v0.2. It may affect more models, but I have only tested this one.

from importlib.metadata import version
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig

model = AutoModelForCausalLM.from_pretrained('mistralai/Mistral-7B-Instruct-v0.2', device_map="auto")
tokenizer = AutoTokenizer.from_pretrained('mistralai/Mistral-7B-Instruct-v0.2')
messages = [{"role": "user", "content": "This is a test message"}]
messages = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
tokenizer.pad_token = tokenizer.eos_token
model_inputs = tokenizer(messages, return_tensors="pt", padding=True, add_special_tokens=False).to('cuda')

generation_config = GenerationConfig(do_sample=False, max_new_tokens=256, pad_token_id=tokenizer.eos_token_id)
generated_ids = model.generate(
    **model_inputs, generation_config=generation_config)[:, model_inputs["input_ids"].shape[1] :]

generated_texts = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)

Furthermore, when I provide the generation parameters directly as keyword arguments to generate(), as shown below, the output in v4.41.* is not consistent with the first approach:

from transformers import GenerationConfig, AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained('mistralai/Mistral-7B-Instruct-v0.2', device_map="auto")
tokenizer = AutoTokenizer.from_pretrained('mistralai/Mistral-7B-Instruct-v0.2')
messages = [{"role": "user", "content": "This is a test message"}]
messages = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
tokenizer.pad_token = tokenizer.eos_token
model_inputs = tokenizer(messages, return_tensors="pt", padding=True, add_special_tokens=False).to('cuda')

# Differs here
generated_ids = model.generate(
    **model_inputs, max_new_tokens=256, do_sample=False, pad_token_id=tokenizer.eos_token_id)[:, model_inputs["input_ids"].shape[1] :]

generated_texts = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)

Expected behavior

In previous versions (e.g. 4.39.3), both ways of providing the configuration parameters lead to the same generation:

I see. If you have any specific question or request, feel free to ask and I'll do my best to help you out. Otherwise, this message will be considered as a test and no further action will be taken. Let me know if you need anything else.

But in v4.41.* (e.g. 4.41.1), the first way generates:

I see. If you have any specific question or request, feel free to ask and I'll do my best to help you out. Otherwise, this message will be considered as a test and no further action will be taken. Let me know if you need anything else.\n\nHere are some examples of questions or requests I can help with:\n\n* "What's the weather like in New York City today?"\n* "Can you help me find a recipe for chocolate chip cookies?"\n* "How do I set up a new email account?"\n* "What's the best way to get from JFK airport to Times Square?"\n* "Can you recommend a good book to read?"\n\nLet me know if you have any other question or request, and I'll be happy to help. Otherwise, have a great day!\n\nIf you have any other question or request, feel free to ask and I'll do my best to help you out. Otherwise, this message will be considered as a test and no further action will be taken. Let me know if you need anything else.\n\nHere are some examples of questions or requests I can help with:\n\n* "What's the weather

while the second way generates:

I see. If you have any specific question or request, feel free to ask and I'll do my best to help you out. Otherwise, this message will be considered as a test and no further action will be taken. Let me know if you need anything else.

which is consistent with the older versions (I suppose this is the expected behavior).

The behavior of generate() seems to be inconsistent both across versions and within 4.41.*. Or maybe I have misunderstood the documentation on how to provide the configuration parameters. Any help would be much appreciated!

ArthurZucker (Collaborator) commented

cc @zucchini-nlp

exs-dmiketa commented

Came across this exact same error, I believe, but for a different model. It feels like passing e.g. eos_token_id=None inside GenerationConfig is now interpreted not as "do not change the EOS token" but as "set the EOS token to None".
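
A minimal sketch of that interpretation (illustrative only; this assumes the suspected behavior rather than a confirmed root cause): a GenerationConfig built from scratch leaves eos_token_id unset, so if generate() takes that None literally instead of falling back to the model's own default, decoding can only stop at max_new_tokens.

from transformers import GenerationConfig

# A freshly constructed GenerationConfig only carries the fields passed to it;
# eos_token_id stays None unless it is set explicitly.
cfg = GenerationConfig(do_sample=False, max_new_tokens=256)
print(cfg.eos_token_id)  # None

# If generate() treats this None as "no EOS token" rather than "use the model's
# default EOS token", generation cannot stop early and runs to max_new_tokens,
# which would explain the long output in the report above.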

bzhangj13zzz (Author) commented

That does seem to be the issue: if I add eos_token_id=tokenizer.eos_token_id to the generation_config, it works as expected.
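
For completeness, a sketch of that workaround, reusing model, tokenizer, and model_inputs from the reproduction above (illustrative, not a confirmed fix):

generation_config = GenerationConfig(
    do_sample=False,
    max_new_tokens=256,
    pad_token_id=tokenizer.eos_token_id,
    eos_token_id=tokenizer.eos_token_id,  # set explicitly so decoding stops at EOS
)
generated_ids = model.generate(**model_inputs, generation_config=generation_config)[
    :, model_inputs["input_ids"].shape[1]:
]
generated_texts = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)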
