StaticCache Bad generation results with Llama after v4.39.0 #30417

Closed
2 of 4 tasks
mobicham opened this issue Apr 23, 2024 · 2 comments · Fixed by #30476
Comments

@mobicham
Contributor

System Info

transformers version: 4.41.0.dev0
Platform: Linux-5.15.0-89-generic-x86_64-with-glibc2.
Python version: 3.10.
Huggingface_hub version: 0.20.
Safetensors version: 0.4.
Accelerate version: 0.21.0

Who can help?

@ArthurZucker @gante

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

With the static cache, generation quality on the current 4.41.0.dev0 version is much worse than on the previous 4.39.0 release, at least with Llama. With quantized models it outputs complete gibberish. The same code works fine with 4.39.0:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id  = "meta-llama/Llama-2-7b-chat-hf"
model     = AutoModelForCausalLM.from_pretrained(
    model_id, cache_dir=".", torch_dtype=torch.float16, attn_implementation="sdpa"
).to("cuda").eval()
tokenizer = AutoTokenizer.from_pretrained(model_id, cache_dir=".")

tokenizer.add_bos_token = False
tokenizer.add_eos_token = False

prompt = "<s>[INST] How do I build a car? [/INST]"

gen_out = model.generate(
    **tokenizer([prompt], return_tensors="pt").to(model.device),
    do_sample=False,
    cache_implementation="static",
    max_new_tokens=100,
    pad_token_id=tokenizer.eos_token_id,
    temperature=None,
    top_p=None,
    use_cache=False,
)

print()
print(tokenizer.decode(gen_out[0]))
# version: 4.39.0 - works as expected
<s> [INST] How do I build a car? [/INST]  Building a car is a complex and challenging project that requires a significant amount of time, money, and expertise. Here are some general steps that you might consider when building a car:

1. Define your goals: What kind of car do you want to build? What features do you want to include? What is your budget? Answering these questions will help you determine the scope of your project and what you need to do to get started.
2. Research and plan:
# version: 4.41.0 - bad output, outputs gibberish 
<s> [INST] How do I build a car? [/INST]  I's (the 0-2) are dots d's the traveling 4 v5 8 out the9 of the9 1t 1 ch do not always and the9 10 11 is-not 1 rt 1 c0 the0.

To build a car, you will need to have a good understanding of mechanical systems, electrical systems, and fabrication techniques. You will also need to have a

Expected behavior

The output should be the same as with the previous 4.39.0 release.
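
As an additional sanity check (a minimal sketch, not part of the original report, reusing the same model and prompt): running the same greedy generation twice on the same version, once with the default dynamic cache and once with cache_implementation="static", should produce token-for-token identical outputs, so any mismatch isolates the static-cache path.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id  = "meta-llama/Llama-2-7b-chat-hf"
model     = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, attn_implementation="sdpa"
).to("cuda").eval()
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.add_bos_token = False

inputs = tokenizer(["<s>[INST] How do I build a car? [/INST]"], return_tensors="pt").to(model.device)

# Greedy generation with the default (dynamic) cache
out_dynamic = model.generate(**inputs, do_sample=False, max_new_tokens=50,
                             pad_token_id=tokenizer.eos_token_id)

# Greedy generation with the static cache
out_static = model.generate(**inputs, do_sample=False, max_new_tokens=50,
                            cache_implementation="static",
                            pad_token_id=tokenizer.eos_token_id)

# With do_sample=False both runs should decode to the same text;
# a difference points at the static-cache code path.
print(tokenizer.decode(out_dynamic[0]))
print(tokenizer.decode(out_static[0]))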

@ArthurZucker ArthurZucker changed the title Bad generation results with Llama after v4.39.0 StaticCache Bad generation results with Llama after v4.39.0 Apr 23, 2024
@ArthurZucker
Collaborator

Mmm, that's super weird. It's most probably generate, since the test_torch_compile test here is all green:

def test_compile_static_cache(self):
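
To separate generate() from the modeling and cache code, one option is a manual greedy loop with an explicit StaticCache (a minimal sketch, assuming the StaticCache constructor and cache_position keyword documented around this release; not code from this thread). If this also produces gibberish, the problem is unlikely to be in generate() itself.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, StaticCache

model_id  = "meta-llama/Llama-2-7b-chat-hf"
model     = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, attn_implementation="sdpa"
).to("cuda").eval()
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.add_bos_token = False

input_ids = tokenizer("<s>[INST] How do I build a car? [/INST]",
                      return_tensors="pt").input_ids.to(model.device)

# Pre-allocated static cache, sized for the prompt plus 100 new tokens
past_key_values = StaticCache(config=model.config, max_batch_size=1,
                              max_cache_len=512, device=model.device, dtype=model.dtype)

cache_position = torch.arange(input_ids.shape[1], device=model.device)
generated = [input_ids]
with torch.no_grad():
    # Prefill the cache with the prompt
    logits = model(input_ids, past_key_values=past_key_values,
                   cache_position=cache_position, use_cache=True).logits
    next_token = logits[:, -1:].argmax(dim=-1)
    generated.append(next_token)
    # Decode greedily, one token at a time, bypassing generate()
    for _ in range(99):
        cache_position = cache_position[-1:] + 1
        logits = model(next_token, past_key_values=past_key_values,
                       cache_position=cache_position, use_cache=True).logits
        next_token = logits[:, -1:].argmax(dim=-1)
        generated.append(next_token)

print(tokenizer.decode(torch.cat(generated, dim=-1)[0]))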

@gante
Member

gante commented Apr 23, 2024

having a look
