A generation issue: when ignore_eos=False and the model's pad_token==eos_token (like Llama3), generated results in the same batch are erased #1539

Closed
YunLiu1 wants to merge 1 commit

Conversation

YunLiu1 commented Dec 2, 2024

A generation issue: when ignore_eos=False and the model's pad_token==eos_token (like Llama3), the generated results in the same batch are erased.

What does this PR do?

When generating text with Optimum Habana, if the batch size is greater than 1, ignore_eos=False, and the model's pad_token==eos_token (like Llama-3.1-8B), responses can be erased.
Here is an example: I submit 2 prompts ("Hello world,", "How are you?") with batch_size=2, and the shorter one is padded on the left:
[screenshot: the tokenized batch, with the shorter prompt left-padded using pad_token==eos_token]
After generation, the first pad_token is recognized as an eos_token, and the response is erased.

Fixes

I modified the post-processing to skip the left pad_tokens and erase only the tokens that come after the real eos_token.
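
For illustration, here is a minimal sketch of the idea in plain Python on lists of token ids. It is not the actual Optimum Habana post-processing code; the token ids, the helper names, and the prompt-length handling are assumptions made up for this example.

```python
# Illustrative sketch only (not the actual Optimum Habana post-processing code):
# it shows why erasing everything from the first eos_token fails under left padding
# when pad_token == eos_token, and how restricting the search to the generated part
# of the sequence avoids it. Token ids and helper names are made up for this example.
EOS_ID = 128001  # stands in for the Llama 3 eos/pad token id

# Batch of 2 after left padding: the shorter prompt is padded with EOS_ID.
padded_prompt_len = 6
sequences = [
    [EOS_ID, EOS_ID, 101, 102, 103, 104, 201, 202, EOS_ID, 0, 0],  # short prompt, left-padded
    [111, 112, 113, 114, 115, 116, 301, 302, 303, EOS_ID, 0],      # full-length prompt
]

def trim_naive(seq, eos_id):
    """Erase everything from the first eos_token onward (buggy with left padding)."""
    return seq[: seq.index(eos_id)] if eos_id in seq else seq

def trim_after_prompt(seq, eos_id, prompt_len):
    """Skip the (possibly padded) prompt region and erase only the tokens that
    follow the first eos_token actually generated by the model."""
    generated = seq[prompt_len:]
    if eos_id in generated:
        generated = generated[: generated.index(eos_id)]
    return seq[:prompt_len] + generated

print(trim_naive(sequences[0], EOS_ID))
# -> []  (the left pad token is mistaken for the end of the answer, response erased)
print(trim_after_prompt(sequences[0], EOS_ID, padded_prompt_len))
# -> [EOS_ID, EOS_ID, 101, 102, 103, 104, 201, 202]  (only tokens after the real eos are dropped)
```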

Unit Tests

The changed code passed these unit tests:
eos_test_py.txt

Function Tests

It also passed these functional tests:
lm_eval mmlu_pro_business for Meta-Llama-3.1-8B-Instruct (pad_token=eos_token, bs=8):

| Tasks    | Version | Filter         | n-shot | Metric      | Value  | Stderr   |
|----------|---------|----------------|--------|-------------|--------|----------|
| business | 1       | custom-extract | 5      | exact_match | 0.4791 | ± 0.0178 |

lm_eval mmlu_pro_business for llama2-7b (pad_token=0, bs=8):

| Tasks    | Version | Filter         | n-shot | Metric      | Value  | Stderr   |
|----------|---------|----------------|--------|-------------|--------|----------|
| business | 1       | custom-extract | 5      | exact_match | 0.1888 | ± 0.0139 |

regisss (Collaborator) commented Dec 2, 2024

@YunLiu1 Can you provide an example command that reproduces this issue, please?

YunLiu1 (Author) commented Dec 3, 2024

@YunLiu1 Can you provide an example command that reproduces this issue, please?

Sure. Because ignore_eos is always True in run_generation.py, you need to change the code first:
edit examples/text-generation/run_generation.py at L512 and explicitly set "ignore_eos=False,".
Then run this command:

python3 ~/optimum-habana/examples/text-generation/run_generation.py \
    --model_name_or_path /host/mnt/disk3/hf_models/Meta-Llama-3.1-8B \
    --use_hpu_graphs --use_kv_cache --bf16 --batch_size 2 \
    --warmup 0 --n_iterations 1 \
    --prompt "Hello world," "How are you?"

There is no output for the short prompt "Hello world,"

regisss (Collaborator) commented Dec 10, 2024

@YunLiu1 When I run this command, I get:

Input/outputs:
input 1: ('Hello world,',)
output 1.1: ('Hello world, I am a new member of the forum. I am a 20 year old male from the UK. I have been diagnosed with Aspergers and ADHD. I have been diagnosed with Aspergers for 2 years now. I have been diagnosed with ADHD for 1 year now. I have been diagnosed with depression for 2 years now. I have been diagnosed with anxiety for 1 year now. I have been diagnosed with OCD for 1 year now. I have been diagnosed with social',)

input 2: ('How are you?',)
output 2.1: ('How are you? I hope you are well. I am writing to you to ask for your help. I am a student at the University of the West Indies, Mona Campus. I am currently doing a research project on the topic of the impact of the COVID-19 pandemic on the mental health of Jamaican youth. I am hoping to get your help with this project. I am asking you to complete a survey that will take about 10 minutes of your time. The survey is completely anonymous and confidential. I am',)

which looks fine. Can you try again on the latest main branch and let me know if you still see it please?

Besides, you can set ignore_eos to False from the command line using the argument --no-ignore_eos.
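
For example, the reproduction command from the earlier comment can be rerun with that flag instead of editing the script (same options as before, with only --no-ignore_eos added):

```bash
python3 ~/optimum-habana/examples/text-generation/run_generation.py \
    --model_name_or_path /host/mnt/disk3/hf_models/Meta-Llama-3.1-8B \
    --use_hpu_graphs --use_kv_cache --bf16 --batch_size 2 \
    --warmup 0 --n_iterations 1 \
    --no-ignore_eos \
    --prompt "Hello world," "How are you?"
```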

regisss (Collaborator) commented Dec 10, 2024

It's possible that #1546 and #1569 helped to fix this.

YunLiu1 (Author) commented Dec 11, 2024

@regisss
Hi, I confirmed this issue has been fixed in #1569, so this PR is no longer needed.

regisss closed this Dec 11, 2024