fix gemma-2-27b text generation pytest #1828

Closed
wants to merge 1 commit

Conversation

skaulintel
Collaborator

Fixes the following pytest test:

python -m pytest tests/test_text_generation_example.py tests/test_encoder_decoder.py -v -s -k "gemma-2-27b and test_text_generation_bf16_1x" --token=****

Without this fix, the test fails with the following AssertionError:

E           AssertionError: assert False
E            +  where False = <built-in function eq>('DeepSpeed is a machine learning framework that allows you to train deep learning models at any scale, from a single GPU to thousands of GPUs. It is a system that allows you to train models in a distributed environment.\n\nDeepSpeed is a deep learning training system that allows you to train models in a distributed environment. It is a system that allows you to train models in a distributed environment.\n\nThe DeepSpeed system is a deep learning training system that is designed to help you train deep learning models in a distributed environment.\n\nThe Deep', 'DeepSpeed is a machine learning framework that enables you to train models with trillions of parameters and beyond, using model parallelism to partition large models over multiple GPUs.\n\nThe following is a brief introduction to the DeepSpeed model parallel training.\n\n<h2>1. Introduction</h2>\n\nThe DeepSpeed model parallel training is a simple and effective way to train large models. It is a framework that enables you to train models with trillions of parameters and beyond.\n\nDeepSpeed is a distributed deep learning optimization toolkit that makes it easy and efficient')

conftest.py:74: AssertionError
========================================================================================== short test summary info ===========================================================================================
FAILED tests/test_text_generation_example.py::test_text_generation_bf16_1x[google/gemma-2-27b-1-False-True] - AssertionError: assert False

@regisss
Collaborator

regisss commented Mar 7, 2025

I don't think there is an issue with Gemma2. The reason why I added the code block

if self.config.final_logit_softcapping is not None:
    ...

is because it has been in Transformers since Gemma2 was added. I'm not sure why it was not included here in #1280 and #1504 (any idea @billishyahao @Luca-Calabria ?).

final_logit_softcapping is actually specified in the configuration of the model so this piece of code is indeed used.
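
For readers following along, a minimal sketch of what final logit softcapping does in Gemma2-style models (illustrative only, not the exact code block elided above):

import torch

def softcap_logits(logits: torch.Tensor, final_logit_softcapping: float) -> torch.Tensor:
    # Scale the logits down, squash them with tanh, then scale back up,
    # which bounds every logit to the range (-cap, +cap).
    logits = logits / final_logit_softcapping
    logits = torch.tanh(logits)
    return logits * final_logit_softcapping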

Moreover, the output of the model with this change still makes sense:

DeepSpeed is a machine learning framework that allows you to train deep learning models at any scale, from a single GPU to thousands of GPUs. It is a system that allows you to train models in a distributed environment.\n\nDeepSpeed is a deep learning training system that allows you to train models in a distributed environment. It is a system that allows you to train models in a distributed environment.\n\nThe DeepSpeed system is a deep learning training system that is designed to help you train deep learning models in a distributed environment.\n\nThe Deep

I think what we should do instead is update the baseline here:

"output": "DeepSpeed is a machine learning framework that enables you to train models with trillions of parameters and beyond, using model parallelism to partition large models over multiple GPUs.\n\nThe following is a brief introduction to the DeepSpeed model parallel training.\n\n<h2>1. Introduction</h2>\n\nThe DeepSpeed model parallel training is a simple and effective way to train large models. It is a framework that enables you to train models with trillions of parameters and beyond.\n\nDeepSpeed is a distributed deep learning optimization toolkit that makes it easy and efficient",

@uartie
Contributor

uartie commented Mar 7, 2025

> I think what we should do instead is update the baseline here:
>
> "output": "DeepSpeed is a machine learning framework that enables you to train models with trillions of parameters and beyond, using model parallelism to partition large models over multiple GPUs.\n\nThe following is a brief introduction to the DeepSpeed model parallel training.\n\n<h2>1. Introduction</h2>\n\nThe DeepSpeed model parallel training is a simple and effective way to train large models. It is a framework that enables you to train models with trillions of parameters and beyond.\n\nDeepSpeed is a distributed deep learning optimization toolkit that makes it easy and efficient",

You can use --rebase to update the baseline:

python -m pytest --rebase tests/test_text_generation_example.py::test_text_generation_bf16_1x[google/gemma-2-27b-1-False-True]

@skaulintel
Collaborator Author

> I don't think there is an issue with Gemma2. The reason why I added the code block
>
> if self.config.final_logit_softcapping is not None:
>     ...
>
> is because it has been in Transformers since Gemma2 was added. I'm not sure why it was not included here in #1280 and #1504 (any idea @billishyahao @Luca-Calabria ?).
>
> final_logit_softcapping is actually specified in the configuration of the model so this piece of code is indeed used.
>
> Moreover, the output of the model with this change still makes sense:
>
> DeepSpeed is a machine learning framework that allows you to train deep learning models at any scale, from a single GPU to thousands of GPUs. It is a system that allows you to train models in a distributed environment.\n\nDeepSpeed is a deep learning training system that allows you to train models in a distributed environment. It is a system that allows you to train models in a distributed environment.\n\nThe DeepSpeed system is a deep learning training system that is designed to help you train deep learning models in a distributed environment.\n\nThe Deep
>
> I think what we should do instead is update the baseline here:
>
> "output": "DeepSpeed is a machine learning framework that enables you to train models with trillions of parameters and beyond, using model parallelism to partition large models over multiple GPUs.\n\nThe following is a brief introduction to the DeepSpeed model parallel training.\n\n<h2>1. Introduction</h2>\n\nThe DeepSpeed model parallel training is a simple and effective way to train large models. It is a framework that enables you to train models with trillions of parameters and beyond.\n\nDeepSpeed is a distributed deep learning optimization toolkit that makes it easy and efficient",

It makes sense, but there seems to be a lot of repetition. The output before this change seemed a little better.

@regisss
Collaborator

regisss commented Mar 7, 2025

This happens with greedy search, especially with models that have not been instruction fine-tuned. I'll take a look to see how to get more realistic results by tweaking a few generation parameters.
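
As an illustration only, this is the kind of generation-parameter tweak that can reduce repetition; the exact parameters to use are not decided here and the values below are assumptions:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "google/gemma-2-27b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("DeepSpeed is a machine learning framework", return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=100,
    do_sample=True,          # sample instead of pure greedy search
    temperature=0.7,         # soften the next-token distribution
    top_p=0.9,               # nucleus sampling
    repetition_penalty=1.2,  # discourage already-generated tokens
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))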

@Luca-Calabria
Contributor

> I don't think there is an issue with Gemma2. The reason why I added the code block
>
> if self.config.final_logit_softcapping is not None:
>     ...
>
> is because it has been in Transformers since Gemma2 was added. I'm not sure why it was not included here in #1280 and #1504 (any idea @billishyahao @Luca-Calabria ?).
>
> final_logit_softcapping is actually specified in the configuration of the model so this piece of code is indeed used.
>
> Moreover, the output of the model with this change still makes sense:
>
> DeepSpeed is a machine learning framework that allows you to train deep learning models at any scale, from a single GPU to thousands of GPUs. It is a system that allows you to train models in a distributed environment.\n\nDeepSpeed is a deep learning training system that allows you to train models in a distributed environment. It is a system that allows you to train models in a distributed environment.\n\nThe DeepSpeed system is a deep learning training system that is designed to help you train deep learning models in a distributed environment.\n\nThe Deep
>
> I think what we should do instead is update the baseline here:
>
> "output": "DeepSpeed is a machine learning framework that enables you to train models with trillions of parameters and beyond, using model parallelism to partition large models over multiple GPUs.\n\nThe following is a brief introduction to the DeepSpeed model parallel training.\n\n<h2>1. Introduction</h2>\n\nThe DeepSpeed model parallel training is a simple and effective way to train large models. It is a framework that enables you to train models with trillions of parameters and beyond.\n\nDeepSpeed is a distributed deep learning optimization toolkit that makes it easy and efficient",

I don't have a clear answer as to why it was not part of the Gemma2 enabling PRs, but if this block was part of Transformers and was not integrated into Gemma2 for Gaudi, then it is something to add.
The baseline should be updated to match the new output.

@regisss
Collaborator

regisss commented Mar 10, 2025

@skaulintel It seems casting the logits to float when they are extracted from the forward pass of the model solves it: 02c4aa0#diff-c7b7c0b91ade41a0c87f1ad1f6784e4d51fb88c6a65f350042aca052b7ca1558R960

This used to be done in previous versions of Transformers. They have now removed it, but it seems this slightly affects a few models on Gaudi, so I reverted that change in the commit posted above. Closing this PR.
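
A minimal sketch of the kind of change described above, i.e. casting the bf16 logits returned by the forward pass to float32 before post-processing (illustrative only, not the exact diff from commit 02c4aa0):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "google/gemma-2-27b"  # assumption: same model as in the failing test
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)

inputs = tokenizer("DeepSpeed is", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Cast the logits to float32 before any softcapping/sampling logic,
# as older Transformers versions used to do internally.
logits = outputs.logits.float()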

regisss closed this Mar 10, 2025