fix gemma-2-27b text generation pytest #1828

Closed
wants to merge 1 commit

Conversation

skaulintel
Collaborator

Fixes the following pytest test:

python -m pytest tests/test_text_generation_example.py tests/test_encoder_decoder.py -v -s -k "gemma-2-27b and test_text_generation_bf16_1x" --token=****

Without this fix, the test fails with the following AssertionError:

E           AssertionError: assert False
E            +  where False = <built-in function eq>('DeepSpeed is a machine learning framework that allows you to train deep learning models at any scale, from a single GPU to thousands of GPUs. It is a system that allows you to train models in a distributed environment.\n\nDeepSpeed is a deep learning training system that allows you to train models in a distributed environment. It is a system that allows you to train models in a distributed environment.\n\nThe DeepSpeed system is a deep learning training system that is designed to help you train deep learning models in a distributed environment.\n\nThe Deep', 'DeepSpeed is a machine learning framework that enables you to train models with trillions of parameters and beyond, using model parallelism to partition large models over multiple GPUs.\n\nThe following is a brief introduction to the DeepSpeed model parallel training.\n\n<h2>1. Introduction</h2>\n\nThe DeepSpeed model parallel training is a simple and effective way to train large models. It is a framework that enables you to train models with trillions of parameters and beyond.\n\nDeepSpeed is a distributed deep learning optimization toolkit that makes it easy and efficient')

conftest.py:74: AssertionError
========================================================================================== short test summary info ===========================================================================================
FAILED tests/test_text_generation_example.py::test_text_generation_bf16_1x[google/gemma-2-27b-1-False-True] - AssertionError: assert False

@regisss
Collaborator

regisss commented Mar 7, 2025

I don't think there is an issue with Gemma2. The reason why I added the code block

if self.config.final_logit_softcapping is not None:
    ...

is because it has been in Transformers since Gemma2 was added. I'm not sure why it was not included here in #1280 and #1504 (any idea @billishyahao @Luca-Calabria ?).

final_logit_softcapping is actually specified in the configuration of the model so this piece of code is indeed used.
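
For readers following along, a minimal sketch of what final logit softcapping does in Gemma2-style models (illustrative only, not the exact code block elided above):

import torch

def softcap_logits(logits: torch.Tensor, final_logit_softcapping: float) -> torch.Tensor:
    # Scale the logits down, squash them with tanh, then scale back up,
    # which bounds every logit to the range (-cap, +cap).
    logits = logits / final_logit_softcapping
    logits = torch.tanh(logits)
    return logits * final_logit_softcapping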

Moreover, the output of the model with this change still makes sense:

DeepSpeed is a machine learning framework that allows you to train deep learning models at any scale, from a single GPU to thousands of GPUs. It is a system that allows you to train models in a distributed environment.\n\nDeepSpeed is a deep learning training system that allows you to train models in a distributed environment. It is a system that allows you to train models in a distributed environment.\n\nThe DeepSpeed system is a deep learning training system that is designed to help you train deep learning models in a distributed environment.\n\nThe Deep

I think what we should do instead is update the baseline here:

"output": "DeepSpeed is a machine learning framework that enables you to train models with trillions of parameters and beyond, using model parallelism to partition large models over multiple GPUs.\n\nThe following is a brief introduction to the DeepSpeed model parallel training.\n\n<h2>1. Introduction</h2>\n\nThe DeepSpeed model parallel training is a simple and effective way to train large models. It is a framework that enables you to train models with trillions of parameters and beyond.\n\nDeepSpeed is a distributed deep learning optimization toolkit that makes it easy and efficient",

@uartie
Contributor

uartie commented Mar 7, 2025

> I think what we should do instead is update the baseline here:
>
> "output": "DeepSpeed is a machine learning framework that enables you to train models with trillions of parameters and beyond, using model parallelism to partition large models over multiple GPUs.\n\nThe following is a brief introduction to the DeepSpeed model parallel training.\n\n<h2>1. Introduction</h2>\n\nThe DeepSpeed model parallel training is a simple and effective way to train large models. It is a framework that enables you to train models with trillions of parameters and beyond.\n\nDeepSpeed is a distributed deep learning optimization toolkit that makes it easy and efficient",

You can use --rebase to update the baseline:

python -m pytest --rebase tests/test_text_generation_example.py::test_text_generation_bf16_1x[google/gemma-2-27b-1-False-True]

@skaulintel
Collaborator Author

> I don't think there is an issue with Gemma2. The reason why I added the code block
>
> if self.config.final_logit_softcapping is not None:
>     ...
>
> is because it has been in Transformers since Gemma2 was added. I'm not sure why it was not included here in #1280 and #1504 (any idea @billishyahao @Luca-Calabria ?).
>
> final_logit_softcapping is actually specified in the configuration of the model so this piece of code is indeed used.
>
> Moreover, the output of the model with this change still makes sense:
>
> DeepSpeed is a machine learning framework that allows you to train deep learning models at any scale, from a single GPU to thousands of GPUs. It is a system that allows you to train models in a distributed environment.\n\nDeepSpeed is a deep learning training system that allows you to train models in a distributed environment. It is a system that allows you to train models in a distributed environment.\n\nThe DeepSpeed system is a deep learning training system that is designed to help you train deep learning models in a distributed environment.\n\nThe Deep
>
> I think what we should do instead is update the baseline here:
>
> "output": "DeepSpeed is a machine learning framework that enables you to train models with trillions of parameters and beyond, using model parallelism to partition large models over multiple GPUs.\n\nThe following is a brief introduction to the DeepSpeed model parallel training.\n\n<h2>1. Introduction</h2>\n\nThe DeepSpeed model parallel training is a simple and effective way to train large models. It is a framework that enables you to train models with trillions of parameters and beyond.\n\nDeepSpeed is a distributed deep learning optimization toolkit that makes it easy and efficient",

It makes sense, but there seems to be a lot of repetition. The output before this change seemed a little better.

@regisss
Collaborator

regisss commented Mar 7, 2025

This happens with greedy search, especially with models that have not been instruction fine-tuned. I'll take a look to see how to get more realistic results by tweaking a few generation parameters.
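
As an illustration only, this is the kind of generation-parameter tweak that can reduce repetition; the exact parameters to use are not decided here and the values below are assumptions:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "google/gemma-2-27b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("DeepSpeed is a machine learning framework", return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=100,
    do_sample=True,          # sample instead of pure greedy search
    temperature=0.7,         # soften the next-token distribution
    top_p=0.9,               # nucleus sampling
    repetition_penalty=1.2,  # discourage already-generated tokens
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))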

@Luca-Calabria
Contributor

> I don't think there is an issue with Gemma2. The reason why I added the code block
>
> if self.config.final_logit_softcapping is not None:
>     ...
>
> is because it has been in Transformers since Gemma2 was added. I'm not sure why it was not included here in #1280 and #1504 (any idea @billishyahao @Luca-Calabria ?).
>
> final_logit_softcapping is actually specified in the configuration of the model so this piece of code is indeed used.
>
> Moreover, the output of the model with this change still makes sense:
>
> DeepSpeed is a machine learning framework that allows you to train deep learning models at any scale, from a single GPU to thousands of GPUs. It is a system that allows you to train models in a distributed environment.\n\nDeepSpeed is a deep learning training system that allows you to train models in a distributed environment. It is a system that allows you to train models in a distributed environment.\n\nThe DeepSpeed system is a deep learning training system that is designed to help you train deep learning models in a distributed environment.\n\nThe Deep
>
> I think what we should do instead is update the baseline here:
>
> "output": "DeepSpeed is a machine learning framework that enables you to train models with trillions of parameters and beyond, using model parallelism to partition large models over multiple GPUs.\n\nThe following is a brief introduction to the DeepSpeed model parallel training.\n\n<h2>1. Introduction</h2>\n\nThe DeepSpeed model parallel training is a simple and effective way to train large models. It is a framework that enables you to train models with trillions of parameters and beyond.\n\nDeepSpeed is a distributed deep learning optimization toolkit that makes it easy and efficient",

I don't have a clear answer as to why it was not part of the Gemma2 enabling PRs, but if this block was part of Transformers and was not integrated into Gemma2 for Gaudi, then it is something to add.
The baseline should be updated to match the new output.

@regisss
Collaborator

regisss commented Mar 10, 2025

@skaulintel It seems casting the logits to float when they are extracted from the forward pass of the model solves it: 02c4aa0#diff-c7b7c0b91ade41a0c87f1ad1f6784e4d51fb88c6a65f350042aca052b7ca1558R960

This used to be done in previous versions of Transformers. They have now removed it, but it seems this slightly affects a few models on Gaudi, so I reverted that change in the commit posted above. Closing this PR.
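
A minimal sketch of the kind of change described above, i.e. casting the bf16 logits returned by the forward pass to float32 before post-processing (illustrative only, not the exact diff from commit 02c4aa0):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "google/gemma-2-27b"  # assumption: same model as in the failing test
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)

inputs = tokenizer("DeepSpeed is", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Cast the logits to float32 before any softcapping/sampling logic,
# as older Transformers versions used to do internally.
logits = outputs.logits.float()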

regisss closed this Mar 10, 2025