Generate: consistently handle special tokens as tensors #29788
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
a746766 to 7863c5d
@@ -567,24 +570,6 @@ def _prepare_decoder_input_ids_for_generation(

        return decoder_input_ids, model_kwargs

    def _get_decoder_start_token_id(
The logic of this function is now within `_prepare_special_tokens`, which preprocesses all special tokens.
@@ -1221,6 +1208,55 @@ def _prepare_generation_config(

        return generation_config, model_kwargs

    def _prepare_special_tokens(
ALL preprocessing logic for the special tokens now resides in this function 🧹
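For illustration, here is a minimal sketch of the kind of normalization such a function performs — the helper name and the exact attributes touched are illustrative, not this PR's literal code:

```python
import torch
from typing import List, Optional, Union


def _tensor_or_none(
    token: Optional[Union[int, List[int], torch.Tensor]], device: torch.device
) -> Optional[torch.Tensor]:
    # Illustrative helper: None passes through, existing tensors are moved to
    # `device`, and ints / lists of ints are wrapped into a long tensor.
    if token is None:
        return None
    if isinstance(token, torch.Tensor):
        return token.to(device)
    return torch.tensor(token, device=device, dtype=torch.long)


# Every special token gets the same treatment, so downstream code only ever
# sees tensors (or None).
device = torch.device("cpu")
print(_tensor_or_none(2, device))           # 0-d long tensor
print(_tensor_or_none([2, 32000], device))  # 1-d long tensor with two entries
print(_tensor_or_none(None, device))        # None
```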
@@ -1221,6 +1208,55 @@ def _prepare_generation_config(

        return generation_config, model_kwargs

    def _prepare_special_tokens(
        self, generation_config: GenerationConfig, kwargs_has_attention_mask: Optional[bool] = None
`kwargs_has_attention_mask` is an optional argument so we can use this function in tests to prepare special tokens.
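A hedged sketch of how a test could call it — the checkpoint name is just an example, and the in-place post-condition asserted at the end is an assumption about the helper's behavior:

```python
import torch
from transformers import AutoModelForCausalLM, GenerationConfig

# Example tiny checkpoint; any small decoder-only model would do.
model = AutoModelForCausalLM.from_pretrained("hf-internal-testing/tiny-random-gpt2")
generation_config = GenerationConfig(bos_token_id=1, eos_token_id=2, pad_token_id=0)

# No model kwargs in this test, so we tell the helper explicitly that no
# attention mask was passed.
model._prepare_special_tokens(generation_config, kwargs_has_attention_mask=False)

# Assumption: the helper rewrites the config's special tokens as tensors in place.
assert isinstance(generation_config.eos_token_id, torch.Tensor)
```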
            eos_token_id = [eos_token_id]
        eos_token_id_tensor = torch.tensor(eos_token_id).to(input_ids.device) if eos_token_id is not None else None

        if not isinstance(pad_token_id, torch.Tensor):
The decoding functions are backward compatible (for now), and we can still pass `int`/`list(int)` as special tokens. The doctests in `generate` test this.
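In other words, a shim of roughly this shape at the top of a decoding function keeps the old interface alive (a sketch, not the literal diff):

```python
import torch
from typing import List, Optional, Union


def _coerce_eos(
    eos_token_id: Optional[Union[int, List[int], torch.Tensor]], device: torch.device
) -> Optional[torch.Tensor]:
    # Deprecated inputs (int or list of ints) are converted to the 1-D tensor
    # form that the rest of the decoding function now expects.
    if eos_token_id is None or isinstance(eos_token_id, torch.Tensor):
        return eos_token_id
    if isinstance(eos_token_id, int):
        eos_token_id = [eos_token_id]
    return torch.tensor(eos_token_id, device=device)


device = torch.device("cpu")
assert torch.equal(_coerce_eos(2, device), torch.tensor([2]))
assert torch.equal(_coerce_eos([2, 32000], device), torch.tensor([2, 32000]))
assert _coerce_eos(None, device) is None
```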
@@ -2170,6 +2170,24 @@ def _maybe_initialize_input_ids_for_generation(
                break
        return torch.ones((batch_size, 1), dtype=torch.long, device=self.device) * bos_token_id

    def _get_decoder_start_token_id(
Musicgen (and its melody variant) has its own custom `generate` that relies on this method. I've intentionally not updated that custom `generate`, to pressure us into moving towards a single generate function.
    ) -> torch.LongTensor:
        # No information for attention mask inference -> return default attention mask
The logic rewritten in functions like this one is `torch.compile(..., fullgraph=True)` compatible 😉
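As a toy illustration of why the tensor form matters (not this PR's code): a Python-level membership test against an int list causes a graph break, whereas the equivalent tensor op can be captured in a single graph.

```python
import torch

eos_token_id = torch.tensor([2, 32000])       # special tokens as a tensor
next_tokens = torch.tensor([5, 2, 7, 32000])  # one freshly sampled token per sequence
unfinished = torch.ones_like(next_tokens, dtype=torch.bool)


def step(unfinished: torch.Tensor, next_tokens: torch.Tensor, eos: torch.Tensor) -> torch.Tensor:
    # Pure tensor ops: no Python ints, no data-dependent branching, so the
    # whole function traces with fullgraph=True.
    return unfinished & ~torch.isin(next_tokens, eos)


compiled_step = torch.compile(step, fullgraph=True)
print(compiled_step(unfinished, next_tokens, eos_token_id))  # True, False, True, False
```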
Thanks for working on this 😄
Co-authored-by: Raushan Turganbay <raushan.turganbay@alumni.nu.edu.kz>
7537207 to dd5bf8c
Let's merge #29956 first, so the diff here becomes much smaller (the EOS-as-stopping-criteria made the diff more elaborate). (Arthur -- don't review this one until that is merged, I'll ping you again)
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
What does this PR do?

To enable `torch.compile` with `generate`, some special token-related operations have to be rewritten into torch operations. That requires special tokens to be tensors instead of integers or a list of integers (see #29374 for a working prototype).

This PR reworks special token usage in `generate` to consistently treat them as a tensor, as opposed to e.g. keeping track of `eos_token_id` in integer and in tensor form.

👉 Review suggestion: start by reading `_prepare_special_tokens` and how it fits in `generate`.
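From the caller's side nothing should change: plain-int special tokens keep working and are normalized to tensors internally. A hedged usage sketch (the checkpoint name is just an example):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("hf-internal-testing/tiny-random-gpt2")
model = AutoModelForCausalLM.from_pretrained("hf-internal-testing/tiny-random-gpt2")

inputs = tokenizer("Hello world", return_tensors="pt")

# Special tokens passed as plain ints, exactly as before this PR; `generate`
# converts them to tensors under the hood.
output = model.generate(
    **inputs,
    max_new_tokens=5,
    eos_token_id=tokenizer.eos_token_id,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(output[0]))
```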
Requirements before merging this PR:

Tests ran locally:
- `pytest --doctest-modules src/transformers/generation/logits_process.py -vv` -- needs requirement to be merged first
- `pytest --doctest-modules src/transformers/generation/utils.py -vv`
- `RUN_SLOW=1 py.test tests/generation/test_utils.py -vv`
- `RUN_SLOW=1 py.test tests/test_cache_utils.py -vv` -- same failures as in `main`
- `RUN_SLOW=1 py.test tests/models/llama/test_modeling_llama.py -vv`
- `RUN_SLOW=1 py.test tests/models/whisper/test_modeling_whisper.py -vv` -- same failures as in `main`