
Generate: store special token tensors under a unique variable name #31980

Merged · 5 commits · Jul 22, 2024

Conversation

@gante (Member) commented Jul 15, 2024

What does this PR do?

See this comment in the end-to-end generation PR: #30788 (comment)

Problem TL;DR

Compiled code can't create tensors with the pattern `x = torch.tensor(x)` (i.e. the tensor output reusing the non-tensor input's variable name): the graph overwrites the argument (non-tensor) with the output (tensor), so the input to this node is missing the second time the graph is called (because it was overwritten). This is a known limitation of torch.compile.

Our code that converts the special tokens into tensors falls into this pattern.
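A minimal sketch of the pattern at issue (illustrative code, not taken from the PR): the first function reuses the input's variable name for the tensor output, which is the shape that breaks under end-to-end compilation; the second writes the tensor under a new name, which is the shape the PR adopts.

```python
import torch

# Problematic pattern: the tensor output reuses the non-tensor input's
# variable name, so the graph overwrites the original argument.
def problematic(pad_token_id: int) -> torch.Tensor:
    pad_token_id = torch.tensor(pad_token_id)  # output name == input name
    return pad_token_id

# Pattern adopted instead: write the tensor under a new variable name,
# keeping the original non-tensor value intact.
def fixed(pad_token_id: int) -> torch.Tensor:
    pad_token_tensor = torch.tensor(pad_token_id)
    return pad_token_tensor
```

Both functions behave identically in eager mode; the difference only matters to the graph that torch.compile records.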

Solution

Write the special tokens converted into tensors under a new variable name. This is a bit annoying: we will now have e.g. `pad_token_id` (an integer) and `pad_token_tensor` (a tensor) in the `generation_config` object throughout generate.

Note: We can't change the variable name where these tokens are read from, which would be much cleaner. This is because:

  1. in end-to-end compilation we can't deepcopy
  2. the attribute is set outside generate (at GenerationConfig creation time)

As such, even if we create an auxiliary attribute with a different name to read from, the original attribute would still be overwritten, leading back to the original issue 😢
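The solution above can be sketched as follows (a hypothetical, minimal stand-in for transformers' much larger `GenerationConfig` and token-preparation logic; only the attribute-naming pattern mirrors the PR):

```python
import torch

# Minimal stand-in for transformers' GenerationConfig (the real class has
# many more fields); used only to illustrate the naming scheme.
class GenerationConfig:
    def __init__(self, pad_token_id=None, eos_token_id=None):
        self.pad_token_id = pad_token_id
        self.eos_token_id = eos_token_id

def prepare_special_tokens(generation_config: GenerationConfig) -> None:
    # Read the integer attributes, but store the tensor versions under NEW
    # attribute names so the original (non-tensor) attributes are never
    # overwritten -- the property torch.compile needs.
    def to_tensor(token):
        return None if token is None else torch.tensor(token)

    generation_config._pad_token_tensor = to_tensor(generation_config.pad_token_id)
    generation_config._eos_token_tensor = to_tensor(generation_config.eos_token_id)
```

After this runs, both the integer and the tensor form coexist on the config object, which is exactly the "a bit annoying" duplication the PR accepts.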

Comment on lines +1395 to +1404
```python
# If `generation_config` is provided, let's fallback ALL special tokens to the default values for the model
if not using_model_generation_config:
    if generation_config.bos_token_id is None:
        generation_config.bos_token_id = self.generation_config.bos_token_id
    if generation_config.eos_token_id is None:
        generation_config.eos_token_id = self.generation_config.eos_token_id
    if generation_config.pad_token_id is None:
        generation_config.pad_token_id = self.generation_config.pad_token_id
    if generation_config.decoder_start_token_id is None:
        generation_config.decoder_start_token_id = self.generation_config.decoder_start_token_id
```
gante (Member Author):

This is equivalent to the changes in this PR, but better suited to this function -- handling backward compatibility w.r.t. config files

```python
@@ -3196,6 +3196,39 @@ def test_assisted_decoding_in_gpu_cpu(self):
        )
        self.assertTrue(input_length <= out.shape[-1] <= input_length + 20)

    def test_special_tokens_fall_back_to_model_default(self):
```
gante (Member Author):

This test was missing in #31254 😛

@gante force-pushed the different_var_special_tokens branch from 3e4948d to d21329f on July 15, 2024 16:40
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@ArthurZucker (Collaborator) left a comment:

Wow nice finding!

Comment on lines 1563 to 1569
```python
# NOTE: this must be written into a different attribute name than the one holding the original special tokens
# (in their non-tensor form), in order to enable end-to-end compilation. See
# https://pytorch.org/docs/stable/torch.compiler_cudagraph_trees.html#limitations
generation_config.bos_token_tensor = bos_token_tensor
generation_config.eos_token_tensor = eos_token_tensor
generation_config.pad_token_tensor = pad_token_tensor
generation_config.decoder_start_token_tensor = decoder_start_token_tensor
```
ArthurZucker (Collaborator):

Would it work to have a property, `eos_token_id`, with `_eos_token_tensor` underlying? When you set it, you cast to tensor format. Might be simpler in general?
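For concreteness, the suggested (and, per the reply below, ultimately not adopted) property-based approach might look something like this sketch; the class shape and attribute names here are assumptions, not code from the PR:

```python
import torch

# Sketch of the property suggestion: `eos_token_id` is backed by
# `_eos_token_tensor`, and assignment casts to a tensor automatically.
class GenerationConfig:
    def __init__(self, eos_token_id=None):
        self.eos_token_id = eos_token_id  # routed through the setter below

    @property
    def eos_token_id(self):
        return self._eos_token_tensor

    @eos_token_id.setter
    def eos_token_id(self, value):
        self._eos_token_tensor = None if value is None else torch.tensor(value)
```

The trade-off the reply points at: properties add Python-level indirection and state to a class that generate wants to keep simple for torch.compile.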

gante (Member Author):

uhmmm @property will further tangle us with state/python classes, which I'm not a fan of for compile purposes 🤔

I am going to rename the tensor variables from xxx_token_tensor to _xxx_token_tensor though, to help with readability!

```python
@@ -1539,75 +1539,43 @@ def generate(
        logits_processor = logits_processor if logits_processor is not None else LogitsProcessorList()
        stopping_criteria = stopping_criteria if stopping_criteria is not None else StoppingCriteriaList()

        if generation_config.pad_token_id is None and generation_config.eos_token_id is not None:
```
@gante (Member Author) commented Jul 16, 2024:

Some more musicgen standardization :) (i.e. copy-pasting the new, upgraded patterns from the main generate)

gante (Member Author):

confirmed: slow tests have the same failures as on main

@zucchini-nlp (Member) left a comment:

An interesting behavior from compile. Thanks for handling!

@gante merged commit c38c55f into huggingface:main on Jul 22, 2024 (23 checks passed), and deleted the different_var_special_tokens branch on July 22, 2024 13:06.
Labels: none yet · Projects: none yet · 4 participants