Generate: fix logits processors doctests #29718

gante · 2024-03-18T20:25:11Z

What does this PR do?

The doctests got stale 👀 (related PR to prevent this from happening again: #29716)

There are 2 main categories of fixes:

Fixes where there is randomness involved: updates seed and potentially the output of the "bad" example. I can reproduce the existing doctest results if I go back to an older version (like v4.35), but I don't think it's worth diving through to find the root cause, as many harmless things can change the output of sampling;
Whisper fixes (cc @sanchit-gandhi )

All tests are passing after these changes (pytest --doctest-modules src/transformers/generation/logits_process.py -vv)

gante · 2024-03-18T20:26:35Z

cc @zucchini-nlp, to rebase your PRs after this gets merged :)

gante · 2024-03-18T20:27:24Z

src/transformers/generation/logits_process.py

    >>> def prefix_allowed_tokens_fn(batch_id, input_ids):
    ...     '''
    ...     Attempts to generate 'Bob Marley' when 'Bob' is detected.
    ...     In this case, `batch_id` is not used, but you can set rules for each batch member.
    ...     '''
    ...     if input_ids[-1] == entity[0]:
-    ...         return entity[1]
+    ...         return [entity[1].item()]


prefix_allowed_tokens_fn should be a Callable[[int, torch.Tensor], List[int]], as explained in the docs

gante · 2024-03-18T20:28:30Z

src/transformers/generation/logits_process.py

@@ -1604,13 +1610,13 @@ class LogitNormalization(LogitsProcessor, LogitsWarper):
    >>> # By default, the scores are not normalized -- the sum of their exponentials is NOT a normalized probability
    >>> # distribution, summing to 1
    >>> outputs = model.generate(**inputs, return_dict_in_generate=True, output_scores=True)
-    >>> print(torch.sum(torch.exp(outputs.scores[-1])))
-    tensor(816.3250)


This value was sensible to numerical fluctuations across versions, and this exact value was not relevant for the test. The main point is that it is not approximately 1.0 :)

gante · 2024-03-18T20:29:28Z

src/transformers/generation/logits_process.py

@@ -1641,7 +1647,7 @@ class SuppressTokensAtBeginLogitsProcessor(LogitsProcessor):
    >>> # Whisper has `begin_suppress_tokens` set by default (= `[220, 50256]`). 50256 is the EOS token, so this means
    >>> # it can't generate and EOS token in the first iteration, but it can in the others.
    >>> outputs = model.generate(**inputs, return_dict_in_generate=True, output_scores=True)
-    >>> print(outputs.scores[1][0, 50256])  # 1 (and not 0) is the first freely generated token
+    >>> print(outputs.scores[0][0, 50256])


Whisper processor changes: @sanchit-gandhi let me know if they make sense, according to recent changes in Whisper

Looks good to me - thanks for the updated @gante!

gante · 2024-03-18T20:30:16Z

src/transformers/generation/logits_process.py

@@ -1714,36 +1720,6 @@ class ForceTokensLogitsProcessor(LogitsProcessor):
    indices that will be forced before generation. The processor will set their log probs to `inf` so that they are
    sampled at their corresponding index. Originally created for
    [Whisper](https://huggingface.co/docs/transformers/model_doc/whisper).
-
-    Examples:


This processor is going to be removed in v4.40, so I didn't want to spend time fixing the test :D

gante · 2024-03-18T20:31:55Z

src/transformers/models/whisper/generation_whisper.py

-        else:
-            generation_config = copy.deepcopy(generation_config)
+        # 1. prepare generation config
+        generation_config, kwargs = self._prepare_generation_config(generation_config, **kwargs)


This function from the main generate body (_prepare_generation_config) pulls generation parameterization from kwargs into generation_config.

Some Whisper-based doctests were incorrect without this functionality.

amyeroberts

Thanks for working on this fix!

I have a few questions about the changes, in particular why we need to change the seed

amyeroberts · 2024-03-18T20:29:02Z

src/transformers/models/whisper/generation_whisper.py

-        else:
-            generation_config = copy.deepcopy(generation_config)
+        # 1. prepare generation config
+        generation_config, kwargs = self._prepare_generation_config(generation_config, **kwargs)


The lines above imply there's a self.generation_config which should be used if generation_config is None

self._prepare_generation_config() does precisely that:

transformers/src/transformers/generation/utils.py

Line 1204 in 484e10f

generation_config = self.generation_config

It is a more complex version of this original if/else that preserves additional backward (and forward!) compatibility features of generate :)

amyeroberts · 2024-03-18T20:29:23Z

src/transformers/generation/logits_process.py

-    >>> set_seed(0)
+    >>> set_seed(1)


Why change the seed?

The seed is changed because the sample output is changed (more on that below), and a new seed was selected to illustrate the point of the example 🤗 I wanted a seed that produced a bad output in the unparameterized call and a good output in the parameterized call. Bear in mind that the model used in the examples is very small, and thus noisy with sampling.

We need to change the seed because the output of sampling has changed. There are many innocuous changes that can cause this: tiny numerical differences due to different versions, tiny numerical differences due to reordering of operations, corrections in the architecture code, different RNG behavior in torch (unlikely), and so on. As I've written in the PR header, I don't think it's worth our time finding the exact cause. The results in most other sampling tests are unchanged, there are many innocuous changes that can cause this, and it may be time-consuming to pin the cause.

src/transformers/generation/logits_process.py

amyeroberts · 2024-03-18T20:32:56Z

src/transformers/generation/logits_process.py

-
-    Examples:
-    ```python
-    >>> from transformers import AutoProcessor, WhisperForConditionalGeneration
-    >>> from datasets import load_dataset
-
-    >>> processor = AutoProcessor.from_pretrained("openai/whisper-tiny.en")
-    >>> model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-tiny.en")
-    >>> ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
-    >>> inputs = processor(ds[0]["audio"]["array"], return_tensors="pt")
-
-    >>> # This Whisper model forces the generation to start with `50362` at the first position by default, i.e.
-    >>> # `"forced_decoder_ids": [[1, 50362]]`. This means all other tokens are masked out.
-    >>> outputs = model.generate(**inputs, return_dict_in_generate=True, output_scores=True)
-    >>> print(
-    ...     all(outputs.scores[0][0, i] == float("-inf") for i in range(processor.tokenizer.vocab_size) if i != 50362)
-    ... )
-    True
-    >>> print(outputs.scores[0][0, 50362])
-    tensor(0.)
-
-    >>> # If we disable `forced_decoder_ids`, we stop seeing that effect
-    >>> outputs = model.generate(**inputs, return_dict_in_generate=True, output_scores=True, forced_decoder_ids=None)
-    >>> print(
-    ...     all(outputs.scores[0][0, i] == float("-inf") for i in range(processor.tokenizer.vocab_size) if i != 50362)
-    ... )
-    False
-    >>> print(outputs.scores[0][0, 50362])
-    tensor(19.3140)
-    ```


Why remove the example here?

This processor is going to be removed in v4.40, so I didn't want to spend time fixing the test :D

:)

amyeroberts · 2024-03-18T20:34:31Z

src/transformers/generation/logits_process.py

+    >>> print(torch.allclose(torch.sum(torch.exp(outputs.scores[-1])), torch.Tensor((1.000,)), rtol=1e-4))
+    False


The previous output was more informative imo - there's infinitely many ways to not be close to 1

True, but it is beyond the scope of the example -- the key point here is adding the flag normalizes the probability distribution.

Testing against the exact number caused the test to fail. In fact, if we run this test on different hardware (local compute vs DGX), we get a slightly different number. We could work around it with torch.allclose, but I don't think it adds value to the test :)

amyeroberts · 2024-03-18T20:36:43Z

src/transformers/generation/logits_process.py

@@ -1641,7 +1647,7 @@ class SuppressTokensAtBeginLogitsProcessor(LogitsProcessor):
    >>> # Whisper has `begin_suppress_tokens` set by default (= `[220, 50256]`). 50256 is the EOS token, so this means
    >>> # it can't generate and EOS token in the first iteration, but it can in the others.
    >>> outputs = model.generate(**inputs, return_dict_in_generate=True, output_scores=True)
-    >>> print(outputs.scores[1][0, 50256])  # 1 (and not 0) is the first freely generated token


out of interest - what changed here?

I believe the indexing of first freely decoded token changed recently in Whisper, but I'd like to have @sanchit-gandhi confirming the correctness of these changes :)

This might be a possible BC issue :/

HuggingFaceDocBuilderDev · 2024-03-18T20:45:34Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

amyeroberts

Thanks for working on this!

Happy with the changes - only concern is the difference in the whisper processor @sanchit-gandhi can you confirm this?

amyeroberts · 2024-03-20T10:54:53Z

src/transformers/generation/logits_process.py

+    >>> print(torch.allclose(torch.sum(torch.exp(outputs.scores[-1])), torch.Tensor((1.000,)), rtol=1e-4))
+    False


amyeroberts · 2024-03-20T10:55:27Z

src/transformers/generation/logits_process.py

@@ -1714,36 +1720,6 @@ class ForceTokensLogitsProcessor(LogitsProcessor):
    indices that will be forced before generation. The processor will set their log probs to `inf` so that they are
    sampled at their corresponding index. Originally created for
    [Whisper](https://huggingface.co/docs/transformers/model_doc/whisper).
-
-    Examples:


sanchit-gandhi

Thanks for the fixes @gante!

gante added 2 commits March 18, 2024 19:25

fix norm

f97477f

fix logits processors doctests

6ef20c2

gante requested review from sanchit-gandhi and amyeroberts March 18, 2024 20:25

gante commented Mar 18, 2024

View reviewed changes

amyeroberts reviewed Mar 18, 2024

View reviewed changes

gante mentioned this pull request Mar 19, 2024

Clean-up generation tests after moving methods to private #29582

Merged

5 tasks

amyeroberts approved these changes Mar 20, 2024

View reviewed changes

gante mentioned this pull request Mar 22, 2024

Generate: consistently handle special tokens as tensors #29788

Closed

7 tasks

sanchit-gandhi approved these changes Apr 2, 2024

View reviewed changes

gante merged commit 5080ab1 into huggingface:main Apr 2, 2024
19 checks passed

gante deleted the fix_logits_doctests branch April 2, 2024 16:18

gante mentioned this pull request May 2, 2024

Generate: consistently handle special tokens as tensors #30624

Merged

6 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Generate: fix logits processors doctests #29718

Generate: fix logits processors doctests #29718

gante commented Mar 18, 2024 •

edited

Loading

gante commented Mar 18, 2024

gante Mar 18, 2024 •

edited

Loading

gante Mar 18, 2024

gante Mar 18, 2024 •

edited

Loading

sanchit-gandhi Apr 2, 2024

gante Mar 18, 2024

amyeroberts Mar 20, 2024

gante Mar 18, 2024

amyeroberts left a comment

amyeroberts Mar 18, 2024

gante Mar 19, 2024 •

edited

Loading

amyeroberts Mar 18, 2024

gante Mar 19, 2024

amyeroberts Mar 18, 2024

gante Mar 19, 2024

amyeroberts Mar 18, 2024

gante Mar 19, 2024

amyeroberts Mar 20, 2024

amyeroberts Mar 18, 2024

gante Mar 19, 2024

amyeroberts Mar 20, 2024

HuggingFaceDocBuilderDev commented Mar 18, 2024

amyeroberts left a comment

amyeroberts Mar 20, 2024

amyeroberts Mar 20, 2024

sanchit-gandhi left a comment

		>>> print(torch.allclose(torch.sum(torch.exp(outputs.scores[-1])), torch.Tensor((1.000,)), rtol=1e-4))
		False

Generate: fix logits processors doctests #29718

Generate: fix logits processors doctests #29718

Conversation

gante commented Mar 18, 2024 • edited Loading

What does this PR do?

gante commented Mar 18, 2024

gante Mar 18, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

gante Mar 18, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

amyeroberts left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

gante Mar 19, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

HuggingFaceDocBuilderDev commented Mar 18, 2024

amyeroberts left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

sanchit-gandhi left a comment

Choose a reason for hiding this comment

gante commented Mar 18, 2024 •

edited

Loading

gante Mar 18, 2024 •

edited

Loading

gante Mar 18, 2024 •

edited

Loading

gante Mar 19, 2024 •

edited

Loading