LLaVaNeXT: pad on right if training #32134

zucchini-nlp · 2024-07-22T10:06:50Z

What does this PR do?

Fixes #32112. The issue didn't report a bug, since it uses batch size 1 for training, but it led me to this PR. With this change we can be more confident about which side the inputs are padded on. I verified that the padding is "right" when training with Trainer, and that in inference the code reaches the block where we infer the padding side from the attention mask.
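To illustrate the behavior described above, here is a minimal pure-Python sketch. It is not the code changed in this PR: the helper names `default_padding_side`, `infer_padding_side`, and `resolve_padding_side` are hypothetical, attention masks are plain lists of 0/1, and the choice to prefer the inferred side over the mode default is an assumption made for illustration.

```python
from typing import List, Optional


def default_padding_side(training: bool) -> str:
    """Hypothetical helper: right-pad while training, left-pad otherwise."""
    return "right" if training else "left"


def infer_padding_side(attention_mask: List[List[int]]) -> Optional[str]:
    """Guess the padding side of a batch from its 0/1 attention mask.

    Assumes every row is non-empty. Returns None when no row is padded,
    so the caller can fall back to the mode-based default.
    """
    padded_left = any(row[0] == 0 for row in attention_mask)
    padded_right = any(row[-1] == 0 for row in attention_mask)
    if padded_left and padded_right:
        raise ValueError("Batch appears to be padded on both sides.")
    if padded_left:
        return "left"
    if padded_right:
        return "right"
    return None


def resolve_padding_side(training: bool, attention_mask: List[List[int]]) -> str:
    """Assumption: use the side visible in the mask, else the mode default."""
    inferred = infer_padding_side(attention_mask)
    return inferred if inferred is not None else default_padding_side(training)
```

For example, `resolve_padding_side(False, [[0, 1, 1], [1, 1, 1]])` picks the side visible in the mask, while an unpadded batch such as `[[1, 1, 1]]` falls back to the default for the current mode.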

amyeroberts

Thanks for digging into this and fixing!

Could you add a quick test which checks the padding is set as expected for the different modes?
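A check of this kind might look roughly like the sketch below. It is not the test added in this PR: `ToyModel` is a hypothetical stand-in whose `train()`/`eval()` methods only flip a `training` flag, and its `padding_side` property is an assumed name for the mode-dependent setting.

```python
class ToyModel:
    """Hypothetical stand-in for a model with train/eval modes."""

    def __init__(self) -> None:
        self.training = True

    def train(self) -> "ToyModel":
        self.training = True
        return self

    def eval(self) -> "ToyModel":
        self.training = False
        return self

    @property
    def padding_side(self) -> str:
        # Right-pad while training, left-pad otherwise.
        return "right" if self.training else "left"


def test_padding_side_follows_mode() -> None:
    model = ToyModel()
    assert model.train().padding_side == "right"
    assert model.eval().padding_side == "left"
    # Switching back to training restores right padding.
    assert model.train().padding_side == "right"


test_padding_side_follows_mode()
```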

HuggingFaceDocBuilderDev · 2024-07-22T10:31:46Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

zucchini-nlp · 2024-07-22T13:35:03Z

Added a test @amyeroberts

amyeroberts

Looks great - thanks for handling and adding the test!

amyeroberts · 2024-07-22T14:54:26Z

docs/source/en/model_doc/llava-next-video.md

@@ -43,6 +43,13 @@ The original code can be found [here](https://github.com/LLaVA-VL/LLaVA-NeXT/tre

 - We advise users to use `padding_side="left"` when computing batched generation as it leads to more accurate results. Simply make sure to call `processor.tokenizer.padding_side = "left"` before generating.

+<Tip warning={true}>
+
+- Llava-Next uses different number of patches for images and thus has to pad the inputs inside modeling code, aside from the padding done when processing the inputs. The default setting is "left-padding" if model is in `eval()` mode, otherwise "right-padding".


pad on right if training

744dad2

amyeroberts reviewed Jul 22, 2024

docs

f81de21

add tests

ae898af

amyeroberts reviewed Jul 22, 2024

amyeroberts approved these changes Jul 22, 2024

zucchini-nlp merged commit 3aefb4e into huggingface:main Jul 23, 2024
21 checks passed
