(Willing to PR) Make `tokenizer.padding_side` an argument instead of only being a field #30447

fzyzcjy · 2024-04-24T05:43:07Z

Hi, thanks for the library! When using a tokenizer, for example for batch generation with GPT-2 (see https://discuss.huggingface.co/t/batch-generation-with-gpt2/1517), it seems that I currently have to do something like:

tokenizer.padding_side = 'left'
data = tokenizer(['sentence one', 'another'])
tokenizer.padding_side = 'right'

Therefore, it would be great to have:

data = tokenizer(['sentence one', 'another'], padding_side='left')

just as we already do today for many other options, such as padding_strategy.
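
Until such an argument exists, the set-and-restore workaround can be wrapped in a context manager so the original value is put back even if tokenization raises. This is only a minimal sketch: `temporary_padding_side` is a hypothetical helper name, not part of transformers, and it assumes nothing more than that the tokenizer object exposes a writable `padding_side` attribute. A `SimpleNamespace` stands in for a real tokenizer so the snippet runs without downloading a model.

```python
from contextlib import contextmanager
from types import SimpleNamespace


@contextmanager
def temporary_padding_side(tokenizer, side):
    """Temporarily set tokenizer.padding_side, restoring the old value on exit.

    Hypothetical helper (not a transformers API). Assumes only that
    `tokenizer` has a writable `padding_side` attribute.
    """
    if side not in ("left", "right"):
        raise ValueError(f"side must be 'left' or 'right', got {side!r}")
    original = tokenizer.padding_side
    tokenizer.padding_side = side
    try:
        yield tokenizer
    finally:
        # Runs on normal exit and on exceptions alike.
        tokenizer.padding_side = original


# Stand-in object for illustration; in practice this would be a real tokenizer.
fake_tokenizer = SimpleNamespace(padding_side="right")

with temporary_padding_side(fake_tokenizer, "left") as tok:
    side_inside = tok.padding_side  # value seen while the block is active
    # With a real tokenizer, the call would go here, e.g.:
    # data = tok(['sentence one', 'another'])

side_after = fake_tokenizer.padding_side  # value after the block exits
```

This does not remove the need for the proposed argument: it still mutates shared state for the duration of the block, so it is not safe if the same tokenizer is used from several threads at once.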

(see above)

Yes, I am willing to PR

amyeroberts · 2024-04-24T08:27:53Z

ArthurZucker · 2024-05-20T09:29:13Z

Sure, feel free to open a PR and ping @itazap 🤗

amyeroberts added the "Core: Tokenization" (internals of the library; tokenization) and "Feature request" (request for a new feature) labels Apr 24, 2024

zucchini-nlp mentioned this issue Sep 3, 2024

Uniformize kwargs for LLaVa processor and update docs #32858

Merged

5 tasks