
[WIP] Uniformize processors in text+image multimodal models. #27768

Draft: wants to merge 1 commit into main

Conversation

molbap
Contributor

@molbap molbap commented Nov 30, 2023

What does this PR do?

This PR is a work in progress that aims to uniformize all text+image multimodal processors. Ideally, every model could be used through AutoProcessor(...) or an equivalent.

The processor is one of the most fundamental building blocks of transformers, and modifying it can only be done through careful deprecation cycles. It is, however, an opportunity to enforce a design standard for future processing utilities and down-the-line pipeline integrations.

For instance, align currently has the __call__ method

    def __call__(self, text=None, images=None, padding="max_length", max_length=64, return_tensors=None, **kwargs)

altclip has

    def __call__(self, text=None, images=None, return_tensors=None, **kwargs)

blip has

    def __call__(
        self,
        images: ImageInput = None,
        text: Union[TextInput, PreTokenizedInput, List[TextInput], List[PreTokenizedInput]] = None,
        add_special_tokens: bool = True,
        padding: Union[bool, str, PaddingStrategy] = False,
        truncation: Union[bool, str, TruncationStrategy] = None,
        max_length: Optional[int] = None,
        stride: int = 0,
        pad_to_multiple_of: Optional[int] = None,
        return_attention_mask: Optional[bool] = None,
        return_overflowing_tokens: bool = False,
        return_special_tokens_mask: bool = False,
        return_offsets_mapping: bool = False,
        return_token_type_ids: bool = False,
        return_length: bool = False,
        verbose: bool = True,
        return_tensors: Optional[Union[str, TensorType]] = None,
        **kwargs,
    ) -> BatchEncoding:

And so on; more recently, for instance, Kosmos-2 has

    def __call__(
        self,
        images: ImageInput = None,
        text: Union[TextInput, List[TextInput]] = None,
        bboxes: BboxInput = None,
        num_image_tokens: Optional[int] = 64,
        first_image_token_id: Optional[int] = None,
        add_special_tokens: bool = True,
        add_eos_token: bool = False,
        padding: Union[bool, str, PaddingStrategy] = False,
        truncation: Union[bool, str, TruncationStrategy] = None,
        max_length: Optional[int] = None,
        pad_to_multiple_of: Optional[int] = None,
        return_attention_mask: Optional[bool] = None,
        return_length: bool = False,
        verbose: bool = True,
        return_tensors: Optional[Union[str, TensorType]] = None,
        **kwargs,
    ) -> BatchFeature:
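To make the goal concrete, here is a minimal, hypothetical sketch (not the actual transformers API) of what a uniform processor interface could look like: a shared `__call__` that only takes `text`, `images`, and `**kwargs`, with model-specific defaults (such as align's `padding="max_length", max_length=64`) declared per model and merged under the caller's explicit arguments. The `UniformProcessor` class, its `defaults` attribute, and the `_tokenize` placeholder are all illustrative assumptions.

```python
# Hypothetical sketch of a uniform processor interface; names and
# behavior are illustrative, not the real transformers API.
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class UniformProcessor:
    """Shared __call__ contract: each model only declares its defaults."""

    defaults: Dict[str, Any] = field(default_factory=dict)

    def __call__(self, text=None, images=None, **kwargs):
        if text is None and images is None:
            raise ValueError("You must provide either text or images.")
        # Model-specific defaults are merged under explicit caller kwargs,
        # so users can still override e.g. max_length per call.
        merged = {**self.defaults, **kwargs}
        outputs = {}
        if text is not None:
            outputs["input_ids"] = self._tokenize(text, **merged)
        if images is not None:
            # Placeholder: a real processor would run an image processor here.
            outputs["pixel_values"] = images
        return outputs

    def _tokenize(self, text, max_length=None, **_):
        # Toy stand-in for a tokenizer: one integer id per whitespace token.
        tokens = list(range(len(text.split())))
        return tokens[:max_length] if max_length is not None else tokens


# An align-like processor only needs to declare its defaults:
align_like = UniformProcessor(defaults={"padding": "max_length", "max_length": 64})
batch = align_like(text="a photo of a cat")
```

The point of the sketch is that divergent signatures collapse into per-model default dictionaries, so the `__call__` surface stays identical across models.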

Currently, 30 text+image models have a dedicated processing_<model> file. All of them have to be reviewed, then modified or wrapped with a common class, and made pipeline-compatible:

  • align
  • altclip
  • blip
  • blip_2
  • bridgetower
  • chinese_clip
  • clipseg
  • clip
  • donut
  • flava
  • fuyu
  • git
  • idefics
  • instructblip
  • kosmos2
  • layoutlmv2
  • layoutlmv3
  • layoutxlm
  • mgp_str
  • nougat
  • oneformer
  • owlv2
  • owlvit
  • perceiver
  • pix2struct
  • trocr
  • tvp
  • vilt
  • vision_text_dual_encoder
  • x_clip

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@LysandreJik
Member

Still being worked on but a longer-term project; putting the WIP label so that the bot doesn't close it.

@LysandreJik LysandreJik added the WIP Label your PR/Issue with WIP for some long outstanding Issues/PRs that are work in progress label Dec 31, 2023