Add support for BLIP-2 multimodal feature extraction #25474

Draft: wants to merge 3 commits into base: main

youssefadr
Contributor

@youssefadr youssefadr commented Aug 12, 2023

What does this PR do?

This PR adds get_image_feature and get_text_feature methods to the Blip2ForConditionalGeneration class. These changes align with the original Qformer implementation, which uses both text and image inputs.

The current implementation in HuggingFace lacks support for multimodal embeddings, especially the capacity to extract embeddings by passing both text and image to the QFormer. This PR addresses this shortcoming.
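To illustrate why joint text and image embeddings matter, here is a minimal sketch of how such features could be compared once extracted. It does not call the methods added in this PR (their signatures are not shown in this thread) and does not use the transformers API. It assumes the image side yields one embedding per QFormer query token and the text side yields a single embedding per label, and it scores a pair by the highest cosine similarity across query tokens, as the BLIP-2 paper describes for image-text matching. The function names and toy vectors are hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def image_text_similarity(query_embeds, text_embed):
    """Return the highest cosine similarity between any query embedding and the text embedding."""
    text = l2_normalize(text_embed)
    return max(
        sum(q * t for q, t in zip(l2_normalize(query), text))
        for query in query_embeds
    )


def zero_shot_classify(query_embeds, label_embeds):
    """Pick the label whose text embedding best matches the image query embeddings."""
    scores = {
        label: image_text_similarity(query_embeds, embed)
        for label, embed in label_embeds.items()
    }
    return max(scores, key=scores.get), scores


# Toy stand-ins for extracted features (assumption: real features would come from the model).
image_queries = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
label_embeds = {"cat": [0.9, 0.1, 0.0], "dog": [0.0, 0.2, 0.9]}

best_label, scores = zero_shot_classify(image_queries, label_embeds)
print(best_label, scores)
```

In practice the lists above would be replaced by tensors returned from the model, and the normalization and max would be done with batched tensor operations.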

Fixes #25300 #25245

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

@sgugger
Collaborator

sgugger commented Aug 13, 2023

cc @amyeroberts and @younesbelkada

Contributor

@younesbelkada younesbelkada left a comment

Awesome work @youssefadr! Let us know when the PR is ready for review.

@github-actions

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@github-actions github-actions bot closed this Sep 21, 2023
@ArthurZucker
Collaborator

Hey @youssefadr, feel free to re-open this if you still plan on adding this feature

@youssefadr
Contributor Author

@ArthurZucker Thank you, I'll do it. I'm sorry I couldn't find time for it lately. It is on my backlog!

@youssefadr
Contributor Author

@NielsRogge Hi, should I open a new PR for this, or is there a way to re-open this one? It seems to be requested by a lot of users.

@ArthurZucker ArthurZucker reopened this Jan 3, 2024
@huggingface huggingface deleted a comment from github-actions bot Jan 30, 2024
@huggingface huggingface deleted a comment from github-actions bot Feb 27, 2024
@ArthurZucker ArthurZucker added the WIP label (Label your PR/Issue with WIP for some long outstanding Issues/PRs that are work in progress) Feb 27, 2024
@ajpuyle

ajpuyle commented Mar 29, 2024

Is this still being worked on? I am in need of this tool.

@ArthurZucker
Collaborator

It's stale, feel free to take it over.

Development

Successfully merging this pull request may close these issues.

Add zero-shot classification task for BLIP-2
6 participants