
SiglipVisionModel does not support `device_map="auto"`: no `_no_split_modules` attribute #31400

Closed
lucasjinreal opened this issue Jun 13, 2024 · 8 comments · Fixed by #31566
Labels: Feature request (Request for a new feature)

Comments

@lucasjinreal

Feature request

Make SiglipVisionModel support `device_map="auto"` so it can be mapped across multiple GPUs.

Motivation

Currently, when using an MLLM with Siglip, the whole model may need `device_map="auto"`. Since the vision encoder part is SiglipVisionModel, it would be a big problem if it doesn't support this.

Also, would you consider adding FlashAttention support for Siglip?

Your contribution

None for now

@amyeroberts
Collaborator

Hi @lucasjinreal, thanks for opening a feature request!

Could you share a code snippet of how the model is being created with `auto_map` and the running environment (run `transformers-cli env` in the terminal and copy-paste the output)? SigLip should support `device_map="auto"`.
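
For reference, a minimal reproduction of the reported error would look something like this (the checkpoint and setup are illustrative, not taken from the original report):

```python
# Illustrative reproduction, assuming the google/siglip-base-patch16-224 checkpoint
# and a multi-GPU machine; not the reporter's exact script.
from transformers import SiglipVisionModel

model = SiglipVisionModel.from_pretrained(
    "google/siglip-base-patch16-224",
    device_map="auto",  # fails if the class does not define `_no_split_modules`
)
```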

@zucchini-nlp
Member

I checked with the code in the model doc for SigLip and also got an error. Supporting `device_map` in VLMs is indeed important. I believe `_no_split_modules` should be the same as in CLIPModel.

For FlashAttention, afaik current VLMs in transformers use optimized attention implementations only for the LLM backbone (e.g. LLaVa supports Flash-Attn and SDPA even though CLIP doesn't). There's an issue for adding SDPA attention (#30565) to all VLMs; I can open another tracker issue for Flash-Attn but won't be able to work on it right now. Open to community contributions.
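
For context, `_no_split_modules` is the class attribute Accelerate consults when building an automatic device map: every module class listed there is treated as an atomic block that must live on a single device. A conceptual sketch (class and module names below are made up for illustration, not the actual transformers source):

```python
# Conceptual sketch: how a model class advertises which submodules must not be
# split across devices. Names are hypothetical, not the real Siglip/CLIP code.
from transformers import PreTrainedModel

class MyVisionModel(PreTrainedModel):
    # Each listed class is kept whole when the device map is inferred, so e.g. an
    # entire encoder layer (attention + MLP + residual) stays on one GPU.
    _no_split_modules = ["MyEncoderLayer"]
```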

@lucasjinreal
Author

lucasjinreal commented Jun 13, 2024

I have worked around this error by simply adding a `_no_split_modules = []` attribute.

But it would be better to add it inside transformers; it's just a single line. I could submit a PR for this.

As for FlashAttention, it's really needed; it can make VLM training much faster.
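
Concretely, the workaround described above amounts to something like this (a sketch; the checkpoint name is illustrative):

```python
# Sketch of the workaround: patch the attribute onto the class before loading.
# An empty list means Accelerate may split the model at any module boundary.
from transformers import SiglipVisionModel

SiglipVisionModel._no_split_modules = []
model = SiglipVisionModel.from_pretrained(
    "google/siglip-base-patch16-224",  # illustrative checkpoint
    device_map="auto",
)
```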

@zucchini-nlp
Member

@lucasjinreal cool, a PR would be nice, but you need to test in a multi-GPU setting that everything is being split correctly. I don't think an empty `_no_split_modules` will work, as the most similar model, CLIP, doesn't split at some modules. If you don't have multiple GPUs, I can run some tests after the PR is open :)
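
A quick way to verify the split on a multi-GPU machine is to inspect `hf_device_map` and run a forward pass (a sketch; the checkpoint and input shape are illustrative):

```python
# Sketch of a multi-GPU sanity check for device_map="auto".
import torch
from transformers import SiglipVisionModel

model = SiglipVisionModel.from_pretrained(
    "google/siglip-base-patch16-224",  # illustrative checkpoint
    device_map="auto",
)
print(model.hf_device_map)  # which device each submodule was placed on

# A forward pass should complete without device-mismatch errors if the split is valid.
pixel_values = torch.randn(1, 3, 224, 224).to("cuda:0")
outputs = model(pixel_values=pixel_values)
print(outputs.last_hidden_state.shape)
```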

Flash-Attn noted, thanks, will add to my todo list!

@lucasjinreal
Author

I have been using an empty list and it makes `device_map="auto"` work on multiple GPUs; inference is currently normal. I still don't know why CLIPVisionModel keeps CLIPEncoderLayer from being auto-mapped, though.

@zucchini-nlp
Member

zucchini-nlp commented Jun 17, 2024

@lucasjinreal I just noticed that SigLip already has `_no_split_modules` in the TextModel and in the VisionModel, yet not in SiglipModel. If I set `_no_split_modules = []` as you tried, a device mismatch error is raised, so we have to add the text and vision models' `_no_split_modules` to enable it.

LMK if you're up to opening a PR :)

@lucasjinreal
Author

Hi, in my case I just use SiglipVisionModel as a parent class and use a SiglipVisionModelSplit(SiglipVisionModel) in my MLLM.

So I think that isn't applicable inside transformers. Let me think of a better way to do this.
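
The subclassing approach mentioned above would look roughly like this (a sketch; the subclass name comes from the comment, while the `_no_split_modules` value is an assumption):

```python
# Sketch of the subclass workaround; the module list is an assumption,
# mirroring what encoder-based vision models typically declare.
from transformers import SiglipVisionModel

class SiglipVisionModelSplit(SiglipVisionModel):
    # Keep each encoder layer on a single device when Accelerate builds the map.
    _no_split_modules = ["SiglipEncoderLayer"]
```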

@zucchini-nlp
Member

I believe the best solution is to copy the `_no_split_modules` that are already defined in the text and vision components and add them to SiglipModel's `_no_split_modules`.
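
In code, that proposal might look something like the following (a sketch of the idea, not the merged fix in #31566; the module names are assumptions based on the discussion above):

```python
# Sketch of the proposed fix: SiglipModel declares the union of the no-split
# modules already listed on its text and vision submodels. Names are assumptions.
class SiglipModel(SiglipPreTrainedModel):
    _no_split_modules = [
        "SiglipTextEmbeddings",    # from the text model
        "SiglipEncoderLayer",      # shared encoder layer
        "SiglipVisionEmbeddings",  # from the vision model
    ]
```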
