Segformer Support #382
Comments
Hi @HugeBob, …
Hi @michaelbenayoun, I had the same idea in mind as Bob. For example, NVIDIA's SegFormers (https://huggingface.co/nvidia/mit-b0) are apparently based on "a hierarchical Transformer encoder and a lightweight all-MLP decode head". Those transformer-based models, even though they are implemented in transformers, cannot be optimized with optimum? Thank you, Theo
Hi @TheoMrc, for pipelines, it might not be usable because it was not available in transformers last time I checked. What we can do on our end is to add support for an …
Hi again, thanks for your answer. After some very interesting reading time in various documentations, I'm guessing from your answer that:
BetterTransformer example from Hugging Face
Anyway, thanks a lot for your time. See you around, Theo
To answer each of your points:
Maybe! …
Thanks once again for your answer. Just a quick follow-up below.
Although I don't mind, since I have no idea what it does 😎. It sounded nice though, since it does not impact model outputs but appears to halve latency in some cases (Optimum tutorial).
(This, I read a bit about the theory)
From this, I built my own custom Pipeline class with strategic outputs based on my own application (I basically want to output the segmentation map, i.e. the argmax of all logits).
Everything worked perfectly for torch models (tested only on CPU).
However, I could not manage to implement a custom transformers.Pipeline, because several class attributes are not implemented in ORTModel (ORTModel.config, for example), which are necessary upon … For now, I just built custom "pipeline functions" that work on CPU only but avoid unnecessary tasks performed in the implemented Pipeline classes, doing only what is necessary for my goal.
Next step for me is to enable GPU coverage, which I am sure I will find how to do in the optimum.ORTModel source code, for example in the ORTModelForImageClassification source code. I'd love to try to actually implement it and do a PR for ORTModelForSemanticSegmentation, which would be supported in pipelines. Apart from this, I think everything will actually be exactly the same as ORTModelForImageClassification. Have you already started writing this class? Thanks for your time, Theo
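The "argmax of all logits" step described above can be sketched in a few lines. This is a minimal illustration with NumPy, using a toy logits array of shape `(num_labels, H, W)`; real SegFormer outputs would come from the model and be upsampled to the input resolution first.

```python
import numpy as np

# Toy logits: 3 classes over a 2x2 image, shape (num_labels, H, W)
logits = np.array([
    [[0.1, 2.0], [0.3, 0.2]],
    [[1.5, 0.1], [0.2, 1.9]],
    [[0.2, 0.3], [2.5, 0.1]],
])

# Segmentation map: per-pixel argmax over the class axis
seg_map = logits.argmax(axis=0)
print(seg_map)  # each entry is the winning class index for that pixel
```

The same one-liner works on a torch tensor (`logits.argmax(dim=0)`), which is all a custom "pipeline function" needs after the forward pass.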
Hi @TheoMrc, first, thank you for your feedback, it is very valuable! About your questions:
python -m optimum.exporters.onnx --model model_name --task semantic-segmentation segformer_onnx
You can open a PR and I can help you there, what do you think?
Hi @TheoMrc, just to expand on the second point of @michaelbenayoun: as Segformer is based on a transformer encoder architecture, we can apply a BERT-like optimization by registering Segformer in … (But there is a caveat, due to the fact that Segformer's encoder blocks have different … .) And if you are interested in contributing the …
With a quick test, the automatic detection of … I am thinking of letting BERT-like models infer …
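The caveat mentioned above is that, unlike BERT, SegFormer's encoder is hierarchical, so its config stores per-stage lists rather than single values. A minimal sketch (the field names and values below follow the `transformers` `SegformerConfig` defaults for mit-b0, but treat them as an assumption, not a spec):

```python
# SegformerConfig-style fields: one entry per encoder stage, not a
# single num_heads / hidden_size pair as BERT-like optimizers expect.
config = {
    "num_attention_heads": [1, 2, 5, 8],
    "hidden_sizes": [32, 64, 160, 256],
}

# A BERT-style fusion pass assumes one (num_heads, hidden_size) pair;
# a hierarchical encoder instead needs per-stage handling:
per_stage = list(zip(config["num_attention_heads"], config["hidden_sizes"]))
print(per_stage)
```

This is why the automatic detection of these values cannot simply read a single int from the config for this architecture.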
Hi @michaelbenayoun and @JingyaHuang, thanks to you both for your answers; as a "novice" in the field, I personally find it extremely useful to speak with you. I started ML in Python in a "from scratch" manner in TensorFlow, and then torch, for which I had to grasp more of the theory, create new loss functions, etc. Hugging Face is very nice but hides most of the complicated stuff, which is very handy for getting working prototypes but surely makes it easy to ignore the way things work. I surely plan to learn and understand everything :) Michael:
I will clone the optimum repo and open a PR once I have a first (hopefully working) version of the ORTModelForImageSegmentation, and tag you both for review! Before my previous answer, following your advice, I first tried this from the command line with optimum:
But I initially failed because I passed the path to pytorch_model.bin as model_name instead of its parent directory (actually, it might also be because I did not specify a task). Michael:
Jingya:
Being a computer vision guy (and a biologist), I only use segformers from Hugging Face:
I'd obviously enjoy any performance gain from segformer optimization support! Michael:
To be noted, my quantized_model.onnx file (117 MB) is half the size of the original model.onnx (246 MB). Not sure how relevant this is. I don't know how to check what was quantized; maybe you could redirect me to some documentation? Also, I tested inference on my old-ish laptop CPU, which tends to overheat, so latency is quite variable. I'll test inference on my main machine and come back with more reliable latency data. Thanks again for your time, see you soon after my PR.
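The ~2x size reduction reported above is consistent with dynamic quantization, which stores eligible weights as INT8 (1 byte) instead of FP32 (4 bytes) but only covers some ops (typically MatMul/Linear weights). A back-of-envelope sketch, not an official formula, assuming a fraction `f` of the model's bytes are eligible:

```python
def quantized_size_ratio(f):
    """Expected (quantized size) / (original size) when a fraction f of
    the bytes shrink 4x (FP32 -> INT8) and the rest stay untouched."""
    return (1 - f) + f / 4.0

# The ratio observed in the thread is 117 / 246 ~= 0.48, which this
# rough model would attribute to ~70% of the bytes being quantized.
print(round(quantized_size_ratio(0.70), 2))
```

This is only an estimate; inspecting the exported ONNX graph would show exactly which nodes were converted.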
Fixed in #539
Feature request
Would love for Optimum to add support for transformers.SegformerForSemanticSegmentation
https://huggingface.co/docs/transformers/model_doc/segformer#transformers.SegformerForSemanticSegmentation
As best I can tell, semantic segmentation is not something that Optimum currently supports for any model (https://huggingface.co/docs/optimum/main/en/pipelines); would love for this to be improved!
Motivation
I use HuggingFace's Segformer for an image segmentation model I have and would love to improve my inference speeds.
Your contribution
I don't know what a PR is so I kind of doubt it.