[FEAT]: EETQ quantizer support #30262
Conversation
@younesbelkada
Awesome PR @dtlzhuangz! The EETQ library is written in such a way that the integration is very smooth. We can quantize on the fly, serialize the quantized model, and even reload it with minimal changes in transformers 🔥 I left a few minor comments. Make sure to fix the style with `make style`.
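A minimal sketch of that flow (on-the-fly quantization, serialization, and reload), assuming the `EetqConfig` API introduced in this PR and using `facebook/opt-125m` purely as an example model:
```
from transformers import AutoModelForCausalLM, EetqConfig

model_id = "facebook/opt-125m"
quantization_config = EetqConfig(weights="int8")

# Quantize on the fly while loading the full-precision checkpoint.
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", quantization_config=quantization_config
)

# Serialize the quantized model ...
model.save_pretrained("opt-125m-eetq")

# ... and reload it later without re-quantizing.
model = AutoModelForCausalLM.from_pretrained("opt-125m-eetq", device_map="auto")
```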
docs/source/en/quantization.md
Outdated
Make sure you have eetq installed via the source code https://github.com/NetEase-FuXi/EETQ
```
git clone https://github.com/NetEase-FuXi/EETQ.git
cd EETQ/
git submodule update --init --recursive
pip install .
```
Is there a plan to release EETQ on PyPI?
Thanks so much for this great work! In addition to @SunMarc's comments, I have a few tiny additional comments:
1- Can you add `pip install git+https://github.com/NetEase-FuXi/EETQ.git` inside the quantization Dockerfile here: https://github.com/huggingface/transformers/blob/main/docker/transformers-quantization-latest-gpu/Dockerfile
2- Can you elaborate on the hardware restrictions in the documentation section? (i.e. whether it works only from CUDA compute capability 8.0 and above, or also 7.0, etc. — a quick local check is sketched after this comment)
3- Yes, let's use camel case for the newly introduced files (EETQ --> Eetq)
4- Can you make sure the styling checks pass (`make fixup`) to make the CI happy?
Thanks again and looking forward to merging this!
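Regarding point 2, a generic way to check a GPU's compute capability locally with plain PyTorch (this is only a helper sketch, not part of the PR; the actual minimum EETQ requires should come from its documentation):
```
import torch

# Prints the CUDA compute capability of the current GPU, e.g. 8.0 for an A100.
if torch.cuda.is_available():
    major, minor = torch.cuda.get_device_capability()
    print(f"CUDA compute capability: {major}.{minor}")
```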
Hi @dtlzhuangz, thanks for the fast response. I've answered your questions. Once the issues pointed out by @younesbelkada are fixed, we will ask a core maintainer for a final review.
Sorry, could you help me fix the error 'Import block is un-sorted or un-formatted'? I'm not quite familiar with the CI.
Yes, I took care of that. You just needed to do
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Thank you so much for your guidance and effort!
Very smooth integration! Thanks for delivering this to the community! LGTM with only two nits.
r""" | ||
Safety checker that arguments are correct | ||
""" | ||
accepted_weights = ["int8"] |
Out of curiosity: are there any plans to support 4-bit group-wise quantization as well?
Sorry, not at the moment.
ok no worries!
I was able to build the Dockerfile and the tests are passing 🔥 Thanks again @dtlzhuangz for the clean PR. Do you have any plan to release the package on PyPI? Installing from source is not ideal since it takes quite a lot of time to build the wheels, and users are subject to breaking changes since there is no release yet.
Sorry for replying to the question late. My colleague and I are setting out to do it, but the built files depend on the version of torch; an error occurs if the versions mismatch. If there is no solution, we will have to install a specific version of torch when installing EETQ.
Hi @SunMarc @younesbelkada @amyeroberts, we have released the .whl on the release page and updated the documentation. Please review. Thanks!
Great work, thanks! I will let @amyeroberts make a final review and merge it if all is good. Thanks again for all your great work, @dtlzhuangz!
Thanks for adding!
Just a few small comments to address
```
modules_to_not_convert = ["lm_head"] if modules_to_not_convert is None else modules_to_not_convert

if quantization_config.modules_to_not_convert is not None:
    modules_to_not_convert.extend(quantization_config.modules_to_not_convert)
```
We might want to use sets here, otherwise we can end up with duplicate modules being added.
I have added `modules_to_not_convert = list(set(modules_to_not_convert))`.
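Putting the snippet and the fix together, the relevant part of the conversion helper would look roughly like this (a sketch based on the diff above, not the exact final code):
```
modules_to_not_convert = ["lm_head"] if modules_to_not_convert is None else modules_to_not_convert

if quantization_config.modules_to_not_convert is not None:
    modules_to_not_convert.extend(quantization_config.modules_to_not_convert)

# Drop duplicates that can occur when the same module name appears in both the
# default list and the user-provided quantization_config.modules_to_not_convert.
modules_to_not_convert = list(set(modules_to_not_convert))
```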
```
if current_key_name is None:
    current_key_name = []
```
This should go outside of the for-loop; we only need to check it for None-ness once.
Indeed. It has been done.
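For illustration, the suggestion is just to hoist the check above the loop; a rough sketch with assumed function and loop structure (not the exact code in the PR):
```
def _replace_with_eetq_linear(model, modules_to_not_convert, current_key_name=None):
    # Initialize once, before iterating, instead of re-checking inside the loop.
    if current_key_name is None:
        current_key_name = []

    for name, module in model.named_children():
        current_key_name.append(name)
        # ... replace eligible nn.Linear modules with EETQ linear layers here ...
        current_key_name.pop(-1)
```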
```
def test_raise_if_non_quantized(self):
    model_id = "facebook/opt-125m"
    quantization_config = EetqConfig()
    _ = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", quantization_config=quantization_config)
```
This doesn't test that any error is raised here.
I have removed it.
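For context, if a test like this were kept, it would need an explicit assertion that loading fails; a hypothetical sketch (the expected exception type is an assumption, not taken from the PR):
```
def test_raise_if_non_quantized(self):
    model_id = "facebook/opt-125m"
    quantization_config = EetqConfig()
    # Assert that an error is actually raised instead of only calling from_pretrained.
    with self.assertRaises(ValueError):
        _ = AutoModelForCausalLM.from_pretrained(
            model_id, device_map="auto", quantization_config=quantization_config
        )
```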
```
if torch_dtype is None:
    torch_dtype = torch.float16
```
+1, a `logger.info` message should be added here.
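A minimal sketch of what that could look like (the `resolve_torch_dtype` wrapper and the message wording are assumptions, only there to make the sketch self-contained):
```
import torch
from transformers.utils import logging

logger = logging.get_logger(__name__)

def resolve_torch_dtype(torch_dtype):
    # torch_dtype is the value passed by the user (may be None).
    if torch_dtype is None:
        torch_dtype = torch.float16
        # Tell users a default dtype was chosen rather than silently falling back.
        logger.info("Setting torch_dtype to torch.float16 for EETQ quantization since it was not specified.")
    return torch_dtype
```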
Hi @amyeroberts. I have addressed all the comments. I don't think the CI errors are caused by my changes; they occurred after I modified quantization.md. Please take a look.
@dtlzhuangz Regarding the failing tests: a fix has been merged into main. Could you rebase?
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Re-ran the testing suite and the tests seem to pass now! 🤞
Thanks for adding this and iterating!
Thank you all for your help! @SunMarc @amyeroberts @younesbelkada
Great work, thanks to everyone involved in this!
What does this PR do?
EETQ supports int8 per-channel weight-only quantization for NVIDIA GPUs. The high-performance GEMM and GEMV kernels come from FasterTransformer and TensorRT-LLM. It requires no calibration dataset and there is no need to pre-quantize your model. Moreover, the accuracy degradation is negligible owing to the per-channel quantization.
NetEase-FuXi/EETQ#13
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
Fixes NetEase-FuXi/EETQ#13