Add Ascend NPU support for nf4 quant #1422
Conversation
Thanks a lot for sharing this PR and the video demo! Thanks to the demo, I was able to successfully run NF4 quant/dequant on the NPU with ease. The detailed explanation in the video really helped me understand the process and key steps. Looking forward to more updates in the future. Great work!
I hope this PR can be merged soon, as it provides valuable improvements. Looking forward to seeing it merged!
Nice work and thanks for the demo! Can you have a look, @matthewdouglas?
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
I will be able to look in more detail next week, but at first glance it looks nice. Thanks @statelesshz!
Co-authored-by: Slightwind <slightwindsec@gmail.com>
Co-authored-by: Ginray <ginray0215@gmail.com>
Force-pushed from f5a4c57 to 632262f
@statelesshz We really appreciate the contribution! Apart from a lint check, I think we can go ahead and merge this. For awareness, we are currently planning to adopt usage of
@matthewdouglas Thank you for the feedback. I have addressed the lint check warnings, and I think the PR is now ready for merging. 🤗
Merged 9948333 into bitsandbytes-foundation:multi-backend-refactor
@@ -519,7 +519,12 @@ def forward(ctx, A, B, out=None, bias=None, quant_state: Optional[F.QuantState]

    # 1. Dequantize
    # 2. MatmulnN
    output = torch.nn.functional.linear(A, F.dequantize_4bit(B, quant_state).to(A.dtype).t(), bias)
    if A.device.type == "npu":
A quick question: why don't we use torch.nn.functional.linear directly here? Thanks in advance for your answer.
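For context on what this question contrasts: the two formulations are mathematically equivalent, as the small sketch below shows (illustrative only, not code from this PR), so a device-specific branch would come down to which primitive the backend supports or runs fastest, not to numerics.

```python
# Illustrative sketch, not code from this PR: F.linear and an explicit matmul
# against the transposed weight compute the same result.
import torch

A = torch.randn(4, 16)        # activations: (batch, in_features)
W = torch.randn(8, 16)        # dequantized weight: (out_features, in_features)
bias = torch.randn(8)

out_linear = torch.nn.functional.linear(A, W, bias)   # fused linear
out_matmul = torch.matmul(A, W.t()) + bias             # explicit matmul
assert torch.allclose(out_linear, out_matmul, atol=1e-5)
```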
* Add npu support for nf4 quant
  Co-authored-by: Slightwind <slightwindsec@gmail.com>
  Co-authored-by: Ginray <ginray0215@gmail.com>
* code format
* update
* pass lint check and fix typos
* add npu to supported devices
---------
Co-authored-by: Slightwind <slightwindsec@gmail.com>
Co-authored-by: Ginray <ginray0215@gmail.com>
What does this PR do?
This PR adds Ascend NPU support for nf4 quant/dequant and allows QLoRA fine-tuning for LLMs using transformers, peft, and trl.
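As a rough sketch of the intended end-to-end usage (assuming this PR plus the related transformers change linked below are installed; the model id, device index, and LoRA target modules are placeholders, not requirements):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# NF4 4-bit quantization config via the standard transformers/bitsandbytes API.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

# Placeholder model id; device_map assumes torch_npu is installed and an NPU is visible.
model = AutoModelForCausalLM.from_pretrained(
    "your-model-id",
    quantization_config=bnb_config,
    device_map={"": "npu:0"},
)

# Attach LoRA adapters on top of the frozen 4-bit base model (QLoRA).
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
# ...then train with a trl/transformers Trainer as usual.
```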
You may notice that the nf4 quantization method is currently implemented in plain PyTorch. This is an interim measure: the high-performance AscendC implementation is still in progress 😞. Meanwhile, many in the Ascend NPU community have told us they are keen to use QLoRA to fine-tune LLMs as soon as possible, hence this PR.
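For readers curious what "NF4 in plain PyTorch" amounts to, here is a minimal, simplified sketch of blockwise NF4 quant/dequant. It is not the code from this PR: the code values are rounded, the 4-bit indices are not packed two per byte as bitsandbytes does, and the tensor size is assumed to be a multiple of the block size.

```python
import torch

# The 16 NF4 code values from the QLoRA paper (rounded here for brevity).
NF4_CODES = torch.tensor([
    -1.0000, -0.6962, -0.5251, -0.3949, -0.2844, -0.1848, -0.0911, 0.0000,
     0.0796,  0.1609,  0.2461,  0.3379,  0.4407,  0.5626,  0.7230,  1.0000,
])

def nf4_quantize(w: torch.Tensor, blocksize: int = 64):
    """Blockwise NF4 quantization: per-block absmax scaling, then nearest code."""
    blocks = w.reshape(-1, blocksize)
    absmax = blocks.abs().amax(dim=1, keepdim=True)   # per-block scale
    normalized = blocks / absmax                       # values now in [-1, 1]
    # Index of the nearest NF4 code value for every element (one 4-bit index each).
    idx = (normalized.unsqueeze(-1) - NF4_CODES.to(w.device)).abs().argmin(dim=-1)
    return idx.to(torch.uint8), absmax

def nf4_dequantize(idx: torch.Tensor, absmax: torch.Tensor, shape):
    """Look up the code values and rescale by the per-block absmax."""
    return (NF4_CODES.to(idx.device)[idx.long()] * absmax).reshape(shape)

# Round-trip example:
w = torch.randn(128, 64)
idx, scale = nf4_quantize(w)
w_hat = nf4_dequantize(idx, scale, w.shape)
```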
Related PR: huggingface/transformers#31512
Collaborators
@SlightwindSec @Ginray @MatrixPlayer
cc @Titus-von-Koeller @matthewdouglas