
change bnb tests #34713

Merged
merged 16 commits into huggingface:main from jiqing-feng:bnb_cpu
Dec 18, 2024
Conversation


@jiqing-feng jiqing-feng commented Nov 13, 2024

  1. The BNB CPU and XPU paths do not support autocast LoRA finetuning for now.
  2. XPU does not support GPT-2 for now.
  3. Add Llama tests.
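For illustration, platform limitations like these are usually expressed as runtime skips in the test suite. A minimal sketch of the pattern — the helper, class, and device names below are hypothetical, not the actual code in this PR:

```python
import unittest


def supports_autocast_lora_finetune(device_type: str) -> bool:
    """Hypothetical helper: per the notes above, the BNB CPU and XPU
    paths do not yet support autocast LoRA finetuning."""
    return device_type not in ("cpu", "xpu")


class Bnb4BitTrainingTest(unittest.TestCase):
    # Assumed device under test; in practice this would be detected at runtime.
    device_type = "xpu"

    def test_autocast_lora_finetune(self):
        if not supports_autocast_lora_finetune(self.device_type):
            self.skipTest("autocast LoRA finetune not yet supported on CPU/XPU")
        # ... the actual finetuning assertions would go here ...
```

Gating on a capability check rather than a hard device list in each test keeps the skips easy to lift once the CPU/XPU paths gain support.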

@jiqing-feng jiqing-feng marked this pull request as draft November 13, 2024 06:32
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
@jiqing-feng jiqing-feng marked this pull request as ready for review November 19, 2024 05:22
@jiqing-feng jiqing-feng changed the title fix training tests change bnb tests Nov 19, 2024
@Titus-von-Koeller Titus-von-Koeller self-assigned this Nov 19, 2024
@Titus-von-Koeller

Thanks for this great work in conjunction with BNB PR #1418, @jiqing-feng 🔥🤗

I'll do my best to provide feedback on this ASAP so we can iterate. That said, I need to balance it with other high-impact topics like quantization improvements, the custom_ops registration refactor (which underpins merging all this into main on BNB) and general maintenance (e.g., resolving the currently broken CI integration tests). Still, this remains one of our top priorities before the end of the year, and we’re aiming to make maximum progress on this topic and bring these new functionalities live ASAP.

Thanks so much to you and the Intel team ❤️ for your continued valuable work and support on this! It’s highly appreciated, and I’m looking forward to a final sprint to materialize the fruits of our collaboration.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
@Titus-von-Koeller

Will review this today and tomorrow.

@matthewdouglas matthewdouglas self-requested a review December 9, 2024 21:17
@matthewdouglas

I'm going to take over reviewing this. I'll work on getting access to hardware to run it on.
cc: @SunMarc


@matthewdouglas matthewdouglas left a comment


I was able to try this on an Intel machine with GPU Max 1100 where everything looks good with this PR applied.

Before this PR (on transformers latest release), we see several failures and a crash:

```
tests/quantization/bnb/test_4bit.py::Pipeline4BitTest::test_pipeline Fatal Python error: Aborted

Thread 0x00007f538b656640 (most recent call first):
  File "/opt/conda/lib/python3.11/threading.py", line 331 in wait
  File "/opt/conda/lib/python3.11/threading.py", line 629 in wait
  File "/opt/conda/lib/python3.11/site-packages/tqdm/_monitor.py", line 60 in run
  File "/opt/conda/lib/python3.11/threading.py", line 1045 in _bootstrap_inner
  File "/opt/conda/lib/python3.11/threading.py", line 1002 in _bootstrap

Current thread 0x00007f5843c5d740 (most recent call first):
  File "/opt/conda/lib/python3.11/site-packages/transformers/models/bloom/modeling_bloom.py", line 68 in build_alibi_tensor
  File "/opt/conda/lib/python3.11/site-packages/transformers/models/bloom/modeling_bloom.py", line 577 in build_alibi_tensor
  File "/opt/conda/lib/python3.11/site-packages/transformers/models/bloom/modeling_bloom.py", line 671 in forward
  File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1747 in _call_impl
  File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1736 in _wrapped_call_impl
  File "/opt/conda/lib/python3.11/site-packages/transformers/models/bloom/modeling_bloom.py", line 973 in forward
  File "/opt/conda/lib/python3.11/site-packages/accelerate/hooks.py", line 170 in new_forward
  File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1747 in _call_impl
  File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1736 in _wrapped_call_impl
  File "/opt/conda/lib/python3.11/site-packages/transformers/generation/utils.py", line 3222 in _sample
  File "/opt/conda/lib/python3.11/site-packages/transformers/generation/utils.py", line 2231 in generate
  File "/opt/conda/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116 in decorate_context
  File "/opt/conda/lib/python3.11/site-packages/transformers/pipelines/text_generation.py", line 370 in _forward
  File "/opt/conda/lib/python3.11/site-packages/transformers/pipelines/base.py", line 1208 in forward
  File "/opt/conda/lib/python3.11/site-packages/transformers/pipelines/base.py", line 1308 in run_single
  File "/opt/conda/lib/python3.11/site-packages/transformers/pipelines/base.py", line 1301 in __call__
  File "/opt/conda/lib/python3.11/site-packages/transformers/pipelines/text_generation.py", line 272 in __call__
  File "/usr/src_host/transformers/tests/quantization/bnb/test_4bit.py", line 513 in test_pipeline
```


jiqing-feng commented Dec 18, 2024

Hi @matthewdouglas, thanks for testing.
When you say "before this PR", do you mean that you observed the failures without this PR and that they disappear with this PR applied?
Do I need to make any changes before merging?

Copy link
Member

@LysandreJik LysandreJik left a comment


Thanks for your PR!

@matthewdouglas

@jiqing-feng That's correct: I observed that with this PR applied, the tests run correctly. No changes needed. Thanks!

@matthewdouglas matthewdouglas merged commit 69e31eb into huggingface:main Dec 18, 2024
8 checks passed
@jiqing-feng jiqing-feng deleted the bnb_cpu branch December 19, 2024 02:02
@matthewdouglas matthewdouglas mentioned this pull request Dec 19, 2024