
AudioClassificationPipelineTests::test_small_model_pt_fp16 fails for CUDA/XPU but passes for CPU #36340

Closed
dvrogozh opened this issue Feb 21, 2025 · 3 comments · Fixed by #36359

@dvrogozh (Contributor)

With:

On:

  • Nvidia A10
  • Intel Data Center GPU Max (PVC)

Commit f19135a introduced the AudioClassificationPipelineTests::test_small_model_pt_fp16 test, which passes on CPU but fails when running on CUDA or XPU. The PR was:

Logs (for CUDA):

python3 -m pytest tests/pipelines/test_pipelines_audio_classification.py::AudioClassificationPipelineTests::test_small_model_pt_fp16
===================================== test session starts ======================================
platform linux -- Python 3.10.12, pytest-7.4.4, pluggy-1.5.0
rootdir: /home/dvrogozh/git/huggingface/transformers
configfile: pyproject.toml
plugins: rich-0.2.0, xdist-3.6.1, asyncio-0.23.8, timeout-2.3.1
asyncio: mode=strict
collected 1 item

tests/pipelines/test_pipelines_audio_classification.py::AudioClassificationPipelineTests::test_small_model_pt_fp16 FAILED [100%]

=========================================== FAILURES ===========================================
__________________ AudioClassificationPipelineTests.test_small_model_pt_fp16 ___________________

self = <tests.pipelines.test_pipelines_audio_classification.AudioClassificationPipelineTests testMethod=test_small_model_pt_fp16>

    @require_torch
    def test_small_model_pt_fp16(self):
        model = "anton-l/wav2vec2-random-tiny-classifier"

        audio_classifier = pipeline("audio-classification", model=model, torch_dtype=torch.float16)

        audio = np.ones((8000,))
        output = audio_classifier(audio, top_k=4)

        EXPECTED_OUTPUT = [
            {"score": 0.0839, "label": "no"},
            {"score": 0.0837, "label": "go"},
            {"score": 0.0836, "label": "yes"},
            {"score": 0.0835, "label": "right"},
        ]
        EXPECTED_OUTPUT_PT_2 = [
            {"score": 0.0845, "label": "stop"},
            {"score": 0.0844, "label": "on"},
            {"score": 0.0841, "label": "right"},
            {"score": 0.0834, "label": "left"},
        ]
>       self.assertIn(nested_simplify(output, decimals=4), [EXPECTED_OUTPUT, EXPECTED_OUTPUT_PT_2])
E       AssertionError: [{'score': 0.0833, 'label': 'go'}, {'score': 0.0833, 'label': 'off'}, {'score': 0.0833, 'label': 'stop'}, {'score': 0.0833, 'label': 'on'}] not found in [[{'score': 0.0839, 'label': 'no'}, {'score': 0.0837, 'label': 'go'}, {'score': 0.0836, 'label': 'yes'}, {'score': 0.0835, 'label': 'right'}], [{'score': 0.0845, 'label': 'stop'}, {'score': 0.0844, 'label': 'on'}, {'score': 0.0841, 'label': 'right'}, {'score': 0.0834, 'label': 'left'}]]

tests/pipelines/test_pipelines_audio_classification.py:159: AssertionError
------------------------------------- Captured stderr call -------------------------------------
Device set to use cuda:0
======================================= warnings summary =======================================
tests/pipelines/test_pipelines_audio_classification.py::AudioClassificationPipelineTests::test_small_model_pt_fp16
  /home/dvrogozh/git/huggingface/transformers/src/transformers/configuration_utils.py:315: UserWarning: Passing `gradient_checkpointing` to a config initialization is deprecated and will be removed in v5 Transformers. Using `model.gradient_checkpointing_enable()` instead, or if you are using the `Trainer` API, pass `gradient_checkpointing=True` in your `TrainingArguments`.
    warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=================================== short test summary info ====================================
FAILED tests/pipelines/test_pipelines_audio_classification.py::AudioClassificationPipelineTests::test_small_model_pt_fp16 - AssertionError: [{'score': 0.0833, 'label': 'go'}, {'score': 0.0833, 'label': 'off'}, {'sco...
================================= 1 failed, 1 warning in 1.99s =================================

Note: for XPU (upstream PyTorch XPU, not IPEX) the log is effectively the same, including identical scores for the labels.

Does CUDA/XPU work correctly in this test? (I am confused to see the same score, 0.0833, for all 4 labels.)
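For what it's worth, 0.0833 is exactly 1/12, which is what softmax returns when every logit is identical, so the output would be consistent with the fp16 logits collapsing to a single value. A minimal sketch of that arithmetic (the 12-label count for this checkpoint is an assumption):

```python
import torch

# If every logit rounds to the same fp16 value (e.g. all zeros),
# softmax returns a uniform distribution: 1/12 ≈ 0.0833 for 12 labels.
logits = torch.zeros(12, dtype=torch.float16)
print(torch.softmax(logits.float(), dim=-1))  # tensor([0.0833, 0.0833, ...])
```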

Overall, the expectation is that the test either passes on CUDA/XPU or is excluded for PyTorch device backends if it is CPU specific.
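
For reference, the failure can be reproduced outside the test suite with a standalone script mirroring the test body. This is a sketch: the test itself relies on the pipeline's automatic device placement, and the explicit device argument here ("cuda:0", or "xpu:0" on an upstream PyTorch XPU build) is only for clarity:

```python
import numpy as np
import torch
from transformers import pipeline

# Mirrors AudioClassificationPipelineTests::test_small_model_pt_fp16.
# Use device="xpu:0" on an XPU build; omit device to run on CPU.
audio_classifier = pipeline(
    "audio-classification",
    model="anton-l/wav2vec2-random-tiny-classifier",
    torch_dtype=torch.float16,
    device="cuda:0",
)

audio = np.ones((8000,))
print(audio_classifier(audio, top_k=4))  # all scores come out 0.0833 on CUDA/XPU
```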

CC: @jiqing-feng @ydshieh

@Ajinkya-25 commented Feb 22, 2025

Hello,
I think this issue occurs because CUDA/XPU computation may become numerically unstable with torch.float16, which has lower precision than float32 or the CPU path; the score for every label comes out the same, i.e. 0.0833. Try using mixed precision instead of plain torch.float16, or fall back to torch.float32.
You could also try another model for prediction, as "anton-l/wav2vec2-random-tiny-classifier" might not be optimized for torch.float16.
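
For illustration, one way to apply the mixed-precision suggestion is to keep the weights in float32 and run inference under torch.autocast, which casts only eligible ops to fp16. This is a sketch, not the pipeline's own code path (use device_type="xpu" on an XPU build):

```python
import numpy as np
import torch
from transformers import pipeline

# Weights stay in float32; autocast selects fp16 only for safe ops.
clf = pipeline(
    "audio-classification",
    model="anton-l/wav2vec2-random-tiny-classifier",
    device="cuda:0",
)

audio = np.ones((8000,))
with torch.autocast(device_type="cuda", dtype=torch.float16):
    print(clf(audio, top_k=4))
```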

@jiqing-feng (Contributor) commented Feb 24, 2025

Hi @dvrogozh. Thanks for your issue. Could you please try PR #36359 to check whether it fixes it?

@ydshieh (Collaborator) commented Feb 24, 2025

It's indeed failing on our T4 runners. Thank you for pointing this out 👍

dvrogozh added a commit to dvrogozh/torch-xpu-ops that referenced this issue Feb 25, 2025
Changes:
* Benchmarking scripts were pruned from Transformers in v4.49.0 due to
  deprecation, so we don't need to test them anymore.
* Some CUDA-specific tests were generalized to cover non-CUDA devices,
  which uncovered some issues.
* Some new tests were added which fail for both CUDA and XPU.
* A few regressions due to changes on the Transformers side

Fixed tests:
* huggingface/transformers@b912f5e
  * `tests/models/git/test_modeling_git.py::GitModelTest::test_inputs_embeds_matches_input_ids`
* huggingface/transformers@b5aaf87
  * `tests/pipelines/test_pipelines_video_classification.py::VideoClassificationPipelineTests::test_small_model_pt`
  * `tests/test_pipeline_mixin.py::VideoClassificationPipelineTests::test_small_model_pt`
* huggingface/transformers@42c8ccf
  * `tests/generation/test_utils.py::GenerationIntegrationTests::test_generated_length_assisted_generation`
* huggingface/transformers@9fd123a
  * `test_model_parallelization`
  * `test_model_parallel_equal_results`

Commits which added new tests (or enabled previously skipped ones) that fail:
* huggingface/transformers@23d782e
  * `tests/pipelines/test_pipelines_text_generation.py::TextGenerationPipelineTests::test_return_dict_in_generate`
  * `tests/test_pipeline_mixin.py::TextGenerationPipelineTests::test_return_dict_in_generate`
* huggingface/transformers@2fa876d
  * `test_cpu_offload` (some of)
  * `test_disk_offload_bin` (some of)
  * `test_disk_offload_safetensors` (some of)
  * `tests/pipelines/test_pipelines_text_generation.py::TextGenerationPipelineTests::test_small_model_pt_bloom_accelerate`
* huggingface/transformers@be2ac09
  * `tests/models/paligemma/test_modeling_paligemma.py::PaliGemmaForConditionalGenerationModelTest::test_generate_compilation_all_outputs`
  * `tests/models/paligemma2/test_modeling_paligemma2.py::PaliGemma2ForConditionalGenerationModelTest::test_generate_compilation_all_outputs`
* huggingface/transformers#36340
  * `tests/pipelines/test_pipelines_audio_classification.py::AudioClassificationPipelineTests::test_small_model_pt_fp16`
* huggingface/transformers@1fae54c
  * `tests/trainer/test_trainer.py::TrainerIntegrationPrerunTest::test_gradient_accumulation_loss_alignment_with_model_loss`
* huggingface/transformers@15ec971
  * `tests/models/qwen2_5_vl/test_processor_qwen2_5_vl.py::Qwen2_5_VLProcessorTest::test_chat_template_video_custom_sampling`
  * `tests/models/qwen2_5_vl/test_processor_qwen2_5_vl.py::Qwen2_5_VLProcessorTest::test_chat_template_video_special_processing`

Regressions:
* huggingface/transformers@365fecb
  * `tests/generation/test_utils.py::GenerationIntegrationTests::test_encoder_decoder_generate_attention_mask`
* huggingface/transformers@da334bc
  * `tests/generation/test_utils.py::GenerationIntegrationTests::test_generate_input_features_as_encoder_kwarg`
* huggingface/transformers@bcfc9d7
  * `tests/models/llava/test_modeling_llava.py::LlavaForConditionalGenerationModelTest::test_config`
* huggingface/transformers#36267
  * `tests/utils/test_import_utils.py`
* huggingface/transformers#36267
  * `tests/models/marian/test_modeling_marian.py`

Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>