
AudioClassificationPipelineTests::test_small_model_pt_fp16 fails for CUDA/XPU but passes for CPU #36340

Closed
dvrogozh opened this issue Feb 21, 2025 · 3 comments · Fixed by #36359

@dvrogozh (Contributor)

With:

On:

  • Nvidia A10
  • Intel Data Center GPU Max (PVC)

Commit f19135a introduced the AudioClassificationPipelineTests::test_small_model_pt_fp16 test, which passes on CPU but fails when running on CUDA or XPU. The PR was:

Logs (for CUDA):

python3 -m pytest tests/pipelines/test_pipelines_audio_classification.py::AudioClassificationPipelineTests::test_small_model_pt_fp16
===================================== test session starts ======================================
platform linux -- Python 3.10.12, pytest-7.4.4, pluggy-1.5.0
rootdir: /home/dvrogozh/git/huggingface/transformers
configfile: pyproject.toml
plugins: rich-0.2.0, xdist-3.6.1, asyncio-0.23.8, timeout-2.3.1
asyncio: mode=strict
collected 1 item

tests/pipelines/test_pipelines_audio_classification.py::AudioClassificationPipelineTests::test_small_model_pt_fp16 FAILED [100%]

=========================================== FAILURES ===========================================
__________________ AudioClassificationPipelineTests.test_small_model_pt_fp16 ___________________

self = <tests.pipelines.test_pipelines_audio_classification.AudioClassificationPipelineTests testMethod=test_small_model_pt_fp16>

    @require_torch
    def test_small_model_pt_fp16(self):
        model = "anton-l/wav2vec2-random-tiny-classifier"

        audio_classifier = pipeline("audio-classification", model=model, torch_dtype=torch.float16)

        audio = np.ones((8000,))
        output = audio_classifier(audio, top_k=4)

        EXPECTED_OUTPUT = [
            {"score": 0.0839, "label": "no"},
            {"score": 0.0837, "label": "go"},
            {"score": 0.0836, "label": "yes"},
            {"score": 0.0835, "label": "right"},
        ]
        EXPECTED_OUTPUT_PT_2 = [
            {"score": 0.0845, "label": "stop"},
            {"score": 0.0844, "label": "on"},
            {"score": 0.0841, "label": "right"},
            {"score": 0.0834, "label": "left"},
        ]
>       self.assertIn(nested_simplify(output, decimals=4), [EXPECTED_OUTPUT, EXPECTED_OUTPUT_PT_2])
E       AssertionError: [{'score': 0.0833, 'label': 'go'}, {'score': 0.0833, 'label': 'off'}, {'score': 0.0833, 'label': 'stop'}, {'score': 0.0833, 'label': 'on'}] not found in [[{'score': 0.0839, 'label': 'no'}, {'score': 0.0837, 'label': 'go'}, {'score': 0.0836, 'label': 'yes'}, {'score': 0.0835, 'label': 'right'}], [{'score': 0.0845, 'label': 'stop'}, {'score': 0.0844, 'label': 'on'}, {'score': 0.0841, 'label': 'right'}, {'score': 0.0834, 'label': 'left'}]]

tests/pipelines/test_pipelines_audio_classification.py:159: AssertionError
------------------------------------- Captured stderr call -------------------------------------
Device set to use cuda:0
======================================= warnings summary =======================================
tests/pipelines/test_pipelines_audio_classification.py::AudioClassificationPipelineTests::test_small_model_pt_fp16
  /home/dvrogozh/git/huggingface/transformers/src/transformers/configuration_utils.py:315: UserWarning: Passing `gradient_checkpointing` to a config initialization is deprecated and will be removed in v5 Transformers. Using `model.gradient_checkpointing_enable()` instead, or if you are using the `Trainer` API, pass `gradient_checkpointing=True` in your `TrainingArguments`.
    warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=================================== short test summary info ====================================
FAILED tests/pipelines/test_pipelines_audio_classification.py::AudioClassificationPipelineTests::test_small_model_pt_fp16 - AssertionError: [{'score': 0.0833, 'label': 'go'}, {'score': 0.0833, 'label': 'off'}, {'sco...
================================= 1 failed, 1 warning in 1.99s =================================

Note: for XPU (upstream PyTorch XPU, not IPEX) the log is effectively the same, including identical scores for the labels.

Does CUDA/XPU work correctly in this test? (I am confused to see the same score, 0.0833, for all 4 labels.)
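For what it's worth, 0.0833 is exactly 1/12, which is what softmax returns when every logit is identical, so the output would be consistent with the fp16 logits collapsing to a single value. A minimal sketch of that arithmetic (the 12-label count for this checkpoint is an assumption):

```python
import torch

# If every logit rounds to the same fp16 value (e.g. all zeros),
# softmax returns a uniform distribution: 1/12 ≈ 0.0833 for 12 labels.
logits = torch.zeros(12, dtype=torch.float16)
print(torch.softmax(logits.float(), dim=-1))  # tensor([0.0833, 0.0833, ...])
```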

Overall, the expectation is that the test either passes on CUDA/XPU or is excluded for PyTorch device backends if it is CPU specific.
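
For reference, the failure can be reproduced outside the test suite with a standalone script mirroring the test body. This is a sketch: the test itself relies on the pipeline's automatic device placement, and the explicit device argument here ("cuda:0", or "xpu:0" on an upstream PyTorch XPU build) is only for clarity:

```python
import numpy as np
import torch
from transformers import pipeline

# Mirrors AudioClassificationPipelineTests::test_small_model_pt_fp16.
# Use device="xpu:0" on an XPU build; omit device to run on CPU.
audio_classifier = pipeline(
    "audio-classification",
    model="anton-l/wav2vec2-random-tiny-classifier",
    torch_dtype=torch.float16,
    device="cuda:0",
)

audio = np.ones((8000,))
print(audio_classifier(audio, top_k=4))  # all scores come out 0.0833 on CUDA/XPU
```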

CC: @jiqing-feng @ydshieh

@Ajinkya-25 commented Feb 22, 2025

Hello,
I think this issue occurs because CUDA/XPU computation may become numerically unstable with torch.float16, which has lower precision than float32 or the CPU path; the score for every label comes out the same, i.e. 0.0833. Try using mixed precision instead of plain torch.float16, or fall back to torch.float32.
You could also try another model for prediction, as "anton-l/wav2vec2-random-tiny-classifier" might not be optimized for torch.float16.
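
For illustration, one way to apply the mixed-precision suggestion is to keep the weights in float32 and run inference under torch.autocast, which casts only eligible ops to fp16. This is a sketch, not the pipeline's own code path (use device_type="xpu" on an XPU build):

```python
import numpy as np
import torch
from transformers import pipeline

# Weights stay in float32; autocast selects fp16 only for safe ops.
clf = pipeline(
    "audio-classification",
    model="anton-l/wav2vec2-random-tiny-classifier",
    device="cuda:0",
)

audio = np.ones((8000,))
with torch.autocast(device_type="cuda", dtype=torch.float16):
    print(clf(audio, top_k=4))
```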

@jiqing-feng (Contributor) commented Feb 24, 2025

Hi @dvrogozh. Thanks for your issue. Could you please try PR #36359 to check whether it fixes it?

@ydshieh (Collaborator) commented Feb 24, 2025

It's indeed failing on our T4 runners. Thank you for pointing this out 👍

dvrogozh added a commit to dvrogozh/torch-xpu-ops that referenced this issue Feb 25, 2025
Changes:
* Benchmarking scripts were pruned from Transformers in v4.49.0 due to
  deprecation, so we don't need to test them anymore.
* Some CUDA-specific tests were generalized to cover non-CUDA devices,
  which uncovered some issues.
* Some new tests were added which fail for both CUDA and XPU.
* A few regressions due to changes on the Transformers side

Fixed tests:
* huggingface/transformers@b912f5e
  * `tests/models/git/test_modeling_git.py::GitModelTest::test_inputs_embeds_matches_input_ids`
* huggingface/transformers@b5aaf87
  * `tests/pipelines/test_pipelines_video_classification.py::VideoClassificationPipelineTests::test_small_model_pt`
  * `tests/test_pipeline_mixin.py::VideoClassificationPipelineTests::test_small_model_pt`
* huggingface/transformers@42c8ccf
  * `tests/generation/test_utils.py::GenerationIntegrationTests::test_generated_length_assisted_generation`
* huggingface/transformers@9fd123a
  * `test_model_parallelization`
  * `test_model_parallel_equal_results`

Commits which added new tests (or enabled previously skipped ones) that fail:
* huggingface/transformers@23d782e
  * `tests/pipelines/test_pipelines_text_generation.py::TextGenerationPipelineTests::test_return_dict_in_generate`
  * `tests/test_pipeline_mixin.py::TextGenerationPipelineTests::test_return_dict_in_generate`
* huggingface/transformers@2fa876d
  * `test_cpu_offload` (some of)
  * `test_disk_offload_bin` (some of)
  * `test_disk_offload_safetensors` (some of)
  * `tests/pipelines/test_pipelines_text_generation.py::TextGenerationPipelineTests::test_small_model_pt_bloom_accelerate`
* huggingface/transformers@be2ac09
  * `tests/models/paligemma/test_modeling_paligemma.py::PaliGemmaForConditionalGenerationModelTest::test_generate_compilation_all_outputs`
  * `tests/models/paligemma2/test_modeling_paligemma2.py::PaliGemma2ForConditionalGenerationModelTest::test_generate_compilation_all_outputs`
* huggingface/transformers#36340
  * `tests/pipelines/test_pipelines_audio_classification.py::AudioClassificationPipelineTests::test_small_model_pt_fp16`
* huggingface/transformers@1fae54c
  * `tests/trainer/test_trainer.py::TrainerIntegrationPrerunTest::test_gradient_accumulation_loss_alignment_with_model_loss`
* huggingface/transformers@15ec971
  * `tests/models/qwen2_5_vl/test_processor_qwen2_5_vl.py::Qwen2_5_VLProcessorTest::test_chat_template_video_custom_sampling`
  * `tests/models/qwen2_5_vl/test_processor_qwen2_5_vl.py::Qwen2_5_VLProcessorTest::test_chat_template_video_special_processing`

Regressions:
* huggingface/transformers@365fecb
  * `tests/generation/test_utils.py::GenerationIntegrationTests::test_encoder_decoder_generate_attention_mask`
* huggingface/transformers@da334bc
  * `tests/generation/test_utils.py::GenerationIntegrationTests::test_generate_input_features_as_encoder_kwarg`
* huggingface/transformers@bcfc9d7
  * `tests/models/llava/test_modeling_llava.py::LlavaForConditionalGenerationModelTest::test_config`
* huggingface/transformers#36267
  * `tests/utils/test_import_utils.py`
* huggingface/transformers#36267
  * `tests/models/marian/test_modeling_marian.py`

Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>