reduce GQA test combinations #22918
Merged
Conversation
kunal-vaishnavi approved these changes on Nov 21, 2024
mszhanyi pushed a commit that referenced this pull request on Nov 22, 2024
guschmue pushed a commit that referenced this pull request on Dec 2, 2024
ankitm3k pushed 3 commits to intel/onnxruntime that referenced this pull request on Dec 11, 2024
### Description

* Reduce GQA test combinations to save about 35 minutes of test time in CI pipelines (a sketch of one possible trimming approach follows this list).
* Show the latency of transformers tests.
* Use a fixed seed in the DMMHA test to avoid random failures (see the seeding sketch below).
* For test_flash_attn_rocm.py, change the skip condition from "has CUDA EP" to "does not have ROCm EP", so that the tests do not run in a CPU build.
* For test_flash_attn_cuda.py, move the flash attention and memory efficient attention tests into separate classes, so that a whole test suite can be skipped instead of checking the condition in each test (see the class-level skip sketch below).
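The PR body does not include the code that trims the grid, so the following is only a minimal sketch of one common approach: take a fixed, seeded sample of the full parameter product and keep an opt-in switch for exhaustive runs. The parameter values, the `gqa_configs` helper, and the `ORT_TEST_COMPREHENSIVE` environment variable are all hypothetical, not taken from this PR.

```python
# Hypothetical sketch of trimming a parameterized test grid; gqa_configs and
# ORT_TEST_COMPREHENSIVE are illustrative names, not from this PR.
import itertools
import os
import random

COMPREHENSIVE_MODE = os.getenv("ORT_TEST_COMPREHENSIVE", "0") == "1"  # assumed opt-in switch


def gqa_configs(max_cases: int = 32):
    """Return the full parameter grid, or a small deterministic sample of it."""
    batch_sizes = [1, 3]
    sequence_lengths = [1, 128, 2048]
    head_counts = [(8, 8), (12, 4)]   # (num_q_heads, num_kv_heads)
    head_sizes = [64, 128]

    grid = list(itertools.product(batch_sizes, sequence_lengths, head_counts, head_sizes))
    if COMPREHENSIVE_MODE or len(grid) <= max_cases:
        return grid
    # A fixed seed keeps the sampled subset identical across CI runs.
    return random.Random(2024).sample(grid, max_cases)
```

Independently of how the grid is trimmed, pytest's built-in `--durations=N` option is one common way to surface per-test latency, which is the kind of timing breakdown quoted for the Linux CPU pipeline below.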
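For the "use seed in DMMHA test" bullet, the usual fix for flaky numerical tests is to pin every random generator the test inputs rely on. A minimal sketch, assuming the test builds its inputs with Python's random module, NumPy and PyTorch; the helper name and the seed value are illustrative.

```python
# Minimal sketch of pinning RNG seeds for a reproducible test; the helper
# name and the seed value are illustrative, not the PR's actual code.
import random

import numpy as np
import torch


def set_test_seed(seed: int = 0) -> None:
    """Seed the Python, NumPy and PyTorch generators used to build test inputs."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
```

Calling such a helper at the start of the test (for example in setUp) removes run-to-run variation in the generated attention inputs.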
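For the last two bullets, splitting the flash attention and memory efficient attention cases into separate unittest classes lets a single class-level decorator skip a whole suite when an execution provider is missing. The sketch below uses onnxruntime.get_available_providers(), which is a real API, but the class and test names are placeholders rather than the PR's actual test classes.

```python
# Illustrative class-level skips keyed on the available execution providers;
# the class and test names are placeholders, not the PR's real test classes.
import unittest

import onnxruntime

PROVIDERS = onnxruntime.get_available_providers()


@unittest.skipUnless("ROCMExecutionProvider" in PROVIDERS, "ROCm EP not available")
class TestFlashAttentionRocm(unittest.TestCase):
    def test_placeholder(self):
        self.assertTrue(True)


@unittest.skipUnless("CUDAExecutionProvider" in PROVIDERS, "CUDA EP not available")
class TestMemoryEfficientAttentionCuda(unittest.TestCase):
    def test_placeholder(self):
        self.assertTrue(True)


if __name__ == "__main__":
    unittest.main()
```

Skipping at the class level also explains why the skipped counts rise sharply in the pipeline numbers below: a skipped suite costs almost nothing compared with entering each test and checking the condition there.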
### Motivation and Context
It takes too long to run GQA tests in CI pipelines since there are too many combinations.
###### Linux GPU CUDA CI Pipeline

Before: 5097 passed, 68 skipped, 8 warnings in 1954.64s (0:32:34)

After: 150 passed, 176 skipped, 8 warnings in 530.38s (0:08:50)

Time Saved: **1424** seconds (0:23:44)

###### Linux CPU CI Pipeline

Before: 5093 passed, 72 skipped, 4 warnings in 467.04s (0:07:47)
- 212.96s transformers/test_gqa_cpu.py::TestGQA::test_gqa_past
- 154.12s transformers/test_gqa_cpu.py::TestGQA::test_gqa_no_past
- 26.45s transformers/test_gqa_cpu.py::TestGQA::test_gqa_interactive_one_batch

After: 116 passed, 210 skipped, 4 warnings in 93.41s (0:01:33)
- 0.97s transformers/test_gqa_cpu.py::TestGQA::test_gqa_past
- 19.23s transformers/test_gqa_cpu.py::TestGQA::test_gqa_no_past
- 2.41s transformers/test_gqa_cpu.py::TestGQA::test_gqa_interactive_one_batch

Time Saved: **374** seconds (0:06:14)

###### Windows GPU CUDA CI Pipeline

Before: 1781 passed, 72 skipped, 6 warnings in 605.48s (0:10:05)

After: 116 passed, 118 skipped, 6 warnings in 275.48s (0:04:35)

Time Saved: **330** seconds (0:05:30)