
[v2.5.x] Fix stray bracket breaking pytest; fix include-patch for cross-compilation #346

Merged 15 commits into v2.5.x on Feb 11, 2025

Conversation

@h-vetinari (Member) commented Feb 6, 2025

Fixes #348
Fixes #349

Previously:

Similar to #344 and #345, test whether removing pytest-xdist cures the issues we're seeing on the CUDA MKL build.

@conda-forge-admin (Contributor) commented Feb 6, 2025

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe/meta.yaml) and found it was in an excellent condition.

I do have some suggestions for making it better though...

For recipe/meta.yaml:

  • ℹ️ The recipe is not parsable by parser conda-souschef (grayskull). This parser is not currently used by conda-forge, but may be in the future. We are collecting information to see which recipes are compatible with grayskull.
  • ℹ️ The recipe is not parsable by parser conda-recipe-manager. The recipe can only be automatically migrated to the new v1 format if it is parseable by conda-recipe-manager.

This message was generated by GitHub Actions workflow run https://github.com/conda-forge/conda-forge-webservices/actions/runs/13223778212. Examine the logs at this URL for more detail.

@h-vetinari changed the title from "Test removing pytest-xdist" to "Fix stray bracket breaking pytest; restrict include patch to windows" on Feb 7, 2025
@h-vetinari changed the base branch from main to v2.5.x on February 7, 2025 23:05
@h-vetinari changed the title from "Fix stray bracket breaking pytest; restrict include patch to windows" to "[v2.5.x] Fix stray bracket breaking pytest; restrict include patch to windows" on Feb 7, 2025
@h-vetinari marked this pull request as ready for review on February 7, 2025 23:12
@h-vetinari (Member, Author)

Sigh, what's happening with the windows builds now?

Provisioning base env with micromamba
  Downloading micromamba 1.5.10-0
  ****  Online  ****
  
  
  CertUtil: -URLCache command completed successfully.
  Creating environment
  '"C:\Users\RUNNER~1.CIR\AppData\Local\Temp\micromamba-20747\micromamba.exe"' is not recognized as an internal or external command,
  operable program or batch file.
  Error: Process completed with exit code 1.

This doesn't seem to affect azure-pipelines, so it looks like it's specific to the windows server. Did anything change there recently? @wolfv @baszalmstra

@Tobias-Fischer (Contributor)

I've seen the same on Azure recently; a restart fixed it.

@h-vetinari (Member, Author)

> I've seen the same on Azure recently; a restart fixed it.

I had tried restarting, but it didn't work. 🤷

The good thing is that the windows builds here aren't relevant (because nothing changed compared to the previous published builds), but for #326 I'll need to get this going again.

@h-vetinari (Member, Author) commented Feb 8, 2025

OK, now that the test suite runs, there are some more failures in the MKL plus CUDA job, but they're in tests that are explicitly "exercising terrible failures":

        # NOTE: We're just exercising terrible failures here.
        version = _get_torch_cuda_version()
        SM80OrLater = torch.cuda.is_available() and torch.cuda.get_device_capability() >= (8, 0)
        SM70 = torch.cuda.is_available() and torch.cuda.get_device_capability() == (7, 0)
        SM75 = torch.cuda.is_available() and torch.cuda.get_device_capability() == (7, 5)
    
        if TEST_WITH_ROCM:
            _test(17, k, n, use_transpose_a, use_transpose_b, True)
        elif version >= (11, 7):
            if not use_transpose_a and use_transpose_b:
                if SM80OrLater or (version >= (12, 3) and (SM70 or SM75)):
                    _test(17, k, n, use_transpose_a, use_transpose_b, version > (11, 7))
                else:
                    with self.assertRaisesRegex(RuntimeError,
                                                "CUDA error: CUBLAS_STATUS_NOT_SUPPORTED when calling cublasLtMatmul"):
                        _test(17, k, n, use_transpose_a, use_transpose_b)
    
            if use_transpose_a and not use_transpose_b:
                with self.assertRaisesRegex(RuntimeError,
                                            "CUDA error: CUBLAS_STATUS_NOT_SUPPORTED when calling cublasLtMatmul"):
                    _test(17, k, n, use_transpose_a, use_transpose_b)
    
            if use_transpose_a and use_transpose_b:
                with self.assertRaisesRegex(RuntimeError,
                                            "CUDA error: CUBLAS_STATUS_NOT_SUPPORTED when calling cublasLtMatmul"):
                    _test(17, k, n, use_transpose_a, use_transpose_b)
    
            if not use_transpose_a and not use_transpose_b:
                if SM80OrLater or (version >= (12, 3) and (SM70 or SM75)):
                    _test(17, k, n, use_transpose_a, use_transpose_b)
                else:
                    with self.assertRaisesRegex(RuntimeError,
                                                "CUDA error: CUBLAS_STATUS_NOT_SUPPORTED when calling cublasLtMatmul"):
                        _test(17, k, n, use_transpose_a, use_transpose_b)
        else:
            with self.assertRaisesRegex(RuntimeError, "_int_mm_out_cuda not compiled for CUDA"):
>               _test(17, k, n, use_transpose_a, use_transpose_b, False)
E               AssertionError: "_int_mm_out_cuda not compiled for CUDA" does not match "CUDA error: CUBLAS_STATUS_NOT_SUPPORTED when calling cublasLtMatmul with transpose_mat1 0 transpose_mat2 1 m 16 n 17 k 16 mat1_ld 16 mat2_ld 17 result_ld 16 abType 3 cType 10 computeType 72 scaleType 10"

The test also has a bunch of skips à la:

    @unittest.skipIf(IS_WINDOWS, "Skipped on Windows!")
    @unittest.skipIf(SM90OrLater and not TEST_WITH_ROCM, "Expected failure on sm90")
    @unittest.skipIf(IS_FBCODE and IS_REMOTE_GPU, "cublas runtime error")
    @skipCUDAIfRocmVersionLessThan((6, 0))
    @onlyCUDA

OTOH, I don't know where the "_int_mm_out_cuda not compiled for CUDA" message comes from, but we probably need to look at pytorch/pytorch@9d37cef.

It might have something to do with changing the ATEN include directory?

-set(ATEN_INCLUDE_DIR "${CMAKE_INSTALL_PREFIX}/${AT_INSTALL_INCLUDE_DIR}")
+set(ATEN_INCLUDE_DIR "${TORCH_INSTALL_PREFIX}/${AT_INSTALL_INCLUDE_DIR}")

But still weird that this would only show up for MKL.

Thoughts & ideas welcome! @hmaarrfk @mgorny @danpetry

@h-vetinari changed the title from "[v2.5.x] Fix stray bracket breaking pytest; restrict include patch to windows" to "[v2.5.x] Fix stray bracket breaking pytest; fix include-patch for cross-compilation" on Feb 9, 2025
@h-vetinari (Member, Author)

OK, more digging on the _int_mm stuff. Here's a relevant issue I found. More importantly, though, looking at the actual error messages again, I'm pretty sure that something must be going wrong in the CUDA version determination, which looks like

version = _get_torch_cuda_version()

Because if I remove some of the content of the if branches in the stacktrace above:

        if TEST_WITH_ROCM:
            _test(17, k, n, use_transpose_a, use_transpose_b, True)
        elif version >= (11, 7):
            #
            # we're apparently not taking this branch even though we really should!
            #
        else:
            with self.assertRaisesRegex(RuntimeError, "_int_mm_out_cuda not compiled for CUDA"):
>               _test(17, k, n, use_transpose_a, use_transpose_b, False)
E               AssertionError: "_int_mm_out_cuda not compiled for CUDA" does not match "CUDA error: CUBLAS_STATUS_NOT_SUPPORTED when calling cublasLtMatmul with transpose_mat1 0 transpose_mat2 1 m 16 n 17 k 16 mat1_ld 16 mat2_ld 17 result_ld 16 abType 3 cType 10 computeType 72 scaleType 10"

Likewise, another failure looks like this:

        # cuSOLVER path supports underdetermined systems
        version = torch.testing._internal.common_cuda._get_torch_cuda_version()
        cusolver_not_available = (version < (10, 1))
    
        if device != 'cpu' and cusolver_not_available:
            a = torch.rand(2, 3, dtype=dtype, device=device)
            b = torch.rand(2, 1, dtype=dtype, device=device)
            with self.assertRaisesRegex(RuntimeError, r'only overdetermined systems'):
>               torch.linalg.lstsq(a, b)
E               AssertionError: RuntimeError not raised

and if version had the correct value, it should be impossible for cusolver_not_available to be satisfied on our builds.
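
For illustration, here's a minimal sketch (not part of the test suite) of how a bogus version of (0, 0) would flip both branches at once:

    # Sketch: if the CUDA version metadata is missing, the tests see version == (0, 0).
    version = (0, 0)

    # _int_mm test: falls into the "not compiled for CUDA" branch and expects the wrong error
    print(version >= (11, 7))   # False

    # lstsq test: wrongly concludes that cuSOLVER is unavailable even though it is present
    print(version < (10, 1))    # True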

@h-vetinari (Member, Author)

OK, _get_torch_cuda_version does

    if torch.version.cuda is None:
        return (0, 0)

which explains this behaviour. AFAICT, torch.version.cuda is set through CUDA_VERSION, which we're not defining here. This should be relatively simple to add and test. 🤞
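
For reference, a minimal sanity check (a sketch to run in the built package's test environment, not something taken from the recipe) that the fix took effect:

    # Sketch: verify the built package carries CUDA version metadata, so that
    # _get_torch_cuda_version() no longer falls back to (0, 0).
    import torch
    from torch.testing._internal.common_cuda import _get_torch_cuda_version

    print("torch.version.cuda:", torch.version.cuda)      # None => CUDA_VERSION wasn't set at build time
    print("parsed version:", _get_torch_cuda_version())   # should match the CUDA toolkit used for the build
    assert torch.version.cuda is not None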

@h-vetinari (Member, Author)

OK, we're finally back to a fully green CI (first time since merging #331), but unfortunately, the stream of issues hasn't abated yet. In the meantime, we've had #350 & #354 come in. Given the ~24h necessary for a full CI run, I'm going to stop iterating on this now and get the fixes for the previous set of issues out the door at least.

Though I really wanted to merge a green CI as-is this time, I'll pick up the fix from #355, which is low-risk. Unfortunately, my tentative fix for #354 in #326 (c92777a) didn't work out. I'll iterate on this separately from this PR; on windows we're not as resource-constrained as on linux, so this shouldn't take too long. Famous last words 🤞

@h-vetinari (Member, Author)

Sigh... The linux+CUDA+openblas job had been passing a number of times, but failed upon merge with

=================================== FAILURES ===================================
_____________________ test/inductor/test_torchinductor.py ______________________
[gw1] linux -- Python 3.12.8 $PREFIX/bin/python
worker 'gw1' crashed while running 'test/inductor/test_torchinductor.py::SweepInputsCpuTest::test_cpu_broadcast3_dense'
=============================== warnings summary ===============================
[...]
=========================== short test summary info ============================
FAILED [0.0000s] test/inductor/test_torchinductor.py::SweepInputsCpuTest::test_cpu_broadcast3_dense
= 1 failed, 14515 passed, 2663 skipped, 91 xfailed, 143960 warnings in 4243.69s (1:10:43) =

I'll restart once the rest of the CI finishes, and if it happens a second time, we can add a skip.
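
If a skip does turn out to be necessary, one way to express it would be a small conftest.py hook along these lines (a hypothetical sketch; the feedstock might just as well pass a --deselect argument to pytest in the test script instead):

    # Hypothetical conftest.py hook: skip the flaky inductor test by name
    # (name taken from the failure above).
    import pytest

    FLAKY = {"test_cpu_broadcast3_dense"}

    def pytest_collection_modifyitems(config, items):
        for item in items:
            if item.name in FLAKY:
                item.add_marker(pytest.mark.skip(reason="intermittently crashes an xdist worker"))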
