
[v2.5.x] Fix stray bracket breaking pytest; fix include-patch for cross-compilation #346

Merged 15 commits into v2.5.x on Feb 11, 2025

Conversation

@h-vetinari (Member) commented Feb 6, 2025

Fixes #348
Fixes #349

Previously:

Similar to #344 and #345, test whether removing pytest-xdist cures the issues we're seeing on the CUDA MKL build.

@conda-forge-admin (Contributor) commented Feb 6, 2025

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe/meta.yaml) and found it was in an excellent condition.

I do have some suggestions for making it better though...

For recipe/meta.yaml:

  • ℹ️ The recipe is not parsable by parser conda-souschef (grayskull). This parser is not currently used by conda-forge, but may be in the future. We are collecting information to see which recipes are compatible with grayskull.
  • ℹ️ The recipe is not parsable by parser conda-recipe-manager. The recipe can only be automatically migrated to the new v1 format if it is parseable by conda-recipe-manager.

This message was generated by GitHub Actions workflow run https://github.com/conda-forge/conda-forge-webservices/actions/runs/13223778212. Examine the logs at this URL for more detail.

@h-vetinari changed the title from "Test removing pytest-xdist" to "Fix stray bracket breaking pytest; restrict include patch to windows" on Feb 7, 2025
@h-vetinari changed the base branch from main to v2.5.x on February 7, 2025 23:05
@h-vetinari changed the title from "Fix stray bracket breaking pytest; restrict include patch to windows" to "[v2.5.x] Fix stray bracket breaking pytest; restrict include patch to windows" on Feb 7, 2025
@h-vetinari marked this pull request as ready for review on February 7, 2025 23:12
@h-vetinari (Member, Author)

Sigh, what's happening with the windows builds now?

Provisioning base env with micromamba
  Downloading micromamba 1.5.10-0
  ****  Online  ****
  
  
  CertUtil: -URLCache command completed successfully.
  Creating environment
  '"C:\Users\RUNNER~1.CIR\AppData\Local\Temp\micromamba-20747\micromamba.exe"' is not recognized as an internal or external command,
  operable program or batch file.
  Error: Process completed with exit code 1.

This doesn't seem to affect azure-pipelines, so it looks like it's specific to the windows server. Did anything change there recently? @wolfv @baszalmstra

@Tobias-Fischer (Contributor)

I've seen the same on Azure recently; a restart fixed it.

@h-vetinari (Member, Author)

> I've seen the same on Azure recently; a restart fixed it.

I had tried restarting, but it didn't work. 🤷

The good thing is that the windows builds here aren't relevant (because nothing changed compared to the previous published builds), but for #326 I'll need to get this going again.

@h-vetinari (Member, Author) commented Feb 8, 2025

OK, now that the test suite runs, there are some more failures in the MKL plus CUDA job, but they're in tests that are explicitly "exercising terrible failures":

        # NOTE: We're just exercising terrible failures here.
        version = _get_torch_cuda_version()
        SM80OrLater = torch.cuda.is_available() and torch.cuda.get_device_capability() >= (8, 0)
        SM70 = torch.cuda.is_available() and torch.cuda.get_device_capability() == (7, 0)
        SM75 = torch.cuda.is_available() and torch.cuda.get_device_capability() == (7, 5)
    
        if TEST_WITH_ROCM:
            _test(17, k, n, use_transpose_a, use_transpose_b, True)
        elif version >= (11, 7):
            if not use_transpose_a and use_transpose_b:
                if SM80OrLater or (version >= (12, 3) and (SM70 or SM75)):
                    _test(17, k, n, use_transpose_a, use_transpose_b, version > (11, 7))
                else:
                    with self.assertRaisesRegex(RuntimeError,
                                                "CUDA error: CUBLAS_STATUS_NOT_SUPPORTED when calling cublasLtMatmul"):
                        _test(17, k, n, use_transpose_a, use_transpose_b)
    
            if use_transpose_a and not use_transpose_b:
                with self.assertRaisesRegex(RuntimeError,
                                            "CUDA error: CUBLAS_STATUS_NOT_SUPPORTED when calling cublasLtMatmul"):
                    _test(17, k, n, use_transpose_a, use_transpose_b)
    
            if use_transpose_a and use_transpose_b:
                with self.assertRaisesRegex(RuntimeError,
                                            "CUDA error: CUBLAS_STATUS_NOT_SUPPORTED when calling cublasLtMatmul"):
                    _test(17, k, n, use_transpose_a, use_transpose_b)
    
            if not use_transpose_a and not use_transpose_b:
                if SM80OrLater or (version >= (12, 3) and (SM70 or SM75)):
                    _test(17, k, n, use_transpose_a, use_transpose_b)
                else:
                    with self.assertRaisesRegex(RuntimeError,
                                                "CUDA error: CUBLAS_STATUS_NOT_SUPPORTED when calling cublasLtMatmul"):
                        _test(17, k, n, use_transpose_a, use_transpose_b)
        else:
            with self.assertRaisesRegex(RuntimeError, "_int_mm_out_cuda not compiled for CUDA"):
>               _test(17, k, n, use_transpose_a, use_transpose_b, False)
E               AssertionError: "_int_mm_out_cuda not compiled for CUDA" does not match "CUDA error: CUBLAS_STATUS_NOT_SUPPORTED when calling cublasLtMatmul with transpose_mat1 0 transpose_mat2 1 m 16 n 17 k 16 mat1_ld 16 mat2_ld 17 result_ld 16 abType 3 cType 10 computeType 72 scaleType 10"

The test also has a bunch of skips à la:

    @unittest.skipIf(IS_WINDOWS, "Skipped on Windows!")
    @unittest.skipIf(SM90OrLater and not TEST_WITH_ROCM, "Expected failure on sm90")
    @unittest.skipIf(IS_FBCODE and IS_REMOTE_GPU, "cublas runtime error")
    @skipCUDAIfRocmVersionLessThan((6, 0))
    @onlyCUDA

OTOH, I don't know where the "_int_mm_out_cuda not compiled for CUDA" message comes from, but we probably need to look at pytorch/pytorch@9d37cef.

It might have something to do with changing the ATEN include directory?

-set(ATEN_INCLUDE_DIR "${CMAKE_INSTALL_PREFIX}/${AT_INSTALL_INCLUDE_DIR}")
+set(ATEN_INCLUDE_DIR "${TORCH_INSTALL_PREFIX}/${AT_INSTALL_INCLUDE_DIR}")

But still weird that this would only show up for MKL.

Thoughts & ideas welcome! @hmaarrfk @mgorny @danpetry

@h-vetinari changed the title from "[v2.5.x] Fix stray bracket breaking pytest; restrict include patch to windows" to "[v2.5.x] Fix stray bracket breaking pytest; fix include-patch for cross-compilation" on Feb 9, 2025
@h-vetinari (Member, Author)

OK, more digging on the _int_mm stuff. Here's a relevant issue I found. More importantly, though, looking at the actual error messages again, I'm pretty sure that something must be going wrong in the CUDA version determination, which looks like

version = _get_torch_cuda_version()

Because if I remove some of the content of the if branches in the stacktrace above:

        if TEST_WITH_ROCM:
            _test(17, k, n, use_transpose_a, use_transpose_b, True)
        elif version >= (11, 7):
            #
            # we're apparently not taking this branch even though we really should!
            #
        else:
            with self.assertRaisesRegex(RuntimeError, "_int_mm_out_cuda not compiled for CUDA"):
>               _test(17, k, n, use_transpose_a, use_transpose_b, False)
E               AssertionError: "_int_mm_out_cuda not compiled for CUDA" does not match "CUDA error: CUBLAS_STATUS_NOT_SUPPORTED when calling cublasLtMatmul with transpose_mat1 0 transpose_mat2 1 m 16 n 17 k 16 mat1_ld 16 mat2_ld 17 result_ld 16 abType 3 cType 10 computeType 72 scaleType 10"

Likewise, another failure looks like this:

        # cuSOLVER path supports underdetermined systems
        version = torch.testing._internal.common_cuda._get_torch_cuda_version()
        cusolver_not_available = (version < (10, 1))
    
        if device != 'cpu' and cusolver_not_available:
            a = torch.rand(2, 3, dtype=dtype, device=device)
            b = torch.rand(2, 1, dtype=dtype, device=device)
            with self.assertRaisesRegex(RuntimeError, r'only overdetermined systems'):
>               torch.linalg.lstsq(a, b)
E               AssertionError: RuntimeError not raised

and if version had the correct value, it should be impossible for cusolver_not_available to be satisfied on our builds.
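
For illustration, here's a minimal sketch (not part of the test suite) of how a bogus version of (0, 0) would flip both branches at once:

    # Sketch: if the CUDA version metadata is missing, the tests see version == (0, 0).
    version = (0, 0)

    # _int_mm test: falls into the "not compiled for CUDA" branch and expects the wrong error
    print(version >= (11, 7))   # False

    # lstsq test: wrongly concludes that cuSOLVER is unavailable even though it is present
    print(version < (10, 1))    # True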

@h-vetinari (Member, Author)

OK, _get_torch_cuda_version does

    if torch.version.cuda is None:
        return (0, 0)

which explains this behaviour. AFAICT, torch.version.cuda is set through CUDA_VERSION, which we're not defining here. This should be relatively simple to add and test. 🤞
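
For reference, a minimal sanity check (a sketch to run in the built package's test environment, not something taken from the recipe) that the fix took effect:

    # Sketch: verify the built package carries CUDA version metadata, so that
    # _get_torch_cuda_version() no longer falls back to (0, 0).
    import torch
    from torch.testing._internal.common_cuda import _get_torch_cuda_version

    print("torch.version.cuda:", torch.version.cuda)      # None => CUDA_VERSION wasn't set at build time
    print("parsed version:", _get_torch_cuda_version())   # should match the CUDA toolkit used for the build
    assert torch.version.cuda is not None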

@h-vetinari (Member, Author)

OK, we're finally back to a fully green CI (first time since merging #331), but unfortunately, the stream of issues hasn't abated yet. In the meantime, we've had #350 & #354 come in. Given the ~24h necessary for a full CI run, I'm going to stop iterating on this now and get the fixes for the previous set of issues out the door at least.

Though I really wanted to merge a green CI as-is this time, I'll pick up the fix from #355, which is low-risk. Unfortunately, my tentative fix for #354 in #326 (c92777a) didn't work out. I'll iterate on this separately from this PR; on windows we're not as resource-constrained as on linux, so this shouldn't take too long. Famous last words 🤞

@h-vetinari (Member, Author)

Sigh... The linux+CUDA+openblas job had been passing a number of times, but failed upon merge with

=================================== FAILURES ===================================
_____________________ test/inductor/test_torchinductor.py ______________________
[gw1] linux -- Python 3.12.8 $PREFIX/bin/python
worker 'gw1' crashed while running 'test/inductor/test_torchinductor.py::SweepInputsCpuTest::test_cpu_broadcast3_dense'
=============================== warnings summary ===============================
[...]
=========================== short test summary info ============================
FAILED [0.0000s] test/inductor/test_torchinductor.py::SweepInputsCpuTest::test_cpu_broadcast3_dense
= 1 failed, 14515 passed, 2663 skipped, 91 xfailed, 143960 warnings in 4243.69s (1:10:43) =

I'll restart once the rest of the CI finishes, and if it happens a second time, we can add a skip.
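
If a skip does turn out to be necessary, one way to express it would be a small conftest.py hook along these lines (a hypothetical sketch; the feedstock might just as well pass a --deselect argument to pytest in the test script instead):

    # Hypothetical conftest.py hook: skip the flaky inductor test by name
    # (name taken from the failure above).
    import pytest

    FLAKY = {"test_cpu_broadcast3_dense"}

    def pytest_collection_modifyitems(config, items):
        for item in items:
            if item.name in FLAKY:
                item.add_marker(pytest.mark.skip(reason="intermittently crashes an xdist worker"))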
