
Fix CMake metadata for CUDA-enabled libtorch #339

Merged: 25 commits merged into conda-forge:main from the cmake branch on Feb 3, 2025

Conversation

h-vetinari
Member

h-vetinari commented Jan 30, 2025

Follow-up to #318.

Fixes #333 (when tests are passing 🤞)
Doesn't address #334; tensorpipe is apparently not supported on Windows.

@conda-forge-admin
Contributor

conda-forge-admin commented Jan 30, 2025

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe/meta.yaml) and found it was in an excellent condition.

I do have some suggestions for making it better though...

For recipe/meta.yaml:

  • ℹ️ The recipe is not parsable by parser conda-souschef (grayskull). This parser is not currently used by conda-forge, but may be in the future. We are collecting information to see which recipes are compatible with grayskull.
  • ℹ️ The recipe is not parsable by parser conda-recipe-manager. The recipe can only be automatically migrated to the new v1 format if it is parseable by conda-recipe-manager.

This message was generated by GitHub Actions workflow run https://github.com/conda-forge/conda-forge-webservices/actions/runs/13102492026. Examine the logs at this URL for more detail.

@h-vetinari changed the title from "Fix CMake metadata for CUDA-enabled libtorch; enable tensorpipe on windows" to "Fix CMake metadata for CUDA-enabled libtorch" on Jan 30, 2025
@h-vetinari force-pushed the cmake branch 4 times, most recently from ffcd410 to 37380b3, on January 30, 2025 12:33
@h-vetinari
Member Author

@danpetry, if you're feeling motivated: Next step is getting rid of cuda_select_nvcc_arch_flags here, because it doesn't exist anymore in a post-find_package(CUDA) world. However, there's no good equivalent AFAICT. CMake wants to use CMAKE_CUDA_ARCHITECTURES these days, but that has another spelling for the arches, and the pytorch build repeatedly sets these dynamically for some reason (and says they cannot rely on CMake).
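Roughly, the two spellings compare like this (a sketch of my understanding of the CMake convention, not taken from the pytorch sources; the exact list is whatever the recipe ends up targeting):

# pytorch spells the architectures the way build.sh already exports them:
export TORCH_CUDA_ARCH_LIST="8.0;8.6;9.0+PTX"
# CMake's native variable wants bare compute capabilities instead, with
# -real/-virtual suffixes standing in for the "+PTX" convention, roughly:
#   "8.0"     -> "80-real"            (SASS only)
#   "9.0+PTX" -> "90-real;90-virtual" (SASS plus PTX)
cmake -S . -B build -DCMAKE_CUDA_ARCHITECTURES="80-real;86-real;90-real;90-virtual"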

@mgorny
Contributor

mgorny commented Jan 30, 2025

(and says they cannot rely on CMake)

FWICS that comment dates back to 2021, but they started requiring CMake 3.18 in 2022 — so perhaps it just meant they couldn't bump the minimum CMake version yet back then?

@danpetry
Contributor

@h-vetinari the motivation is there, but unfortunately my daughter has gotten sick and is out of daycare for the next couple of days. The next time I'll be able to work on it will be Monday now. I'd be happy to hear about the state/next steps then.

@h-vetinari
Member Author

FWICS that comment dates back to 2021, but they started requiring CMake 3.18 in 2022 — so perhaps it just meant they couldn't bump the minimum CMake version yet back then?

Yeah, a lot changed over the years, and the CMake code there is still pretty crusty. Mainly I want to keep the surgery minimal to not introduce behaviour changes; and making all the places where they explicitly set CUDA archs a no-op seems... excessive?

One idea I just had would be vendoring CMake's implementation of cuda_select_nvcc_arch_flags...
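A rough sketch of what that vendoring could look like (untested; the source path assumes a conda-provided CMake in $BUILD_PREFIX, and the destination inside the pytorch tree is purely illustrative):

# copy CMake's helper module, which defines cuda_select_nvcc_arch_flags(), into
# the source tree so the existing call sites keep working without find_package(CUDA)
cp "${BUILD_PREFIX}"/share/cmake-*/Modules/FindCUDA/select_compute_arch.cmake \
   cmake/Modules/select_compute_arch.cmake
# ...and then include() that file from the pytorch CMake code instead of the FindCUDA module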

@h-vetinari
Member Author

Wishing a speedy recovery to your daughter @danpetry, take your time!

h-vetinari added a commit to h-vetinari/pytorch-cpu-feedstock that referenced this pull request Jan 31, 2025
@mgorny
Contributor

mgorny commented Jan 31, 2025

Mainly I want to keep the surgery minimal to not introduce behaviour changes

Are you aiming to submit these changes upstream? While being conservative for conda-forge patching makes sense, I think it'd be fine to go all the way in main and see what they think.

@h-vetinari
Member Author

Are you aiming to submit these changes upstream? While being conservative for conda-forge patching makes sense, I think it'd be fine to go all the way in main and see what they think.

TBH, it's unlikely. I can throw something over the fence for upstream to use as a jumping off point, but the CMake files are pretty crusty (and sprawling), plus I don't know the codebase, so I feel this would require an extraordinary amount of time to get into mergeable shape.

@rgommers

For context: we've pushed on these types of build system changes in PyTorch in the past - it's close to impossible to land a structural change like moving away from find_package(CUDA) unless there's active commitment & collaboration from a PyTorch release team member. Otherwise even working PRs are likely to not be merged and accumulate merge conflicts fairly rapidly.

@mgorny
Contributor

mgorny commented Jan 31, 2025

Heh, so I guess my success with pytorch/pytorch#145487 was either exceptional, or I'm judging it prematurely.

@rgommers

Heh, so I guess my success with pytorch/pytorch#145487 was either exceptional, or I'm judging it prematurely.

Or maybe things improved - but let's see after it gets merged :)

@h-vetinari
Member Author

There's something very strange going on with the CMake cache. For e2c551d, the cache worked fine (logs), meaning that the libtorch bits do not get rebuilt when building pytorch. However, for 138456c, we suddenly take hours longer (logs), because now the first pytorch builds end up taking ~6h.

At first I thought this was caused by removing -vvv from pip install. I reverted that, but it still seems to happen. The only relevant changes between e2c551d and the head are

--- a/recipe/build.sh
+++ b/recipe/build.sh
@@ -219,6 +219,8 @@ elif [[ ${cuda_compiler_version} != "None" ]]; then
     export USE_STATIC_CUDNN=0
     export MAGMA_HOME="${PREFIX}"
     export USE_MAGMA=1
+    # turn off noisy nvcc warnings
+    export CUDAFLAGS="-w --ptxas-options=-w"
 else
     if [[ "$target_platform" != *-64 ]]; then
       # Breakpad seems to not work on aarch64 or ppc64le
diff --git a/recipe/meta.yaml b/recipe/meta.yaml
index e3b8b81..e110190 100644
--- a/recipe/meta.yaml
+++ b/recipe/meta.yaml
@@ -69,10 +69,11 @@ source:
     - patches/0016-point-include-paths-to-PREFIX-include.patch
     - patches/0017-Add-conda-prefix-to-inductor-include-paths.patch
     - patches/0018-make-ATEN_INCLUDE_DIR-relative-to-TORCH_INSTALL_PREF.patch
-    - patches/0019-remove-DESTINATION-lib-from-CMake-install-TARGETS-di.patch               # [win]
+    - patches/0019-remove-DESTINATION-lib-from-CMake-install-TARGETS-di.patch                       # [win]
     - patches/0020-make-library-name-in-test_mutable_custom_op_fixed_la.patch
     - patches/0021-avoid-deprecated-find_package-CUDA-in-caffe2-CMake-m.patch
-    - patches_submodules/0001-remove-DESTINATION-lib-from-CMake-install-directives.patch    # [win]
+    - patches_submodules/fbgemm/0001-remove-DESTINATION-lib-from-CMake-install-directives.patch     # [win]
+    - patches_submodules/tensorpipe/0001-switch-away-from-find_package-CUDA.patch

 build:
   number: {{ build }}
diff --git a/recipe/patches_submodules/tensorpipe/0001-switch-away-from-find_package-CUDA.patch b/recipe/patches_submodules/tensorpipe/0001-switch-away-from-find_package-CUDA.patch
new file mode 100644
index 0000000..fe411d7
--- /dev/null
+++ b/recipe/patches_submodules/tensorpipe/0001-switch-away-from-find_package-CUDA.patch
@@ -0,0 +1,22 @@
+From 9a1de62dd1b3d816d6fb87c2041f4005ab5c683d Mon Sep 17 00:00:00 2001
+From: "H. Vetinari" <h.vetinari@gmx.com>
+Date: Sun, 2 Feb 2025 08:54:01 +1100
+Subject: [PATCH] switch away from find_package(CUDA)
+
+---
+ tensorpipe/CMakeLists.txt | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+diff --git a/third_party/tensorpipe/tensorpipe/CMakeLists.txt b/third_party/tensorpipe/tensorpipe/CMakeLists.txt
+index efcffc2..1c3b2ca 100644
+--- a/third_party/tensorpipe/tensorpipe/CMakeLists.txt
++++ b/third_party/tensorpipe/tensorpipe/CMakeLists.txt
+@@ -234,7 +234,7 @@ if(TP_USE_CUDA)
+   # TP_INCLUDE_DIRS is list of include path to be used
+   set(TP_CUDA_INCLUDE_DIRS)
+
+-  find_package(CUDA REQUIRED)
++  find_package(CUDAToolkit REQUIRED)
+   list(APPEND TP_CUDA_LINK_LIBRARIES ${CUDA_LIBRARIES})
+   list(APPEND TP_CUDA_INCLUDE_DIRS ${CUDA_INCLUDE_DIRS})
+

none of which are plausible as a cache-busting mechanism. Maybe this is somehow similar to #343, but I really don't know what could be causing this.
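One way to narrow this down further would be to diff the configure-time state of the two runs directly; a sketch (the paths are illustrative, assuming the build trees from both runs were kept):

# what CMake actually cached at configure time in the good vs. bad run
diff good_run/build/CMakeCache.txt bad_run/build/CMakeCache.txt
# and what ended up in the generated compile rules
diff good_run/build/build.ninja bad_run/build/build.ninja | head -n 50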

I've compared the host/build environments between the libtorch build and first pytorch build from the good run, and this is the result.

--- a/cache_good.txt
+++ b/cache_good.txt
@@ -3,11 +3,11 @@ The following NEW packages will be INSTALLED:
     _libgcc_mutex:               0.1-conda_forge                    conda-forge
     _openmp_mutex:               4.5-2_gnu                          conda-forge
     attr:                        2.5.1-h166bdaf_1                   conda-forge
-    brotli-python:               1.1.0-py312h2ec8cdc_2              conda-forge
+    brotli-python:               1.1.0-py313h46c70d0_2              conda-forge
     bzip2:                       1.0.8-h4bc722e_7                   conda-forge
     ca-certificates:             2025.1.31-hbcca054_0               conda-forge
     certifi:                     2024.12.14-pyhd8ed1ab_0            conda-forge
-    cffi:                        1.17.1-py312h06ac9bb_0             conda-forge
+    cffi:                        1.17.1-py313hfab6e84_0             conda-forge
     charset-normalizer:          3.4.1-pyhd8ed1ab_0                 conda-forge
     cuda-cccl_linux-64:          12.6.77-ha770c72_0                 conda-forge
     cuda-crt-dev_linux-64:       12.6.85-ha770c72_0                 conda-forge
@@ -67,35 +67,35 @@ The following NEW packages will be INSTALLED:
     liblzma:                     5.6.3-hb9d3cd8_1                   conda-forge
     libmagma:                    2.8.0-h566cb83_2                   conda-forge
     libmagma_sparse:             2.8.0-h0af6554_0                   conda-forge
+    libmpdec:                    4.0.0-h4bc722e_0                   conda-forge
     libnl:                       3.11.0-hb9d3cd8_0                  conda-forge
-    libnsl:                      2.0.1-hd590300_0                   conda-forge
     libnvjitlink:                12.6.85-hbd13f7d_0                 conda-forge
     libprotobuf:                 5.28.3-h6128344_1                  conda-forge
     libsqlite:                   3.48.0-hee588c1_1                  conda-forge
     libstdcxx:                   14.2.0-hc0a3c3a_1                  conda-forge
     libstdcxx-ng:                14.2.0-h4852527_1                  conda-forge
     libsystemd0:                 257.2-h3dc2cb9_0                   conda-forge
+    libtorch:                    2.5.1-cuda126_generic_h744fda7_212 local
     libudev1:                    257.2-h9a4d06a_0                   conda-forge
     libuuid:                     2.38.1-h0b41bf4_0                  conda-forge
     libuv:                       1.50.0-hb9d3cd8_0                  conda-forge
-    libxcrypt:                   4.4.36-hd590300_1                  conda-forge
     libzlib:                     1.3.1-hb9d3cd8_2                   conda-forge
     lz4-c:                       1.10.0-h5888daf_1                  conda-forge
     magma:                       2.8.0-h51420fd_0                   conda-forge
     nccl:                        2.25.1.1-ha44e49d_0                conda-forge
     ncurses:                     6.5-h2d0b736_3                     conda-forge
-    numpy:                       2.2.2-py312h72c5963_0              conda-forge
+    numpy:                       2.2.2-py313h17eae1a_0              conda-forge
     nvtx-c:                      3.1.0-ha770c72_1                   conda-forge
     openssl:                     3.4.0-h7b32b05_1                   conda-forge
-    pip:                         25.0-pyh8b19718_0                  conda-forge
+    pip:                         25.0-pyh145f28c_0                  conda-forge
     pkg-config:                  0.29.2-h4bc722e_1009               conda-forge
     pybind11:                    2.13.6-pyh1ec8472_2                conda-forge
     pybind11-global:             2.13.6-pyh415d2e4_2                conda-forge
     pycparser:                   2.22-pyh29332c3_1                  conda-forge
     pysocks:                     1.7.1-pyha55dd90_7                 conda-forge
-    python:                      3.12.8-h9e4cc4f_1_cpython          conda-forge
-    python_abi:                  3.12-5_cp312                       conda-forge
-    pyyaml:                      6.0.2-py312h178313f_2              conda-forge
+    python:                      3.13.1-ha99a958_105_cp313          conda-forge
+    python_abi:                  3.13-5_cp313                       conda-forge
+    pyyaml:                      6.0.2-py313h8060acc_2              conda-forge
     rdma-core:                   55.0-h5888daf_0                    conda-forge
     readline:                    8.2-h8228510_1                     conda-forge
     requests:                    2.32.3-pyhd8ed1ab_1                conda-forge
@@ -106,10 +106,8 @@ The following NEW packages will be INSTALLED:
     typing_extensions:           4.12.2-pyha770c72_1                conda-forge
     tzdata:                      2025a-h78e105d_0                   conda-forge
     urllib3:                     2.3.0-pyhd8ed1ab_0                 conda-forge
-    wheel:                       0.45.1-pyhd8ed1ab_1                conda-forge
     yaml:                        0.2.5-h7f98852_2                   conda-forge
-    zlib:                        1.3.1-hb9d3cd8_2                   conda-forge
-    zstandard:                   0.23.0-py312hef9b889_1             conda-forge
+    zstandard:                   0.23.0-py313h80202fe_1             conda-forge
     zstd:                        1.5.6-ha6fb4c9_0                   conda-forge

 The following NEW packages will be INSTALLED:
@@ -158,7 +156,6 @@ The following NEW packages will be INSTALLED:
     libgcc-devel_linux-64:       13.3.0-h84ea5a7_101           conda-forge
     libgcc-ng:                   14.2.0-h69a702a_1             conda-forge
     libgomp:                     14.2.0-h77fa898_1             conda-forge
-    libiconv:                    1.17-hd590300_2               conda-forge
     liblzma:                     5.6.3-hb9d3cd8_1              conda-forge
     libmpdec:                    4.0.0-h4bc722e_0              conda-forge
     libnghttp2:                  1.64.0-h161d5f1_0             conda-forge
@@ -172,20 +169,16 @@ The following NEW packages will be INSTALLED:
     libuuid:                     2.38.1-h0b41bf4_0             conda-forge
     libuv:                       1.50.0-hb9d3cd8_0             conda-forge
     libzlib:                     1.3.1-hb9d3cd8_2              conda-forge
-    lz4-c:                       1.10.0-h5888daf_1             conda-forge
     make:                        4.4.1-hb9d3cd8_2              conda-forge
     ncurses:                     6.5-h2d0b736_3                conda-forge
     ninja:                       1.12.1-h297d8ca_0             conda-forge
     openssl:                     3.4.0-h7b32b05_1              conda-forge
-    popt:                        1.16-h0b475e3_2002            conda-forge
     protobuf:                    5.28.3-py313h46c70d0_0        conda-forge
     python:                      3.13.1-ha99a958_105_cp313     conda-forge
     python_abi:                  3.13-5_cp313                  conda-forge
     readline:                    8.2-h8228510_1                conda-forge
     rhash:                       1.4.5-hb9d3cd8_0              conda-forge
-    rsync:                       3.4.1-h168f954_0              conda-forge
     sysroot_linux-64:            2.17-h0157908_18              conda-forge
     tk:                          8.6.13-noxft_h4845f30_101     conda-forge
     tzdata:                      2025a-h78e105d_0              conda-forge
-    xxhash:                      0.8.3-hb9d3cd8_0              conda-forge
     zstd:                        1.5.6-ha6fb4c9_0              conda-forge

Here's the comparison for the bad cache

--- a/cache_bad.txt
+++ b/cache_bad.txt
@@ -3,11 +3,11 @@ The following NEW packages will be INSTALLED:
     _libgcc_mutex:               0.1-conda_forge                    conda-forge
     _openmp_mutex:               4.5-2_gnu                          conda-forge
     attr:                        2.5.1-h166bdaf_1                   conda-forge
-    brotli-python:               1.1.0-py312h2ec8cdc_2              conda-forge
+    brotli-python:               1.1.0-py311hfdbb021_2              conda-forge
     bzip2:                       1.0.8-h4bc722e_7                   conda-forge
     ca-certificates:             2025.1.31-hbcca054_0               conda-forge
     certifi:                     2024.12.14-pyhd8ed1ab_0            conda-forge
-    cffi:                        1.17.1-py312h06ac9bb_0             conda-forge
+    cffi:                        1.17.1-py311hf29c0ef_0             conda-forge
     charset-normalizer:          3.4.1-pyhd8ed1ab_0                 conda-forge
     cuda-cccl_linux-64:          12.6.77-ha770c72_0                 conda-forge
     cuda-crt-dev_linux-64:       12.6.85-ha770c72_0                 conda-forge
@@ -75,6 +75,7 @@ The following NEW packages will be INSTALLED:
     libstdcxx:                   14.2.0-hc0a3c3a_1                  conda-forge
     libstdcxx-ng:                14.2.0-h4852527_1                  conda-forge
     libsystemd0:                 257.2-h3dc2cb9_0                   conda-forge
+    libtorch:                    2.5.1-cuda126_generic_h744fda7_212 local
     libudev1:                    257.2-h9a4d06a_0                   conda-forge
     libuuid:                     2.38.1-h0b41bf4_0                  conda-forge
     libuv:                       1.50.0-hb9d3cd8_0                  conda-forge
@@ -84,7 +85,7 @@ The following NEW packages will be INSTALLED:
     magma:                       2.8.0-h51420fd_0                   conda-forge
     nccl:                        2.25.1.1-ha44e49d_0                conda-forge
     ncurses:                     6.5-h2d0b736_3                     conda-forge
-    numpy:                       2.2.2-py312h72c5963_0              conda-forge
+    numpy:                       2.0.2-py311h71ddf71_1              conda-forge
     nvtx-c:                      3.1.0-ha770c72_1                   conda-forge
     openssl:                     3.4.0-h7b32b05_1                   conda-forge
     pip:                         25.0-pyh8b19718_0                  conda-forge
@@ -93,9 +94,9 @@ The following NEW packages will be INSTALLED:
     pybind11-global:             2.13.6-pyh415d2e4_2                conda-forge
     pycparser:                   2.22-pyh29332c3_1                  conda-forge
     pysocks:                     1.7.1-pyha55dd90_7                 conda-forge
-    python:                      3.12.8-h9e4cc4f_1_cpython          conda-forge
-    python_abi:                  3.12-5_cp312                       conda-forge
-    pyyaml:                      6.0.2-py312h178313f_2              conda-forge
+    python:                      3.11.11-h9e4cc4f_1_cpython         conda-forge
+    python_abi:                  3.11-5_cp311                       conda-forge
+    pyyaml:                      6.0.2-py311h2dc5d0c_2              conda-forge
     rdma-core:                   55.0-h5888daf_0                    conda-forge
     readline:                    8.2-h8228510_1                     conda-forge
     requests:                    2.32.3-pyhd8ed1ab_1                conda-forge
@@ -108,8 +109,7 @@ The following NEW packages will be INSTALLED:
     urllib3:                     2.3.0-pyhd8ed1ab_0                 conda-forge
     wheel:                       0.45.1-pyhd8ed1ab_1                conda-forge
     yaml:                        0.2.5-h7f98852_2                   conda-forge
-    zlib:                        1.3.1-hb9d3cd8_2                   conda-forge
-    zstandard:                   0.23.0-py312hef9b889_1             conda-forge
+    zstandard:                   0.23.0-py311hbc35293_1             conda-forge
     zstd:                        1.5.6-ha6fb4c9_0                   conda-forge

 The following NEW packages will be INSTALLED:
@@ -158,10 +158,9 @@ The following NEW packages will be INSTALLED:
     libgcc-devel_linux-64:       13.3.0-h84ea5a7_101           conda-forge
     libgcc-ng:                   14.2.0-h69a702a_1             conda-forge
     libgomp:                     14.2.0-h77fa898_1             conda-forge
-    libiconv:                    1.17-hd590300_2               conda-forge
     liblzma:                     5.6.3-hb9d3cd8_1              conda-forge
-    libmpdec:                    4.0.0-h4bc722e_0              conda-forge
     libnghttp2:                  1.64.0-h161d5f1_0             conda-forge
+    libnsl:                      2.0.1-hd590300_0              conda-forge
     libprotobuf:                 5.28.3-h6128344_1             conda-forge
     libsanitizer:                13.3.0-heb74ff8_1             conda-forge
     libsqlite:                   3.48.0-hee588c1_1             conda-forge
@@ -171,21 +170,18 @@ The following NEW packages will be INSTALLED:
     libstdcxx-ng:                14.2.0-h4852527_1             conda-forge
     libuuid:                     2.38.1-h0b41bf4_0             conda-forge
     libuv:                       1.50.0-hb9d3cd8_0             conda-forge
+    libxcrypt:                   4.4.36-hd590300_1             conda-forge
     libzlib:                     1.3.1-hb9d3cd8_2              conda-forge
-    lz4-c:                       1.10.0-h5888daf_1             conda-forge
     make:                        4.4.1-hb9d3cd8_2              conda-forge
     ncurses:                     6.5-h2d0b736_3                conda-forge
     ninja:                       1.12.1-h297d8ca_0             conda-forge
     openssl:                     3.4.0-h7b32b05_1              conda-forge
-    popt:                        1.16-h0b475e3_2002            conda-forge
-    protobuf:                    5.28.3-py313h46c70d0_0        conda-forge
-    python:                      3.13.1-ha99a958_105_cp313     conda-forge
-    python_abi:                  3.13-5_cp313                  conda-forge
+    protobuf:                    5.28.3-py311hfdbb021_0        conda-forge
+    python:                      3.11.11-h9e4cc4f_1_cpython    conda-forge
+    python_abi:                  3.11-5_cp311                  conda-forge
     readline:                    8.2-h8228510_1                conda-forge
     rhash:                       1.4.5-hb9d3cd8_0              conda-forge
-    rsync:                       3.4.1-h168f954_0              conda-forge
     sysroot_linux-64:            2.17-h0157908_18              conda-forge
     tk:                          8.6.13-noxft_h4845f30_101     conda-forge
     tzdata:                      2025a-h78e105d_0              conda-forge
-    xxhash:                      0.8.3-hb9d3cd8_0              conda-forge
     zstd:                        1.5.6-ha6fb4c9_0              conda-forge

Final comparison: here are the pytorch environments from the good-cache run vs. the bad-cache run:

git diff HEAD:cache_good.txt..HEAD:cache_bad.txt
--- a/cache_good.txt
+++ b/cache_bad.txt
@@ -3,11 +3,11 @@ The following NEW packages will be INSTALLED:
     _libgcc_mutex:               0.1-conda_forge                    conda-forge
     _openmp_mutex:               4.5-2_gnu                          conda-forge
     attr:                        2.5.1-h166bdaf_1                   conda-forge
-    brotli-python:               1.1.0-py313h46c70d0_2              conda-forge
+    brotli-python:               1.1.0-py311hfdbb021_2              conda-forge
     bzip2:                       1.0.8-h4bc722e_7                   conda-forge
     ca-certificates:             2025.1.31-hbcca054_0               conda-forge
     certifi:                     2024.12.14-pyhd8ed1ab_0            conda-forge
-    cffi:                        1.17.1-py313hfab6e84_0             conda-forge
+    cffi:                        1.17.1-py311hf29c0ef_0             conda-forge
     charset-normalizer:          3.4.1-pyhd8ed1ab_0                 conda-forge
     cuda-cccl_linux-64:          12.6.77-ha770c72_0                 conda-forge
     cuda-crt-dev_linux-64:       12.6.85-ha770c72_0                 conda-forge
@@ -67,8 +67,8 @@ The following NEW packages will be INSTALLED:
     liblzma:                     5.6.3-hb9d3cd8_1                   conda-forge
     libmagma:                    2.8.0-h566cb83_2                   conda-forge
     libmagma_sparse:             2.8.0-h0af6554_0                   conda-forge
-    libmpdec:                    4.0.0-h4bc722e_0                   conda-forge
     libnl:                       3.11.0-hb9d3cd8_0                  conda-forge
+    libnsl:                      2.0.1-hd590300_0                   conda-forge
     libnvjitlink:                12.6.85-hbd13f7d_0                 conda-forge
     libprotobuf:                 5.28.3-h6128344_1                  conda-forge
     libsqlite:                   3.48.0-hee588c1_1                  conda-forge
@@ -79,23 +79,24 @@ The following NEW packages will be INSTALLED:
     libudev1:                    257.2-h9a4d06a_0                   conda-forge
     libuuid:                     2.38.1-h0b41bf4_0                  conda-forge
     libuv:                       1.50.0-hb9d3cd8_0                  conda-forge
+    libxcrypt:                   4.4.36-hd590300_1                  conda-forge
     libzlib:                     1.3.1-hb9d3cd8_2                   conda-forge
     lz4-c:                       1.10.0-h5888daf_1                  conda-forge
     magma:                       2.8.0-h51420fd_0                   conda-forge
     nccl:                        2.25.1.1-ha44e49d_0                conda-forge
     ncurses:                     6.5-h2d0b736_3                     conda-forge
-    numpy:                       2.2.2-py313h17eae1a_0              conda-forge
+    numpy:                       2.0.2-py311h71ddf71_1              conda-forge
     nvtx-c:                      3.1.0-ha770c72_1                   conda-forge
     openssl:                     3.4.0-h7b32b05_1                   conda-forge
-    pip:                         25.0-pyh145f28c_0                  conda-forge
+    pip:                         25.0-pyh8b19718_0                  conda-forge
     pkg-config:                  0.29.2-h4bc722e_1009               conda-forge
     pybind11:                    2.13.6-pyh1ec8472_2                conda-forge
     pybind11-global:             2.13.6-pyh415d2e4_2                conda-forge
     pycparser:                   2.22-pyh29332c3_1                  conda-forge
     pysocks:                     1.7.1-pyha55dd90_7                 conda-forge
-    python:                      3.13.1-ha99a958_105_cp313          conda-forge
-    python_abi:                  3.13-5_cp313                       conda-forge
-    pyyaml:                      6.0.2-py313h8060acc_2              conda-forge
+    python:                      3.11.11-h9e4cc4f_1_cpython         conda-forge
+    python_abi:                  3.11-5_cp311                       conda-forge
+    pyyaml:                      6.0.2-py311h2dc5d0c_2              conda-forge
     rdma-core:                   55.0-h5888daf_0                    conda-forge
     readline:                    8.2-h8228510_1                     conda-forge
     requests:                    2.32.3-pyhd8ed1ab_1                conda-forge
@@ -106,8 +107,9 @@ The following NEW packages will be INSTALLED:
     typing_extensions:           4.12.2-pyha770c72_1                conda-forge
     tzdata:                      2025a-h78e105d_0                   conda-forge
     urllib3:                     2.3.0-pyhd8ed1ab_0                 conda-forge
+    wheel:                       0.45.1-pyhd8ed1ab_1                conda-forge
     yaml:                        0.2.5-h7f98852_2                   conda-forge
-    zstandard:                   0.23.0-py313h80202fe_1             conda-forge
+    zstandard:                   0.23.0-py311hbc35293_1             conda-forge
     zstd:                        1.5.6-ha6fb4c9_0                   conda-forge

 The following NEW packages will be INSTALLED:
@@ -157,8 +159,8 @@ The following NEW packages will be INSTALLED:
     libgcc-ng:                   14.2.0-h69a702a_1             conda-forge
     libgomp:                     14.2.0-h77fa898_1             conda-forge
     liblzma:                     5.6.3-hb9d3cd8_1              conda-forge
-    libmpdec:                    4.0.0-h4bc722e_0              conda-forge
     libnghttp2:                  1.64.0-h161d5f1_0             conda-forge
+    libnsl:                      2.0.1-hd590300_0              conda-forge
     libprotobuf:                 5.28.3-h6128344_1             conda-forge
     libsanitizer:                13.3.0-heb74ff8_1             conda-forge
     libsqlite:                   3.48.0-hee588c1_1             conda-forge
@@ -168,14 +170,15 @@ The following NEW packages will be INSTALLED:
     libstdcxx-ng:                14.2.0-h4852527_1             conda-forge
     libuuid:                     2.38.1-h0b41bf4_0             conda-forge
     libuv:                       1.50.0-hb9d3cd8_0             conda-forge
+    libxcrypt:                   4.4.36-hd590300_1             conda-forge
     libzlib:                     1.3.1-hb9d3cd8_2              conda-forge
     make:                        4.4.1-hb9d3cd8_2              conda-forge
     ncurses:                     6.5-h2d0b736_3                conda-forge
     ninja:                       1.12.1-h297d8ca_0             conda-forge
     openssl:                     3.4.0-h7b32b05_1              conda-forge
-    protobuf:                    5.28.3-py313h46c70d0_0        conda-forge
-    python:                      3.13.1-ha99a958_105_cp313     conda-forge
-    python_abi:                  3.13-5_cp313                  conda-forge
+    protobuf:                    5.28.3-py311hfdbb021_0        conda-forge
+    python:                      3.11.11-h9e4cc4f_1_cpython    conda-forge
+    python_abi:                  3.11-5_cp311                  conda-forge
     readline:                    8.2-h8228510_1                conda-forge
     rhash:                       1.4.5-hb9d3cd8_0              conda-forge
     sysroot_linux-64:            2.17-h0157908_18              conda-forge

Of course, which Python version of pytorch ends up getting built first in a megabuild is random, and some of the diff is due to that.

I did realize though that 2a0827b was not balanced between the two environments, so I'm going to fix that upon merging, and cross my fingers that that might fix it.

@h-vetinari
Member Author

Sigh, this is the second time I see this - at first I thought it was just a flake:

+ OMP_NUM_THREADS=4
+ python -m pytest -n 2 test/test_autograd.py test/test_autograd_fallback.py test/test_custom_ops.py test/test_linalg.py test/test_mkldnn.py test/test_modules.py test/test_nn.py test/test_torch.py test/test_xnnpack_integration.py -k 'not ((TestTorch and test_print) or (TestAutograd and test_profiler_seq_nr) or (TestAutograd and test_profiler_propagation) or test_mutable_custom_op_fixed_layout or test_BCELoss_weights_no_reduce_cuda or test_ctc_loss_cudnn_tensor_cuda  or (TestTorch and test_index_add_correctness) or test_sdpa_inference_mode_aot_compile or (TestNN and test_grid_sample) or test_indirect_device_assert or (GPUTests and test_scatter_reduce2) or (TestLinalgCPU and test_inverse_errors_large_cpu) or test_reentrant_parent_error_on_cpu_cuda) or test_base_does_not_require_grad_mode_nothing or test_base_does_not_require_grad_mode_warn or test_composite_registered_to_cpu_mode_nothing)' -m 'not hypothesis' --durations=50
============================= test session starts ==============================
platform linux -- Python 3.11.11, pytest-8.3.4, pluggy-1.5.0
rootdir: $SRC_DIR
plugins: rerunfailures-15.0, hypothesis-6.125.1, flakefinder-1.1.0, xdist-3.6.1
created: 2/2 workers
workers [8992 items]

INTERNALERROR> Traceback (most recent call last):
INTERNALERROR>   File "$PREFIX/lib/python3.11/site-packages/_pytest/main.py", line 283, in wrap_session
INTERNALERROR>     session.exitstatus = doit(config, session) or 0
INTERNALERROR>                          ^^^^^^^^^^^^^^^^^^^^^
INTERNALERROR>   File "$PREFIX/lib/python3.11/site-packages/_pytest/main.py", line 337, in _main
INTERNALERROR>     config.hook.pytest_runtestloop(session=session)
INTERNALERROR>   File "$PREFIX/lib/python3.11/site-packages/pluggy/_hooks.py", line 513, in __call__
INTERNALERROR>     return self._hookexec(self.name, self._hookimpls.copy(), kwargs, firstresult)
INTERNALERROR>            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
INTERNALERROR>   File "$PREFIX/lib/python3.11/site-packages/pluggy/_manager.py", line 120, in _hookexec
INTERNALERROR>     return self._inner_hookexec(hook_name, methods, kwargs, firstresult)
INTERNALERROR>            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
INTERNALERROR>   File "$PREFIX/lib/python3.11/site-packages/pluggy/_callers.py", line 139, in _multicall
INTERNALERROR>     raise exception.with_traceback(exception.__traceback__)
INTERNALERROR>   File "$PREFIX/lib/python3.11/site-packages/pluggy/_callers.py", line 122, in _multicall
INTERNALERROR>     teardown.throw(exception)  # type: ignore[union-attr]
INTERNALERROR>     ^^^^^^^^^^^^^^^^^^^^^^^^^
INTERNALERROR>   File "$PREFIX/lib/python3.11/site-packages/_pytest/logging.py", line 803, in pytest_runtestloop
INTERNALERROR>     return (yield)  # Run all the tests.
INTERNALERROR>             ^^^^^
INTERNALERROR>   File "$PREFIX/lib/python3.11/site-packages/pluggy/_callers.py", line 122, in _multicall
INTERNALERROR>     teardown.throw(exception)  # type: ignore[union-attr]
INTERNALERROR>     ^^^^^^^^^^^^^^^^^^^^^^^^^
INTERNALERROR>   File "$PREFIX/lib/python3.11/site-packages/_pytest/terminal.py", line 673, in pytest_runtestloop
INTERNALERROR>     result = yield
INTERNALERROR>              ^^^^^
INTERNALERROR>   File "$PREFIX/lib/python3.11/site-packages/pluggy/_callers.py", line 103, in _multicall
INTERNALERROR>     res = hook_impl.function(*args)
INTERNALERROR>           ^^^^^^^^^^^^^^^^^^^^^^^^^
INTERNALERROR>   File "$PREFIX/lib/python3.11/site-packages/xdist/dsession.py", line 138, in pytest_runtestloop
INTERNALERROR>     self.loop_once()
INTERNALERROR>   File "$PREFIX/lib/python3.11/site-packages/xdist/dsession.py", line 163, in loop_once
INTERNALERROR>     call(**kwargs)
INTERNALERROR>   File "$PREFIX/lib/python3.11/site-packages/xdist/dsession.py", line 306, in worker_collectionfinish
INTERNALERROR>     self.sched.schedule()
INTERNALERROR>   File "$PREFIX/lib/python3.11/site-packages/xdist/scheduler/load.py", line 295, in schedule
INTERNALERROR>     self._send_tests(node, node_chunksize)
INTERNALERROR>   File "$PREFIX/lib/python3.11/site-packages/xdist/scheduler/load.py", line 307, in _send_tests
INTERNALERROR>     node.send_runtest_some(tests_per_node)
INTERNALERROR>   File "$PREFIX/lib/python3.11/site-packages/xdist/workermanage.py", line 355, in send_runtest_some
INTERNALERROR>     self.sendcommand("runtests", indices=indices)
INTERNALERROR>   File "$PREFIX/lib/python3.11/site-packages/xdist/workermanage.py", line 374, in sendcommand
INTERNALERROR>     self.channel.send((name, kwargs))
INTERNALERROR>   File "$PREFIX/lib/python3.11/site-packages/execnet/gateway_base.py", line 911, in send
INTERNALERROR>     raise OSError(f"cannot send to {self!r}")
INTERNALERROR> OSError: cannot send to <Channel id=3 closed>

@mgorny
Contributor

mgorny commented Feb 3, 2025

Hmm, it may be a flake — in Gentoo I'm also seeing pytest-xdist occasionally crash with weird internal errors. I think it's an upstream bug, just extremely hard to reproduce.

@mgorny
Contributor

mgorny commented Feb 3, 2025

I'm going to reproduce the cache problem locally, and see if I can figure something out.

@mgorny
Contributor

mgorny commented Feb 3, 2025

Ok, I don't think it's "just" cache. FWICS only the CUDA objects get rebuilt and ccache doesn't match. Figuring out how to get verbose CMake output…
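For reference, two generic ways to get the full compile command lines out of a CMake + Ninja build tree (a sketch; how to thread this through the setup.py/pip entry point used here is a separate question):

cmake --build build --verbose   # CMake >= 3.14 forwards the verbosity to ninja
ninja -C build -v               # or ask ninja directly for the full command lines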

@mgorny
Contributor

mgorny commented Feb 3, 2025

Ok, I'm seeing some weird things. For a start, for some reason the pytorch build log doesn't get the $BUILD_PREFIX, $SRC_DIR, etc. substitutions. This isn't really a problem, but it makes comparing harder.

Anyway, for some reason CUDAFLAGS gets applied to the pytorch build but not to the libtorch build. Which is really weird.

They definitely get set and are passed to CMake — but they don't appear in "CUDA flags" output in the "libtorch" part and aren't used in the ninja file. But they do appear and are used when reconfiguring for "pytorch".

set CUDNN_INCLUDE_DIR=%LIBRARY_PREFIX%\include

@REM turn off very noisy nvcc warnings
set "CUDAFLAGS=-w --ptxas-options=-w"
@mgorny
Contributor

Suggested change:
-set "CUDAFLAGS=-w --ptxas-options=-w"
+set "CMAKE_CUDA_FLAGS=-w --ptxas-options=-w"

I think it works. The new flags are passed in the "libtorch" build, and according to the diff that's the only change in the CUDA invocations. I'll know for sure when libtorch recompiles and it starts building pytorch.
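For reference, a minimal check that the flags really reached the generated build (assuming the Ninja generator used in this recipe and a build/ tree):

grep -c -- "--ptxas-options=-w" build/build.ninja   # how many compile rules carry the flag
grep CMAKE_CUDA_FLAGS build/CMakeCache.txt          # what got cached at configure time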

@mgorny
Contributor

mgorny commented Feb 3, 2025

Ok, confirmed that pytorch outputs now get built without rebuilding .cu files.

@h-vetinari
Member Author

Ok, confirmed that pytorch outputs now get built without rebuilding .cu files.

Thanks so much for this! I didn't really consider this possible because the variable is set in an identical way between libtorch and pytorch, but I guess unusual things are possible. 😅

@h-vetinari
Member Author

OK, with the caching issue solved, and the test failure hopefully being a fluke, I'm going to call this "good enough" for merging now, to finally get those CMake fixes out the door. I'll open a PR for pytorch to see if we can get the ball rolling for this work upstream.

h-vetinari added a commit that referenced this pull request Feb 3, 2025
@h-vetinari merged commit 162a7eb into conda-forge:main on Feb 3, 2025
26 of 27 checks passed
@h-vetinari deleted the cmake branch on February 3, 2025 19:29
@mgorny
Contributor

mgorny commented Feb 3, 2025

OK, with the caching issue solved, and the test failure hopefully being a fluke, I'm going to call this "good enough" for merging now, to finally get those CMake fixes out the door. I'll open a PR for pytorch to see if we can get the ball rolling for this work upstream.

I think you need to do that in .bat too, though.

@h-vetinari
Member Author

I think you need to do that in .bat too, though.

Windows builds haven't changed behaviour (if anything, they seem to be a bit faster, despite adding ~30min of new test run time). I double-checked, and on windows the warnings already don't appear, so I didn't touch bld.bat for now. However, after your comment, I did check the logs even further back, and it seems that those ptxas warnings just never arise on windows anyway?

But if there are some caching issues to fix on windows, I'd be very glad for any support there.

@mgorny
Contributor

mgorny commented Feb 3, 2025

But I do see it added there:

@REM turn off very noisy nvcc warnings
set "CUDAFLAGS=-w --ptxas-options=-w"

@h-vetinari
Member Author

Yeah, no question that it's there, but it didn't have the same effect as on linux (neither w.r.t. caching, nor w.r.t. being actually necessary). In any case, I'm attempting removal within bld.bat in ab889b7.

@h-vetinari
Member Author

Gah, CMAKE_CUDA_FLAGS="-w --ptxas-options=-w" still doesn't suppress these ultra-spammy logs. It's really frustrating, because it makes the GHA live logs unusable, as it blows through their internal buffer (and the raw logs sometimes don't get updated for hours).

@hmaarrfk
Contributor

hmaarrfk commented Feb 4, 2025

Gah, CMAKE_CUDA_FLAGS="-w --ptxas-options=-w" still doesn't suppress these ultra-spammy logs. It's really frustrating, because it makes the GHA live logs unusable, as it blows through their internal buffer (and the raw logs sometimes don't get updated for hours).

I've sometimes resorted to sed filters for crazy commands that were too chatty.
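For example, in the spirit of the sed pipeline already used for pip install in build.sh, something like this could keep the live log readable while preserving the full output (a sketch; the log file name is illustrative and the warning patterns would need to match the actual nvcc/ptxas output):

# keep the complete log in a file, but drop the noisy warning lines
# from what reaches the console / GHA live log
$PREFIX/bin/python -m pip install . --no-deps --no-build-isolation -v --no-clean 2>&1 \
    | tee full_build.log \
    | sed -e '/ptxas warning/d' -e '/nvcc warning/d'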

@h-vetinari
Member Author

Wow, the merges here have been a bit cursed recently. Both linux-64 + CUDA jobs passed here before, and now we're getting a whole host of new failures:

20 new failures on linux-64 + CUDA + openblas
=========================== short test summary info ============================
FAILED [0.0052s] test/test_linalg.py::TestLinalgCUDA::test__int_mm_k_16_n_16_use_transpose_a_False_use_transpose_b_False_cuda - AssertionError: AssertionError not raised

To execute this test, run the following from the base repo dir:
    python test/test_linalg.py TestLinalgCUDA.test__int_mm_k_16_n_16_use_transpose_a_False_use_transpose_b_False_cuda

This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
FAILED [0.0026s] test/test_linalg.py::TestLinalgCUDA::test__int_mm_k_16_n_16_use_transpose_a_False_use_transpose_b_True_cuda - AssertionError: AssertionError not raised

To execute this test, run the following from the base repo dir:
    python test/test_linalg.py TestLinalgCUDA.test__int_mm_k_16_n_16_use_transpose_a_False_use_transpose_b_True_cuda

This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
FAILED [0.0023s] test/test_linalg.py::TestLinalgCUDA::test__int_mm_k_16_n_16_use_transpose_a_True_use_transpose_b_False_cuda - AssertionError: "_int_mm_out_cuda not compiled for CUDA" does not match "CUDA error: CUBLAS_STATUS_NOT_SUPPORTED when calling cublasLtMatmul with transpose_mat1 0 transpose_mat2 1 m 16 n 17 k 16 mat1_ld 16 mat2_ld 17 result_ld 16 abType 3 cType 10 computeType 72 scaleType 10"

To execute this test, run the following from the base repo dir:
    python test/test_linalg.py TestLinalgCUDA.test__int_mm_k_16_n_16_use_transpose_a_True_use_transpose_b_False_cuda

This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
FAILED [0.0020s] test/test_linalg.py::TestLinalgCUDA::test__int_mm_k_16_n_16_use_transpose_a_True_use_transpose_b_True_cuda - AssertionError: "_int_mm_out_cuda not compiled for CUDA" does not match "CUDA error: CUBLAS_STATUS_NOT_SUPPORTED when calling cublasLtMatmul with transpose_mat1 1 transpose_mat2 1 m 16 n 17 k 16 mat1_ld 16 mat2_ld 17 result_ld 16 abType 3 cType 10 computeType 72 scaleType 10"

To execute this test, run the following from the base repo dir:
    python test/test_linalg.py TestLinalgCUDA.test__int_mm_k_16_n_16_use_transpose_a_True_use_transpose_b_True_cuda

This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
FAILED [0.0025s] test/test_linalg.py::TestLinalgCUDA::test__int_mm_k_16_n_32_use_transpose_a_False_use_transpose_b_False_cuda - AssertionError: AssertionError not raised

To execute this test, run the following from the base repo dir:
    python test/test_linalg.py TestLinalgCUDA.test__int_mm_k_16_n_32_use_transpose_a_False_use_transpose_b_False_cuda

This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
FAILED [0.0025s] test/test_linalg.py::TestLinalgCUDA::test__int_mm_k_16_n_32_use_transpose_a_False_use_transpose_b_True_cuda - AssertionError: AssertionError not raised

To execute this test, run the following from the base repo dir:
    python test/test_linalg.py TestLinalgCUDA.test__int_mm_k_16_n_32_use_transpose_a_False_use_transpose_b_True_cuda

This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
FAILED [0.0020s] test/test_linalg.py::TestLinalgCUDA::test__int_mm_k_16_n_32_use_transpose_a_True_use_transpose_b_False_cuda - AssertionError: "_int_mm_out_cuda not compiled for CUDA" does not match "CUDA error: CUBLAS_STATUS_NOT_SUPPORTED when calling cublasLtMatmul with transpose_mat1 0 transpose_mat2 1 m 32 n 17 k 16 mat1_ld 32 mat2_ld 17 result_ld 32 abType 3 cType 10 computeType 72 scaleType 10"

To execute this test, run the following from the base repo dir:
    python test/test_linalg.py TestLinalgCUDA.test__int_mm_k_16_n_32_use_transpose_a_True_use_transpose_b_False_cuda

This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
FAILED [0.0020s] test/test_linalg.py::TestLinalgCUDA::test__int_mm_k_16_n_32_use_transpose_a_True_use_transpose_b_True_cuda - AssertionError: "_int_mm_out_cuda not compiled for CUDA" does not match "CUDA error: CUBLAS_STATUS_NOT_SUPPORTED when calling cublasLtMatmul with transpose_mat1 1 transpose_mat2 1 m 32 n 17 k 16 mat1_ld 16 mat2_ld 17 result_ld 32 abType 3 cType 10 computeType 72 scaleType 10"

To execute this test, run the following from the base repo dir:
    python test/test_linalg.py TestLinalgCUDA.test__int_mm_k_16_n_32_use_transpose_a_True_use_transpose_b_True_cuda

This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
FAILED [0.0025s] test/test_linalg.py::TestLinalgCUDA::test__int_mm_k_32_n_16_use_transpose_a_False_use_transpose_b_False_cuda - AssertionError: AssertionError not raised

To execute this test, run the following from the base repo dir:
    python test/test_linalg.py TestLinalgCUDA.test__int_mm_k_32_n_16_use_transpose_a_False_use_transpose_b_False_cuda

This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
FAILED [0.0025s] test/test_linalg.py::TestLinalgCUDA::test__int_mm_k_32_n_16_use_transpose_a_False_use_transpose_b_True_cuda - AssertionError: AssertionError not raised

To execute this test, run the following from the base repo dir:
    python test/test_linalg.py TestLinalgCUDA.test__int_mm_k_32_n_16_use_transpose_a_False_use_transpose_b_True_cuda

This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
FAILED [0.0020s] test/test_linalg.py::TestLinalgCUDA::test__int_mm_k_32_n_16_use_transpose_a_True_use_transpose_b_False_cuda - AssertionError: "_int_mm_out_cuda not compiled for CUDA" does not match "CUDA error: CUBLAS_STATUS_NOT_SUPPORTED when calling cublasLtMatmul with transpose_mat1 0 transpose_mat2 1 m 16 n 17 k 32 mat1_ld 16 mat2_ld 17 result_ld 16 abType 3 cType 10 computeType 72 scaleType 10"

To execute this test, run the following from the base repo dir:
    python test/test_linalg.py TestLinalgCUDA.test__int_mm_k_32_n_16_use_transpose_a_True_use_transpose_b_False_cuda

This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
FAILED [0.0020s] test/test_linalg.py::TestLinalgCUDA::test__int_mm_k_32_n_16_use_transpose_a_True_use_transpose_b_True_cuda - AssertionError: "_int_mm_out_cuda not compiled for CUDA" does not match "CUDA error: CUBLAS_STATUS_NOT_SUPPORTED when calling cublasLtMatmul with transpose_mat1 1 transpose_mat2 1 m 16 n 17 k 32 mat1_ld 32 mat2_ld 17 result_ld 16 abType 3 cType 10 computeType 72 scaleType 10"

To execute this test, run the following from the base repo dir:
    python test/test_linalg.py TestLinalgCUDA.test__int_mm_k_32_n_16_use_transpose_a_True_use_transpose_b_True_cuda

This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
FAILED [0.0026s] test/test_linalg.py::TestLinalgCUDA::test__int_mm_k_32_n_32_use_transpose_a_False_use_transpose_b_False_cuda - AssertionError: AssertionError not raised

To execute this test, run the following from the base repo dir:
    python test/test_linalg.py TestLinalgCUDA.test__int_mm_k_32_n_32_use_transpose_a_False_use_transpose_b_False_cuda

This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
FAILED [0.0024s] test/test_linalg.py::TestLinalgCUDA::test__int_mm_k_32_n_32_use_transpose_a_False_use_transpose_b_True_cuda - AssertionError: AssertionError not raised

To execute this test, run the following from the base repo dir:
    python test/test_linalg.py TestLinalgCUDA.test__int_mm_k_32_n_32_use_transpose_a_False_use_transpose_b_True_cuda

This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
FAILED [0.0022s] test/test_linalg.py::TestLinalgCUDA::test__int_mm_k_32_n_32_use_transpose_a_True_use_transpose_b_False_cuda - AssertionError: "_int_mm_out_cuda not compiled for CUDA" does not match "CUDA error: CUBLAS_STATUS_NOT_SUPPORTED when calling cublasLtMatmul with transpose_mat1 0 transpose_mat2 1 m 32 n 17 k 32 mat1_ld 32 mat2_ld 17 result_ld 32 abType 3 cType 10 computeType 72 scaleType 10"

To execute this test, run the following from the base repo dir:
    python test/test_linalg.py TestLinalgCUDA.test__int_mm_k_32_n_32_use_transpose_a_True_use_transpose_b_False_cuda

This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
FAILED [0.0035s] test/test_linalg.py::TestLinalgCUDA::test__int_mm_k_32_n_32_use_transpose_a_True_use_transpose_b_True_cuda - AssertionError: "_int_mm_out_cuda not compiled for CUDA" does not match "CUDA error: CUBLAS_STATUS_NOT_SUPPORTED when calling cublasLtMatmul with transpose_mat1 1 transpose_mat2 1 m 32 n 17 k 32 mat1_ld 32 mat2_ld 17 result_ld 32 abType 3 cType 10 computeType 72 scaleType 10"

To execute this test, run the following from the base repo dir:
    python test/test_linalg.py TestLinalgCUDA.test__int_mm_k_32_n_32_use_transpose_a_True_use_transpose_b_True_cuda

This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
FAILED [0.0041s] test/test_linalg.py::TestLinalgCUDA::test_linalg_lstsq_input_checks_cuda_complex128 - AssertionError: RuntimeError not raised

To execute this test, run the following from the base repo dir:
    python test/test_linalg.py TestLinalgCUDA.test_linalg_lstsq_input_checks_cuda_complex128

This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
FAILED [0.0038s] test/test_linalg.py::TestLinalgCUDA::test_linalg_lstsq_input_checks_cuda_complex64 - AssertionError: RuntimeError not raised

To execute this test, run the following from the base repo dir:
    python test/test_linalg.py TestLinalgCUDA.test_linalg_lstsq_input_checks_cuda_complex64

This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
FAILED [0.0037s] test/test_linalg.py::TestLinalgCUDA::test_linalg_lstsq_input_checks_cuda_float32 - AssertionError: RuntimeError not raised

To execute this test, run the following from the base repo dir:
    python test/test_linalg.py TestLinalgCUDA.test_linalg_lstsq_input_checks_cuda_float32

This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
FAILED [0.0037s] test/test_linalg.py::TestLinalgCUDA::test_linalg_lstsq_input_checks_cuda_float64 - AssertionError: RuntimeError not raised

To execute this test, run the following from the base repo dir:
    python test/test_linalg.py TestLinalgCUDA.test_linalg_lstsq_input_checks_cuda_float64

This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
= 20 failed, 13143 passed, 2595 skipped, 91 xfailed, 143216 warnings in 2559.23s (0:42:39) =

Looking at some of the errors

        if device != 'cpu' and cusolver_not_available:
            a = torch.rand(2, 3, dtype=dtype, device=device)
            b = torch.rand(2, 1, dtype=dtype, device=device)
>           with self.assertRaisesRegex(RuntimeError, r'only overdetermined systems'):
E           AssertionError: RuntimeError not raised

Same for CUBLAS_STATUS_NOT_SUPPORTED here:

>           with self.assertRaisesRegex(RuntimeError, "_int_mm_out_cuda not compiled for CUDA"):
E           AssertionError: "_int_mm_out_cuda not compiled for CUDA" does not match "CUDA error: CUBLAS_STATUS_NOT_SUPPORTED when calling cublasLtMatmul with transpose_mat1 1 transpose_mat2 1 m 32 n 17 k 32 mat1_ld 32 mat2_ld 17 result_ld 32 abType 3 cType 10 computeType 72 scaleType 10"

At least the caching works again, but it's one step forward, two steps back. 😑

PS. Here's the diff between the passing run and the merge; it's incomprehensible to me how this could cause compilation errors:

--- a/recipe/build.sh
+++ b/recipe/build.sh
@@ -220,7 +220,7 @@ elif [[ ${cuda_compiler_version} != "None" ]]; then
     export MAGMA_HOME="${PREFIX}"
     export USE_MAGMA=1
     # turn off noisy nvcc warnings
-    export CUDAFLAGS="-w --ptxas-options=-w"
+    export CMAKE_CUDA_FLAGS="-w --ptxas-options=-w"
 else
     if [[ "$target_platform" != *-64 ]]; then
       # Breakpad seems to not work on aarch64 or ppc64le
@@ -253,7 +253,7 @@ case ${PKG_NAME} in
     cp build/CMakeCache.txt build/CMakeCache.txt.orig
     ;;
   pytorch)
-    $PREFIX/bin/python -m pip install . --no-deps --no-build-isolation -vvv --no-clean \
+    $PREFIX/bin/python -m pip install . --no-deps --no-build-isolation -v --no-clean \
         | sed "s,${CXX},\$\{CXX\},g" \
         | sed "s,${PREFIX},\$\{PREFIX\},g"
     # Keep this in ${PREFIX}/lib so that the library can be found by
diff --git a/recipe/meta.yaml b/recipe/meta.yaml
index e110190..e768515 100644
--- a/recipe/meta.yaml
+++ b/recipe/meta.yaml
@@ -334,6 +334,7 @@ outputs:
         - {{ pin_subpackage('libtorch', exact=True) }}
         - pybind11
         - eigen
+        - zlib
       run:
         - llvm-openmp    # [osx]
         - intel-openmp {{ mkl }}  # [win]

@h-vetinari
Member Author

On MKL the same INTERNALERROR remained - so not flaky, after all. 🥲

@h-vetinari
Member Author

On MKL the same INTERNALERROR remained - so not flaky, after all. 🥲

In fact, the linux-64 + CUDA + MKL build hasn't passed since dfadf15. The only real change since then (aside from all the patches from 0017 onwards, which don't change much) was 9fcb3a7.

Here are the unix-relevant changes since dfadf15 (minus CMake tests, patches, & test skips)

diff --git a/recipe/build.sh b/recipe/build.sh
index 57044b0..22dde8f 100644
--- a/recipe/build.sh
+++ b/recipe/build.sh
@@ -1,9 +1,11 @@
 #!/bin/bash
 
-echo "=== Building ${PKG_NAME} (py: ${PY_VER}) ==="
-
 set -ex
 
+echo "#########################################################################"
+echo "Building ${PKG_NAME} (py: ${PY_VER}) using BLAS implementation $blas_impl"
+echo "#########################################################################"
+
 # This is used to detect if it's in the process of building pytorch
 export IN_PYTORCH_BUILD=1
 
@@ -20,9 +22,22 @@ rm -rf pyproject.toml
 export USE_CUFILE=0
 export USE_NUMA=0
 export USE_ITT=0
+
+#################### ADJUST COMPILER AND LINKER FLAGS #####################
+# Pytorch's build system doesn't like us setting the c++ standard through CMAKE_CXX_FLAGS
+# and will issue a warning.  We need to use at least C++17 to match the abseil ABI, see
+# https://github.com/conda-forge/abseil-cpp-feedstock/issues/45, which pytorch 2.5 uses already:
+# https://github.com/pytorch/pytorch/blob/v2.5.1/CMakeLists.txt#L36-L48
+export CXXFLAGS="$(echo $CXXFLAGS | sed 's/-std=c++[0-9][0-9]//g')"
+# The below three lines expose symbols that would otherwise be hidden or
+# optimised away. They were here before, so removing them would potentially
+# break users' programs
 export CFLAGS="$(echo $CFLAGS | sed 's/-fvisibility-inlines-hidden//g')"
 export CXXFLAGS="$(echo $CXXFLAGS | sed 's/-fvisibility-inlines-hidden//g')"
 export LDFLAGS="$(echo $LDFLAGS | sed 's/-Wl,--as-needed//g')"
+# The default conda LDFLAGs include -Wl,-dead_strip_dylibs, which removes all the
+# MKL sequential, core, etc. libraries, resulting in a "Symbol not found: _mkl_blas_caxpy"
+# error on osx-64.
 export LDFLAGS="$(echo $LDFLAGS | sed 's/-Wl,-dead_strip_dylibs//g')"
 export LDFLAGS_LD="$(echo $LDFLAGS_LD | sed 's/-dead_strip_dylibs//g')"
 if [[ "$c_compiler" == "clang" ]]; then
@@ -45,6 +60,7 @@ fi
 # can be imported on system without a GPU
 LDFLAGS="${LDFLAGS//-Wl,-z,now/-Wl,-z,lazy}"
 
+################ CONFIGURE CMAKE FOR CONDA ENVIRONMENT ###################
 export CMAKE_GENERATOR=Ninja
 export CMAKE_LIBRARY_PATH=$PREFIX/lib:$PREFIX/include:$CMAKE_LIBRARY_PATH
 export CMAKE_PREFIX_PATH=$PREFIX
@@ -73,6 +89,8 @@ export USE_SYSTEM_SLEEF=1
 # use our protobuf
 export BUILD_CUSTOM_PROTOBUF=OFF
 rm -rf $PREFIX/bin/protoc
+export USE_SYSTEM_PYBIND11=1
+export USE_SYSTEM_EIGEN_INSTALL=1
 
 # prevent six from being downloaded
 > third_party/NNPACK/cmake/DownloadSix.cmake
@@ -98,18 +116,29 @@ if [[ "${CI}" == "github_actions" ]]; then
     # reduce parallelism to avoid getting OOM-killed on
     # cirun-openstack-gpu-2xlarge, which has 32GB RAM, 8 CPUs
     export MAX_JOBS=4
-else
+elif [[ "${CI}" == "azure" ]]; then
     export MAX_JOBS=${CPU_COUNT}
-fi
-
-if [[ "$blas_impl" == "generic" ]]; then
-    # Fake openblas
-    export BLAS=OpenBLAS
-    export OpenBLAS_HOME=${PREFIX}
 else
-    export BLAS=MKL
+    # Leave a spare core for other tasks, per common practice.
+    # Reducing further can help with out-of-memory errors.
+    export MAX_JOBS=$((CPU_COUNT > 1 ? CPU_COUNT - 1 : 1))
 fi
 
+case "$blas_impl" in
+    "generic")
+        # Fake openblas
+        export BLAS=OpenBLAS
+        export OpenBLAS_HOME=${PREFIX}
+        ;;
+    "mkl")
+        export BLAS=MKL
+        ;;
+    *)
+        echo "[ERROR] Unsupported BLAS implementation '${blas_impl}'" >&2
+        exit 1
+        ;;
+esac
+
 if [[ "$PKG_NAME" == "pytorch" ]]; then
   # Trick Cmake into thinking python hasn't changed
   sed "s/3\.12/$PY_VER/g" build/CMakeCache.txt.orig > build/CMakeCache.txt
@@ -147,11 +176,9 @@ elif [[ ${cuda_compiler_version} != "None" ]]; then
     # all of them.
     export CUDAToolkit_BIN_DIR=${BUILD_PREFIX}/bin
     export CUDAToolkit_ROOT_DIR=${PREFIX}
-    if [[ "${target_platform}" != "${build_platform}" ]]; then
-        export CUDA_TOOLKIT_ROOT=${PREFIX}
-    fi
     # for CUPTI
     export CUDA_TOOLKIT_ROOT_DIR=${PREFIX}
+    export CUDAToolkit_ROOT=${PREFIX}
     case ${target_platform} in
         linux-64)
             export CUDAToolkit_TARGET_DIR=${PREFIX}/targets/x86_64-linux
@@ -163,12 +190,24 @@ elif [[ ${cuda_compiler_version} != "None" ]]; then
             echo "unknown CUDA arch, edit build.sh"
             exit 1
     esac
+
+    # Compatibility matrix for update: https://en.wikipedia.org/wiki/CUDA#GPUs_supported
+    # Warning from pytorch v1.12.1: In the future we will require one to
+    # explicitly pass TORCH_CUDA_ARCH_LIST to cmake instead of implicitly
+    # setting it as an env variable.
+    # Doing this is nontrivial given that we're using setup.py as an entry point, but should
+    # be addressed to pre-empt upstream changing it, as it probably won't result in a failed
+    # configuration.
+    #
+    # See:
+    # https://pytorch.org/docs/stable/cpp_extension.html (Compute capabilities)
+    # https://github.com/pytorch/pytorch/blob/main/.ci/manywheel/build_cuda.sh
     case ${cuda_compiler_version} in
-        12.6)
+        12.[0-6])
             export TORCH_CUDA_ARCH_LIST="5.0;6.0;6.1;7.0;7.5;8.0;8.6;8.9;9.0+PTX"
             ;;
         *)
-            echo "unsupported cuda version. edit build.sh"
+            echo "No CUDA architecture list exists for CUDA v${cuda_compiler_version}. See build.sh for information on adding one."
             exit 1
     esac
     export TORCH_NVCC_FLAGS="-Xfatbin -compress-all"
@@ -180,6 +219,8 @@ elif [[ ${cuda_compiler_version} != "None" ]]; then
     export USE_STATIC_CUDNN=0
     export MAGMA_HOME="${PREFIX}"
     export USE_MAGMA=1
+    # turn off noisy nvcc warnings
+    export CMAKE_CUDA_FLAGS="-w --ptxas-options=-w"
 else
     if [[ "$target_platform" != *-64 ]]; then
       # Breakpad seems to not work on aarch64 or ppc64le
@@ -203,7 +244,8 @@ case ${PKG_NAME} in
 
     mv build/lib.*/torch/bin/* ${PREFIX}/bin/
     mv build/lib.*/torch/lib/* ${PREFIX}/lib/
-    mv build/lib.*/torch/share/* ${PREFIX}/share/
+    # need to merge these now because we're using system pybind11, meaning the destination directory is not empty
+    rsync -a build/lib.*/torch/share/* ${PREFIX}/share/
     mv build/lib.*/torch/include/{ATen,caffe2,tensorpipe,torch,c10} ${PREFIX}/include/
     rm ${PREFIX}/lib/libtorch_python.*
 
@@ -211,7 +253,7 @@ case ${PKG_NAME} in
     cp build/CMakeCache.txt build/CMakeCache.txt.orig
     ;;
   pytorch)
-    $PREFIX/bin/python -m pip install . --no-deps -vvv --no-clean \
+    $PREFIX/bin/python -m pip install . --no-deps --no-build-isolation -v --no-clean \
         | sed "s,${CXX},\$\{CXX\},g" \
         | sed "s,${PREFIX},\$\{PREFIX\},g"
     # Keep this in ${PREFIX}/lib so that the library can be found by
diff --git a/recipe/meta.yaml b/recipe/meta.yaml
index d5fc48f..e1c2a2d 100644
--- a/recipe/meta.yaml
+++ b/recipe/meta.yaml
@@ -1,7 +1,10 @@
 # if you wish to build release candidate number X, append the version string with ".rcX"
 {% set version = "2.5.1" %}
-{% set build = 10 %}
+{% set build = 12 %}
 
+# Use a higher build number for the CUDA variant, to ensure that it's
+# preferred by conda's solver, and it's preferentially
+# installed where the platform supports it.
 {% if cuda_compiler_version != "None" %}
 {% set build = build + 200 %}
 {% endif %}
@@ -64,6 +67,13 @@ source:
     - patches/0015-simplify-torch.utils.cpp_extension.include_paths-use.patch
     # point to headers that are now living in $PREFIX/include instead of $SP_DIR/torch/include
     - patches/0016-point-include-paths-to-PREFIX-include.patch
+    - patches/0017-Add-conda-prefix-to-inductor-include-paths.patch
+    - patches/0018-make-ATEN_INCLUDE_DIR-relative-to-TORCH_INSTALL_PREF.patch
+    - patches/0019-remove-DESTINATION-lib-from-CMake-install-TARGETS-di.patch                       # [win]
+    - patches/0020-make-library-name-in-test_mutable_custom_op_fixed_la.patch
+    - patches/0021-avoid-deprecated-find_package-CUDA-in-caffe2-CMake-m.patch
+    - patches_submodules/fbgemm/0001-remove-DESTINATION-lib-from-CMake-install-directives.patch     # [win]
+    - patches_submodules/tensorpipe/0001-switch-away-from-find_package-CUDA.patch
 
 build:
   number: {{ build }}
@@ -117,6 +127,7 @@ requirements:
     - protobuf
     - make      # [linux]
     - sccache   # [win]
+    - rsync     # [unix]
   host:
     # GPU requirements
     - cudnn                           # [cuda_compiler_version != "None"]
@@ -167,6 +178,9 @@ requirements:
     - libuv
     - pkg-config  # [unix]
     - typing_extensions
+    - pybind11
+    - eigen
+    - zlib
   run:
     # GPU requirements without run_exports
     - {{ pin_compatible('cudnn') }}                       # [cuda_compiler_version != "None"]
@@ -299,6 +330,9 @@ outputs:
         - pkg-config  # [unix]
         - typing_extensions
         - {{ pin_subpackage('libtorch', exact=True) }}
+        - pybind11
+        - eigen
+        - zlib
       run:
         - llvm-openmp    # [osx]
         - intel-openmp {{ mkl }}  # [win]
@@ -314,6 +348,7 @@ outputs:
         - filelock
         - jinja2
         - networkx
+        - pybind11
         - nomkl                 # [blas_impl != "mkl"]
         - fsspec
         # avoid that people without GPUs needlessly download ~0.5-1GB
@@ -335,6 +370,8 @@ outputs:
       requires:
         - {{ compiler('c') }}
         - {{ compiler('cxx') }}
+        # for torch.compile tests
+        - {{ compiler('cuda') }}       # [cuda_compiler_version != "None"]
         - ninja
         - boto3
         - hypothesis

@h-vetinari
Member Author

I had restarted the openblas job, and this time the test suite simply hung indefinitely (note the timestamps):

2025-02-04T23:43:14.0625697Z ........................................s............................... [ 84%]
2025-02-05T06:11:05.4560386Z ##[error]The operation was canceled.
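
If this happens again, one low-effort way to at least see where the suite is stuck would be pytest's built-in faulthandler timeout, which dumps the tracebacks of all threads once a single test exceeds the deadline. This isn't wired into the recipe; the value below is arbitrary:

# diagnostic sketch: dump thread tracebacks for any test running longer than 30 min
python -m pytest test/inductor/test_torchinductor.py -v --faulthandler-timeout=1800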

@h-vetinari h-vetinari mentioned this pull request Feb 5, 2025
@hmaarrfk
Contributor

hmaarrfk commented Feb 5, 2025

Can we please revert much of the added testing?

Ensuring that scientific software passes in CI is a job of its own.

I think we can have very abbreviated tests that mostly ensure correct linkage: loading as many libraries as possible and checking that none have dangling links to missing SO files is likely enough.
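
For concreteness, a rough sketch of that kind of linkage-only check (Linux only; the library paths are assumptions about where things end up):

# fail if any installed torch library has unresolved shared-object dependencies
for lib in ${PREFIX}/lib/libtorch*.so ${SP_DIR}/torch/lib/*.so; do
    if ldd "$lib" | grep -q "not found"; then
        echo "dangling link in $lib" >&2
        exit 1
    fi
done
# and check that the extension modules at least import
python -c "import torch"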

@h-vetinari
Member Author

Can we please revert much of the added testing?

The only recently-added testing (for 2.5) was

{% set tests = tests ~ " test/inductor/test_torchinductor.py" %} # [py==312 and not aarch64]

which found some issues with the torch.compile setup. That is a feature I'd like to keep working (and thus tested), at least going forward. We could remove some tests for 2.5 just for the sake of publishing something, but I don't feel great about that; OTOH, the failures above occurred outside of test_torchinductor.

With conda-forge taking over from the pytorch channel, I'd like to test more than the bare minimum. I don't want users running into something like CUBLAS_STATUS_NOT_SUPPORTED and wondering WTH is going on. The ~15min test suite per python version seems like a reasonable middle ground IMO, and it was running fine for #331.

It was running fine in this PR as well until 162a7eb (minus the build cache issue and the pytest-internal crash). I'm starting to think that e1f50ac from #318 might have something to do with all that. I will try that next.

Finally, I did take your input and removed the smoke test in #326.

@h-vetinari
Member Author

I'm starting to think that e1f50ac from #318 might have something to do with all that. I will try that next.

As an update, I tried reverting the unvendoring of pybind and the non-isolation changes in #344, and it still fails with the same pytest INTERNALERROR, i.e.

INTERNALERROR>   File "$PREFIX/lib/python3.12/site-packages/xdist/workermanage.py", line 374, in sendcommand
INTERNALERROR>     self.channel.send((name, kwargs))
INTERNALERROR>   File "$PREFIX/lib/python3.12/site-packages/execnet/gateway_base.py", line 911, in send
INTERNALERROR>     raise OSError(f"cannot send to {self!r}")
INTERNALERROR> OSError: cannot send to <Channel id=3 closed>
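
The OSError only says that the xdist worker channel died; to rule out the distribution layer itself (an idea, not something tried in this run), the affected file could be rerun in a single process:

# run the suspect file without pytest-xdist, so the underlying failure surfaces directly
python -m pytest -v -p no:xdist test/inductor/test_torchinductor.py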

I double-checked the pytest versions, and there's no difference there either between the passing environment:

    execnet:                     2.1.1-pyhd8ed1ab_1                   conda-forge
    [...]
    pytest:                      8.3.4-pyhd8ed1ab_1                   conda-forge
    pytest-flakefinder:          1.1.0-pyh29332c3_2                   conda-forge
    pytest-rerunfailures:        15.0-pyhd8ed1ab_1                    conda-forge
    pytest-xdist:                3.6.1-pyhd8ed1ab_1                   conda-forge
    python:                      3.12.8-h9e4cc4f_1_cpython            conda-forge
    python-dateutil:             2.9.0.post0-pyhff2d567_1             conda-forge
    python_abi:                  3.12-5_cp312                         conda-forge
    pytorch:                     2.5.1-cuda126_mkl_py312_hdbe889e_310 local

and the failing one:

    execnet:                     2.1.1-pyhd8ed1ab_1                   conda-forge
    [...]
    pytest:                      8.3.4-pyhd8ed1ab_1                   conda-forge
    pytest-flakefinder:          1.1.0-pyh29332c3_2                   conda-forge
    pytest-rerunfailures:        15.0-pyhd8ed1ab_1                    conda-forge
    pytest-xdist:                3.6.1-pyhd8ed1ab_1                   conda-forge
    python:                      3.12.8-h9e4cc4f_1_cpython            conda-forge
    python-dateutil:             2.9.0.post0-pyhff2d567_1             conda-forge
    python_abi:                  3.12-5_cp312                         conda-forge
    pytorch:                     2.5.1-cuda126_mkl_py312_hdbe889e_312 local
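
Eyeballing the two lists is error-prone, so the comparison could also be scripted; "pass-env" and "fail-env" below are placeholder environment names:

# diff the fully resolved package sets of the passing and failing environments
diff <(conda list -n pass-env --export | sort) <(conda list -n fail-env --export | sort)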

As a last-ditch effort, I've started checking in #345 whether a hard reset back to the last passing commit still passes.

Successfully merging this pull request may close these issues:

Fails to run find_package(Torch) on Windows with libtorch package