Test if last passing run can be reproduced #345

Closed
wants to merge 1 commit into from

Conversation

h-vetinari
Member

@h-vetinari h-vetinari commented Feb 6, 2025

#344 tried to reduce the diff to the last passing run (dfadf15), but still runs into the same issue with pytest.

As a final check, take no shortcuts and simply run CI again for the last passing build; not a hair different (no tests, no skips, no comments, no nothing), just a hard reset.

More concretely, all the linux-64 + CUDA + MKL builds are failing with:

INTERNALERROR>   File "$PREFIX/lib/python3.12/site-packages/xdist/workermanage.py", line 374, in sendcommand
INTERNALERROR>     self.channel.send((name, kwargs))
INTERNALERROR>   File "$PREFIX/lib/python3.12/site-packages/execnet/gateway_base.py", line 911, in send
INTERNALERROR>     raise OSError(f"cannot send to {self!r}")
INTERNALERROR> OSError: cannot send to <Channel id=3 closed>

I double-checked the pytest versions, and there's no difference between the passing run:

    execnet:                     2.1.1-pyhd8ed1ab_1                   conda-forge
    [...]
    pytest:                      8.3.4-pyhd8ed1ab_1                   conda-forge
    pytest-flakefinder:          1.1.0-pyh29332c3_2                   conda-forge
    pytest-rerunfailures:        15.0-pyhd8ed1ab_1                    conda-forge
    pytest-xdist:                3.6.1-pyhd8ed1ab_1                   conda-forge
    python:                      3.12.8-h9e4cc4f_1_cpython            conda-forge
    python-dateutil:             2.9.0.post0-pyhff2d567_1             conda-forge
    python_abi:                  3.12-5_cp312                         conda-forge
    pytorch:                     2.5.1-cuda126_mkl_py312_hdbe889e_310 local

and the failing run:

    execnet:                     2.1.1-pyhd8ed1ab_1                   conda-forge
    [...]
    pytest:                      8.3.4-pyhd8ed1ab_1                   conda-forge
    pytest-flakefinder:          1.1.0-pyh29332c3_2                   conda-forge
    pytest-rerunfailures:        15.0-pyhd8ed1ab_1                    conda-forge
    pytest-xdist:                3.6.1-pyhd8ed1ab_1                   conda-forge
    python:                      3.12.8-h9e4cc4f_1_cpython            conda-forge
    python-dateutil:             2.9.0.post0-pyhff2d567_1             conda-forge
    python_abi:                  3.12-5_cp312                         conda-forge
    pytorch:                     2.5.1-cuda126_mkl_py312_hdbe889e_312 local

The full diff between the test environments from the last passing run to the one in #344 is quite massive though.

@@ -2,16 +2,19 @@ The following NEW packages will be INSTALLED:

     _libgcc_mutex:               0.1-conda_forge                      conda-forge
     _openmp_mutex:               4.5-2_kmp_llvm                       conda-forge
+    adwaita-icon-theme:          47.0-unix_0                          conda-forge
+    at-spi2-atk:                 2.38.0-h0630a04_3                    conda-forge
+    at-spi2-core:                2.40.3-h0630a04_0                    conda-forge
     atk-1.0:                     2.38.0-h04ea711_2                    conda-forge
     attr:                        2.5.1-h166bdaf_1                     conda-forge
     attrs:                       25.1.0-pyh71513ae_0                  conda-forge
     binutils_impl_linux-64:      2.43-h4bf12b8_2                      conda-forge
     binutils_linux-64:           2.43-h4852527_2                      conda-forge
-    boto3:                       1.36.6-pyhd8ed1ab_0                  conda-forge
-    botocore:                    1.36.6-pyge310_1234567_0             conda-forge
+    boto3:                       1.36.13-pyhd8ed1ab_0                 conda-forge
+    botocore:                    1.36.13-pyge310_1234567_0            conda-forge
     brotli-python:               1.1.0-py312h2ec8cdc_2                conda-forge
     bzip2:                       1.0.8-h4bc722e_7                     conda-forge
-    ca-certificates:             2024.12.14-hbcca054_0                conda-forge
+    ca-certificates:             2025.1.31-hbcca054_0                 conda-forge
     cairo:                       1.18.2-h3394656_1                    conda-forge
     cffi:                        1.17.1-py312h06ac9bb_0               conda-forge
     click:                       8.1.8-pyh707e725_0                   conda-forge
@@ -29,7 +32,6 @@ The following NEW packages will be INSTALLED:
     cuda-cuobjdump:              12.6.77-hbd13f7d_1                   conda-forge
     cuda-cupti:                  12.6.80-hbd13f7d_0                   conda-forge
     cuda-driver-dev_linux-64:    12.6.77-h3f2d84a_0                   conda-forge
-    cuda-nvcc:                   12.6.85-hcdd1206_0                   conda-forge
     cuda-nvcc-dev_linux-64:      12.6.85-he91c749_0                   conda-forge
     cuda-nvcc-impl:              12.6.85-h85509e4_0                   conda-forge
     cuda-nvcc-tools:             12.6.85-he02047a_0                   conda-forge
@@ -42,9 +44,12 @@ The following NEW packages will be INSTALLED:
     cuda-nvvm-tools:             12.6.85-he02047a_0                   conda-forge
     cuda-version:                12.6-h7480c83_3                      conda-forge
     cudnn:                       9.3.0.75-h62a6f1c_2                  conda-forge
-    cusparselt:                  0.6.3.2-hdea8103_1                   conda-forge
+    cusparselt:                  0.7.0.0-hcd2ec93_0                   conda-forge
+    dbus:                        1.13.6-h5008d03_3                    conda-forge
+    epoxy:                       1.5.10-h166bdaf_1                    conda-forge
     exceptiongroup:              1.2.2-pyhd8ed1ab_1                   conda-forge
     execnet:                     2.1.1-pyhd8ed1ab_1                   conda-forge
+    expat:                       2.6.4-h5888daf_0                     conda-forge
     expecttest:                  0.3.0-pyhd8ed1ab_0                   conda-forge
     filelock:                    3.17.0-pyhd8ed1ab_0                  conda-forge
     font-ttf-dejavu-sans-mono:   2.37-hab24e00_0                      conda-forge
@@ -56,42 +61,48 @@ The following NEW packages will be INSTALLED:
     fonts-conda-forge:           1-0                                  conda-forge
     freetype:                    2.12.1-h267a509_2                    conda-forge
     fribidi:                     1.0.10-h36c2ea0_0                    conda-forge
-    fsspec:                      2024.12.0-pyhd8ed1ab_0               conda-forge
+    fsspec:                      2025.2.0-pyhd8ed1ab_0                conda-forge
     gcc_impl_linux-64:           13.3.0-hfea6d02_1                    conda-forge
     gcc_linux-64:                13.3.0-hc28eda2_7                    conda-forge
     gdk-pixbuf:                  2.42.12-hb9ae30d_0                   conda-forge
+    glib-tools:                  2.82.2-h4833e2c_1                    conda-forge
     gmp:                         6.3.0-hac33072_2                     conda-forge
     gmpy2:                       2.1.5-py312h7201bc8_3                conda-forge
     graphite2:                   1.3.13-h59595ed_1003                 conda-forge
-    graphviz:                    12.0.0-hba01fac_0                    conda-forge
-    gtk2:                        2.24.33-h8ee276e_7                   conda-forge
+    graphviz:                    12.2.1-h5ae0cbf_1                    conda-forge
+    gtk3:                        3.24.43-h021d004_3                   conda-forge
     gts:                         0.7.6-h977cf35_4                     conda-forge
     gxx_impl_linux-64:           13.3.0-hdbfa832_1                    conda-forge
     gxx_linux-64:                13.3.0-h6834431_7                    conda-forge
-    h2:                          4.1.0-pyhd8ed1ab_1                   conda-forge
+    h2:                          4.2.0-pyhd8ed1ab_0                   conda-forge
     harfbuzz:                    10.2.0-h4bba637_0                    conda-forge
+    hicolor-icon-theme:          0.17-ha770c72_2                      conda-forge
     hpack:                       4.1.0-pyhd8ed1ab_0                   conda-forge
     hyperframe:                  6.1.0-pyhd8ed1ab_0                   conda-forge
-    hypothesis:                  6.124.7-pyha770c72_0                 conda-forge
+    hypothesis:                  6.125.1-pyha770c72_0                 conda-forge
     icu:                         75.1-he02047a_0                      conda-forge
     iniconfig:                   2.0.0-pyhd8ed1ab_1                   conda-forge
     jinja2:                      3.1.5-pyhd8ed1ab_0                   conda-forge
     jmespath:                    1.0.1-pyhd8ed1ab_1                   conda-forge
     kernel-headers_linux-64:     3.10.0-he073ed8_18                   conda-forge
+    keyutils:                    1.6.1-h166bdaf_0                     conda-forge
+    krb5:                        1.21.3-h659f571_0                    conda-forge
     ld_impl_linux-64:            2.43-h712a8e2_2                      conda-forge
     lerc:                        4.0.0-h27087fc_0                     conda-forge
     libabseil:                   20240722.0-cxx17_hbbce691_4          conda-forge
-    libblas:                     3.9.0-26_linux64_mkl                 conda-forge
+    libblas:                     3.9.0-28_h2556b6b_mkl                conda-forge
     libcap:                      2.71-h39aace5_0                      conda-forge
-    libcblas:                    3.9.0-26_linux64_mkl                 conda-forge
-    libcublas:                   12.6.4.1-hbd13f7d_0                  conda-forge
+    libcblas:                    3.9.0-28_h372d94f_mkl                conda-forge
+    libcublas:                   12.6.4.1-h5888daf_1                  conda-forge
     libcudss0:                   0.4.0.2-he55f5cd_2                   conda-forge
     libcufft:                    11.3.0.4-hbd13f7d_0                  conda-forge
     libcufile:                   1.11.1.6-h12f29b5_4                  conda-forge
+    libcups:                     2.3.3-h4637d8d_4                     conda-forge
     libcurand:                   10.3.7.77-hbd13f7d_0                 conda-forge
-    libcusolver:                 11.7.1.2-hbd13f7d_0                  conda-forge
+    libcusolver:                 11.7.1.2-h5888daf_1                  conda-forge
     libcusparse:                 12.5.4.2-hbd13f7d_0                  conda-forge
     libdeflate:                  1.23-h4ddbbb0_0                      conda-forge
+    libedit:                     3.1.20250104-pl5321h7949ede_0        conda-forge
     libexpat:                    2.6.4-h5888daf_0                     conda-forge
     libffi:                      3.4.2-h7f98852_5                     conda-forge
     libgcc:                      14.2.0-h77fa898_1                    conda-forge
@@ -105,9 +116,9 @@ The following NEW packages will be INSTALLED:
     libhwloc:                    2.11.2-default_h0d58e46_1001         conda-forge
     libiconv:                    1.17-hd590300_2                      conda-forge
     libjpeg-turbo:               3.0.0-hd590300_1                     conda-forge
-    liblapack:                   3.9.0-26_linux64_mkl                 conda-forge
+    liblapack:                   3.9.0-28_hc41d3b0_mkl                conda-forge
     libllvm19:                   19.1.7-ha7bfdaf_1                    conda-forge
-    liblzma:                     5.6.3-hb9d3cd8_1                     conda-forge
+    liblzma:                     5.6.4-hb9d3cd8_0                     conda-forge
     libmagma:                    2.8.0-h566cb83_2                     conda-forge
     libnl:                       3.11.0-hb9d3cd8_0                    conda-forge
     libnsl:                      2.0.1-hd590300_0                     conda-forge
@@ -122,13 +133,14 @@ The following NEW packages will be INSTALLED:
     libstdcxx-ng:                14.2.0-h4852527_1                    conda-forge
     libsystemd0:                 257.2-h3dc2cb9_0                     conda-forge
     libtiff:                     4.7.0-hd9ff511_3                     conda-forge
-    libtorch:                    2.5.1-cuda126_mkl_haa0cf67_310       local
+    libtorch:                    2.5.1-cuda126_mkl_haa0cf67_312       local
     libudev1:                    257.2-h9a4d06a_0                     conda-forge
     libuuid:                     2.38.1-h0b41bf4_0                    conda-forge
     libuv:                       1.50.0-hb9d3cd8_0                    conda-forge
     libwebp-base:                1.5.0-h851e524_0                     conda-forge
     libxcb:                      1.17.0-h8a09558_0                    conda-forge
     libxcrypt:                   4.4.36-hd590300_1                    conda-forge
+    libxkbcommon:                1.8.0-hc4a0caf_0                     conda-forge
     libxml2:                     2.13.5-h8d12d68_1                    conda-forge
     libzlib:                     1.3.1-hb9d3cd8_2                     conda-forge
     llvm-openmp:                 19.1.7-h024ca30_0                    conda-forge
@@ -138,8 +150,8 @@ The following NEW packages will be INSTALLED:
     mpc:                         1.3.1-h24ddda3_1                     conda-forge
     mpfr:                        4.2.1-h90cbb55_3                     conda-forge
     mpmath:                      1.3.0-pyhd8ed1ab_1                   conda-forge
-    nccl:                        2.24.3.1-hb92ee24_0                  conda-forge
-    ncurses:                     6.5-h2d0b736_2                       conda-forge
+    nccl:                        2.25.1.1-ha44e49d_0                  conda-forge
+    ncurses:                     6.5-h2d0b736_3                       conda-forge
     networkx:                    3.4.2-pyh267e887_2                   conda-forge
     ninja:                       1.12.1-h297d8ca_0                    conda-forge
     numpy:                       2.2.2-py312h72c5963_0                conda-forge
@@ -147,7 +159,7 @@ The following NEW packages will be INSTALLED:
     packaging:                   24.2-pyhd8ed1ab_2                    conda-forge
     pango:                       1.56.1-h861ebed_0                    conda-forge
     pcre2:                       10.44-hba22ea6_2                     conda-forge
-    pip:                         24.3.1-pyh8b19718_2                  conda-forge
+    pip:                         25.0-pyh8b19718_0                    conda-forge
     pixman:                      0.44.2-h29eaf8c_0                    conda-forge
     pluggy:                      1.5.0-pyhd8ed1ab_1                   conda-forge
     pthread-stubs:               0.4-hb9d3cd8_1002                    conda-forge
@@ -162,32 +174,42 @@ The following NEW packages will be INSTALLED:
     python:                      3.12.8-h9e4cc4f_1_cpython            conda-forge
     python-dateutil:             2.9.0.post0-pyhff2d567_1             conda-forge
     python_abi:                  3.12-5_cp312                         conda-forge
-    pytorch:                     2.5.1-cuda126_mkl_py312_hdbe889e_310 local
+    pytorch:                     2.5.1-cuda126_mkl_py312_hdbe889e_312 local
     rdma-core:                   55.0-h5888daf_0                      conda-forge
     readline:                    8.2-h8228510_1                       conda-forge
     s3transfer:                  0.11.2-pyhd8ed1ab_0                  conda-forge
     setuptools:                  75.8.0-pyhff2d567_0                  conda-forge
     six:                         1.17.0-pyhd8ed1ab_0                  conda-forge
-    sleef:                       3.7-h1b44611_2                       conda-forge
-    sortedcontainers:            2.4.0-pyhd8ed1ab_0                   conda-forge
+    sleef:                       3.8-h1b44611_0                       conda-forge
+    sortedcontainers:            2.4.0-pyhd8ed1ab_1                   conda-forge
     sympy:                       1.13.3-pyh2585a3b_105                conda-forge
     sysroot_linux-64:            2.17-h0157908_18                     conda-forge
     tabulate:                    0.9.0-pyhd8ed1ab_2                   conda-forge
     tbb:                         2021.13.0-hceb3a55_1                 conda-forge
     tk:                          8.6.13-noxft_h4845f30_101            conda-forge
     tomli:                       2.2.1-pyhd8ed1ab_1                   conda-forge
-    triton:                      3.1.0-cuda126py312h776fbae_5         conda-forge
+    triton:                      3.1.0-cuda126py312h776fbae_6         conda-forge
     typing_extensions:           4.12.2-pyha770c72_1                  conda-forge
     tzdata:                      2025a-h78e105d_0                     conda-forge
     urllib3:                     2.3.0-pyhd8ed1ab_0                   conda-forge
+    wayland:                     1.23.1-h3e06ad9_0                    conda-forge
     wheel:                       0.45.1-pyhd8ed1ab_1                  conda-forge
+    xkeyboard-config:            2.43-hb9d3cd8_0                      conda-forge
     xmlrunner:                   1.7.7-py_0                           conda-forge
     xorg-libice:                 1.1.2-hb9d3cd8_0                     conda-forge
     xorg-libsm:                  1.2.5-he73a12e_0                     conda-forge
-    xorg-libx11:                 1.8.10-h4f16b4b_1                    conda-forge
+    xorg-libx11:                 1.8.11-h4f16b4b_0                    conda-forge
     xorg-libxau:                 1.0.12-hb9d3cd8_0                    conda-forge
+    xorg-libxcomposite:          0.4.6-hb9d3cd8_2                     conda-forge
+    xorg-libxcursor:             1.2.3-hb9d3cd8_0                     conda-forge
+    xorg-libxdamage:             1.1.6-hb9d3cd8_0                     conda-forge
     xorg-libxdmcp:               1.1.5-hb9d3cd8_0                     conda-forge
     xorg-libxext:                1.3.6-hb9d3cd8_0                     conda-forge
+    xorg-libxfixes:              6.0.1-hb9d3cd8_0                     conda-forge
+    xorg-libxi:                  1.8.2-hb9d3cd8_0                     conda-forge
+    xorg-libxinerama:            1.1.5-h5888daf_1                     conda-forge
+    xorg-libxrandr:              1.5.4-hb9d3cd8_0                     conda-forge
     xorg-libxrender:             0.9.12-hb9d3cd8_0                    conda-forge
+    xorg-libxtst:                1.2.5-hb9d3cd8_3                     conda-forge
     zstandard:                   0.23.0-py312hef9b889_1               conda-forge
     zstd:                        1.5.6-ha6fb4c9_0                     conda-forge
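
A diff of this size is easier to triage once it is split into added, removed and changed packages. The following is a minimal sketch, assuming the `+`/`-` rows keep the `name: version-build channel` layout shown above; `summarize_env_diff` and `ROW` are hypothetical helper names, not part of conda or the feedstock tooling.

```python
import re

# One package row of a unified diff over conda's install listing, e.g.
# "-    boto3:    1.36.6-pyhd8ed1ab_0    conda-forge"
ROW = re.compile(r"^([+-])\s+([^\s:]+):\s+(\S+)\s+(\S+)\s*$")


def summarize_env_diff(diff_text):
    """Split the +/- rows into added, removed and changed packages."""
    old, new = {}, {}
    for line in diff_text.splitlines():
        match = ROW.match(line)
        if not match:
            continue  # context rows, hunk headers and blank lines
        sign, name, spec, _channel = match.groups()
        (new if sign == "+" else old)[name] = spec
    # A package present on both sides changed version or build string.
    changed = {n: (old[n], new[n]) for n in old.keys() & new.keys()}
    added = {n: s for n, s in new.items() if n not in old}
    removed = {n: s for n, s in old.items() if n not in new}
    return added, removed, changed


# Rows copied from the diff above.
sample = """\
     cudnn:                       9.3.0.75-h62a6f1c_2                  conda-forge
-    cusparselt:                  0.6.3.2-hdea8103_1                   conda-forge
+    cusparselt:                  0.7.0.0-hcd2ec93_0                   conda-forge
+    dbus:                        1.13.6-h5008d03_3                    conda-forge
-    cuda-nvcc:                   12.6.85-hcdd1206_0                   conda-forge
"""
added, removed, changed = summarize_env_diff(sample)
print("added:", added)
print("removed:", removed)
print("changed:", changed)
```

The `changed` bucket is probably the most useful starting point here, since version bumps such as `cusparselt`, `nccl` or `libblas` separate real upgrades from the newly pulled-in GUI dependencies.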

@conda-forge-admin
Contributor

conda-forge-admin commented Feb 6, 2025

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe/meta.yaml) and found it was in an excellent condition.

I do have some suggestions for making it better though...

For recipe/meta.yaml:

  • ℹ️ The recipe is not parsable by parser conda-souschef (grayskull). This parser is not currently used by conda-forge, but may be in the future. We are collecting information to see which recipes are compatible with grayskull.
  • ℹ️ The recipe is not parsable by parser conda-recipe-manager. The recipe can only be automatically migrated to the new v1 format if it is parseable by conda-recipe-manager.

This message was generated by GitHub Actions workflow run https://github.com/conda-forge/conda-forge-webservices/actions/runs/13174999474. Examine the logs at this URL for more detail.

@h-vetinari
Member Author

Incredibly, this really seems to be MKL-specific somehow, as the openblas builds in #326 passed, while the MKL builds ran into the pytest error (same situation as the CI after merging #340).

Whatever "Channels" execnet is trying to use might somehow be getting occupied by MKL?

OSError: cannot send to <Channel id=3 closed>

CC @conda-forge/pytorch-cpu @mgorny @danpetry @rgommers @isuruf literally any ideas on what could be causing this interaction would be welcome.

@hmaarrfk
Contributor

hmaarrfk commented Feb 6, 2025

have you tried to uninstall pytest-xdist?

@danpetry
Contributor

danpetry commented Feb 6, 2025

try running without parallel testing?

@danpetry
Contributor

danpetry commented Feb 6, 2025

basically same thing

@danpetry
Contributor

danpetry commented Feb 6, 2025

or maybe getting some more verbose logs from pytest-xdist to see why channel 3 is closing? OOM issue maybe...?

@jakirkham
Member

Agree with others. Would start with cutting everything test-wise to 1 thread & 1 process

OOM may just be another way of saying oversubscription from parallelism

Also other BLAS libraries have their own kinds of parallelism that may need to be disabled. Usually this can be set with an environment variable

@hmaarrfk
Contributor

hmaarrfk commented Feb 7, 2025

On my wimpy machine, I have to set:

OMP_NUM_THREADS=2
MKL_NUM_THREADS=2

to avoid using "energy efficient" cores on my laptop so that pytorch actually runs faster... might help here.
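
For a CI run, the same limits could be applied to the test process only. This is a sketch under assumptions, not the feedstock's actual test setup: `single_threaded_env` is a hypothetical helper, `OPENBLAS_NUM_THREADS` is included for the openblas variants, and the `test/` path in the comment is a placeholder.

```python
import os


def single_threaded_env(base=None, threads=1):
    """Return a copy of the environment with OpenMP/BLAS thread pools capped."""
    env = dict(os.environ if base is None else base)
    for var in ("OMP_NUM_THREADS", "MKL_NUM_THREADS", "OPENBLAS_NUM_THREADS"):
        env[var] = str(threads)
    return env


env = single_threaded_env(base={"PATH": "/usr/bin"}, threads=2)
print(env)

# The tests could then be launched under these limits, for example:
#   subprocess.run([sys.executable, "-m", "pytest", "-n", "0", "test/"], env=env)
# "-n 0" asks pytest-xdist to run without worker processes.
```

Passing the dictionary via `env=` keeps the limits scoped to the test subprocess instead of changing them for the whole build script.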

@h-vetinari
Member Author

Sure, I'm trying the reduced parallelism route (#346), but that's no explanation why the very same parallel invocation stopped working, much less only on MKL together with CUDA on linux.

(Win+CUDA+MKL is fine, linux+CUDA+openblas is fine, linux+CPU+MKL is fine)

@danpetry
Contributor

danpetry commented Feb 7, 2025

I was actually thinking OOM (out of RAM) rather than out of threads, but just an idea. Apparently MKL uses more RAM than openBLAS. Wonder if the failure is deterministic, i.e. is it same test each time?
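
One rough way to check the memory hypothesis would be to record the peak resident set size of the test run. The sketch below assumes a Linux runner, where `ru_maxrss` is reported in KiB; `peak_child_rss_kib` is a hypothetical helper, and the child command is a stand-in for the real pytest invocation.

```python
import resource
import subprocess
import sys


def peak_child_rss_kib(cmd):
    """Run cmd to completion and return (exit code, peak child RSS in KiB)."""
    completed = subprocess.run(cmd)
    # RUSAGE_CHILDREN covers terminated, waited-for descendants; ru_maxrss is
    # the largest single process among them, not the sum over all workers.
    usage = resource.getrusage(resource.RUSAGE_CHILDREN)
    return completed.returncode, usage.ru_maxrss


# Stand-in workload: a child that allocates roughly 50 MB.
code, peak = peak_child_rss_kib([sys.executable, "-c", "x = bytearray(50_000_000)"])
print(f"exit code {code}, peak RSS {peak} KiB")
```

Because the value is a per-process maximum, it would understate the total when several xdist workers run side by side, so it is only a lower bound on the memory pressure.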

@h-vetinari
Member Author

Wonder if the failure is deterministic, i.e. is it same test each time?

It has been deterministic, but in the test collection phase (rather than for any identifiable individual test), which makes it implausible to me that it's due to an OOM. In any case, I opened #348 so we can centralize the discussion on this that's become scattered over a bunch of PRs.

If the removal of pytest-xdist ends up working, I'll merge the respective PRs for 2.5 & 2.6, and we can come back to figuring out this problem with less time pressure (because going back to the last working commit did in fact turn the CI green again, see #348).

@h-vetinari h-vetinari closed this Feb 9, 2025
@h-vetinari h-vetinari deleted the big_hammer branch February 9, 2025 22:46