
pytorch v2.6.0 #326

Merged: 57 commits into conda-forge:main from the 2.6 branch, Feb 13, 2025

Conversation

h-vetinari
Member

Build the release candidates

Linux CI cancelled until builds for #322 are live

@conda-forge-admin
Contributor

conda-forge-admin commented Jan 18, 2025

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe/meta.yaml) and found it was in an excellent condition.

I do have some suggestions for making it better though...

For recipe/meta.yaml:

  • ℹ️ The recipe is not parsable by parser conda-souschef (grayskull). This parser is not currently used by conda-forge, but may be in the future. We are collecting information to see which recipes are compatible with grayskull.
  • ℹ️ The recipe is not parsable by parser conda-recipe-manager. The recipe can only be automatically migrated to the new v1 format if it is parseable by conda-recipe-manager.

This message was generated by GitHub Actions workflow run https://github.com/conda-forge/conda-forge-webservices/actions/runs/13297069508. Examine the logs at this URL for more detail.

@h-vetinari
Member Author

This looks better than expected so far. Still have to double-check the dependency changes. Happy if someone could do that (even if it's just noting which bounds changed relative to the current recipe).

@h-vetinari
Member Author

Aarch builds fail with

```
$SRC_DIR/third_party/XNNPACK/src/reference/unary-elementwise.cc:125:14: error: invalid 'static_cast' from type 'xnn_bfloat16' to type '_Float16'
  125 |       return static_cast<TOut>(x);
      |              ^~~~~~~~~~~~~~~~~~~~
```

@h-vetinari
Member Author

Sigh, since when is conda-build applying patches through git rather than through patch? The former is stricter than the latter, and doesn't work in some situations (like applying a patch in one of the submodules):

```
Applying patch: /Users/runner/work/1/s/recipe/patches_submodules/0001-Fix-bazel-linux-aarch64-gcc13-workflow-and-resolve-a.patch
Applying: Fix `bazel-linux-aarch64-gcc13` workflow and resolve accompanying build errors.
error: sha1 information is lacking or useless (third_party/XNNPACK/src/reference/unary-elementwise.cc).
error: could not build fake ancestor
```
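
For context, a rough sketch (Python, purely illustrative; not conda-build's actual implementation) of the difference: git's 3-way apply wants the pre-image blob hashes recorded in the patch, which aren't available for files living inside a submodule checkout, while classic `patch` only matches on the hunk context:

```python
import subprocess
from pathlib import Path

def apply_patch(patch_file: Path, src_dir: Path) -> None:
    """Hypothetical helper: try git first, fall back to patch(1)."""
    try:
        # `git apply --3way` needs the pre-image blobs referenced in the patch;
        # for sources inside a submodule those blobs aren't in the parent repo,
        # which is what produces "could not build fake ancestor".
        subprocess.run(["git", "apply", "--3way", str(patch_file)],
                       cwd=src_dir, check=True)
    except subprocess.CalledProcessError:
        # patch(1) only needs the surrounding context lines to match.
        subprocess.run(["patch", "-p1", "-i", str(patch_file)],
                       cwd=src_dir, check=True)
```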

@h-vetinari force-pushed the 2.6 branch 2 times, most recently from bd0bec7 to 022f063 on January 20, 2025 08:03
otherwise conda breaks
```
conda_build.exceptions.RecipeError: Mismatching hashes in recipe. Exact pins in dependencies that contribute to the hash often cause this. Can you change one or more exact pins to version bound constraints?
Involved packages were:
Mismatching package: libtorch (id cpu_generic_habf3c96_0); dep: libtorch 2.6.0.rc7 *0; consumer package: pytorch
```
@danpetry
Contributor

on osx-64

```
FAILED [0.0518s] test/test_nn.py::TestNN::test_batchnorm_nhwc_cpu - AssertionError: Tensor-likes are not close!

Mismatched elements: 1 / 8 (12.5%)
Greatest absolute difference: 1.430511474609375e-05 at index (0,) (up to 1e-05 allowed)
Greatest relative difference: 4.616721980710281e-06 at index (0,) (up to 1.3e-06 allowed)
```

skip?
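
(For reference: the check behind "Tensor-likes are not close!" is roughly `|actual - expected| <= atol + rtol * |expected|`. A quick sketch with the numbers printed in the failure above, nothing recomputed, shows how marginal the osx-64 violation is:)

```python
# Worst-case differences reported by the failing test, together with the
# tolerances it allows ("up to ... allowed" in the output above).
abs_diff, abs_tol = 1.430511474609375e-05, 1e-05
rel_diff, rel_tol = 4.616721980710281e-06, 1.3e-06

# The mismatch is a single element out of 8, exceeding the thresholds by only
# a few 1e-6, i.e. a marginal accuracy violation rather than wrong results.
print(f"absolute excess: {abs_diff - abs_tol:.3e}")  # ~4.3e-06
print(f"relative excess: {rel_diff - rel_tol:.3e}")  # ~3.3e-06
```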

@h-vetinari
Member Author

h-vetinari commented Jan 23, 2025

Yeah, this minor accuracy violation indeed sounds skippable, but I've deprioritised this PR until we get the windows builds for 2.5 fixed (and ideally your #318 merged as well).

@danpetry
Contributor

ok, good to know

@danpetry
Contributor

worth pointing out that in 6 days' time, PyPI will have an up-to-date pytorch package whereas conda won't. Will have a look at that other PR.

@h-vetinari
Member Author

worth pointing out that in 6 days' time, PyPI will have an up-to-date pytorch package whereas conda won't.

Are you talking about rc's, or are we not looking at the same index? 2.6.0 GA hasn't been published AFAICT. Or are you saying that 2.6.0 will be released in 6 days?

In any case, this is no reason to rush. We didn't have windows packages for years, and I'm more concerned about fixing them than about lagging behind the PyPI release a bit (and we've often lagged for months in the past; this has gotten much better with the open-gpu server, but it still happens; 2.5.0 was released Oct 18th last year, and we had first builds on Nov 3rd).

@danpetry
Contributor

are you saying that 2.6.0 will be released in 6 days

yes

I'm more concerned about fixing them, than lagging behind the PyPI release a bit

100%

@danpetry
Contributor

danpetry commented Jan 28, 2025

I think https://github.com/conda-forge/triton-feedstock might need to be updated as part of this. Some info on which commit to use here: pytorch/pytorch#145120 (comment)

@danpetry (Contributor) left a comment

I had a look over it and can't see anything I disagree with.
Agree that the tests and also the patches are getting quite extensive. Suggest that we work to reduce that in future PRs perhaps.

@h-vetinari
Member Author

I was just saying in general, it feels like we're skipping too many tests.

I don't think it's that bad TBH. We're skipping 35 tests (out of 8000+), across all architectures, and the vast majority of those skips are specific to one or two variants in our matrix. That's roughly 0.1-0.2% of tests getting skipped on a given platform.
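
(Rough arithmetic behind that estimate; the assumption that roughly 8-16 of the 35 skips apply to any single variant is mine, only the totals come from the numbers above:)

\[ \frac{8}{8000} = 0.1\% \qquad \frac{16}{8000} = 0.2\% \]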

Agree that the tests and also the patches are getting quite extensive. Suggest that we work to reduce that in future PRs perhaps.

💯 I've started an effort to reduce the skips (see #353 & #353). Reducing the patches would need a dedicated effort on the part of the respective authors to upstream their fix. I'll try upstreaming the find_package(CUDA) patch, but it's a non-trivial effort, and even comparatively simple ones like pytorch/pytorch#137084 are still open months later. OTOH, I did structure my include-path manipulation patches in a way that I could at least upstream pytorch/pytorch#145480.

Overall, both the patches and the skips aim to be the minimal set necessary to successfully build and test pytorch. I'd love to see them reduced as fast as possible, but for now I don't have a better alternative.

@h-vetinari
Member Author

CI will finish in about 7h - could I get some approvals or reviews by then? 🙏

Or at least a written "ok to proceed"? 🙃

@danpetry
Contributor

For now, I haven't reviewed in enough depth to be able to give an approval, I think - could do that tomorrow if it's needed.
Otherwise: ok to proceed from me.

mgorny and others added 3 commits February 12, 2025 13:30
Co-Authored-By: H. Vetinari <h.vetinari@gmx.com>
Suggested-By: Michał Górny <mgorny@gentoo.org>
@h-vetinari
Member Author

Oh boy, nevermind. Something is very wrong on linux+CUDA:

```
= 191 failed, 14397 passed, 2763 skipped, 91 xfailed, 143695 warnings in 4444.09s (1:14:04) =
```

It seems that many (56) are due to RuntimeError: cuDNN error: CUDNN_STATUS_INTERNAL_ERROR_HOST_ALLOCATION_FAILED, so basically: running out of memory. Even more (130) are due to RuntimeError: GET was unable to find an engine to execute this computation, which probably has the same underlying reason.
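
A quick way to check the memory-pressure theory on the CI agent would be something along these lines (an illustrative sketch only, not part of the test suite):

```python
import torch

# Print allocator statistics for the current GPU; if reserved/allocated memory
# is close to the device total when the cuDNN errors start appearing, the
# "host allocation failed" / "unable to find an engine" failures are most
# likely plain memory exhaustion rather than a packaging problem.
def report_cuda_memory() -> None:
    dev = torch.cuda.current_device()
    total = torch.cuda.get_device_properties(dev).total_memory
    allocated = torch.cuda.memory_allocated(dev)
    reserved = torch.cuda.memory_reserved(dev)
    print(f"allocated: {allocated / 2**30:.2f} GiB")
    print(f"reserved:  {reserved / 2**30:.2f} GiB")
    print(f"total:     {total / 2**30:.2f} GiB")

if torch.cuda.is_available():
    report_cuda_memory()
```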

@h-vetinari
Member Author

There's another failure that looks like it could have been due to the OOMs, but surprisingly, it fails in exactly the same way on MKL/openblas, and with exactly the same (catastrophic) accuracy violation:

```
____________________ TestNN.test_Conv3d_1x1x1_no_bias_cuda _____________________
[...]
E                   AssertionError: Tensor-likes are not close!
E                   
E                   Mismatched elements: 6 / 6 (100.0%)
E                   Greatest absolute difference: 18.49855402601494 at index (1, 1, 0, 0, 0) (up to 0.0002 allowed)
E                   Greatest relative difference: 211.43751630108665 at index (0, 0, 0, 0, 0) (up to 0 allowed)
```

Full stacktrace:

```
____________________ TestNN.test_Conv3d_1x1x1_no_bias_cuda _____________________
[gw1] linux -- Python 3.10.16 $PREFIX/bin/python

self = <test_nn.TestNN testMethod=test_Conv3d_1x1x1_no_bias_cuda>
test = <torch.testing._internal.common_nn.NewModuleTest object at 0x7fc5465f89a0>
kwargs = {}

    def with_tf32_off(self, test=test, kwargs=kwargs):
        with tf32_off():
>           test.test_cuda(self, **kwargs)

test/test_nn.py:7322: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <torch.testing._internal.common_nn.NewModuleTest object at 0x7fc5465f89a0>
test_case = <test_nn.TestNN testMethod=test_Conv3d_1x1x1_no_bias_cuda>

    def test_cuda(self, test_case):
        if not TEST_CUDA or not self.should_test_cuda:
            raise unittest.SkipTest('Excluded from CUDA tests')
    
        with set_default_dtype(self.default_dtype):
            cpu_input = self._get_input()
    
            type_map = {torch.double: torch.float}
            cpu_input_tuple = cpu_input if isinstance(cpu_input, tuple) else (cpu_input,)
    
            is_any_input_complex = any(isinstance(t, torch.Tensor) and t.dtype.is_complex for t in cpu_input_tuple)
    
            gpu_input_tuple = to_gpu(cpu_input_tuple, type_map=type_map)
    
            cpu_module = self.constructor(*self.constructor_args)
            gpu_module = self.constructor(*self.constructor_args).float().cuda()
            cpu_param = test_case._get_parameters(cpu_module)
            gpu_param = test_case._get_parameters(gpu_module)
            for cpu_p, gpu_p in zip(cpu_param[0], gpu_param[0]):
                gpu_p.data.copy_(cpu_p)
    
            test_case._zero_grad_input(cpu_input_tuple)
            test_case._zero_grad_input(gpu_input_tuple)
            test_case._zero_grad_parameters(cpu_module)
            test_case._zero_grad_parameters(gpu_module)
            cpu_output = test_case._forward(cpu_module, cpu_input_tuple)
            gpu_output = test_case._forward(gpu_module, gpu_input_tuple)
            if getattr(cpu_module, "return_indices", False):
                cpu_output = cpu_output[0]
                gpu_output = gpu_output[0]
            test_case.assertEqual(cpu_output, gpu_output, atol=self.precision, rtol=0, exact_dtype=False)
    
            # Run backwards on CPU and GPU and compare results
            for _ in range(5):
                cpu_gradOutput = cpu_output.clone().normal_()
                gpu_gradOutput = cpu_gradOutput.type_as(gpu_output)
                cpu_gradInput = test_case._backward(cpu_module, cpu_input_tuple, cpu_output, cpu_gradOutput)
                gpu_gradInput = test_case._backward(gpu_module, gpu_input_tuple, gpu_output, gpu_gradOutput)
                test_case.assertEqual(cpu_gradInput, gpu_gradInput, atol=self.precision, rtol=0, exact_dtype=False)
                for cpu_d_p, gpu_d_p in zip(cpu_param[1], gpu_param[1]):
                    test_case.assertEqual(cpu_d_p, gpu_d_p, atol=self.precision, rtol=0)
    
            # Run double-backwards on CPU and GPU and compare results
            if self.check_gradgrad and not self.FIXME_no_cuda_gradgrad_comparison:
                cpu_output = cpu_module(*cpu_input_tuple)
                gpu_output = gpu_module(*gpu_input_tuple)
                if getattr(cpu_module, "return_indices", False):
                    cpu_output = cpu_output[0]
                    gpu_output = gpu_output[0]
    
                cpu_gradOutput = torch.randn_like(cpu_output, requires_grad=True)
                gpu_gradOutput = cpu_gradOutput.type_as(gpu_output).detach()
                gpu_gradOutput.requires_grad = True
    
                cpu_gradInputs = torch.autograd.grad(
                    cpu_output,
                    cpu_input_tuple + tuple(cpu_module.parameters()),
                    cpu_gradOutput,
                    create_graph=True)
                gpu_gradInputs = torch.autograd.grad(
                    gpu_output,
                    gpu_input_tuple + tuple(gpu_module.parameters()),
                    gpu_gradOutput,
                    create_graph=True)
    
                for cpu_d_i, gpu_d_i in zip(cpu_gradInputs, gpu_gradInputs):
>                   test_case.assertEqual(cpu_d_i, gpu_d_i, atol=self.precision, rtol=0, exact_dtype=False)
E                   AssertionError: Tensor-likes are not close!
E                   
E                   Mismatched elements: 6 / 6 (100.0%)
E                   Greatest absolute difference: 18.49855402601494 at index (1, 1, 0, 0, 0) (up to 0.0002 allowed)
E                   Greatest relative difference: 211.43751630108665 at index (0, 0, 0, 0, 0) (up to 0 allowed)
```
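
A standalone approximation of the failing comparison (a sketch under assumptions: single precision on both devices, a hand-picked input shape, and the atol/rtol from the failure message; it is not the NewModuleTest machinery itself and needs a CUDA device to run):

```python
import torch

# The test runs under tf32_off(), so disable TF32 for the comparison as well.
torch.backends.cudnn.allow_tf32 = False

torch.manual_seed(0)
conv_cpu = torch.nn.Conv3d(2, 2, kernel_size=1, bias=False)
conv_gpu = torch.nn.Conv3d(2, 2, kernel_size=1, bias=False).cuda()
conv_gpu.load_state_dict(conv_cpu.state_dict())  # identical weights on both devices

x_cpu = torch.randn(2, 2, 3, 4, 5, requires_grad=True)
x_gpu = x_cpu.detach().cuda().requires_grad_(True)

out_cpu, out_gpu = conv_cpu(x_cpu), conv_gpu(x_gpu)

# Mirror the double-backward check: first-order grads with create_graph=True.
g_cpu = torch.randn_like(out_cpu, requires_grad=True)
g_gpu = g_cpu.detach().cuda().requires_grad_(True)
grads_cpu = torch.autograd.grad(
    out_cpu, (x_cpu, *conv_cpu.parameters()), g_cpu, create_graph=True)
grads_gpu = torch.autograd.grad(
    out_gpu, (x_gpu, *conv_gpu.parameters()), g_gpu, create_graph=True)

for c, g in zip(grads_cpu, grads_gpu):
    # atol=2e-4 / rtol=0 are the tolerances reported in the failure above.
    torch.testing.assert_close(c, g.cpu(), atol=2e-4, rtol=0)
```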

@danpetry
Contributor

A lot of the skips look like they're there to cope with CI limitations, basically

@h-vetinari
Member Author

A lot of the skips look like they're there to cope with CI limitations, basically

Sure, some portion of those are unavoidable, but basically all skips point at an issue that should be fixed in our packaging, or ideally upstream (in the test framework and/or the actual implementation). I kinda prefer having that transparency (and a reminder that there's more work to be done) to simply removing the test modules in question and declaring it a done deal.

For example, the torchinductor tests you added in #318 were painful to get passing, but they did show that the torch.compile feature had essentially been broken in our packaging. I prefer a longer list of skips to a less functional package that ignores problematic test modules wholesale. Of course, the best course is to work through the issues surfaced by the skips, but that needs to happen on a longer timeline IMO, not while we're trying to get the release out.
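
For illustration, a smoke test of roughly the kind that surfaces such breakage (a minimal sketch, not the actual torchinductor test module added in #318):

```python
import torch

# torch.compile goes through torchinductor by default; if the packaging is
# missing pieces (e.g. triton, headers, or the C++ toolchain hook-up), this
# typically raises during compilation, which a plain eager-mode test run
# would never exercise.
@torch.compile
def fused(x: torch.Tensor) -> torch.Tensor:
    return torch.sin(x) + 2.0 * x

x = torch.randn(16)
torch.testing.assert_close(fused(x), torch.sin(x) + 2.0 * x)
```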

@h-vetinari
Member Author

Since the linux64+openblas+CUDA job had already passed for f14feb5 (with no essential changes since then), the only thing left to do is to remove the `blas_impl == "mkl"` part from the selector of this line:

```
{% set skips = skips ~ " or test_reentrant_parent_error_on_cpu_cuda)" %}  # [linux and blas_impl == "mkl" and cuda_compiler_version != "None"]
# non-MKL problems
```

As such, I'm not planning to do a new push here (will shift the skip when merging). I'm planning to merge this in ~9-12 hours unless there are other comments @conda-forge/pytorch-cpu; I don't doubt that we'll keep iterating, so even post-merge comments won't take long to get picked up.

h-vetinari added a commit that referenced this pull request Feb 13, 2025
@h-vetinari merged commit a75ba71 into conda-forge:main on Feb 13, 2025
26 of 27 checks passed
@h-vetinari deleted the 2.6 branch on February 13, 2025 19:25
@danpetry
Contributor

🎉 thanks a lot for the huge amount of work on this over the last weeks @h-vetinari . This is really great.
