
Recipe overhaul: more tests & documentation, various clean-ups #298

Merged (67 commits) on Dec 25, 2024

Conversation

mgorny
Contributor

@mgorny mgorny commented Dec 4, 2024

Checklist

  • Used a personal fork of the feedstock to propose changes
  • Bumped the build number (if the version is unchanged)
  • Reset the build number to 0 (if the version changed)
  • Re-rendered with the latest conda-smithy (Use the phrase @conda-forge-admin, please rerender in a comment in this PR for automated rerendering)
  • Ensured the license file is being packaged.
Old description until Dec 23, 2024

From the commit message:

Upstream keeps all magma-related routines in a separate
libtorch_cuda_linalg library that is loaded dynamically whenever linalg
functions are used.  Given the library is relatively small, splitting it
makes it possible to provide "magma" and "nomagma" variants that can
be alternated between.

Also:

Try to speed up magma/nomagma builds a bit.  Rather than rebuilding
the package 3 times (possibly switching magma → nomagma → magma again),
build it twice at the very beginning and store the built files for later
reuse in subpackage builds.

While at it, replace the `pip wheel` calls with `setup.py build` to
avoid unnecessarily zipping up and then unpacking the whole thing.
In the end, we are only grabbing a handful of files for the `libtorch*`
packages, and they are in a predictable location in the build directory.
`pip install` is still used for the final `pytorch` builds.

In this PR's implementation, we've chosen to prioritize the magma build for those with GPUs. We have done so by using both track_features on the nomagma variant and an increased build number for the magma build, which should help the different solvers naturally prefer the magma build.
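Roughly, the mechanism looks like this in recipe terms (a hedged sketch; the feature name and build numbers are illustrative, not the recipe as merged):

# nomagma variant: carries a track_features entry (illustrative name), which
# solvers treat as a penalty, so it is only picked when requested explicitly
build:
  number: 0
  track_features:
    - pytorch-nomagma

# magma variant: no track_features and a bumped build number, so a plain
# "conda install pytorch" resolves to it by default
build:
  number: 1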

Fixes #275

Due to the desire to decrease the maintenance burden, and the fact that we were able to reduce the compiled size of the libmagma package, we decided to abandon the original goal of making magma optional.

Instead, this PR has shifted its focus to refactoring the original recipe, adding tests and documentation.

@isuruf helped me a lot with this, particularly with refactoring the builds, so both variants are built in one run.

mgorny and others added 3 commits December 4, 2024 19:13
Upstream keeps all magma-related routines in a separate
libtorch_cuda_linalg library that is loaded dynamically whenever linalg
functions are used.  Given the library is relatively small, splitting it
makes it possible to provide "magma" and "nomagma" variants that can
be alternated between.

Fixes conda-forge#275

Co-authored-by: Isuru Fernando <ifernando@quansight.com>
Try to speed up magma/nomagma builds a bit.  Rather than rebuilding
the package 3 times (possibly switching magma → nomagma → magma again),
build it twice at the very beginning and store the built files for later
reuse in subpackage builds.

While at it, replace the `pip wheel` calls with `setup.py build` to
avoid unnecessarily zipping up and then unpacking the whole thing.
In the end, we are only grabbing a handful of files for the `libtorch*`
packages, and they are in a predictable location in the build directory.
`pip install` is still used for the final `pytorch` builds.
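In build-script terms, the change amounts to roughly the following (a hedged sketch rather than the literal diff; the build-tree path is an assumption, see build_common.sh for the real logic):

# Before: build a wheel, only to unpack it again for a handful of files
# pip wheel . --no-deps --no-build-isolation -w dist/

# After: for the libtorch* outputs, build in place...
python setup.py build

# ...and copy the needed libraries straight out of the build tree
# (exact path is an assumption)
cp build/lib*/torch/lib/libtorch*.so* "$PREFIX/lib/"

# The final pytorch output still installs normally
pip install . --no-deps --no-build-isolation -v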
@conda-forge-admin
Contributor

conda-forge-admin commented Dec 4, 2024

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe/meta.yaml) and found it was in an excellent condition.

I do have some suggestions for making it better though...

For recipe/meta.yaml:

  • ℹ️ It looks like the 'libtorch-cuda-linalg' output doesn't have any tests.
  • ℹ️ The recipe is not parsable by parser conda-souschef (grayskull). Your recipe may not receive automatic updates and/or may not be compatible with conda-forge's infrastructure. Please check the logs for more information and ensure your recipe can be parsed.
  • ℹ️ The recipe is not parsable by parser conda-recipe-manager. Your recipe may not receive automatic updates and/or may not be compatible with conda-forge's infrastructure. Please check the logs for more information and ensure your recipe can be parsed.

This message was generated by GitHub Actions workflow run https://github.com/conda-forge/conda-forge-webservices/actions/runs/12202823840. Examine the logs at this URL for more detail.

@h-vetinari
Member

Since this is still WIP, I started only a single job on linux for now (x64+MKL+CUDA)

@mgorny
Contributor Author

mgorny commented Dec 5, 2024

Since this is still WIP, I started only a single job on linux for now (x64+MKL+CUDA)

Thanks. It seems to have failed only because it's adding new outputs. Do you want me to file the admin request for allowing libtorch-cuda-linalg, or do you prefer reviewing the changes first?

@h-vetinari
Member

Do you want me to file the admin request for allowing libtorch-cuda-linalg

That would be great!

@mgorny
Contributor Author

mgorny commented Dec 6, 2024

Filed as conda-forge/admin-requests#1209.

@mgorny mgorny marked this pull request as ready for review December 6, 2024 16:56
The test currently refused to even start, since not all dependencies
were satisfied.
Put all the rules in a single file.  In the end, build_common.sh
has pytorch-conditional code at the very end anyway, and keeping
the code split like this only makes it harder to notice mistakes.
recipe/README.md Outdated
that are independent of selected Python version and are therefore shared
by all Python versions.

2. `libtorch-cuda-linalg` that provides the shared `libtorch_cuda_linalg.so`
Contributor

is this the preferred one?

as in, is it preferred that users make use of linalg?

I ask because we try very hard to ensure that

mamba install pytorch

installs the best hardware-optimized one.

Contributor Author
@mgorny mgorny Dec 8, 2024

Not sure I understand the question.

libtorch_cuda_linalg.so is required for some operations. It is always pulled in by libtorch itself, i.e. the end result is that some version of the library is always installed.

As for magma vs. nomagma, I think we ought to prefer magma. #275 (comment) suggests that cusolver will be faster for some workflows, but the magma build supports both, and according to https://pytorch.org/docs/stable/backends.html#torch.backends.cuda.preferred_linalg_library, it has a heuristic to choose the faster backend for a given operation.
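For reference, the backend selection is exposed in Python; a minimal sketch, assuming a CUDA-enabled build (and "magma" only being usable in the magma variant):

import torch

# "default" lets pytorch's heuristic pick between cusolver and magma per operation
print(torch.backends.cuda.preferred_linalg_library())

# users who know their workload can pin a backend explicitly
torch.backends.cuda.preferred_linalg_library("cusolver")  # or "magma"

a = torch.randn(512, 512, device="cuda")  # assumes a CUDA device is available
torch.linalg.inv(a)  # dispatched through libtorch_cuda_linalg.so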

Contributor

So who benefits from making this optional?

Sorry if this is obvious.

Contributor Author

This is #298, i.e. people who want to avoid installing the large magma dependency (~250 MB as a package). Given that libtorch-cuda-linalg is only 200 kB (nomagma) / 245 kB (magma), I figured it's worth it.


The no-magma variant reduces the on-disk install size of a pytorch-gpu install from 7 GB to 5 GB (as #275 says). So this is definitely worth doing.

Given that cusolver is now on average fairly close in performance to magma and that the PyTorch wheels don't include magma, I'd argue that the default should be nomagma (I'd certainly almost always want that for my use cases). However, which default we pick is less important than having both options available.

Member

Somehow missed the last ping here. Thanks for the info, you certainly know pytorch much better than I do.

Why do you think that linking libmagma statically will help?

Presumably those 10 pytorch functions do not make use of the entirety of the 2GB libmagma. If they were compiled against a static build, the impact would be much smaller.

One option that's different from upstream is conda-forge/libmagma-feedstock#22

That's even better though! :)

Member
@jakirkham jakirkham Dec 18, 2024

Including this bit from that (merged) PR thread:

Me:

Did you measure the difference between the two approaches? Would be interested to know how they compare

Isuru:

2GB -> 620 MB

xref: conda-forge/libmagma-feedstock#22 (comment)


IOW, a nearly 70% reduction in size. Perhaps it is worth rechecking the analysis given this change? The biggest contributor might be something else now.


It looks even better on my machine after @isuruf's compression fix:

$ mamba create -n pytorch-gpu-19dec2024 pytorch-gpu
$ cd ~/mambaforge/envs/pytorch-gpu-19dec2024/
$ du -hsx * | sort -rh | head -3
4,1G    lib
1,6G    targets
69M     include

$ du -hsx lib/* | sort -rh | head -10
1,4G    lib/libtorch_cuda.so
432M    lib/libcudnn_engines_precompiled.so.9.3.0
296M    lib/python3.13
293M    lib/libmagma.so
276M    lib/libtorch_cpu.so
252M    lib/libnccl.so.2.23.4
233M    lib/libcudnn_adv.so.9.3.0
104M    lib/libcudnn_ops.so.9.3.0
91M     lib/libmagma_sparse.so
68M     lib/libmkl_core.so.2

$ du -hsx targets/x86_64-linux/lib/* | sort -rh | head -10
469M    targets/x86_64-linux/lib/libcublasLt.so.12.6.4.1
280M    targets/x86_64-linux/lib/libcusparse.so.12.5.4.2
266M    targets/x86_64-linux/lib/libcufft.so.11.3.0.4
146M    targets/x86_64-linux/lib/libcusolver.so.11.7.1.2
104M    targets/x86_64-linux/lib/libcublas.so.12.6.4.1
92M     targets/x86_64-linux/lib/libcurand.so.10.3.7.77
86M     targets/x86_64-linux/lib/libcusolverMg.so.11.7.1.2
56M     targets/x86_64-linux/lib/libnvrtc.so.12.6.85
49M     targets/x86_64-linux/lib/libnvJitLink.so.12.6.85
5,1M    targets/x86_64-linux/lib/libnvrtc-builtins.so.12.6.85

So 385 MB left for libmagma.so + libmagma_sparse.so.

I'll also note that there are two linux-64/libmagma packages, and the CUDA 12 one seems to be a lot smaller than the CUDA 11 one. There are no interpretable build strings, but comparing package hashes with the logs from conda-forge/libmagma-feedstock#22 shows that CUDA 11 is the larger one. The analysis above uses the h7847c38_1 package.


Contributor Author

Well, I'm not opposed to reverting the magma parts. Just let me know if I should do it.


Well, I'm not opposed to reverting the magma parts. Just let me know if I should do it.

For completeness here: Isuru moved this decision conversation to gh-275, which seems like the right thing to do for visibility.

@conda-forge-admin
Contributor

Hi! This is the friendly automated conda-forge-linting service.

I was trying to look for recipes to lint for you, but it appears we have a merge conflict. Please try to merge or rebase with the base branch to resolve this conflict.

Please ping the 'conda-forge/core' team (using the @ notation in a comment) if you believe this is a bug.

@mgorny
Contributor Author

mgorny commented Dec 8, 2024

I've added an explanation in the README, as well as a fix to install the libtorch_python symlink, and the missing test dependencies. Tests don't work yet, but I'm well on the way to running a subset of them (like upstream CI does).

recipe/meta.yaml Outdated
package:
name: libtorch
name: libtorch-split
Member

Let's not do this unless absolutely necessary. I was okay doing this because of libmagma builds, but now that they are gone, let's not do this split build and keep the top level as libtorch.

Contributor Author

Do you prefer that I revert it with additional commits, or rebase the whole thing?

Member

Whichever is easier for you

Contributor Author

Ok, I went for more commits, because I wanted to retain full history and attribution, and collapsing the code back seems relatively clean anyway. I'm going to finish testing in a few minutes and push.

@h-vetinari
Member

@conda-forge/pytorch-cpu, I think this PR is finally done. 😅

Are there any remaining comments/opinions/requests?

- test -f $PREFIX/lib/libtorch_python${SHLIB_EXT} # [unix]

# a reasonably safe subset of tests that should run under 15 minutes
# disable hypothesis because it randomly yields health check errors
Contributor
@danpetry danpetry Dec 23, 2024

Have you considered their smoke test? It includes testing of some key features such as torch.compile and is useful for developing this feedstock, because it can expose problems pretty quickly. There's an implementation here.
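For context, a minimal sketch of the kind of check such a smoke test performs (my own illustration, not the linked implementation):

import torch

def f(x):
    return torch.nn.functional.relu(x) * 2 + 1

# torch.compile exercises the dynamo/inductor stack, which tends to surface
# packaging problems (missing headers, compiler setup) quickly
compiled = torch.compile(f)
x = torch.randn(8, 8)
assert torch.allclose(compiled(x), f(x))
print("torch.compile smoke test passed")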

Member

AFAIU, the testing here is more comprehensive than the smoke test. But that shouldn't stop us from adding the smoke test in the next PR if it's considered useful (and it potentially covers some aspects not covered by the choice of modules here).

Contributor

Side note, if we're bringing across changes from Anaconda's recipe to here, would you prefer to have each change in a separate PR or all in one PR?

Member

@danpetry: Depends a bit on the quantity of changes, and how well they're separated in terms of commits. But you can start with everything you have in mind, and we can carve out smaller PRs if necessary.

@mgorny
Contributor Author

mgorny commented Dec 23, 2024

Hmm, that's a lot of test failures for the linux_64_blas_implgenericc_compiler_version13cuda_compilercuda-nvcccuda_compiler_version12.6cxx_compiler_version13 job. Here, I'm seeing 5 now:

FAILED [0.0019s] test/test_custom_ops.py::TestCustomOpAPI::test_compile - Run...
FAILED [0.0049s] test/test_custom_ops.py::TestCustomOpAPI::test_fake - Runtim...
FAILED [0.0018s] test/test_custom_ops.py::TestCustomOp::test_data_dependent_compile
FAILED [0.0018s] test/test_custom_ops.py::TestCustomOp::test_functionalize_error
FAILED [0.0134s] test/test_torch.py::TestTorch::test_print - AssertionError: ...

But I don't have a GPU here.

recipe/README.md Outdated
mgorny and others added 2 commits December 23, 2024 08:56
Turns out it had nothing to do with the order of `-m` & `-k`.

This reverts commit a59b008.
@mgorny
Contributor Author

mgorny commented Dec 23, 2024

To be honest, I'm a bit worried about how much time we're spending on failing tests here. Perhaps we could cancel all test runs except for the one that has failed before?

I'm also wondering whether we shouldn't just either ignore failures (i.e. restore `|| :`) or skip them for now, especially if they fail again; merge this as-is (since we know it's not a regression) and get the tests working separately.
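For clarity, "restore `|| :`" refers to making the test command in the recipe non-fatal, along these lines (illustrative; the actual pytest selection in the recipe differs):

# run the test subset, but do not fail the build if tests fail
python -m pytest test/test_torch.py -v || :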

@mgorny
Contributor Author

mgorny commented Dec 23, 2024

Just ran the complete build for linux_64_blas_implgenericc_compiler_version13cuda_compilercuda-nvcccuda_compiler_version12.6cxx_compiler_version13 (i.e. the one that failed previously) on our sm75-enabled server and all tests passed.

I've also prepared one more change requested by @isuruf in mgorny#3; I'm not merging it yet, since I don't want to restart CI.

@h-vetinari
Member

To be honest, I'm a bit worried about how much time we're spending on failing tests here.

Yeah, let's just skip them on GPUs. Especially now that it turns out that the x64 + CUDA build can fail catastrophically:

sssssssssssssssssssssssssssssssssssssssssssssssssssssssF.....ssss.F.ssss [ 40%]
......ssssss..ss............ssssssssssss..............F.....F....F.....F [ 40%]
..FFFFF.FFFFFsFFFFFFF....F.FFF..F.F...F.F..s.F.sFF.............FFFFF.sss [ 41%]
s.F.FFFFFFFFFF...FxxFFFFFFFFFFFFFFFFF..FFFFF.FsFsssFxxF..FFFF........... [ 41%]
..FFFFFFFFFFF........................F..F.Fs..Fs.............FFFFFFFFFFF [ 42%]
..FFFFFFxxFFFFFFFFFFF.FFFFF.F..FFF.....F..........F......F...FFFFF.F...F [ 42%]
.FF.......FFFFFFFFF............................................FFFFFFFFF [ 43%]
FFFF............................FFF..FF................................. [ 43%]
......F............FF.F.................................FFFFFFFF.s...... [ 43%]
.............F..........................................FFFF.FF.FFFF.... [ 44%]
...........xFxxx...F..F.......F.....F....F.s....F............FFxxFxxF... [ 44%]
.F........F.........F....F................F................F............ [ 45%]
.......F...............................................................F [ 45%]
[...]
..F.............F.F...F..s.F.F....s.sF...F..sF.FF.FF..F.F............... [ 58%]
./.scripts/run_docker_build.sh: line 108:  2497 Killed                  docker run ${DOCKER_RUN_ARGS} -v "${RECIPE_ROOT}":/home/conda/recipe_root:rw,z,delegated -v "${FEEDSTOCK_ROOT}":/home/conda/feedstock_root:rw,z,delegated -e CONFIG -e HOST_USER_ID -e UPLOAD_PACKAGES -e IS_PR_BUILD -e GIT_BRANCH -e UPLOAD_ON_BRANCH -e CI -e FEEDSTOCK_NAME -e CPU_COUNT -e BUILD_WITH_CONDA_DEBUG -e BUILD_OUTPUT_ID -e flow_run_id -e remote_url -e sha -e BINSTAR_TOKEN -e FEEDSTOCK_TOKEN -e STAGING_BINSTAR_TOKEN "${DOCKER_IMAGE}" bash "/home/conda/feedstock_root/${PROVIDER_DIR}/build_steps.sh"
##[error]Process completed with exit code 137.

@h-vetinari
Member

Especially now that it turns out that the x64 + CUDA build can fail catastrophically:

On second thought @mgorny, this might actually be due to one of the following commits of yours, which were part of mgorny#2, but mostly tangential to this PR:

AFAICT, we haven't had an x64+generic+CUDA job since 98f8a9c that ran and passed the test suite (modulo the misapplied skips). Both times the python test suite ran on that job (97cb097 & 4fe6a1e), things failed catastrophically.

I'm going to roll back those commits (& 0a76034) for now; we can do more focussed debugging on the enablement in a separate PR.

@hmaarrfk
Contributor

If these are green, I would be strongly in favor of just merging this as-is and cleaning up separately.

The main concern I have is losing momentum on nitpicks.

If the test suite doesn't pass, we should disable the tests and work on a separate PR to address the failing tests, so as to allow the following other projects to move forward:

(In no order of preference from me)

I've also updated the top level description to be more in line with the new goals of the PR.

@mgorny
Contributor Author

mgorny commented Dec 24, 2024

The previously crashing generic + CUDA job has passed this time. Can we merge it now? I don't think there's a point in waiting for the remaining jobs to run.

Also, should I open a separate PR to redo the extra deps that have caused trouble here? Possibly one by one, to see which one was responsible.

@hmaarrfk
Contributor

@mgorny did you have any other pending commits you wanted to get in?

I think we should just wait, since the builds have "started"; the problems often occur when the builds get "stuck".

@h-vetinari h-vetinari changed the title from "Recipe overhaul: more {tests, plugins, documentation}, various clean-ups" to "Recipe overhaul: more tests & documentation, various clean-ups" on Dec 25, 2024
@mgorny
Contributor Author

mgorny commented Dec 25, 2024

No, not at the moment.

All checks passed!


Successfully merging this pull request may close these issues.

Make Magma optional for cuda builds?
10 participants