Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

python3Packages.torch: switch to apple-sdk_13 #351778

Merged
merged 1 commit into from
Nov 9, 2024

Conversation

mikatammi
Copy link
Contributor

Since #346043 was merged to staging, switch to apple-sdk_13 on Darwin. This enables the use of MPS (Metal Performance Shaders) on macOS.

Fixes #243868

Quick test:

$ nix-shell -p 'python3.withPackages (ps: with ps; [ torch ])' -I nixpkgs=.

[nix-shell:~/nixpkgs.git]$ python
Python 3.12.7 (main, Oct  1 2024, 02:05:46) [Clang 16.0.6 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> x = torch.ones(5, device="mps")
>>> y = x * 2
>>> print(y)
tensor([2., 2., 2., 2., 2.], device='mps:0')

Things done

  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandboxing enabled in nix.conf? (See Nix manual)
    • sandbox = relaxed
    • sandbox = true
  • Tested, as applicable:
    • NixOS test(s) (look inside nixos/tests)
    • and/or package tests
      From package tests, tester-compileCpu was broken because there's no OpenMP in macOS. I hacked it so I could run both CPU and MPS tests on macOS, but I'm going to propose fixes to tests in another PR, because it might become lenghty discussion.
    • or, for functions and "core" functionality, tests in lib/tests or pkgs/test
    • made sure NixOS tests are linked to the relevant packages
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 24.11 Release Notes (or backporting 23.11 and 24.05 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
  • Fits CONTRIBUTING.md.

Add a 👍 reaction to pull requests you find important.

@mikatammi
Copy link
Contributor Author

I also want to include tests for MPS but I was a bit unsure were they even working for other platforms than CUDA. Tests in separate PR here: #351782

@ofborg ofborg bot requested review from teh, tscholak and thoughtpolice October 28, 2024 06:50
@ofborg ofborg bot added 10.rebuild-darwin: 101-500 10.rebuild-linux: 0 This PR does not cause any packages to rebuild on Linux labels Oct 28, 2024
Copy link
Member

@samuela samuela left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

diff lgtm

thanks for putting this together @mikatammi ! I think we can merge this on the condition that we commit to adding tests, either in #351782 or elsewhere

@mikatammi
Copy link
Contributor Author

Strangely I noticed tests started failing in accelerate-package, which depends on torch, after this change

@mikatammi
Copy link
Contributor Author

I'm wondering should I just create additional package called torchWithMps, in similar fashion than there now exists torchWithCuda and torchWithRocm. The MPS should then be explicitly disabled in basic torch package, as it gets automatically detected now that the change to use apple-sdk_13 as buildInput was made

@wegank wegank added the 12.approvals: 1 This PR was reviewed and approved by one reputable person label Oct 31, 2024
@vcunat vcunat changed the base branch from staging-next to master October 31, 2024 09:46
@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/title-the-darwin-sdks-have-been-updated/55295/1

@samuela
Copy link
Member

samuela commented Nov 1, 2024

Strangely I noticed tests started failing in accelerate-package, which depends on torch, after this change

Interesting, do you have logs for these failures?

I'm wondering should I just create additional package called torchWithMps, in similar fashion than there now exists torchWithCuda and torchWithRocm. The MPS should then be explicitly disabled in basic torch package, as it gets automatically detected now that the change to use apple-sdk_13 as buildInput was made

Not the craziest idea ever, but I feel like it would be simpler to avoid adding another derivation flag and alias if at all possible

@mikatammi
Copy link
Contributor Author

#353184

I created a separate PR for fixing up issues in accelerate package.

@mikatammi
Copy link
Contributor Author

Strangely I noticed tests started failing in accelerate-package, which depends on torch, after this change

Interesting, do you have logs for these failures?

#353184 (comment)

@mikatammi
Copy link
Contributor Author

Strangely I noticed tests started failing in accelerate-package, which depends on torch, after this change

Interesting, do you have logs for these failures?

#353184 (comment)

Apparently my fixes for skipping some of the accelerate tests were already merged before this PR which breaks them

@SomeoneSerge SomeoneSerge self-requested a review November 6, 2024 15:37
@SomeoneSerge
Copy link
Contributor

(I see there's already an approval; feel free to merge, I self-requested a review just so this doesn't get lost)

@SomeoneSerge
Copy link
Contributor

I'm wondering should I just create additional package called torchWithMps, in similar fashion than there now exists torchWithCuda and torchWithRocm.

  • I'd rather delete torchWith{Cuda,Rocm}, they're broken anyway (one has to rebuild dependencies like MPI with cuda/rocm too, and one shouldn't mix cuda pytorch with non-cuda rev dependencies; tldr one shouldn't use this attribute, one should just create a whole new consistent package set)
  • There's no need for a separate attribute in case of MPS because we should just always enable it when it's supported, I assume it just comes with the box. Is there any hardware that wouldn't be supported by this mps torch?

@samuela
Copy link
Member

samuela commented Nov 8, 2024

It would have to be so ancient that I think it's outside our support window. Even in the off chance this gets built and run on a machine w/o Metal, that would be a runtime error avoidable by setting device= correctly, no?

Switch to apple-sdk_13 on Darwin. This enables the use of MPS (Metal
Performance Shaders) on macOS.

Signed-off-by: Mika Tammi <mikatammi@gmail.com>
@wegank wegank removed the 12.approvals: 1 This PR was reviewed and approved by one reputable person label Nov 8, 2024
@@ -364,6 +364,9 @@ buildPythonPackage rec {
# NB technical debt: building without NNPACK as workaround for missing `six`
USE_NNPACK = 0;

# Explicitly enable MPS for Darwin
USE_MPS = setBool stdenv.hostPlatform.isDarwin;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe at some point we should move these to env, and use optionalAttrs for platform-specific stuff, because the way we do it now we trigger a useless rebuild on linux. But that's OK.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(just waiting for ofborg-eval now)

@ofborg ofborg bot added 10.rebuild-linux: 101-500 and removed 10.rebuild-linux: 0 This PR does not cause any packages to rebuild on Linux labels Nov 9, 2024
@SomeoneSerge SomeoneSerge merged commit 05eff5c into NixOS:master Nov 9, 2024
25 checks passed
@mikatammi mikatammi deleted the pytorch_pr branch November 9, 2024 21:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Package request: pytorch on darwin with GPU (MPS) support
5 participants