Make cudatoolkit 11.x and cudnn 8.x the default major versions [WIP] #164820

Closed
wants to merge 2 commits into from


rehno-lindeque
Contributor

@rehno-lindeque rehno-lindeque commented Mar 19, 2022

Description of changes

This is still work-in-progress. (Needs checking)

Things done
  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandbox = true set in nix.conf? (See Nix manual)
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 22.05 Release Notes (or backporting 21.11 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
    • (Release notes changes) Ran nixos/doc/manual/md-to-db.sh to update generated release notes
  • Fits CONTRIBUTING.md.

@rehno-lindeque changed the title from "cudnn: make default 8.3 [WIP]" to "cudnn: make 8.3 the default [WIP]" on Mar 19, 2022
@samuela
Member

samuela commented Mar 19, 2022

I'd say that this is ready to merge once nixpkgs-review and cuda-nix-testsuite are passing.

cc @NixOS/cuda-maintainers

@samuela
Member

samuela commented Mar 21, 2022

This breaks the current pytorch version. In my testing, cudnn 7.6 with CUDA 10.2 is the latest version combination that I could get to work. But it sounds like upgrading to cudnn 8.3 is useful for building the latest pytorch? @rehno-lindeque Is there any documentation on the pytorch side about which versions of cuDNN/CUDA they support?

@rehno-lindeque
Contributor Author

It looks like CUDA 10.2 was technically already supposed to be the lower bound for pytorch v1.10

https://github.com/pytorch/pytorch/blob/v1.10.0/README.md#from-source

@rehno-lindeque
Contributor Author

But it sounds like upgrading to cudnn 8.3 is useful for building the latest pytorch? Is there any documentation on the pytorch side about which versions of cuDNN/CUDA they support?

I'm struggling a bit to find good information but these things seem relevant:

# CUDA only: Add LAPACK support for the GPU if needed
conda install -c pytorch magma-cuda110  # or the magma-cuda* that matches your CUDA version from https://anaconda.org/pytorch/repo

Pytorch 1.11.0 and 1.10.0 both have this in their documentation:

The Dockerfile is supplied to build images with CUDA 11.1 support and cuDNN v8.

So I think that suggests it should be possible to build with cudnn v8... 🤔

@samuela
Member

samuela commented Mar 21, 2022

Interesting. I wasn't able to get it to compile with any cuDNN v8.x version, but there's no need to worry about that now if we're going to update pytorch to v1.11 anyhow.

@samuela
Member

samuela commented Mar 21, 2022

If it continues to give issues compiling against versions that they claim are supported, we can open an upstream issue with the pytorch folks!

@rehno-lindeque
Contributor Author

rehno-lindeque commented Mar 21, 2022

Looks like NVIDIA does have a "Download cuDNN v7.6.5 (November 18th, 2019), for CUDA 10.2" release.

But I'm going to keep pushing on this v8.3 upgrade for now and see if I can figure it out. (Just trying the pytorch 1.10 build against it now myself)

@rehno-lindeque
Contributor Author

rehno-lindeque commented Mar 21, 2022

Oh, I think I finally understand now. It looks to me like the valid combinations (for pytorch) are probably

  • cudnn 7.x + cuda 10.x, or
  • cudnn 8.x + cuda 11.x

It seems pretty clear now looking at

So a default bump to cudnn 8 would need to be paired with a default bump to cuda 11. Is that something we want to do? Or should I look at cudnn 7.6 with cudatoolkit 10.2 instead?
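
As a rough illustration of the pairing rule above, here is a minimal Python sketch. It only encodes the two combinations listed in this comment; the names `SUPPORTED_PAIRS`, `major`, and `is_valid_combo` are hypothetical and are not part of pytorch or nixpkgs, and the table is an assumption drawn from this thread rather than an official support matrix.

```python
# Hypothetical compatibility table, taken only from the two combinations
# listed above: cuDNN major version -> CUDA major version.
SUPPORTED_PAIRS = {7: 10, 8: 11}


def major(version: str) -> int:
    """Return the major component of a dotted version string like '8.3'."""
    return int(version.split(".")[0])


def is_valid_combo(cudnn_version: str, cuda_version: str) -> bool:
    """Check whether a cuDNN/CUDA pair matches one of the combinations above."""
    return SUPPORTED_PAIRS.get(major(cudnn_version)) == major(cuda_version)


if __name__ == "__main__":
    # Version pairs mentioned in this thread, plus the mismatched default.
    for cudnn, cuda in [("7.6", "10.2"), ("8.3", "11.1"), ("8.3", "10.2")]:
        status = "ok" if is_valid_combo(cudnn, cuda) else "mismatch"
        print(f"cudnn {cudnn} + cuda {cuda}: {status}")
```

Under this assumption, bumping only the default cudnn to 8.3 while leaving cudatoolkit at 10.2 falls into the mismatch case, which is why the two defaults would have to move together.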

@rehno-lindeque changed the title from "cudnn: make 8.3 the default [WIP]" to "Make cudatoolkit 11.x and cudnn 8.x the default major versions [WIP]" on Mar 22, 2022
@samuela
Member

samuela commented Mar 22, 2022

So a default bump to cudnn 8 would need to be paired with a default bump to cuda 11. Is that something we want to do? Or should I look at cudnn 7.6 with cudatoolkit 10.2 instead?

I'd say let's get the pytorch stuff merged first, and then worry about cudnn/cudatoolkit second. There's no need for pytorch to necessarily require the default cudnn/cudatoolkit versions. For better or worse it looks like 5446ad8 already circumvented a PR to update the pytorch source build to 1.11.0, but pytorch-bin will also require an update.

@FRidh added the "6.topic: cuda" label (Parallel computing platform and API) on Mar 30, 2022
@FRidh
Member

FRidh commented Apr 9, 2022

This change is present on master.

@FRidh FRidh closed this Apr 9, 2022

4 participants