errors with cuda ops installation #2
Comments
fixed by installing ninja. Might recommend adding that to the readme as a requirement |
I have encountered the same issue on Colab, and your fix works! %pip install ninja |
@dvschultz Thanks for the report! README.md will be updated. |
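Before rebuilding the custom ops, it can help to confirm that a `ninja` executable is actually visible from the Python environment in use. The following is a minimal sketch using only the standard library; the helper name `ninja_status` is made up for illustration and is not part of this repository.

```python
import shutil


def ninja_status(which=shutil.which):
    """Return a one-line message saying whether a `ninja` executable is on PATH.

    `which` is injectable only so the function can be exercised without ninja installed.
    """
    path = which("ninja")
    if path is None:
        return "ninja not found on PATH; try `pip install ninja`"
    return f"ninja found at {path}"
```

Calling `print(ninja_status())` in the same notebook or shell session that runs training shows whether the install took effect there.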
Also note that nvcc doesn't work with newer gcc versions. If your system default gcc is newer than 8, pytorch will honor the CC env variable, so do |
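As an illustration of the CC idea described above (not the commenter's exact command), here is a rough sketch that checks the default gcc major version and decides whether to point CC at an older compiler. The threshold of 8 comes from the comment; the fallback name `gcc-8` and the helper names are assumptions, and the older compiler must already be installed.

```python
import re
import subprocess


def gcc_major(version_text):
    """Parse the major version from `gcc -dumpversion` output such as '9.3.0' or '7'."""
    match = re.match(r"\s*(\d+)", version_text)
    if match is None:
        raise ValueError(f"unrecognised gcc version: {version_text!r}")
    return int(match.group(1))


def default_gcc_major():
    """Ask the system default gcc for its version (requires gcc on PATH)."""
    result = subprocess.run(
        ["gcc", "-dumpversion"], capture_output=True, text=True, check=True
    )
    return gcc_major(result.stdout)


def choose_cc(current_major, max_major=8, fallback="gcc-8"):
    """Return a value to export as CC, or None if the default gcc is acceptable.

    `fallback` is a hypothetical compiler name; adjust it to what is installed.
    """
    return fallback if current_major > max_major else None
```

A typical use would be `cc = choose_cc(default_gcc_major())` followed by `os.environ["CC"] = cc` when `cc` is not `None`, done before the extensions are built for the first time.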
I did install ninja, but then I got: OMP: Error #15: Initializing libiomp5md.dll, but found libiomp5md.dll already initialized. This happened while evaluating the metrics. |
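The full text of OMP Error #15 itself names an environment variable, `KMP_DUPLICATE_LIB_OK`, as an unsafe and unsupported workaround that may cause crashes or incorrect results. A minimal sketch of setting it from Python, assuming you accept that risk and only want to check whether the metrics run at all; the helper name is hypothetical.

```python
import os


def allow_duplicate_openmp(env=os.environ):
    """Set KMP_DUPLICATE_LIB_OK=TRUE unless a value is already present.

    This only hides the duplicate OpenMP runtime check; it does not remove
    the duplicate library, so treat it as a diagnostic aid rather than a fix.
    """
    env.setdefault("KMP_DUPLICATE_LIB_OK", "TRUE")
    return env["KMP_DUPLICATE_LIB_OK"]
```

The variable has to be set before any library that loads OpenMP is imported, so the call belongs at the very top of the script.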
Still getting this error on a vast.ai VM with pytorch 1.7.1 and cuda 11 installed. I installed all required packages, but whatever I try I keep getting errors when compiling the custom cuda ops. Is there perhaps a guide for Linux and ada-pytorch? It seems it should work out of the box, but unfortunately it does not. P.S. I have already made it work on Windows by installing vc2019 and cuda 11. I would love to make it run on the VM so that I can train larger models. |
Having same error. I'm on win10 with RTX-3070 GPU and torch 1.7.1+cu110. I also installed required packages and deleted torch_extensions. |
If it is still not working, try installing the Windows 10 SDK. I had the same problem, installed the Windows 10 SDK, and now it's working fine. |
I still have this issue on Linux with CUDA 11.0 after installing ninja. Is there a specific version of ninja we need to install? I get the same error with both conda and pip (after pasting the line in the README). |
The original poster in this bug filed this for Colab. Not sure what @Dhruva-Storz is running on. Try changing this line to get more details about what could be going wrong:
to
and check if you get anything relevant in the log. Remember to completely remove your torch extensions dir (search for Usually this is a matter of CUDA SDK (the one you have to install yourself, not the pytorch bundled cuda toolkit) not being installed properly, or there being multiple versions of it and some old or otherwise incompatible version gets used when building our custom extensions. |
My apologies for not giving enough info. I'm running on Ubuntu 20.04.1. My pytorch installation is 1.7.1 with cuda toolkit 11.0. To reproduce the error:
Important bits of error after deleting pytorch extensions dir, setting verbosity to full
You mentioned in the README that we need to use cuda-toolkit 11.1, but the website has no installation instructions for 11.1. This may be the source of the problem. Do you have any suggestions on what might be causing this? |
I now see that the README is quite confusing about this. In order to run on RTX 3090, you need to install:
The latter is required to build our custom pytorch extensions. Nvcc from CUDA 11.0 will fail with the error you saw above if you're running it on RTX 3090. Nvcc from CUDA 11.1 should work. |
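To check which nvcc release is actually being picked up, one option is to parse the output of `nvcc --version`. The sketch below assumes the output contains a phrase of the form `release X.Y`; the helper names are made up, and the default minimum of 11.1 simply mirrors the comment above about RTX 3090.

```python
import re


def nvcc_release(version_output):
    """Extract (major, minor) from text containing 'release X.Y'.

    Returns None if no such phrase is present.
    """
    match = re.search(r"release (\d+)\.(\d+)", version_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def nvcc_new_enough(version_output, minimum=(11, 1)):
    """True if the parsed nvcc release is at least `minimum`."""
    release = nvcc_release(version_output)
    return release is not None and release >= minimum
```

The input text can be obtained with `subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout`, run from the same environment that builds the extensions.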
No solution yet? |
this solution does not work either. |
your solution did not fix my problem either. |
I haven't found a way to safely install cuda 11.1 on my work computer because it might interfere with the work of others, so I haven't been able to test nurpax's solution. However, it seems like this should fix the problem, as the build errors seem to be related to nvcc. If not, the code still runs; you just have to disable warnings with
When they say remove torch_extensions_dir, I believe they mean that you delete the folder where the custom torch extensions were installed. Mine was in ~/.cache/torch_extensions. I'm probably going to wait for official cuda 11.1 support from pytorch so I can safely install it in an environment. However, if anyone has solutions on how to install two different cuda toolkits safely, do let me know. |
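Clearing the extension build cache can be scripted. The sketch below assumes the cache lives either where the `TORCH_EXTENSIONS_DIR` environment variable points or in `~/.cache/torch_extensions` as reported above; the location may differ on other platforms, and both helper names are hypothetical.

```python
import os
import shutil
from pathlib import Path


def extensions_dir(env=os.environ, home=None):
    """Return TORCH_EXTENSIONS_DIR if set, else <home>/.cache/torch_extensions."""
    override = env.get("TORCH_EXTENSIONS_DIR")
    if override:
        return Path(override)
    base = Path(home) if home is not None else Path.home()
    return base / ".cache" / "torch_extensions"


def clear_extensions(path):
    """Delete the build cache directory if it exists.

    Returns True if a directory was removed, False if there was nothing to do.
    """
    path = Path(path)
    if path.is_dir():
        shutil.rmtree(path)
        return True
    return False
```

Running `clear_extensions(extensions_dir())` before the next training run forces the custom ops to be rebuilt from scratch.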
I’m not sure if installing CUDA toolkit from Conda is enough (i.e. as part of pytorch installation). I think you really do need a separate full CUDA installation with nvcc, headers, the whole nine yards. Not from Conda or Pip but using NVIDIA’s packages/installers. I recall trying without it, using just what’s bundled with pytorch installation, and I don’t think it contained everything that’s required to build our CUDA kernels. I’d be happy to be shown wrong on this as it’d simplify the installation instructions. At least on Windows, you can have multiple CUDA versions installed simultaneously. It is of course safer to match the CUDA you have in the PATH with the one your pytorch was built with. If you do end up installing different CUDA SDKs, don’t let the installers touch your GPU drivers. Those are best kept at your most recent version. |
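When several CUDA SDKs are installed side by side, it is useful to see every nvcc reachable through PATH and in which order. A minimal standard-library sketch follows; the function name `find_on_path` is made up for illustration, and on Windows the executable name would be `nvcc.exe`.

```python
import os


def find_on_path(name, path_value, pathsep=os.pathsep):
    """Return every executable file called `name` found in PATH directories.

    Results are in search order, so the first entry is the one a shell would run.
    """
    hits = []
    for directory in path_value.split(pathsep):
        if not directory:
            continue  # skip empty PATH entries
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            hits.append(candidate)
    return hits
```

For example, `find_on_path("nvcc", os.environ.get("PATH", ""))` lists the candidates; the version of the first one can then be compared by hand with `torch.version.cuda`.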
@nurpax I am using a separate full CUDA installation with nvcc, cuDNN, etc. |
Stuck here big time with ImportError: No module named 'upfirdn2d_plugin'. I am using a vast.ai instance nvidia/cuda:11.2.1-cudnn8-runtime-ubuntu18.04
Conda environment is set with
What I've tried
First I made sure my VM has CUDA 11.2 installed. Then I installed a newer torch with CUDA 11.1.1, which did not help, so I rolled back.
Removed torch_extensions
Didn't help.
gcc
And tried installing gcc7 and even gcc5
This does not help either.
Google Colab works fine and has Ubuntu 18.04 with gcc 7.5.0 installed, which I am trying to mimic. I hope that is the correct logic.
UPD:
UPD2 Please advise on any possible next steps. I have no idea where to go next. |
Thanks for all the discussions above. They have been very helpful. I probably had all of the above problems. Installing Visual Studio definitely took care of the C++ related compile issues, and installing the whole cuda 11.3 package (2.7G) from the NVIDIA website took care of the upfirdn2d bug. Now my program is running in PyCharm with pytorch 1.7.1, cuda 11.3, python 3.7. |
Thanks for all the discussions above. I have successfully set up the environment with 3090, and would like to share my settings. Here CUDA and CUDNN are installed manually, and pytorch is built from source (https://github.com/pytorch/pytorch/tree/v1.7.1). After installing pytorch, |
works on colab and windows but not on ubuntu 20.04 |
tested on a fresh Colab V100 and P100