This repository has been archived by the owner on Oct 31, 2023. It is now read-only.

build error:apex #711

Open
wxd1995 opened this issue Apr 23, 2019 · 31 comments

Comments

@wxd1995

wxd1995 commented Apr 23, 2019

❓ Questions and Help

When I install apex using the command "python setup.py install --cuda_ext --cpp_ext",
I get this error:

torch.__version__  =  1.1.0.dev20190422
Compiling cuda extensions with
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Sep__1_21:08:03_CDT_2017
Cuda compilation tools, release 9.0, V9.0.176
from /usr/local/cuda/bin

Pytorch binaries were compiled with Cuda 9.0.176

running install
running bdist_egg
running egg_info
writing apex.egg-info/PKG-INFO
writing dependency_links to apex.egg-info/dependency_links.txt
writing top-level names to apex.egg-info/top_level.txt
reading manifest file 'apex.egg-info/SOURCES.txt'
writing manifest file 'apex.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_py
running build_ext
building 'amp_C' extension
gcc -pthread -B /home/dy113/anaconda3/envs/maskrcnn_benchmark/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/dy113/anaconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/include -I/home/dy113/anaconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/home/dy113/anaconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/include/TH -I/home/dy113/anaconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/dy113/anaconda3/envs/maskrcnn_benchmark/include/python3.7m -c csrc/amp_C_frontend.cpp -o build/temp.linux-x86_64-3.7/csrc/amp_C_frontend.o -O3 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=amp_C -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++11
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
/usr/local/cuda/bin/nvcc -I/home/dy113/anaconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/include -I/home/dy113/anaconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/home/dy113/anaconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/include/TH -I/home/dy113/anaconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/dy113/anaconda3/envs/maskrcnn_benchmark/include/python3.7m -c csrc/multi_tensor_scale_kernel.cu -o build/temp.linux-x86_64-3.7/csrc/multi_tensor_scale_kernel.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --compiler-options '-fPIC' -lineinfo -O3 --use_fast_math -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=amp_C -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++11
csrc/type_shim.h(13): error: class "at::Type" has no member "scalarType"

1 error detected in the compilation of "/tmp/tmpxft_0000270c_00000000-6_multi_tensor_scale_kernel.cpp1.ii".
error: command '/usr/local/cuda/bin/nvcc' failed with exit status 1

Can anyone help me? Thank you very much!

@Tegala

Tegala commented Apr 23, 2019

I ran into the same problem, too.

@skzhang1

what is your gcc version?

@Tegala

Tegala commented Apr 23, 2019

what is your gcc version?

mine is gcc-5.2

@Yuliang-Zou

Yuliang-Zou commented Apr 24, 2019

According to this issue, it seems that apex can only be installed with CUDA 10. My gcc version is 7.3, Python is 3.6, and my PyTorch version is 1.0.0. It works.

@mel-2445

Same error with CUDA 10 on Ubuntu 16.
I tried gcc 5 and gcc 7, Python 2.7 and Python 3.6.

@mel-2445

A workaround is to downgrade to a PyTorch nightly from a few days ago:
conda install pytorch-nightly=1.0.0.dev20190404 cudatoolkit=10.0 -c pytorch

@fmassa
Contributor

fmassa commented Apr 25, 2019

cc @mcarilli are you aware of any recent breakages of apex with latest PyTorch nightly?

@mdsmith-cim

There's already an issue at Apex (#267), with a PR (#272) that fixes it and will apparently be merged soon.

So if you need an immediate fix, use the scalar_type branch of ptrblck's fork.

@mcarilli

Thanks @mdsmith-cim for the concise, correct summary. Our fix will be merged tomorrow at the latest (I have some other commitments so I may not have time to review it in detail today).

@skzhang1

@mdsmith-cim Hi, sorry to bother you. I still get the same error after cloning your apex. Why is this? I look forward to your reply.

@MC-devel-staudt

@zskadazhang Did you check out the scalar_type branch?

@skzhang1

@DavidSPumpkins Oh! It works. Thank you very much!

@ptrblck

ptrblck commented Apr 26, 2019

@zskadazhang Good to hear it's working!
Please tag me in case you are running into issues related to this branch.

However, we should merge it to apex/master today so you can pull from the master branch again.

@ptrblck

ptrblck commented Apr 26, 2019

The PR was merged so the build should work again using apex master. :)

@chengruizhe

With torch 1.1.0.dev20190425 and the latest apex fix, I still get an error when I try to compile with python setup.py install --cuda_ext --cpp_ext. I'm using gcc 5.5. Can anyone please help? Much appreciated!


/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(11089): error: argument of type "void *" is incompatible with parameter of type "float *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(11100): error: argument of type "void *" is incompatible with parameter of type "float *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(11109): error: argument of type "void *" is incompatible with parameter of type "double *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(11120): error: argument of type "void *" is incompatible with parameter of type "double *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(11129): error: argument of type "void *" is incompatible with parameter of type "double *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(11140): error: argument of type "void *" is incompatible with parameter of type "double *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(11149): error: argument of type "void *" is incompatible with parameter of type "int *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(11160): error: argument of type "void *" is incompatible with parameter of type "int *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(11169): error: argument of type "void *" is incompatible with parameter of type "int *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(11180): error: argument of type "void *" is incompatible with parameter of type "int *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(11189): error: argument of type "void *" is incompatible with parameter of type "long long *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(11200): error: argument of type "void *" is incompatible with parameter of type "long long *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(11209): error: argument of type "void *" is incompatible with parameter of type "long long *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(11220): error: argument of type "void *" is incompatible with parameter of type "long long *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(11229): error: argument of type "void *" is incompatible with parameter of type "int *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(11240): error: argument of type "void *" is incompatible with parameter of type "int *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(11249): error: argument of type "void *" is incompatible with parameter of type "int *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(11260): error: argument of type "void *" is incompatible with parameter of type "int *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(11269): error: argument of type "void *" is incompatible with parameter of type "long long *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(11280): error: argument of type "void *" is incompatible with parameter of type "long long *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(11289): error: argument of type "void *" is incompatible with parameter of type "long long *"

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(11300): error: argument of type "void *" is incompatible with parameter of type "long long *"

92 errors detected in the compilation of "/tmp/tmpxft_00004034_00000000-6_multi_tensor_scale_kernel.cpp1.ii".
error: command '/usr/local/cuda-9.0/bin/nvcc' failed with exit status 1

@skzhang1

Sorry, I do not know why. My environment is CUDA 10.1 and GCC 7.3, and I got apex from @mdsmith-cim. Maybe you can ask him.

@mcarilli

@chengruizhe I've never seen this error before. Does it point to a particular line in the file?

@ptrblck

ptrblck commented Apr 27, 2019

@chengruizhe @mcarilli
Could it be related to the gcc version?
Based on this information, Ubuntu 16.04, for example, should use GCC 5.3.1 with CUDA 9.0.
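
The kind of check suggested here can be sketched in a few lines of Python. This is only an illustration: the lookup table reflects my assumption about the newest gcc major version each CUDA release accepts (please verify it against NVIDIA's installation guide for your exact release), and the function names are hypothetical.

```python
import re

# ASSUMPTION: newest gcc major version accepted by each CUDA release.
# Check NVIDIA's installation guide before relying on these values.
MAX_GCC_MAJOR = {(9, 0): 6, (10, 0): 7, (10, 1): 8}


def parse_cuda_release(nvcc_output):
    """Return (major, minor) from the text printed by `nvcc --version`, or None."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def parse_gcc_major(dumpversion_output):
    """Return the major version from the text printed by `gcc -dumpversion`."""
    return int(dumpversion_output.strip().split(".")[0])


def gcc_is_supported(cuda_release, gcc_major):
    """True or False for releases in the table, None if the release is not listed."""
    limit = MAX_GCC_MAJOR.get(cuda_release)
    if limit is None:
        return None
    return gcc_major <= limit


# Example with the nvcc line from the first post in this thread.
release = parse_cuda_release("Cuda compilation tools, release 9.0, V9.0.176")
print(release, gcc_is_supported(release, parse_gcc_major("7.3.0")))
```

On a real machine, the two input strings could come from `subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout` and the equivalent call for `gcc -dumpversion`.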

@Tegala

Tegala commented Apr 29, 2019

AttributeError: 'AmpState' object has no attribute 'opt_properties'

Has anyone else run into this problem? I built apex and maskrcnn-benchmark successfully, without any errors.
My version information:
CUDA 9.0, gcc 5.2, pytorch-nightly 1.1 (CentOS)
(It runs successfully on Ubuntu with CUDA 9.0, gcc 5.2, and pytorch-nightly 1.0.0.)

@ptrblck

ptrblck commented Apr 29, 2019

@Tegala Are you building apex from source or are you using an older version of apex?

@Tegala

Tegala commented Apr 29, 2019

@Tegala Are you building apex from source or are you using an older version of apex?
Thanks for your reply!
I use these commands to build apex:

git clone https://github.com/NVIDIA/apex.git
cd apex
python setup.py install --cuda_ext --cpp_ext

And I am sure I am using the latest version of apex. This is strange.

The error output is:

2019-04-30 06:11:31,877 maskrcnn_benchmark.trainer INFO: Start training
Traceback (most recent call last):
  File "tools/train_net.py", line 177, in <module>
    main()
  File "tools/train_net.py", line 170, in main
    model = train(cfg, args.local_rank, args.distributed)
  File "tools/train_net.py", line 76, in train
    arguments,
  File "/home/hjz/projects/maskrcnn-benchmark/maskrcnn_benchmark/engine/trainer.py", line 79, in do_train
    with amp.scale_loss(losses, optimizer) as scaled_losses:
  File "/home/hjz/perl5/anaconda3/envs/deephaj-env/lib/python3.7/contextlib.py", line 112, in __enter__
    return next(self.gen)
  File "/home/hjz/perl5/anaconda3/envs/deephaj-env/lib/python3.7/site-packages/apex-0.1-py3.7-linux-x86_64.egg/apex/amp/handle.py", line 78, in scale_loss
    if not _amp_state.opt_properties.enabled:
AttributeError: 'AmpState' object has no attribute 'opt_properties'

@ptrblck

ptrblck commented Apr 29, 2019

Thanks for the information!
I'm trying to reproduce this issue. CC @mcarilli

@mcarilli

mcarilli commented Apr 29, 2019

@Tegala Amp requires that

model, optimizer = amp.initialize(model, optimizer, opt_level=...)

be called before any invocation of

with amp.scale_loss(losses, optimizer) as scaled_loss:

.

If your code is somehow invoking with amp.scale_loss without ever invoking amp.initialize, the above error will result.

@Tegala

Tegala commented Apr 30, 2019

@Tegala Amp requires that

model, optimizer = amp.initialize(model, optimizer, opt_level=...)

be called before any invocation of

with amp.scale_loss(losses, optimizer

If your code is somehow invoking with amp.scale_loss without ever invoking amp.initialize, the above error will result.

Thanks so much!
I checked again and found that it is just as you said. Now it works!

@aashokvardhan

SeanNaren/deepspeech.pytorch#376

git clone --recursive https://github.com/NVIDIA/apex.git
cd apex && pip install .

This worked for me.

@mcarilli

@aashokvardhan pip install . will perform a Python-only install, which is not ideal for performance. You should install with cuda and c++ extensions via

pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" .

and only fall back to pip install . if the extension build doesn't work.
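
One way to tell which kind of install you ended up with is to check whether the compiled module can be found. A minimal sketch, assuming the CUDA extension is installed as a top-level module named `amp_C` (the name shown in the build log earlier in this thread); `extension_available` is a hypothetical helper, not part of apex.

```python
import importlib.util


def extension_available(name="amp_C"):
    """Return True if a top-level module with this name can be located."""
    # find_spec only locates the module; it does not import or execute it.
    return importlib.util.find_spec(name) is not None


if extension_available():
    print("amp_C found: the CUDA extension appears to be installed")
else:
    print("amp_C not found: this looks like a Python-only install")
```

Run this with the same Python interpreter that you use for training, since a different environment may have a different install.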

@joskaaaa

I encountered this on Ubuntu 18.04 with CUDA 9.0 and Ubuntu's default GCC/G++, version 7. The CUDA compiler is incompatible with GCC >= 6.4.

I solved it by installing gcc-5 and g++-5 (sudo apt install gcc-5 g++-5) and giving them higher priority with update-alternatives:

sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-5 10
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-5 10

After this, Apex installed fine using the default instructions.

@ying-cai-cd

My solution is:
sudo ln -sf /usr/bin/gcc-5 /usr/local/cuda-9.0/bin/gcc
sudo ln -sf /usr/bin/g++-5 /usr/local/cuda-9.0/bin/g++

on my Ubuntu 18.04, cuda 9.0, pytorch 1.1.0, python 3.6.
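
To confirm that the links point where you expect, a small check like the following may help. It is a sketch that assumes nvcc picks up `gcc` and `g++` from its own bin directory, which is what this workaround relies on; `resolve_compiler` is a hypothetical helper.

```python
import os


def resolve_compiler(cuda_bin_dir, name="gcc"):
    """Return the real path behind cuda_bin_dir/name, or None if it is missing."""
    candidate = os.path.join(cuda_bin_dir, name)
    # os.path.exists is also False for a symlink whose target is gone.
    if not os.path.exists(candidate):
        return None
    return os.path.realpath(candidate)


for tool in ("gcc", "g++"):
    print(tool, "->", resolve_compiler("/usr/local/cuda-9.0/bin", tool))
```

If a line prints `None`, the link is missing or broken and the system default compiler is likely to be used instead.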

@Mahhos

Mahhos commented Jan 29, 2020

pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" .

I am trying to install apex on Windows 10. I cloned apex from its repo, and when I run the above command I get this error: ERROR: Directory '.' is not installable. Neither 'setup.py' nor 'pyproject.toml' found. Do you have any idea how to resolve the issue?
python 3.6
gcc 5.3.0
torch 1.0.1

@ptrblck

ptrblck commented Feb 5, 2020

@Mahhos Could you check your current working directory for the setup.py file?
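
A quick way to run that check is sketched below. It only looks for the two file names that pip mentions in the error message; `is_installable` is a hypothetical helper.

```python
from pathlib import Path


def is_installable(directory="."):
    """True if the directory holds one of the files named in pip's error message."""
    root = Path(directory)
    return any((root / name).is_file() for name in ("setup.py", "pyproject.toml"))


print(Path.cwd(), is_installable())
```

If this prints `False`, the command is probably being run from the parent folder; change into the cloned `apex` directory first and try again.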
