[JOSS REVIEW] CUDA Errors in Arch Linux #598

Closed
adam-m-jcbs opened this issue Jul 22, 2020 · 8 comments
Labels
is:confirmed Someone confirmed this issue. reg:build This is related to the build system.

Comments

@adam-m-jcbs

adam-m-jcbs commented Jul 22, 2020

This issue is part of the functionality aspect of a JOSS review (see #597)

I am attempting to build ginkgo on my local machine with CUDA and OMP enabled, but I run into some issues. I will report them below in case anyone has an idea for a solution, and I will continue to debug on my machine to see if I discover anything.

After a fresh clone of ginkgo, in a build directory I execute a script containing a build command similar to the debug build (and I install locally in userspace, not to the system):

cmake -G "Unix Makefiles" -DCMAKE_INSTALL_PREFIX='/home/ajacobs/Reporoot/ginkgo-cuda/joss_install' \
      -DGINKGO_BUILD_TESTS=ON -DGINKGO_BUILD_EXAMPLES=ON -DGINKGO_DOC_GENERATE_EXAMPLES=ON \
      -DGINKGO_BUILD_REFERENCE=ON -DGINKGO_BUILD_OMP=ON -DGINKGO_BUILD_CUDA=ON \
      -DGINKGO_DOC_GENERATE_PDF=ON .. && make

You can find the full log of this command's output here. Mainly, I get a CUDA-related error while linking like:

[ 28%] Linking CUDA executable exception_helpers
/usr/bin/ld: ../../../third_party/gtest/build/googlemock/gtest/./libgtest.a(gtest-all.cc.o): in function `testing::internal::JsonUnitTestResultPrinter::PrintJsonTestCase(std::ostream*, testing::TestCase const&)':
gtest-all.cc:(.text+0x2008c): undefined reference to `std::__cxx11::basic_stringstream<char, std::char_traits<char>, std::allocator<char> >::basic_stringstream()'
/usr/bin/ld: ../../../third_party/gtest/build/googlemock/gtest/./libgtest.a(gtest-all.cc.o): in function `testing::internal::JsonUnitTestResultPrinter::PrintJsonUnitTest(std::ostream*, testing::UnitTest const&)':
gtest-all.cc:(.text+0x20f1b): undefined reference to `std::__cxx11::basic_stringstream<char, std::char_traits<char>, std::allocator<char> >::basic_stringstream()'
/usr/bin/ld: ../../../third_party/gtest/build/googlemock/gtest/./libgtest.a(gtest-all.cc.o): in function `testing::internal::edit_distance::CreateUnifiedDiff(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, unsigned long)':
gtest-all.cc:(.text+0x233e3): undefined reference to `std::__cxx11::basic_stringstream<char, std::char_traits<char>, std::allocator<char> >::basic_stringstream()'
/usr/bin/ld: ../../../third_party/gtest/build/googlemock/gtest/./libgtest.a(gtest-all.cc.o): in function `testing::AssertionResult testing::internal::FloatingPointLE<float>(char const*, char const*, float, float)':
gtest-all.cc:(.text._ZN7testing8internal15FloatingPointLEIfEENS_15AssertionResultEPKcS4_T_S5_[_ZN7testing8internal15FloatingPointLEIfEENS_15AssertionResultEPKcS4_T_S5_]+0xfc): undefined reference to `std::__cxx11::basic_stringstream<char, std::char_traits<char>, std::allocator<char> >::basic_stringstream()'
/usr/bin/ld: gtest-all.cc:(.text._ZN7testing8internal15FloatingPointLEIfEENS_15AssertionResultEPKcS4_T_S5_[_ZN7testing8internal15FloatingPointLEIfEENS_15AssertionResultEPKcS4_T_S5_]+0x13a): undefined reference to `std::__cxx11::basic_stringstream<char, std::char_traits<char>, std::allocator<char> >::basic_stringstream()'
/usr/bin/ld: ../../../third_party/gtest/build/googlemock/gtest/./libgtest.a(gtest-all.cc.o):gtest-all.cc:(.text._ZN7testing8internal15FloatingPointLEIdEENS_15AssertionResultEPKcS4_T_S5_[_ZN7testing8internal15FloatingPointLEIdEENS_15AssertionResultEPKcS4_T_S5_]+0x124): more undefined references to `std::__cxx11::basic_stringstream<char, std::char_traits<char>, std::allocator<char> >::basic_stringstream()' follow
/usr/bin/ld: ../../../core/libginkgo.so.1.2.0: undefined reference to `std::__cxx11::basic_ostringstream<char, std::char_traits<char>, std::allocator<char> >::basic_ostringstream()@GLIBCXX_3.4.26'
collect2: error: ld returned 1 exit status
make[2]: *** [cuda/test/base/CMakeFiles/cuda_test_base_exception_helpers.dir/build.make:110: cuda/test/base/exception_helpers] Error 1
make[1]: *** [CMakeFiles/Makefile2:2907: cuda/test/base/CMakeFiles/cuda_test_base_exception_helpers.dir/all] Error 2
make: *** [Makefile:182: all] Error 2

I also early on get warnings like this:

CMake Warning (dev) in cuda/test/components/CMakeLists.txt:
  Policy CMP0104 is not set: CMAKE_CUDA_ARCHITECTURES now detected for NVCC,
  empty CUDA_ARCHITECTURES not allowed.  Run "cmake --help-policy CMP0104"
  for policy details.  Use the cmake_policy command to set the policy and
  suppress this warning.

  CUDA_ARCHITECTURES is empty for target
  "cuda_test_components_sorting_kernels".
This warning is for project developers.

If I do a build without OMP and CUDA, I can successfully build and tests run with reasonable output.

My system:

linux kernel 5.7.9
CUDA 10.2.89
cmake 3.18.0
gcc 10.1.0
clang 10.0.0
openmp (LLVM runtime library) 10.0.0
@adam-m-jcbs
Author

@upsj has helpfully pointed out that this overlaps with another (closed) issue (#579) and could be a consequence of the latest-stable nature of Arch's package repositories.

I will try the given workaround and report back. If a documented workaround exists, that is acceptable for the purposes of functionality.

@adam-m-jcbs
Author

adam-m-jcbs commented Jul 22, 2020

Indeed, as pointed out in the linked discussions, this CUDA version supports gcc only up to gcc 8. Arch does install gcc8 as a dependency for CUDA, but under the explicit command name gcc-8. When building ginkgo with CUDA components on Arch or other distros with multiple compiler versions installed system-wide, you should thus explicitly configure the build compiler.

I tried using the documented host compiler option:

cmake -DCMAKE_CUDA_HOST_COMPILER=/usr/bin/gcc-8 [other options] .. && make

This proves insufficient: I get linking errors again. However, by forcing gcc-8 globally:

CC=gcc-8 CXX=g++-8 cmake [other options] .. && make

I am able to successfully build.

I will now verify that the executables behave as documented and exercise the GPU, but the successful build is promising for the completion of the functionality review.

Thanks for promptly pointing out what's already known about this issue!

@pratikvn pratikvn added reg:build This is related to the build system. is:confirmed Someone confirmed this issue. labels Jul 22, 2020
@adam-m-jcbs adam-m-jcbs changed the title from "[JOSS REVIEW] CUDA Build Errors in Arch Linux" to "[JOSS REVIEW] CUDA Errors in Arch Linux" Jul 22, 2020
@adam-m-jcbs
Author

adam-m-jcbs commented Jul 22, 2020

The build goes well; however, minimal-cuda-solver hangs indefinitely (non-CUDA examples, like simple-solver, continue to work and behave as expected).

I confirm through nvidia-smi that the process is launched on the GPU, but there is no utilization and it hangs.

To verify my CUDA install, I built, executed, and profiled a simple test program from NVIDIA using nvcc (obviously this is much simpler than a ginkgo build!). So my CUDA install is at least not obviously broken.

Is there a way the developers recommend for producing some debug logs to help diagnose why the CUDA solver is hanging?

For reference, here's smi:

❯ nvidia-smi
Wed Jul 22 17:18:05 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.57       Driver Version: 450.57       CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro T2000        Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   50C    P8     2W /  N/A |    260MiB /  3911MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      6599      C   ./minimal-cuda-solver             257MiB |
+-----------------------------------------------------------------------------+

@nbeams
Collaborator

nbeams commented Jul 22, 2020

This may be unrelated, but when you were having build errors earlier, you mentioned warnings about CUDA architectures being empty. What is your value for the CMake variable GINKGO_CUDA_ARCHITECTURES after configuring? I remember having similar strange behavior (hanging, not errors) when I accidentally tried to run a test on a node with an older GPU than the one I had used to build Ginkgo. (That's obviously not what is going on here, but if the architecture information isn't being detected correctly, the end result could be the same...)
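For anyone following along, one way to answer this kind of question without re-running the configure step is to read the value straight out of CMakeCache.txt in the build directory. The helper below is only a sketch: `cache_value` is a hypothetical name, and it assumes the usual `NAME:TYPE=VALUE` layout of cache entries.

```shell
# Print the value of one cache entry from a CMakeCache.txt file.
# Assumes cache entries are written as NAME:TYPE=VALUE, one per line.
# cache_value is a hypothetical helper name, not part of CMake or Ginkgo.
cache_value() {
    name="$1"
    cache_file="${2:-CMakeCache.txt}"
    # Strip the "NAME:TYPE=" prefix and print only the lines where it matched.
    sed -n "s/^${name}:[A-Z_]*=//p" "$cache_file"
}

# Example (run inside the build directory):
#   cache_value GINKGO_CUDA_ARCHITECTURES
#   cache_value CMAKE_CXX_COMPILER
```

The same lookup works for the compiler entries discussed earlier in the thread, which makes it a quick check of which compiler a given build tree actually picked up.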

@upsj
Member

upsj commented Jul 23, 2020

@nbeams That's a good observation. Unfortunately, the warnings are probably unrelated to the problems, since they don't come from Ginkgo's CUDA architecture selection (CudaArchitectureSelector), but from the most recent release of CMake, where the same capabilities have been added natively. Before we try to debug this at runtime, can you post the contents of your CMakeCache.txt file for us to see the whole build configuration?

@upsj
Member

upsj commented Jul 23, 2020

Ah wait, I realized what the issue is:

minimal-cuda-solver requires input from the user, so you need to concatenate the inputs together and pipe them to the example, see the example documentation:

cat data/A.mtx data/b.mtx data/x0.mtx | ./minimal-cuda-solver
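Since an example that reads from standard input simply waits when nothing is piped in, a small wrapper can make the mistake harder to repeat. This is a rough sketch, not something from the Ginkgo docs: `run_with_inputs` is a hypothetical helper, and the file names in the usage comment are the ones from the command above.

```shell
# Pipe the given input files into a command, refusing to start if no files
# are given or any of them is unreadable, so the program is never left
# waiting on an interactive stdin.
# run_with_inputs is a hypothetical helper name used for illustration.
run_with_inputs() {
    cmd="$1"
    shift
    if [ "$#" -eq 0 ]; then
        echo "no input files given" >&2
        return 1
    fi
    for f in "$@"; do
        if [ ! -r "$f" ]; then
            echo "missing input file: $f" >&2
            return 1
        fi
    done
    # Concatenate the inputs in the order given and feed them to the command.
    cat "$@" | "$cmd"
}

# Example, from the example's build directory:
#   run_with_inputs ./minimal-cuda-solver data/A.mtx data/b.mtx data/x0.mtx
```

The order of the files matters, because the example reads them one after another from the same stream.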

@thoasm
Member

thoasm commented Jul 23, 2020

@adam-m-jcbs The CMAKE_CUDA_HOST_COMPILER specifies the C++ compiler used inside a CUDA file to compile host code.
The variable you want to set is CMAKE_CXX_COMPILER="g++-8", so g++-8 is used for all CPU code (for completeness, you might also want to set CMAKE_CUDA_HOST_COMPILER=g++-8, so both are guaranteed to be the same).
The problem (also described in the Known Issues) is that, for some parts of the code (a couple of CUDA tests and the example custom-matrix-format), the host compiler of CUDA (here g++-8) needs to link against core (here compiled with g++-10). g++-8 does not define the symbols that the g++-10-compiled code requests, resulting in a link error.
Also, this apparently does not happen on all platforms. We currently only encountered it with ArchLinux.
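As a sketch of the advice above, the three compiler options can be generated from a single version number so they cannot drift apart. `pinned_compiler_flags` is a hypothetical helper, and the `gcc-N` / `g++-N` command names are an assumption based on how Arch names the versioned compilers in this thread; other distros may differ.

```shell
# Print the CMake options that pin the C, C++ and CUDA host compilers
# to one GCC major version, one option per line.
# Assumes the compilers are installed as gcc-N and g++-N.
# pinned_compiler_flags is a hypothetical helper name.
pinned_compiler_flags() {
    ver="$1"
    printf '%s\n' \
        "-DCMAKE_C_COMPILER=gcc-${ver}" \
        "-DCMAKE_CXX_COMPILER=g++-${ver}" \
        "-DCMAKE_CUDA_HOST_COMPILER=g++-${ver}"
}

# Example, from a fresh build directory (relies on sh/bash word splitting):
#   cmake $(pinned_compiler_flags 8) -DGINKGO_BUILD_CUDA=ON ..
```

A fresh build directory is the safest place to do this, since the compiler choice is stored in the CMake cache at the first configure.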

@adam-m-jcbs
Author

@upsj ah, yes, thank you! I was not running the solver with input, so makes sense it would hang! That's on me for not reading the docs carefully enough.

I'll read the docs more carefully for the other tests (notably the 27 pt stencil).

I would recommend going through the examples documentation and ensuring that every example provides model input and explicitly gives the command that generates the output shown in its "Results" section. I may have missed it, but it's unclear, for example, how to run the 27 pt stencil. I could figure it out by inspecting the code, but it is convenient to have a simple example of execution in the docs, especially the exact command used to produce the expected output.

After providing the proper input, I can confirm the build as given works and yields a well-behaving minimal-cuda-solver executable.

@thoasm Thanks for the extra info. As I understand it, Arch's package manager installs CUDA such that it natively utilizes the gcc-8 dependency (for C and C++) without intervention. Thus, I found I did not need to set the CUDA host compiler, but it's possibly good to do just for safety, as you say.

I agree it's better practice and a bit safer to use the cmake variables rather than make implicit variables like CC.

Thus, I rewrote my build command:

cmake -G "Unix Makefiles" -DCMAKE_INSTALL_PREFIX='/home/ajacobs/Reporoot/ginkgo-3/joss_install' \
      -DGINKGO_BUILD_TESTS=ON -DGINKGO_BUILD_EXAMPLES=ON -DGINKGO_DOC_GENERATE_EXAMPLES=ON \
      -DGINKGO_BUILD_REFERENCE=ON -DGINKGO_BUILD_OMP=ON -DGINKGO_BUILD_CUDA=ON \
      -DCMAKE_CXX_COMPILER=g++-8 -DCMAKE_C_COMPILER=gcc-8 \
      -DGINKGO_DOC_GENERATE_PDF=ON .. && make -j6

But the issue proved to be the simpler one: programs don't behave well when you don't provide expected input!

@nbeams thanks for the info! I believe @upsj is correct, though, that this is not the issue. I was wondering about the CUDA architecture variable, but figured it was more relevant for devs.

And though things are working, for reference here is the CMakeCache.txt of the successful build with functional CUDA components that produce the documented expected output.

Thanks so much for all the timely input from the devs! This issue is resolved and closed.
