
Trilinos SYCL build failing: llvm-foreach: Error: Device name missing. #12420

Open
eugeneswalker opened this issue Oct 18, 2023 · 23 comments
Labels
type: bug The primary issue is a bug in Trilinos code or tests

Comments

@eugeneswalker

eugeneswalker commented Oct 18, 2023

Bug Report

Unsure who to ping here. @nchaimov @sameershende

Description

Trying to build trilinos@develop with SYCL using oneAPI Toolkit 2023.2.1, for use with an Intel A770.

This is failing with the error shown below:

Steps to Reproduce

Reproduced here on the Mothra system on UO Frank, which has an Intel A770.

Using Docker image esw123/trilinos-intel:2023.10.18 where all dependencies and artifacts needed for reproduction are already present.

$mothra> docker run -it --name trilinos-sycl --device /dev/dri esw123/trilinos-intel:2023.10.18

root@9142987ed062:/#  icpx --version
Intel(R) oneAPI DPC++/C++ Compiler 2023.2.0 (2023.2.0.20230721)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/intel/oneapi/compiler/2023.2.1/linux/bin-llvm
Configuration file: /opt/intel/oneapi/compiler/2023.2.1/linux/bin-llvm/../bin/icpx.cfg

root@9142987ed062:/# clinfo | grep -i "device name"
...
  Device Name                                     Intel(R) Arc(TM) A770 Graphics
...

root@9142987ed062:/# git clone https://github.com/trilinos/Trilinos --branch develop

root@9142987ed062:/# git -C Trilinos log -1
commit 5f0945bd3b3311e0dc2d9dee010c795645d67b0f (HEAD -> develop, origin/develop)
Merge: 22445c8febb cefe057dd21
Author: trilinos-autotester <trilinos@sandia.gov>
Date:   Wed Oct 18 01:16:07 2023 -0500

    Merge Pull Request #12413 from cgcgcg/Trilinos/rolFix

    Automatically Merged using Trilinos Pull Request AutoTester
    PR Title: b'ROL: Allow use of MPI_COMM_WORLD as default arg'
    PR Author: cgcgcg

root@9142987ed062:/# spack env activate -d .

root@9142987ed062:/Trilinos# nohup bash -c "time spack dev-build -j48 trilinos@develop +testing +amesos +amesos2 +anasazi +aztec +belos +boost +epetra +epetraext +ifpack +ifpack2 +intrepid ~intrepid2 +isorropia +kokkos +ml +minitensor +muelu +nox +piro +phalanx +rol +rythmos +sacado +stk +shards +shylu +stokhos +stratimikos +teko +tempus +tpetra +trilinoscouplings +zoltan +zoltan2 +superlu-dist gotype=long_long ~wrapper cxxstd=17 +sycl ~cuda ~rocm" &
...
==> Installing trilinos-develop-ahryres3wxf67omjsmzdc6vxp6mih47v [19/19]
==> No binary for trilinos-develop-ahryres3wxf67omjsmzdc6vxp6mih47v found: installing from source
==> No patches needed for trilinos
==> trilinos: Executing phase: 'cmake'
...
# See attached file for CMake output
...
Command was: /usr/bin/ocloc -output /tmp/gtest-all-9e4da6-b8880d.out -file /tmp/icpx-1ec28e/gtest-all-0f3e56-531212.spv -output_no_suffix -spirv_input
llvm-foreach:
Error: Device name missing.
Command was: /usr/bin/ocloc -output /tmp/gtest-all-9e4da6-ced84f.out -file /tmp/icpx-1ec28e/gtest-all-0f3e56-3a2ae9.spv -output_no_suffix -spirv_input
llvm-foreach:
Error: Device name missing.
Command was: /usr/bin/ocloc -output /tmp/gtest-all-9e4da6-3e69cb.out -file /tmp/icpx-1ec28e/gtest-all-0f3e56-0f875b.spv -output_no_suffix -spirv_input
llvm-foreach:
icpx: error: gen compiler command failed with exit code 226 (use -v to see invocation)
Intel(R) oneAPI DPC++/C++ Compiler 2023.2.0 (2023.2.0.20230721)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/intel/oneapi/compiler/2023.2.1/linux/bin-llvm
Configuration file: /opt/intel/oneapi/compiler/2023.2.1/linux/bin-llvm/../bin/icpx.cfg
icpx: note: diagnostic msg: Error generating preprocessed source(s) - no preprocessable inputs.
make[2]: *** [packages/sacado/test/GTestSuite/googletest/googletest/CMakeFiles/sacado-gtest.dir/build.make:100: lib/libsacado-gtest.so.1.10.0] Error 1
make[2]: Leaving directory '/Trilinos/spack-build-ahryres'
make[1]: *** [CMakeFiles/Makefile2:21110: packages/sacado/test/GTestSuite/googletest/googletest/CMakeFiles/sacado-gtest.dir/all] Error 2
2 warnings generated.
...
     22265    [ 46%] Generating Tempus_ForwardEuler_NumberOfTimeSteps.xml
     22266    cd /Trilinos/spack-build-ahryres/packages/tempus/test/ForwardEuler && /spack/opt/spack/linux-ubuntu22.04-x86_64/oneapi-2023.2.1/cmake-3.27.6-xamzklhqtqjfvipo6vr556odicny5cha/bin/cmake -E copy /Trilinos/packages/tempus/test/ForwardEuler/Tempus_ForwardEuler_NumberOfTimeSteps.xml /Trilinos/spack-build-ahryres/packages/tempus/test/ForwardEuler/Tempus_ForwardEuler_NumberOfTimeSteps.xml
  >> 22267    Error: Device name missing.
     22268    Command was: /usr/bin/ocloc -output /tmp/Shards_Array-f1dfb5-a54e83.out -file /tmp/icpx-6003ce/Shards_Array-fa7881-53881d.spv -output_no_suffix -spirv_input
     22269    llvm-foreach:
     22270    [ 46%] Building C object packages/zoltan/src/CMakeFiles/zoltan.dir/Utilities/shared/zoltan_align.c.o
     22271    [ 46%] Generating geometricTest.xml
  >> 22272    icpx: error: gen compiler command failed with exit code 226 (use -v to see invocation)
     22273    Intel(R) oneAPI DPC++/C++ Compiler 2023.2.0 (2023.2.0.20230721)
     22274    Target: x86_64-unknown-linux-gnu
     22275    Thread model: posix
     22276    InstalledDir: /opt/intel/oneapi/compiler/2023.2.1/linux/bin-llvm
...

CMake output here: trilinos-sycl-cmake.txt

Concretization / Dependency Information
 -   5pmtmsw  trilinos@develop%oneapi@2023.2.1~adelus~adios2+amesos+amesos2+anasazi+aztec~basker+belos+boost~chaco~complex~cuda~cuda_rdc~debug~dtk+epetra+epetraext~epetraextbtf~epetraextexperimental~epetraextgraphreorderings~exodus+explicit_template_instantiation~float+fortran~gtest~hdf5~hypre+ifpack+ifpack2+intrepid~intrepid2~ipo+isorropia+kokkos~mesquite+minitensor+ml+mpi+muelu~mumps+nox~openmp~panzer+phalanx+piro~python~rocm~rocm_rdc+rol+rythmos+sacado~scorec+shards+shared+shylu+stk+stokhos+stratimikos~strumpack~suite-sparse~superlu+superlu-dist+sycl+teko+tempus+testing+thyra+tpetra+trilinoscouplings~wrapper~x11+zoltan+zoltan2 build_system=cmake build_type=Release cxxstd=17 generator=make gotype=long_long arch=linux-ubuntu22.04-x86_64
[+]  nbqn2dk      ^boost@1.83.0%oneapi@2023.2.1~atomic~chrono~clanglibcpp~container~context~contract~coroutine~date_time~debug+exception~fiber~filesystem+graph~graph_parallel~icu~iostreams~json~locale~log+math+mpi+multithreaded~nowide~numpy~pic~program_options~python~random~regex~serialization+shared~signals~singlethreaded+stacktrace~system~taggedlayout~test~thread~timer~type_erasure~versionedlayout~wave build_system=generic cxxstd=17 patches=8e3faa2,a440f96 visibility=hidden arch=linux-ubuntu22.04-x86_64
[+]  xamzklh      ^cmake@3.27.6%oneapi@2023.2.1~doc+ncurses+ownlibs build_system=generic build_type=Release arch=linux-ubuntu22.04-x86_64
[+]  srtnhbu          ^curl@8.1.2%oneapi@2023.2.1~gssapi~ldap~libidn2~librtmp~libssh~libssh2+nghttp2 build_system=autotools libs=shared,static tls=openssl arch=linux-ubuntu22.04-x86_64
[+]  w2oyihn              ^nghttp2@1.52.0%oneapi@2023.2.1 build_system=autotools arch=linux-ubuntu22.04-x86_64
[+]  qraiuh5              ^openssl@3.1.3%oneapi@2023.2.1~docs+shared build_system=generic certs=mozilla arch=linux-ubuntu22.04-x86_64
[+]  qqkmghd                  ^ca-certificates-mozilla@2023-05-30%oneapi@2023.2.1 build_system=generic arch=linux-ubuntu22.04-x86_64
[+]  udbz4qj          ^ncurses@6.4%oneapi@2023.2.1~symlinks+termlib abi=none build_system=autotools arch=linux-ubuntu22.04-x86_64
[+]  ezzj6si      ^gmake@4.4.1%oneapi@2023.2.1~guile build_system=autotools arch=linux-ubuntu22.04-x86_64
[+]  h2wbnwr      ^hwloc@2.9.1%oneapi@2023.2.1~cairo~cuda~gl~libudev+libxml2~netloc~nvml~oneapi-level-zero~opencl+pci~rocm build_system=autotools libs=shared,static arch=linux-ubuntu22.04-x86_64
[+]  c4recu5          ^libpciaccess@0.17%oneapi@2023.2.1 build_system=autotools arch=linux-ubuntu22.04-x86_64
[+]  22ychgn              ^libtool@2.4.7%oneapi@2023.2.1 build_system=autotools arch=linux-ubuntu22.04-x86_64
[+]  3bilsix                  ^m4@1.4.19%oneapi@2023.2.1+sigsegv build_system=autotools patches=9dc5fbd,bfdffa7 arch=linux-ubuntu22.04-x86_64
[+]  wqbaheo                      ^libsigsegv@2.14%oneapi@2023.2.1 build_system=autotools arch=linux-ubuntu22.04-x86_64
[+]  6mvrdqj              ^util-macros@1.19.3%oneapi@2023.2.1 build_system=autotools arch=linux-ubuntu22.04-x86_64
[+]  xbwqrqb          ^libxml2@2.10.3%oneapi@2023.2.1+pic~python+shared build_system=autotools arch=linux-ubuntu22.04-x86_64
[+]  ciplfmc              ^libiconv@1.17%oneapi@2023.2.1 build_system=autotools libs=shared,static arch=linux-ubuntu22.04-x86_64
[+]  dtwuctr              ^xz@5.4.1%oneapi@2023.2.1~pic build_system=autotools libs=shared,static arch=linux-ubuntu22.04-x86_64
[+]  bwabxoa          ^pkgconf@1.9.5%oneapi@2023.2.1 build_system=autotools arch=linux-ubuntu22.04-x86_64
[+]  rie477q      ^metis@5.1.0%oneapi@2023.2.1~gdb~int64~ipo~real64+shared build_system=cmake build_type=Release generator=make patches=4991da9,93a7903 arch=linux-ubuntu22.04-x86_64
[e]  x3bdfob      ^mpich@4.1.2%oneapi@2023.2.1~argobots~cuda+fortran~hwloc+hydra+libxml2+pci~rocm+romio~slurm~vci~verbs~wrapperrpath build_system=autotools datatype-engine=auto device=ch4 netmod=ofi pmi=pmi arch=linux-ubuntu22.04-x86_64
[+]  2nuaat5      ^openblas@0.3.24%oneapi@2023.2.1~bignuma~consistent_fpcsr+fortran~ilp64+locking+pic+shared build_system=makefile symbol_suffix=none threads=none arch=linux-ubuntu22.04-x86_64
[+]  lx4hsqd          ^perl@5.38.0%oneapi@2023.2.1+cpanm+opcode+open+shared+threads build_system=generic patches=714e4d1 arch=linux-ubuntu22.04-x86_64
[+]  v6zeyam              ^berkeley-db@18.1.40%oneapi@2023.2.1+cxx~docs+stl build_system=autotools patches=26090f4,b231fcc arch=linux-ubuntu22.04-x86_64
[+]  libkwyw              ^bzip2@1.0.8%oneapi@2023.2.1~debug~pic+shared build_system=generic arch=linux-ubuntu22.04-x86_64
[+]  ljoix7a                  ^diffutils@3.9%oneapi@2023.2.1 build_system=autotools arch=linux-ubuntu22.04-x86_64
[+]  mcfkdnn              ^gdbm@1.23%oneapi@2023.2.1 build_system=autotools arch=linux-ubuntu22.04-x86_64
[+]  taavmg5                  ^readline@8.2%oneapi@2023.2.1 build_system=autotools patches=bbf97f1 arch=linux-ubuntu22.04-x86_64
[+]  fbgbpcd      ^parmetis@4.0.3%oneapi@2023.2.1~gdb~int64~ipo+shared build_system=cmake build_type=Release generator=make patches=4f89253,50ed208,704b84f arch=linux-ubuntu22.04-x86_64
[+]  jvykiib      ^superlu-dist@develop%oneapi@2023.2.1~cuda~int64~ipo~openmp~rocm+shared build_system=cmake build_type=Release generator=make arch=linux-ubuntu22.04-x86_64
[+]  szyxzkf      ^zlib-ng@2.1.3%oneapi@2023.2.1+compat+opt build_system=autotools patches=299b958,ae9077a,b692621 arch=linux-ubuntu22.04-x86_64
@eugeneswalker eugeneswalker added the type: bug The primary issue is a bug in Trilinos code or tests label Oct 18, 2023
@eugeneswalker eugeneswalker changed the title Trilinos SYCL build failing: unsure of issue Trilinos SYCL build failing: Error: Device name missing. Oct 18, 2023
@eugeneswalker eugeneswalker changed the title Trilinos SYCL build failing: Error: Device name missing. Trilinos SYCL build failing: llvm-foreach: Error: Device name missing. Oct 18, 2023
@csiefer2
Member

csiefer2 commented Oct 18, 2023

@eugeneswalker AFAIK, that is a "compiler is not installed correctly on the system" error. Progress on #12295 is blocked on the same problem.

@rppawlo
Contributor

rppawlo commented Oct 18, 2023

Not all of Trilinos has been ported to SYCL. Only ECP-funded packages could work on the SYCL port. I know that Sacado, Phalanx and Panzer will not work on that hardware, and there are probably other packages as well.

@eugeneswalker
Author

@eugeneswalker AFAIK, that is a "compiler is not installed correctly on the system" error. Progress on #12295 is blocked on the same problem.

Do you know how to make this print the exact icpx invocation that triggered the error?

  >> 22267    Error: Device name missing.
     22268    Command was: /usr/bin/ocloc -output /tmp/Shards_Array-f1dfb5-a54e83.out -file /tmp/icpx-6003ce/Shards_Array-fa7881-53881d.spv -output_no_suffix -spirv_input
     22269    llvm-foreach:
     22270    [ 46%] Building C object packages/zoltan/src/CMakeFiles/zoltan.dir/Utilities/shared/zoltan_align.c.o
     22271    [ 46%] Generating geometricTest.xml
  >> 22272    icpx: error: gen compiler command failed with exit code 226 (use -v to see invocation)
     22273    Intel(R) oneAPI DPC++/C++ Compiler 2023.2.0 (2023.2.0.20230721)
     22274    Target: x86_64-unknown-linux-gnu
     22275    Thread model: posix
     22276    InstalledDir: /opt/intel/oneapi/compiler/2023.2.1/linux/bin-llvm

@csiefer2
Member

@eugeneswalker

make VERBOSE=1
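A minimal sketch of how that might be used, assuming the CMake Makefile generator and that `sacado-gtest` (the target in the failing log above) can be built on its own: capture a verbose log, then filter it for the `icpx` and `ocloc` lines. The helper function `extract_sycl_failures` and the log file name are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: from a verbose build log, print every icpx command
# line, the ocloc "Command was:" lines, and the related error messages,
# so the exact failing invocation is easy to spot.
extract_sycl_failures() {
  local log="$1"
  grep -E '(^|/)icpx |Command was: .*ocloc|Device name missing|gen compiler command failed' "$log"
}

# Usage (assumption: run from the CMake build directory):
#   make VERBOSE=1 sacado-gtest > verbose.log 2>&1
#   extract_sycl_failures verbose.log
```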

@eugeneswalker
Author

eugeneswalker commented Oct 19, 2023

Interestingly, if I build kokkos@4.1.00 +sycl separately (i.e. not vendored by Trilinos), and then use TPL_ENABLE_KOKKOS, not only does Trilinos build OK, but it seems to work OK on the Intel GPU. Doing the build this way, I am able to run, for instance, the tpetra ctests and verify a number of them are being run on our Intel A770 GPU. Perhaps there is a tweak required in how Trilinos drives the vendored build of Kokkos?

...
==> trilinos: Successfully installed trilinos-develop-wykgsmgnjcohbrmyt5dzgcan6xpk3tac
  Stage: 0.00s.  Cmake: 54.86s.  Build: 31m 48.99s.  Install: 11.68s.  Post-install: 1.96s.  Total: 32m 58.53s
[+] /spack/opt/spack/linux-ubuntu22.04-x86_64/oneapi-2023.2.1/trilinos-develop-wykgsmgnjcohbrmyt5dzgcan6xpk3tac
$> ctest -R TpetraCore_
...
        Start 566: TpetraCore_Albany182_MPI_4
 93/265 Test #566: TpetraCore_Albany182_MPI_4 ..................................................................   Passed    0.18 sec
        Start 567: TpetraCore_Directory_UnitTests_MPI_4
 94/265 Test #567: TpetraCore_Directory_UnitTests_MPI_4 ........................................................   Passed    0.18 sec
        Start 568: TpetraCore_Directory_Issue6987_MPI_4
 95/265 Test #568: TpetraCore_Directory_Issue6987_MPI_4 ........................................................   Passed    0.18 sec
        Start 569: TpetraCore_Directory_Issue7223_MPI_4
 96/265 Test #569: TpetraCore_Directory_Issue7223_MPI_4 ........................................................   Passed    0.17 sec
        Start 570: TpetraCore_Distributor_UnitTests_MPI_4
 97/265 Test #570: TpetraCore_Distributor_UnitTests_MPI_4 ......................................................   Passed    0.17 sec
        Start 571: TpetraCore_Distributor_CreateFromSendsAndRecvs_MPI_4
 98/265 Test #571: TpetraCore_Distributor_CreateFromSendsAndRecvs_MPI_4 ........................................   Passed    0.37 sec
        Start 572: TpetraCore_Issue1454_MPI_4
 99/265 Test #572: TpetraCore_Issue1454_MPI_4 ..................................................................   Passed    0.18 sec
        Start 573: TpetraCore_Issue1752_MPI_2
...
[Screenshot attached: 2023-10-19 at 3:55 PM]

@csiefer2
Member

@bartlettroscoe Comments on the above?

@bartlettroscoe
Copy link
Member

and then use TPL_ENABLE_KOKKOS

@eugeneswalker, CMake variables are case sensitive, so if you want to build Trilinos against a pre-installed Kokkos, you need to configure with:

   -D TPL_ENABLE_Kokkos=ON

If you try setting:

   -D TPL_ENABLE_KOKKOS=ON

you should get a warning at the end of the CMake STDOUT that the variable TPL_ENABLE_KOKKOS is unread.
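For completeness, a configure sketch with the correct spelling. It assumes the external Kokkos install is found through the standard `CMAKE_PREFIX_PATH` search (setting `Kokkos_DIR` to the directory holding `KokkosConfig.cmake` is the usual alternative); both paths are placeholders, and enabling only Tpetra is just an example.

```shell
# Placeholder paths: adjust to the local Kokkos install and Trilinos checkout.
KOKKOS_PREFIX=/path/to/kokkos-install
TRILINOS_SRC=/path/to/Trilinos

# Note the exact case: TPL_ENABLE_Kokkos, not TPL_ENABLE_KOKKOS.
# CMake variable names are case sensitive, so the all-caps spelling is
# ignored and only reported in a warning at the end of configure.
cmake \
  -D TPL_ENABLE_Kokkos=ON \
  -D CMAKE_PREFIX_PATH="$KOKKOS_PREFIX" \
  -D Trilinos_ENABLE_Tpetra=ON \
  "$TRILINOS_SRC"
```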

@bartlettroscoe
Member

Not all of Trilinos has been ported to SYCL. Only packages that were ECP funded could work on sycl port. I know that sacado, phalanx and panzer will not work on that hardware. There are probably other packages as well.

@jwillenbring and @rppawlo, do you know what internal SNL program is funding Trilinos support of SYCL? You can contact me offline to respond.

@bartlettroscoe
Member

bartlettroscoe commented Oct 20, 2023

Perhaps there is a tweak required in how Trilinos drives the vendored build of Kokkos?

@eugeneswalker, as far as I know, downstream Trilinos packages should behave the same whether Kokkos is built as part of Trilinos or pulled in from a prebuilt Kokkos. A difference is possible, though, because the installed KokkosConfig.cmake file does not export every variable that Kokkos makes visible as a cache variable when it is built as part of Trilinos. If there is a difference, I can almost guarantee it comes from how variables are handled in these two cases.

For starters, we would need to be able to inspect the CMake STDOUT and the CMakeCache.txt files produced by both methods. Can we get those? They may provide a clue to what is happening without our having to debug this locally. Otherwise, we would need to be able to reproduce both cases locally to debug something like this. (The debugging process should then be pretty straightforward, if tedious.)
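One way to start on that comparison, sketched under the assumption that both build trees are available locally: filter each CMakeCache.txt down to Kokkos-prefixed entries and diff them. The function name and arguments are hypothetical, and a relevant difference could also live in variables without a Kokkos prefix.

```shell
#!/usr/bin/env bash
# Hypothetical helper: show Kokkos-related cache entries that differ between
# a build using the Trilinos-internal Kokkos and one using an external Kokkos.
# Each argument is the path to a CMakeCache.txt file.
diff_kokkos_cache() {
  local internal="$1" external="$2"
  # Keep only NAME:TYPE=VALUE lines whose name starts with Kokkos_ or KOKKOS_.
  local filter='^(Kokkos|KOKKOS)_[A-Za-z0-9_]*:'
  # Drop the ":TYPE" part so e.g. BOOL vs INTERNAL is not reported as a
  # difference, and sort so both sides line up for diff.
  diff <(grep -E "$filter" "$internal" | sed -E 's/:[A-Z]+=/=/' | sort) \
       <(grep -E "$filter" "$external" | sed -E 's/:[A-Z]+=/=/' | sort)
}

# Usage (placeholder paths):
#   diff_kokkos_cache build-internal/CMakeCache.txt build-external/CMakeCache.txt
```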

But then there is the official Trilinos policy:

so I am not sure that we are authorized to support this build configuration yet, since no official Trilinos SYCL build that posts to the Trilinos CDash site duplicates this configuration.

Reproduced here on the Mothra system on UO Frank, which has an Intel A770.

@jwillenbring and @rppawlo, how would Trilinos developers be expected to reproduce these builds on UO Frank using these Spack containers? Given the info above, I am not sure I know how to do that. I think I would need the exact set of commands to run.

@eugeneswalker
Author

and then use TPL_ENABLE_KOKKOS

@eugeneswalker, CMake variables are case sensitive, so if you want to build Trilinos against a pre-installed Kokkos, you need to configure with:

   -D TPL_ENABLE_Kokkos=ON

If you try setting:

   -D TPL_ENABLE_KOKKOS=ON

you should get a warning at the end of the CMake STDOUT that the variable TPL_ENABLE_KOKKOS is unread.

I used the correct case when doing the build, just did not notice that when I wrote the post here. The build with external kokkos worked.

@srajama1
Contributor

@eugeneswalker Can you describe what your end goal is? It is better not to enable all of Trilinos for SYCL. Please reach out to @iyamazaki and @lucbv to get the subset we have ported to this backend to support ECP. Testing those would be nice.

@lucbv
Contributor

lucbv commented Oct 20, 2023

We have only worked on the new solvers stack that would be: Tpetra, Ifpack2, Belos, Amesos2, Xpetra, Zoltan2 and MueLu.

@bartlettroscoe
Member

bartlettroscoe commented Oct 20, 2023

We have only worked on the new solvers stack that would be: Tpetra, Ifpack2, Belos, Amesos2, Xpetra, Zoltan2 and MueLu.

We could make it so that Trilinos disables, by default, any packages that don't support SYCL when TPL_ENABLE_SYCL=ON is set and could error out if the user tries to enable any packages that don't support SYCL. (But we could define a configure option like Trilinos_ALLOW_ENABLE_OF_UNSUPPORTED_SCYL_PACKAGES that, if set, would allow the explicit enable of these non-supported packages if someone really wanted to try to build them and get them working on their own time. But no-one should be able to accidentally enable these unsupported packages.)

Then, we can deal with the packages that should support SYCL.
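A minimal sketch of the proposed check, written as a standalone bash function rather than the TriBITS/CMake logic it would really need to be. It only knows about the three packages named earlier in this thread as not working with SYCL (Sacado, Phalanx and Panzer), so it is an illustration, not a complete list; the override variable is the name proposed above and does not exist in Trilinos.

```shell
#!/usr/bin/env bash
# Illustrative list only: packages named in this thread as not working with SYCL.
SYCL_KNOWN_UNSUPPORTED=(Sacado Phalanx Panzer)

# Print each enabled package (given as arguments) that is on the list above.
# Return 1 if any were found, unless the proposed override variable is ON.
check_sycl_packages() {
  local pkg known found=0
  for pkg in "$@"; do
    for known in "${SYCL_KNOWN_UNSUPPORTED[@]}"; do
      if [[ "$pkg" == "$known" ]]; then
        echo "not supported with SYCL: $pkg"
        found=1
      fi
    done
  done
  if (( found )) && [[ "${Trilinos_ALLOW_ENABLE_OF_UNSUPPORTED_SCYL_PACKAGES:-OFF}" != ON ]]; then
    return 1
  fi
  return 0
}

# Usage: check_sycl_packages Teuchos Tpetra Sacado || echo "configure would stop here"
```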

Update: See the new issue:

@crtrott
Member

crtrott commented Oct 20, 2023

That said, it looks like it does work with external Kokkos, so it might be an internal CMake issue in Trilinos and/or Kokkos?

@srajama1
Contributor

@crtrott That comment on external Kokkos is confusing to me. The original build had so many packages enabled most of which do not support SYCL. With external Kokkos, Tpetra tests passed. This is what we would expect. I would also expect that Tpetra tests pass with the Kokkos in Trilinos.

I don't know if that comment means the rest of the packages built fine or just Tpetra built fine. Let us wait for @eugeneswalker to respond. Even if the packages built fine, I would not trust anything there, as we do not support or test SYCL with most of the packages, except those listed by @lucbv above plus Kokkos Core and Kokkos Kernels.

@crtrott
Member

crtrott commented Oct 20, 2023

Hm, yeah, if it was only Tpetra, that might be the difference. I assumed the configure was the same with external or internal Kokkos.

@eugeneswalker
Author

I don't know if that comment means rest of packages built fine or just Tpetra built fine. Let us wait for @eugeneswalker to respond.

Everything about the Trilinos configuration was the same except that I had it use external Kokkos. The same set of packages were turned on. I only ran the tpetra ctests.

trilinos@develop +testing +amesos +amesos2 +anasazi +aztec +belos +boost +epetra +epetraext +ifpack +ifpack2 +intrepid ~intrepid2 +isorropia +kokkos +ml +minitensor +muelu +nox +piro +phalanx +rol +rythmos +sacado +stk +shards +shylu +stokhos +stratimikos +teko +tempus +tpetra +trilinoscouplings +zoltan +zoltan2 +superlu-dist gotype=long_long ~wrapper cxxstd=17 +sycl ~cuda ~rocm

@bartlettroscoe
Member

@srajama1, @crtrott, @lucbv,

FYI: From looking at the configure output given above, we see the set of enabled and disabled packages:

  • Final set of enabled top-level packages: Kokkos Teuchos KokkosKernels RTOp Sacado MiniTensor Epetra Zoltan Shards Triutils EpetraExt Tpetra TrilinosSS Thyra Xpetra Isorropia AztecOO Galeri Amesos Zoltan2Core Ifpack ML Belos Amesos2 Anasazi Ifpack2 Stratimikos Teko Intrepid STK Phalanx NOX MueLu Zoltan2 ShyLU_DD ShyLU Tempus Stokhos ROL Piro TrilinosCouplings 41

  • Final set of enabled packages: Kokkos TeuchosCore TeuchosParser TeuchosParameterList TeuchosComm TeuchosNumerics TeuchosRemainder TeuchosKokkosCompat TeuchosKokkosComm Teuchos KokkosKernels RTOp Sacado MiniTensor Epetra Zoltan Shards Triutils EpetraExt TpetraTSQR TpetraCore Tpetra TrilinosSS ThyraCore ThyraEpetraAdapters ThyraEpetraExtAdapters ThyraTpetraAdapters Thyra Xpetra Isorropia AztecOO Galeri Amesos Zoltan2Core Ifpack ML Belos Amesos2 Anasazi Ifpack2 Stratimikos Teko Intrepid STKUtil STKCoupling STKMath STKSimd STKExprEval STKTopology STKSearch STKTransfer STKMesh STKUnit_tests STKDoc_tests STKEmend STK Phalanx NOX MueLu Zoltan2 ShyLU_DDFROSch ShyLU_DDCore ShyLU_DD ShyLU Tempus Stokhos ROL Piro TrilinosCouplings 69

  • Final set of non-enabled top-level packages: TrilinosFrameworkTests TrilinosATDMConfigTests Gtest Pliris Pamgen ShyLU_Node SEACAS Intrepid2 Compadre Percept Krino Zoltan2Sphynx Panzer PyTrilinos NewPackage Adelus TrilinosBuildStats TrilinosInstallTests 18

  • Final set of non-enabled packages: TrilinosFrameworkTests TrilinosATDMConfigTests Gtest Pliris Pamgen ShyLU_NodeHTS ShyLU_NodeTacho ShyLU_NodeBasker ShyLU_NodeFastILU ShyLU_Node SEACASExodus SEACASExodus_for SEACASExoIIv2for32 SEACASNemesis SEACASIoss SEACASChaco SEACASAprepro_lib SEACASSupes SEACASSuplib SEACASSuplibC SEACASSuplibCpp SEACASSVDI SEACASPLT SEACASAlgebra SEACASAprepro SEACASBlot SEACASConjoin SEACASEjoin SEACASEpu SEACASCpup SEACASExo2mat SEACASExodiff SEACASExomatlab SEACASExotxt SEACASExo_format SEACASEx1ex2v2 SEACASExotec2 SEACASFastq SEACASGjoin SEACASGen3D SEACASGenshell SEACASGrepos SEACASExplore SEACASMapvarlib SEACASMapvar SEACASMapvar-kd SEACASMat2exo SEACASNas2exo SEACASZellij SEACASNemslice SEACASNemspread SEACASNumbers SEACASSlice SEACASTxtexo SEACASEx2ex1v2 SEACAS Intrepid2 Compadre STKNGP_TEST STKMiddle_mesh STKIO STKTools STKBalance STKUnit_test_utils STKSearchUtil STKTransferUtil Percept Krino Zoltan2Sphynx ShyLU_DDCommon PanzerCore PanzerDofMgr PanzerDiscFE PanzerAdaptersSTK PanzerMiniEM PanzerExprEval Panzer PyTrilinos NewPackage Adelus TrilinosBuildStats TrilinosInstallTests 82

  • Final set of enabled top-level external packages/TPLs: MPI BLAS LAPACK Boost METIS ParMETIS Zlib SuperLUDist DLlib 9

  • Final set of enabled external packages/TPLs: MPI BLAS LAPACK Boost METIS ParMETIS Zlib SuperLUDist DLlib 9

So it looks like most of Trilinos is being enabled in that configuration. Which of those packages listed under Final set of enabled packages use Kokkos and Tpetra but are known not to work with SYCL?

@eugeneswalker
Author

I can try the build again, using internal Kokkos, but with everything turned off except the packages identified a few comments above as having support for SYCL. Is that worth trying as part of the process here?
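If that is worth trying, a configure restricted to the solver stack listed above could look roughly like this sketch. It assumes SYCL is switched on through Kokkos' `Kokkos_ENABLE_SYCL` option and relies on TriBITS enabling required upstream packages (e.g. Teuchos, Kokkos Kernels) on its own; the helper name `solver_stack_args` is hypothetical and the source path is a placeholder.

```shell
#!/usr/bin/env bash
# Packages reported in this thread as ported to SYCL (the "new solvers stack").
SOLVER_STACK=(Tpetra Ifpack2 Belos Amesos2 Xpetra Zoltan2 MueLu)

# Emit configure arguments, one per line: SYCL on in Kokkos, optional
# packages off, and only the packages above explicitly enabled.
solver_stack_args() {
  printf '%s\n' -D Kokkos_ENABLE_SYCL=ON
  printf '%s\n' -D Trilinos_ENABLE_ALL_OPTIONAL_PACKAGES=OFF
  printf '%s\n' -D Trilinos_ENABLE_TESTS=ON
  local pkg
  for pkg in "${SOLVER_STACK[@]}"; do
    printf '%s\n' -D "Trilinos_ENABLE_${pkg}=ON"
  done
}

# Usage (placeholder path):
#   mapfile -t args < <(solver_stack_args)
#   cmake "${args[@]}" /path/to/Trilinos
```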

@srajama1
Contributor

@eugeneswalker This is strange behavior for sure. In theory, it shouldn't matter which Kokkos you use (internal or external); SYCL support in the packages shouldn't be affected by that.

From the configure:

Final set of enabled packages: Kokkos TeuchosCore TeuchosParser TeuchosParameterList TeuchosComm TeuchosNumerics TeuchosRemainder TeuchosKokkosCompat TeuchosKokkosComm Teuchos KokkosKernels RTOp Sacado MiniTensor Epetra Zoltan Shards Triutils EpetraExt TpetraTSQR TpetraCore Tpetra TrilinosSS ThyraCore ThyraEpetraAdapters ThyraEpetraExtAdapters ThyraTpetraAdapters Thyra Xpetra Isorropia AztecOO Galeri Amesos Zoltan2Core Ifpack ML Belos Amesos2 Anasazi Ifpack2 Stratimikos Teko Intrepid STKUtil STKCoupling STKMath STKSimd STKExprEval STKTopology STKSearch STKTransfer STKMesh STKUnit_tests STKDoc_tests STKEmend STK Phalanx NOX MueLu Zoltan2 ShyLU_DDFROSch ShyLU_DDCore ShyLU_DD ShyLU Tempus Stokhos ROL Piro TrilinosCouplings 69

This is the set of packages that we have tested with SYCL:
We have only worked on the new solvers stack that would be: Tpetra, Ifpack2, Belos, Amesos2, Xpetra, Zoltan2 and MueLu + Kokkos Core + Kokkos Kernels.

For some packages, such as Epetra and Zoltan, it probably doesn't matter whether you use SYCL or not.

I do not know how to quickly go from that first list down the dependency tree to decide whether a package requires SYCL testing. Some probably do, say NOX. The same is true for ROL, ShyLU*, STK, etc.; I believe they have to be tested, as they all rely on Kokkos. Some probably don't, say TrilinosSS or Triutils. You see my dilemma?

@masterleinad
Contributor

masterleinad commented Nov 15, 2023

The failing compile line looks like the following to me:

[  4%] Linking CXX shared library ../../../../../../lib/libsacado-gtest.so
cd /home/darndt/trilinos/build_new/packages/sacado/test/GTestSuite/googletest/googletest && /soft/packaging/spack/gnu-ldpath/build/linux-sles15-x86_64/gcc-11.2.0/cmake-3.26.3-vnn7ncxwjjekhm5ehnwhk4w6tyoygc4p/bin/cmake -E cmake_link_script CMakeFiles/sacado-gtest.dir/link.txt --verbose=1
/soft/restricted/CNDA/updates/2023.10.15.001/oneapi/compiler/eng-20231009/linux/bin/icpx -fPIC -g -fp-model=precise -fsycl -fno-sycl-id-queries-fit-in-int -fsycl-dead-args-optimization -fsycl-unnamed-lambda -fsycl-targets=spir64_gen  -O3 -DNDEBUG -shared -Wl,-soname,libsacado-gtest.so.1.10.0 -o ../../../../../../lib/libsacado-gtest.so.1.10.0 "CMakeFiles/sacado-gtest.dir/src/gtest-all.cc.o" 
Error: Device name missing.

whereas the linker step should look something like

[ 80%] Linking CXX shared library libkokkoscore.so
cd /home/darndt/kokkos/example/build_cmake_in_tree/build/kokkos/core/src && /soft/packaging/spack/gnu-ldpath/build/linux-sles15-x86_64/gcc-11.2.0/cmake-3.26.3-vnn7ncxwjjekhm5ehnwhk4w6tyoygc4p/bin/cmake -E cmake_link_script CMakeFiles/kokkoscore.dir/link.txt --verbose=1
/soft/restricted/CNDA/updates/2023.10.15.001/oneapi/compiler/eng-20231009/linux/bin/icpx -fPIC -O3 -DNDEBUG -DKOKKOS_DEPENDENCE -fsycl -fno-sycl-id-queries-fit-in-int -fsycl-dead-args-optimization -DDESUL_SYCL_DEVICE_GLOBAL_SUPPORTED -fsycl-targets=spir64_gen -Xsycl-target-backend "-device 12.60.7" -shared -Wl,-soname,libkokkoscore.so.4.2 -o libkokkoscore.so.4.2.99 CMakeFiles/kokkoscore.dir/impl/Kokkos_Abort.cpp.o CMakeFiles/kokkoscore.dir/impl/Kokkos_CPUDiscovery.cpp.o CMakeFiles/kokkoscore.dir/impl/Kokkos_Command_Line_Parsing.cpp.o CMakeFiles/kokkoscore.dir/impl/Kokkos_Core.cpp.o CMakeFiles/kokkoscore.dir/impl/Kokkos_Error.cpp.o CMakeFiles/kokkoscore.dir/impl/Kokkos_ExecPolicy.cpp.o CMakeFiles/kokkoscore.dir/impl/Kokkos_HostBarrier.cpp.o CMakeFiles/kokkoscore.dir/impl/Kokkos_HostSpace.cpp.o CMakeFiles/kokkoscore.dir/impl/Kokkos_HostSpace_deepcopy.cpp.o CMakeFiles/kokkoscore.dir/impl/Kokkos_HostThreadTeam.cpp.o CMakeFiles/kokkoscore.dir/impl/Kokkos_MemoryPool.cpp.o CMakeFiles/kokkoscore.dir/impl/Kokkos_MemorySpace.cpp.o CMakeFiles/kokkoscore.dir/impl/Kokkos_Profiling.cpp.o CMakeFiles/kokkoscore.dir/impl/Kokkos_SharedAlloc.cpp.o CMakeFiles/kokkoscore.dir/impl/Kokkos_Stacktrace.cpp.o CMakeFiles/kokkoscore.dir/impl/Kokkos_hwloc.cpp.o CMakeFiles/kokkoscore.dir/Serial/Kokkos_Serial.cpp.o CMakeFiles/kokkoscore.dir/Serial/Kokkos_Serial_Task.cpp.o CMakeFiles/kokkoscore.dir/SYCL/Kokkos_SYCL.cpp.o CMakeFiles/kokkoscore.dir/SYCL/Kokkos_SYCL_Instance.cpp.o CMakeFiles/kokkoscore.dir/SYCL/Kokkos_SYCL_Space.cpp.o CMakeFiles/kokkoscore.dir/home/darndt/kokkos/tpls/desul/src/Lock_Array_SYCL.cpp.o  -ldl

which means that sacado-gtest is using the compiler flags provided by Kokkos but not the linker flags provided by Kokkos (in particular, -Xsycl-target-backend "-device 12.60.7" is missing). Note that gtest shouldn't need all of the Kokkos flags in the first place. Maybe someone with better knowledge of TriBITS can help explain why these flags are used for this target and why the linker flags are missing.
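The difference between the two link lines can be checked mechanically. Below is a small hypothetical helper based only on the two commands quoted above: it flags an AOT link (`-fsycl-targets=spir64_gen`) that carries no `-Xsycl-target-backend "-device ..."` argument, which is the combination that ends in `Device name missing`. It reads a single command line from stdin and does simple substring matching, so it is a sketch rather than a robust parser.

```shell
#!/usr/bin/env bash
# Hypothetical check for one icpx link command read from stdin.
# Prints:
#   ok              - spir64_gen target with a backend "-device" argument
#   missing-device  - spir64_gen target but no "-device" passed to the backend
#   not-aot         - no spir64_gen target on this line
check_sycl_link_line() {
  local line
  line=$(cat)
  if [[ "$line" == *"-fsycl-targets=spir64_gen"* ]]; then
    if [[ "$line" == *"-Xsycl-target-backend"*"-device "* ]]; then
      echo "ok"
    else
      echo "missing-device"
    fi
  else
    echo "not-aot"
  fi
}

# Usage: grep 'libsacado-gtest' verbose.log | grep icpx | check_sycl_link_line
```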

@masterleinad
Contributor

#9514 would probably solve this problem as well.

@masterleinad
Contributor

Fixed by #12707.


8 participants