
Tpetra: CMake logic for detecting GPU-aware MPI only works for OpenMPI variants #12468

Closed
jhux2 opened this issue Nov 1, 2023 · 7 comments
Labels: pkg: Tpetra, type: bug (the primary issue is a bug in Trilinos code or tests)

Comments

jhux2 (Member) commented Nov 1, 2023

Bug Report

The Tpetra CMake logic for setting Tpetra_ASSUME_GPU_AWARE_MPI assumes the existence of the ompi_info executable, which is specific to OpenMPI.

Description

Frontier uses an MPICH variant, so Tpetra incorrectly sets Tpetra_ASSUME_GPU_AWARE_MPI to FALSE (if the option isn't explicitly set on the command line).
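
For reference, the detection amounts to roughly the following. This is a simplified, hypothetical sketch of an ompi_info-based probe, not the exact code in Tpetra's CMake files:

    # Hypothetical sketch of the OpenMPI-only probe (not the exact Tpetra code).
    # ompi_info only exists for OpenMPI, so on an MPICH-based stack the probe
    # finds nothing and the default silently becomes OFF.
    find_program(OMPI_INFO_EXEC ompi_info)
    set(Tpetra_ASSUME_GPU_AWARE_MPI_DEFAULT OFF)
    if (OMPI_INFO_EXEC)
      execute_process(COMMAND ${OMPI_INFO_EXEC} --parsable --all
                      OUTPUT_VARIABLE ompiInfoOutput ERROR_QUIET)
      # OpenMPI advertises CUDA support via the mpi_built_with_cuda_support parameter.
      if (ompiInfoOutput MATCHES "mpi_built_with_cuda_support:value:true")
        set(Tpetra_ASSUME_GPU_AWARE_MPI_DEFAULT ON)
      endif()
    endif()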

jhux2 added the type: bug and pkg: Tpetra labels on Nov 1, 2023
jhux2 (Member, Author) commented Nov 1, 2023

Are there currently GPU architectures where the MPI is not GPU aware?

Should the default be to assume GPU-aware MPI?

csiefer2 (Member) commented Nov 1, 2023

As discussed offline: assuming GPU-aware MPI by default is reasonable. Consider removing the OpenMPI-specific special sauce.
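
A minimal sketch of what that could look like (hypothetical; variable names are illustrative, and Kokkos_ENABLE_CUDA / Kokkos_ENABLE_HIP are assumed to be visible at this point in the configure):

    # Hypothetical sketch: drop the ompi_info probe and assume GPU-aware MPI
    # by default whenever a GPU backend is enabled.
    if (Kokkos_ENABLE_CUDA OR Kokkos_ENABLE_HIP)
      set(Tpetra_ASSUME_GPU_AWARE_MPI_DEFAULT ON)
    else()
      set(Tpetra_ASSUME_GPU_AWARE_MPI_DEFAULT OFF)
    endif()
    set(Tpetra_ASSUME_GPU_AWARE_MPI ${Tpetra_ASSUME_GPU_AWARE_MPI_DEFAULT}
        CACHE BOOL "Assume the MPI implementation is GPU aware")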

jhux2 (Member, Author) commented Nov 1, 2023

I assume we'd need to deprecate this?

csiefer2 (Member) commented Nov 2, 2023

@jhux2 I mean, this isn't something you can control, so you can't really deprecate it.

jhux2 (Member, Author) commented Nov 13, 2023

With the merge of #12517, Tpetra now defaults to assuming that MPI is GPU aware.

If an application is using an MPI that isn't GPU aware, the app should either configure Trilinos with

-DTpetra_ASSUME_GPU_AWARE_MPI:BOOL=FALSE

or at run time set the environment variable

export TPETRA_ASSUME_GPU_AWARE_MPI=0
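
If the configuration is driven by a CMake cache-initialization file passed with cmake -C (a common Trilinos configure pattern; the file name below is made up), the option can be set there instead of on the command line:

    # frontier-options.cmake (hypothetical file name); use as:
    #   cmake -C frontier-options.cmake <other options> <path-to-Trilinos-source>
    # Disable the GPU-aware-MPI assumption when the MPI stack is not GPU aware.
    set(Tpetra_ASSUME_GPU_AWARE_MPI FALSE CACHE BOOL "" FORCE)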

rppawlo (Contributor) commented Nov 13, 2023

Just a heads up: this change caused a lot of testing failures inside Sandia. The internal CUDA test machines don't seem to have CUDA-aware MPI installed. The failures are seg faults with no real information, so they are not easy to debug. You might want to send an email to the Trilinos lists mentioning this change.

csiefer2 (Member) commented Jun 7, 2024

@jhux2 Can we close this?

jhux2 closed this as completed on Jun 7, 2024
jhux2 added this to Tpetra on Aug 12, 2024
jhux2 moved this to Done in Tpetra on Aug 12, 2024