Tpetra::CrsMatrix: Overlap communication & computation in apply() #385
@trilinos/tpetra
Epic: #767.
Overlap communication & computation in the apply() method of Tpetra::CrsMatrix, which implements sparse matrix-vector multiply.
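As a rough illustration of what "overlap" means for a sparse matrix-vector multiply, here is a minimal, self-contained C++ sketch. It is not Tpetra code. The `LocalCsr` struct, `beginFetchRemote`, and `overlappedApply` are hypothetical names, and `std::async` merely stands in for nonblocking communication of remote vector entries. It also assumes column indices are ordered with locally owned entries first.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <utility>
#include <vector>

// Hypothetical local CSR matrix. ASSUMPTION: column indices below
// numLocalCols refer to locally owned entries of x; all others refer
// to entries that must be fetched from other processes.
struct LocalCsr {
  std::size_t numLocalCols;
  std::vector<std::size_t> rowPtr;
  std::vector<std::size_t> colInd;
  std::vector<double> val;
};

// Stand-in for starting nonblocking communication. A real code would
// post sends and receives here; this just returns the data from
// another thread.
std::future<std::vector<double>> beginFetchRemote(std::vector<double> remote) {
  return std::async(std::launch::async,
                    [r = std::move(remote)]() { return r; });
}

std::vector<double> overlappedApply(const LocalCsr& A,
                                    const std::vector<double>& xLocal,
                                    const std::vector<double>& xRemoteSource) {
  assert(xLocal.size() == A.numLocalCols);
  const std::size_t numRows = A.rowPtr.size() - 1;
  std::vector<double> y(numRows, 0.0);

  // Start "communication" of the remote entries of x.
  auto pending = beginFetchRemote(xRemoteSource);

  // While communication is in flight, apply the locally owned columns.
  for (std::size_t i = 0; i < numRows; ++i) {
    for (std::size_t k = A.rowPtr[i]; k < A.rowPtr[i + 1]; ++k) {
      const std::size_t c = A.colInd[k];
      if (c < A.numLocalCols) y[i] += A.val[k] * xLocal[c];
    }
  }

  // Wait for the remote entries, then apply the remaining columns.
  const std::vector<double> xRemote = pending.get();
  for (std::size_t i = 0; i < numRows; ++i) {
    for (std::size_t k = A.rowPtr[i]; k < A.rowPtr[i + 1]; ++k) {
      const std::size_t c = A.colInd[k];
      if (c >= A.numLocalCols) y[i] += A.val[k] * xRemote[c - A.numLocalCols];
    }
  }
  return y;
}
```

Note that the local pass reads `xLocal` directly and never copies it, which is only valid when the local entries of the input already line up with the local column indices.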
This depends on #384 working for Tpetra::MultiVector, which in turn depends on #383.
If we fix #439, then we can fix this issue without needing to change CrsGraph semantics. In particular, CrsGraph could still compute its Import from the domain Map to the (entire, with locals too) column Map, and code that relies on this Import could work unchanged. Sparse matrix-vector multiply implementations, such as those in CrsMatrix and BlockCrsMatrix (see #424), could then do coarse-grained overlap as follows:
The "if necessary" remark on Step 2 relates to whether the domain Map is "fitted" to the column Map (see #437 for a definition of "fitted"). If so, the local entries of the input vector would not need to be copied (see #435 and #436). If not, they would need to be copied (and/or permuted), but whether to copy could be decided per process. For example, processes with the same number of local entries and no entries that need permutation would not need to make a copy: the local part of the mat-vec could just take the original input (multi)vector pointer as input. (In fact, the domain Map need not even be fitted; that's sufficient but not necessary.)
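The per-process decision described above could look roughly like the following sketch. It is plain C++, not the Tpetra API. `GlobalId` and `canAliasLocalInput` are hypothetical names, and it assumes each Map is represented simply as this process's list of global indices, with the column Map listing locally owned indices first. The check uses only data on the calling process, so it needs no communication.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

using GlobalId = long long;  // hypothetical global index type

// Returns true if this process could pass the original input vector
// straight to the local part of the mat-vec, that is, if the domain
// Map's indices appear, in the same order, at the start of the column
// Map's indices. No copy or permutation would then be needed here.
bool canAliasLocalInput(const std::vector<GlobalId>& domainGids,
                        const std::vector<GlobalId>& columnGids) {
  if (columnGids.size() < domainGids.size()) return false;
  return std::equal(domainGids.begin(), domainGids.end(),
                    columnGids.begin());
}
```

A process for which this returns false would copy (and/or permute) its local entries, while other processes could skip the copy.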
This approach has the following benefits over one that uses a different Import than the CrsGraph's domain -> column Map Import:
In particular, this approach would work regardless of whether the domain Map is fitted to the column Map. The graph or matrix would not need to do any extra all-reduces to figure out if the Maps are fitted on all processes.