-
Notifications
You must be signed in to change notification settings - Fork 99
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
move dot-based GEMM out of TPL CUBLAS.. #1050
Conversation
Status Flag 'Pre-Test Inspection' - Auto Inspected - Inspection Is Not Necessary for this Pull Request. |
Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects: Pull Request Auto Testing STARTING (click to expand)Build InformationTest Name: KokkosKernels_PullRequest_GCC720
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_GCC720
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_INTEL18
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GCC720_Light
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GCC720_Light_LayoutRight
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_CUDA10
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_CUDA10_LayoutRight
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_CUDA9
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_GCC720_GCC740
Jenkins Parameters
Using Repos:
Pull Request Author: iyamazaki |
Status Flag 'Pull Request AutoTester' - Jenkins Testing: 1 or more Jobs FAILED Note: Testing will normally be attempted again in approx. 2 Hrs 30 Mins. If a change to the PR source branch occurs, the testing will be attempted again on next available autotester run. Pull Request Auto Testing has FAILED (click to expand)Build InformationTest Name: KokkosKernels_PullRequest_GCC720
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_GCC720
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_INTEL18
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GCC720_Light
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GCC720_Light_LayoutRight
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_CUDA10
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_CUDA10_LayoutRight
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_CUDA9
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_GCC720_GCC740
Jenkins Parameters
Console Output (last 100 lines) : KokkosKernels_PullRequest_GCC720 # 325 (click to expand)
Console Output (last 100 lines) : KokkosKernels_PullRequest_Tpls_GCC720 # 317 (click to expand)
Console Output (last 100 lines) : KokkosKernels_PullRequest_Tpls_INTEL18 # 303 (click to expand)
Console Output (last 100 lines) : KokkosKernels_PullRequest_GCC720_Light # 359 (click to expand)
Console Output (last 100 lines) : KokkosKernels_PullRequest_GCC720_Light_LayoutRight # 136 (click to expand)
Console Output (last 100 lines) : KokkosKernels_PullRequest_Tpls_CUDA10 # 324 (click to expand)
Console Output (last 100 lines) : KokkosKernels_PullRequest_Tpls_CUDA10_LayoutRight # 135 (click to expand)
Console Output (last 100 lines) : KokkosKernels_PullRequest_Tpls_CUDA9 # 319 (click to expand)
Console Output (last 100 lines) : KokkosKernels_PullRequest_Tpls_GCC720_GCC740 # 317 (click to expand)
|
…lpha * A^T*B case).
Status Flag 'Pre-Test Inspection' - Auto Inspected - Inspection Is Not Necessary for this Pull Request. |
Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects: Pull Request Auto Testing STARTING (click to expand)Build InformationTest Name: KokkosKernels_PullRequest_GCC720
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_GCC720
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_INTEL18
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GCC720_Light
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GCC720_Light_LayoutRight
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_CUDA10
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_CUDA10_LayoutRight
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_CUDA9
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_GCC720_GCC740
Jenkins Parameters
Using Repos:
Pull Request Author: iyamazaki |
Status Flag 'Pull Request AutoTester' - Jenkins Testing: all Jobs PASSED Pull Request Auto Testing has PASSED (click to expand)Build InformationTest Name: KokkosKernels_PullRequest_GCC720
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_GCC720
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_INTEL18
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GCC720_Light
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GCC720_Light_LayoutRight
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_CUDA10
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_CUDA10_LayoutRight
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_CUDA9
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_GCC720_GCC740
Jenkins Parameters
|
Status Flag 'Pre-Merge Inspection' - - This Pull Request Requires Inspection... The code must be inspected by a member of the Team before Testing/Merging |
All Jobs Finished; status = PASSED, However Inspection must be performed before merge can occur... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, @iyamazaki ! In general, everything looks great. Please remove the redundant !defined(KOKKOSKERNELS_ETI_ONLY) || KOKKOSKERNELS_IMPL_COMPILE_LIBRARY
ifdefs as suggested below.
const scalar_A alpha; | ||
const scalar_C beta; | ||
|
||
// The following types (especially dotSize) could have simply been int, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
consider removing the comment and changing the variable types to int
|
||
void run(bool conjugateTranspose) { | ||
|
||
constexpr size_C workPerTeam = 4096; // Amount of work per team |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there a reason why 4096
is used? if so, please add a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
||
constexpr size_C workPerTeam = 4096; // Amount of work per team | ||
const size_C ndots = numCrows * numCcols; // Number of dot products | ||
size_C appxNumTeams = (dotSize * ndots) / workPerTeam; // Estimation for appxNumTeams |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is appxNumTeams
? consider renaming the variable or adding a comment
// potentially be performed by multiple teams. First, compute | ||
// numDivPerDot as an integer (take the floor, not ceiling), then, | ||
// compute actual number of teams by using this factor. | ||
else{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: else {
if(appxNumTeams < 1) | ||
appxNumTeams = 1; | ||
if(appxNumTeams > 1024) | ||
appxNumTeams = 1024; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as above for 4096
@@ -544,20 +391,11 @@ struct GEMM< \ | |||
else \ | |||
transb = CUBLAS_OP_C; \ | |||
\ | |||
constexpr int numDotsLayoutLeftThreshold = 1600; \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@iyamazaki: this looks good to me. just to confirm: you DO NOT want to call DotBasedGEMM when CUBLAS is enabled, correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|| ( A_is_lr && A_is_tr && !B_is_tr && M*N < numDotsLayoutRightThreshold)) { | ||
// call dot-based GEMM, only for C := beta * C + alpha * A^T * B | ||
bool A_is_conj = ((transA[0]=='C') || (transA[0]=='c')); | ||
DotBasedGEMM<ExecSpace, AViewType, BViewType, CViewType> dotBasedGemm(alpha, A, B, beta, C); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: dotBasedGemm -> dotBasedGEMM
DotBasedGEMM<ExecSpace, AViewType, BViewType, CViewType> dotBasedGemm(alpha, A, B, beta, C); | ||
dotBasedGemm.run(A_is_conj ? true : false); | ||
} else | ||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#endif |
KokkosBlas::Impl::GEMMImpl<typename CViewType::execution_space,AViewType,BViewType,CViewType,blockA0,blockA1,blockB1,2,2> gemm(alpha,A,B,beta,C); | ||
gemm.run(team_size,vector_length,scratch_level); | ||
|
||
#if !defined(KOKKOSKERNELS_ETI_ONLY) || KOKKOSKERNELS_IMPL_COMPILE_LIBRARY |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#if !defined(KOKKOSKERNELS_ETI_ONLY) || KOKKOSKERNELS_IMPL_COMPILE_LIBRARY |
|
||
constexpr int numDotsLayoutLeftThreshold = 1600; | ||
constexpr int numDotsLayoutRightThreshold = 100; | ||
if( (!A_is_lr && A_is_tr && !B_is_tr && M*N < numDotsLayoutLeftThreshold) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. As long as the performance of DotBasedGEMM is better than GEMMImpl for these cases.
Status Flag 'Pre-Merge Inspection' - - This Pull Request Requires Inspection... The code must be inspected by a member of the Team before Testing/Merging |
All Jobs Finished; status = PASSED, However Inspection must be performed before merge can occur... |
Status Flag 'Pre-Test Inspection' - Auto Inspected - Inspection Is Not Necessary for this Pull Request. |
Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects: Pull Request Auto Testing STARTING (click to expand)Build InformationTest Name: KokkosKernels_PullRequest_GCC720
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_GCC720
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_INTEL18
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GCC720_Light
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GCC720_Light_LayoutRight
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_CUDA10
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_CUDA10_LayoutRight
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_CUDA9
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_GCC720_GCC740
Jenkins Parameters
Using Repos:
Pull Request Author: iyamazaki |
Status Flag 'Pull Request AutoTester' - Jenkins Testing: all Jobs PASSED Pull Request Auto Testing has PASSED (click to expand)Build InformationTest Name: KokkosKernels_PullRequest_GCC720
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_GCC720
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_INTEL18
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GCC720_Light
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GCC720_Light_LayoutRight
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_CUDA10
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_CUDA10_LayoutRight
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_CUDA9
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_GCC720_GCC740
Jenkins Parameters
|
Status Flag 'Pre-Merge Inspection' - - This Pull Request Requires Inspection... The code must be inspected by a member of the Team before Testing/Merging |
All Jobs Finished; status = PASSED, However Inspection must be performed before merge can occur... |
Status Flag 'Pre-Test Inspection' - Auto Inspected - Inspection Is Not Necessary for this Pull Request. |
Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects: Pull Request Auto Testing STARTING (click to expand)Build InformationTest Name: KokkosKernels_PullRequest_GCC720
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_GCC720
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_INTEL18
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GCC720_Light
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GCC720_Light_LayoutRight
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_CUDA10
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_CUDA10_LayoutRight
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_CUDA9
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_GCC720_GCC740
Jenkins Parameters
Using Repos:
Pull Request Author: iyamazaki |
Status Flag 'Pull Request AutoTester' - Jenkins Testing: 1 or more Jobs FAILED Note: Testing will normally be attempted again in approx. 2 Hrs 30 Mins. If a change to the PR source branch occurs, the testing will be attempted again on next available autotester run. Pull Request Auto Testing has FAILED (click to expand)Build InformationTest Name: KokkosKernels_PullRequest_GCC720
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_GCC720
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_INTEL18
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GCC720_Light
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GCC720_Light_LayoutRight
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_CUDA10
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_CUDA10_LayoutRight
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_CUDA9
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_GCC720_GCC740
Jenkins Parameters
Console Output (last 100 lines) : KokkosKernels_PullRequest_GCC720 # 328 (click to expand)
Console Output (last 100 lines) : KokkosKernels_PullRequest_Tpls_GCC720 # 320 (click to expand)
Console Output (last 100 lines) : KokkosKernels_PullRequest_Tpls_INTEL18 # 306 (click to expand)
Console Output (last 100 lines) : KokkosKernels_PullRequest_GCC720_Light # 362 (click to expand)
Console Output (last 100 lines) : KokkosKernels_PullRequest_GCC720_Light_LayoutRight # 139 (click to expand)
Console Output (last 100 lines) : KokkosKernels_PullRequest_Tpls_CUDA10 # 327 (click to expand)
Console Output (last 100 lines) : KokkosKernels_PullRequest_Tpls_CUDA10_LayoutRight # 138 (click to expand)
Console Output (last 100 lines) : KokkosKernels_PullRequest_Tpls_CUDA9 # 322 (click to expand)
Console Output (last 100 lines) : KokkosKernels_PullRequest_Tpls_GCC720_GCC740 # 320 (click to expand)
|
Status Flag 'Pre-Merge Inspection' - - This Pull Request Requires Inspection... The code must be inspected by a member of the Team before Testing/Merging |
All Jobs Finished; status = PASSED, However Inspection must be performed before merge can occur... |
Thank you @lucbv. I've only looked at performance on an NVIDIA or AMD GPU. I put the logic to use dot-based GEMM only on a device. Can you see if that logic is okay? |
Status Flag 'Pre-Test Inspection' - Auto Inspected - Inspection Is Not Necessary for this Pull Request. |
Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects: Pull Request Auto Testing STARTING (click to expand)Build InformationTest Name: KokkosKernels_PullRequest_GCC720
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_GCC720
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_INTEL18
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GCC720_Light
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GCC720_Light_LayoutRight
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_CUDA10
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_CUDA10_LayoutRight
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_CUDA9
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_GCC720_GCC740
Jenkins Parameters
Using Repos:
Pull Request Author: iyamazaki |
Status Flag 'Pull Request AutoTester' - Jenkins Testing: all Jobs PASSED Pull Request Auto Testing has PASSED (click to expand)Build InformationTest Name: KokkosKernels_PullRequest_GCC720
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_GCC720
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_INTEL18
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GCC720_Light
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GCC720_Light_LayoutRight
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_CUDA10
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_CUDA10_LayoutRight
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_CUDA9
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_GCC720_GCC740
Jenkins Parameters
|
Status Flag 'Pre-Merge Inspection' - - This Pull Request Requires Inspection... The code must be inspected by a member of the Team before Testing/Merging |
All Jobs Finished; status = PASSED, However Inspection must be performed before merge can occur... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One more quick change please : )
const int M = static_cast<int> (C.extent(0)); | ||
const int N = static_cast<int> (C.extent(1)); | ||
|
||
const bool host_space = std::is_same<Kokkos::HostSpace, ExecSpace>::value; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry to be a little picky but could you do this instead:
#include "KokkosKernels_ExecSpaceUtils.hpp"
const bool is_gpu = KokkosKernels::Impl::kk_is_gpu_exec_space<ExecSpace>()
That function was created to handle this situation and does it pretty cleanly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for making time, @lucbv. Please be picky as much as you can. I welcome any suggestions!!
Status Flag 'Pre-Merge Inspection' - - This Pull Request Requires Inspection... The code must be inspected by a member of the Team before Testing/Merging |
All Jobs Finished; status = PASSED, However Inspection must be performed before merge can occur... |
Status Flag 'Pre-Test Inspection' - Auto Inspected - Inspection Is Not Necessary for this Pull Request. |
Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects: Pull Request Auto Testing STARTING (click to expand)Build InformationTest Name: KokkosKernels_PullRequest_GCC720
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_GCC720
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_INTEL18
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GCC720_Light
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GCC720_Light_LayoutRight
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_CUDA10
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_CUDA10_LayoutRight
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_CUDA9
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_GCC720_GCC740
Jenkins Parameters
Using Repos:
Pull Request Author: iyamazaki |
Status Flag 'Pull Request AutoTester' - Jenkins Testing: all Jobs PASSED Pull Request Auto Testing has PASSED (click to expand)Build InformationTest Name: KokkosKernels_PullRequest_GCC720
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_GCC720
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_INTEL18
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GCC720_Light
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_GCC720_Light_LayoutRight
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_CUDA10
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_CUDA10_LayoutRight
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_CUDA9
Jenkins Parameters
Build InformationTest Name: KokkosKernels_PullRequest_Tpls_GCC720_GCC740
Jenkins Parameters
|
Status Flag 'Pre-Merge Inspection' - - This Pull Request Requires Inspection... The code must be inspected by a member of the Team before Testing/Merging |
All Jobs Finished; status = PASSED, However Inspection must be performed before merge can occur... |
2 similar comments
All Jobs Finished; status = PASSED, However Inspection must be performed before merge can occur... |
All Jobs Finished; status = PASSED, However Inspection must be performed before merge can occur... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me
Status Flag 'Pre-Merge Inspection' - SUCCESS: The last commit to this Pull Request has been INSPECTED AND APPROVED by [ lucbv ]! |
Status Flag 'Pull Request AutoTester' - Pull Request MUST BE MERGED MANUALLY BY Project Team - This Repo does not support Automerge |
2 similar comments
Status Flag 'Pull Request AutoTester' - Pull Request MUST BE MERGED MANUALLY BY Project Team - This Repo does not support Automerge |
Status Flag 'Pull Request AutoTester' - Pull Request MUST BE MERGED MANUALLY BY Project Team - This Repo does not support Automerge |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, @iyamazaki !
This PR likes to move the dot-based GEMM out of TPL CUBLAS so that it can be called on other architectures (like on an AMD GPU). At this point we like some inputs..
A quick performance result using Belos on Spock, using a momentum matrix from Nalu-Wind (ABL40M):
Original:
Belos: PseudoBlockGmresSolMgr total solve time 13.63 (1)
Belos: Operation Op*x 0.0001294 (11)
Belos: Operation Prec*x 0 (0)
Belos: ICGS[2]: Orthogonalization 13.58 (11)
Belos: ICGS[2]: Ortho (Inner Product) 13.47 (20)
Belos: ICGS[2]: Ortho (Norm) 0.001005 (11)
Belos: ICGS[2]: Ortho (Update) 0.1053 (20)
With dot-based gemm (this PR):
Belos: PseudoBlockGmresSolMgr total solve time 0.1698 (1)
Belos: Operation Op*x 0.0001215 (11)
Belos: Operation Prec*x 0 (0)
Belos: ICGS[2]: Orthogonalization 0.1192 (11)
Belos: ICGS[2]: Ortho (Inner Product) 0.01279 (20)
Belos: ICGS[2]: Ortho (Norm) 0.001221 (11)
Belos: ICGS[2]: Ortho (Update) 0.1051 (20)
With gemv-based gemm (PR 948)
Belos: PseudoBlockGmresSolMgr total solve time 0.06106 (1)
Belos: Operation Op*x 0.0001173 (11)
Belos: Operation Prec*x 0 (0)
Belos: ICGS[2]: Orthogonalization 0.01671 (11)
Belos: ICGS[2]: Ortho (Inner Product) 0.0128 (20)
Belos: ICGS[2]: Ortho (Norm) 0.001241 (11)
Belos: ICGS[2]: Ortho (Update) 0.002581 (20)
On Summit:
Belos: PseudoBlockGmresSolMgr total solve time 0.04137 (1)
Belos: Operation Op*x 0.0133 (11)
Belos: Operation Prec*x 0 (0)
Belos: ICGS[2]: Orthogonalization 0.009377 (11)
Belos: ICGS[2]: Ortho (Inner Product) 0.004267 (20)
Belos: ICGS[2]: Ortho (Norm) 0.0006617 (11)
Belos: ICGS[2]: Ortho (Update) 0.003849 (20)