Notice that currently gkmx<...,TA,TB,TC,TV> has the following issues.
TV cannot be different from TC.
Type rules in macro kernels may be wrong.
According to the definition, gkmx only needs to be passed an m-by-n matrix C in type TC, but when k > KC the temporary rank-KC update must be stored as an m-by-NC matrix in type TV. It is very unpleasant to allocate this temporary buffer, but so far I have not found a way around it. GKRM will have the same issue later.
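To make the buffer requirement concrete, here is a minimal sketch of the loop nest, assuming column-major storage and a plain multiply-add as the inner operation. `gkmx_sketch` and `finalize` are hypothetical names for illustration only; this is not the code in /frame/gkmx.hpp, and the real macro kernels are replaced by naive loops.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified stand-in for the gkmx loop nest.
// A is m-by-k, B is k-by-n, C is m-by-n, all column-major.
// `finalize` maps an accumulated TV value to the output type TC.
template <typename TA, typename TB, typename TC, typename TV, typename Finalize>
void gkmx_sketch(std::size_t m, std::size_t n, std::size_t k,
                 const TA* A, const TB* B, TC* C,
                 std::size_t NC, std::size_t KC, Finalize finalize) {
  assert(NC > 0 && KC > 0);
  std::vector<TV> V;  // the m-by-NC temporary buffer, held in type TV
  for (std::size_t jc = 0; jc < n; jc += NC) {
    const std::size_t jb = std::min(NC, n - jc);
    V.assign(m * jb, TV{});  // reset the partial sums for this column panel
    for (std::size_t pc = 0; pc < k; pc += KC) {
      const std::size_t pb = std::min(KC, k - pc);
      // Rank-pb update: partial results must survive across pc iterations,
      // so they are kept in TV rather than written to C (which is TC).
      for (std::size_t j = 0; j < jb; ++j)
        for (std::size_t i = 0; i < m; ++i)
          for (std::size_t p = 0; p < pb; ++p)
            V[i + j * m] += static_cast<TV>(A[i + (pc + p) * m]) *
                            static_cast<TV>(B[(pc + p) + (jc + j) * k]);
    }
    // Only after the last rank-KC update is TV converted to TC.
    for (std::size_t j = 0; j < jb; ++j)
      for (std::size_t i = 0; i < m; ++i)
        C[i + (jc + j) * m] = finalize(V[i + j * m]);
  }
}
```

With TV = double and TC = int, for example, converting to TC after every rank-KC update would truncate each partial sum, which is why the sketch only calls `finalize` once per column panel.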
Maybe we can increase KC when TC != TV is detected, so that k is never larger than KC.
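One way this could look, as a sketch only: pick the blocking size at the call site from the two types. `choose_kc` and `default_kc` are hypothetical names, not part of gkmx, and the sketch ignores the cost of a large KC (the packed panels of A and B grow with KC and may no longer fit in cache).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical helper: returns the KC to use for a problem of depth k.
// If TC and TV are the same type, partial sums can live in C, so the
// default blocking is kept. Otherwise KC is raised to cover all of k,
// so there is a single rank-KC update and no TV buffer is needed.
template <typename TC, typename TV>
std::size_t choose_kc(std::size_t k, std::size_t default_kc) {
  assert(default_kc > 0);
  if (std::is_same<TC, TV>::value) return default_kc;
  return std::max(default_kc, k);
}
```

For instance, `choose_kc<int, double>(1000, 256)` yields 1000, while `choose_kc<float, float>(1000, 256)` keeps 256.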
Notice that gkmx_gpu.hpp does not have this problem. The GEMM algorithm on the GPU does not store the rank-KC update back to global memory. The L1 cache on the GPU can be manually controlled; thus, storing back is unnecessary.
/frame/gkmx.hpp