SIMD/Loop framework upgrade #2937
Signed-off-by: Alexandre Eichenberger <alexe@us.ibm.com>
LGTM. Really appreciate your effort of simplifying these interfaces!
Jenkins Linux amd64 Build #15654 [push] SIMD/Loop framework upgr... started at 07:04 |
Jenkins Linux s390x Build #15657 [push] SIMD/Loop framework upgr... started at 08:04 |
Jenkins Linux ppc64le Build #14685 [push] SIMD/Loop framework upgr... started at 08:16 |
Jenkins Linux amd64 Build #15654 [push] SIMD/Loop framework upgr... passed after 1 hr 8 min |
Jenkins Linux s390x Build #15657 [push] SIMD/Loop framework upgr... passed after 1 hr 25 min |
Jenkins Linux ppc64le Build #14685 [push] SIMD/Loop framework upgr... passed after 2 hr 3 min |
Added SIMD support for loops whose iteration count is not a multiple of the vector length (VL).
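To make the iteration-count issue concrete, here is a minimal plain C++ sketch of one common way to handle a trip count that is not a multiple of VL: a blocked main loop followed by a scalar remainder loop. It is only an illustration. `VL = 4` and the function `addBlocked` are hypothetical names chosen for this example, the inner lane loop stands in for a single SIMD operation, and nothing here is taken from the code in this PR, which may handle the remainder differently.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical vector length; the real VL depends on the target and element type.
constexpr std::size_t VL = 4;

// Element-wise add over an arbitrary length n (hypothetical helper).
// The main loop covers the largest multiple of VL that fits in n;
// the remainder loop handles the final n % VL elements one at a time.
std::vector<float> addBlocked(
    const std::vector<float> &a, const std::vector<float> &b) {
  assert(a.size() == b.size());
  const std::size_t n = a.size();
  std::vector<float> out(n);
  const std::size_t simdUB = n - (n % VL); // largest multiple of VL <= n
  std::size_t i = 0;
  for (; i < simdUB; i += VL) {
    // Stand-in for one SIMD operation over VL lanes.
    for (std::size_t lane = 0; lane < VL; ++lane)
      out[i + lane] = a[i + lane] + b[i + lane];
  }
  for (; i < n; ++i) // scalar remainder
    out[i] = a[i] + b[i];
  return out;
}
```

With 7 elements and `VL = 4`, the main loop runs once (elements 0 to 3) and the remainder loop handles elements 4 to 6.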
Because we can now generate SIMD code using Krnl, Affine, or SCF, having a separate way to generate loops for each was painful. There is now a unified interface that creates loops across all three dialects:
which takes a lower and an upper bound as IndexExpr values, a boolean indicating whether the loop is sequential or parallel, and the function to be called.
An example is shown here
This complements the three SIMD calls: simdIterateIE, simdReduceIE, and simdReduce2DIE. The last two both perform reductions, but the first (simdReduceIE) uses horizontal/do-across reductions (e.g. available on z16 with integer add), while the second (simdReduce2DIE) uses shuffles to mix VL consecutive reductions.
All SIMD calls now work with an arbitrary number of loop iterations, whether or not it is a multiple of the hardware vector length.
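The difference between the two reduction styles can be illustrated with a small scalar C++ model. This is one possible reading of the description above, not the generated code. `Vec`, `horizontalAdd`, and `combineRows` are hypothetical names, and the plain loops stand in for what would be a horizontal-add instruction in the first case and a sequence of shuffles and adds in the second.

```cpp
#include <array>
#include <cstddef>

// Hypothetical vector length and vector type for this sketch.
constexpr std::size_t VL = 4;
using Vec = std::array<float, VL>;

// Horizontal reduction: collapse the VL lanes of one accumulator into a scalar.
float horizontalAdd(const Vec &acc) {
  float sum = 0.0f;
  for (float v : acc)
    sum += v;
  return sum;
}

// 2D style: VL consecutive reductions each own a VL-lane accumulator.
// Combining them yields one vector whose lane r holds the final value of
// reduction r. Hardware would do this with shuffles; this model uses loops.
Vec combineRows(const std::array<Vec, VL> &accs) {
  Vec result{};
  for (std::size_t r = 0; r < VL; ++r)
    for (std::size_t lane = 0; lane < VL; ++lane)
      result[r] += accs[r][lane];
  return result;
}
```

The practical difference in this model is the shape of the result: the horizontal form produces one scalar per reduction, while the combined form produces a full vector of VL results that can be stored with a single vector store.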
To provide the same functionality in both SIMD reduce calls, each now expects one lambda function per output; previously, a single lambda function generated all outputs.
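As a rough illustration of the "one lambda per output" shape, the following standalone C++ sketch passes a list of reduction lambdas, one for each output, to a generic driver. All names (`ReduceFn`, `reduceAll`) are hypothetical and exist only for this example; the actual signatures of simdReduceIE and simdReduce2DIE are not reproduced here.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical per-output reduction body: combines the running value with x.
using ReduceFn = std::function<float(float acc, float x)>;

// Runs one reduction per output over the same input (hypothetical driver).
// Output k starts from inits[k] and is updated only by reducers[k].
std::vector<float> reduceAll(const std::vector<float> &input,
    const std::vector<float> &inits, const std::vector<ReduceFn> &reducers) {
  assert(inits.size() == reducers.size());
  std::vector<float> accs = inits;
  for (float x : input)
    for (std::size_t k = 0; k < reducers.size(); ++k)
      accs[k] = reducers[k](accs[k], x);
  return accs;
}
```

Because each output has its own lambda, a driver can apply the same machinery to every output independently, which is harder when a single callback is responsible for producing all outputs at once.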
We also had different calls for memory loads and stores. A common interface is now used for Krnl, Affine, and MemRef, and a nearly identical one for Vector (where the load operation needs the type to determine the VL).
They all use the calls below