Hoist chpl__getPrivatizedCopy #6184
Comments
@benharsh I had a pretty easy time getting chpl__getPrivatizedCopy hoisted for something like:

    use PrivatizationWrappers;
    proc myproc() {
      var privatizedIdx = 5;
      var newValue = new C(privatizedIdx);
      insertPrivatized(newValue, privatizedIdx);
      for 1..10 {
        var a = getPrivatized(privatizedIdx);
      }
    }
    myproc();

but for the code you gave me it's going to be a lot trickier to do. The privatized class ends up getting passed around to a bunch of different functions and is used in different branches, so I think the defUse analysis and the domination analysis are preventing hoisting. If you have time next week, we should sit down and go through the generated code together.
An easier workaround might be to just move the runtime support for privatization into a header so the backend can inline and optimize it. Here's a first stab at that: master...ronawho:inline-privatization
It's not getting quite the performance boost you saw, but it's close at ~90-92 MFlop/s:
Thanks for looking into this! It seems like the most practical next step would be to make the header modifications. I'll be interested to see what other benchmarks are impacted. We can still look over the generated code if you want.
Agreed, I think it makes sense to go ahead with the header mods, though longer term the LICM changes are probably still a good idea. I didn't see many other major perf changes. Here are the .dat files that had diffs: https://gist.github.com/ronawho/40223c26c32c362dc012b3a14d66f18e
This is likely a simpler program to work with:

    use BlockDist;
    proc main() {
      var dom = {1..10};
      var space = dom dmapped Block(dom);
      var A : [space] int;
      for i in 1..10 do local {
        A.localAccess[i] += 1;
      }
      writeln(A);
    }

If you inline
Move runtime privatization support from the .c file to the header. chpl_getPrivatizedClass() can be called frequently, so we want to allow the backend compiler to fully optimize/inline calls to it. Moving the privatization source code into the header has a pretty big performance impact for the stencil PRK, improving performance by about 15% for 16-node-xc. There are also some minor improvements for fft and lulesh. This is motivated by chapel-lang#6184, though it's not quite enough to close that issue yet.
Move runtime privatization support into chpl-privatization.h [reviewed by @benharsh]
Move runtime privatization support from the .c file to the header. chpl_getPrivatizedClass() can be called frequently, so we want to allow the backend compiler to fully optimize/inline calls to it. Moving the privatization source code into the header has a pretty big performance impact for the stencil PRK, improving performance by about 15% for 16-node-xc. There are also some minor improvements for fft and lulesh. This is motivated by #6184, though it's not quite enough to close that issue yet.
This is a second attempt at chapel-lang#6198, but it only moves chpl_getPrivatizedClass() instead of the entire privatization implementation. chpl_getPrivatizedClass() is just a getter for chpl_privateObjects, so we also need an extern declaration for chpl_privateObjects. chpl_getPrivatizedClass() can be called frequently, so we want to allow the backend compiler to fully optimize/inline calls to it. This has a pretty big performance impact for the stencil PRK, improving performance by about 15% for 16-node-xc. There are also some minor improvements for fft and lulesh. This is motivated by chapel-lang#6184, though it's not quite enough to close that issue yet.
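For readers unfamiliar with the pattern, here is a minimal sketch of what "a getter in the header" can look like. It is not the actual Chapel runtime code: the `sketch_` names, the fixed table size, and the setter are all assumptions made up for illustration. The point is only that a `static inline` definition is visible at every call site, so the backend compiler is free to inline it.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the runtime's table of privatized objects.
   In a real header this would be an `extern` declaration, with the one
   definition living in a .c file. It is defined here only so the sketch
   compiles as a single file. */
static void* sketch_privateObjects[16];

/* `static inline` in a header: every translation unit that includes it
   sees the body, so the backend compiler can inline the load instead of
   emitting an opaque function call. */
static inline void* sketch_getPrivatizedClass(int64_t i) {
  return sketch_privateObjects[i];
}

/* Hypothetical setter, present only so the getter can be exercised. */
static inline void sketch_setPrivatizedClass(int64_t i, void* v) {
  sketch_privateObjects[i] = v;
}
```

Once the getter is inlined, a call in a loop reduces to a plain array load, which the backend's own optimizations may then be able to move out of the loop.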
Another variant to consider:

    use BlockDist;
    proc main() {
      var dom = {1..10};
      var space = dom dmapped Block(dom);
      var A : [space] int;
      forall i in space do local {
        A.localAccess[i] += 1;
      }
      writeln(A);
    }

The forall will create coforall functions and pass
Move chpl_getPrivatizedClass() into chpl-privatization.h [reviewed by @benharsh, @dmk42, and @gbtitus]
This is a second attempt at #6198, but it only moves chpl_getPrivatizedClass() instead of the entire privatization implementation. chpl_getPrivatizedClass() is a getter for chpl_privateObjects, so we also need an extern declaration for chpl_privateObjects. chpl_getPrivatizedClass() can be called frequently, so we want to allow the backend compiler to fully optimize/inline calls to it. This has a pretty big performance impact for the stencil PRK, improving performance by about 15% for 16-node-xc. There are also some minor improvements for fft and lulesh. This is motivated by #6184, though it's not quite enough to close that issue yet.
Consider the following program:
When compiled with --no-local, the A[i] expression will generate a call to the runtime function chpl__getPrivatizedCopy. We currently do not hoist this call, and it surprisingly turns out to have an impact on the Stencil PRK. In the Stencil PRK, there is more than one call to chpl__getPrivatizedCopy. On 16 nodes with ugni-qthreads, we observe a 15% improvement in performance when hand-modifying the code to manually hoist the chpl__getPrivatizedCopy call.
The performance improvement varies depending on problem size and other potential hand-optimizations.
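To make "manually hoist" concrete, here is a rough C sketch of the before and after shapes. It is not the generated Chapel code: `sketch_lookup`, `sketch_table`, and the call counter are hypothetical names, with the lookup standing in for a call like chpl__getPrivatizedCopy. The hoisted form is only valid when the lookup result cannot change inside the loop.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table mapping a privatization id to local data. */
static int64_t* sketch_table[4];

/* Counts lookups so the difference between the two loops is observable. */
static int sketch_lookups = 0;

/* Stand-in for the per-access lookup. */
static int64_t* sketch_lookup(int pid) {
  sketch_lookups++;
  return sketch_table[pid];
}

/* Before: the lookup is repeated on every iteration. */
static void sketch_increment_unhoisted(int pid, size_t n) {
  for (size_t i = 0; i < n; i++) {
    int64_t* data = sketch_lookup(pid);
    data[i] += 1;
  }
}

/* After: the loop-invariant lookup is done once, before the loop. */
static void sketch_increment_hoisted(int pid, size_t n) {
  int64_t* data = sketch_lookup(pid);
  for (size_t i = 0; i < n; i++) {
    data[i] += 1;
  }
}
```

Both functions leave the data in the same state. The hoisted one performs a single lookup regardless of the trip count, which is the effect the hand-modified Stencil PRK code is going for.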