Inconsistent results from TeamThreadRange reduction #1905
Comments
This isn't relevant to your question, but just curious: What's with the rvalue references |
@mhoemmen It's a good point. I did it because I didn't know enough about what the modify and sync functions do (what if, e.g., they are ref-qualified), and I was trying to keep it generic. Then I simply copied the snippet all over the place.
You definitely don't need the |
You cannot update sum.d_view(team.league_rank()) directly within a team parallel_for because all of the threads in the team are live and each of them will access sum.d_view at some point. Kokkos::single is necessary to limit access to sum.d_view to once per team. See pages 132-136 in KokkosTutorial_ORNL18.pdf for more info.
@jeffmiles63 Thanks for the reply. I think
//some array to contribute to the sum
Kokkos::DualView<T[nTeams][teamSize]> v("v");
//some DualView to store the results
Kokkos::DualView<T[nTeams]> sum("sum");
...
parallel_for(policy, KOKKOS_LAMBDA(const auto& team) {
  int s = 0;
  parallel_reduce(TeamThreadRange(team,teamSize),[&](const auto& i, auto& update) {
    update += v.d_view(team.league_rank(),i);
  },s);
  team.team_barrier();
  sum.d_view(team.league_rank()) = s;
});
... will give the same answer too. I think something inside |
This is weird. I would have thought that this should all give the same answer as long as T is word size or smaller. |
Got an even simpler reproducer. It only fails with OpenMP; Serial, Pthreads, and CUDA are fine.
#include <Kokkos_Core.hpp>
#include <cstdio>   // printf
#include <cstdlib>  // atoi
int main(int argc, char* argv[]) {
  Kokkos::initialize(argc,argv);
  {
    int N = (argc>1) ? atoi(argv[1]) : 10;  // number of teams (league size)
    int M = (argc>2) ? atoi(argv[2]) : 64;  // entries reduced per team
    int R = (argc>3) ? atoi(argv[3]) : 8;   // threads per team
    Kokkos::View<double*> results1("r1",N);
    Kokkos::View<double*> results2("r2",N);
    Kokkos::View<double**> data("d",N,M);
    Kokkos::deep_copy(data,1);
    Kokkos::parallel_for(Kokkos::TeamPolicy<>(N,R), KOKKOS_LAMBDA (const Kokkos::TeamPolicy<>::member_type& team) {
      const int i = team.league_rank();
      // Case 1: reduce directly into a View entry.
      Kokkos::parallel_reduce(Kokkos::TeamThreadRange(team,M), [&] (const int j, double& update) {
        update += data(i,j) + 1000*i + j;
      },results1(i));
      team.team_barrier();
      // Case 2: reduce into a local variable, then store it.
      double s;
      Kokkos::parallel_reduce(Kokkos::TeamThreadRange(team,M), [&] (const int j, double& update) {
        update += data(i,j) + 1000*i + j;
      },s);
      results2(i) = s;
    });
    // Print both results next to the expected closed-form value.
    Kokkos::parallel_for(N, KOKKOS_LAMBDA (const int i) {
      printf("%i %lf %lf %lf\n",i,results1(i),results2(i),1.0*(M*(1000*i+1)+M*(M-1)/2));
    });
  }
  Kokkos::finalize();
}
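The last column printed by the reproducer is a closed-form expected value. As a sanity check that does not depend on Kokkos, the following sketch compares that formula against a plain serial loop. The helper names `expected_by_loop` and `expected_closed_form` are hypothetical and not part of the reproducer.

```cpp
#include <cassert>

// Serial reference: every data(i,j) is 1 after the deep_copy, so team i
// sums the terms 1 + 1000*i + j for j in [0, M).
double expected_by_loop(int i, int M) {
  double sum = 0.0;
  for (int j = 0; j < M; ++j) {
    sum += 1.0 + 1000.0 * i + j;
  }
  return sum;
}

// Closed form as written in the reproducer's printf call.
// M*(M-1) is always even, so the integer division by 2 is exact.
double expected_closed_form(int i, int M) {
  return 1.0 * (M * (1000 * i + 1) + M * (M - 1) / 2);
}
```

With the default M = 64, both functions give 2080 for team 0, so any backend printing a different value in the first two columns is producing a wrong reduction.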
The root cause is that the OpenMP backend didn't create a local variable for intermediate results but simply used the argument the user provided, i.e., the different threads would stomp over each other.
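The pattern this comment says was missing can be sketched outside Kokkos. The following is only an analogy built on `std::thread`, not the OpenMP backend's actual code: each thread accumulates into its own local variable, and the caller's result is produced once, after all threads have finished. The function `team_reduce_sum` and its parameters are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Sums `values` using `team_size` threads (assumed to be >= 1).
// Each thread owns one slot of `partial`, so no two threads write
// to the same location.
double team_reduce_sum(const std::vector<double>& values, int team_size) {
  std::vector<double> partial(static_cast<std::size_t>(team_size), 0.0);
  std::vector<std::thread> threads;
  for (int t = 0; t < team_size; ++t) {
    threads.emplace_back([&values, &partial, t, team_size] {
      double local = 0.0;  // thread-private intermediate result
      for (std::size_t j = static_cast<std::size_t>(t); j < values.size();
           j += static_cast<std::size_t>(team_size)) {
        local += values[j];
      }
      partial[static_cast<std::size_t>(t)] = local;
    });
  }
  for (auto& th : threads) th.join();  // plays the role of a team barrier

  // Combine the partials; the final value is formed only here.
  double result = 0.0;
  for (double p : partial) result += p;
  return result;
}
```

Accumulating straight into a single shared `result` from every thread, without the `local` variables, would be a data race, which matches the kind of inconsistent output reported in this issue.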
Pull request issued: #2079
My fix for OpenMP broke it for Serial ...
Why do the following 2 slightly different usages of TeamThreadRange reduction give different results:
vs.
The first case gives the correct answer but the second case gives gibberish. What makes the difference here?
Here's a complete example to try it out: