Async IO for Particles #1058
Conversation
```cpp
if (AsyncOut::UseAsyncOut()) {
    WriteBinaryParticleDataAsync(*this, dir, name,
                                 write_real_comp, write_int_comp,
                                 real_comp_names, int_comp_names);
```
Any reason why the async version does not take `F`?
I plan to remove `F` soon. WarpX will implement this differently, by copying to a tmp particle container and applying a filter during the copy. That functionality will be useful in other places, and it will let us remove all those extra overloads of `WritePlotFile` and `Checkpoint`.
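A minimal sketch of the "filter during copy" idea described above, assuming a simplified stand-in particle type; the names here are illustrative and not AMReX API. Instead of threading a filter functor `F` through every I/O overload, the particles that pass the predicate are copied into a temporary container, which can then be handed to the unfiltered write path:

```cpp
#include <cassert>
#include <vector>

// Minimal stand-in for a particle; real AMReX particles carry more state.
struct Particle {
    double x;
    int id;
};

// Copy only the particles that satisfy the predicate into a temporary
// container. The unfiltered I/O routines can then operate on `tmp`, so
// no filter parameter needs to appear in their signatures.
template <typename Pred>
std::vector<Particle> filtered_copy(const std::vector<Particle>& src, Pred&& keep)
{
    std::vector<Particle> tmp;
    tmp.reserve(src.size());
    for (const auto& p : src) {
        if (keep(p)) tmp.push_back(p);
    }
    return tmp;
}
```

The same filtered temporary could be reused by checkpointing, plotfiles, and diagnostics alike, which is why it removes the need for the extra overloads.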
```cpp
if (MyProc == IOProcNumber)
{
    if ( ! amrex::UtilCreateDirectory(LevelDir, 0755))
    {
        amrex::CreateDirectoryFailed(LevelDir);
    }
}
ParallelDescriptor::Barrier();
```
Can we try to merge this with lines 340-347?
```cpp
Vector<Vector<Long> > np_per_grid(pc.finestLevel()+1);
for (int lev = 0; lev <= pc.finestLevel(); lev++)
{
    np_per_grid[lev] = pc.NumberOfParticlesInGrid(lev);
```
We could also do only a local reduction here and then do a single mpi_allreduce after the for loop.
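A sketch of the restructuring suggested above, under the assumption that the per-level counts can be flattened into one buffer: count locally for every level inside the loop, then issue a single reduction over the flattened buffer afterwards, rather than one reduction per level. The MPI/AMReX call is shown only as a comment so the pattern itself runs standalone; all names are illustrative:

```cpp
#include <cstddef>
#include <vector>

// Flatten per-level local counts into one contiguous buffer so that a
// single global reduction can replace one reduction per level.
std::vector<long> flatten_counts(const std::vector<std::vector<long>>& np_per_grid)
{
    std::vector<long> flat;
    for (const auto& level_counts : np_per_grid) {
        flat.insert(flat.end(), level_counts.begin(), level_counts.end());
    }
    // One reduction over the whole buffer, e.g. (AMReX call, assumption):
    //   ParallelAllReduce::Sum(flat.data(), flat.size(),
    //                          ParallelContext::CommunicatorSub());
    return flat;
}
```

Batching this way trades several small collectives for one larger one, which typically reduces latency cost at scale.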
And for the following code in `NumberOfParticlesInGrid`:

```cpp
    ParallelAllReduce::Sum(&nparticles[0], ngrids, ParallelContext::CommunicatorSub());
}
```
I think @mrowan137 has shown it's much faster to gather and broadcast. @mrowan137 Can you comment on that? There is a function for the gather, `ParallelDescriptor::GatherLayoutDataToVector`.
@atmyers if you look at `DistributionMapping::makeKnapSack (const LayoutData<Real>& rcost_local, ...)` in AMReX_DistributionMapping.cpp you can see the alternate gather/broadcast sequence; it should probably be faster than the `ParallelAllReduce::Sum` here.
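A standalone sketch of why gather-then-broadcast can beat an allreduce in this situation: when each grid's count is owned by exactly one rank, an allreduce sums mostly-zero full-size buffers from every rank, while gathering only the owned entries to the I/O rank and broadcasting the assembled vector moves far less data. The actual MPI calls (e.g. `MPI_Gatherv`/`MPI_Bcast`, or AMReX's `ParallelDescriptor::GatherLayoutDataToVector`) are simulated here so the sketch runs without MPI; names are illustrative:

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Each rank contributes only (grid index, count) pairs for grids it owns.
// The "gather" step assembles the full vector on the I/O rank; a "broadcast"
// would then ship the assembled vector to every rank.
std::vector<long> gather_then_broadcast(
    const std::vector<std::vector<std::pair<int, long>>>& owned_by_rank,
    std::size_t ngrids)
{
    // "Gather": assemble counts from each rank's owned grids on the I/O rank.
    std::vector<long> assembled(ngrids, 0);
    for (const auto& rank_entries : owned_by_rank) {
        for (const auto& entry : rank_entries) {
            assembled[static_cast<std::size_t>(entry.first)] = entry.second;
        }
    }
    // "Broadcast": every rank would now receive `assembled` unchanged.
    return assembled;
}
```

The payoff grows with the number of ranks, since the gathered payload scales with the number of grids rather than ranks times grids.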
```cpp
auto wrc = std::make_shared<Vector<int> >(write_real_comp);
auto wic = std::make_shared<Vector<int> >(write_int_comp);
auto rcn = std::make_shared<Vector<std::string> >(real_comp_names);
auto icn = std::make_shared<Vector<std::string> >(int_comp_names);
```
Are they needed? Can we just use `write_real_comp` etc. directly in the lambda below?
Indeed they are not. Fixed.
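A small sketch of the C++ point behind this exchange, with illustrative names rather than the PR's actual code: the `shared_ptr` wrappers are unnecessary because a lambda that captures a vector by value stores its own independent copy inside the closure, and that copy stays alive for the closure's lifetime, which is exactly what a deferred (async) writer needs:

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Build a deferred task. Capturing `names` by value means the closure owns
// its own copy, so the task remains valid even after the caller's vector
// is cleared or destroyed -- no shared_ptr indirection required.
std::function<std::size_t()> make_async_task(std::vector<std::string> names)
{
    return [names]() { return names.size(); };
}
```

The by-value capture copies once at task creation, the same cost the `make_shared` wrappers paid, but with less indirection and no reference-count traffic.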
It's a big PR. Although I made some comments, I can approve it and then you can decide whether you want to improve it later in follow-up PRs.
Thanks for your comments - there's no hurry to merge this right now. I'll try your suggestions.