Extend the number of unique particles per cpu we can have at once. #1315
…rticle as well as the rank it was generated on. This allows unique combinations of 'id' and 'cpu' numbers to be generated without any communication between ranks. However, this does waste some space, since it's unlikely that 2**31-1 MPI ranks will be used any time soon, while the same limit for the id has actually been overflowed in real-world WarpX simulations. To address this, in this PR, we still use 64 bits to represent the combination of (id, cpu), but we devote 40 bits to the id and only 24 to the cpu. This allows ~0.5 trillion unique particles on each of 16.7 million MPI ranks, which should be good enough for the foreseeable future.
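The 40/24 split described above can be sketched as plain bit arithmetic. This is an illustration only, not the code in this PR: the names `pack_idcpu`, `unpack_id`, and `unpack_cpu` are hypothetical, and the layout (one sign bit plus a 39-bit magnitude for the id in the upper 40 bits, the cpu in the lower 24) is an assumption chosen because it matches the quoted limits of roughly 0.5 trillion ids and 16.7 million ranks.

```cpp
#include <cassert>
#include <cstdint>

// Assumed layout (hypothetical, for illustration):
//   bit 63      : id sign flag (1 means id >= 0)
//   bits 62..24 : |id|, 39 bits
//   bits 23..0  : cpu (MPI rank), 24 bits
constexpr int cpu_bits = 24;
constexpr int id_mag_bits = 39;
constexpr std::uint64_t cpu_mask = (std::uint64_t{1} << cpu_bits) - 1;
constexpr std::uint64_t id_mag_mask = (std::uint64_t{1} << id_mag_bits) - 1;
constexpr std::int64_t max_id = (std::int64_t{1} << id_mag_bits) - 1;  // about 5.5e11
constexpr int max_cpu = (1 << cpu_bits) - 1;                           // about 1.67e7

// Combine an id and a cpu number into one 64-bit word.
inline std::uint64_t pack_idcpu(std::int64_t id, int cpu) {
    assert(id >= -max_id && id <= max_id);
    assert(cpu >= 0 && cpu <= max_cpu);
    const std::uint64_t sign = (id >= 0) ? 1u : 0u;
    const std::uint64_t mag = static_cast<std::uint64_t>(id >= 0 ? id : -id);
    return (sign << 63) | (mag << cpu_bits) | static_cast<std::uint64_t>(cpu);
}

// Recover the signed id from the upper 40 bits.
inline std::int64_t unpack_id(std::uint64_t idcpu) {
    const std::uint64_t mag = (idcpu >> cpu_bits) & id_mag_mask;
    const bool nonneg = (idcpu >> 63) != 0;
    return nonneg ? static_cast<std::int64_t>(mag)
                  : -static_cast<std::int64_t>(mag);
}

// Recover the cpu number from the lower 24 bits.
inline int unpack_cpu(std::uint64_t idcpu) {
    return static_cast<int>(idcpu & cpu_mask);
}
```

Under these assumptions an id such as 3000000000, which does not fit in a signed 32-bit integer, survives a pack/unpack round trip alongside its cpu number.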
On the test I am running, running on this branch (the standard output contains
See the Backtrace. Currently this test crashes after 30 min on 2 V100 GPUs. I can make a reproducer that crashes faster.
@atmyers This input file is a single-GPU WarpX reproducer where the issue comes after 1802 iterations (2 minutes on 1 V100). Here the number of particles injected should be roughly If you want to make it even faster, you can increase the number of ppc for
Yes, changes are needed on the WarpX side as well, which still uses
I believe this PR to WarpX should do it: ECP-WarpX/WarpX#1266
Src/Particle/AMReX_Particle.H
Outdated
    // zero out the first 24 bits, which are used to store the cpu number
    m_idata &= (~ 0x00FFFFFF);

    AMREX_ASSERT(cpu > 0);
>= 0
Oh yes, of course, I just resubmitted the test on both branches. Thanks!
Src/Particle/AMReX_Particle.H
Outdated
    }

    AMREX_GPU_HOST_DEVICE
    operator long () noexcept
This could be const function.
    }

    AMREX_GPU_HOST_DEVICE
    operator int () noexcept
Could be a const function.
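The review points above (accept rank 0 in the assertion, and mark the conversion operators `const`) can be illustrated with a small stand-in wrapper. This is a sketch under assumptions, not the class from `AMReX_Particle.H`: the name `CpuFieldRef` is hypothetical, plain `assert` replaces `AMREX_ASSERT`, and the `AMREX_GPU_HOST_DEVICE` annotation is omitted so the block compiles on its own. The mask is also written as an explicit 64-bit constant rather than `~ 0x00FFFFFF`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a wrapper over the packed 64-bit (id, cpu) word.
struct CpuFieldRef {
    std::uint64_t& m_idata;

    // Store a new cpu number in the low 24 bits, leaving the id bits untouched.
    CpuFieldRef& operator=(int cpu) noexcept {
        assert(cpu >= 0);           // rank 0 is valid, so the check is >= 0, not > 0
        assert(cpu <= 0x00FFFFFF);  // must fit in 24 bits
        m_idata &= ~std::uint64_t{0x00FFFFFF};  // zero out the 24 cpu bits
        m_idata |= static_cast<std::uint64_t>(cpu);
        return *this;
    }

    // Reading does not modify the word, so the conversion can be const.
    operator int() const noexcept {
        return static_cast<int>(m_idata & std::uint64_t{0x00FFFFFF});
    }
};
```

With the operator declared `const`, a `const CpuFieldRef` can still be converted to `int`, which would not compile with the non-const signature shown in the diff.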
Yes, this fixes the issue for me. Thanks!
…MReX-Codes#1315)
Currently, we use two signed integers to store id numbers for each particle as well as the rank it was generated on. This allows unique combinations of 'id', and 'cpu' numbers to be generated without any communication between ranks. However, this does waste some space, since it's unlikely that 2**31-1 MPI ranks will be used any time soon, while the same limit for the id has actually been overflowed in real-world WarpX simulations. To address this, in this PR, we still use 64 bits to represent the combination of (id, cpu), but we devote 40 bits to the id and only 24 to the cpu. This allows ~0.5 trillion unique particles on each of 16.7 million MPI ranks, which should be good enough for the foreseeable future.
The proposed changes:
- [ ] fix a bug or incorrect behavior in AMReX
- [x] add new capabilities to AMReX
- [ ] changes answers in the test suite to more than roundoff level
- [ ] are likely to significantly affect the results of downstream AMReX users
- [ ] are described in the proposed changes to the AMReX documentation, if appropriate
…once. (AMReX-Codes#1315)" This reverts commit d91e0ae.