Extend the number of unique particles per cpu we can have at once. #1315

Merged (10 commits) on Aug 27, 2020

Member

@atmyers atmyers commented Aug 26, 2020

Currently, we use two signed 32-bit integers to store each particle's id number and the rank it was generated on. This allows unique combinations of 'id' and 'cpu' numbers to be generated without any communication between ranks. However, this wastes some space: it's unlikely that 2**31-1 MPI ranks will be used any time soon, while the same limit for the id has actually been overflowed in real-world WarpX simulations. To address this, this PR still uses 64 bits to represent the combination of (id, cpu), but devotes 40 bits to the id and only 24 to the cpu. This allows ~0.5 trillion unique particles on each of 16.7 million MPI ranks, which should be good enough for the foreseeable future.

The proposed changes:

  • [ ] fix a bug or incorrect behavior in AMReX
  • [x] add new capabilities to AMReX
  • [ ] change answers in the test suite to more than roundoff level
  • [ ] are likely to significantly affect the results of downstream AMReX users
  • [ ] are described in the proposed changes to the AMReX documentation, if appropriate
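
For readers who want to see the 40/24 split concretely, here is a minimal sketch of packing an (id, cpu) pair into one 64-bit word. It is not the code from this PR: the function and constant names are hypothetical, and it assumes the cpu number sits in the low 24 bits and that only non-negative ids up to 2**39-1 are stored (a limit chosen to match the "~0.5 trillion" figure above).

```cpp
#include <cassert>
#include <cstdint>

// Assumed layout (illustration only): low 24 bits = cpu, high 40 bits = id.
constexpr int           kCpuBits = 24;
constexpr std::uint64_t kCpuMask = (std::uint64_t{1} << kCpuBits) - 1;  // 0x00FFFFFF
constexpr int           kMaxCpu  = (1 << kCpuBits) - 1;                 // ~16.7 million
// Assumption: ids are non-negative and one of the 40 bits is kept for a sign.
constexpr std::int64_t  kMaxId   = (std::int64_t{1} << 39) - 1;         // ~0.55 trillion

// Hypothetical helper: combine an id and a cpu number into one 64-bit word.
inline std::uint64_t pack_idcpu(std::int64_t id, int cpu) {
    assert(id >= 0 && id <= kMaxId);
    assert(cpu >= 0 && cpu <= kMaxCpu);
    return (static_cast<std::uint64_t>(id) << kCpuBits) |
           static_cast<std::uint64_t>(cpu);
}

// Hypothetical helper: recover the id from the high 40 bits.
inline std::int64_t unpack_id(std::uint64_t packed) {
    return static_cast<std::int64_t>(packed >> kCpuBits);
}

// Hypothetical helper: recover the cpu number from the low 24 bits.
inline int unpack_cpu(std::uint64_t packed) {
    return static_cast<int>(packed & kCpuMask);
}
```

With this layout an id of 3 billion, which no longer fits in a signed 32-bit integer, round-trips through `pack_idcpu` and `unpack_id` unchanged.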

@atmyers atmyers requested a review from WeiqunZhang August 26, 2020 23:01
@MaxThevenet
Contributor

On the test I am running on this branch (the standard output contains AMReX (20.08-97-ge99860c8e34c) initialized), I still get the same error at the same time step:

STEP 13749 starts ...
amrex::Abort::1::ERROR: overflow on particle id numbers !!!
SIGABRT

See the Backtrace.

Currently this test crashes after 30 min on 2 V100 GPUs. I can make a reproducer that crashes faster.

@MaxThevenet
Contributor

@atmyers This input file is a single-GPU WarpX reproducer where the issue appears after 1802 iterations (2 minutes on 1 V100). Here the number of particles injected should be roughly 256 * 256 * 32 * 1802 ≈ 3.8 billion.

If you want to make it even faster, you can increase the number of ppc for plasma_e. If you encounter memory issues, you can decrease the number of cells longitudinally AND decrease the physical size of the domain in the longitudinal direction accordingly (so that dz stays small and you still inject ~1 cell per time step).
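
As a quick sanity check on the estimate above, the sketch below compares the injected-particle count against the signed 32-bit limit and against a 40-bit id range. The constant and helper names are hypothetical, and the 2**39-1 bound is an assumption chosen to match the "~0.5 trillion" figure in the PR description.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Factors quoted in the comment above: 256 * 256 * 32 particles, times 1802 steps.
constexpr std::int64_t kPerStep  = std::int64_t{256} * 256 * 32;
constexpr std::int64_t kSteps    = 1802;
constexpr std::int64_t kInjected = kPerStep * kSteps;  // 3,779,067,904

// Largest id a signed 32-bit integer can hold (2**31 - 1).
constexpr std::int64_t kInt32Max = std::numeric_limits<std::int32_t>::max();
// Assumed largest id in the 40-bit scheme (2**39 - 1); not taken from the PR code.
constexpr std::int64_t kId40Max  = (std::int64_t{1} << 39) - 1;

// Hypothetical helper: does a particle count fit under a given id limit?
constexpr bool fits(std::int64_t count, std::int64_t max_id) {
    return count <= max_id;
}

static_assert(!fits(kInjected, kInt32Max), "estimate overflows a 32-bit id");
static_assert(fits(kInjected, kId40Max), "estimate fits in a 40-bit id");
```

Both checks hold at compile time, which is consistent with the overflow showing up in this reproducer when ids are stored as `int`.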

@atmyers
Member Author

atmyers commented Aug 27, 2020

Yes, changes are needed on the WarpX side as well, which still uses int for the pid. This PR just removes the restriction from AMReX.

@atmyers
Member Author

atmyers commented Aug 27, 2020

I believe this PR to WarpX should do it: ECP-WarpX/WarpX#1266

// zero out the first 24 bits, which are used to store the cpu number
m_idata &= (~ 0x00FFFFFF);

AMREX_ASSERT(cpu > 0);
Member

@WeiqunZhang WeiqunZhang Aug 27, 2020

>= 0
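
To illustrate the suggestion, here is a minimal sketch of a cpu setter that accepts rank 0. It is an assumption-laden stand-in for the reviewed code: `set_cpu` and `kCpuFieldMask` are hypothetical names, plain `assert` replaces `AMREX_ASSERT`, and the mask is written as a 64-bit constant to make its width explicit.

```cpp
#include <cassert>
#include <cstdint>

// Assumed layout (illustration only): the low 24 bits of the word hold the cpu.
constexpr std::uint64_t kCpuFieldMask = 0x00FFFFFFULL;

// Hypothetical setter: overwrite the cpu field and leave the id bits untouched.
inline void set_cpu(std::uint64_t& idata, int cpu) {
    assert(cpu >= 0);  // rank 0 is a valid MPI rank, hence >= rather than >
    assert(static_cast<std::uint64_t>(cpu) <= kCpuFieldMask);  // must fit in 24 bits
    idata &= ~kCpuFieldMask;                    // zero out the low 24 bits
    idata |= static_cast<std::uint64_t>(cpu);   // store the new cpu number
}
```

With `cpu > 0`, a debug build would abort for any particle created on rank 0, which is why the bound matters.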

@MaxThevenet
Contributor

Oh yes, of course, I just resubmitted the test on both branches. Thanks!

}

AMREX_GPU_HOST_DEVICE
operator long () noexcept
Member

This could be a const function.

}

AMREX_GPU_HOST_DEVICE
operator int () noexcept
Member

Could be a const function.
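
A small sketch of what the const qualification buys, under stated assumptions: `IdView` and `CpuView` are hypothetical stand-ins for the wrapper types in the diff, `AMREX_GPU_HOST_DEVICE` is omitted, and the 24-bit cpu field in the low bits is assumed as elsewhere in this thread.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical read-only view of the id stored in the high 40 bits.
struct IdView {
    std::uint64_t idata;
    // const: the conversion only reads idata, so it also works on const objects.
    operator long () const noexcept {
        return static_cast<long>(idata >> 24);
    }
};

// Hypothetical read-only view of the cpu stored in the low 24 bits.
struct CpuView {
    std::uint64_t idata;
    operator int () const noexcept {
        return static_cast<int>(idata & 0x00FFFFFFULL);
    }
};

// These compile only because the conversion operators are const-qualified.
inline long read_id(const IdView& v)   { return v; }
inline int  read_cpu(const CpuView& v) { return v; }
```

Without the `const`, passing a view by const reference as in `read_id` and `read_cpu` would fail to compile.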

@MaxThevenet
Contributor

Yes, this fixes the issue for me. Thanks!

@WeiqunZhang WeiqunZhang merged commit 5e9dbc1 into AMReX-Codes:development Aug 27, 2020
kweide pushed a commit to ECP-Astro/amrex that referenced this pull request Sep 28, 2020
kweide added a commit to ECP-Astro/amrex that referenced this pull request Sep 28, 2020
dwillcox pushed a commit to dwillcox/amrex that referenced this pull request Oct 3, 2020