Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix out of bounds access to distance_to_eb #4831

Merged
merged 7 commits into from
Apr 10, 2024

Conversation

roelof-groenewald
Copy link
Member

While running in debug mode I found that there are out of bounds accesses to the distance_to_eb multifab. Tracing these I discovered that they come from AddNParticles() due to the fact that particles are scraped before they are redistributed. This PR fixes the issue.

@roelof-groenewald roelof-groenewald added bug: affects latest release Bug also exists in latest release version component: boundary PML, embedded boundaries, et al. labels Apr 5, 2024
// Remove particles that are inside the embedded boundaries
#ifdef AMREX_USE_EB
auto & distance_to_eb = WarpX::GetInstance().GetDistanceToEB();
scrapeParticles( *this, amrex::GetVecOfConstPtrs(distance_to_eb), ParticleBoundaryProcess::Absorb());
#endif

Redistribute();
// Call (local) redistribute again to remove particles with invalid ids
Redistribute(lev, lev, 1, true, true);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still think that the local argument should be 1, since it is expecting an int. The description for local says "If > 0, this is the maximum number of cells a particle can have moved since the last Redistribute() call." It is not a flag.
I was thinking that if nGrow=1 and local=1, the routine might be smart enough to know that no communication is needed. No communication is needed since it is only removing invalid particles. Is there maybe a separate routine that does just that? This would need a comment from @WeiqunZhang

Copy link
Member Author

@roelof-groenewald roelof-groenewald Apr 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I see. I definitely interpreted local incorrectly. I guess it should actually be local=0 then, since the particles will have moved at all since the previous time Redistribute() was called.
Looking at the code in AMReX_ParticleContainerI.H, nGrow is used in the locateParticle() function (here) only after the particle could not be located within the ParticleLocData with nGrow=0 - basically it expands the search area with nGrow. Therefore, seeing as the first Redistribute call happens with nGrow=0, it shouldn't make any difference to have the second one use nGrow=1.

There was some mention recently of creating a function that will only remove invalid particles, but as far as I know that has not been done yet.

Suggested change
Redistribute(lev, lev, 1, true, true);
Redistribute(lev, lev, 0, 0, true);

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This confuses me though since the default value for local is 0, meaning usually when we call Redistribute it is as if we specify that the particles have not moved.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Local=0 means a global communication will be done as if particles may have moved across multiple processors. I think the correct value is 1.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. But then looking at the actual redistribute function code, it has this:

    if (local > 0) { BuildRedistributeMask(0, local); }

While I don't fully understand what exactly is done with BuildRedistributeMask, it seems to me that it determines the neighboring ranks to which particles could be passed / received from.
Is it correct then to say that even with local=1 there would still be some unnecessary overhead but that overhead will be minimized compared to the case where particles could be globally redistributed?

This really makes a case for having a function that simply, locally, removes invalid particles.

@ax3l ax3l added the bug Something isn't working label Apr 8, 2024
@ax3l ax3l requested a review from dpgrote April 8, 2024 19:06
Comment on lines 304 to 308
scrapeParticles( *this, amrex::GetVecOfConstPtrs(distance_to_eb), ParticleBoundaryProcess::Absorb());
#endif

Redistribute();
// Call (local) redistribute again to remove particles with invalid ids
Redistribute(lev, lev, 0, 1, true);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be moved inside the #ifdef:

#ifdef AMREX_USE_EB
    auto & distance_to_eb = WarpX::GetInstance().GetDistanceToEB();
    scrapeParticles( *this, amrex::GetVecOfConstPtrs(distance_to_eb), ParticleBoundaryProcess::Absorb());
    // Call (local) redistribute again to remove particles with invalid ids
    Redistribute(lev, lev, 0, 1, true);
#endif

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that makes sense for sure. I'll update now.

@RemiLehe RemiLehe enabled auto-merge (squash) April 9, 2024 23:49
@RemiLehe RemiLehe merged commit e19d66b into ECP-WarpX:development Apr 10, 2024
45 checks passed
@roelof-groenewald roelof-groenewald deleted the fix_AddNParticles branch April 10, 2024 03:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug: affects latest release Bug also exists in latest release version bug Something isn't working component: boundary PML, embedded boundaries, et al.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants