
Asynchronous combi WIP #9

Closed
wants to merge 22 commits

Conversation

obersteiner (Collaborator)

No description provided.

Your Name and others added 22 commits May 2, 2018 10:51
Also changed the file to show relative error instead of whatever was there before
uncommented TIMING definition, for plots.

Add modified test_manger that can handle any dimensions
…rtion to avoid segfault in case of multiple processes per group
…th minimal number of hierarchizations and dehierarchizations
static MPI_Request *requestAsync;

/** Subspace sizes for async reduce */
static std::vector<int> *subspaceSizes;
Contributor:

I feel like this should not be a static member of Task because it already exists in DistributedSparseGridUniform, and I think that this is where it should be.

Collaborator Author:

As far as I know, this is in Task because the sparse grid gets destroyed frequently, but this information should stay constant over multiple combinations.

Collaborator Author:

But this is just an optimization so that the sparse grid sizes do not need to be communicated again. For the request array, I would need to investigate whether it really needs to be in Task.

Collaborator Author:

But one could also put subspaceSizes as a static member of the SparseGrid class, I think...

Contributor:

Yes, you are right, we can leave it like this here. On third-level, subspacesDataSizes_ is introduced, which makes this redundant. I had not noticed that it wasn't on master.

@freifrauvonbleifrei (Contributor):

I am just investigating a merge into https://github.com/SGpp/DisCoTec/tree/third-level, but it seems that, unsurprisingly, some of the distributed sparse grid data management has changed for the widely distributed combination technique.

This is why I thought it was the right time to leave a comment on the (in my opinion unnecessary) member Task::subspaceSizes.

Quoting @mhurler123 in case he has an opinion on that ;)
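A minimal sketch of the caching idea discussed in this thread, using assumed stand-in names rather than DisCoTec's actual classes: the point is only that a static member survives the sparse grid being rebuilt between combination steps, so the sizes have to be gathered once.

#include <vector>

// Sketch only: CachedSubspaceSizes stands in for the static Task member
// discussed above; all names here are assumptions, not DisCoTec's API.
struct CachedSubspaceSizes {
  // one size vector per grid a task owns, filled once and reused across combinations
  static std::vector<std::vector<int>> perGrid;
};
std::vector<std::vector<int>> CachedSubspaceSizes::perGrid;

// A stand-in for a sparse grid object that is recreated every combination step.
struct SparseGridStub {
  std::vector<int> subspaceSizes;
};

int main() {
  {
    SparseGridStub dsg{{4, 8, 16}};
    // gathered or communicated once while the grid exists
    CachedSubspaceSizes::perGrid = {dsg.subspaceSizes};
  }  // dsg is destroyed here, as the real sparse grid is between combination steps

  // next combination step: the sizes are still available without communicating again
  return CachedSubspaceSizes::perGrid.at(0).size() == 3u ? 0 : 1;
}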


Stats::startEvent("combine global reduce init");

Task::bufAsync = new std::vector<CombiDataType>[numGrids];
Contributor:

memory leak?

Collaborator Author:

This could be the case. We could add a delete[] before the allocation, or switch to smart pointers.
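A minimal sketch of the owning-container alternative mentioned above, with a hypothetical stand-in for CombiDataType; none of these names are DisCoTec's actual API. A vector of vectors releases its storage automatically, so re-running the init code cannot leak the previous allocation.

#include <cstddef>
#include <complex>
#include <vector>

// Hypothetical stand-in for DisCoTec's CombiDataType (assumption).
using CombiDataType = std::complex<double>;

// Instead of assigning new std::vector<CombiDataType>[numGrids] to a raw
// pointer, keep the per-grid buffers in a container that owns its memory.
struct AsyncBuffersSketch {
  std::vector<std::vector<CombiDataType>> bufAsync;

  void init(int numGrids) {
    // drops any previously held buffers; no delete[] needed anywhere
    bufAsync.assign(static_cast<std::size_t>(numGrids), std::vector<CombiDataType>{});
  }
};

int main() {
  AsyncBuffersSketch buffers;
  buffers.init(2);
  buffers.init(3);  // safe to re-initialize, nothing leaks
  return 0;
}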

Contributor:

Or maybe make this a member of DistributedFullGridUniform? That is also connected to the next question...

Collaborator Author:

The buffer is deleted in line 854.

Collaborator Author:

We could also put this as a member of the full grid directly. But I am not sure if this improves things.

}

for (int g = 0; g < numGrids; g++) {
  CombiCom::distributedGlobalReduceAsyncInit(*combinedUniDSGVector_[g], Task::subspaceSizes[g],
                                             Task::bufAsync[g], Task::requestAsync[g]);
Contributor:

Actually, I don't get why this is done once for each task; a single time should suffice, right?

@obersteiner (Collaborator Author), Jan 14, 2021:

It is only done a single time. numGrids does not refer to the number of tasks but to the number of grids each task owns. Usually numGrids is 1, but with multiple species in GENE, for example, numGrids = nspecies.
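For context, a generic sketch of the pattern the per-grid request array supports: one nonblocking global reduction per grid, completed later. This is plain MPI and only assumed to resemble what distributedGlobalReduceAsyncInit sets up; it is not DisCoTec's actual implementation.

#include <mpi.h>
#include <vector>

int main(int argc, char** argv) {
  MPI_Init(&argc, &argv);

  const int numGrids = 2;    // e.g. one grid per species
  const int bufSize = 1024;  // placeholder per-grid buffer size
  std::vector<std::vector<double>> buf(numGrids, std::vector<double>(bufSize, 1.0));
  std::vector<MPI_Request> requests(numGrids);

  // start one asynchronous in-place allreduce per grid (not per task)
  for (int g = 0; g < numGrids; ++g) {
    MPI_Iallreduce(MPI_IN_PLACE, buf[g].data(), bufSize, MPI_DOUBLE, MPI_SUM,
                   MPI_COMM_WORLD, &requests[g]);
  }

  // ... other work can overlap with the reductions here ...

  // before using the combined data, wait for all outstanding reductions
  MPI_Waitall(numGrids, requests.data(), MPI_STATUSES_IGNORE);

  MPI_Finalize();
  return 0;
}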

assert(initialized_);

int lrank;
MPI_Comm_rank(lcomm, &lrank);
Contributor:

Use the buffered rank instead: auto lrank = theMPISystem()->getLocalRank();
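A small sketch of the buffered-rank idea in plain MPI, with RankCache as a hypothetical stand-in for the cached local rank in DisCoTec's MPISystem: the rank is queried once at setup and reused afterwards.

#include <mpi.h>

// RankCache is a hypothetical stand-in; only the MPI calls below are real API.
struct RankCache {
  int localRank = -1;
  void init(MPI_Comm comm) { MPI_Comm_rank(comm, &localRank); }
  int getLocalRank() const { return localRank; }
};

int main(int argc, char** argv) {
  MPI_Init(&argc, &argv);

  RankCache cache;
  cache.init(MPI_COMM_WORLD);        // query MPI once

  int lrank = cache.getLocalRank();  // later call sites reuse the cached value
  (void)lrank;

  MPI_Finalize();
  return 0;
}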

Contributor:

Oh, it's unused anyway ;)

@vancraar deleted the branch master, January 16, 2023 14:49
@vancraar closed this, Jan 16, 2023
@freifrauvonbleifrei (Contributor):

Oops, this was unintentionally closed by making the main branch the default. Reopening as #101.
