
Fixed a bug in the constrained source term under MPI #33

Closed
wants to merge 1 commit

Conversation


@wenqing wenqing commented Mar 18, 2016

As titled.

@wenqing wenqing force-pushed the constrSt_MPI branch 3 times, most recently from 9f3dd75 to ca06b3b on March 18, 2016 at 14:45

wenqing commented Mar 18, 2016

@WaltherM Just updated.

@hydromarc

I see. I was just testing it at the moment... I'll rebase and let you know about the status.

@norihiro-w norihiro-w mentioned this pull request Mar 20, 2016
@hydromarc

can't test this due to #34


wenqing commented Mar 22, 2016

@norihiro-w Updated according to your comments.

@@ -26,6 +26,9 @@
/*--------------------- MPI Parallel -------------------*/
#if defined(USE_MPI) || defined(USE_MPI_PARPROC) || defined(USE_MPI_REGSOIL)
#include <mpi.h>
#ifdef OGS_FEM_IPQC

Sorry, after #35 we don't need this #ifdef anymore. SplitMPI_Communicator.h is always used if USE_MPI is defined.
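
For illustration, a sketch of how the include block from the hunk above could look without the OGS_FEM_IPQC guard, assuming SplitMPI_Communicator.h is always pulled in for MPI builds as described; the surrounding lines are illustrative, not the actual patch:

    /*--------------------- MPI Parallel -------------------*/
    #if defined(USE_MPI) || defined(USE_MPI_PARPROC) || defined(USE_MPI_REGSOIL)
    #include <mpi.h>
    // After #35, no OGS_FEM_IPQC guard is needed here:
    // SplitMPI_Communicator.h is always used under MPI builds.
    #include "SplitMPI_Communicator.h"
    #endif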

@norihiro-w

@wenqing you can keep the original commit and remove the last commit including my comments.

@wenqing wenqing force-pushed the constrSt_MPI branch 3 times, most recently from 20e4429 to ca06b3b on March 22, 2016 at 10:08

wenqing commented Mar 22, 2016

@norihiro-w Restored the original commit.

@norihiro-w

I will merge this after #35


wenqing commented Mar 22, 2016

@norihiro-w Thanks. @WaltherM should test it before it is merged.

@hydromarc

will test after #35 is merged and this is rebased.

@norihiro-w

#35 was merged

@hydromarc

@wenqing could you please help me rebase this? I tried it on master, but ogs crashes with:

[node002:13167] *** An error occurred in MPI_Allreduce
[node002:13167] *** reported by process [47049921658881,47051366727684]
[node002:13167] *** on communicator MPI_COMM_WORLD
[node002:13167] *** MPI_ERR_IN_STATUS: error code in status
[node002:13167] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[node002:13167] *** and potentially your MPI job)


wenqing commented Mar 23, 2016

@WaltherM Rebased. If it still crashes, please send me your input files.

#else
MPI_Comm comm = comm_DDC;
#endif
MPI_Allreduce(&st_conditionst, &int_buff, 1, MPI_INT, MPI_SUM, comm);

@WaltherM A collective communication function can't be called here, because not all partitions satisfy the condition at line 8044:

            if (st_node.size() > 0 && (long) st_node.size() > i)

This means some compute cores skip the if-condition, while the others enter it and call MPI_Allreduce, which leads to a deadlock here.

For this reason, I have to close this PR and want to discuss the issue with you later.
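
A minimal, standalone sketch of the deadlock pattern described above; the names have_st_nodes, local_count and global_count are hypothetical stand-ins, not the actual OGS identifiers. A collective call such as MPI_Allreduce must be reached by every rank of the communicator, so the safe variant moves it outside the rank-dependent condition and lets ranks without matching nodes contribute 0:

    #include <mpi.h>

    int main(int argc, char** argv)
    {
        MPI_Init(&argc, &argv);
        int rank = 0;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        // Deadlocking pattern: only ranks that own constrained source-term
        // nodes reach the collective call, the others never join it.
        //   if (have_st_nodes)  // true on some ranks only
        //       MPI_Allreduce(&local_count, &global_count, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);

        // Safe pattern: every rank calls MPI_Allreduce unconditionally;
        // ranks without matching nodes simply contribute 0 to the sum.
        int local_count = (rank % 2 == 0) ? 1 : 0;  // stand-in for the per-rank condition
        int global_count = 0;
        MPI_Allreduce(&local_count, &global_count, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);

        MPI_Finalize();
        return 0;
    }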

@wenqing wenqing closed this Mar 30, 2016
@hydromarc

@wenqing OK, let's discuss this on Monday.
