Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hybrid parallel (OpenMP) support for incompressible solvers #1178

Merged
merged 17 commits into from
Jan 30, 2021

Conversation

pcarruscag
Copy link
Member

Proposed Changes

Remove some more duplication, fix some issues, and make the incompressible solvers compatible with OpenMP.

Related Work

Fix #1175

PR Checklist

  • I am submitting my contribution to the develop branch.
  • My contribution generates no new compiler warnings (try with the '-Wall -Wextra -Wno-unused-parameter -Wno-empty-body' compiler flags, or simply --warnlevel=2 when using meson).
  • My contribution is commented and consistent with SU2 style.
  • I have added a test case that demonstrates my contribution, if necessary.
  • I have updated appropriate documentation (Tutorials, Docs Page, config_template.cpp) , if necessary.

StrainMag_Max = max(StrainMag_Max, strainMax);
Omega_Max = max(Omega_Max, omegaMax);
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose you do the reduction over the threads yourself because of efficiency, when the if statement is false, but wouldn't it be more readable to have a reduction clause in the OMP for loop?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for having a look at the code Edwin, the issues with these kind of reductions were:

  • The microsoft compilers do not support max/min reductions;
  • For the code to compile with AD (forward) we would need to define custom reduction operators, which might not be very readable;
  • Earlier discussions about OpenMP+AD (reverse) suggested that avoiding reduction clauses would make it easier to develop a thread-safe AD tool.

For performance the reduction clause would be more efficient, as the compiler can make min/max atomic, however I do not know of a nice simple way to implement atomic min/max.

SU2_OMP_BARRIER
SU2_OMP_MASTER
{
SU2_OMP_MASTER {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For efficiency reasons then the Allreduce could be done in a single call with two arguments, as for both Omega and the Strain you compute the maximum.

@@ -38,7 +38,7 @@
*/
class CIncEulerSolver : public CFVMFlowSolverBase<CIncEulerVariable, INCOMPRESSIBLE> {
protected:
CFluidModel *FluidModel = nullptr; /*!< \brief fluid model used in the solver */
vector<CFluidModel*> FluidModel; /*!< \brief fluid model used in the solver. */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't you want to put the fluid model in the base class? CEulerSolver has exactly the same variable.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a bit different for the NEMO solver, it also has a "FluidModel" but of a different type.

Copy link
Contributor

@vdweide vdweide left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The few questions I had have been answered, so this one looks good to me.

@pcarruscag pcarruscag merged commit be378f2 into develop Jan 30, 2021
@pcarruscag pcarruscag deleted the hybrid_parallel_incomp branch January 30, 2021 14:12
Copy link
Contributor

@TobiKattmann TobiKattmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for adding omp support to the inc solver 💐 and moving some reg test accordingly
And thanks for moving some functions to the FVMFlowSolverBase 👍
Much appreciated

@@ -37,6 +37,8 @@
*/
class CEulerSolver : public CFVMFlowSolverBase<CEulerVariable, COMPRESSIBLE> {
protected:
using BaseClass = CFVMFlowSolverBase<CEulerVariable, COMPRESSIBLE>;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need this to call an implementation of the base class later like BaseClass::SetInitialCondition in this very CEulerSolver. But SetInitialCondition is also public in the FVMBase class so it should be callable anyway because we inherit that via class CEulerSolver : public CFVMFlowSolverBase ... where am I wrong?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you call SetInitialCondition in SetInitialCondition you get an infinite loop, I just wanted to make more expressive I was calling the implementation of the parent first.

Comment on lines +264 to +268
inline virtual void Viscous_Residual(unsigned long iEdge, CGeometry *geometry, CSolver **solver_container,
CNumerics *numerics, CConfig *config) { }
void Viscous_Residual_impl(unsigned long iEdge, CGeometry *geometry, CSolver **solver_container,
CNumerics *numerics, CConfig *config);
using CSolver::Viscous_Residual; /*--- Silence warning ---*/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea behind this to be able to 'fake' override Viscous_Residual for viscous flows with just calling Viscous_Residual_impl because for inviscid flow the empty implementation from here is taken. This trick has to be used because FVMBase -> Euler -> NS is the inheritance and Viscous_Residual itself cannot be overloaded because Euler is the middleman?
And what happens to the call in CIntegration to Viscous_Residual for NS flow. Is there a doubled contribution then because it is called there and in the centered/upwind residual?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, in Cintegration the empty impl of CSolver is used and in the centered/upwind residual the correct Viscous_Residual impl with the Viscous_Residual_impl init ...
Seems that i have to go back and read a couple of pages in 'Inheritance for complete morons'. fml

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep that's all correct.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Except the self flagellation part, no need for that xD

Comment on lines +256 to +257
* \note The convective residual methods include a call to this for each edge,
* this allows convective and viscous loops to be "fused".
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a speed-up due to this? Or is this just for simplification?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fast code goes brrrrrrrrrrrrrrrrrrrr

Comment on lines +1044 to +1048
template <class V, ENUM_REGIME R>
void CFVMFlowSolverBase<V, R>::LoadRestart(CGeometry **geometry, CSolver ***solver,
CConfig *config, int iter, bool update_geo) {
LoadRestart_impl(geometry, solver, config, iter, update_geo);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain why having the LoadRestart_impl is beneficial? The LoadRestart moved from the Euler and IncEuler to FVMFlowSolverBase so that is good but why the additional wrapper function

@@ -149,7 +142,7 @@ class CIncEulerSolver : public CFVMFlowSolverBase<CIncEulerVariable, INCOMPRESSI
* \brief Compute the pressure at the infinity.
* \return Value of the pressure at the infinity.
*/
inline CFluidModel* GetFluidModel(void) const final { return FluidModel;}
inline CFluidModel* GetFluidModel(void) const final { return FluidModel[omp_get_thread_num()]; }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait, I guess (real) shared-memory is not viable for the FluidModel as it is filled and computes stuff on the fly so each thread gets its own?

}
}
}
BaseClass::SetInitialCondition(geometry, solver_container, config, TimeIter);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the BaseClass call I mean above

@@ -1057,14 +983,21 @@ void CIncEulerSolver::SetTime_Step(CGeometry *geometry, CSolver **solver_contain
void CIncEulerSolver::Centered_Residual(CGeometry *geometry, CSolver **solver_container, CNumerics **numerics_container,
CConfig *config, unsigned short iMesh, unsigned short iRKStep) {

CNumerics* numerics = numerics_container[CONV_TERM];
CNumerics* numerics = numerics_container[CONV_TERM + omp_get_thread_num()*MAX_TERMS];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here you need multiple numerics instances because in the OMP loop they all individually fill the container and evaluate their residual, right?

Comment on lines 983 to 985
void CIncEulerSolver::Centered_Residual(CGeometry *geometry, CSolver **solver_container, CNumerics **numerics_container,
CConfig *config, unsigned short iMesh, unsigned short iRKStep) {

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here it would be prob nice to reflect in the name that it contains the viscous residual.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah... but then I would need to change all the solvers, the vectorized residual loop is called EdgeFluxResidual to be more inclusive...

Comment on lines -3092 to 2833
delete [] Restart_Data; Restart_Data = nullptr;
LoadRestart_impl(geometry, solver, config, val_iter, val_update_geo, Solution, nVar_Restart);

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@TobiKattmann to satisfy the quirkiness of the incompressible solver that needs a default value for temperature when there is no energy equation

@jblueh jblueh mentioned this pull request Mar 1, 2021
5 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Incorrect aero coeffs for compressible moving grid simulations
3 participants