-
Notifications
You must be signed in to change notification settings - Fork 849
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Hybrid parallel (OpenMP) support for incompressible solvers #1178
Conversation
StrainMag_Max = max(StrainMag_Max, strainMax); | ||
Omega_Max = max(Omega_Max, omegaMax); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose you do the reduction over the threads yourself because of efficiency, when the if statement is false, but wouldn't it be more readable to have a reduction clause in the OMP for loop?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for having a look at the code Edwin, the issues with these kind of reductions were:
- The microsoft compilers do not support max/min reductions;
- For the code to compile with AD (forward) we would need to define custom reduction operators, which might not be very readable;
- Earlier discussions about OpenMP+AD (reverse) suggested that avoiding reduction clauses would make it easier to develop a thread-safe AD tool.
For performance the reduction clause would be more efficient, as the compiler can make min/max atomic, however I do not know of a nice simple way to implement atomic min/max.
SU2_OMP_BARRIER | ||
SU2_OMP_MASTER | ||
{ | ||
SU2_OMP_MASTER { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For efficiency reasons then the Allreduce could be done in a single call with two arguments, as for both Omega and the Strain you compute the maximum.
@@ -38,7 +38,7 @@ | |||
*/ | |||
class CIncEulerSolver : public CFVMFlowSolverBase<CIncEulerVariable, INCOMPRESSIBLE> { | |||
protected: | |||
CFluidModel *FluidModel = nullptr; /*!< \brief fluid model used in the solver */ | |||
vector<CFluidModel*> FluidModel; /*!< \brief fluid model used in the solver. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't you want to put the fluid model in the base class? CEulerSolver has exactly the same variable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a bit different for the NEMO solver, it also has a "FluidModel" but of a different type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The few questions I had have been answered, so this one looks good to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for adding omp support to the inc solver 💐 and moving some reg test accordingly
And thanks for moving some functions to the FVMFlowSolverBase 👍
Much appreciated
@@ -37,6 +37,8 @@ | |||
*/ | |||
class CEulerSolver : public CFVMFlowSolverBase<CEulerVariable, COMPRESSIBLE> { | |||
protected: | |||
using BaseClass = CFVMFlowSolverBase<CEulerVariable, COMPRESSIBLE>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You need this to call an implementation of the base class later like BaseClass::SetInitialCondition in this very CEulerSolver. But SetInitialCondition is also public in the FVMBase class so it should be callable anyway because we inherit that via class CEulerSolver : public CFVMFlowSolverBase
... where am I wrong?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you call SetInitialCondition in SetInitialCondition you get an infinite loop, I just wanted to make more expressive I was calling the implementation of the parent first.
inline virtual void Viscous_Residual(unsigned long iEdge, CGeometry *geometry, CSolver **solver_container, | ||
CNumerics *numerics, CConfig *config) { } | ||
void Viscous_Residual_impl(unsigned long iEdge, CGeometry *geometry, CSolver **solver_container, | ||
CNumerics *numerics, CConfig *config); | ||
using CSolver::Viscous_Residual; /*--- Silence warning ---*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea behind this to be able to 'fake' override Viscous_Residual for viscous flows with just calling Viscous_Residual_impl because for inviscid flow the empty implementation from here is taken. This trick has to be used because FVMBase -> Euler -> NS is the inheritance and Viscous_Residual itself cannot be overloaded because Euler is the middleman?
And what happens to the call in CIntegration to Viscous_Residual for NS flow. Is there a doubled contribution then because it is called there and in the centered/upwind residual?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, in Cintegration the empty impl of CSolver is used and in the centered/upwind residual the correct Viscous_Residual impl with the Viscous_Residual_impl init ...
Seems that i have to go back and read a couple of pages in 'Inheritance for complete morons'. fml
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep that's all correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Except the self flagellation part, no need for that xD
* \note The convective residual methods include a call to this for each edge, | ||
* this allows convective and viscous loops to be "fused". |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a speed-up due to this? Or is this just for simplification?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fast code goes brrrrrrrrrrrrrrrrrrrr
template <class V, ENUM_REGIME R> | ||
void CFVMFlowSolverBase<V, R>::LoadRestart(CGeometry **geometry, CSolver ***solver, | ||
CConfig *config, int iter, bool update_geo) { | ||
LoadRestart_impl(geometry, solver, config, iter, update_geo); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain why having the LoadRestart_impl is beneficial? The LoadRestart moved from the Euler and IncEuler to FVMFlowSolverBase so that is good but why the additional wrapper function
@@ -149,7 +142,7 @@ class CIncEulerSolver : public CFVMFlowSolverBase<CIncEulerVariable, INCOMPRESSI | |||
* \brief Compute the pressure at the infinity. | |||
* \return Value of the pressure at the infinity. | |||
*/ | |||
inline CFluidModel* GetFluidModel(void) const final { return FluidModel;} | |||
inline CFluidModel* GetFluidModel(void) const final { return FluidModel[omp_get_thread_num()]; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wait, I guess (real) shared-memory is not viable for the FluidModel as it is filled and computes stuff on the fly so each thread gets its own?
} | ||
} | ||
} | ||
BaseClass::SetInitialCondition(geometry, solver_container, config, TimeIter); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is the BaseClass call I mean above
@@ -1057,14 +983,21 @@ void CIncEulerSolver::SetTime_Step(CGeometry *geometry, CSolver **solver_contain | |||
void CIncEulerSolver::Centered_Residual(CGeometry *geometry, CSolver **solver_container, CNumerics **numerics_container, | |||
CConfig *config, unsigned short iMesh, unsigned short iRKStep) { | |||
|
|||
CNumerics* numerics = numerics_container[CONV_TERM]; | |||
CNumerics* numerics = numerics_container[CONV_TERM + omp_get_thread_num()*MAX_TERMS]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here you need multiple numerics instances because in the OMP loop they all individually fill the container and evaluate their residual, right?
void CIncEulerSolver::Centered_Residual(CGeometry *geometry, CSolver **solver_container, CNumerics **numerics_container, | ||
CConfig *config, unsigned short iMesh, unsigned short iRKStep) { | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here it would be prob nice to reflect in the name that it contains the viscous residual.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah... but then I would need to change all the solvers, the vectorized residual loop is called EdgeFluxResidual to be more inclusive...
delete [] Restart_Data; Restart_Data = nullptr; | ||
LoadRestart_impl(geometry, solver, config, val_iter, val_update_geo, Solution, nVar_Restart); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@TobiKattmann to satisfy the quirkiness of the incompressible solver that needs a default value for temperature when there is no energy equation
Proposed Changes
Remove some more duplication, fix some issues, and make the incompressible solvers compatible with OpenMP.
Related Work
Fix #1175
PR Checklist