Merge pull request E3SM-Project#5 from philipwjones/omega/mach-env-de…
…sign Added requirements and design for tracking and initializing machine and environment vars
<!--- OMEGA Machine Environment Requirements and Design ----------------------->

# OMEGA Requirements and Design:

# *MachineEnv*

## 1 Overview

On startup, OMEGA will need to initialize the message-passing and other
environments and set up parameters related to machine layout, messaging,
and hardware for use throughout OMEGA.

## 2 Requirements

### 2.1 Requirement: Initialize MPI in standalone mode

In standalone mode, OMEGA will need to initialize the MPI environment
and define a default communicator.

### 2.2 Requirement: Create MPI communicator in coupled mode

When running as part of a coupled system, OMEGA will need to define a default
communicator based on a parent communicator sent by the calling routine
(coupler or parent model).

### 2.3 Requirement: Define MPI layouts

Each MPI rank will need to know its own rank ID and the total number of
ranks, and a master rank must be defined.

### 2.4 Requirement: YAKL initialization

Since we are using YAKL for kernel launching and array types, we
will need to initialize YAKL in standalone mode. It may also be
needed for coupled simulations.

### 2.5 Requirement: Vector blocking size defined at compile time

To achieve the best CPU performance, especially when using
GPU-friendly loop forms, it is useful to explicitly size inner
loops based on a compile-time length (chunk size) that is a multiple
of the vector length.

### 2.6 Desired: Set alternative master task

While rank 0 is typically used as the master task, it is sometimes desirable
to assign a different rank as master to avoid overloading rank 0, especially
when running in coupled mode with other components also using the same rank.

### 2.7 Desired: Multiple environments

In OMEGA, we may wish to run sub-components on different partitions. For
example, we might want to rearrange the communication-dominated
barotropic solve to run on fewer nodes or within a single node. We will
need to be able to create new environments based on a subset of an
existing environment. In setting up multiple environments, it will
be desirable to have some awareness of network topology for optimal
task placement.

### 2.8 Desired: Define threading parameters

If OpenMP threading is enabled for CPU, it may be useful to also define
similar master threads and thread layouts to enable some task parallelism
within an MPI rank.

### 2.9 Desired: Define other machine parameters

In the future, it may be useful to define other machine parameters, like
the specific configuration of cores, accelerators and other devices, to
manage task assignments on hybrid nodes. It should be easy to modify
this class to add additional information.

## 3 Algorithmic Formulation

No algorithms are needed beyond what is provided by the standard MPI and
OpenMP libraries.

## 4 Design

In general, this is a simple class to hold information for later
retrieval.

### 4.1 Data types and parameters

#### 4.1.1 Parameters

To improve vector performance on CPU, and possibly to match thread block
sizes on GPU, we wish to block the inner loops with a compile-time
parameter. We define a CPP (preprocessor) parameter: `OMEGA_VECTOR_SIZE`.
Its value depends on the machine and on whether GPU acceleration is enabled.
It will typically take values like 16, 32 or 64 for CPU-only builds
and 1 for GPU builds to maximize parallelism.
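
As an illustration of how `OMEGA_VECTOR_SIZE` might be used, the following sketch blocks a simple loop into chunks of the compile-time length, with a remainder loop for a final partial chunk. The kernel `axpyBlocked` and the fallback default of 16 are hypothetical and exist only so the sketch compiles on its own; in a real build the value would come from the build system (e.g. `-D OMEGA_VECTOR_SIZE=16`), and OMEGA kernels would presumably operate on YAKL arrays rather than `std::vector`.

```cpp
#include <vector>

// Assumption: normally supplied by the build (e.g. -D OMEGA_VECTOR_SIZE=16);
// this fallback only exists so the sketch compiles standalone.
#ifndef OMEGA_VECTOR_SIZE
#define OMEGA_VECTOR_SIZE 16
#endif

constexpr int VecLength = OMEGA_VECTOR_SIZE;
static_assert(VecLength > 0, "OMEGA_VECTOR_SIZE must be positive");

// Hypothetical kernel: a[i] += s * b[i] for all i (b must be at least as
// long as a). The inner loop has a compile-time trip count (VecLength),
// which makes it easy for the compiler to vectorize on CPU. With
// VecLength = 1 (GPU builds) the inner loop is trivial and all the
// parallelism is in the outer loop.
void axpyBlocked(std::vector<double> &a, const std::vector<double> &b,
                 double s) {
   const int n     = static_cast<int>(a.size());
   const int nFull = (n / VecLength) * VecLength; // end of last full chunk

   for (int iStart = 0; iStart < nFull; iStart += VecLength) {
      for (int iv = 0; iv < VecLength; ++iv) {
         a[iStart + iv] += s * b[iStart + iv];
      }
   }

   // Remainder loop for a final partial chunk (n not a multiple of VecLength)
   for (int i = nFull; i < n; ++i) {
      a[i] += s * b[i];
   }
}
```

A remainder loop is one option; padding arrays to a multiple of `OMEGA_VECTOR_SIZE` is another that would avoid it.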

#### 4.1.2 Class/structs/data types

There will be a simple class `MachEnv`. We use a class here rather than
a struct so that we can make members private to prevent overwriting
these variables.

    class MachEnv {

       private:
          int mComm;       ///< MPI communicator for this environment
          int mMyRank;     ///< rank ID for local MPI rank
          int mNumRanks;   ///< total number of MPI ranks
          int mMasterRank; ///< rank ID for master rank
          bool mIsMaster;  ///< true if the local rank is the master

       public:

          // Methods
          // [define methods here - see Section 4.2 below]
    };

#### 4.1.3 Default environment

We will keep a default environment `OMEGA::defaultEnv` as a public
static instantiation that will be used by most of the OMEGA infrastructure.
If other environments are created, they must be maintained by the
defining routines or sub-components.

### 4.2 Methods

#### 4.2.1 Initialization

There will be two forms of the initialization routine. One of
these two must be called as early as possible in OMEGA initialization
(typically as the first call). Both forms will return an integer
error code and will define the default environment `OMEGA::defaultEnv`.

    // Initialization - standalone
    int MachEnvInit();

    // Initialization - coupled
    int MachEnvInit(int inCommunicator ///< [in] parent MPI communicator
                    );

#### 4.2.2 Constructors

We provide several constructors for creating instances of the environment.
These will be used primarily by the above initialization routines, though
they can also be used to create additional environments per requirement 2.7.

    // Generic constructor that uses MPI_COMM_WORLD
    MachEnv();

    // Constructor that uses an assigned communicator (e.g. from coupler)
    MachEnv(const int inCommunicator ///< [in] parent MPI communicator
    );

    // Create a new environment from a contiguous subset of an
    // existing environment
    MachEnv(const int inCommunicator, ///< [in] parent MPI communicator
            const int newSize         ///< [in] use first newSize ranks
    );

    // Create a new environment from a strided subset of an
    // existing environment
    MachEnv(const int inCommunicator, ///< [in] parent MPI communicator
            const int newSize,        ///< [in] num ranks in new env
            const int begin,          ///< [in] starting parent rank
            const int stride          ///< [in] stride for ranks to incl
    );

    // Create a new environment from a custom subset of an
    // existing environment, supplying list of parent ranks to include
    MachEnv(const int inCommunicator, ///< [in] parent MPI communicator
            const int newSize,        ///< [in] num ranks in new env
            const int ranks[]         ///< [in] array of parent ranks to incl
    );

In standalone mode, the simple constructor would be used:

    MachEnv omegaEnv; // Initialize MPI and machine environment

and would define `MPI_COMM_WORLD` as the communicator as well
as initialize various other environments as needed. In coupled mode,
the second form would be used with the coupler passing the
appropriate communicator to be used.

To satisfy requirement 2.7, we provide three forms to create a new
environment based on subsets of the old. The first simply creates
a subset from the first newSize ranks of the parent. The second
creates a strided subset (every "n" ranks starting from a specified
beginning rank). The third is the most general and creates a subset from
a list of specific ranks of the parent to include in the new environment.
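
One way to keep the three subset constructors consistent is to have the contiguous and strided forms expand their arguments into an explicit list of parent ranks and then share the code path of the general (custom-list) form. The sketch below shows only that expansion step. The helper names `stridedRanks` and `contiguousRanks` are hypothetical, and returning an empty list for a request that does not fit in the parent is an assumed error convention. In an MPI build, the resulting list could be passed to `MPI_Group_incl` on the parent's group (obtained with `MPI_Comm_group`), followed by `MPI_Comm_create`; those calls are omitted so the sketch runs without MPI.

```cpp
#include <vector>

// Hypothetical helper: parent ranks for a strided subset, i.e.
// begin, begin+stride, ..., begin+(newSize-1)*stride.
// Returns an empty list if the request does not fit in the parent.
std::vector<int> stridedRanks(int parentSize, int newSize, int begin,
                              int stride) {
   if (newSize <= 0 || begin < 0 || stride <= 0 ||
       begin + (newSize - 1) * stride >= parentSize) {
      return {};
   }
   std::vector<int> ranks(newSize);
   for (int i = 0; i < newSize; ++i) {
      ranks[i] = begin + i * stride;
   }
   return ranks;
}

// Hypothetical helper: the contiguous subset is the special case
// begin = 0, stride = 1 (the first newSize parent ranks).
std::vector<int> contiguousRanks(int parentSize, int newSize) {
   return stridedRanks(parentSize, newSize, 0, 1);
}
```

With this approach only the custom-list constructor would need to talk to MPI directly, so the subset logic itself can be unit tested without an MPI job.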

#### 4.2.3 Get/Retrieval

We provide specific retrieval functions for each class member to mimic
using this environment as a struct:

    int Comm() const;       ///< returns MPI communicator for this environment
    int MyRank() const;     ///< returns local MPI rank
    int NumRanks() const;   ///< returns total number of MPI ranks
    int MasterRank() const; ///< returns rank ID of the master rank
    bool IsMaster() const;  ///< returns true if the local rank is the master

In typical use, these would look like:

    int myRank = OMEGA::defaultEnv.MyRank(); // retrieve local rank id
    if (OMEGA::defaultEnv.IsMaster()) {
       // do stuff on master rank
    }

#### 4.2.4 Change master task

While most of the members should not be modified (they are read-only through
the retrieval functions above), we will need a single set function to satisfy
requirement 2.6.

    int SetMaster(const int newMasterRank);

Note that this should occur as soon as possible after the environment is
created. Resetting the master after other activities have already assumed
the default master rank could leave variables defined on the wrong rank
and undefined on the new master rank. The integer return value is a
success/fail return code.
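
A minimal sketch of how `SetMaster` might validate its argument and keep the master flag consistent with the new master rank. To run without MPI, it uses a hypothetical stand-in class `RankLayout` that takes the local rank and rank count as constructor arguments (the real `MachEnv` would obtain them from its communicator), and it assumes a return convention of 0 for success and 1 for an out-of-range rank. Because each rank sets its own master flag locally, every rank in the environment would need to call `SetMaster` with the same value.

```cpp
// Hypothetical, MPI-free stand-in for the rank-layout part of MachEnv,
// used only to illustrate SetMaster.
class RankLayout {
 private:
   int mMyRank;         // rank ID for local rank
   int mNumRanks;       // total number of ranks
   int mMasterRank = 0; // rank 0 is the default master
   bool mIsMaster;      // true if the local rank is the master

 public:
   RankLayout(int myRank, int numRanks)
       : mMyRank(myRank), mNumRanks(numRanks), mIsMaster(myRank == 0) {}

   int MyRank() const { return mMyRank; }
   int NumRanks() const { return mNumRanks; }
   int MasterRank() const { return mMasterRank; }
   bool IsMaster() const { return mIsMaster; }

   // Assumed convention: returns 0 on success, 1 if newMasterRank is not
   // a valid rank in this environment (the old master is kept on failure).
   int SetMaster(const int newMasterRank) {
      if (newMasterRank < 0 || newMasterRank >= mNumRanks) {
         return 1;
      }
      mMasterRank = newMasterRank;
      mIsMaster   = (mMyRank == newMasterRank); // keep flag consistent
      return 0;
   }
};
```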

## 5 Verification and Testing

We will test this with a simple test driver in an 8-rank
MPI configuration. Requirement 2.4 (YAKL initialization) will
need to be verified by visual inspection, but later tests using
YAKL functions will determine whether this has been successful.

### 5.1 Test standalone initialization

The test driver will call the standalone initialization, then
verify by retrieving the members and comparing to the equivalent
native MPI calls on `MPI_COMM_WORLD`.
* tests requirements 2.1 and 2.3 and the retrieval functions

### 5.2 Test changing master task

After the above test, use the set function to change the master
task to 1 and then verify using the retrieval functions.
* tests requirement 2.6

### 5.3 Test multiple environments

Create three new environments using each of the three subset
constructors: the first using the first 4 ranks, the second using
every other rank (stride 2), and the third using ranks 1, 2, 5 and 7.
Verify that the members are as expected when compared to the parent
environment.
* tests requirement 2.7
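
For the subset tests, the expected members follow from the MPI rule that the process with rank `i` in a group built by `MPI_Group_incl` is the process with rank `ranks[i]` in the parent group. The hypothetical helper `expectedSubsetRank` computes the rank that a given parent rank should report in the new environment, or -1 if that rank is not a member (with `MPI_Comm_create`, non-members receive `MPI_COMM_NULL`); the expected number of ranks is the length of the list. This assumes the subset constructors pass the rank list to `MPI_Group_incl` in the order given.

```cpp
#include <vector>

// Hypothetical test helper: expected rank in the subset environment for a
// given parent rank, i.e. its position in the list of included parent
// ranks, or -1 if the parent rank is not included in the subset.
int expectedSubsetRank(int parentRank, const std::vector<int> &ranks) {
   for (int i = 0; i < static_cast<int>(ranks.size()); ++i) {
      if (ranks[i] == parentRank) {
         return i;
      }
   }
   return -1;
}
```

For example, with ranks 1, 2, 5 and 7, parent rank 5 should report rank 2 in a 4-rank environment, while parent ranks 0, 3, 4 and 6 are not members.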

### 5.4 Initialization in coupled mode

While we can't test this directly in a standalone test driver,
we can test the underlying constructor by passing one of the
subset communicators as the input and verifying that the resulting
environment is the same as the subset environment.
* tests requirement 2.2 (mostly)

### 5.5 Test vector blocking factor

Within the test driver, set an int variable to `OMEGA_VECTOR_SIZE`,
and build the test with `-D OMEGA_VECTOR_SIZE=16`. Verify that the
internal variable is also 16 to test that the preprocessor properly
propagates the value internally.
* tests requirement 2.5