This repository has been archived by the owner on Sep 18, 2024. It is now read-only.

Commit

Support paiStorageConfigName (#2536)
SparkSnail authored Jun 12, 2020
1 parent 52f71f5 commit 8a60d62
Showing 23 changed files with 37 additions and 35 deletions.
21 changes: 11 additions & 10 deletions docs/en_US/TrainingService/PaiMode.md
@@ -16,7 +16,7 @@ Step 3. Mount NFS storage to local machine.
![](../../img/pai_job_submission_page.jpg)
Find the data management region in job submission page.
![](../../img/pai_data_management_page.jpg)
-The `DEFAULT_STORAGE`field is the path to be mounted in PAI's container when a job is started. The `Preview container paths` is the NFS host and path that PAI provided, you need to mount the corresponding host and path to your local machine first, then NNI could use the PAI's NFS storage.
+The `Preview container paths` is the NFS host and path that PAI provided, you need to mount the corresponding host and path to your local machine first, then NNI could use the PAI's NFS storage.
For example, use the following command:
```
sudo mount -t nfs4 gcr-openpai-infra02:/pai/data /local/mnt
@@ -25,13 +25,14 @@ Then the `/data` folder in container will be mounted to `/local/mnt` folder in y
You could use the following configuration in your NNI's config file:
```
nniManagerNFSMountPath: /local/mnt
containerNFSMountPath: /data
```

-Step 4. Get PAI's storage plugin name.
-Contact PAI's admin, and get the PAI's storage plugin name for NFS storage. The default storage name is `teamwise_storage`, the configuration in NNI's config file is in following value:
+Step 4. Get PAI's storage config name and nniManagerMountPath
+The `Team share storage` field is storage configuration used to specify storage value in PAI. You can get `paiStorageConfigName` and `containerNFSMountPath` field in `Team share storage`, for example:

```
-paiStoragePlugin: teamwise_storage
+paiStorageConfigName: confignfs-data
+containerNFSMountPath: /mnt/confignfs-data
```

## Run an experiment
@@ -66,7 +67,7 @@ trial:
virtualCluster: default
nniManagerNFSMountPath: /home/user/mnt
containerNFSMountPath: /mnt/data/user
-paiStoragePlugin: teamwise_storage
+paiStorageConfigName: confignfs-data
# Configuration to access OpenPAI Cluster
paiConfig:
userName: your_pai_nni_user
@@ -90,13 +91,13 @@ Compared with [LocalMode](LocalMode.md) and [RemoteMachineMod
* Required key. Set the mount path in your nniManager machine.
* containerNFSMountPath
* Required key. Set the mount path in your container used in PAI.
-* paiStoragePlugin
-* Optional key. Set the storage plugin name used in PAI. If it is not set in trial configuration, it should be set in the config file specified in `paiConfigPath` field.
+* paiStorageConfigName:
+* Optional key. Set the storage name used in PAI. If it is not set in trial configuration, it should be set in the config file specified in `paiConfigPath` field.
* command
* Optional key. Set the commands used in PAI container.
* paiConfigPath
* Optional key. Set the file path of pai job configuration, the file is in yaml format.
-If users set `paiConfigPath` in NNI's configuration file, no need to specify the fields `command`, `paiStoragePlugin`, `virtualCluster`, `image`, `memoryMB`, `cpuNum`, `gpuNum` in `trial` configuration. These fields will use the values from the config file specified by `paiConfigPath`.
+If users set `paiConfigPath` in NNI's configuration file, no need to specify the fields `command`, `paiStorageConfigName`, `virtualCluster`, `image`, `memoryMB`, `cpuNum`, `gpuNum` in `trial` configuration. These fields will use the values from the config file specified by `paiConfigPath`.
```
Note:
1. The job name in PAI's configuration file will be replaced by a new job name, the new job name is created by NNI, the name format is nni_exp_${this.experimentId}_trial_${trialJobId}.
@@ -127,7 +128,7 @@ And you will be redirected to HDFS web portal to browse the output files of that
You can see there're three fils in output folder: stderr, stdout, and trial.log
## data management
-Befour using NNI to start your experiment, users should set the corresponding mount data path in your nniManager machine. PAI has their own storage(NFS, AzureBlob ...), and the storage will used in PAI will be mounted to the container when it start a job. Users should set the PAI storage type by `paiStoragePlugin` field to choose a storage in PAI. Then users should mount the storage to their nniManager machine, and set the `nniManagerNFSMountPath` field in configuration file, NNI will generate bash files and copy data in `codeDir` to the `nniManagerNFSMountPath` folder, then NNI will start a trial job. The data in `nniManagerNFSMountPath` will be sync to PAI storage, and will be mounted to PAI's container. The data path in container is set in `containerNFSMountPath`, NNI will enter this folder first, and then run scripts to start a trial job.
+Before using NNI to start your experiment, users should set the corresponding mount data path in your nniManager machine. PAI has their own storage(NFS, AzureBlob ...), and the storage will used in PAI will be mounted to the container when it start a job. Users should set the PAI storage type by `paiStorageConfigName` field to choose a storage in PAI. Then users should mount the storage to their nniManager machine, and set the `nniManagerNFSMountPath` field in configuration file, NNI will generate bash files and copy data in `codeDir` to the `nniManagerNFSMountPath` folder, then NNI will start a trial job. The data in `nniManagerNFSMountPath` will be sync to PAI storage, and will be mounted to PAI's container. The data path in container is set in `containerNFSMountPath`, NNI will enter this folder first, and then run scripts to start a trial job.
## version check
NNI support version check feature in since version 0.6. It is a policy to insure the version of NNIManager is consistent with trialKeeper, and avoid errors caused by version incompatibility.
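The `data management` paragraph in this hunk describes one storage seen from two sides: files that NNI writes under `nniManagerNFSMountPath` on the nniManager machine show up under `containerNFSMountPath` inside the PAI container. A minimal sketch of that path translation, assuming both mount points expose the same root of the storage selected by `paiStorageConfigName`; `to_container_path` is a hypothetical helper, not part of NNI, and the `exp1/trial_0` layout in the usage line is invented for illustration:

```python
import posixpath


def to_container_path(local_path: str, nni_manager_mount: str, container_mount: str) -> str:
    """Map a file under nniManagerNFSMountPath to its path inside the PAI container.

    Hypothetical helper: assumes both mount points expose the same storage root.
    """
    local = posixpath.normpath(local_path)
    root = posixpath.normpath(nni_manager_mount)
    # commonpath compares whole path components, so /home/user/mnt2 does not
    # count as being inside /home/user/mnt.
    if posixpath.commonpath([local, root]) != root:
        raise ValueError(f"{local_path!r} is not under {nni_manager_mount!r}")
    relative = posixpath.relpath(local, root)
    target = posixpath.normpath(container_mount)
    # relpath returns "." when local_path is the mount point itself.
    return target if relative == "." else posixpath.join(target, relative)


# Mount paths from the example trial configuration; the trial layout is made up.
print(to_container_path("/home/user/mnt/exp1/trial_0/run.sh", "/home/user/mnt", "/mnt/data/user"))
```

Comparing path components with `posixpath.commonpath`, rather than doing a plain string prefix check, keeps a sibling directory such as `/home/user/mnt2` from being treated as part of the mount.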
Binary file modified docs/img/pai_data_management_page.jpg
2 changes: 1 addition & 1 deletion examples/trials/auto-gbdt/config_pai.yml
@@ -25,7 +25,7 @@ trial:
image: msranni/nni:latest
nniManagerNFSMountPath: /home/user/mnt
containerNFSMountPath: /mnt/data/user
-paiStoragePlugin: team_wise
+paiStorageConfigName: confignfs-data
paiConfig:
#The username to login pai
userName: username
2 changes: 1 addition & 1 deletion examples/trials/cifar10_pytorch/config_pai.yml
@@ -25,7 +25,7 @@ trial:
image: msranni/nni:latest
nniManagerNFSMountPath: /home/user/mnt
containerNFSMountPath: /mnt/data/user
-paiStoragePlugin: team_wise
+paiStorageConfigName: confignfs-data
paiConfig:
#The username to login pai
userName: username
2 changes: 1 addition & 1 deletion examples/trials/efficientnet/config_pai.yml
@@ -23,7 +23,7 @@ trial:
image: msranni/nni:latest
nniManagerNFSMountPath: /home/user/mnt
containerNFSMountPath: /mnt/data/user
-paiStoragePlugin: team_wise
+paiStorageConfigName: confignfs-data
nniManagerIp: <nni_manager_ip>
paiConfig:
userName: <username>
2 changes: 1 addition & 1 deletion examples/trials/ga_squad/config_pai.yml
@@ -25,7 +25,7 @@ trial:
image: msranni/nni:latest
nniManagerNFSMountPath: /home/user/mnt
containerNFSMountPath: /mnt/data/user
-paiStoragePlugin: team_wise
+paiStorageConfigName: confignfs-data
paiConfig:
#The username to login pai
userName: username
2 changes: 1 addition & 1 deletion examples/trials/mnist-advisor/config_pai.yml
@@ -29,7 +29,7 @@ trial:
image: msranni/nni:latest
nniManagerNFSMountPath: /home/user/mnt
containerNFSMountPath: /mnt/data/user
-paiStoragePlugin: team_wise
+paiStorageConfigName: confignfs-data
paiConfig:
#The username to login pai
userName: username
2 changes: 1 addition & 1 deletion examples/trials/mnist-annotation/config_pai.yml
@@ -24,7 +24,7 @@ trial:
image: msranni/nni:latest
nniManagerNFSMountPath: /home/user/mnt
containerNFSMountPath: /mnt/data/user
-paiStoragePlugin: team_wise
+paiStorageConfigName: confignfs-data
paiConfig:
#The username to login pai
userName: username
2 changes: 1 addition & 1 deletion examples/trials/mnist-batch-tune-keras/config_pai.yml
@@ -22,7 +22,7 @@ trial:
image: msranni/nni:latest
nniManagerNFSMountPath: /home/user/mnt
containerNFSMountPath: /mnt/data/user
-paiStoragePlugin: team_wise
+paiStorageConfigName: confignfs-data
paiConfig:
#The username to login pai
userName: username
2 changes: 1 addition & 1 deletion examples/trials/mnist-keras/config_pai.yml
@@ -25,7 +25,7 @@ trial:
image: msranni/nni:latest
nniManagerNFSMountPath: /home/user/mnt
containerNFSMountPath: /mnt/data/user
-paiStoragePlugin: team_wise
+paiStorageConfigName: confignfs-data
paiConfig:
#The username to login pai
userName: username
2 changes: 1 addition & 1 deletion examples/trials/mnist-pytorch/config_pai.yml
@@ -25,7 +25,7 @@ trial:
image: msranni/nni:latest
nniManagerNFSMountPath: /home/user/mnt
containerNFSMountPath: /mnt/data/user
-paiStoragePlugin: team_wise
+paiStorageConfigName: confignfs-data
paiConfig:
#The username to login pai
userName: username
2 changes: 1 addition & 1 deletion examples/trials/mnist-tfv1/config_pai.yml
@@ -25,7 +25,7 @@ trial:
image: msranni/nni:latest
nniManagerNFSMountPath: /home/user/mnt
containerNFSMountPath: /mnt/data/user
-paiStoragePlugin: team_wise
+paiStorageConfigName: confignfs-data
paiConfig:
#The username to login pai
userName: username
@@ -32,7 +32,7 @@ trial:
image: msranni/nni:latest
nniManagerNFSMountPath: /home/user/mnt
containerNFSMountPath: /mnt/data/user
-paiStoragePlugin: team_wise
+paiStorageConfigName: confignfs-data
paiConfig:
#The username to login pai
userName: username
2 changes: 1 addition & 1 deletion examples/trials/network_morphism/cifar10/config_pai.yml
@@ -32,7 +32,7 @@ trial:
image: msranni/nni:latest
nniManagerNFSMountPath: /home/user/mnt
containerNFSMountPath: /mnt/data/user
-paiStoragePlugin: team_wise
+paiStorageConfigName: confignfs-data
paiConfig:
#The username to login pai
userName: username
2 changes: 1 addition & 1 deletion examples/trials/sklearn/classification/config_pai.yml
@@ -25,7 +25,7 @@ trial:
image: msranni/nni:latest
nniManagerNFSMountPath: /home/user/mnt
containerNFSMountPath: /mnt/data/user
-paiStoragePlugin: team_wise
+paiStorageConfigName: confignfs-data
paiConfig:
#The username to login pai
userName: username
2 changes: 1 addition & 1 deletion examples/trials/sklearn/regression/config_pai.yml
@@ -25,7 +25,7 @@ trial:
image: msranni/nni:latest
nniManagerNFSMountPath: /home/user/mnt
containerNFSMountPath: /mnt/data/user
-paiStoragePlugin: team_wise
+paiStorageConfigName: confignfs-data
paiConfig:
#The username to login pai
userName: username
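Each example config above makes the same one-line change in its `trial` section: `paiStoragePlugin` is dropped and `paiStorageConfigName` is added. The value changes as well (`team_wise` becomes `confignfs-data`), because the new key names a storage config from PAI's `Team share storage` rather than a plugin, so an existing config cannot be fixed by renaming the key alone. A minimal sketch of updating a trial section already loaded into a Python dict, assuming the caller looked up the correct config name in PAI; `migrate_trial_config` is hypothetical:

```python
from typing import Any, Dict


def migrate_trial_config(trial: Dict[str, Any], storage_config_name: str) -> Dict[str, Any]:
    """Return a copy of a PAI trial section that uses paiStorageConfigName.

    Hypothetical helper: the old paiStoragePlugin value is discarded, because it
    names a storage plugin, not a storage config, and cannot be reused as-is.
    """
    if not storage_config_name:
        raise ValueError("storage_config_name must be a non-empty string")
    migrated = dict(trial)  # shallow copy, the caller's dict is left untouched
    migrated.pop("paiStoragePlugin", None)
    migrated["paiStorageConfigName"] = storage_config_name
    return migrated


old_trial = {
    "image": "msranni/nni:latest",
    "nniManagerNFSMountPath": "/home/user/mnt",
    "containerNFSMountPath": "/mnt/data/user",
    "paiStoragePlugin": "team_wise",
}
new_trial = migrate_trial_config(old_trial, "confignfs-data")
print(new_trial["paiStorageConfigName"])
```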
2 changes: 1 addition & 1 deletion src/nni_manager/rest_server/restValidationSchemas.ts
@@ -39,7 +39,7 @@ export namespace ValidationSchemas {
nniManagerNFSMountPath: joi.string().min(1),
containerNFSMountPath: joi.string().min(1),
paiConfigPath: joi.string(),
-paiStoragePlugin: joi.string().min(1),
+paiStorageConfigName: joi.string().min(1),
nasMode: joi.string().valid('classic_mode', 'enas_mode', 'oneshot_mode', 'darts_mode'),
portList: joi.array().items(joi.object({
label: joi.string().required(),
6 changes: 3 additions & 3 deletions src/nni_manager/training_service/pai/paiK8S/paiK8SConfig.ts
@@ -30,20 +30,20 @@ export class NNIPAIK8STrialConfig extends TrialConfig {
public virtualCluster?: string;
public readonly nniManagerNFSMountPath: string;
public readonly containerNFSMountPath: string;
-public readonly paiStoragePlugin: string;
+public readonly paiStorageConfigName: string;
public readonly paiConfigPath?: string;

constructor(command: string, codeDir: string, gpuNum: number, cpuNum: number, memoryMB: number,
image: string, nniManagerNFSMountPath: string, containerNFSMountPath: string,
-paiStoragePlugin: string, virtualCluster?: string, paiConfigPath?: string) {
+paiStorageConfigName: string, virtualCluster?: string, paiConfigPath?: string) {
super(command, codeDir, gpuNum);
this.cpuNum = cpuNum;
this.memoryMB = memoryMB;
this.image = image;
this.virtualCluster = virtualCluster;
this.nniManagerNFSMountPath = nniManagerNFSMountPath;
this.containerNFSMountPath = containerNFSMountPath;
-this.paiStoragePlugin = paiStoragePlugin;
+this.paiStorageConfigName = paiStorageConfigName;
this.paiConfigPath = paiConfigPath;
}
}
@@ -233,9 +233,9 @@ class PAIK8STrainingService extends PAITrainingService {
}
},
extras: {
-'com.microsoft.pai.runtimeplugin': [
+'storages': [
{
-plugin: this.paiTrialConfig.paiStoragePlugin
+name: this.paiTrialConfig.paiStorageConfigName
}
],
submitFrom: 'submit-job-v2'
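After this hunk, the job's `extras` no longer lists a `com.microsoft.pai.runtimeplugin` entry; it references the storage by config name under `storages`. The part of `extras` visible in this hunk is mirrored below as a Python dict, purely for illustration: the TypeScript above is what NNI actually submits, `build_extras` is hypothetical, and any `extras` keys outside the shown lines are not reproduced.

```python
import json
from typing import Any, Dict


def build_extras(storage_config_name: str) -> Dict[str, Any]:
    """Python mirror of the `extras` fields visible in this hunk (sketch only)."""
    return {
        # The storage is referenced by config name instead of a runtime plugin.
        "storages": [{"name": storage_config_name}],
        "submitFrom": "submit-job-v2",
    }


print(json.dumps(build_extras("confignfs-data"), indent=2))
```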
2 changes: 1 addition & 1 deletion test/config/training_service.yml
@@ -92,7 +92,7 @@ pai:
memoryMB: 8192
nniManagerNFSMountPath:
containerNFSMountPath:
-paiStoragePlugin:
+paiStorageConfigName:
remote:
machineList:
- ip:
5 changes: 3 additions & 2 deletions test/nni_test/nnitest/generate_ts_config.py
@@ -41,8 +41,8 @@ def update_training_service_config(args):
config[args.ts]['trial']['nniManagerNFSMountPath'] = args.nni_manager_nfs_mount_path
if args.container_nfs_mount_path is not None:
config[args.ts]['trial']['containerNFSMountPath'] = args.container_nfs_mount_path
-if args.pai_storage_plugin is not None:
-config[args.ts]['trial']['paiStoragePlugin'] = args.pai_storage_plugin
+if args.pai_storage_config_name is not None:
+config[args.ts]['trial']['paiStorageConfigName'] = args.pai_storage_config_name
if args.vc is not None:
config[args.ts]['trial']['virtualCluster'] = args.vc
elif args.ts == 'kubeflow':
@@ -102,6 +102,7 @@ def update_training_service_config(args):
parser.add_argument("--vc", type=str)
parser.add_argument("--pai_token", type=str)
parser.add_argument("--pai_storage_plugin", type=str)
+parser.add_argument("--pai_storage_config_name", type=str)
parser.add_argument("--nni_manager_nfs_mount_path", type=str)
parser.add_argument("--container_nfs_mount_path", type=str)
# args for kubeflow and frameworkController
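With these hunks the test harness reads the storage config name from the new `--pai_storage_config_name` flag and writes it into the `pai` trial section; `--pai_storage_plugin` is still declared but, in the lines shown, no longer copied into the config. A self-contained sketch of that flow reduced to the one flag; `update_pai_trial` is hypothetical, and the nested dict is a simplified stand-in for the loaded `test/config/training_service.yml` (an empty YAML value such as `paiStorageConfigName:` loads as `None`):

```python
import argparse
from typing import Any, Dict, List


def update_pai_trial(config: Dict[str, Any], argv: List[str]) -> Dict[str, Any]:
    """Simplified sketch of how the harness fills paiStorageConfigName."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--pai_storage_config_name", type=str)
    args = parser.parse_args(argv)
    # Only overwrite the key when the flag was actually passed.
    if args.pai_storage_config_name is not None:
        config["pai"]["trial"]["paiStorageConfigName"] = args.pai_storage_config_name
    return config


config = {"pai": {"trial": {"paiStorageConfigName": None}}}
update_pai_trial(config, ["--pai_storage_config_name", "confignfs-data"])
print(config["pai"]["trial"]["paiStorageConfigName"])
```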
2 changes: 1 addition & 1 deletion tools/nni_cmd/config_schema.py
@@ -292,7 +292,7 @@ def setPathCheck(key):
Optional('memoryMB'): setType('memoryMB', int),
Optional('image'): setType('image', str),
Optional('virtualCluster'): setType('virtualCluster', str),
-Optional('paiStoragePlugin'): setType('paiStoragePlugin', str),
+Optional('paiStorageConfigName'): setType('paiStorageConfigName', str),
Optional('paiConfigPath'): And(os.path.exists, error=SCHEMA_PATH_ERROR % 'paiConfigPath')
}
}
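The new key is validated in two places: `config_schema.py` only checks that `paiStorageConfigName`, when present, is a string, while the REST schema in `restValidationSchemas.ts` declares it as `joi.string().min(1)`, so it must have at least one character there. A stdlib-only sketch that combines both checks; `check_optional_str` is hypothetical and does not reproduce the `Optional`/`setType` helpers used in `config_schema.py`:

```python
from typing import Any, Dict, List


def check_optional_str(trial: Dict[str, Any], key: str) -> List[str]:
    """Return error messages for an optional field that must be a non-empty string."""
    if key not in trial:
        return []  # optional: absence is not an error
    value = trial[key]
    if not isinstance(value, str):
        return [f"{key} should be str, got {type(value).__name__}"]
    if len(value) < 1:
        # Same constraint as joi.string().min(1) on the REST side.
        return [f"{key} must not be empty"]
    return []


print(check_optional_str({"paiStorageConfigName": "confignfs-data"}, "paiStorageConfigName"))
print(check_optional_str({"paiStorageConfigName": 3}, "paiStorageConfigName"))
```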
2 changes: 1 addition & 1 deletion tools/nni_cmd/launcher_utils.py
@@ -273,7 +273,7 @@ def validate_pai_config_path(experiment_config):
print_error('Please set taskRoles in paiConfigPath config file!')
exit(1)
else:
-pai_trial_fields_required_list = ['image', 'gpuNum', 'cpuNum', 'memoryMB', 'paiStoragePlugin', 'command']
+pai_trial_fields_required_list = ['image', 'gpuNum', 'cpuNum', 'memoryMB', 'paiStorageConfigName', 'command']
for trial_field in pai_trial_fields_required_list:
if experiment_config['trial'].get(trial_field) is None:
print_error('Please set {0} in trial configuration,\
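The `else` branch in this hunk appears to be the path taken when `paiConfigPath` is not set: every field in `pai_trial_fields_required_list`, now including `paiStorageConfigName` instead of `paiStoragePlugin`, must then be present in `trial`. This matches the PaiMode.md rule that those fields may be omitted only when `paiConfigPath` supplies them. A simplified sketch that returns the missing fields instead of printing an error and exiting; `missing_pai_trial_fields` is hypothetical, and it skips the `taskRoles` check the real function performs on the `paiConfigPath` file:

```python
from typing import Any, Dict, List

# Field list copied from the updated pai_trial_fields_required_list.
PAI_TRIAL_FIELDS_REQUIRED = ["image", "gpuNum", "cpuNum", "memoryMB", "paiStorageConfigName", "command"]


def missing_pai_trial_fields(experiment_config: Dict[str, Any]) -> List[str]:
    """Return required trial fields that are unset when paiConfigPath is absent."""
    trial = experiment_config.get("trial", {})
    if trial.get("paiConfigPath"):
        return []  # the fields are expected to come from the PAI job config file
    # `is None` mirrors the original check, so gpuNum: 0 counts as set.
    return [field for field in PAI_TRIAL_FIELDS_REQUIRED if trial.get(field) is None]


# A pre-commit style trial section that still uses paiStoragePlugin.
old_config = {"trial": {"image": "msranni/nni:latest", "gpuNum": 0, "cpuNum": 1,
                        "memoryMB": 8192, "command": "python3 mnist.py",
                        "paiStoragePlugin": "team_wise"}}
print(missing_pai_trial_fields(old_config))
```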