This repository has been archived by the owner on Sep 18, 2024. It is now read-only.

Add version check document in PAI, remote, kubeflow and frameworkcontroller (#947)
SparkSnail authored Apr 2, 2019
1 parent c49c24c commit 29a2333
Showing 6 changed files with 27 additions and 4 deletions.
5 changes: 4 additions & 1 deletion docs/en_US/FrameworkControllerMode.md
@@ -97,4 +97,7 @@ Trial configuration in frameworkcontroller mode have the following configuration
* frameworkAttemptCompletionPolicy: the completion policy for a framework attempt; please refer to the [user-manual](https://github.com/Microsoft/frameworkcontroller/blob/master/doc/user-manual.md#frameworkattemptcompletionpolicy) for the specific information. Users can use this policy to control the pods: for example, if the workers stop but the ps does not, the completion policy can help stop the ps.
## How to run example
After you prepare a config file, you can run your experiment with nnictl. Starting an experiment in frameworkcontroller mode is similar to kubeflow mode; please refer to the [document](./KubeflowMode.md) for more information.
## Version check
NNI has supported the version check feature since version 0.6; please [refer](PAIMode.md) to the PAI mode document for details.
3 changes: 3 additions & 0 deletions docs/en_US/KubeflowMode.md
@@ -196,4 +196,7 @@ Notice: In kubeflow mode, NNIManager will start a rest server and listen on a po
Once a trial job is completed, you can go to the NNI WebUI's overview page (like http://localhost:8080/oview) to check the trial's information.
## Version check
NNI has supported the version check feature since version 0.6; please [refer](PAIMode.md) to the PAI mode document for details.
If you run into any problems when using NNI in kubeflow mode, please create an issue on the [NNI GitHub repo](https://github.com/Microsoft/nni).
10 changes: 10 additions & 0 deletions docs/en_US/PAIMode.md
@@ -83,3 +83,13 @@ You can see there're three fils in output folder: stderr, stdout, and trial.log
If you also want to save other trial output to HDFS, such as model files, you can use the environment variable `NNI_OUTPUT_DIR` in your trial code to save your own output files; the NNI SDK will copy all the files in `NNI_OUTPUT_DIR` from the trial's container to HDFS.
If you run into any problems when using NNI in pai mode, please create an issue on the [NNI GitHub repo](https://github.com/Microsoft/nni).
## Version check
NNI has supported the version check feature since version 0.6. It is a policy to ensure that the version of NNIManager is consistent with the version of trialKeeper, and to avoid errors caused by version incompatibility.
Check policy:
1. NNIManager before v0.6 can run any version of trialKeeper; trialKeeper supports backward compatibility.
2. Since version 0.6, the NNIManager version must be the same as the trialKeeper version. For example, if the NNIManager version is 0.6, the trialKeeper version should be 0.6 too.
3. Note that the version check only compares the first two digits of the version. For example, NNIManager v0.6.1 can use trialKeeper v0.6 or trialKeeper v0.6.2, but cannot use trialKeeper v0.5.1 or trialKeeper v0.7.

If you cannot run your experiment and want to know whether the version check is the cause, check your WebUI; it will show an error message about the version check.
![](../img/version_check.png)
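
The check policy described in this section can be sketched in TypeScript as follows. This is only an illustration under stated assumptions, not NNI's actual implementation: the helper names `majorMinor` and `isVersionCompatible` are hypothetical, and the sketch assumes version strings look like `v0.6.1` or `0.6`.

```typescript
// Hypothetical helper (not part of NNI): extract the first two digits
// of a version string such as "v0.6.1" or "0.6".
function majorMinor(version: string): [number, number] {
    const match = /^v?(\d+)\.(\d+)/.exec(version.trim());
    if (match === null) {
        throw new Error(`Unrecognized version string: ${version}`);
    }
    return [Number(match[1]), Number(match[2])];
}

// Hypothetical sketch of the three policy rules listed in this section.
function isVersionCompatible(managerVersion: string, keeperVersion: string): boolean {
    const [managerMajor, managerMinor] = majorMinor(managerVersion);
    // Rule 1: an NNIManager older than v0.6 can run any trialKeeper.
    if (managerMajor === 0 && managerMinor < 6) {
        return true;
    }
    // Rules 2 and 3: from v0.6 on, only the first two digits must match.
    const [keeperMajor, keeperMinor] = majorMinor(keeperVersion);
    return managerMajor === keeperMajor && managerMinor === keeperMinor;
}

console.log(isVersionCompatible('v0.6.1', 'v0.6.2'));
console.log(isVersionCompatible('v0.6.1', 'v0.7'));
```

With the examples from the policy, NNIManager v0.6.1 is accepted with trialKeeper v0.6 or v0.6.2 and rejected with trialKeeper v0.5.1 or v0.7.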
3 changes: 3 additions & 0 deletions docs/en_US/RemoteMachineMode.md
@@ -63,3 +63,6 @@ nnictl create --config ~/nni/examples/trials/mnist-annotation/config_remote.yml
```

to start the experiment.

## Version check
NNI has supported the version check feature since version 0.6; please [refer](PAIMode.md) to the PAI mode document for details.
Binary file added docs/img/version_check.png
10 changes: 7 additions & 3 deletions src/nni_manager/training_service/local/gpuScheduler.ts
@@ -85,9 +85,13 @@ class GPUScheduler {

    public async stop() {
        this.stopping = true;
        try {
            // Read the PID stored in the 'pid' file of the collector script folder,
            // kill that process's children, then remove the folder.
            const pid: string = await fs.promises.readFile(path.join(this.gpuMetricCollectorScriptFolder, 'pid'), 'utf8');
            await cpp.exec(`pkill -P ${pid}`);
            await cpp.exec(`rm -rf ${this.gpuMetricCollectorScriptFolder}`);
        } catch (error) {
            // Log instead of rethrowing, so a missing pid file or a failed command does not abort stop().
            this.log.error(`GPU scheduler error: ${error}`);
        }
    }

private async updateGPUSummary() {
