This repository has been archived by the owner on Sep 18, 2024. It is now read-only.

add document for nnictl #230

Merged
merged 36 commits into from
Oct 17, 2018
dc780cd
Merge pull request #1 from Microsoft/master
SparkSnail Sep 14, 2018
86243e7
Merge pull request #2 from Microsoft/master
SparkSnail Sep 14, 2018
3d1e4e9
fix nnictl bug
Sep 14, 2018
6d09780
Merge pull request #4 from Microsoft/master
SparkSnail Sep 17, 2018
0d24158
Merge branch 'master' of https://github.com/SparkSnail/nni
Sep 18, 2018
6d669c6
Merge pull request #6 from Microsoft/master
SparkSnail Sep 19, 2018
af2615d
Merge pull request #8 from Microsoft/master
SparkSnail Sep 20, 2018
f6b7c0a
Merge pull request #9 from Microsoft/master
SparkSnail Sep 24, 2018
a74febc
Merge pull request #10 from Microsoft/master
SparkSnail Sep 25, 2018
334b0a4
Merge pull request #12 from Microsoft/master
SparkSnail Sep 27, 2018
efe93df
Merge pull request #13 from Microsoft/master
SparkSnail Sep 27, 2018
0d9b074
Merge branch 'master' of https://github.com/SparkSnail/nni
Sep 28, 2018
421ad1a
Merge pull request #16 from Microsoft/master
SparkSnail Sep 30, 2018
660a8f8
Merge branch 'master' of https://github.com/SparkSnail/nni
Sep 30, 2018
2b01089
fix install.sh
Sep 30, 2018
951e80e
Merge pull request #17 from Microsoft/master
SparkSnail Oct 1, 2018
90fe674
Merge pull request #18 from Microsoft/master
SparkSnail Oct 7, 2018
2ccf0ed
Merge pull request #19 from Microsoft/master
SparkSnail Oct 8, 2018
77aacee
Merge pull request #20 from Microsoft/master
SparkSnail Oct 8, 2018
9e23dfe
Merge pull request #22 from Microsoft/master
SparkSnail Oct 8, 2018
ca7bbe4
Merge pull request #24 from Microsoft/master
SparkSnail Oct 10, 2018
346badd
add desc for Dockerfile.build.base
Oct 10, 2018
4af27d6
Merge pull request #27 from Microsoft/master
SparkSnail Oct 11, 2018
46a8350
update document for Dockerfile
Oct 11, 2018
4e3697f
Merge pull request #29 from Microsoft/master
SparkSnail Oct 12, 2018
4cd95aa
Merge pull request #30 from Microsoft/master
SparkSnail Oct 15, 2018
405ce45
Merge pull request #31 from Microsoft/master
SparkSnail Oct 15, 2018
c3949e6
Merge pull request #32 from Microsoft/master
SparkSnail Oct 16, 2018
22c78fd
Merge pull request #33 from Microsoft/master
SparkSnail Oct 16, 2018
a870817
update
Oct 16, 2018
b45268c
refactor port detect
Oct 16, 2018
59626ec
update
Oct 16, 2018
31ea28b
Merge pull request #34 from Microsoft/master
SparkSnail Oct 16, 2018
2ca84c5
refactor NNICTLDOC.md
Oct 17, 2018
ab02c93
add document for pai and nnictl
Oct 17, 2018
5ff7b45
add default value for port
Oct 17, 2018
16 changes: 9 additions & 7 deletions docs/ExperimentConfig.md
@@ -12,7 +12,7 @@ experimentName:
trialConcurrency:
maxExecDuration:
maxTrialNum:
#choice: local, remote
#choice: local, remote, pai
trainingServicePlatform:
searchSpacePath:
#choice: true, false
@@ -42,7 +42,7 @@ experimentName:
trialConcurrency:
maxExecDuration:
maxTrialNum:
#choice: local, remote
#choice: local, remote, pai
trainingServicePlatform:
searchSpacePath:
#choice: true, false
@@ -79,7 +79,7 @@ experimentName:
trialConcurrency:
maxExecDuration:
maxTrialNum:
#choice: local, remote
#choice: local, remote, pai
trainingServicePlatform:
#choice: true, false
useAnnotation:
@@ -145,6 +145,8 @@ machineList:
* __local__ mode means you run an experiment on your local Linux machine.

* __remote__ mode means you submit trial jobs to remote Linux machines. If you set the platform to remote, you must also complete the __machineList__ field.

* __pai__ mode means you submit trial jobs to Microsoft's [OpenPai](https://github.com/Microsoft/pai). For more details on pai configuration, please refer to [PAIModeDoc](./PAIMode.md).

* __searchSpacePath__
* Description
@@ -268,7 +270,7 @@ experimentName: test_experiment
trialConcurrency: 3
maxExecDuration: 1h
maxTrialNum: 10
#choice: local, remote
#choice: local, remote, pai
trainingServicePlatform: local
#choice: true, false
useAnnotation: true
@@ -292,7 +294,7 @@ experimentName: test_experiment
trialConcurrency: 3
maxExecDuration: 1h
maxTrialNum: 10
#choice: local, remote
#choice: local, remote, pai
trainingServicePlatform: local
searchSpacePath: /nni/search_space.json
#choice: true, false
@@ -324,7 +326,7 @@ experimentName: test_experiment
trialConcurrency: 3
maxExecDuration: 1h
maxTrialNum: 10
#choice: local, remote
#choice: local, remote, pai
trainingServicePlatform: local
searchSpacePath: /nni/search_space.json
#choice: true, false
@@ -360,7 +362,7 @@ experimentName: test_experiment
trialConcurrency: 3
maxExecDuration: 1h
maxTrialNum: 10
#choice: local, remote
#choice: local, remote, pai
trainingServicePlatform: remote
searchSpacePath: /nni/search_space.json
#choice: true, false
2 changes: 1 addition & 1 deletion docs/GetStarted.md
@@ -62,7 +62,7 @@ maxExecDuration: 3h
# empty means never stop
maxTrialNum: 100

# choice: local, remote
# choice: local, remote, pai
trainingServicePlatform: local

# choice: true, false
78 changes: 73 additions & 5 deletions docs/NNICTLDOC.md
@@ -14,6 +14,7 @@ nnictl trial
nnictl experiment
nnictl config
nnictl log
nnictl webui
```
### Manage an experiment
* __nnictl create__
@@ -33,7 +34,7 @@ nnictl log
| Name, shorthand | Required|Default | Description |
| ------ | ------ | ------ |------ |
| --config, -c| True| |yaml configure file of the experiment|

| --port, -p | False| |the port of restful server|

* __nnictl resume__

@@ -56,11 +57,20 @@ nnictl log
* __nnictl stop__
* Description

You can use this command to stop a running experiment.
You can use this command to stop a running experiment or multiple experiments.

* Usage

nnictl stop
nnictl stop [id]

* Detail

1. If an id is specified and it matches a running experiment, nnictl stops that experiment; otherwise it prints an error message.
2. If no id is specified and an experiment is running, nnictl stops the running experiment; otherwise it prints an error message.
3. If the id ends with *, nnictl stops all experiments whose ids match the pattern.
4. If the id does not exist but matches the prefix of one experiment id, nnictl stops the matched experiment.
5. If the id does not exist but matches the prefix of multiple experiment ids, nnictl prints the matching id information.
6. Use 'nnictl stop all' to stop all experiments.
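
The id-matching rules listed for `nnictl stop` can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code nnictl runs: the function name `resolve_stop_targets` is hypothetical, experiment ids are assumed to be plain strings, and raising an error when no id is given while several experiments are running is a guess, since the documentation does not cover that case.

```python
def resolve_stop_targets(running_ids, requested=None):
    """Return the experiment ids to stop, or raise ValueError.

    Hypothetical helper that mirrors the documented rules for `nnictl stop [id]`.
    """
    running_ids = list(running_ids)
    if requested is None:
        # Rule 2: no id given. Stopping only when exactly one experiment
        # is running is an assumption of this sketch.
        if len(running_ids) == 1:
            return running_ids
        raise ValueError("no id given and no single running experiment")
    if requested == "all":
        # Rule 6: stop every running experiment.
        return running_ids
    if requested.endswith("*"):
        # Rule 3: a trailing * selects every id sharing the prefix.
        prefix = requested[:-1]
        matches = [i for i in running_ids if i.startswith(prefix)]
        if not matches:
            raise ValueError("no experiment matches " + requested)
        return matches
    if requested in running_ids:
        # Rule 1: exact match.
        return [requested]
    # Rules 4 and 5: fall back to prefix matching.
    matches = [i for i in running_ids if i.startswith(requested)]
    if len(matches) == 1:
        return matches
    if not matches:
        raise ValueError("no experiment matches " + requested)
    raise ValueError("ambiguous id, candidates: " + ", ".join(matches))
```

With running ids `abc123`, `abd456` and `xyz789`, the request `ab*` selects the first two, `xy` selects only `xyz789`, and `ab` is rejected as ambiguous.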

* __nnictl update__

@@ -78,6 +88,7 @@ nnictl log
| Name, shorthand | Required|Default | Description |
| ------ | ------ | ------ |------ |
| --filename, -f| True| |the file storing your new search space|
| --id, -i| False| |ID of the experiment you want to set|

* __nnictl update concurrency__
* Description
@@ -93,6 +104,7 @@ nnictl log
| Name, shorthand | Required|Default | Description |
| ------ | ------ | ------ |------ |
| --value, -v| True| |the number of allowed concurrent trials|
| --id, -i| False| |ID of the experiment you want to set|

* __nnictl update duration__
* Description
@@ -108,6 +120,7 @@ nnictl log
| Name, shorthand | Required|Default | Description |
| ------ | ------ | ------ |------ |
| --value, -v| True| |the experiment duration will be NUMBER seconds. SUFFIX may be 's' for seconds (the default), 'm' for minutes, 'h' for hours or 'd' for days.|
| --id, -i| False| |ID of the experiment you want to set|
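
The NUMBER plus SUFFIX format described for `--value` can be illustrated with a small parser. This is a sketch under assumptions, not nnictl's own parsing code: the name `parse_duration` is hypothetical, and only non-negative integer values are accepted here.

```python
import re

# Seconds per unit; a missing suffix means seconds, as the option table states.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_duration(value):
    """Convert strings such as '90', '30m', '1h' or '2d' to seconds."""
    match = re.fullmatch(r"(\d+)([smhd]?)", value.strip())
    if match is None:
        raise ValueError("invalid duration: " + repr(value))
    number, suffix = match.groups()
    return int(number) * _UNIT_SECONDS[suffix or "s"]
```

Under this sketch, `parse_duration("1h")` gives 3600 and `parse_duration("30")` gives 30.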


* __nnictl trial__
@@ -120,6 +133,12 @@ nnictl log

nnictl trial ls

Options:

| Name, shorthand | Required|Default | Description |
| ------ | ------ | ------ |------ |
| --id, -i| False| |ID of the experiment you want to set|

* __nnictl trial kill__
* Description

@@ -132,7 +151,8 @@ nnictl log

| Name, shorthand | Required|Default | Description |
| ------ | ------ | ------ |------ |
| --trialid, -t| True| |ID of the trial you want to kill.|
| --trialid, -t| True| |ID of the trial you want to kill.|
| --id, -i| False| |ID of the experiment you want to set|



@@ -146,6 +166,36 @@ nnictl log
* Usage

nnictl experiment show

Options:

| Name, shorthand | Required|Default | Description |
| ------ | ------ | ------ |------ |
| --id, -i| False| |ID of the experiment you want to set|


* __nnictl experiment status__
* Description

Show the status of experiment.
* Usage

nnictl experiment status

Options:

| Name, shorthand | Required|Default | Description |
| ------ | ------ | ------ |------ |
| --id, -i| False| |ID of the experiment you want to set|


* __nnictl experiment list__
* Description

Show the id and start time of all running experiments.
* Usage

nnictl experiment list



@@ -176,6 +226,7 @@ nnictl log
| --head, -h| False| |show head lines of stdout|
| --tail, -t| False| |show tail lines of stdout|
| --path, -p| False| |show the path of stdout file|
| --id, -i| False| |ID of the experiment you want to set|

* __nnictl log stderr__
* Description
@@ -193,6 +244,7 @@ nnictl log
| --head, -h| False| |show head lines of stderr|
| --tail, -t| False| |show tail lines of stderr|
| --path, -p| False| |show the path of stderr file|
| --id, -i| False| |ID of the experiment you want to set|

* __nnictl log trial__
* Description
@@ -208,4 +260,20 @@ nnictl log
| Name, shorthand | Required|Default | Description |
| ------ | ------ | ------ |------ |
| --id, -I| False| |the id of trial|



### Manage webui
* __nnictl webui url__
* Description

Show the URLs of the experiment.

* Usage

nnictl webui url

Options:

| Name, shorthand | Required|Default | Description |
| ------ | ------ | ------ |------ |
| --id, -i| False| |ID of the experiment you want to set|
2 changes: 1 addition & 1 deletion docs/RemoteMachineMode.md
@@ -34,7 +34,7 @@ trialConcurrency: 2
maxExecDuration: 3h
# empty means never stop
maxTrialNum: 100
# choice: local, remote
# choice: local, remote, pai
trainingServicePlatform: local
# choice: true, false
useAnnotation: true
2 changes: 1 addition & 1 deletion tools/nnicmd/common_utils.py
@@ -67,7 +67,7 @@ def detect_port(port):
socket_test = socket.socket(socket.AF_INET,socket.SOCK_STREAM)
try:
socket_test.connect(('127.0.0.1', int(port)))
socket_test.shutdown(2)
socket_test.close()
return True
except:
return False
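
For comparison, the same port probe can be written so that the socket is always closed and only connection failures count as "port free". This is a minimal sketch, not the code merged in this PR: the `host` and `timeout` parameters are additions for illustration, and `connect_ex` is used instead of `connect` plus a bare `except`.

```python
import socket


def detect_port(port, host="127.0.0.1", timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    # The with-block closes the socket on every path, including errors.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an errno value on failure,
        # so no exception handling is needed for a refused connection.
        return sock.connect_ex((host, int(port))) == 0
```

A caller could use this to check whether the port chosen with `--port` is already taken before starting the RESTful server.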
4 changes: 2 additions & 2 deletions tools/nnicmd/config_schema.py
@@ -92,12 +92,12 @@
machine_list_schima = {
Optional('machineList'):[Or({
'ip': str,
'port': And(int, lambda x: 0 < x < 65535),
Optional('port'): And(int, lambda x: 0 < x < 65535),
'username': str,
'passwd': str
},{
'ip': str,
'port': And(int, lambda x: 0 < x < 65535),
Optional('port'): And(int, lambda x: 0 < x < 65535),
'username': str,
'sshKeyPath': os.path.exists,
Optional('passphrase'): str
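
The effect of making `port` optional can be shown with a plain-Python check that approximates the two entry shapes in this hunk. It is only a sketch: `is_valid_machine` is a hypothetical name, the real validation is done by the schema definitions above, and unlike those definitions this version does not reject unknown keys or check `passphrase`.

```python
import os


def is_valid_machine(machine):
    """Approximate check of one machineList entry (password or SSH-key form)."""
    if "port" in machine:
        port = machine["port"]
        # Same range as the lambda in the schema: 0 < port < 65535.
        if not (isinstance(port, int) and 0 < port < 65535):
            return False
    if not isinstance(machine.get("ip"), str):
        return False
    if not isinstance(machine.get("username"), str):
        return False
    has_password = isinstance(machine.get("passwd"), str)
    key_path = machine.get("sshKeyPath")
    has_key = isinstance(key_path, str) and os.path.exists(key_path)
    # Exactly one of the two authentication forms must be present.
    return has_password != has_key
```

An entry without `port` now passes this check, while an out-of-range port such as 70000 is still rejected.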
5 changes: 5 additions & 0 deletions tools/nnicmd/launcher_utils.py
@@ -97,6 +97,11 @@ def validate_common_content(experiment_config):
experiment_config['maxExecDuration'] = '999d'
if experiment_config.get('maxTrialNum') is None:
experiment_config['maxTrialNum'] = 99999
if experiment_config['trainingServicePlatform'] == 'remote':
for index in range(len(experiment_config['machineList'])):
if experiment_config['machineList'][index].get('port') is None:
experiment_config['machineList'][index]['port'] = 22

except Exception as exception:
raise Exception(exception)
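
The default-port logic added above can be isolated into a small function for illustration. This sketch assumes the config is a plain dict shaped like the YAML examples in the docs; `fill_default_ports` and `DEFAULT_SSH_PORT` are hypothetical names, and unlike the hunk above it tolerates a missing `machineList` instead of raising.

```python
DEFAULT_SSH_PORT = 22  # value used by the change in this PR


def fill_default_ports(experiment_config):
    """Set port 22 on every remote machine entry that omits a port."""
    if experiment_config.get("trainingServicePlatform") != "remote":
        return experiment_config
    # Iterate over the entries directly instead of indexing by position.
    for machine in experiment_config.get("machineList", []):
        if machine.get("port") is None:
            machine["port"] = DEFAULT_SSH_PORT
    return experiment_config
```

Entries that already specify a port are left untouched, so an explicit value in the config always wins over the default.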
