This repository has been archived by the owner on Sep 18, 2024. It is now read-only.

Refactor nnictl to support list multiple experiment #207

Merged
merged 98 commits into from
Oct 16, 2018
Commits
dc780cd
Merge pull request #1 from Microsoft/master
SparkSnail Sep 14, 2018
86243e7
Merge pull request #2 from Microsoft/master
SparkSnail Sep 14, 2018
3d1e4e9
fix nnictl bug
Sep 14, 2018
b0b2136
fix nnictl create bug
Sep 14, 2018
6d09780
Merge pull request #4 from Microsoft/master
SparkSnail Sep 17, 2018
04ccb73
Merge branch 'master' of https://github.com/SparkSnail/nni into t-shy…
Sep 17, 2018
408a361
add experiment status logic
Sep 17, 2018
7c3d80c
add more information for nnictl
Sep 17, 2018
6c0a4f1
fix Evolution Tuner bug
Sep 17, 2018
1e61de8
refactor code
Sep 18, 2018
08a46ae
fix code in updater.py
Sep 18, 2018
0d24158
Merge branch 'master' of https://github.com/SparkSnail/nni
Sep 18, 2018
61772d0
fix nnictl --help
Sep 19, 2018
356272f
fix classArgs bug
Sep 19, 2018
ab1b34c
update check response.status_code logic
Sep 19, 2018
6d669c6
Merge pull request #6 from Microsoft/master
SparkSnail Sep 19, 2018
d9636b3
Merge branch 'master' of https://github.com/SparkSnail/nni into t-shy…
Sep 19, 2018
22b73bb
show trial log path
Sep 19, 2018
af2615d
Merge pull request #8 from Microsoft/master
SparkSnail Sep 20, 2018
57e1086
Merge branch 'master' of https://github.com/SparkSnail/nni into t-shy…
Sep 20, 2018
805fc57
update document
Sep 20, 2018
d863197
fix install.sh
Sep 21, 2018
f6b7c0a
Merge pull request #9 from Microsoft/master
SparkSnail Sep 24, 2018
9904639
Merge branch 'master' of https://github.com/SparkSnail/nni into t-shy…
Sep 24, 2018
8279ee4
set default vallue for maxTrialNum and maxExecDuration
Sep 24, 2018
52bea3b
fix nnictl
Sep 25, 2018
94181e8
fix config path hint
Sep 25, 2018
a74febc
Merge pull request #10 from Microsoft/master
SparkSnail Sep 25, 2018
8c46f6e
Merge branch 'master' of https://github.com/SparkSnail/nni into t-shy…
Sep 25, 2018
a0cef38
support multiPhase
Sep 26, 2018
334b0a4
Merge pull request #12 from Microsoft/master
SparkSnail Sep 27, 2018
b6b0738
Merge branch 'master' of https://github.com/SparkSnail/nni into t-shy…
Sep 27, 2018
efe93df
Merge pull request #13 from Microsoft/master
SparkSnail Sep 27, 2018
5f0f359
fix bash-completion
Sep 27, 2018
41ff800
Merge branch 'master' of https://github.com/SparkSnail/nni into t-shy…
Sep 27, 2018
9d7d87f
refactor bash-completion
Sep 27, 2018
c398c37
add sklearn-regression
Sep 27, 2018
4fd4238
add search_space
Sep 27, 2018
f61ab3e
fix bug
Sep 27, 2018
0d9b074
Merge branch 'master' of https://github.com/SparkSnail/nni
Sep 28, 2018
421ad1a
Merge pull request #16 from Microsoft/master
SparkSnail Sep 30, 2018
660a8f8
Merge branch 'master' of https://github.com/SparkSnail/nni
Sep 30, 2018
2b01089
fix install.sh
Sep 30, 2018
951e80e
Merge pull request #17 from Microsoft/master
SparkSnail Oct 1, 2018
90fe674
Merge pull request #18 from Microsoft/master
SparkSnail Oct 7, 2018
2ccf0ed
Merge pull request #19 from Microsoft/master
SparkSnail Oct 8, 2018
c364469
fix conflict
Oct 8, 2018
4539985
Merge branch 'master' of https://github.com/SparkSnail/nni into t-shy…
Oct 8, 2018
77aacee
Merge pull request #20 from Microsoft/master
SparkSnail Oct 8, 2018
c1dc94d
Merge branch 'master' of https://github.com/SparkSnail/nni into t-shy…
Oct 8, 2018
cf1bf50
refactor code
Oct 8, 2018
9b7045a
remove unused code
Oct 8, 2018
9e23dfe
Merge pull request #22 from Microsoft/master
SparkSnail Oct 8, 2018
11180c5
Merge branch 'master' of https://github.com/SparkSnail/nni into t-shy…
Oct 8, 2018
65294e4
support multi experiments
Oct 8, 2018
67aa22a
fix issues
Oct 8, 2018
9a2a168
Support multiple experiments of nnictl (#183)
SparkSnail Oct 10, 2018
cf0771d
Let nni manager web server handle static content
Oct 10, 2018
be5cbf9
Merge pull request #23 from Microsoft/dev-multiple-experiments
SparkSnail Oct 10, 2018
ad5b0df
set nnictl stop require the port
Oct 10, 2018
6e1f848
Support multiple experiments of nnictl (#183)
SparkSnail Oct 10, 2018
c442b3d
Let nni manager web server handle static content
Oct 10, 2018
e64ec24
Dev multiple experiments (#189)
SparkSnail Oct 10, 2018
ca7bbe4
Merge pull request #24 from Microsoft/master
SparkSnail Oct 10, 2018
346badd
add desc for Dockerfile.build.base
Oct 10, 2018
e880844
Update documents for supporting multiple experiments
Oct 11, 2018
86a0b0f
Merge branch 'dev-multiple-experiments' into dev-multiple-experiments
SparkSnail Oct 11, 2018
84a04da
Merge pull request #26 from Microsoft/dev-multiple-experiments
SparkSnail Oct 11, 2018
ee11679
create a constant variable for 51188
Oct 11, 2018
617574e
create a constant variable for 51188
Oct 11, 2018
bd6a1ed
Fixed issue that WebUI can not refresh page
Oct 11, 2018
4da6e40
Upgrade Node.js and Yarn to latest version
Oct 11, 2018
4af27d6
Merge pull request #27 from Microsoft/master
SparkSnail Oct 11, 2018
46a8350
update document for Dockerfile
Oct 11, 2018
d390fae
fix conflict
Oct 11, 2018
3ca6415
Merge pull request #28 from Microsoft/dev-multiple-experiments
SparkSnail Oct 11, 2018
1030ba1
Merge branch 'dev-multiple-experiments' of https://github.com/SparkSn…
Oct 11, 2018
2f9c704
refactor nnictl to support multi experiment list
Oct 11, 2018
8282da7
change class name Experiment->Experiments
Oct 12, 2018
b2e34fe
add desc for nnictl stop
Oct 12, 2018
4e3697f
Merge pull request #29 from Microsoft/master
SparkSnail Oct 12, 2018
9bb4cc7
refactor nnictl stop information
Oct 12, 2018
f6f0db8
update
Oct 12, 2018
09eb656
refactor nnictl
Oct 15, 2018
8b1f3a8
remove --id from nnictl stop command
Oct 15, 2018
f60c39c
add port detect
Oct 15, 2018
70f1cbc
change default rest port
Oct 15, 2018
b03fe34
refactor experiment list command
Oct 15, 2018
4cd95aa
Merge pull request #30 from Microsoft/master
SparkSnail Oct 15, 2018
64ed26f
support stop all
Oct 15, 2018
405ce45
Merge pull request #31 from Microsoft/master
SparkSnail Oct 15, 2018
c169314
Merge branch 'master' of https://github.com/SparkSnail/nni into dev-m…
Oct 15, 2018
1f61f75
fix azure
Oct 15, 2018
8b16d63
fix readme.md
Oct 16, 2018
c3949e6
Merge pull request #32 from Microsoft/master
SparkSnail Oct 16, 2018
c890cee
Merge branch 'master' of https://github.com/SparkSnail/nni into dev-m…
Oct 16, 2018
1b026ee
fix comments
Oct 16, 2018
92e757d
fix comments
Oct 16, 2018
18 changes: 0 additions & 18 deletions .travis.yml

This file was deleted.

2 changes: 1 addition & 1 deletion README.md
@@ -4,7 +4,7 @@
 [![Issues](https://img.shields.io/github/issues-raw/Microsoft/nni.svg)](https://github.com/Microsoft/nni/issues?q=is%3Aissue+is%3Aopen)
 [![Bugs](https://img.shields.io/github/issues/Microsoft/nni/bug.svg)](https://github.com/Microsoft/nni/issues?q=is%3Aissue+is%3Aopen+label%3Abug)
 [![Pull Requests](https://img.shields.io/github/issues-pr-raw/Microsoft/nni.svg)](https://github.com/Microsoft/nni/pulls?q=is%3Apr+is%3Aopen)
-[![Version](https://img.shields.io/github/tag/Microsoft/nni.svg)]()
+[![Version](https://img.shields.io/github/release/Microsoft/nni.svg)](https://github.com/Microsoft/nni/releases)
 
 NNI (Neural Network Intelligence) is a toolkit to help users run automated machine learning experiments.
 The tool dispatches and runs trial jobs that generated by tuning algorithms to search the best neural architecture and/or hyper-parameters in different environments (e.g. local machine, remote servers and cloud).
2 changes: 2 additions & 0 deletions deployment/README.md
@@ -2,6 +2,8 @@ Dockerfile
 ===
 ## 1.Description
 This is the Dockerfile of nni project, including the most kinds of deeplearning frameworks and nni source code. You can run your nni experiment in this docker container directly.
+Dockerfile.build.base could build the base Docker image, users can get a docker image with Ubuntu and NNI environment after building this file.
+Dockerfile could build the customized docker image, users could build their customized docker image using this file.
 ## 2.Including Libraries
 
 ```
2 changes: 1 addition & 1 deletion test/naive/run.py
@@ -82,4 +82,4 @@ def run():
 traceback.print_exc()
 raise error
 
-subprocess.run(['nnictl', 'stop', '--port', '51188'])
+subprocess.run(['nnictl', 'stop'])
11 changes: 11 additions & 0 deletions tools/nnicmd/common_utils.py
@@ -21,6 +21,7 @@
 import json
 import yaml
 import psutil
+import socket
 from .constants import ERROR_INFO, NORMAL_INFO, WARNING_INFO, COLOR_RED_FORMAT, COLOR_YELLOW_FORMAT
 
 def get_yml_content(file_path):
@@ -60,3 +61,13 @@ def detect_process(pid):
         return process.is_running()
     except:
         return False
+
+def detect_port(port):
+    '''Detect if the port is used'''
+    socket_test = socket.socket(socket.AF_INET,socket.SOCK_STREAM)
+    try:
+        socket_test.connect(('127.0.0.1', int(port)))
+        socket_test.shutdown(2)
+        return True
+    except:
+        return False
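`detect_port` above treats a successful TCP connect to `127.0.0.1` as "port in use". A minimal sketch of the same probe follows, written with `socket.create_connection`, a timeout, and a `with` block so the socket is closed on every path; it catches `OSError` (which covers refused and timed-out connects) instead of using a bare `except`. The name `is_port_in_use` and its `host`/`timeout` parameters are hypothetical and not part of nnictl. Like the original, it only sees listeners reachable on that address, so a process bound to a different interface may go undetected.

```python
import socket


def is_port_in_use(port, host='127.0.0.1', timeout=1.0):
    '''Return True if something accepts TCP connections on host:port.

    Hypothetical variant of detect_port(): same connect-based probe, but the
    socket is always closed and only OSError counts as "nothing listening".
    '''
    try:
        # create_connection applies `timeout` to the connect attempt.
        with socket.create_connection((host, int(port)), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out or unreachable: treat the port as free.
        return False


if __name__ == '__main__':
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(('127.0.0.1', 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]
    print(port, is_port_in_use(port))  # a listener is open, so this reports True
    listener.close()
```

Accepting `port` as either `int` or `str` (via `int(port)`) mirrors the original, which also calls `int(port)`.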
51 changes: 43 additions & 8 deletions tools/nnicmd/config_utils.py
@@ -22,12 +22,12 @@
 import os
 import json
 import shutil
-from .constants import HOME_DIR
+from .constants import NNICTL_HOME_DIR
 
 class Config:
     '''a util class to load and save config'''
     def __init__(self, port):
-        config_path = os.path.join(HOME_DIR, str(port))
+        config_path = os.path.join(NNICTL_HOME_DIR, str(port))
         os.makedirs(config_path, exist_ok=True)
         self.config_file = os.path.join(config_path, '.config')
         self.config = self.read_file()
@@ -46,12 +46,6 @@ def get_config(self, key):
         '''get a value according to key'''
         return self.config.get(key)
 
-    def copy_metadata_to_new_path(self, path):
-        '''copy metadata to a new path'''
-        if not os.path.exists(path):
-            os.mkdir(path)
-        shutil.copy(self.config_file, path)
-
     def write_file(self):
         '''save config to local file'''
         if self.config:
@@ -71,3 +65,44 @@ def read_file(self):
             except ValueError:
                 return {}
         return {}
+
+class Experiments:
+    '''Maintain experiment list'''
+    def __init__(self):
+        os.makedirs(NNICTL_HOME_DIR, exist_ok=True)
+        self.experiment_file = os.path.join(NNICTL_HOME_DIR, '.experiment')
+        self.experiments = self.read_file()
+
+    def add_experiment(self, id, port, time):
+        '''set {key:value} paris to self.experiment'''
+        self.experiments[id] = [port, time]
+        self.write_file()
+
+    def remove_experiment(self, id):
+        '''remove an experiment by id'''
+        if id in self.experiments:
+            self.experiments.pop(id)
+        self.write_file()
+
+    def get_all_experiments(self):
+        '''return all of experiments'''
+        return self.experiments
+
+    def write_file(self):
+        '''save config to local file'''
+        try:
+            with open(self.experiment_file, 'w') as file:
+                json.dump(self.experiments, file)
+        except IOError as error:
+            print('Error:', error)
+            return
+
+    def read_file(self):
+        '''load config from local file'''
+        if os.path.exists(self.experiment_file):
+            try:
+                with open(self.experiment_file, 'r') as file:
+                    return json.load(file)
+            except ValueError:
+                return {}
+        return {}
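The `Experiments` class keeps a single JSON object at `~/.local/nni/nnictl/.experiment` that maps each experiment id to `[port, start_time]`. The following is a minimal standalone sketch of the same round-trip. `ExperimentRegistry` and its `home_dir` argument are hypothetical (the argument exists only so the sketch can run in a temporary directory), and it deliberately differs from the diff in two ways: `remove` uses `dict.pop(key, None)`, and read errors (`OSError`) fall back to an empty registry just as malformed JSON does. Because JSON object keys are always strings, ids come back as `str` regardless of how they were written.

```python
import json
import os
import tempfile


class ExperimentRegistry:
    '''Hypothetical stand-in for the Experiments class above.

    Stores {experiment_id: [port, start_time]} as JSON, like the diff does,
    but takes the directory as a parameter instead of using NNICTL_HOME_DIR.
    '''
    def __init__(self, home_dir):
        os.makedirs(home_dir, exist_ok=True)
        self.path = os.path.join(home_dir, '.experiment')
        self.experiments = self._read()

    def add(self, exp_id, port, start_time):
        self.experiments[exp_id] = [port, start_time]
        self._write()

    def remove(self, exp_id):
        # pop with a default makes removing an unknown id a no-op
        self.experiments.pop(exp_id, None)
        self._write()

    def _write(self):
        with open(self.path, 'w') as f:
            json.dump(self.experiments, f)

    def _read(self):
        try:
            with open(self.path) as f:
                return json.load(f)
        except (OSError, ValueError):
            # missing, unreadable or malformed file: start empty
            return {}


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as tmp:
        ExperimentRegistry(tmp).add('abc123', 8080, '2018-10-16 10:00:00')
        # A fresh instance re-reads the file, as a later nnictl process would.
        print(ExperimentRegistry(tmp).experiments)
```

Each CLI invocation runs in its own process, so re-reading the file in a new instance (as the `__main__` block does) is how a later command would see experiments registered by an earlier one.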
20 changes: 16 additions & 4 deletions tools/nnicmd/constants.py
@@ -20,20 +20,20 @@
 
 import os
 
-HOME_DIR = os.path.join(os.environ['HOME'], '.local', 'nni', 'nnictl')
+NNICTL_HOME_DIR = os.path.join(os.environ['HOME'], '.local', 'nni', 'nnictl')
 
 ERROR_INFO = 'ERROR: %s'
 
 NORMAL_INFO = 'INFO: %s'
 
 WARNING_INFO = 'WARNING: %s'
 
-DEFAULT_REST_PORT = 51188
+DEFAULT_REST_PORT = 8080
 
 EXPERIMENT_SUCCESS_INFO = '\033[1;32;32mSuccessfully started experiment!\n\033[0m' \
 '-----------------------------------------------------------------------\n' \
 'The experiment id is %s\n'\
-'The restful server post is %s\n' \
+'The Web UI urls are: %s\n' \
 '-----------------------------------------------------------------------\n\n' \
 'You can use these commands to get more information about the experiment\n' \
 '-----------------------------------------------------------------------\n' \
@@ -42,11 +42,23 @@
 '2. nnictl trial ls list all of trial jobs\n' \
 '3. nnictl log stderr show stderr log content\n' \
 '4. nnictl log stdout show stdout log content\n' \
-'5. nnictl stop stop a experiment\n' \
+'5. nnictl stop stop an experiment\n' \
 '6. nnictl trial kill kill a trial job by id\n' \
 '7. nnictl --help get help information about nnictl\n' \
 '-----------------------------------------------------------------------\n' \
 
+LOG_HEADER = '-----------------------------------------------------------------------\n' \
+' Experiment start time %s\n' \
+'-----------------------------------------------------------------------\n'
+
+EXPERIMENT_START_FAILED_INFO = 'There is an experiment running in the port %d, please stop it first or set another port!\n' \
+'You could use \'nnictl stop --port [PORT]\' command to stop an experiment!\nOr you could use \'nnictl create --config [CONFIG_PATH] --port [PORT]\' to set port!\n'
+
+EXPERIMENT_ID_INFO = '-----------------------------------------------------------------------\n' \
+' Experiment information\n' \
+'%s\n' \
+'-----------------------------------------------------------------------\n'
+
 PACKAGE_REQUIREMENTS = {
     'SMAC': 'smac_tuner'
 }
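`start_rest_server` now stamps both log files with `LOG_HEADER`, which it fills from `time.strftime('%Y-%m-%d %H:%M:%S', ...)`. It reports a busy port through `EXPERIMENT_START_FAILED_INFO`, whose placeholder is `%d`. The sketch below reproduces both formatting steps using abbreviated copies of the templates (shorter separator lines). The helper names are hypothetical. It also shows why `%d` needs a number: if the port ever reaches that template as a string, the `%` operation raises `TypeError`. Whether that can happen depends on how the `--port` option is parsed, which this diff does not show, so the helper coerces with `int()` first.

```python
import time

# Abbreviated copies of two templates from constants.py (separators shortened).
LOG_HEADER = '----------\n' \
             ' Experiment start time %s\n' \
             '----------\n'
EXPERIMENT_START_FAILED_INFO = 'There is an experiment running in the port %d, ' \
                               'please stop it first or set another port!\n'


def make_log_header(now=None):
    '''Build the header written to the stdout/stderr logs (hypothetical helper).'''
    if now is None:
        now = time.localtime(time.time())
    time_now = time.strftime('%Y-%m-%d %H:%M:%S', now)
    return LOG_HEADER % time_now


def start_failed_message(port):
    '''%d accepts only numbers, so coerce a possibly-string port first.'''
    return EXPERIMENT_START_FAILED_INFO % int(port)


if __name__ == '__main__':
    print(make_log_header(), end='')
    print(start_failed_message('8080'), end='')
    try:
        EXPERIMENT_START_FAILED_INFO % '8080'
    except TypeError as error:
        print('TypeError:', error)
```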
53 changes: 32 additions & 21 deletions tools/nnicmd/launcher.py
@@ -29,9 +29,11 @@
 from .launcher_utils import validate_all_content
 from .rest_utils import rest_put, rest_post, check_rest_server, check_rest_server_quick, check_response
 from .url_utils import cluster_metadata_url, experiment_url
-from .config_utils import Config
-from .common_utils import get_yml_content, get_json_content, print_error, print_normal, print_warning, detect_process
+from .config_utils import Config, Experiments
+from .common_utils import get_yml_content, get_json_content, print_error, print_normal, print_warning, detect_process, detect_port
 from .constants import *
+from .webui_utils import *
+import time
 
 def start_rest_server(port, platform, mode, experiment_id=None):
     '''Run nni manager process'''
@@ -40,21 +42,29 @@ def start_rest_server(port, platform, mode, experiment_id=None):
     rest_port = nni_config.get_config('restServerPort')
     running, _ = check_rest_server_quick(rest_port)
     if rest_port and running:
-        print_error('There is an experiment running, please stop it first...')
-        print_normal('You can use \'nnictl stop\' command to stop an experiment!')
-        exit(0)
+        print_error(EXPERIMENT_START_FAILED_INFO % port)
+        exit(1)
 
+    if detect_port(port):
+        print_error('Port %s is used by another process, please reset the port!' % port)
+        exit(1)
+
     print_normal('Starting restful server...')
     manager = os.environ.get('NNI_MANAGER', 'nnimanager')
     cmds = [manager, '--port', str(port), '--mode', platform, '--start_mode', mode]
     if mode == 'resume':
         cmds += ['--experiment_id', experiment_id]
-    stdout_full_path = os.path.join(HOME_DIR, str(port), 'stdout')
-    stderr_full_path = os.path.join(HOME_DIR, str(port), 'stderr')
+    stdout_full_path = os.path.join(NNICTL_HOME_DIR, str(port), 'stdout')
+    stderr_full_path = os.path.join(NNICTL_HOME_DIR, str(port), 'stderr')
     stdout_file = open(stdout_full_path, 'a+')
     stderr_file = open(stderr_full_path, 'a+')
+    time_now = time.strftime('%Y-%m-%d %H:%M:%S',time.localtime(time.time()))
+    #add time information in the header of log files
+    log_header = LOG_HEADER % str(time_now)
+    stdout_file.write(log_header)
+    stderr_file.write(log_header)
     process = Popen(cmds, stdout=stdout_file, stderr=stderr_file)
-    return process
+    return process, str(time_now)
 
 def set_trial_config(experiment_config, port):
     '''set trial configuration'''
@@ -79,7 +89,7 @@ def set_trial_config(experiment_config, port):
         return True
     else:
         print('Error message is {}'.format(response.text))
-        stderr_full_path = os.path.join(HOME_DIR, str(port), 'stderr')
+        stderr_full_path = os.path.join(NNICTL_HOME_DIR, str(port), 'stderr')
         with open(stderr_full_path, 'a+') as fout:
             fout.write(json.dumps(json.loads(response.text), indent=4, sort_keys=True, separators=(',', ':')))
         return False
@@ -98,7 +108,7 @@ def set_remote_config(experiment_config, port):
     if not response or not check_response(response):
         if response is not None:
             err_message = response.text
-            stderr_full_path = os.path.join(HOME_DIR, str(port), 'stderr')
+            stderr_full_path = os.path.join(NNICTL_HOME_DIR, str(port), 'stderr')
             with open(stderr_full_path, 'a+') as fout:
                 fout.write(json.dumps(json.loads(err_message), indent=4, sort_keys=True, separators=(',', ':')))
         return False, err_message
@@ -115,7 +125,8 @@ def set_pai_config(experiment_config, port):
     if not response or not response.status_code == 200:
         if response is not None:
             err_message = response.text
-            with open(STDERR_FULL_PATH, 'a+') as fout:
+            stderr_full_path = os.path.join(NNICTL_HOME_DIR, str(port), 'stderr')
+            with open(stderr_full_path, 'a+') as fout:
                 fout.write(json.dumps(json.loads(err_message), indent=4, sort_keys=True, separators=(',', ':')))
         return False, err_message
 
@@ -180,7 +191,7 @@ def set_experiment(experiment_config, mode, port):
     if check_response(response):
         return response
     else:
-        stderr_full_path = os.path.join(HOME_DIR, str(port), 'stderr')
+        stderr_full_path = os.path.join(NNICTL_HOME_DIR, str(port), 'stderr')
         with open(stderr_full_path, 'a+') as fout:
             fout.write(json.dumps(json.loads(response.text), indent=4, sort_keys=True, separators=(',', ':')))
         print_error('Setting experiment error, error message is {}'.format(response.text))
@@ -189,14 +200,8 @@ def set_experiment(experiment_config, mode, port):
 def launch_experiment(args, experiment_config, mode, experiment_id=None):
     '''follow steps to start rest server and start experiment'''
     nni_config = Config(args.port)
-    #Check if there is an experiment running
-    origin_rest_pid = nni_config.get_config('restServerPid')
-    if origin_rest_pid and detect_process(origin_rest_pid):
-        print_error('There is an experiment running, please stop it first...')
-        print_normal('You can use \'nnictl stop\' command to stop an experiment!')
-        exit(1)
     # start rest server
-    rest_process = start_rest_server(args.port, experiment_config['trainingServicePlatform'], mode, experiment_id)
+    rest_process, start_time = start_rest_server(args.port, experiment_config['trainingServicePlatform'], mode, experiment_id)
     nni_config.set_config('restServerPid', rest_process.pid)
     # Deal with annotation
     if experiment_config.get('useAnnotation'):
@@ -233,7 +238,7 @@ def launch_experiment(args, experiment_config, mode, experiment_id=None):
         print_normal('Setting remote config...')
         config_result, err_msg = set_remote_config(experiment_config, args.port)
         if config_result:
-            print_normal('Success!')
+            print_normal('Successfully set remote config!')
         else:
             print_error('Failed! Error is: {}'.format(err_msg))
             try:
@@ -288,7 +293,13 @@ def launch_experiment(args, experiment_config, mode, experiment_id=None):
         except Exception:
             raise Exception(ERROR_INFO % 'Restful server stopped!')
         exit(1)
-    print_normal(EXPERIMENT_SUCCESS_INFO % (experiment_id, args.port))
+    web_ui_url_list = get_web_ui_urls(args.port)
+
+    #save experiment information
+    experiment_config = Experiments()
+    experiment_config.add_experiment(experiment_id, args.port, start_time)
+
+    print_normal(EXPERIMENT_SUCCESS_INFO % (experiment_id, ' '.join(web_ui_url_list)))
 
 def resume_experiment(args):
     '''resume an experiment'''
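After a successful start, `launch_experiment` records `experiment_id`, `args.port` and `start_time` through `Experiments.add_experiment`, and `EXPERIMENT_ID_INFO` provides a `%s` slot for a block of experiment information. The sketch below renders the stored `{id: [port, start_time]}` mapping into that template. The row layout and the name `format_experiment_list` are assumptions for illustration; the actual output of the nnictl list command is not part of this diff.

```python
# Abbreviated copy of EXPERIMENT_ID_INFO from constants.py (separator shortened).
EXPERIMENT_ID_INFO = '----------\n' \
                     ' Experiment information\n' \
                     '%s\n' \
                     '----------\n'


def format_experiment_list(experiments):
    '''Render {experiment_id: [port, start_time]}, the shape Experiments stores.

    Hypothetical row layout; rows are sorted by id so the output is stable.
    '''
    rows = ['Id: %s    Port: %s    StartTime: %s' % (exp_id, port, start_time)
            for exp_id, (port, start_time) in sorted(experiments.items())]
    return EXPERIMENT_ID_INFO % '\n'.join(rows)


if __name__ == '__main__':
    print(format_experiment_list({
        'b2': [8081, '2018-10-16 10:05:00'],
        'a1': [8080, '2018-10-16 10:00:00'],
    }))
```

Since the registry is keyed by experiment id and stores the port alongside it, a listing like this is enough to tell which port each running experiment occupies.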