This repository has been archived by the owner on Sep 18, 2024. It is now read-only.

Dev hyperband #405

Merged
merged 60 commits on Nov 30, 2018

Commits
cc2632d
support hyperband
QuanluZhang Oct 30, 2018
65341ed
add example for hyperband
QuanluZhang Oct 30, 2018
69f9cb5
register Hyperband in tuner
QuanluZhang Oct 30, 2018
2e2bc80
after debug
QuanluZhang Oct 30, 2018
a556455
update doc
QuanluZhang Oct 30, 2018
55041a0
trivial change
QuanluZhang Oct 30, 2018
6793127
update spec validation of yaml config
QuanluZhang Oct 31, 2018
594f3a0
modify nnictl launcher
QuanluZhang Oct 31, 2018
cfa8875
modify nnimanager and util to support advisor
QuanluZhang Oct 31, 2018
4a8c8a9
Quick fix nnictl config logic (#289)
SparkSnail Oct 31, 2018
2442c4a
refactor sdk main
QuanluZhang Oct 31, 2018
553c386
Merge branch 'master' of https://github.com/Microsoft/nni into dev-hy…
QuanluZhang Oct 31, 2018
9e3e60c
update unit test accordingly
QuanluZhang Oct 31, 2018
c5926ef
update example's config file
QuanluZhang Oct 31, 2018
3250263
update restserver validation
QuanluZhang Oct 31, 2018
69276f0
PR merge to 0.3 (#297)
scarlett2018 Nov 1, 2018
530179c
remove files
QuanluZhang Nov 1, 2018
b882a1b
update
QuanluZhang Nov 1, 2018
3e7b32b
Merge branch 'master' of https://github.com/Microsoft/nni into dev-hy…
QuanluZhang Nov 1, 2018
1070ecf
remove enas readme (#292)
chicm-ms Nov 1, 2018
c97890f
support checkpoint directory
QuanluZhang Nov 1, 2018
0587ebd
Fix datastore performance issue (#301)
chicm-ms Nov 1, 2018
60d9a3f
fix pylint
QuanluZhang Nov 1, 2018
061a4a4
Fix nnictl in v0.3 (#299)
SparkSnail Nov 1, 2018
0362f8a
Merge branch 'master' of https://github.com/Microsoft/nni into dev-hy…
QuanluZhang Nov 1, 2018
ec093d6
Merge branch 'v0.3' of https://github.com/Microsoft/nni into dev-hype…
QuanluZhang Nov 1, 2018
124fae9
modify log
QuanluZhang Nov 2, 2018
f90fad0
Merge branch 'master' of https://github.com/Microsoft/nni into dev-hy…
QuanluZhang Nov 5, 2018
0b3a07d
Merge branch 'master' of https://github.com/Microsoft/nni into dev-hy…
QuanluZhang Nov 7, 2018
b2797ab
trivial changes
QuanluZhang Nov 7, 2018
833cb8b
update example
QuanluZhang Nov 7, 2018
dd5304e
Merge branch 'master' of https://github.com/Microsoft/nni into dev-hy…
QuanluZhang Nov 9, 2018
671db95
update makefile
QuanluZhang Nov 11, 2018
41c49d1
update launcher.py to fix the problem of finding main.js
QuanluZhang Nov 11, 2018
a1a0820
Merge branch 'master' of https://github.com/Microsoft/nni into dev-hy…
QuanluZhang Nov 11, 2018
3f75235
Merge branch 'fix-makefile' into dev-hyperband
QuanluZhang Nov 11, 2018
79d7500
debug
QuanluZhang Nov 12, 2018
e9978ce
Merge branch 'master' of https://github.com/Microsoft/nni into dev-hy…
QuanluZhang Nov 12, 2018
fa54447
Merge branch 'master' of https://github.com/Microsoft/nni into dev-hy…
QuanluZhang Nov 12, 2018
eb60c4c
add hyperparameter info into trial_end api
QuanluZhang Nov 12, 2018
80ba95e
fix bug and update example
QuanluZhang Nov 21, 2018
45b1561
Merge branch 'master' of https://github.com/Microsoft/nni into dev-hy…
QuanluZhang Nov 28, 2018
04b7a9a
fix error induced by merge
QuanluZhang Nov 28, 2018
c26908b
support initialize
QuanluZhang Nov 28, 2018
4e718a1
Merge branch 'master' of https://github.com/Microsoft/nni into dev-hy…
QuanluZhang Nov 29, 2018
1a4a1d0
add doc for hyperband
QuanluZhang Nov 29, 2018
1063025
fix bugs and add config_pai
Nov 30, 2018
6c7c8c0
fix bugs and add config_pai
Nov 30, 2018
21b5be4
fix bugs and add config_pai
Nov 30, 2018
e22ac0b
fix bugs and add config_pai
Nov 30, 2018
c3c8022
Merge pull request #1 from Crysple/test-hyperband
QuanluZhang Nov 30, 2018
8a88c90
update doc
QuanluZhang Nov 30, 2018
31ba2c7
Merge branch 'dev-hyperband' of github.com:QuanluZhang/nni into dev-h…
QuanluZhang Nov 30, 2018
c9f3776
add doc for advisor
QuanluZhang Nov 30, 2018
22f6387
fit
Nov 30, 2018
83fdb77
modification based on hui's comments
QuanluZhang Nov 30, 2018
6558951
Merge pull request #2 from Crysple/test-hyperband
QuanluZhang Nov 30, 2018
fc02837
Merge branch 'master' of https://github.com/Microsoft/nni into dev-hy…
QuanluZhang Nov 30, 2018
a8ae5ce
Merge branch 'dev-hyperband' of github.com:QuanluZhang/nni into dev-h…
QuanluZhang Nov 30, 2018
940fb34
update doc
QuanluZhang Nov 30, 2018
5 changes: 4 additions & 1 deletion docs/howto_2_CustomizedTuner.md
@@ -6,7 +6,7 @@ So, if a user wants to implement a customized Tuner, she/he only needs to:

1) Inherit a tuner of a base Tuner class
2) Implement receive_trial_result and generate_parameter function
3) Write a script to run Tuner
3) Configure your customized tuner in the experiment YAML config file
Contributor

Why is this file name ("howto_2_CustomizedTuner") so strange?

Contributor Author

renamed by @scarlett2018


Here is an example:

@@ -93,3 +93,6 @@ For more detailed examples, see the tuners below; a minimal sketch follows the list:
> * [evolution-tuner](../src/sdk/pynni/nni/evolution_tuner)
> * [hyperopt-tuner](../src/sdk/pynni/nni/hyperopt_tuner)
> * [evolution-based-customized-tuner](../examples/tuners/ga_customer_tuner)
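
Below is a minimal, self-contained sketch of steps 1) and 2): a toy random-search tuner. It is illustrative only; the `Tuner` base class and the method signatures (`update_search_space`, `generate_parameters`, `receive_trial_result`) follow the NNI SDK at the time of this PR, so check `src/sdk/pynni/nni/tuner.py` in your version before relying on them.

```python
import random

from nni.tuner import Tuner


class RandomTuner(Tuner):
    '''Toy tuner: samples every `choice` parameter uniformly at random.'''

    def __init__(self, optimize_mode='maximize'):
        super(RandomTuner, self).__init__()
        self.optimize_mode = optimize_mode
        self.search_space = {}

    def update_search_space(self, search_space):
        # Called with the content of search_space.json
        self.search_space = search_space

    def generate_parameters(self, parameter_id):
        # Return one configuration for a new trial
        return {name: random.choice(spec['_value'])
                for name, spec in self.search_space.items()
                if spec['_type'] == 'choice'}

    def receive_trial_result(self, parameter_id, parameters, value):
        # A real tuner would use `value` to guide future sampling
        pass
```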

## Write a more advanced automl algorithm
The methods above are usually enough to write a general tuner. However, users may also want the methods of an assessor (e.g., `assess_trial`, `trial_end`) in order to build a more powerful automl algorithm. Therefore, we have another concept called `advisor`, which directly inherits from `MsgDispatcherBase` in [`src/sdk/pynni/nni/msg_dispatcher_base.py`](../src/sdk/pynni/nni/msg_dispatcher_base.py). Please refer to [here](howto_3_CustomizedAdvisor) for how to write a customized advisor.
Contributor

Could you explain in more detail here? When do we need to use 'advisor'?

Contributor Author

sure.

Contributor Author

fixed, thx

39 changes: 39 additions & 0 deletions docs/howto_3_CustomizedAdvisor.md
@@ -0,0 +1,39 @@
# **How To** - Customize Your Own Advisor

*Advisor targets the scenario where an automl algorithm needs the methods of both a tuner and an assessor. An advisor is similar to a tuner in that it receives trial configuration requests and final results, and generates trial configurations. It is also similar to an assessor in that it receives intermediate results and trials' end states, and can send commands to kill trials. Note that if you use an advisor, a tuner and an assessor are not allowed to be used at the same time.*

So, if a user wants to implement a customized Advisor, she/he only needs to:

1) Define an Advisor inheriting from the MsgDispatcherBase class
2) Implement the handle_xxx methods
Contributor

handle_xxx methods from MsgDispatcherBase?

Contributor Author

ok, I can list all the methods.

Contributor Author

fixed, thx

3) Configure your customized Advisor in the experiment YAML config file

Here is an example:

**1) Define an Advisor inheriting from the MsgDispatcherBase class**
```python
from nni.msg_dispatcher_base import MsgDispatcherBase

class CustomizedAdvisor(MsgDispatcherBase):
    def __init__(self, ...):
        ...
```

**2) Implement the handle_xxx methods**

Please refer to the implementation of Hyperband ([src/sdk/pynni/nni/hyperband_advisor/hyperband_advisor.py](../src/sdk/pynni/nni/hyperband_advisor/hyperband_advisor.py)) for how to implement the methods.
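
For orientation before diving into Hyperband, here is a minimal illustrative advisor (not the Hyperband implementation) that answers every trial request with a random sample from the search space. The `handle_xxx` names, the `CommandType`/`send` protocol helpers, and the `NewTrialJob` payload shape are assumptions based on the SDK at the time of this PR; verify them against `msg_dispatcher_base.py` and the Hyperband advisor in your version.

```python
import random

import json_tricks

from nni.msg_dispatcher_base import MsgDispatcherBase
from nni.protocol import CommandType, send


class RandomAdvisor(MsgDispatcherBase):
    '''Toy advisor: hands out random configurations, ignores metrics.'''

    def __init__(self, optimize_mode='maximize'):
        super(RandomAdvisor, self).__init__()
        self.optimize_mode = optimize_mode
        self.search_space = {}
        self.parameter_id = 0

    def handle_initialize(self, data):
        # `data` is the search space; reply Initialized so trials can start
        self.handle_update_search_space(data)
        send(CommandType.Initialized, '')
        return True

    def handle_update_search_space(self, data):
        self.search_space = data
        return True

    def handle_request_trial_jobs(self, data):
        # `data` is the number of trial jobs the manager asks for
        for _ in range(data):
            self.parameter_id += 1
            parameters = {name: random.choice(spec['_value'])
                          for name, spec in self.search_space.items()
                          if spec['_type'] == 'choice'}
            send(CommandType.NewTrialJob, json_tricks.dumps({
                'parameter_id': self.parameter_id,
                'parameter_source': 'algorithm',
                'parameters': parameters}))
        return True

    def handle_report_metric_data(self, data):
        # A real advisor (e.g. Hyperband) uses intermediate/final metrics here
        return True

    def handle_trial_end(self, data):
        return True
```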

**3) Configure your customized Advisor in the experiment YAML config file**

This is similar to tuner and assessor: NNI needs to locate your customized Advisor class and instantiate it, so you need to specify the location of the customized Advisor class and pass literal values as parameters to its \_\_init__ constructor.

```yaml
advisor:
  codeDir: /home/abc/myadvisor
  classFileName: my_customized_advisor.py
  className: CustomizedAdvisor
  # Any parameter that needs to be passed to your advisor class's __init__
  # constructor can be specified in this optional classArgs field, for example
  classArgs:
    arg1: value1
```

Contributor

Could I dynamically change the parameters here without having to change the code in nni? For example, advisor1 has parameter arg1 and advisor1 has parameter arg2?

Contributor Author

advisor2? and what do you mean?

24 changes: 24 additions & 0 deletions examples/trials/mnist-hyperband/config.yml
@@ -0,0 +1,24 @@
authorName: default
experimentName: example_mnist
trialConcurrency: 2
maxExecDuration: 100h
maxTrialNum: 10000
#choice: local, remote, pai
trainingServicePlatform: local
searchSpacePath: search_space.json
#choice: true, false
useAnnotation: false
advisor:
  #choice: Hyperband
  builtinAdvisorName: Hyperband
  classArgs:
    #R: the maximum STEPS
Contributor

maximum step for what?

Contributor Author

This is not straightforward; please refer to our doc of Hyperband.

Contributor Author

fixed, thx

    R: 100
    #eta: proportion of discarded trials
    eta: 3
    #choice: maximize, minimize
    optimize_mode: maximize
trial:
  command: python3 mnist.py
  codeDir: .
  gpuNum: 0
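
To make `R` and `eta` concrete, the sketch below computes the bracket schedule Hyperband derives from them, following the Hyperband paper (Li et al.); NNI's advisor may differ in rounding details, so treat the printed numbers as illustrative.

```python
import math

R, eta = 100, 3                        # as in the config above
s_max = int(math.log(R, eta))          # brackets s = 4, 3, 2, 1, 0
B = (s_max + 1) * R                    # total budget per Hyperband cycle

for s in reversed(range(s_max + 1)):
    n = int(math.ceil(B / R * eta ** s / (s + 1)))  # initial configurations
    r = R * eta ** (-s)                             # initial STEPS for each
    print('bracket s=%d' % s)
    for i in range(s + 1):
        n_i = int(n * eta ** (-i))     # configurations kept in round i
        r_i = int(r * eta ** i)        # STEPS each one runs for
        print('  run %d configs for %d STEPS' % (n_i, r_i))
```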
39 changes: 39 additions & 0 deletions examples/trials/mnist-hyperband/config_pai.yml
@@ -0,0 +1,39 @@
authorName: default
experimentName: example_mnist_hyperband
maxExecDuration: 1h
maxTrialNum: 10000
trialConcurrency: 10
#choice: local, remote, pai
trainingServicePlatform: pai
searchSpacePath: search_space.json
#choice: true, false
useAnnotation: false
advisor:
  #choice: Hyperband
  builtinAdvisorName: Hyperband
  classArgs:
    #R: the maximum STEPS
    R: 100
    #eta: proportion of discarded trials
    eta: 3
    #choice: maximize, minimize
    optimize_mode: maximize
trial:
  command: python3 mnist.py
  codeDir: .
  gpuNum: 0
  cpuNum: 1
  memoryMB: 8196
  #The docker image to run nni job on pai
  image: openpai/pai.example.tensorflow
  #The hdfs directory to store data on pai, format 'hdfs://host:port/directory'
  dataDir: hdfs://10.10.10.10:9000/username/nni
  #The hdfs directory to store output data generated by nni, format 'hdfs://host:port/directory'
  outputDir: hdfs://10.10.10.10:9000/username/nni
paiConfig:
  #The username to login pai
  userName: username
  #The password to login pai
  passWord: password
  #The host of restful server of pai
  host: 10.10.10.10
231 changes: 231 additions & 0 deletions examples/trials/mnist-hyperband/mnist.py
@@ -0,0 +1,231 @@
"""A deep MNIST classifier using convolutional layers."""
Contributor

I can't see much difference from the old mnist example. Could we reuse the old one and add a config for hyperband?

Contributor Author

No, the STEPS in the hyperparameters is very important.


import logging
import math
import tempfile
import tensorflow as tf

from tensorflow.examples.tutorials.mnist import input_data

import nni

FLAGS = None

logger = logging.getLogger('mnist_AutoML')


class MnistNetwork(object):
    '''
    MnistNetwork is for initializing and building basic network for mnist.
    '''
    def __init__(self,
                 channel_1_num,
                 channel_2_num,
                 conv_size,
                 hidden_size,
                 pool_size,
                 learning_rate,
                 x_dim=784,
                 y_dim=10):
        self.channel_1_num = channel_1_num
        self.channel_2_num = channel_2_num
        self.conv_size = conv_size
        self.hidden_size = hidden_size
        self.pool_size = pool_size
        self.learning_rate = learning_rate
        self.x_dim = x_dim
        self.y_dim = y_dim

        self.images = tf.placeholder(tf.float32, [None, self.x_dim], name='input_x')
        self.labels = tf.placeholder(tf.float32, [None, self.y_dim], name='input_y')
        self.keep_prob = tf.placeholder(tf.float32, name='keep_prob')

        self.train_step = None
        self.accuracy = None

    def build_network(self):
        '''
        Building network for mnist
        '''

        # Reshape to use within a convolutional neural net.
        # Last dimension is for "features" - there is only one here, since images are
        # grayscale -- it would be 3 for an RGB image, 4 for RGBA, etc.
        with tf.name_scope('reshape'):
            try:
                input_dim = int(math.sqrt(self.x_dim))
            except:
                print(
                    'input dim cannot be sqrt and reshape. input dim: ' + str(self.x_dim))
                logger.debug(
Contributor

Why duplicate the error message to both stdout and the log?

Contributor Author

Copied from examples/trials/mnist/mnist.py; maybe better to keep it and fix this kind of problem in all the examples later :)

Contributor

Okay

                    'input dim cannot be sqrt and reshape. input dim: %s', str(self.x_dim))
                raise
            x_image = tf.reshape(self.images, [-1, input_dim, input_dim, 1])

        # First convolutional layer - maps one grayscale image to 32 feature maps.
        with tf.name_scope('conv1'):
            w_conv1 = weight_variable(
                [self.conv_size, self.conv_size, 1, self.channel_1_num])
            b_conv1 = bias_variable([self.channel_1_num])
            h_conv1 = tf.nn.relu(conv2d(x_image, w_conv1) + b_conv1)

        # Pooling layer - downsamples by 2X.
        with tf.name_scope('pool1'):
            h_pool1 = max_pool(h_conv1, self.pool_size)

        # Second convolutional layer -- maps 32 feature maps to 64.
        with tf.name_scope('conv2'):
            w_conv2 = weight_variable([self.conv_size, self.conv_size,
                                       self.channel_1_num, self.channel_2_num])
            b_conv2 = bias_variable([self.channel_2_num])
            h_conv2 = tf.nn.relu(conv2d(h_pool1, w_conv2) + b_conv2)

        # Second pooling layer.
        with tf.name_scope('pool2'):
            h_pool2 = max_pool(h_conv2, self.pool_size)

        # Fully connected layer 1 -- after 2 rounds of downsampling, our 28x28 image
        # is down to 7x7x64 feature maps -- maps this to 1024 features.
        last_dim = int(input_dim / (self.pool_size * self.pool_size))
        with tf.name_scope('fc1'):
            w_fc1 = weight_variable(
                [last_dim * last_dim * self.channel_2_num, self.hidden_size])
            b_fc1 = bias_variable([self.hidden_size])

            h_pool2_flat = tf.reshape(
                h_pool2, [-1, last_dim * last_dim * self.channel_2_num])
            h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, w_fc1) + b_fc1)

        # Dropout - controls the complexity of the model, prevents co-adaptation of features.
        with tf.name_scope('dropout'):
            h_fc1_drop = tf.nn.dropout(h_fc1, self.keep_prob)

        # Map the 1024 features to 10 classes, one for each digit
        with tf.name_scope('fc2'):
            w_fc2 = weight_variable([self.hidden_size, self.y_dim])
            b_fc2 = bias_variable([self.y_dim])
            y_conv = tf.matmul(h_fc1_drop, w_fc2) + b_fc2

        with tf.name_scope('loss'):
            cross_entropy = tf.reduce_mean(
                tf.nn.softmax_cross_entropy_with_logits(labels=self.labels, logits=y_conv))
        with tf.name_scope('adam_optimizer'):
            self.train_step = tf.train.AdamOptimizer(
                self.learning_rate).minimize(cross_entropy)

        with tf.name_scope('accuracy'):
            correct_prediction = tf.equal(
                tf.argmax(y_conv, 1), tf.argmax(self.labels, 1))
            self.accuracy = tf.reduce_mean(
                tf.cast(correct_prediction, tf.float32))


def conv2d(x_input, w_matrix):
    """conv2d returns a 2d convolution layer with full stride."""
    return tf.nn.conv2d(x_input, w_matrix, strides=[1, 1, 1, 1], padding='SAME')


def max_pool(x_input, pool_size):
    """max_pool downsamples a feature map by 2X."""
    return tf.nn.max_pool(x_input, ksize=[1, pool_size, pool_size, 1],
                          strides=[1, pool_size, pool_size, 1], padding='SAME')


def weight_variable(shape):
    """weight_variable generates a weight variable of a given shape."""
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)


def bias_variable(shape):
    """bias_variable generates a bias variable of a given shape."""
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)


def main(params):
    '''
    Main function, build mnist network, run and send result to NNI.
    '''
    # Import data
    mnist = input_data.read_data_sets(params['data_dir'], one_hot=True)
    print('Mnist download data done.')
    logger.debug('Mnist download data done.')

    # Create the model
    # Build the graph for the deep net
    mnist_network = MnistNetwork(channel_1_num=params['channel_1_num'],
                                 channel_2_num=params['channel_2_num'],
                                 conv_size=params['conv_size'],
                                 hidden_size=params['hidden_size'],
                                 pool_size=params['pool_size'],
                                 learning_rate=params['learning_rate'])
    mnist_network.build_network()
    logger.debug('Mnist build network done.')

    # Write log
    graph_location = tempfile.mkdtemp()
    logger.debug('Saving graph to: %s', graph_location)
    train_writer = tf.summary.FileWriter(graph_location)
    train_writer.add_graph(tf.get_default_graph())

    test_acc = 0.0
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for i in range(params['batch_num']):
            batch = mnist.train.next_batch(params['batch_size'])
            mnist_network.train_step.run(feed_dict={mnist_network.images: batch[0],
                                                    mnist_network.labels: batch[1],
                                                    mnist_network.keep_prob: 1 - params['dropout_rate']}
                                         )

            if i % 10 == 0:
                test_acc = mnist_network.accuracy.eval(
                    feed_dict={mnist_network.images: mnist.test.images,
                               mnist_network.labels: mnist.test.labels,
                               mnist_network.keep_prob: 1.0})

                nni.report_intermediate_result(test_acc)
                logger.debug('test accuracy %g', test_acc)
                logger.debug('Pipe send intermediate result done.')

        test_acc = mnist_network.accuracy.eval(
            feed_dict={mnist_network.images: mnist.test.images,
                       mnist_network.labels: mnist.test.labels,
                       mnist_network.keep_prob: 1.0})

        nni.report_final_result(test_acc)
        logger.debug('Final result is %g', test_acc)
        logger.debug('Send final result done.')


def generate_default_params():
    '''
    Generate default parameters for mnist network.
    '''
    params = {
        'data_dir': '/tmp/tensorflow/mnist/input_data',
        'dropout_rate': 0.5,
        'channel_1_num': 32,
        'channel_2_num': 64,
        'conv_size': 5,
        'pool_size': 2,
        'hidden_size': 1024,
        'learning_rate': 1e-4,
        'batch_size': 32}
    return params


if __name__ == '__main__':
    try:
        # get parameters from the tuner
        RCV_PARAMS = nni.get_next_parameter()
        logger.debug(RCV_PARAMS)
        # run
        params = generate_default_params()
        params.update(RCV_PARAMS)
        params['batch_num'] = RCV_PARAMS['STEPS'] * 10
Contributor

why times 10 here?

Contributor Author

STEPS controls how much resource is allocated to a configuration; the minimum number of STEPS is 1, which is too small for training.

Contributor

I am a little confused. The default parameters and the search space have no 'STEP', so where does 'STEP' come from?

        main(params)
    except Exception as exception:
        logger.exception(exception)
        raise
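
On the STEPS question in the thread above: the search space itself defines no STEPS entry; per the author's replies, the Hyperband advisor injects it into each generated configuration as that trial's resource budget. A hypothetical example of what `nni.get_next_parameter()` might return under this PR (the exact shape is an assumption):

```python
# Illustrative only; the field values are made up, and 'STEPS' is the
# budget Hyperband assigns to this configuration.
RCV_PARAMS = {
    'dropout_rate': 0.7,
    'conv_size': 5,
    'hidden_size': 1024,
    'batch_size': 32,
    'learning_rate': 0.001,
    'STEPS': 11,
}
```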
7 changes: 7 additions & 0 deletions examples/trials/mnist-hyperband/search_space.json
@@ -0,0 +1,7 @@
{
"dropout_rate":{"_type":"uniform","_value":[0.5,0.9]},
"conv_size":{"_type":"choice","_value":[2,3,5,7]},
"hidden_size":{"_type":"choice","_value":[124, 512, 1024]},
"batch_size": {"_type":"choice","_value":[8, 16, 32, 64]},
"learning_rate":{"_type":"choice","_value":[0.0001, 0.001, 0.01, 0.1]}
}
2 changes: 1 addition & 1 deletion pylintrc
@@ -15,4 +15,4 @@ max-attributes=15
const-naming-style=any

disable=duplicate-code,
        super-init-not-called
        super-init-not-called
11 changes: 10 additions & 1 deletion src/nni_manager/common/manager.ts
@@ -35,7 +35,7 @@ interface ExperimentParams {
    trainingServicePlatform: string;
    multiPhase?: boolean;
    multiThread?: boolean;
    tuner: {
    tuner?: {
        className: string;
        builtinTunerName?: string;
        codeDir?: string;
@@ -53,6 +53,15 @@
        checkpointDir: string;
        gpuNum?: number;
    };
    advisor?: {
        className: string;
        builtinAdvisorName?: string;
        codeDir?: string;
        classArgs?: any;
        classFileName?: string;
        checkpointDir: string;
        gpuNum?: number;
    };
    clusterMetaData?: {
        key: string;
        value: string;