Dev hyperband #405
Changes from 54 commits
@@ -6,7 +6,7 @@ So, if a user wants to implement a customized Tuner, they only need to:

1) Inherit a tuner of a base Tuner class
2) Implement receive_trial_result and generate_parameter function
- 3) Write a script to run Tuner
+ 3) Configure your customized tuner in experiment yaml config file

Here is an example:
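The example itself lies outside this diff hunk. As a stand-in, here is a minimal sketch of steps 1 and 2, assuming the base class is `nni.tuner.Tuner`; the method names and signatures are my reading of the SDK (note the doc text says `generate_parameter`, while the built-in tuners linked below use the plural `generate_parameters`), so check them against the source:

```python
from nni.tuner import Tuner

class CustomizedTuner(Tuner):
    """Illustrative skeleton, not a working tuner."""

    def generate_parameters(self, parameter_id):
        # Return the next hyperparameter configuration as a
        # JSON-serializable dict, e.g. {'learning_rate': 0.01}.
        raise NotImplementedError

    def receive_trial_result(self, parameter_id, parameters, value):
        # Called when the trial started with `parameters` reports its
        # final metric `value`; update the tuner's internal state here.
        raise NotImplementedError
```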
@@ -93,3 +93,6 @@ For more detailed examples, see:
> * [evolution-tuner](../src/sdk/pynni/nni/evolution_tuner)
> * [hyperopt-tuner](../src/sdk/pynni/nni/hyperopt_tuner)
> * [evolution-based-customized-tuner](../examples/tuners/ga_customer_tuner)

## Write a more advanced automl algorithm
The methods above are usually enough to write a general tuner. However, users may also want more methods, for example, the methods in assessor (e.g., `assess_trial`, `trial_end`), in order to build a more powerful automl algorithm. Therefore, we have another concept called `advisor`, which directly inherits from `MsgDispatcherBase` in [`src/sdk/pynni/nni/msg_dispatcher_base.py`](../src/sdk/pynni/nni/msg_dispatcher_base.py). Please refer to [here](howto_3_CustomizedAdvisor) for how to write a customized advisor.
> **Review comment:** Could you explain in more detail here? When do we need to use 'advisor'?
>
> **Reply:** Sure.
>
> **Reply:** Fixed, thx.
@@ -0,0 +1,39 @@
# **How To** - Customize Your Own Advisor

*Advisor targets the scenario where the automl algorithm wants the methods of both tuner and assessor. Advisor is similar to tuner in that it receives trial configuration requests and final results, and generates trial configurations. It is also similar to assessor in that it receives intermediate results and trials' end states, and can send commands to kill trials. Note that if you use Advisor, tuner and assessor are not allowed to be used at the same time.*

So, if a user wants to implement a customized Advisor, they only need to:

1) Define an Advisor inheriting from the MsgDispatcherBase class
2) Implement the handle_xxx methods
> **Review comment:** handle_xxx methods from MsgDispatcherBase?
>
> **Reply:** OK, I can list all the methods.
>
> **Reply:** Fixed, thx.
3) Configure your customized Advisor in experiment yaml config file

Here is an example:

**1) Define an Advisor inheriting from the MsgDispatcherBase class**

```python
from nni.msg_dispatcher_base import MsgDispatcherBase

class CustomizedAdvisor(MsgDispatcherBase):
    def __init__(self, *args, **kwargs):
        # accept whatever classArgs the experiment config passes in
        ...
```
**2) Implement the handle_xxx methods**

Please refer to the implementation of Hyperband ([src/sdk/pynni/nni/hyperband_advisor/hyperband_advisor.py](../src/sdk/pynni/nni/hyperband_advisor/hyperband_advisor.py)) for how to implement the methods.
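Since the full set of `handle_xxx` methods lives in `MsgDispatcherBase`, here is a rough sketch of the shape such an advisor takes. The method names below are assumptions inferred from the `handle_xxx` convention; consult `src/sdk/pynni/nni/msg_dispatcher_base.py` for the authoritative list and signatures:

```python
from nni.msg_dispatcher_base import MsgDispatcherBase

class SketchAdvisor(MsgDispatcherBase):
    """Illustrative skeleton only, not the Hyperband implementation."""

    def handle_initialize(self, data):
        # 'data' carries the search space when the experiment starts.
        ...

    def handle_request_trial_jobs(self, data):
        # 'data' is the number of trial configurations requested;
        # generate and dispatch that many parameter sets.
        ...

    def handle_report_metric_data(self, data):
        # Receives both intermediate and final metrics from trials.
        ...

    def handle_trial_end(self, data):
        # Notified when a trial finishes (completed, failed, or killed).
        ...
```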
**3) Configure your customized Advisor in experiment yaml config file**

Similar to tuner and assessor, NNI needs to locate your customized Advisor class and instantiate it, so you need to specify the location of the customized Advisor class and pass literal values as parameters to the \_\_init__ constructor.

```yaml
advisor:
  codeDir: /home/abc/myadvisor
  classFileName: my_customized_advisor.py
  className: CustomizedAdvisor
  # Any parameter that needs to be passed to your advisor class's __init__
  # constructor can be specified in this optional classArgs field, for example
  classArgs:
    arg1: value1
```

> **Review comment:** Could I dynamically change the parameters here without having to change the code in nni? For example, advisor1 has parameter arg1 and advisor1 has parameter arg2?
>
> **Reply:** advisor2? And what do you mean?
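As the step-3 paragraph says, the entries under `classArgs` are passed as literal values to your advisor's `__init__`, so each advisor class can declare whatever constructor parameters it needs without changing NNI itself. A sketch of the effective semantics (not NNI's actual loading code):

```python
# Given the classArgs above, NNI effectively instantiates:
advisor = CustomizedAdvisor(arg1='value1')
```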
@@ -0,0 +1,24 @@
```yaml
authorName: default
experimentName: example_mnist
trialConcurrency: 2
maxExecDuration: 100h
maxTrialNum: 10000
#choice: local, remote, pai
trainingServicePlatform: local
searchSpacePath: search_space.json
#choice: true, false
useAnnotation: false
advisor:
  #choice: Hyperband
  builtinAdvisorName: Hyperband
  classArgs:
    #R: the maximum STEPS
    R: 100
    #eta: proportion of discarded trials
    eta: 3
    #choice: maximize, minimize
    optimize_mode: maximize
trial:
  command: python3 mnist.py
  codeDir: .
  gpuNum: 0
```

> **Review comment** (on `#R: the maximum STEPS`): Maximum STEPS for what?
>
> **Reply:** This is not straightforward; please refer to our doc of Hyperband.
>
> **Reply:** Fixed, thx.
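For intuition about `R` and `eta`: in standard Hyperband (Li et al., "Hyperband: A Novel Bandit-Based Approach to Hyperparameter Optimization"), `R` caps the budget (here, STEPS) any single configuration receives, and only `1/eta` of the configurations survive each round of successive halving. The sketch below computes the bracket schedule that follows from these two values; it is my assumption that the built-in advisor follows this scheme, and NNI's exact scheduling may differ in detail:

```python
import math

# Bracket schedule standard Hyperband derives from R and eta
# (illustrative; check NNI's hyperband_advisor for the real logic).
R, eta = 100, 3
s_max = int(math.log(R, eta))   # s_max = 4 for R=100, eta=3; brackets s = 4..0
B = (s_max + 1) * R             # budget per bracket

for s in range(s_max, -1, -1):
    n = int(math.ceil(B / R * eta ** s / (s + 1)))  # initial configurations
    r = R * eta ** (-s)                             # initial STEPS for each
    print('bracket s=%d: start %d configs at %.1f STEPS each' % (s, n, r))
```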
@@ -0,0 +1,39 @@
```yaml
authorName: default
experimentName: example_mnist_hyperband
maxExecDuration: 1h
maxTrialNum: 10000
trialConcurrency: 10
#choice: local, remote, pai
trainingServicePlatform: pai
searchSpacePath: search_space.json
#choice: true, false
useAnnotation: false
advisor:
  #choice: Hyperband
  builtinAdvisorName: Hyperband
  classArgs:
    #R: the maximum STEPS
    R: 100
    #eta: proportion of discarded trials
    eta: 3
    #choice: maximize, minimize
    optimize_mode: maximize
trial:
  command: python3 mnist.py
  codeDir: .
  gpuNum: 0
  cpuNum: 1
  memoryMB: 8196
  #The docker image to run nni job on pai
  image: openpai/pai.example.tensorflow
  #The hdfs directory to store data on pai, format 'hdfs://host:port/directory'
  dataDir: hdfs://10.10.10.10:9000/username/nni
  #The hdfs directory to store output data generated by nni, format 'hdfs://host:port/directory'
  outputDir: hdfs://10.10.10.10:9000/username/nni
paiConfig:
  #The username to login pai
  userName: username
  #The password to login pai
  passWord: password
  #The host of restful server of pai
  host: 10.10.10.10
```
@@ -0,0 +1,231 @@

> **Review comment:** I can't see much difference from the old mnist example. Could we reuse the old one and add a config for hyperband?
>
> **Reply:** No. The STEPS in the hyperparameters is very important.

```python
"""A deep MNIST classifier using convolutional layers."""

import logging
import math
import tempfile
import tensorflow as tf

from tensorflow.examples.tutorials.mnist import input_data

import nni

FLAGS = None

logger = logging.getLogger('mnist_AutoML')


class MnistNetwork(object):
    '''
    MnistNetwork is for initializing and building basic network for mnist.
    '''
    def __init__(self,
                 channel_1_num,
                 channel_2_num,
                 conv_size,
                 hidden_size,
                 pool_size,
                 learning_rate,
                 x_dim=784,
                 y_dim=10):
        self.channel_1_num = channel_1_num
        self.channel_2_num = channel_2_num
        self.conv_size = conv_size
        self.hidden_size = hidden_size
        self.pool_size = pool_size
        self.learning_rate = learning_rate
        self.x_dim = x_dim
        self.y_dim = y_dim

        self.images = tf.placeholder(tf.float32, [None, self.x_dim], name='input_x')
        self.labels = tf.placeholder(tf.float32, [None, self.y_dim], name='input_y')
        self.keep_prob = tf.placeholder(tf.float32, name='keep_prob')

        self.train_step = None
        self.accuracy = None

    def build_network(self):
        '''
        Building network for mnist
        '''

        # Reshape to use within a convolutional neural net.
        # Last dimension is for "features" - there is only one here, since images are
        # grayscale -- it would be 3 for an RGB image, 4 for RGBA, etc.
        with tf.name_scope('reshape'):
            try:
                input_dim = int(math.sqrt(self.x_dim))
            except:
                print(
                    'input dim cannot be sqrt and reshape. input dim: ' + str(self.x_dim))
                logger.debug(
                    'input dim cannot be sqrt and reshape. input dim: %s', str(self.x_dim))
                raise
            x_image = tf.reshape(self.images, [-1, input_dim, input_dim, 1])
```

> **Review comment:** Why duplicate the error message to both stdout and the log?
>
> **Reply:** Copied from example/trials/mnist/mnist.py; maybe better to keep it and fix this kind of problem in all the examples later :)
>
> **Reply:** Okay.
```python
        # First convolutional layer - maps one grayscale image to 32 feature maps.
        with tf.name_scope('conv1'):
            w_conv1 = weight_variable(
                [self.conv_size, self.conv_size, 1, self.channel_1_num])
            b_conv1 = bias_variable([self.channel_1_num])
            h_conv1 = tf.nn.relu(conv2d(x_image, w_conv1) + b_conv1)

        # Pooling layer - downsamples by 2X.
        with tf.name_scope('pool1'):
            h_pool1 = max_pool(h_conv1, self.pool_size)

        # Second convolutional layer -- maps 32 feature maps to 64.
        with tf.name_scope('conv2'):
            w_conv2 = weight_variable([self.conv_size, self.conv_size,
                                       self.channel_1_num, self.channel_2_num])
            b_conv2 = bias_variable([self.channel_2_num])
            h_conv2 = tf.nn.relu(conv2d(h_pool1, w_conv2) + b_conv2)

        # Second pooling layer.
        with tf.name_scope('pool2'):
            h_pool2 = max_pool(h_conv2, self.pool_size)

        # Fully connected layer 1 -- after 2 rounds of downsampling, our 28x28 image
        # is down to 7x7x64 feature maps -- maps this to 1024 features.
        last_dim = int(input_dim / (self.pool_size * self.pool_size))
        with tf.name_scope('fc1'):
            w_fc1 = weight_variable(
                [last_dim * last_dim * self.channel_2_num, self.hidden_size])
            b_fc1 = bias_variable([self.hidden_size])

            h_pool2_flat = tf.reshape(
                h_pool2, [-1, last_dim * last_dim * self.channel_2_num])
            h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, w_fc1) + b_fc1)

        # Dropout - controls the complexity of the model, prevents co-adaptation of features.
        with tf.name_scope('dropout'):
            h_fc1_drop = tf.nn.dropout(h_fc1, self.keep_prob)

        # Map the 1024 features to 10 classes, one for each digit
        with tf.name_scope('fc2'):
            w_fc2 = weight_variable([self.hidden_size, self.y_dim])
            b_fc2 = bias_variable([self.y_dim])
            y_conv = tf.matmul(h_fc1_drop, w_fc2) + b_fc2

        with tf.name_scope('loss'):
            cross_entropy = tf.reduce_mean(
                tf.nn.softmax_cross_entropy_with_logits(labels=self.labels, logits=y_conv))
        with tf.name_scope('adam_optimizer'):
            self.train_step = tf.train.AdamOptimizer(
                self.learning_rate).minimize(cross_entropy)

        with tf.name_scope('accuracy'):
            correct_prediction = tf.equal(
                tf.argmax(y_conv, 1), tf.argmax(self.labels, 1))
            self.accuracy = tf.reduce_mean(
                tf.cast(correct_prediction, tf.float32))


def conv2d(x_input, w_matrix):
    """conv2d returns a 2d convolution layer with full stride."""
    return tf.nn.conv2d(x_input, w_matrix, strides=[1, 1, 1, 1], padding='SAME')


def max_pool(x_input, pool_size):
    """max_pool downsamples a feature map by 2X."""
    return tf.nn.max_pool(x_input, ksize=[1, pool_size, pool_size, 1],
                          strides=[1, pool_size, pool_size, 1], padding='SAME')


def weight_variable(shape):
    """weight_variable generates a weight variable of a given shape."""
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)


def bias_variable(shape):
    """bias_variable generates a bias variable of a given shape."""
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)


def main(params):
    '''
    Main function, build mnist network, run and send result to NNI.
    '''
    # Import data
    mnist = input_data.read_data_sets(params['data_dir'], one_hot=True)
    print('Mnist download data done.')
    logger.debug('Mnist download data done.')

    # Create the model
    # Build the graph for the deep net
    mnist_network = MnistNetwork(channel_1_num=params['channel_1_num'],
                                 channel_2_num=params['channel_2_num'],
                                 conv_size=params['conv_size'],
                                 hidden_size=params['hidden_size'],
                                 pool_size=params['pool_size'],
                                 learning_rate=params['learning_rate'])
    mnist_network.build_network()
    logger.debug('Mnist build network done.')

    # Write log
    graph_location = tempfile.mkdtemp()
    logger.debug('Saving graph to: %s', graph_location)
    train_writer = tf.summary.FileWriter(graph_location)
    train_writer.add_graph(tf.get_default_graph())

    test_acc = 0.0
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for i in range(params['batch_num']):
            batch = mnist.train.next_batch(params['batch_size'])
            mnist_network.train_step.run(feed_dict={mnist_network.images: batch[0],
                                                    mnist_network.labels: batch[1],
                                                    mnist_network.keep_prob: 1 - params['dropout_rate']})

            if i % 10 == 0:
                test_acc = mnist_network.accuracy.eval(
                    feed_dict={mnist_network.images: mnist.test.images,
                               mnist_network.labels: mnist.test.labels,
                               mnist_network.keep_prob: 1.0})

                nni.report_intermediate_result(test_acc)
                logger.debug('test accuracy %g', test_acc)
                logger.debug('Pipe send intermediate result done.')

        test_acc = mnist_network.accuracy.eval(
            feed_dict={mnist_network.images: mnist.test.images,
                       mnist_network.labels: mnist.test.labels,
                       mnist_network.keep_prob: 1.0})

        nni.report_final_result(test_acc)
        logger.debug('Final result is %g', test_acc)
        logger.debug('Send final result done.')


def generate_default_params():
    '''
    Generate default parameters for mnist network.
    '''
    params = {
        'data_dir': '/tmp/tensorflow/mnist/input_data',
        'dropout_rate': 0.5,
        'channel_1_num': 32,
        'channel_2_num': 64,
        'conv_size': 5,
        'pool_size': 2,
        'hidden_size': 1024,
        'learning_rate': 1e-4,
        'batch_size': 32}
    return params


if __name__ == '__main__':
    try:
        # get parameters from tuner
        RCV_PARAMS = nni.get_next_parameter()
        logger.debug(RCV_PARAMS)
        # run
        params = generate_default_params()
        params.update(RCV_PARAMS)
        params['batch_num'] = RCV_PARAMS['STEPS'] * 10
        main(params)
    except Exception as exception:
        logger.exception(exception)
        raise
```

> **Review comment** (on `params['batch_num'] = RCV_PARAMS['STEPS'] * 10`): Why times 10 here?
>
> **Reply:** STEPS controls how many resources are allocated to a configuration; the minimum number of STEPS is 1, which is too small for the training.
>
> **Reply:** I am a little confused. The default parameters and the search space have no 'STEPS', so where does 'STEPS' come from?
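To illustrate where `STEPS` comes from (my reading of the R/STEPS comments in the configs above, since the search space itself has no such key): the Hyperband advisor attaches the allocated budget to each configuration it generates, so the dict returned by `nni.get_next_parameter()` might look like the sketch below. The concrete values are made up:

```python
# Hypothetical value of RCV_PARAMS for one trial (illustrative only):
RCV_PARAMS = {
    'dropout_rate': 0.62,      # sampled from search_space.json
    'conv_size': 3,
    'hidden_size': 512,
    'batch_size': 16,
    'learning_rate': 0.001,
    'STEPS': 11,               # budget assigned by Hyperband, at most R
}
# This trial would then run STEPS * 10 = 110 training batches.
```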
@@ -0,0 +1,7 @@
```json
{
    "dropout_rate": {"_type": "uniform", "_value": [0.5, 0.9]},
    "conv_size": {"_type": "choice", "_value": [2, 3, 5, 7]},
    "hidden_size": {"_type": "choice", "_value": [124, 512, 1024]},
    "batch_size": {"_type": "choice", "_value": [8, 16, 32, 64]},
    "learning_rate": {"_type": "choice", "_value": [0.0001, 0.001, 0.01, 0.1]}
}
```
@@ -15,4 +15,4 @@ max-attributes=15
const-naming-style=any

disable=duplicate-code,
        super-init-not-called
> **Review comment:** Why is this file name ("howto_2_CustomizedTuner") so strange...
>
> **Reply:** Renamed by @scarlett2018.