Conversation
@RayMeng8 please add document for pbt tuner under
@RayMeng8 please add doc and unittest for this tuner
    hyper_parameters[key] = hyper_parameters['save_checkpoint_dir']
elif key == 'save_checkpoint_dir':
    hyper_parameters[key] = os.path.join(bot_checkpoint_dir, str(epoch))
elif isinstance(hyper_parameters[key], float):
why not perturb other types of hyper-parameters such as int, string?
This way of exploration was introduced in the paper, but it is not applicable to other types of data. I am not sure how to perturb other types of data; maybe I can add them in the future.
Good point @leckie-chn . @RayMeng8 if you want to support other types in the future, please make it clear which search space types PBT supports: https://github.com/microsoft/nni/blob/master/docs/en_US/Tutorial/SearchSpaceSpec.md#search-space-types-supported-by-each-tuner
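One possible direction for that future support, shown only as a hedged sketch (the `_perturb_non_float` helper and its signature are hypothetical and not part of this PR): `choice` values could be re-sampled from their candidate list and `randint` values nudged within their bounds, based on each key's search-space entry.

```python
import numpy as np

def _perturb_non_float(value, search_space_item):
    """Hypothetical helper (not part of this PR): perturb non-float hyperparameters
    based on their search-space entry."""
    _type, _value = search_space_item['_type'], search_space_item['_value']
    if _type == 'choice':
        # re-sample a categorical value from the candidate list
        return _value[np.random.randint(len(_value))]
    if _type == 'randint':
        low, high = _value
        # nudge the integer by +/-1 and clamp it back into [low, high)
        return int(np.clip(value + np.random.choice([-1, 1]), low, high - 1))
    # unsupported types are left unchanged
    return value
```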
    hyper_parameters[key] = os.path.join(bot_checkpoint_dir, str(epoch))
elif isinstance(hyper_parameters[key], float):
    perturb = np.random.choice(factors)
    hyper_parameters[key] *= perturb
We should make sure that after the perturbation the value is still within the search space.
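One way to do that, as a rough sketch only (the `_perturb_float` helper is hypothetical and assumes the range-type entries store their bounds as the first two elements of `_value`):

```python
import numpy as np

def _perturb_float(value, search_space_item, factors=(0.8, 1.2)):
    """Hypothetical helper: multiply by a random factor, then clamp the result
    back into the range declared in the search space."""
    perturbed = value * np.random.choice(factors)
    if search_space_item.get('_type') in ('uniform', 'quniform', 'loguniform', 'qloguniform'):
        low, high = search_space_item['_value'][0], search_space_item['_value'][1]
        perturbed = float(np.clip(perturbed, low, high))
    return perturbed
```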
@@ -192,6 +193,9 @@ def test_networkmorphism(self):
    def test_ppo(self):
        pass

    def test_pbt(self):
        pass
This is not adding a unittest for PBT. Please follow the other tuners' unittests and think about what the unittest of PBT should look like.
if isinstance(tuner, PBTTuner):
    parameters = tuner.generate_multiple_parameters(list(range(i * self.params_each_round,
                 (i + 1) * self.params_each_round)), st_callback=self.send_trial_callback)
else:
I think `generate_multiple_parameters` of other tuners can be called with `st_callback` anyway?
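A minimal, self-contained illustration of the idea (these classes are placeholders, not NNI's actual implementation, and assume that tuners which don't need `st_callback` simply accept and ignore extra keyword arguments):

```python
class BaseTuner:
    """Placeholder tuner: ignores keyword arguments it does not use."""
    def generate_multiple_parameters(self, parameter_id_list, **kwargs):
        return [{'lr': 0.01} for _ in parameter_id_list]

class CallbackTuner(BaseTuner):
    """Placeholder for a tuner (like PBT) that submits trials via st_callback."""
    def generate_multiple_parameters(self, parameter_id_list, st_callback=None, **kwargs):
        params = [{'lr': 0.01} for _ in parameter_id_list]
        if st_callback is not None:
            for pid, p in zip(parameter_id_list, params):
                st_callback(pid, p)
        return params

def dispatch(tuner, parameter_ids, send_trial):
    # no isinstance() branch: the callback is passed unconditionally
    return tuner.generate_multiple_parameters(parameter_ids, st_callback=send_trial)
```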
_trial_params = {}


def _pack_parameter(parameter_id, params, customized=False, trial_job_id=None, parameter_index=None):
Directly import it from `msg_dispatcher` instead of copying it?
src/sdk/pynni/nni/utils.py (Outdated)
import functools
from enum import Enum, unique
import json_tricks

import nni.parameter_expressions as parameter_expressions
from . import parameter_expressions
bot_trial_info.clean_id()


class Trial_Info:
Style comment: `TrialInfo`.
@@ -192,6 +224,9 @@ def test_networkmorphism(self):
    def test_ppo(self):
        pass

    def test_pbt(self):
        self.search_space_test_all(lambda: PBTTuner(all_checkpoint_dir="~/nni/checkpoint/test/", population_size=100))
No need to specify `all_checkpoint_dir`?
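If `PBTTuner` indeed falls back to a default checkpoint directory when the argument is omitted (as the doc text below suggests), the quoted test line could presumably become something like the following (the surrounding test class is the one quoted above, so this is only a suggested replacement line):

```python
# Sketch: rely on the tuner's default all_checkpoint_dir rather than hard-coding one.
self.search_space_test_all(lambda: PBTTuner(population_size=100))
```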
docs/en_US/Tuner/PBTTuner.md (Outdated)
Population Based Training(PBT) comes from [Population Based Training of Neural Networks](https://arxiv.org/abs/1711.09846v1). It's a simple asynchronous optimization algorithm which effectively utilizes a fixed computational budget to jointly optimize a population of models and their hyperparameters to maximize performance. Importantly, PBT discovers a schedule of hyperparameter settings rather than following the generally sub-optimal strategy of trying to find a single fixed set to use for the whole course of training.

PBTTuner initializes a population with several trials. Users can set a specific number of training epochs. After a certain number of epochs, the parameters and hyperparameters in the trial with bad metrics will be replaced with a better trial (exploit). Then the hyperparameters are purturbed (explore).
perturbed
docs/en_US/Tuner/PBTTuner.md (Outdated)
PBTTuner initializes a population with several trials. Users can set a specific number of training epochs. After a certain number of epochs, the parameters and hyperparameters in the trial with bad metrics will be replaced with a better trial (exploit). Then the hyperparameters are purturbed (explore).

In our implementation, training epochs in the trial code is regarded as a step of PBT, different with other tuners. When a step is over, PBTTuner will perform exploitation and exploration. The checkpoint is not assigned explicitly, but by continuously changing load_checkpoint_dir and save_checkpoint_dir, we can directly change load_checkpoint_dir to replace parameters and hyperparameters. And save_checkpoint_dir used to save checkpoint which can be loaded in next step. Therefore, the directory need to be accessible by all the trials. If the experiment is local mode, users could provide all_checkpoint_dir which decides load_checkpoint_dir and save_checkpoint_dir(checkpoint_dir is set to "all_checkpoint_dir/<population-id>/<step>"), otherwise the directory would be "~/nni/checkpoint/<exp-id>". If the experiment is not local mode, then users should provide a path in a shared storage which can be accessed by all the trials as all_checkpoint_dir.
In our implementation, training epochs in the trial code is regarded as a step of PBT, different from other tuners. At the end of each step, PBT tuner will do exploitation and exploration, replacing some trials with new trials. This is implemented by constantly modifying the values of `load_checkpoint_dir` and `save_checkpoint_dir`. We can directly change `load_checkpoint_dir` to replace parameters and hyperparameters, and `save_checkpoint_dir` to save a checkpoint that will be loaded in the next step. To this end, we need a shared folder which is accessible to all trials. If the experiment is running in local mode, users could provide an argument `all_checkpoint_dir` which will be the base folder of `load_checkpoint_dir` and `save_checkpoint_dir` (`checkpoint_dir` is set to `all_checkpoint_dir/<population-id>/<step>`). By default, `all_checkpoint_dir` is set to `~/nni/checkpoint/<exp-id>`. If the experiment is in non-local mode, then users should provide a path in a shared storage folder which is mounted at `all_checkpoint_dir` on worker machines (but it is not necessarily available on the machine which runs the tuner).
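It might also help the doc to show the trial side of this contract. A minimal PyTorch-flavoured sketch, assuming the generated parameters carry `load_checkpoint_dir` and `save_checkpoint_dir` as described above; the file name and the model/optimizer objects are placeholders:

```python
import os
import torch

def run_pbt_step(model, optimizer, params):
    """One PBT step as seen from the trial: restore, train, then save for the next step."""
    load_path = os.path.join(params['load_checkpoint_dir'], 'model.pth')
    if os.path.isfile(load_path):
        # resume from the checkpoint chosen by the tuner (possibly another trial's)
        checkpoint = torch.load(load_path)
        model.load_state_dict(checkpoint['model'])
        optimizer.load_state_dict(checkpoint['optimizer'])

    # ... train for the configured number of epochs and report metrics to NNI ...

    # save the checkpoint that the tuner may hand to a trial in the next step
    os.makedirs(params['save_checkpoint_dir'], exist_ok=True)
    torch.save({'model': model.state_dict(), 'optimizer': optimizer.state_dict()},
               os.path.join(params['save_checkpoint_dir'], 'model.pth'))
```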
docs/en_US/Tuner/BuiltinTuner.md (Outdated)
**Suggested scenario**

Population Based Training (PBT) bridges and extends parallel search methods and sequential optimisation methods. It has a wallclock run time that is no greater than that of a single optimisation process, does not require sequential runs, and is also able to use fewer computational resources than naive search methods. Therefore, it's effective when you want to save computational resources and time. Besides, PBT returns a schedule of hyperparameters instead of a single configuration. If you don't need a specific configuration, but just expect good results, you can choose this tuner. It should be noted that, in our implementation, the handling of the checkpoint storage location is involved: a trial is considered as several epochs of training, so the loading and saving of checkpoints must be specified in the trial code, which is different from other tuners. Also, if the experiment is not in local mode, users should provide a path in a shared storage which can be accessed by all the trials. You could try it on a very simple task, such as the [mnist-pbt-tuner-pytorch](https://github.com/microsoft/nni/tree/master/examples/trials/mnist-pbt-tuner-pytorch) example. [See details](./PBTTuner.md)
optimization
The implementation of the paper "Population Based Training of Neural Networks" on NNI