merge V1.8 #268

Merged: 12 commits, Aug 26, 2020
15 changes: 8 additions & 7 deletions docs/en_US/CommunitySharings/ModelCompressionComparison.md
@@ -9,7 +9,7 @@ In addition, we provide friendly instructions on the re-implementation of these

The experiments are performed with the following pruners/datasets/models:

* Models: [VGG16, ResNet18, ResNet50](https://github.com/microsoft/nni/tree/master/examples/model_compress/models/cifar10)
* Models: [VGG16, ResNet18, ResNet50](https://github.com/microsoft/nni/tree/v1.8/examples/model_compress/models/cifar10)

* Datasets: CIFAR-10

@@ -23,7 +23,7 @@ The experiments are performed with the following pruners/datasets/models:

For the pruners with scheduling, `L1Filter Pruner` is used as the base algorithm. That is to say, after the sparsity distribution is decided by the scheduling algorithm, `L1Filter Pruner` is used to perform the actual pruning; a minimal sketch of this pairing follows below.

- All the pruners listed above are implemented in [nni](https://github.com/microsoft/nni/tree/master/docs/en_US/Compressor/Overview.md).
- All the pruners listed above are implemented in [nni](https://github.com/microsoft/nni/tree/v1.8/docs/en_US/Compressor/Overview.md).
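As a rough sketch of that pairing, assuming the `SimulatedAnnealingPruner` constructor arguments described in the NNI v1.8 docs (`base_algo='l1'` selects L1Filter as the base algorithm; the constant evaluator is a stand-in for a real validation pass):

```python
import torchvision.models as models
from nni.compression.torch import SimulatedAnnealingPruner

model = models.vgg16(num_classes=10)

def evaluator(model):
    # Stub standing in for a real validation pass; return accuracy in [0, 1].
    return 0.5

config_list = [{'sparsity': 0.5, 'op_types': ['Conv2d']}]
# The scheduling algorithm searches for per-layer sparsities;
# base_algo='l1' makes L1Filter Pruner perform the actual pruning.
pruner = SimulatedAnnealingPruner(model, config_list, evaluator=evaluator, base_algo='l1')
model = pruner.compress()
```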

## Experiment Result

@@ -60,13 +60,14 @@ From the experiment result, we get the following conclusions:

* The experiment results are all collected with the default configuration of the pruners in nni, which means that when we call a pruner class in nni, we don't change any default class arguments.

* Both FLOPs and the number of parameters are counted with [Model FLOPs/Parameters Counter](https://github.com/microsoft/nni/blob/master/docs/en_US/Compressor/CompressionUtils.md#model-flopsparameters-counter) after [model speed up](https://github.com/microsoft/nni/blob/master/docs/en_US/Compressor/ModelSpeedup.md). This avoids potential issues of counting on masked models.
* Both FLOPs and the number of parameters are counted with [Model FLOPs/Parameters Counter](https://github.com/microsoft/nni/tree/v1.8/docs/en_US/Compressor/CompressionUtils.md#model-flopsparameters-counter) after [model speed up](https://github.com/microsoft/nni/tree/v1.8/docs/en_US/Compressor/ModelSpeedup.md).
This avoids potential issues of counting on masked models; a sketch of this counting workflow appears after this list.

* The experiment code can be found [here](https://github.com/microsoft/nni/tree/master/examples/model_compress/auto_pruners_torch.py).
* The experiment code can be found [here](https://github.com/microsoft/nni/tree/v1.8/examples/model_compress/auto_pruners_torch.py).
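A minimal sketch of the counting step, assuming the import path and `(model, input_size)` signature shown in the v1.8 CompressionUtils doc:

```python
import torchvision.models as models
# Import path taken from the NNI v1.8 docs; verify against your installed version.
from nni.compression.torch.utils.counter import count_flops_params

# In the experiments this would be the model produced by model speedup,
# so that masked-out channels are really gone before counting.
model = models.resnet18(num_classes=10)
flops, params = count_flops_params(model, (1, 3, 32, 32))  # CIFAR-10 sized input
print(f'FLOPs: {flops}, #Params: {params}')
```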

### Experiment Result Rendering

* If you follow the practice in the [example](https://github.com/microsoft/nni/tree/master/examples/model_compress/auto_pruners_torch.py), for every single pruning experiment, the experiment result will be saved in JSON format as follows:
* If you follow the practice in the [example](https://github.com/microsoft/nni/tree/v1.8/examples/model_compress/auto_pruners_torch.py), for every single pruning experiment, the experiment result will be saved in JSON format as follows:
``` json
{
"performance": {"original": 0.9298, "pruned": 0.1, "speedup": 0.1, "finetuned": 0.7746},
Expand All @@ -75,8 +76,8 @@ From the experiment result, we get the following conclusions:
}
```

* The experiment results are saved [here](https://github.com/microsoft/nni/tree/master/examples/model_compress/experiment_data).
You can refer to [analyze](https://github.com/microsoft/nni/tree/master/examples/model_compress/experiment_data/analyze.py) to plot new performance comparison figures.
* The experiment results are saved [here](https://github.com/microsoft/nni/tree/v1.8/examples/model_compress/comparison_of_pruners).
You can refer to [analyze](https://github.com/microsoft/nni/tree/v1.8/examples/model_compress/comparison_of_pruners/analyze.py) to plot new performance comparison figures.
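Given that per-experiment JSON layout, the results can be aggregated in a few lines; a minimal sketch (the directory name mirrors the link above, and only the `performance` keys shown in the sample are assumed; `analyze.py` in the repo is the authoritative version):

```python
import json
from pathlib import Path

results = []
for result_file in Path('comparison_of_pruners').glob('**/*.json'):
    with open(result_file) as f:
        record = json.load(f)
    # Only the 'performance' block shown in the sample above is assumed here.
    results.append((result_file.stem, record['performance']['finetuned']))

# Rank pruning experiments by fine-tuned accuracy.
for name, acc in sorted(results, key=lambda item: item[1], reverse=True):
    print(f'{name}: fine-tuned accuracy = {acc:.4f}')
```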

## Contribution

2 changes: 1 addition & 1 deletion docs/en_US/Compressor/Overview.md
@@ -42,7 +42,7 @@ Pruning algorithms compress the original network by removing redundant weights o
| [SimulatedAnnealing Pruner](https://nni.readthedocs.io/en/latest/Compressor/Pruner.html#simulatedannealing-pruner) | Automatic pruning with a guided heuristic search method, Simulated Annealing algorithm [Reference Paper](https://arxiv.org/abs/1907.03141) |
| [AutoCompress Pruner](https://nni.readthedocs.io/en/latest/Compressor/Pruner.html#autocompress-pruner) | Automatic pruning by iteratively calling SimulatedAnnealing Pruner and ADMM Pruner [Reference Paper](https://arxiv.org/abs/1907.03141) |

You can refer to this [benchmark](https://github.com/microsoft/nni/tree/master/docs/en_US/Benchmark.md) for the performance of these pruners on some benchmark problems.
You can refer to this [benchmark](https://github.com/microsoft/nni/tree/v1.8/docs/en_US/CommunitySharings/ModelCompressionComparison.md) for the performance of these pruners on some benchmark problems.

### Quantization Algorithms

7 changes: 4 additions & 3 deletions docs/en_US/NAS/Benchmarks.md
@@ -1,4 +1,4 @@
# NAS Benchmarks (experimental)
# NAS Benchmarks

```eval_rst
.. toctree::
@@ -8,12 +8,13 @@
```

## Introduction

To improve the reproducibility of NAS algorithms and reduce computing resource requirements, researchers have proposed a series of NAS benchmarks such as [NAS-Bench-101](https://arxiv.org/abs/1902.09635), [NAS-Bench-201](https://arxiv.org/abs/2001.00326), [NDS](https://arxiv.org/abs/1905.13214), etc. NNI provides a query interface for users to acquire these benchmarks. Within just a few lines of code, researchers are able to evaluate their NAS algorithms easily and fairly by utilizing these benchmarks.

## Prerequisites

* Please prepare a folder to house all the benchmark databases. By default, it can be found at `${HOME}/.nni/nasbenchmark`. You can place it anywhere you like, and specify it in `NASBENCHMARK_DIR` before importing NNI.
* Please install `peewee` via `pip install peewee`, which NNI uses to connect to the database.
* Please prepare a folder to house all the benchmark databases. By default, it can be found at `${HOME}/.nni/nasbenchmark`. You can place it anywhere you like, and specify it in `NASBENCHMARK_DIR` via `export NASBENCHMARK_DIR=/path/to/your/nasbenchmark` before importing NNI (see the sketch after this list).
* Please install `peewee` via `pip3 install peewee`, which NNI uses to connect to the database.
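A minimal sketch of the intended setup, assuming the `query_nb201_trial_stats` helper described in NNI's benchmark documentation (verify the exact name and signature against your installed version):

```python
import os

# Must be set before NNI is imported; otherwise the default
# ${HOME}/.nni/nasbenchmark location is used.
os.environ['NASBENCHMARK_DIR'] = '/path/to/your/nasbenchmark'

from nni.nas.benchmarks.nasbench201 import query_nb201_trial_stats

# Query every NAS-Bench-201 trial trained for 200 epochs on CIFAR-100;
# passing None for the architecture matches all architectures.
for trial in query_nb201_trial_stats(None, 200, 'cifar100'):
    print(trial)
```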

## Data Preparation

74 changes: 52 additions & 22 deletions examples/model_compress/model_prune_tf.py
@@ -28,21 +28,31 @@ def get_dataset(dataset_name='mnist'):

def create_model(model_name='naive'):
    assert model_name == 'naive'
    return tf.keras.Sequential([
        tf.keras.layers.Conv2D(filters=20, kernel_size=5),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.ReLU(),
        tf.keras.layers.MaxPool2D(pool_size=2),
        tf.keras.layers.Conv2D(filters=20, kernel_size=5),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.ReLU(),
        tf.keras.layers.MaxPool2D(pool_size=2),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(units=500),
        tf.keras.layers.ReLU(),
        tf.keras.layers.Dense(units=10),
        tf.keras.layers.Softmax()
    ])
    return NaiveModel()

class NaiveModel(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.seq_layers = [
            tf.keras.layers.Conv2D(filters=20, kernel_size=5),
            tf.keras.layers.BatchNormalization(),
            tf.keras.layers.ReLU(),
            tf.keras.layers.MaxPool2D(pool_size=2),
            tf.keras.layers.Conv2D(filters=20, kernel_size=5),
            tf.keras.layers.BatchNormalization(),
            tf.keras.layers.ReLU(),
            tf.keras.layers.MaxPool2D(pool_size=2),
            tf.keras.layers.Flatten(),
            tf.keras.layers.Dense(units=500),
            tf.keras.layers.ReLU(),
            tf.keras.layers.Dense(units=10),
            tf.keras.layers.Softmax()
        ]

    def call(self, x):
        for layer in self.seq_layers:
            x = layer(x)
        return x


def create_pruner(model, pruner_name):
@@ -55,20 +65,40 @@ def main(args):
    model_name = prune_config[args.pruner_name]['model_name']
    dataset_name = prune_config[args.pruner_name]['dataset_name']
    train_set, test_set = get_dataset(dataset_name)
    model = create_model(model_name)

    optimizer = tf.keras.optimizers.SGD(learning_rate=0.1, momentum=0.9, decay=1e-4)
    model.compile(optimizer=optimizer, loss='sparse_categorical_crossentropy', metrics=['accuracy'])
    model = create_model(model_name)

    print('start training')
    model.fit(train_set[0], train_set[1], batch_size=args.batch_size, epochs=args.pretrain_epochs, validation_data=test_set)
    optimizer = tf.keras.optimizers.SGD(learning_rate=0.1, momentum=0.9, decay=1e-4)
    model.compile(
        optimizer=optimizer,
        loss='sparse_categorical_crossentropy',
        metrics=['accuracy']
    )
    model.fit(
        train_set[0],
        train_set[1],
        batch_size=args.batch_size,
        epochs=args.pretrain_epochs,
        validation_data=test_set
    )

    print('start model pruning')
    optimizer_finetune = tf.keras.optimizers.SGD(learning_rate=0.001, momentum=0.9, decay=1e-4)
    pruner = create_pruner(model, args.pruner_name)
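    # compress() attaches pruning masks to the model's layers; masked weights
    # are zeroed rather than physically removed.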
    model = pruner.compress()
    model.compile(optimizer=optimizer_finetune, loss='sparse_categorical_crossentropy', metrics=['accuracy'])
    model.fit(train_set[0], train_set[1], batch_size=args.batch_size, epochs=args.prune_epochs, validation_data=test_set)
    model.compile(
        optimizer=optimizer_finetune,
        loss='sparse_categorical_crossentropy',
        metrics=['accuracy'],
        run_eagerly=True  # NOTE: important, model compression does not work in graph mode!
    )
    model.fit(
        train_set[0],
        train_set[1],
        batch_size=args.batch_size,
        epochs=args.prune_epochs,
        validation_data=test_set
    )


if __name__ == '__main__':
2 changes: 1 addition & 1 deletion examples/model_compress/models/mobilenet.py
@@ -53,7 +53,7 @@ def __init__(self, n_class, profile='normal'):
    def forward(self, x):
        x = self.conv1(x)
        x = self.features(x)
        x = x.mean(3).mean(2)  # global average pooling
        x = x.mean([2, 3])  # global average pooling

        x = self.classifier(x)
        return x
5 changes: 4 additions & 1 deletion examples/model_compress/models/mobilenet_v2.py
@@ -108,7 +108,10 @@ def __init__(self, n_class=1000, input_size=224, width_mult=1.):

    def forward(self, x):
        x = self.features(x)
        x = x.mean(3).mean(2)
        # same as .mean(3).mean(2), but model speedup only supports
        # the mean operation whose output has two dimensions
        x = x.mean([2, 3])
        x = self.classifier(x)
        return x
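As a quick, standalone sanity check of the pooling rewrite above (a sketch; the shapes are arbitrary):

```python
import torch

x = torch.randn(8, 1280, 7, 7)  # arbitrary NCHW feature map
# Averaging over H then W is numerically the same as one mean over both dims,
# but the single-call form yields the 2-D output that model speedup expects.
assert torch.allclose(x.mean(3).mean(2), x.mean([2, 3]))
print(x.mean([2, 3]).shape)  # torch.Size([8, 1280])
```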

2 changes: 1 addition & 1 deletion examples/nas/benchmarks/nasbench101.sh
@@ -15,5 +15,5 @@ fi
echo "Generating database..."
rm -f ${NASBENCHMARK_DIR}/nasbench101.db ${NASBENCHMARK_DIR}/nasbench101.db-journal
mkdir -p ${NASBENCHMARK_DIR}
python -m nni.nas.benchmarks.nasbench101.db_gen nasbench_full.tfrecord
python3 -m nni.nas.benchmarks.nasbench101.db_gen nasbench_full.tfrecord
rm -f nasbench_full.tfrecord
2 changes: 1 addition & 1 deletion examples/nas/benchmarks/nasbench201.sh
@@ -15,5 +15,5 @@ fi
echo "Generating database..."
rm -f ${NASBENCHMARK_DIR}/nasbench201.db ${NASBENCHMARK_DIR}/nasbench201.db-journal
mkdir -p ${NASBENCHMARK_DIR}
python -m nni.nas.benchmarks.nasbench201.db_gen a.pth
python3 -m nni.nas.benchmarks.nasbench201.db_gen a.pth
rm -f a.pth
2 changes: 1 addition & 1 deletion examples/nas/benchmarks/nds.sh
@@ -16,5 +16,5 @@ unzip data.zip
echo "Generating database..."
rm -f ${NASBENCHMARK_DIR}/nds.db ${NASBENCHMARK_DIR}/nds.db-journal
mkdir -p ${NASBENCHMARK_DIR}
python -m nni.nas.benchmarks.nds.db_gen nds_data
python3 -m nni.nas.benchmarks.nds.db_gen nds_data
rm -rf data.zip nds_data
2 changes: 1 addition & 1 deletion examples/nas/search_space_zoo/darts_example.py
@@ -14,7 +14,7 @@
from utils import accuracy

from nni.nas.pytorch.search_space_zoo import DartsCell
from darts_search_space import DartsStackedCells
from darts_stack_cells import DartsStackedCells

logger = logging.getLogger('nni')

4 changes: 2 additions & 2 deletions examples/nas/search_space_zoo/darts_stack_cells.py
@@ -2,7 +2,7 @@
# Licensed under the MIT license.

import torch.nn as nn
import ops
from nni.nas.pytorch.search_space_zoo.darts_ops import DropPath


class DartsStackedCells(nn.Module):
@@ -79,5 +79,5 @@ def forward(self, x):

    def drop_path_prob(self, p):
        for module in self.modules():
            if isinstance(module, ops.DropPath):
            if isinstance(module, DropPath):
                module.p = p
2 changes: 0 additions & 2 deletions examples/nas/search_space_zoo/enas_macro_example.py
@@ -58,7 +58,6 @@ def forward(self, x):
parser = ArgumentParser("enas")
parser.add_argument("--batch-size", default=128, type=int)
parser.add_argument("--log-frequency", default=10, type=int)
# parser.add_argument("--search-for", choices=["macro", "micro"], default="macro")
parser.add_argument("--epochs", default=None, type=int, help="Number of epochs (default: macro 310, micro 150)")
parser.add_argument("--visualization", default=False, action="store_true")
args = parser.parse_args()
@@ -71,7 +70,6 @@ def forward(self, x):
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), 0.05, momentum=0.9, weight_decay=1.0E-4)
lr_scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=num_epochs, eta_min=0.001)

trainer = enas.EnasTrainer(model,
                           loss=criterion,
                           metrics=accuracy,
3 changes: 1 addition & 2 deletions examples/nas/search_space_zoo/enas_micro_example.py
@@ -62,7 +62,7 @@ def __init__(self, num_layers=2, num_nodes=5, out_channels=24, in_channels=3, nu
            reduction = False
            if layer_id in pool_layers:
                c_cur, reduction = c_p * 2, True
            self.layers.append(ENASMicroLayer(self.layers, num_nodes, c_pp, c_p, c_cur, reduction))
            self.layers.append(ENASMicroLayer(num_nodes, c_pp, c_p, c_cur, reduction))
            if reduction:
                c_pp = c_p = c_cur
            c_pp, c_p = c_p, c_cur
@@ -98,7 +98,6 @@ def forward(self, x):
parser = ArgumentParser("enas")
parser.add_argument("--batch-size", default=128, type=int)
parser.add_argument("--log-frequency", default=10, type=int)
# parser.add_argument("--search-for", choices=["macro", "micro"], default="macro")
parser.add_argument("--epochs", default=None, type=int, help="Number of epochs (default: macro 310, micro 150)")
parser.add_argument("--visualization", default=False, action="store_true")
args = parser.parse_args()
@@ -491,7 +491,7 @@ class LocalTrainingService implements TrainingService {
        if (process.platform === 'win32') {
            script.push(`cd $env:NNI_CODE_DIR`);
            script.push(
                `cmd.exe /c ${localTrialConfig.command} 2>"${path.join(workingDirectory, 'stderr')}"`,
                `cmd.exe /c ${localTrialConfig.command} 2>&1 | Out-File "${path.join(workingDirectory, 'stderr')}" -encoding utf8`,
                `$NOW_DATE = [int64](([datetime]::UtcNow)-(get-date "1/1/1970")).TotalSeconds`,
                `$NOW_DATE = "$NOW_DATE" + (Get-Date -Format fff).ToString()`,
                `Write $LASTEXITCODE " " $NOW_DATE | Out-File "${path.join(workingDirectory, '.nni', 'state')}" -NoNewline -encoding utf8`);
@@ -523,6 +523,8 @@
        const runScriptContent: string[] = [];
        if (process.platform !== 'win32') {
            runScriptContent.push('#!/bin/bash');
        } else {
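            // make the NNI manager's PATH visible inside the trial's PowerShell script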
            runScriptContent.push(`$env:PATH="${process.env.path}"`);
        }
        for (const variable of variables) {
            runScriptContent.push(setEnvironmentVariable(variable));
@@ -87,6 +87,8 @@ class RemoteMachineTrainingService implements TrainingService {
            this.log.info('ssh connection initialized!');
            // reset sshConnectionPromises to [] to avoid duplicated log information
            this.sshConnectionPromises = [];
            // initialize gpuScheduler
            this.gpuScheduler = new GPUScheduler(this.machineExecutorManagerMap);
        }
        while (!this.stopping) {
            while (this.jobQueue.length > 0) {
@@ -310,7 +312,6 @@
                    break;
                case TrialConfigMetadataKey.MACHINE_LIST:
                    await this.setupConnections(value);
                    this.gpuScheduler = new GPUScheduler(this.machineExecutorManagerMap);
                    break;
                case TrialConfigMetadataKey.TRIAL_CONFIG: {
                    const remoteMachineTrailConfig: TrialConfig = <TrialConfig>JSON.parse(value);
@@ -426,19 +427,19 @@
        const rmMetaList: RemoteMachineMeta[] = <RemoteMachineMeta[]>JSON.parse(machineList);

        for (const rmMeta of rmMetaList) {
            rmMeta.occupiedGpuIndexMap = new Map<number, number>();
            const executorManager: ExecutorManager = new ExecutorManager(rmMeta);
            this.log.info(`connecting to ${rmMeta.username}@${rmMeta.ip}:${rmMeta.port}`);
            const executor: ShellExecutor = await executorManager.getExecutor(this.initExecutorId);
            this.log.debug(`reached ${executor.name}`);
            this.machineExecutorManagerMap.set(rmMeta, executorManager);
            this.log.debug(`initializing ${executor.name}`);
            this.sshConnectionPromises.push(this.initRemoteMachineOnConnected(rmMeta, executor));
            this.log.info(`connecting to ${executor.name}`);
            this.sshConnectionPromises.push(this.initRemoteMachineOnConnected(rmMeta));
        }
    }

    private async initRemoteMachineOnConnected(rmMeta: RemoteMachineMeta, executor: ShellExecutor): Promise<void> {
    private async initRemoteMachineOnConnected(rmMeta: RemoteMachineMeta): Promise<void> {
        rmMeta.occupiedGpuIndexMap = new Map<number, number>();
        const executorManager: ExecutorManager = new ExecutorManager(rmMeta);
        this.log.info(`connecting to ${rmMeta.username}@${rmMeta.ip}:${rmMeta.port}`);
        const executor: ShellExecutor = await executorManager.getExecutor(this.initExecutorId);
        this.log.debug(`reached ${executor.name}`);
        this.machineExecutorManagerMap.set(rmMeta, executorManager);
        this.log.debug(`initializing ${executor.name}`);

        // Create root working directory after executor is ready
        const nniRootDir: string = executor.joinPath(executor.getTempPath(), 'nni');
        await executor.createFolder(executor.getRemoteExperimentRootDir(getExperimentId()));