microsoft · QuanluZhang · Sep 18, 2020 · Sep 9, 2020 · Sep 9, 2020 · squirrelsc
diff --git a/README_zh_CN.md b/README_zh_CN.md
@@ -10,7 +10,7 @@
 
 **NNI (Neural Network Intelligence)** is a lightweight but powerful toolkit that helps users **automate** [Feature Engineering](docs/zh_CN/FeatureEngineering/Overview.md), [Neural Architecture Search](docs/zh_CN/NAS/Overview.md), [Hyperparameter Tuning](docs/zh_CN/Tuner/BuiltinTuner.md), and [Model Compression](docs/zh_CN/Compressor/Overview.md).
 
-NNI manages automated machine learning (AutoML) Experiments. It **dispatches and runs** the Trial jobs generated by tuning algorithms to find the best neural architecture and/or hyperparameters, and supports **a variety of training environments** such as the [local machine](docs/zh_CN/TrainingService/LocalMode.md), [remote servers](docs/zh_CN/TrainingService/RemoteMachineMode.md), [OpenPAI](docs/zh_CN/TrainingService/PaiMode.md), [Kubeflow](docs/zh_CN/TrainingService/KubeflowMode.md), [FrameworkController on K8S (e.g. AKS)](docs/zh_CN/TrainingService/FrameworkControllerMode.md), [DLWorkspace (aka DLTS)](docs/zh_CN/TrainingService/DLTSMode.md), and other cloud services.
+NNI manages automated machine learning (AutoML) Experiments. It **dispatches and runs** the Trial jobs generated by tuning algorithms to find the best neural architecture and/or hyperparameters, and supports **a variety of training environments** such as the [local machine](docs/zh_CN/TrainingService/LocalMode.md), [remote servers](docs/zh_CN/TrainingService/RemoteMachineMode.md), [OpenPAI](docs/zh_CN/TrainingService/PaiMode.md), [Kubeflow](docs/zh_CN/TrainingService/KubeflowMode.md), [FrameworkController on K8S (e.g. AKS)](docs/zh_CN/TrainingService/FrameworkControllerMode.md), [DLWorkspace](docs/zh_CN/TrainingService/DLTSMode.md) (aka DLTS), [AML](docs/zh_CN/TrainingService/AMLMode.md) (Azure Machine Learning), and other environments.
 
 ## **Use cases**
 
@@ -19,7 +19,7 @@ NNI 管理自动机器学习 (AutoML) 的 Experiment，**调度运行**由调优
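 * Researchers and data scientists who want to **implement or experiment with new AutoML algorithms** more easily, including hyperparameter tuning algorithms, neural architecture search algorithms, and model compression algorithms.
 * Those who want to **support AutoML** in their machine learning platform.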
 
-### **[NNI v1.6 has been released!](https://github.com/microsoft/nni/releases) &nbsp;[<img width="48" src="docs/img/release_icon.png" />](#nni-released-reminder)**
+### **[NNI v1.8 has been released!](https://github.com/microsoft/nni/releases) &nbsp;[<img width="48" src="docs/img/release_icon.png" />](#nni-released-reminder)**
 
 ## **NNI capabilities at a glance**
 
@@ -164,6 +164,7 @@ NNI 提供命令行工具以及友好的 WebUI 来管理训练的 Experiment。
       <ul>
         <li><a href="docs/zh_CN/TrainingService/LocalMode.md">Local Machine</a></li>
         <li><a href="docs/zh_CN/TrainingService/RemoteMachineMode.md">Remote Servers</a></li>
+        <li><a href="docs/zh_CN/TrainingService/AMLMode.md">AML(Azure Machine Learning)</a></li>
         <li><b>Kubernetes-based platforms</b></li>
             <ul><li><a href="docs/zh_CN/TrainingService/PaiMode.md">OpenPAI</a></li>
             <li><a href="docs/zh_CN/TrainingService/KubeflowMode.md">Kubeflow</a></li>
@@ -208,7 +209,7 @@ NNI 提供命令行工具以及友好的 WebUI 来管理训练的 Experiment。
 
 ### **Installation**
 
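-NNI supports and has been tested on Ubuntu >= 16.04, macOS >= 10.14.1, and Windows 10 >= 1809. In an environment with `python 64-bit >= 3.5`, running `pip install` is all that is needed to install it.
+NNI supports and has been tested on Ubuntu >= 16.04, macOS >= 10.14.1, and Windows 10 >= 1809. In an environment with `python 64-bit >= 3.6`, running `pip install` is all that is needed to install it.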
 
 Linux or macOS
 
@@ -239,7 +240,7 @@ Linux 和 macOS 下 NNI 系统需求[参考这里](https://nni.readthedocs.io/zh
 * Download the examples by cloning the source code.
 
    ```bash
-   git clone -b v1.6 https://github.com/Microsoft/nni.git
+   git clone -b v1.8 https://github.com/Microsoft/nni.git
    ```
 
 * Run the MNIST example.

diff --git a/deployment/docker/README_zh_CN.md b/deployment/docker/README_zh_CN.md
@@ -1,18 +1,20 @@
-# Dockerfile
+# Dockerfile 
 
 ## 1. Description
 
 This is the Dockerfile of the NNI project. It includes NNI and several popular deep learning frameworks. It has been tested on `Ubuntu 16.04 LTS`:
 
-    CUDA 9.0, CuDNN 7.0
-    numpy 1.14.3,scipy 1.1.0
-    TensorFlow-gpu 1.10.0
-    Keras 2.1.6
-    PyTorch 0.4.1
-    scikit-learn 0.20.0
+    CUDA 9.0
+    CuDNN 7.0
+    numpy 1.14.3
+    scipy 1.1.0
+    tensorflow-gpu 1.15.0
+    keras 2.1.6
+    torch 1.4.0
+    scikit-learn 0.23.2
     pandas 0.23.4
     lightgbm 2.2.2
-    NNI v0.7
+    nni
 
 
 This Dockerfile can be used as a reference for your own customization.
@@ -47,4 +49,4 @@
 
 Use the following command to pull the NNI docker image from docker Hub.
 
-    docker pull msranni/nni:latest
+    docker pull msranni/nni:latest
diff --git a/docs/zh_CN/Assessor/CustomizeAssessor.md b/docs/zh_CN/Assessor/CustomizeAssessor.md
@@ -54,7 +54,7 @@ assessor:
 
 Note that in **2**, the `trial_history` object is exactly the object that the Trial sends to the Assessor through the `report_intermediate_result` function.
 
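-The working directory of the Assessor is `<home>/nni/experiments/<experiment_id>/log`, which can be read from the environment variable `NNI_LOG_DIRECTORY`.
+The working directory of the Assessor is `<home>/nni-experiments/<experiment_id>/log`, which can be read from the environment variable `NNI_LOG_DIRECTORY`.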
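
A minimal sketch of how custom Assessor code might locate that working directory. Only the `NNI_LOG_DIRECTORY` variable comes from the text above. The helper names, the fallback to the current directory, and the `assessor_debug.log` file name are assumptions made for illustration.

```python
import os
from pathlib import Path


def assessor_log_dir(default: str = ".") -> Path:
    """Return the directory named by NNI_LOG_DIRECTORY.

    Falls back to `default` when the variable is not set, for example
    when the code runs outside an NNI experiment (assumed behaviour).
    """
    return Path(os.environ.get("NNI_LOG_DIRECTORY", default))


def debug_log_path(filename: str = "assessor_debug.log") -> Path:
    """Build a path for a log file inside the working directory.

    The default file name is hypothetical, not an NNI convention.
    """
    return assessor_log_dir() / filename


print(debug_log_path())
```

Reading the variable at call time, rather than at import time, keeps the helper usable in unit tests that set the variable themselves.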
 
 For more examples, see:
 

diff --git a/...zh_CN/CommunitySharings/HpoComparision.md → .../zh_CN/CommunitySharings/HpoComparison.md b/...zh_CN/CommunitySharings/HpoComparision.md → .../zh_CN/CommunitySharings/HpoComparison.md
@@ -1,15 +1,14 @@
 # Hyperparameter Optimization Comparison
-
 *Anonymous author*
 
 A comparison of hyperparameter optimization (HPO) algorithms on several problems.
 
 The hyperparameter optimization algorithms compared are:
 
-- [Random Search（随机搜索）](../Tuner/BuiltinTuner.md)
-- [Grid Search（遍历搜索）](../Tuner/BuiltinTuner.md)
+- [Random Search](../Tuner/BuiltinTuner.md)
+- [Grid Search](../Tuner/BuiltinTuner.md)
 - [Evolution](../Tuner/BuiltinTuner.md)
-- [Anneal（退火算法）](../Tuner/BuiltinTuner.md)
+- [Anneal](../Tuner/BuiltinTuner.md)
 - [Metis](../Tuner/BuiltinTuner.md)
 - [TPE](../Tuner/BuiltinTuner.md)
 - [SMAC](../Tuner/BuiltinTuner.md)
@@ -20,15 +19,16 @@
 
 Environment:
 
-    OS: Linux Ubuntu 16.04 LTS
-    CPU: Intel(R) Xeon(R) CPU E5-2690 v3 @ 2.60GHz 2600 MHz
-    Memory: 112 GB
-    NNI Version: v0.7
-    NNI mode (local|pai|remote): local
-    Python version: 3.6
-    Virtual environment used: Conda
-    Running in Docker: no
-
+```
+OS: Linux Ubuntu 16.04 LTS
+CPU: Intel(R) Xeon(R) CPU E5-2690 v3 @ 2.60GHz 2600 MHz
+Memory: 112 GB
+NNI Version: v0.7
+NNI mode (local|pai|remote): local
+Python version: 3.6
+Virtual environment used: Conda
+Running in Docker: no
+```
 
 ## AutoGBDT Example
 
@@ -67,40 +67,40 @@
 }
 ```
 
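-The total search space contains 1,204,224 combinations, and the maximum number of Trials is set to 1000. The time limit is 48 hours.
+The total search space contains 1,204,224 combinations, and the maximum number of Trials is set to 1000. The time limit is 48 hours.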
 
 ### Results
 
-| Algorithm     | Best loss    | Average of best 5 losses | Average of best 10 losses |
-| ------------- | ------------ | ------------------------ | ------------------------- |
-| Random Search | 0.418854     | 0.420352      | 0.421553      |
-| Random Search | 0.417364     | 0.420024      | 0.420997      |
-| Random Search | 0.417861     | 0.419744      | 0.420642      |
-| Grid Search   | 0.498166     | 0.498166      | 0.498166      |
-| Evolution     | 0.409887     | 0.409887      | 0.409887      |
-| Evolution     | 0.413620     | 0.413875      | 0.414067      |
-| Evolution     | 0.409887     | 0.409887      | 0.409887      |
-| Anneal        | 0.414877     | 0.417289      | 0.418281      |
-| Anneal        | 0.409887     | 0.409887      | 0.410118      |
-| Anneal        | 0.413683     | 0.416949      | 0.417537      |
-| Metis         | 0.416273     | 0.420411      | 0.422380      |
-| Metis         | 0.420262     | 0.423175      | 0.424816      |
-| Metis         | 0.421027     | 0.424172      | 0.425714      |
-| TPE           | 0.414478     | 0.414478      | 0.414478      |
-| TPE           | 0.415077     | 0.417986      | 0.418797      |
-| TPE           | 0.415077     | 0.417009      | 0.418053      |
-| SMAC          | **0.408386** | **0.408386**  | **0.408386**  |
-| SMAC          | 0.414012     | 0.414012      | 0.414012      |
-| SMAC          | **0.408386** | **0.408386**  | **0.408386**  |
-| BOHB          | 0.410464     | 0.415319      | 0.417755      |
-| BOHB          | 0.418995     | 0.420268      | 0.422604      |
-| BOHB          | 0.415149     | 0.418072      | 0.418932      |
-| HyperBand     | 0.414065     | 0.415222      | 0.417628      |
-| HyperBand     | 0.416807     | 0.417549      | 0.418828      |
-| HyperBand     | 0.415550     | 0.415977      | 0.417186      |
-| GP            | 0.414353     | 0.418563      | 0.420263      |
-| GP            | 0.414395     | 0.418006      | 0.420431      |
-| GP            | 0.412943     | 0.416566      | 0.418443      |
+| Algorithm     | Best loss    | Average of best 5 losses | Average of best 10 losses |
+| ------------- | ------------ | ------------------------ | ------------------------- |
+| Random Search | 0.418854     | 0.420352      | 0.421553       |
+| Random Search | 0.417364     | 0.420024      | 0.420997       |
+| Random Search | 0.417861     | 0.419744      | 0.420642       |
+| Grid Search   | 0.498166     | 0.498166      | 0.498166       |
+| Evolution     | 0.409887     | 0.409887      | 0.409887       |
+| Evolution     | 0.413620     | 0.413875      | 0.414067       |
+| Evolution     | 0.409887     | 0.409887      | 0.409887       |
+| Anneal        | 0.414877     | 0.417289      | 0.418281       |
+| Anneal        | 0.409887     | 0.409887      | 0.410118       |
+| Anneal        | 0.413683     | 0.416949      | 0.417537       |
+| Metis         | 0.416273     | 0.420411      | 0.422380       |
+| Metis         | 0.420262     | 0.423175      | 0.424816       |
+| Metis         | 0.421027     | 0.424172      | 0.425714       |
+| TPE           | 0.414478     | 0.414478      | 0.414478       |
+| TPE           | 0.415077     | 0.417986      | 0.418797       |
+| TPE           | 0.415077     | 0.417009      | 0.418053       |
+| SMAC          | **0.408386** | **0.408386**  | **0.408386**   |
+| SMAC          | 0.414012     | 0.414012      | 0.414012       |
+| SMAC          | **0.408386** | **0.408386**  | **0.408386**   |
+| BOHB          | 0.410464     | 0.415319      | 0.417755       |
+| BOHB          | 0.418995     | 0.420268      | 0.422604       |
+| BOHB          | 0.415149     | 0.418072      | 0.418932       |
+| HyperBand     | 0.414065     | 0.415222      | 0.417628       |
+| HyperBand     | 0.416807     | 0.417549      | 0.418828       |
+| HyperBand     | 0.415550     | 0.415977      | 0.417186       |
+| GP            | 0.414353     | 0.418563      | 0.420263       |
+| GP            | 0.414395     | 0.418006      | 0.420431       |
+| GP            | 0.412943     | 0.416566      | 0.418443       |
 
 In this example, all algorithms use their default parameters. Metis ran only 300 Trials because it is very slow: its Gaussian Process computation has O(n^3) complexity.
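
The three columns of the table can be derived from the list of final losses reported by the Trials of one run. The following is a sketch under the assumption that a lower loss is better and that each table row summarises one independent run. The function name and the sample values are illustrative and are not taken from the experiment.

```python
from statistics import mean


def summarize_losses(losses, ks=(5, 10)):
    """Return the best loss and the mean of the k best losses for each k.

    Lower loss is treated as better. If fewer than k losses exist,
    the mean is taken over all of them.
    """
    if not losses:
        raise ValueError("at least one loss value is required")
    ordered = sorted(losses)  # ascending, so the best loss comes first
    summary = {"best": ordered[0]}
    for k in ks:
        summary[f"best_{k}_mean"] = mean(ordered[:k])
    return summary


# Made-up losses, for illustration only.
sample = [0.42, 0.41, 0.45, 0.43, 0.44, 0.47, 0.46, 0.48, 0.49, 0.50, 0.55]
print(summarize_losses(sample))
```

When a tuner finds the same best configuration repeatedly, the three numbers coincide, which matches rows in the table where all columns are equal.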
 
@@ -114,21 +114,22 @@
 
 #### Machine configuration
 
-    RocksDB:    version 6.1
-    CPU:        6 * Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz
-    CPUCache:   35840 KB
-    Keys:       16 bytes each
-    Values:     100 bytes each (50 bytes after compression)
-    Entries:    1000000
-
+```
+RocksDB:    version 6.1
+CPU:        6 * Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz
+CPUCache:   35840 KB
+Keys:       16 bytes each
+Values:     100 bytes each (50 bytes after compression)
+Entries:    1000000
+```
 
 #### Storage performance
 
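 **Latency**: each IO request takes some time to complete; this is called the average latency. Several factors affect this time, including network connection quality and hard disk IO performance.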
 
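-**IOPS**: **the number of IO operations per second**, that is, the number of *read or write operations* that can be completed in one second.
+**IOPS**: **the number of IO operations per second**, that is, the number of _read or write operations_ that can be completed in one second.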
 
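-**IO size**: **the size of each IO request**. Depending on the operating system and the application or service that needs disk access, it issues requests to read or write a certain amount of data at a time.
+**IO size**: **the size of each IO request**. Depending on the operating system and the application or service that needs disk access, it issues requests to read or write a certain amount of data at a time.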
 
 **Throughput (in MB/s) = average IO size x IOPS**
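
A small worked example of the throughput formula. It assumes 1 MB = 10^6 bytes and, purely for illustration, treats each operation as one 16-byte key plus one 100-byte value from the machine configuration above. The IOPS figure of 500,000 is a hypothetical round number, not a measured result.

```python
def throughput_mb_per_s(avg_io_size_bytes: float, iops: float) -> float:
    """Throughput in MB/s = average IO size x IOPS (1 MB taken as 10**6 bytes)."""
    return avg_io_size_bytes * iops / 1_000_000


# Assumed IO size: 16-byte key + 100-byte value per operation.
io_size_bytes = 16 + 100
# Hypothetical IOPS value chosen for the example.
example_iops = 500_000

print(f"{throughput_mb_per_s(io_size_bytes, example_iops):.1f} MB/s")
```

Because the IO size is fixed within one benchmark, throughput scales linearly with IOPS, which is why IOPS alone can serve as the tuning metric.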
 
@@ -200,7 +201,7 @@ IOPS 与在线处理能力有关，我们在实验中使用 IOPS 作为指标。
 | SMAC      | 491067          | 490472          | **491136**      |
 | Metis     | 444920          | 457060          | 454438          |
 
-Figure:
+Figure:
 
 ![](../../img/hpo_rocksdb_fillrandom.png)
 
@@ -215,6 +216,6 @@ Figure:
 | SMAC      | 2270874         | 2284904         | 2282266         |
 | Metis     | **2287696**     | 2283496         | 2277701         |
 
-Figure:
+Figure:
 
-![](../../img/hpo_rocksdb_readrandom.png)
+![](../../img/hpo_rocksdb_readrandom.png)
diff --git a/docs/zh_CN/CommunitySharings/ModelCompressionComparison.md b/docs/zh_CN/CommunitySharings/ModelCompressionComparison.md
@@ -0,0 +1,82 @@
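+# Comparison of Filter Pruning Algorithms
+
+To get an initial picture of how various filter pruning algorithms perform, extensive experiments were run with these algorithms on several benchmark models and datasets. This document presents the results. It also gives instructions for reproducing the experiments, to encourage further contributions to this work.
+
+## Experiment Setting
+
+The experiments use the following pruners, datasets, and models:
+
+* Models: [VGG16, ResNet18, ResNet50](https://github.com/microsoft/nni/tree/master/examples/model_compress/models/cifar10)
+
+* Dataset: CIFAR-10
+
+* Pruners:
+    - The pruners include:
+        - Iterative pruners: `SimulatedAnnealing Pruner`, `NetAdapt Pruner`, `AutoCompress Pruner`. Given an overall sparsity requirement, this kind of pruner automatically allocates sparsity across layers.
+        - One-shot pruners: `L1Filter Pruner`, `L2Filter Pruner`, `FPGM Pruner`. The sparsity of each layer equals the overall sparsity set for the experiment.
+    - Only the **filter pruning** performance is compared here.
+
+    For the iterative pruners, `L1Filter Pruner` is used as the base algorithm. That is, after an iterative pruner has decided the sparsity distribution, `L1Filter Pruner` performs the actual pruning.
+
+    - All the pruners listed above are implemented in [NNI](https://github.com/microsoft/nni/tree/master/docs/zh_CN/Compressor/Overview.md).
+
+## Experiment Results
+
+For each dataset/model/pruner combination, the model is pruned with several different target sparsities.
+
+Both the **number of weights vs. performance** curve and the **FLOPs vs. performance** curve are shown. The pruning results for VGG16 and ResNet18 on CIFAR-10 reported in the paper [AutoCompress: An Automatic DNN Structured Pruning Framework for Ultra-High Compression Rates](http://arxiv.org/abs/1907.03141) are plotted in the same figures for comparison.
+
+The results are shown in the figures below: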
+
+CIFAR-10, VGG16:
+
+![](../../../examples/model_compress/comparison_of_pruners/img/performance_comparison_vgg16.png)
+
+CIFAR-10, ResNet18:
+
+![](../../../examples/model_compress/comparison_of_pruners/img/performance_comparison_resnet18.png)
+
+CIFAR-10, ResNet50:
+
+![](../../../examples/model_compress/comparison_of_pruners/img/performance_comparison_resnet50.png)
+
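+## Analysis
+
+The following conclusions can be drawn from the results:
+
+* When sparsity is measured by the number of parameters, the iterative pruners (`AutoCompress Pruner`, `SimulatedAnnealing Pruner`) perform better than the one-shot pruners. When FLOPs sparsity is the criterion, they lose this advantage, because all of the current pruning algorithms prune according to parameter sparsity.
+* In the experiments above, the simple one-shot pruners `L1Filter Pruner`, `L2Filter Pruner`, and `FPGM Pruner` perform quite similarly.
+* `NetAdapt Pruner` cannot reach a high compression rate, because its mechanism prunes only one layer per iteration. If the sparsity pruned in each iteration is much smaller than the overall target sparsity, the pruning complexity becomes unacceptable.
+
+## Experiment Reproduction
+
+### Implementation Details
+
+* All results were collected with the default configuration of the pruners in NNI, meaning that when a pruner class is called in NNI, none of its default class arguments are changed.
+
+* Both FLOPs and the number of parameters are counted with the [Model FLOPs/Parameters Counter](https://github.com/microsoft/nni/blob/master/docs/zh_CN/Compressor/CompressionUtils.md#model-flopsparameters-counter) after [model speedup](https://github.com/microsoft/nni/blob/master/docs/zh_CN/Compressor/ModelSpeedup.md). This avoids the potential issues of counting them on masked models.
+
+* The experiment code is [here](https://github.com/microsoft/nni/tree/master/examples/model_compress/auto_pruners_torch.py).
+
+### Experiment Result Presentation
+
+* If you follow the practice of the [example](https://github.com/microsoft/nni/tree/master/examples/model_compress/auto_pruners_torch.py), the result of each pruning experiment is saved in JSON format as follows: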
+    ``` json
+    {
+        "performance": {"original": 0.9298, "pruned": 0.1, "speedup": 0.1, "finetuned": 0.7746}, 
+        "params": {"original": 14987722.0, "speedup": 167089.0}, 
+        "flops": {"original": 314018314.0, "speedup": 38589922.0}
+    }
+    ```
+
+* The experiment results are saved [here](https://github.com/microsoft/nni/tree/master/examples/model_compress/comparison_of_pruners). You can refer to [analyze](https://github.com/microsoft/nni/blob/master/examples/model_compress/comparison_of_pruners/analyze.py) to plot new performance comparison figures.
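
As a rough sketch of how one such result record could be post-processed, the snippet below derives the parameter and FLOPs compression ratios and the accuracy change after fine-tuning. The record is the JSON example shown above. The function name and the choice of derived quantities are assumptions for illustration and are not part of the NNI example scripts.

```python
import json

# The result record from the JSON example above.
RESULT_JSON = """
{
    "performance": {"original": 0.9298, "pruned": 0.1, "speedup": 0.1, "finetuned": 0.7746},
    "params": {"original": 14987722.0, "speedup": 167089.0},
    "flops": {"original": 314018314.0, "speedup": 38589922.0}
}
"""


def summarize(result: dict) -> dict:
    """Derive compression ratios and the accuracy drop from one record."""
    return {
        # How many times fewer parameters remain after model speedup.
        "params_ratio": result["params"]["original"] / result["params"]["speedup"],
        # How many times fewer FLOPs remain after model speedup.
        "flops_ratio": result["flops"]["original"] / result["flops"]["speedup"],
        # Accuracy lost relative to the original model, after fine-tuning.
        "accuracy_drop": result["performance"]["original"]
        - result["performance"]["finetuned"],
    }


summary = summarize(json.loads(RESULT_JSON))
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```

In this record the parameter ratio is far larger than the FLOPs ratio, which is in line with the observation in the analysis that the pruners act on parameter sparsity rather than FLOPs.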
+
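+## Contribution
+
+### TODO Items
+
+* Pruners constrained by FLOPs/latency
+* More pruning algorithms/datasets/models
+
+### Issues
+For questions about the algorithm implementations or the experiments, please [create an issue](https://github.com/microsoft/nni/issues/new/).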
diff --git a/docs/zh_CN/CommunitySharings/NNI_colab_support.md b/docs/zh_CN/CommunitySharings/NNI_colab_support.md
@@ -0,0 +1,44 @@
+
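+# Use NNI on Google Colab
+NNI can easily be used on Google Colab. However, Colab does not expose its public IP and ports, so by default NNI's Web UI cannot be reached from Colab. To work around this, you need reverse proxy software such as `ngrok` or `frp`. This tutorial shows how to use ngrok to access NNI's Web UI on Colab.
+
+## How to Open NNI's Web UI on Google Colab
+
+1. Install the required packages and software.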
+
+
+```
+! pip install nni # install nni
+! wget https://bin.equinox.io/c/4VmDzA7iaHb/ngrok-stable-linux-amd64.zip # download ngrok and unzip it
+! unzip ngrok-stable-linux-amd64.zip
+! mkdir -p nni_repo
+! git clone https://github.com/microsoft/nni.git nni_repo/nni # clone NNI's official repo to get examples
+```
+
+2. Register an ngrok account [here](https://ngrok.com/), then connect to it with your authtoken.
+
+
+```
+! ./ngrok authtoken <your-authtoken>
+```
+
+3. Start an NNI example on a port greater than 1024, then start ngrok on the same port. If you want to use a GPU, make sure that gpuNum >= 1 in config.yml. Use `get_ipython()` to start ngrok, because running `! ngrok http 5000 &` stops responding.
+
+
+```
+! nnictl create --config nni_repo/nni/examples/trials/mnist-pytorch/config.yml --port 5000 &
+get_ipython().system_raw('./ngrok http 5000 &')
+```
+
+4. Check the public URL.
+
+
+```
+! curl -s http://localhost:4040/api/tunnels # don't change the port number 4040
+```
+
+After step 4 you will see a URL like http://xxxx.ngrok.io. Open it to reach NNI's Web UI. Have fun :)
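
The `curl` command in step 4 prints a JSON document, and the public URL has to be picked out of it by eye. The sketch below extracts it programmatically. It assumes the response has a top-level `tunnels` list whose entries carry a `public_url` field. The sample payload and the function name are made up for illustration.

```python
import json


def public_urls(tunnels_json: str) -> list:
    """Return every public URL found in an ngrok tunnels response.

    Assumes the shape {"tunnels": [{"public_url": "..."}, ...]};
    entries without a "public_url" field are skipped.
    """
    payload = json.loads(tunnels_json)
    return [t["public_url"] for t in payload.get("tunnels", []) if "public_url" in t]


# Made-up payload in the assumed shape.
SAMPLE = """
{"tunnels": [
    {"proto": "https", "public_url": "https://xxxx.ngrok.io"},
    {"proto": "http", "public_url": "http://xxxx.ngrok.io"}
]}
"""

for url in public_urls(SAMPLE):
    print(url)
```

In a notebook, the text returned by the `curl` call in step 4 can be passed to `public_urls` in place of `SAMPLE`.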
+
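+## Access the Web UI with frp
+
+frp is another reverse proxy tool that provides similar functionality. However, frp does not provide free public URLs, so you may need a server with a public IP to act as the frp server. See [here](https://github.com/fatedier/frp) to learn how to deploy frp.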