Merge master into dev-retiarii (#3178)
liuzhe-lz authored Dec 11, 2020
1 parent d165905 commit 3ec26b4
Showing 327 changed files with 19,958 additions and 887 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -81,6 +81,7 @@ typings/
__pycache__
build
*.egg-info
+.eggs/
setup.pye
**/__init__.pye
**/.ipynb_checkpoints
28 changes: 15 additions & 13 deletions deployment/docker/Dockerfile → Dockerfile
@@ -3,12 +3,15 @@

FROM nvidia/cuda:9.2-cudnn7-runtime-ubuntu18.04

+ARG NNI_RELEASE

LABEL maintainer='Microsoft NNI Team<nni@microsoft.com>'

ENV DEBIAN_FRONTEND=noninteractive

-RUN apt-get -y update && \
-    apt-get -y install sudo \
+RUN apt-get -y update
+RUN apt-get -y install \
+    sudo \
apt-utils \
git \
curl \
@@ -26,28 +29,27 @@ RUN apt-get -y update && \
python3-dev \
python3-pip \
python3-tk \
-    libcupti-dev && \
-    apt-get clean && \
-    rm -rf /var/lib/apt/lists/*
+    libcupti-dev
+RUN apt-get clean
+RUN rm -rf /var/lib/apt/lists/*

#
# generate python script
#
-RUN cp /usr/bin/python3 /usr/bin/python
+RUN ln -s python3 /usr/bin/python

#
# update pip
#
-RUN python3 -m pip install --upgrade pip==20.0.2 setuptools==41.0.0
+RUN python3 -m pip install --upgrade pip==20.2.4 setuptools==50.3.2

# numpy 1.14.3 scipy 1.1.0
-RUN python3 -m pip --no-cache-dir install \
-    numpy==1.14.3 scipy==1.1.0
+RUN python3 -m pip --no-cache-dir install numpy==1.14.3 scipy==1.1.0

#
-# Tensorflow 1.15
+# TensorFlow
#
-RUN python3 -m pip --no-cache-dir install tensorflow-gpu==1.15.0
+RUN python3 -m pip --no-cache-dir install tensorflow==2.3.1

#
# Keras 2.1.6
Expand All @@ -73,15 +75,15 @@ RUN python3 -m pip --no-cache-dir install pandas==0.23.4 lightgbm==2.2.2
#
# Install NNI
#
-RUN python3 -m pip --no-cache-dir install nni
+COPY dist/nni-${NNI_RELEASE}-py3-none-manylinux1_x86_64.whl .
+RUN python3 -m pip install nni-${NNI_RELEASE}-py3-none-manylinux1_x86_64.whl

#
# install aml package
#
-RUN python3 -m pip --no-cache-dir install azureml
+RUN python3 -m pip --no-cache-dir install azureml-sdk


ENV PATH=/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/root/.local/bin:/usr/bin:/bin:/sbin

WORKDIR /root
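Since the image now installs NNI from a locally built wheel instead of from PyPI, an NNI wheel must exist under `dist/` before building. A hypothetical build invocation under these changes (the version number is illustrative):

```bash
# Hypothetical: assumes dist/nni-2.0-py3-none-manylinux1_x86_64.whl has already been built.
docker build --build-arg NNI_RELEASE=2.0 -t nni:latest .
```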
14 changes: 8 additions & 6 deletions README.md
@@ -16,7 +16,7 @@

**NNI (Neural Network Intelligence)** is a lightweight but powerful toolkit to help users **automate** <a href="docs/en_US/FeatureEngineering/Overview.md">Feature Engineering</a>, <a href="docs/en_US/NAS/Overview.md">Neural Architecture Search</a>, <a href="docs/en_US/Tuner/BuiltinTuner.md">Hyperparameter Tuning</a> and <a href="docs/en_US/Compression/Overview.md">Model Compression</a>.

-The tool manages automated machine learning (AutoML) experiments, **dispatches and runs** experiments' trial jobs generated by tuning algorithms to search the best neural architecture and/or hyper-parameters in **different training environments** like <a href="docs/en_US/TrainingService/LocalMode.md">Local Machine</a>, <a href="docs/en_US/TrainingService/RemoteMachineMode.md">Remote Servers</a>, <a href="docs/en_US/TrainingService/PaiMode.md">OpenPAI</a>, <a href="docs/en_US/TrainingService/KubeflowMode.md">Kubeflow</a>, <a href="docs/en_US/TrainingService/FrameworkControllerMode.md">FrameworkController on K8S (AKS etc.)</a>, <a href="docs/en_US/TrainingService/DLTSMode.md">DLWorkspace (aka. DLTS)</a>, <a href="docs/en_US/TrainingService/AMLMode.md">AML (Azure Machine Learning)</a> and other cloud options.
+The tool manages automated machine learning (AutoML) experiments, **dispatches and runs** experiments' trial jobs generated by tuning algorithms to search the best neural architecture and/or hyper-parameters in **different training environments** like <a href="docs/en_US/TrainingService/LocalMode.md">Local Machine</a>, <a href="docs/en_US/TrainingService/RemoteMachineMode.md">Remote Servers</a>, <a href="docs/en_US/TrainingService/PaiMode.md">OpenPAI</a>, <a href="docs/en_US/TrainingService/KubeflowMode.md">Kubeflow</a>, <a href="docs/en_US/TrainingService/FrameworkControllerMode.md">FrameworkController on K8S (AKS etc.)</a>, <a href="docs/en_US/TrainingService/DLTSMode.md">DLWorkspace (aka. DLTS)</a>, <a href="docs/en_US/TrainingService/AMLMode.md">AML (Azure Machine Learning)</a>, <a href="docs/en_US/TrainingService/AdaptDLMode.md">AdaptDL (aka. ADL)</a> and other cloud options.

## **Who should consider using NNI**

@@ -173,11 +173,13 @@ Within the following table, we summarized the current NNI capabilities, we are g
<li><a href="docs/en_US/TrainingService/RemoteMachineMode.md">Remote Servers</a></li>
<li><a href="docs/en_US/TrainingService/AMLMode.md">AML(Azure Machine Learning)</a></li>
<li><b>Kubernetes based services</b></li>
-<ul><li><a href="docs/en_US/TrainingService/PaiMode.md">OpenPAI</a></li>
-<li><a href="docs/en_US/TrainingService/KubeflowMode.md">Kubeflow</a></li>
-<li><a href="docs/en_US/TrainingService/FrameworkControllerMode.md">FrameworkController on K8S (AKS etc.)</a></li>
-</ul>
-<ul><li><a href="docs/en_US/TrainingService/DLTSMode.md">DLWorkspace (aka. DLTS)</a></li>
+<ul>
+<li><a href="docs/en_US/TrainingService/PaiMode.md">OpenPAI</a></li>
+<li><a href="docs/en_US/TrainingService/KubeflowMode.md">Kubeflow</a></li>
+<li><a href="docs/en_US/TrainingService/FrameworkControllerMode.md">FrameworkController on K8S (AKS etc.)</a></li>
+<li><a href="docs/en_US/TrainingService/DLTSMode.md">DLWorkspace (aka. DLTS)</a></li>
+<li><a href="docs/en_US/TrainingService/AdaptDLMode.md">AdaptDL (aka. ADL)</a></li>
+</ul>
</ul>
</td>
</tr>
27 files renamed without changes.
131 changes: 131 additions & 0 deletions docs/archive_en_US/ResearchPublications.md
@@ -0,0 +1,131 @@
# Research and Publications

We are working intensively on both the tool chain and research to make automatic model design and tuning truly practical and powerful. On the one hand, our main work is tool-chain-oriented development. On the other hand, our research aims to improve this tool chain, rethink challenging problems in AutoML (on both the system and the algorithm side), and propose elegant solutions. Below we list some of our research work; we encourage more research on this topic and welcome collaboration with us.


## System Research

- [Retiarii: A Deep Learning Exploratory-Training Framework](https://www.usenix.org/system/files/osdi20-zhang_quanlu.pdf)

```bibtex
@inproceedings{zhang2020retiarii,
title={Retiarii: A Deep Learning Exploratory-Training Framework},
author={Zhang, Quanlu and Han, Zhenhua and Yang, Fan and Zhang, Yuge and Liu, Zhe and Yang, Mao and Zhou, Lidong},
booktitle={14th {USENIX} Symposium on Operating Systems Design and Implementation ({OSDI} 20)},
pages={919--936},
year={2020}
}
```

- [AutoSys: The Design and Operation of Learning-Augmented Systems](https://www.usenix.org/system/files/atc20-liang-chieh-jan.pdf)

```bibtex
@inproceedings{liang2020autosys,
title={AutoSys: The Design and Operation of Learning-Augmented Systems},
author={Liang, Chieh-Jan Mike and Xue, Hui and Yang, Mao and Zhou, Lidong and Zhu, Lifei and Li, Zhao Lucis and Wang, Zibo and Chen, Qi and Zhang, Quanlu and Liu, Chuanjie and others},
booktitle={2020 {USENIX} Annual Technical Conference ({USENIX} {ATC} 20)},
pages={323--336},
year={2020}
}
```

- [Gandiva: Introspective Cluster Scheduling for Deep Learning](https://www.usenix.org/system/files/osdi18-xiao.pdf)

```bibtex
@inproceedings{xiao2018gandiva,
title={Gandiva: Introspective cluster scheduling for deep learning},
author={Xiao, Wencong and Bhardwaj, Romil and Ramjee, Ramachandran and Sivathanu, Muthian and Kwatra, Nipun and Han, Zhenhua and Patel, Pratyush and Peng, Xuan and Zhao, Hanyu and Zhang, Quanlu and others},
booktitle={13th {USENIX} Symposium on Operating Systems Design and Implementation ({OSDI} 18)},
pages={595--610},
year={2018}
}
```

## Algorithm Research

### New Algorithms

- [TextNAS: A Neural Architecture Search Space Tailored for Text Representation](https://arxiv.org/pdf/1912.10729.pdf)

```bibtex
@inproceedings{wang2020textnas,
title={TextNAS: A Neural Architecture Search Space Tailored for Text Representation.},
author={Wang, Yujing and Yang, Yaming and Chen, Yiren and Bai, Jing and Zhang, Ce and Su, Guinan and Kou, Xiaoyu and Tong, Yunhai and Yang, Mao and Zhou, Lidong},
booktitle={AAAI},
pages={9242--9249},
year={2020}
}
```

- [Cream of the Crop: Distilling Prioritized Paths For One-Shot Neural Architecture Search](https://papers.nips.cc/paper/2020/file/d072677d210ac4c03ba046120f0802ec-Paper.pdf)

```bibtex
@article{peng2020cream,
title={Cream of the Crop: Distilling Prioritized Paths For One-Shot Neural Architecture Search},
author={Peng, Houwen and Du, Hao and Yu, Hongyuan and Li, Qi and Liao, Jing and Fu, Jianlong},
journal={Advances in Neural Information Processing Systems},
volume={33},
year={2020}
}
```

- [Metis: Robustly tuning tail latencies of cloud systems](https://www.usenix.org/system/files/conference/atc18/atc18-li-zhao.pdf)

```bibtex
@inproceedings{li2018metis,
title={Metis: Robustly tuning tail latencies of cloud systems},
author={Li, Zhao Lucis and Liang, Chieh-Jan Mike and He, Wenjia and Zhu, Lianjie and Dai, Wenjun and Jiang, Jin and Sun, Guangzhong},
booktitle={2018 {USENIX} Annual Technical Conference ({USENIX} {ATC} 18)},
pages={981--992},
year={2018}
}
```

- [OpEvo: An Evolutionary Method for Tensor Operator Optimization](https://arxiv.org/abs/2006.05664)

```bibtex
@article{gao2020opevo,
title={OpEvo: An Evolutionary Method for Tensor Operator Optimization},
author={Gao, Xiaotian and Wei, Cui and Zhang, Lintao and Yang, Mao},
journal={arXiv preprint arXiv:2006.05664},
year={2020}
}
```

### Measurement and Understanding

- [Deeper insights into weight sharing in neural architecture search](https://arxiv.org/pdf/2001.01431.pdf)

```bibtex
@article{zhang2020deeper,
title={Deeper insights into weight sharing in neural architecture search},
author={Zhang, Yuge and Lin, Zejun and Jiang, Junyang and Zhang, Quanlu and Wang, Yujing and Xue, Hui and Zhang, Chen and Yang, Yaming},
journal={arXiv preprint arXiv:2001.01431},
year={2020}
}
```

- [How Does Supernet Help in Neural Architecture Search?](https://arxiv.org/abs/2010.08219)

```bibtex
@article{zhang2020does,
title={How Does Supernet Help in Neural Architecture Search?},
author={Zhang, Yuge and Zhang, Quanlu and Yang, Yaming},
journal={arXiv preprint arXiv:2010.08219},
year={2020}
}
```

### Applications

- [AutoADR: Automatic Model Design for Ad Relevance](https://arxiv.org/pdf/2010.07075.pdf)

```bibtex
@inproceedings{chen2020autoadr,
title={AutoADR: Automatic Model Design for Ad Relevance},
author={Chen, Yiren and Yang, Yaming and Sun, Hong and Wang, Yujing and Xu, Yu and Shen, Wei and Zhou, Rong and Tong, Yunhai and Bai, Jing and Zhang, Ruofei},
booktitle={Proceedings of the 29th ACM International Conference on Information \& Knowledge Management},
pages={2365--2372},
year={2020}
}
```
@@ -9,7 +9,7 @@ Use Grid search to find the best combination of alpha, beta and gamma for Effici
[Example code](https://github.com/microsoft/nni/tree/v1.9/examples/trials/efficientnet)

1. Set your working directory to the example code directory.
-2. Run `git clone https://github.com/ultmaster/EfficientNet-PyTorch` to clone this modified version of [EfficientNet-PyTorch](https://github.com/lukemelas/EfficientNet-PyTorch). The modifications were done to adhere to the original [Tensorflow version](https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet) as close as possible (including EMA, label smoothing and etc.); also added are the part which gets parameters from tuner and reports intermediate/final results. Clone it into `EfficientNet-PyTorch`; the files like `main.py`, `train_imagenet.sh` will appear inside, as specified in the configuration files.
+2. Run `git clone https://github.com/ultmaster/EfficientNet-PyTorch` to clone the [ultmaster modified version](https://github.com/ultmaster/EfficientNet-PyTorch) of the original [EfficientNet-PyTorch](https://github.com/lukemelas/EfficientNet-PyTorch). The modifications adhere to the original [TensorFlow version](https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet) as closely as possible (including EMA, label smoothing, etc.); they also add the code that gets parameters from the tuner and reports intermediate/final results. Clone it into `EfficientNet-PyTorch`; files such as `main.py` and `train_imagenet.sh` will appear inside, as specified in the configuration files.
3. Run `nnictl create --config config_local.yml` (use `config_pai.yml` for OpenPAI) to find the best EfficientNet-B1. Adjust the training service (PAI/local/remote) and batch size in the config files according to the environment.

For training on ImageNet, read `EfficientNet-PyTorch/train_imagenet.sh`. Download ImageNet beforehand and extract it adhering to the [PyTorch format](https://pytorch.org/docs/stable/torchvision/datasets.html#imagenet), then replace `/mnt/data/imagenet` with the location of the ImageNet storage. This file should also be a good example to follow for mounting ImageNet into the container on OpenPAI.
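The steps above, condensed into a shell session (starting from the repository root; paths are illustrative):

```bash
# Steps 1-3 condensed; pick the config file matching your training service.
cd examples/trials/efficientnet
git clone https://github.com/ultmaster/EfficientNet-PyTorch
nnictl create --config config_local.yml   # or config_pai.yml for OpenPAI
```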
13 files renamed without changes.
101 changes: 101 additions & 0 deletions docs/en_US/Assessor/BuiltinAssessor.rst
@@ -0,0 +1,101 @@
.. role:: raw-html(raw)
   :format: html


Built-in Assessors
==================

NNI provides state-of-the-art early stopping algorithms as built-in assessors and makes them easy to use. Below is a brief overview of NNI's current built-in assessors.

Note: Click the **Assessor's name** to get each Assessor's installation requirements, suggested usage scenario, and a config example. A link to a detailed description of each algorithm is provided at the end of the suggested scenario for each Assessor.

Currently, we support the following Assessors:

.. list-table::
   :header-rows: 1
   :widths: auto

   * - Assessor
     - Brief Introduction of Algorithm
   * - `Medianstop <#MedianStop>`__
     - Medianstop is a simple early stopping rule. It stops a pending trial X at step S if the trial’s best objective value by step S is strictly worse than the median value of the running averages of all completed trials’ objectives reported up to step S. `Reference Paper <https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/46180.pdf>`__
   * - `Curvefitting <#Curvefitting>`__
     - Curve Fitting Assessor is an LPA (learning, predicting, assessing) algorithm. It stops a pending trial X at step S if the prediction of the final epoch's performance is worse than the best final performance in the trial history. In this algorithm, we use 12 curves to fit the accuracy curve. `Reference Paper <http://aad.informatik.uni-freiburg.de/papers/15-IJCAI-Extrapolation_of_Learning_Curves.pdf>`__


Usage of Builtin Assessors
--------------------------

To use a builtin assessor provided by the NNI SDK, declare the **builtinAssessorName** and **classArgs** in the ``config.yml`` file. In this part, we introduce the usage details, suggested scenarios, classArgs requirements, and an example for each assessor.

Note: Please follow the provided format when writing your ``config.yml`` file.
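For orientation, the sketch below shows where the ``assessor`` block sits in a complete ``config.yml``; the surrounding fields are illustrative examples, not requirements of any particular assessor.

.. code-block:: yaml

   # Illustrative experiment config; only the assessor block is the subject here.
   authorName: default
   experimentName: example_mnist
   trialConcurrency: 1
   maxTrialNum: 100
   trainingServicePlatform: local
   searchSpacePath: search_space.json
   useAnnotation: false
   tuner:
     builtinTunerName: TPE
   assessor:
     builtinAssessorName: Medianstop
     classArgs:
       optimize_mode: maximize
       start_step: 5
   trial:
     command: python3 mnist.py
     codeDir: .
     gpuNum: 0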

:raw-html:`<a name="MedianStop"></a>`

Median Stop Assessor
^^^^^^^^^^^^^^^^^^^^

..

   Builtin Assessor Name: **Medianstop**


**Suggested scenario**

It's applicable to a wide range of performance curves; thus, it can be used in various scenarios to speed up the tuning progress. `Detailed Description <./MedianstopAssessor.rst>`__

**classArgs requirements:**


* **optimize_mode** (*maximize or minimize, optional, default = maximize*\ ) - If 'maximize', the assessor will **stop** trials with a smaller expectation. If 'minimize', the assessor will **stop** trials with a larger expectation.
* **start_step** (*int, optional, default = 0*\ ) - Whether a trial should be stopped is only determined after it has reported at least start_step intermediate results.

**Usage example:**

.. code-block:: yaml

   # config.yml
   assessor:
     builtinAssessorName: Medianstop
     classArgs:
       optimize_mode: maximize
       start_step: 5
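For intuition, here is a minimal sketch of the median stop rule in plain Python (not NNI's actual implementation; the function name and data layout are illustrative):

.. code-block:: python

   from statistics import median

   def median_stop(trial_history, completed_histories, step):
       """Stop a trial at `step` if its best objective value so far is
       strictly worse than the median of the running averages of all
       completed trials' objectives up to `step` (maximize mode)."""
       best_so_far = max(trial_history[:step + 1])
       running_averages = [
           sum(history[:step + 1]) / (step + 1)
           for history in completed_histories
           if len(history) > step
       ]
       if not running_averages:
           return False  # nothing to compare against yet
       return best_so_far < median(running_averages)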
:raw-html:`<br>`

:raw-html:`<a name="Curvefitting"></a>`

Curve Fitting Assessor
^^^^^^^^^^^^^^^^^^^^^^

..

   Builtin Assessor Name: **Curvefitting**


**Suggested scenario**

It's applicable to a wide range of performance curves; thus, it can be used in various scenarios to speed up the tuning progress. Even better, it's able to handle and assess curves with similar performance. `Detailed Description <./CurvefittingAssessor.rst>`__

**Note**\ : according to the original paper, only incremental functions are supported, so this assessor can only be used to maximize optimization metrics. For example, it can be used for accuracy, but not for loss.

**classArgs requirements:**


* **epoch_num** (*int, required*\ ) - The total number of epochs. We need to know the number of epochs to determine which points to predict.
* **start_step** (*int, optional, default = 6*\ ) - Whether a trial should be stopped is only determined after it has reported at least start_step intermediate results.
* **threshold** (*float, optional, default = 0.95*\ ) - The threshold used to decide whether to early-stop the worst performance curves. For example: if threshold = 0.95 and the best performance in the history is 0.9, then we will stop any trial whose predicted value is lower than 0.95 * 0.9 = 0.855.
* **gap** (*int, optional, default = 1*\ ) - The interval between assessor judgements. For example: if gap = 2 and start_step = 6, then we will assess the result when we receive 6, 8, 10, 12, ... intermediate results.

**Usage example:**

.. code-block:: yaml

   # config.yml
   assessor:
     builtinAssessorName: Curvefitting
     classArgs:
       epoch_num: 20
       start_step: 6
       threshold: 0.95
       gap: 1
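For intuition, the threshold rule above reduces to a one-line check; this sketch is illustrative, not NNI's actual implementation:

.. code-block:: python

   def curvefitting_stop(predicted_final, best_final_in_history, threshold=0.95):
       """Stop a trial when its predicted final performance falls below
       threshold * (best final performance seen so far)."""
       return predicted_final < threshold * best_final_in_history

   # With threshold = 0.95 and a historical best of 0.9, any trial whose
   # predicted final value is below 0.95 * 0.9 = 0.855 is stopped early.
   assert curvefitting_stop(0.80, 0.90) is True
   assert curvefitting_stop(0.86, 0.90) is False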
