Merge pull request #297 from microsoft/master
merge master
SparkSnail authored May 24, 2021
2 parents fdb2d77 + 35c3d16 commit 5190f5a
Showing 30 changed files with 3,270 additions and 32 deletions.
151 changes: 151 additions & 0 deletions docs/en_US/NAS/FBNet.rst
@@ -0,0 +1,151 @@
FBNet
======

For mobile facial landmark detection, we apply FBNet (Block-wise DNAS) to the basic architecture of the PFLD model to design a concise model that balances latency and accuracy. References are listed below:


* `FBNet: Hardware-Aware Efficient ConvNet Design via Differentiable Neural Architecture Search <https://arxiv.org/abs/1812.03443>`__
* `PFLD: A Practical Facial Landmark Detector <https://arxiv.org/abs/1902.10859>`__

FBNet is a block-wise differentiable NAS method (Block-wise DNAS), in which the best candidate building blocks are chosen through Gumbel Softmax random sampling and differentiable training. At each layer (or stage) to be searched, the diverse candidate blocks are placed side by side (similar in effect to structural re-parameterization), which leads to sufficient pre-training of the supernet. The pre-trained supernet is then sampled to obtain a subnet, which is finetuned to achieve better performance.
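As a rough illustration of the Gumbel Softmax sampling mentioned above, the sketch below draws soft selection weights over the candidate blocks of a single layer using only the Python standard library. The function name ``gumbel_softmax``, the ``temperature`` argument, and the example logits are assumptions made for illustration; they are not taken from the NNI implementation.

```python
import math
import random


def gumbel_softmax(logits, temperature=1.0, rng=random):
    """Return soft selection weights over candidates (illustrative sketch)."""
    noisy = []
    for logit in logits:
        # Clamp away from 0 so that both log() calls stay finite.
        u = max(rng.random(), 1e-12)
        # Gumbel(0, 1) noise is -log(-log(U)) for U ~ Uniform(0, 1).
        gumbel = -math.log(-math.log(u))
        noisy.append((logit + gumbel) / temperature)
    # Numerically stable softmax over the perturbed logits.
    peak = max(noisy)
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)
theta = [0.5, 1.5, -0.3]  # hypothetical architecture parameters for three blocks
weights = gumbel_softmax(theta, temperature=5.0, rng=rng)
chosen = max(range(len(weights)), key=weights.__getitem__)
```

During supernet training, weights of this kind would scale the outputs of the parallel candidate blocks; after training, the block with the largest architecture parameter would typically be kept for the subnet.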

.. image:: ../../img/fbnet.png
   :target: ../../img/fbnet.png
   :alt:


PFLD is a lightweight facial landmark model for real-time applications. The architecture of PFLD is first simplified for acceleration by using the stem block of PeleeNet, average pooling with depthwise convolution, and the eSE module.

To achieve a better trade-off between latency and accuracy, FBNet is then applied to the simplified PFLD to search for the best block at each specific layer. The search space is based on the FBNet space and is optimized for mobile deployment by using average pooling with depthwise convolution, the eSE module, and so on.


Experiments
------------

To verify the effectiveness of FBNet applied to PFLD, we choose an open-source dataset with 106 landmark points as the benchmark:

* `Grand Challenge of 106-Point Facial Landmark Localization <https://arxiv.org/abs/1905.03469>`__

The baseline model is denoted as MobileNet-V3 PFLD (`Reference baseline <https://github.com/Hsintao/pfld_106_face_landmarks>`__), and the searched model is denoted as Subnet. The experimental results are listed below, where the latency is measured on a Qualcomm 625 CPU (ARMv8):


.. list-table::
   :header-rows: 1
   :widths: auto

   * - Model
     - Size
     - Latency
     - Validation NME
   * - MobileNet-V3 PFLD
     - 1.01MB
     - 10ms
     - 6.22%
   * - Subnet
     - 693KB
     - 1.60ms
     - 5.58%
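For readers unfamiliar with the NME column, the sketch below computes a normalized mean error for one face: the average point-to-point distance between predicted and ground-truth landmarks, divided by a normalization length. The text above does not state which normalization is used for the reported numbers, so the ``norm`` argument (for example an inter-ocular distance or a face box size) is an assumption, as are the function name and the toy coordinates.

```python
import math


def nme(pred, target, norm):
    """Mean landmark error divided by a normalization length (sketch)."""
    if len(pred) != len(target) or not pred:
        raise ValueError("pred and target must be non-empty and the same length")
    # Sum of Euclidean distances between corresponding landmark points.
    total = sum(
        math.hypot(px - tx, py - ty)
        for (px, py), (tx, ty) in zip(pred, target)
    )
    return total / (len(pred) * norm)


pred = [(0.0, 0.0), (3.0, 4.0)]    # toy predictions, not real data
target = [(0.0, 0.0), (0.0, 0.0)]  # toy ground truth, not real data
error = nme(pred, target, norm=10.0)
print(error)  # 0.25
```

A lower value is better, so the Subnet row in the table improves on the baseline in all three columns.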


Example
--------

`Example code <https://github.com/microsoft/nni/tree/master/examples/nas/oneshot/pfld>`__

Please run the following scripts in the example directory.

The Python dependencies used here are listed below:

.. code-block:: bash

   numpy==1.18.5
   opencv-python==4.5.1.48
   torch==1.6.0
   torchvision==0.7.0
   onnx==1.8.1
   onnx-simplifier==0.3.5
   onnxruntime==1.7.0

Data Preparation
-----------------

First, download the `106points dataset <https://drive.google.com/file/d/1I7QdnLxAlyG2Tq3L66QYzGhiBEoVfzKo/view?usp=sharing>`__ to the path ``./data/106points``. The dataset includes a training set and a test set:

.. code-block:: bash

   ./data/106points/train_data/imgs
   ./data/106points/train_data/list.txt
   ./data/106points/test_data/imgs
   ./data/106points/test_data/list.txt

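Each line of ``list.txt`` is read by the loader in ``datasets.py`` as whitespace-separated fields: an image path, then 106 x 2 landmark coordinates, then the remaining values as angles. The sketch below parses one such line without any third-party packages. The function name ``parse_annotation``, the sample path, and the three angle values are hypothetical and only serve the illustration.

```python
NUM_POINTS = 106


def parse_annotation(line, num_points=NUM_POINTS):
    """Split one annotation line into (path, landmarks, angles)."""
    fields = line.strip().split()
    # One path field followed by an (x, y) pair for every landmark.
    expected = 1 + 2 * num_points
    if len(fields) < expected:
        raise ValueError("annotation line has too few fields")
    path = fields[0]
    landmarks = [float(v) for v in fields[1:expected]]
    # Everything after the landmarks is treated as angle labels.
    angles = [float(v) for v in fields[expected:]]
    return path, landmarks, angles


# Synthetic line: hypothetical path, constant landmarks, three made-up angles.
sample = " ".join(
    ["imgs/sample.png"] + ["0.5"] * (2 * NUM_POINTS) + ["1.0", "-2.0", "3.0"]
)
path, landmarks, angles = parse_annotation(sample)
```

Checking the field count up front makes a truncated or malformed line fail loudly instead of silently producing a short landmark vector.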

Quick Start
-----------

1. Search
^^^^^^^^^^

Based on the simplified PFLD architecture, first configure the multi-stage search space and the search hyper-parameters to construct the supernet. For example:

.. code-block:: python

   from lib.builder import search_space
   from lib.ops import PRIMITIVES
   from lib.supernet import PFLDInference, AuxiliaryNet
   from nni.algorithms.nas.pytorch.fbnet import LookUpTable, NASConfig

   # configuration of hyper-parameters
   # search_space defines the multi-stage search space
   nas_config = NASConfig(
       model_dir="./ckpt_save",
       nas_lr=0.01,
       mode="mul",
       alpha=0.25,
       beta=0.6,
       search_space=search_space,
   )
   # lookup table to manage the information
   lookup_table = LookUpTable(config=nas_config, primitives=PRIMITIVES)
   # create the supernet
   pfld_backbone = PFLDInference(lookup_table)

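To give a feel for what ``mode="mul"``, ``alpha`` and ``beta`` may control, the sketch below evaluates the multiplicative latency-aware loss from the FBNet paper, which scales the task loss by ``alpha * log(latency) ** beta``. That the NNI configuration maps onto exactly this formula is an assumption; the function name and the latency values are hypothetical, and the real implementation may look up or normalize latency differently.

```python
import math


def latency_aware_loss(task_loss, latency, alpha=0.25, beta=0.6):
    """Multiplicative latency penalty in the style of the FBNet paper (sketch)."""
    if latency <= 1.0:
        # log(latency) must be positive for the fractional power to be real.
        raise ValueError("latency must exceed 1 so that log(latency) is positive")
    return task_loss * alpha * math.log(latency) ** beta


# Same task loss, two hypothetical latencies: the slower candidate costs more.
fast = latency_aware_loss(task_loss=1.0, latency=10.0)
slow = latency_aware_loss(task_loss=1.0, latency=100.0)
```

Under this form, ``alpha`` sets the overall weight of the penalty and ``beta`` controls how sharply it grows with latency.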

After creating the supernet with the specified search space and hyper-parameters, run the command below to start searching and training the supernet:

.. code-block:: bash

   python train.py --dev_id "0,1" --snapshot "./ckpt_save" --data_root "./data/106points"

The validation accuracy is shown during training, and the model with the best accuracy is saved as ``./ckpt_save/supernet/checkpoint_best.pth``.


2. Finetune
^^^^^^^^^^^^

After pre-training the supernet, run the command below to sample the subnet and finetune it:

.. code-block:: bash

   python retrain.py --dev_id "0,1" --snapshot "./ckpt_save" --data_root "./data/106points" \
                     --supernet "./ckpt_save/supernet/checkpoint_best.pth"

The validation accuracy is shown during training, and the model with the best accuracy is saved as ``./ckpt_save/subnet/checkpoint_best.pth``.


3. Export
^^^^^^^^^^

After finetuning the subnet, run the command below to export the ONNX model:

.. code-block:: bash

   python export.py --supernet "./ckpt_save/supernet/checkpoint_best.pth" \
                    --resume "./ckpt_save/subnet/checkpoint_best.pth"

The ONNX model is saved as ``./output/subnet.onnx``, which can be further converted for a mobile inference engine by using `MNN <https://github.com/alibaba/MNN>`__.

The checkpoints of the pre-trained supernet and subnet, along with the exported ONNX model, are available below:

* `Supernet <https://drive.google.com/file/d/1TCuWKq8u4_BQ84BWbHSCZ45N3JGB9kFJ/view?usp=sharing>`__
* `Subnet <https://drive.google.com/file/d/160rkuwB7y7qlBZNM3W_T53cb6MQIYHIE/view?usp=sharing>`__
* `ONNX model <https://drive.google.com/file/d/1s-v-aOiMv0cqBspPVF3vSGujTbn_T_Uo/view?usp=sharing>`__
2 changes: 2 additions & 0 deletions docs/en_US/NAS/Overview.rst
@@ -58,6 +58,8 @@ NNI currently supports the one-shot NAS algorithms listed below and is adding mo
     - `Cyclic Differentiable Architecture Search <https://arxiv.org/pdf/2006.10724.pdf>`__ builds a cyclic feedback mechanism between the search and evaluation networks. It introduces a cyclic differentiable architecture search framework which integrates the two networks into a unified architecture.
   * - `ProxylessNAS <Proxylessnas.rst>`__
     - `ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware <https://arxiv.org/abs/1812.00332>`__. It removes proxy, directly learns the architectures for large-scale target tasks and target hardware platforms.
   * - `FBNet <FBNet.rst>`__
     - `FBNet: Hardware-Aware Efficient ConvNet Design via Differentiable Neural Architecture Search <https://arxiv.org/abs/1812.03443>`__. It is a block-wise differentiable neural network architecture search method with the hardware-aware constraint.
   * - `TextNAS <TextNAS.rst>`__
     - `TextNAS: A Neural Architecture Search Space tailored for Text Representation <https://arxiv.org/pdf/1912.10729.pdf>`__. It is a neural architecture search algorithm tailored for text representation.
   * - `Cream <Cream.rst>`__
1 change: 1 addition & 0 deletions docs/en_US/NAS/one_shot_nas.rst
@@ -14,5 +14,6 @@ One-shot NAS algorithms leverage weight sharing among models in neural architect
SPOS <SPOS>
CDARTS <CDARTS>
ProxylessNAS <Proxylessnas>
FBNet <FBNet>
TextNAS <TextNAS>
Cream <Cream>
Binary file added docs/img/fbnet.png
62 changes: 62 additions & 0 deletions examples/nas/oneshot/pfld/datasets.py
@@ -0,0 +1,62 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.

from __future__ import absolute_import, division, print_function

import cv2
import os

import numpy as np

from torch.utils import data


class PFLDDatasets(data.Dataset):
    """ Dataset to manage the data loading, augmentation and generation. """

    def __init__(self, file_list, transforms=None, data_root="", img_size=112):
        """
        Parameters
        ----------
        file_list : str
            path to the list file, one image path and its annotations per line
        transforms : function
            function for data augmentation
        data_root : str
            the root path of dataset
        img_size : int
            the size of image height or width
        """
        self.line = None
        self.path = None
        self.img_size = img_size
        self.land = None
        self.angle = None
        self.data_root = data_root
        self.transforms = transforms
        with open(file_list, "r") as f:
            self.lines = f.readlines()

    def __getitem__(self, index):
        """ Get the data sample and labels with the index. """
        self.line = self.lines[index].strip().split()
        # load image
        if self.data_root:
            self.img = cv2.imread(os.path.join(self.data_root, self.line[0]))
        else:
            self.img = cv2.imread(self.line[0])
        # resize
        self.img = cv2.resize(self.img, (self.img_size, self.img_size))
        # obtain gt labels
        self.land = np.asarray(self.line[1: (106 * 2 + 1)], dtype=np.float32)
        self.angle = np.asarray(self.line[(106 * 2 + 1):], dtype=np.float32)

        # augmentation
        if self.transforms:
            self.img = self.transforms(self.img)

        return self.img, self.land, self.angle

    def __len__(self):
        """ Get the size of dataset. """
        return len(self.lines)
70 changes: 70 additions & 0 deletions examples/nas/oneshot/pfld/export.py
@@ -0,0 +1,70 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.

from __future__ import absolute_import, division, print_function

import argparse
import onnx
import onnxsim
import os
import torch

from lib.builder import search_space
from lib.ops import PRIMITIVES
from nni.algorithms.nas.pytorch.fbnet import (
    LookUpTable,
    NASConfig,
    model_init,
)


parser = argparse.ArgumentParser(description="Export the ONNX model")
parser.add_argument("--net", default="subnet", type=str)
parser.add_argument("--supernet", default="", type=str, metavar="PATH")
parser.add_argument("--resume", default="", type=str, metavar="PATH")
parser.add_argument("--num_points", default=106, type=int)
parser.add_argument("--img_size", default=112, type=int)
parser.add_argument("--onnx", default="./output/pfld.onnx", type=str)
parser.add_argument("--onnx_sim", default="./output/subnet.onnx", type=str)
args = parser.parse_args()

os.makedirs("./output", exist_ok=True)

if args.net == "subnet":
    from lib.subnet import PFLDInference
else:
    raise ValueError("Network is not implemented")

check = torch.load(args.supernet, map_location=torch.device("cpu"))
sampled_arch = check["arch_sample"]

nas_config = NASConfig(search_space=search_space)
lookup_table = LookUpTable(config=nas_config, primitives=PRIMITIVES)
pfld_backbone = PFLDInference(lookup_table, sampled_arch, args.num_points)

pfld_backbone.eval()
check_sub = torch.load(args.resume, map_location=torch.device("cpu"))
param_dict = check_sub["pfld_backbone"]
model_init(pfld_backbone, param_dict)

print("Convert PyTorch model to ONNX.")
dummy_input = torch.randn(1, 3, args.img_size, args.img_size)
input_names = ["input"]
output_names = ["output"]
torch.onnx.export(
    pfld_backbone,
    dummy_input,
    args.onnx,
    verbose=True,
    input_names=input_names,
    output_names=output_names,
)

print("Check ONNX model.")
model = onnx.load(args.onnx)
# validate the exported graph so that the message above reflects an actual check
onnx.checker.check_model(model)

print("Simplifying the ONNX model.")
model_opt, check = onnxsim.simplify(args.onnx)
assert check, "Simplified ONNX model could not be validated"
onnx.save(model_opt, args.onnx_sim)
print("Onnx model simplify Ok!")
55 changes: 55 additions & 0 deletions examples/nas/oneshot/pfld/lib/builder.py
@@ -0,0 +1,55 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.

from __future__ import absolute_import, division, print_function


search_space = {
    # multi-stage definition for candidate layers
    # here two stages are defined for PFLD searching
    "stages": {
        "stage_0": {
            "ops": [
                "mb_k3_res",
                "mb_k3_e2_res",
                "mb_k3_res_d3",
                "mb_k5_res",
                "mb_k5_e2_res",
                "sep_k3",
                "sep_k5",
                "gh_k3",
                "gh_k5",
            ],
            "layer_num": 2,
        },
        "stage_1": {
            "ops": [
                "mb_k3_e2_res",
                "mb_k3_e4_res",
                "mb_k3_e2_res_se",
                "mb_k3_res_d3",
                "mb_k5_res",
                "mb_k5_e2_res",
                "mb_k5_res_se",
                "mb_k5_e2_res_se",
                "gh_k5",
            ],
            "layer_num": 3,
        },
    },
    # necessary information of layers for NAS
    # the basic information is as (input_channels, height, width)
    "input_shape": [
        (32, 14, 14),
        (32, 14, 14),
        (32, 14, 14),
        (64, 7, 7),
        (64, 7, 7),
    ],
    # output channels for each layer
    "channel_size": [32, 32, 64, 64, 64],
    # stride for each layer
    "strides": [1, 1, 2, 1, 1],
    # height of feature map for each layer
    "fm_size": [14, 14, 7, 7, 7],
}
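The search space above can be sanity-checked with a few lines of standard-library Python. The sketch below uses condensed copies of the values in ``search_space`` and assumes that every searched layer independently picks one op from its stage's list, and that ``fm_size`` is the input height divided by the stride; both readings are inferred from the numbers rather than stated in the source.

```python
from math import prod

# Condensed copies of the values in search_space above.
stage_ops = {"stage_0": 9, "stage_1": 9}     # number of candidate ops per stage
stage_layers = {"stage_0": 2, "stage_1": 3}  # layer_num per stage
input_shape = [(32, 14, 14), (32, 14, 14), (32, 14, 14), (64, 7, 7), (64, 7, 7)]
channel_size = [32, 32, 64, 64, 64]
strides = [1, 1, 2, 1, 1]
fm_size = [14, 14, 7, 7, 7]

total_layers = sum(stage_layers.values())
# One independent op choice per searched layer (assumption).
num_archs = prod(stage_ops[name] ** stage_layers[name] for name in stage_ops)

# Output feature-map height should equal input height divided by stride.
for i, ((_, height, _), stride) in enumerate(zip(input_shape, strides)):
    assert height // stride == fm_size[i]
# Each layer's output channels should feed the next layer's input channels.
for i in range(total_layers - 1):
    assert channel_size[i] == input_shape[i + 1][0]

print(total_layers, num_archs)  # 5 59049
```

Under these assumptions the five per-layer lists line up with the five searched layers, and the supernet covers 9^5 = 59049 candidate subnets.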