This repository has been archived by the owner on Sep 18, 2024. It is now read-only.

Request for Updating Cream NAS algorithm #3228

Merged 8 commits on Jan 4, 2021
2 changes: 1 addition & 1 deletion docs/en_US/NAS/Cream.rst
@@ -5,7 +5,7 @@
Cream of the Crop: Distilling Prioritized Paths For One-Shot Neural Architecture Search
=======================================================================================

**`[Paper] <https://papers.nips.cc/paper/2020/file/d072677d210ac4c03ba046120f0802ec-Paper.pdf>`__ `[Models-Google Drive] <https://drive.google.com/drive/folders/1NLGAbBF9bA1IUAxKlk2VjgRXhr6RHvRW?usp=sharing>`__\ `[Models-Baidu Disk (PWD: wqw6)] <https://pan.baidu.com/s/1TqQNm2s14oEdyNPimw3T9g>`__ `[BibTex] <https://scholar.googleusercontent.com/scholar.bib?q=info:ICWVXc_SsKAJ:scholar.google.com/&output=citation&scisdr=CgUmooXfEMfTi0cV5aU:AAGBfm0AAAAAX7sQ_aXoamdKRaBI12tAVN8REq1VKNwM&scisig=AAGBfm0AAAAAX7sQ_RdYtp6BSro3zgbXVJU2MCgsG730&scisf=4&ct=citation&cd=-1&hl=ja>`__** :raw-html:`<br/>`
`[Paper] <https://papers.nips.cc/paper/2020/file/d072677d210ac4c03ba046120f0802ec-Paper.pdf>`__ `[Models-Google Drive] <https://drive.google.com/drive/folders/1NLGAbBF9bA1IUAxKlk2VjgRXhr6RHvRW?usp=sharing>`__ `[Models-Baidu Disk (PWD: wqw6)] <https://pan.baidu.com/s/1TqQNm2s14oEdyNPimw3T9g>`__ `[BibTex] <https://scholar.googleusercontent.com/scholar.bib?q=info:ICWVXc_SsKAJ:scholar.google.com/&output=citation&scisdr=CgUmooXfEMfTi0cV5aU:AAGBfm0AAAAAX7sQ_aXoamdKRaBI12tAVN8REq1VKNwM&scisig=AAGBfm0AAAAAX7sQ_RdYtp6BSro3zgbXVJU2MCgsG730&scisf=4&ct=citation&cd=-1&hl=ja>`__ :raw-html:`<br/>`

In this work, we present a simple yet effective architecture distillation method. The central idea is that subnetworks can learn collaboratively and teach each other throughout the training process, aiming to boost the convergence of individual models. We introduce the concept of prioritized path, which refers to the architecture candidates exhibiting superior performance during training. Distilling knowledge from the prioritized paths is able to boost the training of subnetworks. Since the prioritized paths are changed on the fly depending on their performance and complexity, the final obtained paths are the cream of the crop. The discovered architectures achieve superior performance compared to the recent `MobileNetV3 <https://arxiv.org/abs/1905.02244>`__ and `EfficientNet <https://arxiv.org/abs/1905.11946>`__ families under aligned settings.
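
As a rough illustration of the idea described above (not the NNI Cream trainer's actual code; the helper names and the scoring rule are assumptions for this sketch), the prioritized paths can be thought of as a small board of the best-performing subnets, refreshed on the fly and used as teachers for a distillation term added to the ordinary classification loss:

import torch.nn.functional as F

# Hypothetical helper: keep a small "board" of prioritized paths (architectures),
# refreshed on the fly by validation score, with ties broken toward lower complexity.
def update_prioritized_board(board, path, score, flops, board_size=10):
    board.append({"path": path, "score": score, "flops": flops})
    board.sort(key=lambda p: (-p["score"], p["flops"]))
    del board[board_size:]
    return board

# Ordinary cross-entropy plus a soft-label term distilled from a prioritized path.
def distill_loss(student_logits, teacher_logits, labels, alpha=0.5, temperature=1.0):
    ce = F.cross_entropy(student_logits, labels)
    kd = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(teacher_logits / temperature, dim=1),
        reduction="batchmean",
    ) * (temperature * temperature)
    return (1 - alpha) * ce + alpha * kd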

51 changes: 51 additions & 0 deletions examples/nas/cream/configs/retrain/114.yaml
@@ -0,0 +1,51 @@
AUTO_RESUME: False
DATA_DIR: './data/imagenet'
MODEL: '112m_retrain'
RESUME_PATH: './experiments/workspace/retrain/resume.pth.tar'
SAVE_PATH: './experiments/workspace/retrain'
SEED: 42
LOG_INTERVAL: 50
RECOVERY_INTERVAL: 0
WORKERS: 8
NUM_GPU: 8
SAVE_IMAGES: False
AMP: False
OUTPUT: 'None'
EVAL_METRICS: 'prec1'
TTA: 0
LOCAL_RANK: 0

DATASET:
NUM_CLASSES: 1000
IMAGE_SIZE: 224 # image patch size
INTERPOLATION: 'random' # Image resize interpolation type
BATCH_SIZE: 128 # batch size
NO_PREFECHTER: False

NET:
GP: 'avg'
DROPOUT_RATE: 0.2
SELECTION: 470

EMA:
USE: True
FORCE_CPU: False # force model ema to be tracked on CPU
DECAY: 0.9999


LR: 0.064
EPOCHS: 500
OPT_EPS: 1e-3
SCHED: 'cosine'
OPT: 'rmsproptf'
WARMUP_LR: 1e-6
DECAY_EPOCHS: 2.4
DECAY_RATE: 0.973
WARMUP_EPOCHS: 3
WEIGHT_DECAY: 1e-5

AUGMENTATION:
AA: 'rand-m9-mstd0.5'
RE_PROB: 0.2 # random erase prob
RE_MODE: 'pixel' # random erase mode
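
For reference, here is a minimal sketch of reading the config added above and inspecting a few of its top-level hyperparameters; plain PyYAML loading is only an illustration, since the Cream example builds its own config object from these files:

import yaml  # requires PyYAML

with open("examples/nas/cream/configs/retrain/114.yaml") as f:
    cfg = yaml.safe_load(f)

# Top-level retraining hyperparameters defined in the YAML above.
print(cfg["MODEL"], cfg["LR"], cfg["EPOCHS"], cfg["OPT"], cfg["SCHED"])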

51 changes: 51 additions & 0 deletions examples/nas/cream/configs/retrain/14.yaml
@@ -0,0 +1,51 @@
AUTO_RESUME: False
DATA_DIR: './data/imagenet'
MODEL: '14m_retrain'
RESUME_PATH: './experiments/workspace/retrain/resume.pth.tar'
SAVE_PATH: './experiments/workspace/retrain'
SEED: 42
LOG_INTERVAL: 50
RECOVERY_INTERVAL: 0
WORKERS: 8
NUM_GPU: 8
SAVE_IMAGES: False
AMP: False
OUTPUT: 'None'
EVAL_METRICS: 'prec1'
TTA: 0
LOCAL_RANK: 0

DATASET:
NUM_CLASSES: 1000
IMAGE_SIZE: 224 # image patch size
INTERPOLATION: 'random' # Image resize interpolation type
BATCH_SIZE: 128 # batch size
NO_PREFECHTER: False

NET:
GP: 'avg'
DROPOUT_RATE: 0.2
SELECTION: 470

EMA:
USE: True
FORCE_CPU: False # force model ema to be tracked on CPU
DECAY: 0.9999


LR: 0.064
EPOCHS: 500
OPT_EPS: 1e-3
SCHED: 'cosine'
OPT: 'rmsproptf'
WARMUP_LR: 1e-6
DECAY_EPOCHS: 2.4
DECAY_RATE: 0.973
WARMUP_EPOCHS: 3
WEIGHT_DECAY: 1e-5

AUGMENTATION:
AA: 'rand-m9-mstd0.5'
RE_PROB: 0.2 # random erase prob
RE_MODE: 'pixel' # random erase mode
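
The EMA block in these configs (USE: True, DECAY: 0.9999) asks the trainer to keep an exponential moving average of the model weights for evaluation. A minimal sketch of what that tracking amounts to, assuming a plain parameter-wise update (the example itself most likely relies on a timm-style ModelEma helper rather than this loop):

import copy
import torch

def update_ema(ema_model, model, decay=0.9999):
    # ema_param <- decay * ema_param + (1 - decay) * param, applied after each step
    with torch.no_grad():
        for ema_p, p in zip(ema_model.parameters(), model.parameters()):
            ema_p.mul_(decay).add_(p, alpha=1 - decay)

model = torch.nn.Linear(8, 2)      # stand-in for the retrained subnet
ema_model = copy.deepcopy(model)   # separate copy whose weights track the EMA
update_ema(ema_model, model)       # called once per optimizer step during training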

51 changes: 51 additions & 0 deletions examples/nas/cream/configs/retrain/23.yaml
@@ -0,0 +1,51 @@
AUTO_RESUME: False
DATA_DIR: './data/imagenet'
MODEL: '23m_retrain'
RESUME_PATH: './experiments/workspace/retrain/resume.pth.tar'
SAVE_PATH: './experiments/workspace/retrain'
SEED: 42
LOG_INTERVAL: 50
RECOVERY_INTERVAL: 0
WORKERS: 8
NUM_GPU: 8
SAVE_IMAGES: False
AMP: False
OUTPUT: 'None'
EVAL_METRICS: 'prec1'
TTA: 0
LOCAL_RANK: 0

DATASET:
NUM_CLASSES: 1000
IMAGE_SIZE: 224 # image patch size
INTERPOLATION: 'random' # Image resize interpolation type
BATCH_SIZE: 128 # batch size
NO_PREFECHTER: False

NET:
GP: 'avg'
DROPOUT_RATE: 0.2
SELECTION: 470

EMA:
USE: True
FORCE_CPU: False # force model ema to be tracked on CPU
DECAY: 0.9999


LR: 0.064
EPOCHS: 500
OPT_EPS: 1e-3
SCHED: 'cosine'
OPT: 'rmsproptf'
WARMUP_LR: 1e-6
DECAY_EPOCHS: 2.4
DECAY_RATE: 0.973
WARMUP_EPOCHS: 3
WEIGHT_DECAY: 1e-5

AUGMENTATION:
AA: 'rand-m9-mstd0.5'
RE_PROB: 0.2 # random erase prob
RE_MODE: 'pixel' # random erase mode

51 changes: 51 additions & 0 deletions examples/nas/cream/configs/retrain/287.yaml
@@ -0,0 +1,51 @@
AUTO_RESUME: False
DATA_DIR: './data/imagenet'
MODEL: '287m_retrain'
RESUME_PATH: './experiments/workspace/retrain/resume.pth.tar'
SAVE_PATH: './experiments/workspace/retrain'
SEED: 42
LOG_INTERVAL: 50
RECOVERY_INTERVAL: 0
WORKERS: 8
NUM_GPU: 8
SAVE_IMAGES: False
AMP: False
OUTPUT: 'None'
EVAL_METRICS: 'prec1'
TTA: 0
LOCAL_RANK: 0

DATASET:
NUM_CLASSES: 1000
IMAGE_SIZE: 224 # image patch size
INTERPOLATION: 'random' # Image resize interpolation type
BATCH_SIZE: 128 # batch size
NO_PREFECHTER: False

NET:
GP: 'avg'
DROPOUT_RATE: 0.2
SELECTION: 470

EMA:
USE: True
FORCE_CPU: False # force model ema to be tracked on CPU
DECAY: 0.9999


LR: 0.064
EPOCHS: 500
OPT_EPS: 1e-3
SCHED: 'cosine'
OPT: 'rmsproptf'
WARMUP_LR: 1e-6
DECAY_EPOCHS: 2.4
DECAY_RATE: 0.973
WARMUP_EPOCHS: 3
WEIGHT_DECAY: 1e-5

AUGMENTATION:
AA: 'rand-m9-mstd0.5'
RE_PROB: 0.2 # random erase prob
RE_MODE: 'pixel' # random erase mode

51 changes: 51 additions & 0 deletions examples/nas/cream/configs/retrain/43.yaml
@@ -0,0 +1,51 @@
AUTO_RESUME: False
DATA_DIR: './data/imagenet'
MODEL: '43m_retrain'
RESUME_PATH: './experiments/workspace/retrain/resume.pth.tar'
SAVE_PATH: './experiments/workspace/retrain'
SEED: 42
LOG_INTERVAL: 50
RECOVERY_INTERVAL: 0
WORKERS: 8
NUM_GPU: 8
SAVE_IMAGES: False
AMP: False
OUTPUT: 'None'
EVAL_METRICS: 'prec1'
TTA: 0
LOCAL_RANK: 0

DATASET:
NUM_CLASSES: 1000
IMAGE_SIZE: 224 # image patch size
INTERPOLATION: 'random' # Image resize interpolation type
BATCH_SIZE: 128 # batch size
NO_PREFECHTER: False

NET:
GP: 'avg'
DROPOUT_RATE: 0.2
SELECTION: 43

EMA:
USE: True
FORCE_CPU: False # force model ema to be tracked on CPU
DECAY: 0.9999


LR: 0.064
EPOCHS: 500
OPT_EPS: 1e-3
SCHED: 'cosine'
OPT: 'rmsproptf'
WARMUP_LR: 1e-6
DECAY_EPOCHS: 2.4
DECAY_RATE: 0.973
WARMUP_EPOCHS: 3
WEIGHT_DECAY: 1e-5

AUGMENTATION:
AA: 'rand-m9-mstd0.5'
RE_PROB: 0.2 # random erase prob
RE_MODE: 'pixel' # random erase mode

51 changes: 51 additions & 0 deletions examples/nas/cream/configs/retrain/481.yaml
@@ -0,0 +1,51 @@
AUTO_RESUME: False
DATA_DIR: './data/imagenet'
MODEL: '481m_retrain'
RESUME_PATH: './experiments/workspace/retrain/resume.pth.tar'
SAVE_PATH: './experiments/workspace/retrain'
SEED: 42
LOG_INTERVAL: 50
RECOVERY_INTERVAL: 0
WORKERS: 8
NUM_GPU: 8
SAVE_IMAGES: False
AMP: False
OUTPUT: 'None'
EVAL_METRICS: 'prec1'
TTA: 0
LOCAL_RANK: 0

DATASET:
NUM_CLASSES: 1000
IMAGE_SIZE: 224 # image patch size
INTERPOLATION: 'random' # Image resize interpolation type
BATCH_SIZE: 128 # batch size
NO_PREFECHTER: False

NET:
GP: 'avg'
DROPOUT_RATE: 0.2
SELECTION: 481

EMA:
USE: True
FORCE_CPU: False # force model ema to be tracked on CPU
DECAY: 0.9999


LR: 0.064
EPOCHS: 500
OPT_EPS: 1e-3
SCHED: 'cosine'
OPT: 'rmsproptf'
WARMUP_LR: 1e-6
DECAY_EPOCHS: 2.4
DECAY_RATE: 0.973
WARMUP_EPOCHS: 3
WEIGHT_DECAY: 1e-5

AUGMENTATION:
AA: 'rand-m9-mstd0.5'
RE_PROB: 0.2 # random erase prob
RE_MODE: 'pixel' # random erase mode

45 changes: 22 additions & 23 deletions examples/nas/cream/configs/retrain.yaml → examples/nas/cream/configs/retrain/604.yaml
100644 → 100755
@@ -2,12 +2,12 @@ AUTO_RESUME: False
DATA_DIR: './data/imagenet'
MODEL: '604m_retrain'
RESUME_PATH: './experiments/workspace/retrain/resume.pth.tar'
SAVE_PATH: './'
SAVE_PATH: './experiments/workspace/retrain'
SEED: 42
LOG_INTERVAL: 50
RECOVERY_INTERVAL: 0
WORKERS: 4
NUM_GPU: 2
WORKERS: 8
NUM_GPU: 8
SAVE_IMAGES: False
AMP: False
OUTPUT: 'None'
@@ -19,34 +19,33 @@ DATASET:
NUM_CLASSES: 1000
IMAGE_SIZE: 224 # image patch size
INTERPOLATION: 'random' # Image resize interpolation type
BATCH_SIZE: 32 # batch size
BATCH_SIZE: 128 # batch size
NO_PREFECHTER: False

NET:
GP: 'avg'
DROPOUT_RATE: 0.0
SELECTION: 42
DROPOUT_RATE: 0.2
SELECTION: 604

EMA:
USE: True
FORCE_CPU: False # force model ema to be tracked on CPU
DECAY: 0.9998
DECAY: 0.9999

OPT: 'sgd'
OPT_EPS: 1e-2
MOMENTUM: 0.9
DECAY_RATE: 0.1

SCHED: 'sgd'
LR_NOISE: None
LR_NOISE_PCT: 0.67
LR_NOISE_STD: 1.0
WARMUP_LR: 1e-4
MIN_LR: 1e-5
EPOCHS: 200
START_EPOCH: None
DECAY_EPOCHS: 30.0
LR: 0.064
EPOCHS: 500
OPT_EPS: 1e-3
SCHED: 'cosine'
OPT: 'rmsproptf'
WARMUP_LR: 1e-6
DECAY_EPOCHS: 2.4
DECAY_RATE: 0.973
WARMUP_EPOCHS: 3
COOLDOWN_EPOCHS: 10
PATIENCE_EPOCHS: 10
LR: 1e-2
WEIGHT_DECAY: 1e-5

AUGMENTATION:
AA: 'rand-m9-mstd0.5'
RE_PROB: 0.2 # random erase prob
RE_MODE: 'pixel' # random erase mode
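
All of the retrain configs in this PR pair the rmsproptf optimizer with a cosine schedule: a short linear warmup from WARMUP_LR (1e-6) to LR (0.064) over WARMUP_EPOCHS (3), then cosine decay over the remaining epochs. A rough per-epoch sketch of that behaviour, as an approximation rather than the exact scheduler the example uses:

import math

def lr_at_epoch(epoch, lr=0.064, warmup_lr=1e-6, warmup_epochs=3, epochs=500):
    if epoch < warmup_epochs:
        # Linear warmup from warmup_lr up to the base lr.
        return warmup_lr + (lr - warmup_lr) * epoch / warmup_epochs
    # Cosine decay from the base lr toward zero over the remaining epochs.
    progress = (epoch - warmup_epochs) / max(1, epochs - warmup_epochs)
    return 0.5 * lr * (1.0 + math.cos(math.pi * progress))

print([round(lr_at_epoch(e), 4) for e in (0, 3, 250, 499)])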
