Merge branch 'dev' into fcos_large_models

Intelligent-Systems-Laboratory · Apr 18, 2020 · 068367c
2 parents 778c948 + 5db713e
Showing 48 changed files with 1,977 additions and 54 deletions.
diff --git a/.gitignore b/.gitignore
@@ -40,10 +40,16 @@ dist/
 # project dirs
 /datasets/coco
 /datasets/lvis
+/datasets/pic
+/datasets/ytvos
 /models
+/demo_outputs
+/example_inputs
 /debug
 /weights
+/export
 eval.sh
 
-# debug code
 demo/performance.py
+train.sh
+benchmark.sh
diff --git a/MODEL_ZOO.md b/MODEL_ZOO.md
@@ -0,0 +1,44 @@
+# AdelaiDet Model Zoo and Baselines
+
+## Introduction
+This file documents a collection of models trained with AdelaiDet in November 2019.
+
+## Models
+
+Inference time is measured on a single NVIDIA 1080Ti with Detectron2 at commit [ffff8ac](https://github.com/facebookresearch/detectron2/commit/ffff8acc35ea88ad1cb1806ab0f00b4c1c5dbfd9), the most recent commit at the time of measurement.
+
+More models will be released soon. Stay tuned.
+
+### COCO Object Detection Baselines with FCOS
+
+Name | box AP | download
+--- |:---:|:---:
+[FCOS_R_50_1x](configs/FCOS-Detection/R_50_1x.yaml) | 38.7 | [model](https://cloudstor.aarnet.edu.au/plus/s/glqFc13cCoEyHYy/download)
+
+### COCO Instance Segmentation Baselines with [BlendMask](https://arxiv.org/abs/2001.00309)
+
+Model | Name | inference time (ms/im) | box AP | mask AP | download
+--- |:---:|:---:|:---:|:---:|:---:
+Mask R-CNN | [550_R_50_3x](configs/RCNN/550_R_50_FPN_3x.yaml) | 63 | 39.1 | 35.3 |
+BlendMask | [550_R_50_3x](configs/BlendMask/550_R_50_3x.yaml) | 36 | 38.7 | 34.5 | [model](https://cloudstor.aarnet.edu.au/plus/s/R3Qintf7N8UCiIt/download)
+Mask R-CNN | [R_50_1x](https://github.com/facebookresearch/detectron2/blob/master/configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_1x.yaml) | 80 | 38.6 | 35.2 |
+BlendMask | [R_50_1x](configs/BlendMask/R_50_1x.yaml) | 73 | 39.9 | 35.8 | [model](https://cloudstor.aarnet.edu.au/plus/s/zoxXPnr6Hw3OJgK/download)
+Mask R-CNN | [R_50_3x](https://github.com/facebookresearch/detectron2/blob/master/configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml) | 80 | 41.0 | 37.2 | 
+BlendMask | [R_50_3x](configs/BlendMask/R_50_3x.yaml) | 74 | 42.7 | 37.8 | [model](https://cloudstor.aarnet.edu.au/plus/s/ZnaInHFEKst6mvg/download)
+Mask R-CNN | [R_101_3x](https://github.com/facebookresearch/detectron2/blob/master/configs/COCO-InstanceSegmentation/mask_rcnn_R_101_FPN_3x.yaml) | 100 | 42.9 | 38.6 |
+BlendMask | [R_101_3x](configs/BlendMask/R_101_3x.yaml) | 94 | 44.8 | 39.5 | [model](https://cloudstor.aarnet.edu.au/plus/s/e4fXrliAcMtyEBy/download)
+BlendMask | [R_101_dcni3_5x](configs/BlendMask/R_101_dcni3_5x.yaml) | 105 | 46.8 | 41.1 | [model](https://cloudstor.aarnet.edu.au/plus/s/vbnKnQtaGlw8TKv/download)
+
+### COCO Panoptic Segmentation Baselines with BlendMask
+Model | Name | PQ | PQ<sup>Th</sup> | PQ<sup>St</sup> | download
+--- |:---:|:---:|:---:|:---:|:---:
+Panoptic FPN | [R_50_3x](https://github.com/facebookresearch/detectron2/blob/master/configs/COCO-PanopticSegmentation/panoptic_fpn_R_50_3x.yaml) | 41.5 | 48.3 | 31.2 | 
+BlendMask | [R_50_3x](configs/BlendMask/Panoptic/R_50_3x.yaml) | 42.5 | 49.5 | 32.0 | [model](https://cloudstor.aarnet.edu.au/plus/s/oDgi0826JOJXCr5/download)
+Panoptic FPN | [R_101_3x](https://github.com/facebookresearch/detectron2/blob/master/configs/COCO-PanopticSegmentation/panoptic_fpn_R_101_3x.yaml) | 43.0 | 49.7 | 32.9 |
+BlendMask | [R_101_3x](configs/BlendMask/Panoptic/R_101_3x.yaml) | 44.3 | 51.6 | 33.2 | [model](https://cloudstor.aarnet.edu.au/plus/s/u6gZwj06MWDEkYe/download)
+BlendMask | [R_101_dcni3_5x](configs/BlendMask/Panoptic/R_101_dcni3_5x.yaml) | 46.0 | 52.9 | 35.5 | [model](https://cloudstor.aarnet.edu.au/plus/s/Jwp41WEzDdrhWsN/download)
+
+### Person in Context with BlendMask
+Model | Name | box AP | mask AP | download
+--- |:---:|:---:|:---:|:---:
+BlendMask | [R_50_1x](configs/BlendMask/Person/R_50_1x.yaml) | 70.6 | 66.7 | [model](https://cloudstor.aarnet.edu.au/plus/s/nvpcKTFA5fsagc0/download)
diff --git a/README.md b/README.md
@@ -16,7 +16,7 @@ To date, AdelaiDet implements the following algorithms:
 
 ## Models
 
-More models will be released soon. Stay tuned.
+All of our trained models are available in the [Model Zoo](MODEL_ZOO.md).
 
 ### COCO Object Detection Baselines with [FCOS](https://arxiv.org/abs/1904.01355)
 
@@ -41,6 +41,21 @@ Name | inf. time | box AP | download
 
 *Inference time is measured on an NVIDIA 1080Ti with batch size 1.*
 
+### COCO Instance Segmentation Baselines with [BlendMask](https://arxiv.org/abs/2001.00309)
+
+Model | Name | inf. time | box AP | mask AP | download
+--- |:---:|:---:|:---:|:---:|:---:
+Mask R-CNN | [550_R_50_3x](configs/RCNN/550_R_50_FPN_3x.yaml) | 16FPS | 39.1 | 35.3 |
+BlendMask | [550_R_50_3x](configs/BlendMask/550_R_50_3x.yaml) | 28FPS | 38.7 | 34.5 | [model](https://cloudstor.aarnet.edu.au/plus/s/R3Qintf7N8UCiIt/download)
+BlendMask | [DLA_34_4x](configs/BlendMask/DLA_34_syncbn_4x.yaml) | 32FPS | 40.9 | 35.2 | [model](https://cloudstor.aarnet.edu.au/plus/s/Lx94rWNnZ8TRd2Y/download)
+Mask R-CNN | [R_50_1x](https://github.com/facebookresearch/detectron2/blob/master/configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_1x.yaml) | 13FPS | 38.6 | 35.2 |
+BlendMask | [R_50_1x](configs/BlendMask/R_50_1x.yaml) | 14FPS | 39.9 | 35.8 | [model](https://cloudstor.aarnet.edu.au/plus/s/zoxXPnr6Hw3OJgK/download)
+Mask R-CNN | [R_50_3x](https://github.com/facebookresearch/detectron2/blob/master/configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml) | 13FPS | 41.0 | 37.2 | 
+BlendMask | [R_50_3x](configs/BlendMask/R_50_3x.yaml) | 14FPS | 42.7 | 37.8 | [model](https://cloudstor.aarnet.edu.au/plus/s/ZnaInHFEKst6mvg/download)
+Mask R-CNN | [R_101_3x](https://github.com/facebookresearch/detectron2/blob/master/configs/COCO-InstanceSegmentation/mask_rcnn_R_101_FPN_3x.yaml) | 10FPS | 42.9 | 38.6 |
+BlendMask | [R_101_3x](configs/BlendMask/R_101_3x.yaml) | 11FPS | 44.8 | 39.5 | [model](https://cloudstor.aarnet.edu.au/plus/s/e4fXrliAcMtyEBy/download)
+BlendMask | [R_101_dcni3_5x](configs/BlendMask/R_101_dcni3_5x.yaml) | 10FPS | 46.8 | 41.1 | [model](https://cloudstor.aarnet.edu.au/plus/s/vbnKnQtaGlw8TKv/download)
+
 ## Installation
 
 First install Detectron2 following the official guide: [INSTALL.md](https://github.com/facebookresearch/detectron2/blob/master/INSTALL.md). Then build AdelaiDet with:
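
Note on the two tables above: MODEL_ZOO.md reports inference time in ms/im, while the README table reports FPS for the same models. A minimal sketch of the conversion, assuming the FPS figures were obtained as 1000 / ms rounded half up (the commit does not state the rounding rule, and `ms_to_fps` is a hypothetical helper, not part of AdelaiDet):

```python
import math


def ms_to_fps(ms_per_image: float) -> int:
    """Convert per-image latency in milliseconds to whole frames per second.

    Rounds half up, which is an assumption about how the README figures were derived.
    """
    return math.floor(1000.0 / ms_per_image + 0.5)


# (ms/im as listed in MODEL_ZOO.md, FPS as listed in README.md)
ROWS = {
    "Mask R-CNN 550_R_50_3x": (63, 16),
    "BlendMask 550_R_50_3x": (36, 28),
    "Mask R-CNN R_50_1x": (80, 13),
    "BlendMask R_50_1x": (73, 14),
    "Mask R-CNN R_50_3x": (80, 13),
    "BlendMask R_50_3x": (74, 14),
    "Mask R-CNN R_101_3x": (100, 10),
    "BlendMask R_101_3x": (94, 11),
    "BlendMask R_101_dcni3_5x": (105, 10),
}

for name, (ms, readme_fps) in ROWS.items():
    print(f"{name}: {ms} ms/im -> {ms_to_fps(ms)} FPS (README: {readme_fps})")
```

Under that assumption the sketch reproduces the README figure for every row that has a ms/im entry in MODEL_ZOO.md. DLA_34_4x is listed only in the README, so it is left out.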

diff --git a/adet/config/defaults.py b/adet/config/defaults.py
@@ -6,6 +6,8 @@
 # Additional Configs
 # ---------------------------------------------------------------------------- #
 _C.MODEL.MOBILENET = False
+_C.MODEL.BACKBONE.ANTI_ALIAS = False
+_C.MODEL.RESNETS.DEFORM_INTERVAL = 1
 
 # ---------------------------------------------------------------------------- #
 # FCOS Head
@@ -46,12 +48,11 @@
 _C.MODEL.FCOS.CENTER_SAMPLE = True
 _C.MODEL.FCOS.POS_RADIUS = 1.5
 _C.MODEL.FCOS.LOC_LOSS_TYPE = 'giou'
-
+_C.MODEL.FCOS.YIELD_PROPOSAL = False
 
 # ---------------------------------------------------------------------------- #
 # VoVNet backbone
 # ---------------------------------------------------------------------------- #
-
 _C.MODEL.VOVNET = CN()
 _C.MODEL.VOVNET.CONV_BODY = "V-39-eSE"
 _C.MODEL.VOVNET.OUT_FEATURES = ["stage2", "stage3", "stage4", "stage5"]
@@ -61,7 +62,6 @@
 _C.MODEL.VOVNET.OUT_CHANNELS = 256
 _C.MODEL.VOVNET.BACKBONE_OUT_CHANNELS = 256
 
-
 # ---------------------------------------------------------------------------- #
 # DLA backbone
 # ---------------------------------------------------------------------------- #
@@ -72,3 +72,32 @@
 
 # Options: "FrozenBN", "GN", "SyncBN", "BN"
 _C.MODEL.DLA.NORM = "FrozenBN"
+
+# ---------------------------------------------------------------------------- #
+# BlendMask Options
+# ---------------------------------------------------------------------------- #
+_C.MODEL.BLENDMASK = CN()
+_C.MODEL.BLENDMASK.ATTN_SIZE = 14
+_C.MODEL.BLENDMASK.TOP_INTERP = "bilinear"
+_C.MODEL.BLENDMASK.BOTTOM_RESOLUTION = 56
+_C.MODEL.BLENDMASK.POOLER_TYPE = "ROIAlignV2"
+_C.MODEL.BLENDMASK.POOLER_SAMPLING_RATIO = 1
+_C.MODEL.BLENDMASK.POOLER_SCALES = (0.25,)
+_C.MODEL.BLENDMASK.INSTANCE_LOSS_WEIGHT = 1.0
+_C.MODEL.BLENDMASK.VISUALIZE = False
+
+# ---------------------------------------------------------------------------- #
+# Basis Module Options
+# ---------------------------------------------------------------------------- #
+_C.MODEL.BASIS_MODULE = CN()
+_C.MODEL.BASIS_MODULE.NAME = "ProtoNet"
+_C.MODEL.BASIS_MODULE.NUM_BASES = 4
+_C.MODEL.BASIS_MODULE.LOSS_ON = False
+_C.MODEL.BASIS_MODULE.ANN_SET = "coco"
+_C.MODEL.BASIS_MODULE.CONVS_DIM = 128
+_C.MODEL.BASIS_MODULE.IN_FEATURES = ["p3", "p4", "p5"]
+_C.MODEL.BASIS_MODULE.NORM = "SyncBN"
+_C.MODEL.BASIS_MODULE.NUM_CONVS = 3
+_C.MODEL.BASIS_MODULE.COMMON_STRIDE = 8
+_C.MODEL.BASIS_MODULE.NUM_CLASSES = 80
+_C.MODEL.BASIS_MODULE.LOSS_WEIGHT = 0.3
diff --git a/adet/data/__init__.py b/adet/data/__init__.py
@@ -1,5 +1,5 @@
 from . import builtin  # ensure the builtin datasets are registered
-# from .dataset_mapper import DatasetMapperWithBasis
+from .dataset_mapper import DatasetMapperWithBasis
 
 
-# __all__ = ["DatasetMapperWithBasis"]
+__all__ = ["DatasetMapperWithBasis"]
diff --git a/adet/data/builtin.py b/adet/data/builtin.py
@@ -1,8 +1,9 @@
 import os
 
 from detectron2.data.datasets.register_coco import register_coco_instances
+from detectron2.data.datasets.builtin_meta import _get_builtin_metadata
 
-# register person in context dataset
+# register plane reconstruction
 
 _PREDEFINED_SPLITS_PIC = {
     "pic_person_train": ("pic/image/train", "pic/annotations/train_person.json"),
@@ -24,4 +25,5 @@ def register_all_coco(root="datasets"):
             os.path.join(root, image_root),
         )
 
-register_all_coco()
+
+register_all_coco()
diff --git a/adet/data/dataset_mapper.py b/adet/data/dataset_mapper.py
@@ -0,0 +1,141 @@
+import copy
+import numpy as np
+import torch
+from fvcore.common.file_io import PathManager
+from PIL import Image
+
+from detectron2.data.dataset_mapper import DatasetMapper
+from detectron2.data.detection_utils import SizeMismatchError
+from detectron2.data import detection_utils as utils
+from detectron2.data import transforms as T
+
+"""
+This file contains the default mapping that's applied to "dataset dicts".
+"""
+
+__all__ = ["DatasetMapperWithBasis"]
+
+
+class DatasetMapperWithBasis(DatasetMapper):
+    """
+    This callable extends the default Detectron2 mapper to also load a basis semantic label.
+    """
+
+    def __init__(self, cfg, is_train=True):
+        super().__init__(cfg, is_train)
+
+        # fmt: off
+        self.basis_loss_on  = cfg.MODEL.BASIS_MODULE.LOSS_ON
+        self.ann_set        = cfg.MODEL.BASIS_MODULE.ANN_SET
+        # fmt: on
+
+    def __call__(self, dataset_dict):
+        """
+        Args:
+            dataset_dict (dict): Metadata of one image, in Detectron2 Dataset format.
+
+        Returns:
+            dict: a format that builtin models in detectron2 accept
+        """
+        dataset_dict = copy.deepcopy(dataset_dict)  # it will be modified by code below
+        # USER: Write your own image loading if it's not from a file
+        try:
+            image = utils.read_image(dataset_dict["file_name"], format=self.img_format)
+        except Exception as e:
+            print(dataset_dict["file_name"])
+            print(e)
+            raise e
+        try:
+            utils.check_image_size(dataset_dict, image)
+        except SizeMismatchError as e:
+            expected_wh = (dataset_dict["width"], dataset_dict["height"])
+            image_wh = (image.shape[1], image.shape[0])
+            if (image_wh[1], image_wh[0]) == expected_wh:
+                print("transposing image {}".format(dataset_dict["file_name"]))
+                image = image.transpose(1, 0, 2)
+            else:
+                raise e
+
+        if "annotations" not in dataset_dict or len(dataset_dict["annotations"]) == 0:
+            image, transforms = T.apply_transform_gens(
+                ([self.crop_gen] if self.crop_gen else []) + self.tfm_gens, image
+            )
+        else:
+            # Crop around an instance if there are instances in the image.
+            # USER: Remove if you don't use cropping
+            if self.crop_gen:
+                crop_tfm = utils.gen_crop_transform_with_instance(
+                    self.crop_gen.get_crop_size(image.shape[:2]),
+                    image.shape[:2],
+                    np.random.choice(dataset_dict["annotations"]),
+                )
+                image = crop_tfm.apply_image(image)
+            image, transforms = T.apply_transform_gens(self.tfm_gens, image)
+            if self.crop_gen:
+                transforms = crop_tfm + transforms
+
+        image_shape = image.shape[:2]  # h, w
+
+        # PyTorch's dataloader is efficient on torch.Tensor due to shared-memory,
+        # but not efficient on large generic data structures due to the use of pickle & mp.Queue.
+        # Therefore it's important to use torch.Tensor.
+        dataset_dict["image"] = torch.as_tensor(image.transpose(2, 0, 1).astype("float32"))
+        # Can use uint8 if it turns out to be slow some day
+
+        # USER: Remove if you don't use pre-computed proposals.
+        if self.load_proposals:
+            utils.transform_proposals(
+                dataset_dict, image_shape, transforms, self.min_box_side_len, self.proposal_topk
+            )
+
+        if not self.is_train:
+            dataset_dict.pop("annotations", None)
+            dataset_dict.pop("sem_seg_file_name", None)
+            dataset_dict.pop("pano_seg_file_name", None)
+            return dataset_dict
+
+        if "annotations" in dataset_dict:
+            # USER: Modify this if you want to keep them for some reason.
+            for anno in dataset_dict["annotations"]:
+                if not self.mask_on:
+                    anno.pop("segmentation", None)
+                if not self.keypoint_on:
+                    anno.pop("keypoints", None)
+
+            # USER: Implement additional transformations if you have other types of data
+            annos = [
+                utils.transform_instance_annotations(
+                    obj, transforms, image_shape, keypoint_hflip_indices=self.keypoint_hflip_indices
+                )
+                for obj in dataset_dict.pop("annotations")
+                if obj.get("iscrowd", 0) == 0
+            ]
+            instances = utils.annotations_to_instances(
+                annos, image_shape, mask_format=self.mask_format
+            )
+            # Create a tight bounding box from masks, useful when image is cropped
+            if self.crop_gen and instances.has("gt_masks"):
+                instances.gt_boxes = instances.gt_masks.get_bounding_boxes()
+            dataset_dict["instances"] = utils.filter_empty_instances(instances)
+
+        # USER: Remove if you don't do semantic/panoptic segmentation.
+        if "sem_seg_file_name" in dataset_dict:
+            with PathManager.open(dataset_dict.pop("sem_seg_file_name"), "rb") as f:
+                sem_seg_gt = Image.open(f)
+                sem_seg_gt = np.asarray(sem_seg_gt, dtype="uint8")
+            sem_seg_gt = transforms.apply_segmentation(sem_seg_gt)
+            sem_seg_gt = torch.as_tensor(sem_seg_gt.astype("long"))
+            dataset_dict["sem_seg"] = sem_seg_gt
+
+        if self.basis_loss_on and self.is_train:
+            # load basis supervisions
+            if self.ann_set == "coco":
+                basis_sem_path = dataset_dict["file_name"].replace('train2017', 'thing_train2017').replace('image/train', 'thing_train')
+            else:
+                basis_sem_path = dataset_dict["file_name"].replace('coco', 'lvis').replace('train2017', 'thing_train')
+            # basis labels are stored as .npz archives holding a "mask" array
+            basis_sem_path = basis_sem_path.replace('jpg', 'npz')
+            basis_sem_gt = np.load(basis_sem_path)["mask"]
+            basis_sem_gt = transforms.apply_segmentation(basis_sem_gt)
+            basis_sem_gt = torch.as_tensor(basis_sem_gt.astype("long"))
+            dataset_dict["basis_sem"] = basis_sem_gt
+        return dataset_dict
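
The basis-label lookup at the end of `__call__` is pure string rewriting on `file_name`. A minimal standalone sketch of that mapping, assuming the directory layout implied by the `.gitignore` entries (`datasets/coco`, `datasets/lvis`, `datasets/pic`); `basis_sem_path` is a hypothetical helper and the file names are illustrative only:

```python
def basis_sem_path(file_name: str, ann_set: str = "coco") -> str:
    """Derive the basis-label .npz path from an image path.

    Mirrors the chained str.replace calls in DatasetMapperWithBasis.
    """
    if ann_set == "coco":
        # COCO images: train2017 -> thing_train2017
        # PIC images:  image/train -> thing_train
        path = file_name.replace("train2017", "thing_train2017")
        path = path.replace("image/train", "thing_train")
    else:
        # Any other annotation set is treated as LVIS labels over COCO images.
        path = file_name.replace("coco", "lvis")
        path = path.replace("train2017", "thing_train")
    # str.replace rewrites every occurrence of "jpg", not only the extension.
    return path.replace("jpg", "npz")


print(basis_sem_path("datasets/coco/train2017/000000000009.jpg"))
print(basis_sem_path("datasets/pic/image/train/0001.jpg"))
print(basis_sem_path("datasets/coco/train2017/000000000009.jpg", ann_set="lvis"))
```

Because `str.replace` is not anchored to the extension, a path that contains `jpg` (or `coco`, in the LVIS branch) anywhere else would be rewritten there too. Swapping the suffix with `os.path.splitext` would avoid that, at the cost of diverging from the committed code.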
diff --git a/adet/layers/conv_with_kaiming_uniform.py b/adet/layers/conv_with_kaiming_uniform.py
@@ -37,7 +37,7 @@ def make_conv(
             if norm is None:
                 nn.init.constant_(conv.bias, 0)
         module = [conv,]
-        if norm is not None:
+        if norm is not None and len(norm) > 0:
             if norm == "GN":
                 norm_module = nn.GroupNorm(32, out_channels)
             else:

diff --git a/adet/modeling/__init__.py b/adet/modeling/__init__.py
@@ -1,4 +1,5 @@
 from .fcos import FCOS
+from .blendmask import BlendMask
 from .backbone import build_fcos_resnet_fpn_backbone
 from .one_stage_detector import OneStageDetector
 

diff --git a/adet/modeling/backbone/__init__.py b/adet/modeling/backbone/__init__.py
@@ -1,3 +1,4 @@
 from .fpn import build_fcos_resnet_fpn_backbone
 from .vovnet import build_vovnet_fpn_backbone, build_vovnet_backbone
 from .dla import build_fcos_dla_fpn_backbone
+from .resnet_lpf import build_resnet_lpf_backbone
diff --git a/adet/modeling/backbone/fpn.py b/adet/modeling/backbone/fpn.py
@@ -6,6 +6,8 @@
 from detectron2.layers import ShapeSpec
 from detectron2.modeling.backbone.build import BACKBONE_REGISTRY
 
+from .resnet_lpf import build_resnet_lpf_backbone
+from .resnet_interval import build_resnet_interval_backbone
 from .mobilenet import build_mnv2_backbone
 
 
@@ -57,7 +59,11 @@ def build_fcos_resnet_fpn_backbone(cfg, input_shape: ShapeSpec):
     Returns:
         backbone (Backbone): backbone module, must be a subclass of :class:`Backbone`.
     """
-    if cfg.MODEL.MOBILENET:
+    if cfg.MODEL.BACKBONE.ANTI_ALIAS:
+        bottom_up = build_resnet_lpf_backbone(cfg, input_shape)
+    elif cfg.MODEL.RESNETS.DEFORM_INTERVAL > 1:
+        bottom_up = build_resnet_interval_backbone(cfg, input_shape)
+    elif cfg.MODEL.MOBILENET:
         bottom_up = build_mnv2_backbone(cfg, input_shape)
     else:
         bottom_up = build_resnet_backbone(cfg, input_shape)
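
The new branch order gives the three backbone switches a fixed precedence: `ANTI_ALIAS` wins over `DEFORM_INTERVAL > 1`, which wins over `MOBILENET`. A minimal sketch of that decision, returning builder names as strings so it runs without Detectron2; `select_bottom_up` is a hypothetical helper, not part of this commit:

```python
def select_bottom_up(anti_alias: bool = False,
                     deform_interval: int = 1,
                     mobilenet: bool = False) -> str:
    """Return the name of the builder that build_fcos_resnet_fpn_backbone would call."""
    if anti_alias:                # cfg.MODEL.BACKBONE.ANTI_ALIAS
        return "build_resnet_lpf_backbone"
    if deform_interval > 1:       # cfg.MODEL.RESNETS.DEFORM_INTERVAL
        return "build_resnet_interval_backbone"
    if mobilenet:                 # cfg.MODEL.MOBILENET
        return "build_mnv2_backbone"
    return "build_resnet_backbone"


# Defaults from adet/config/defaults.py select the plain ResNet builder.
print(select_bottom_up())
# Conflicting switches: the anti-aliased ResNet takes priority.
print(select_bottom_up(anti_alias=True, deform_interval=3, mobilenet=True))
```

One consequence worth noting: a config that sets both `ANTI_ALIAS` and `MOBILENET` silently gets the anti-aliased ResNet, since the later branches are never reached.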