Panoptic Feature Pyramid Networks
Authors: Alexander Kirillov, Ross Girshick, Kaiming He, Piotr Dollár
Affiliation: Facebook AI Research (FAIR)
Task: Panoptic Segmentation
Links: arXiv
Year: 2019
TL;DR: A single network based on ResNet-FPN is used for panoptic segmentation. The model has two branches (Mask R-CNN plus a lightweight semantic segmentation head) and can serve as a baseline.
Task
The authors propose a simple baseline for panoptic segmentation, the task that combines semantic and instance segmentation. The single network builds on popular architectures for the two individual tasks.
Model
Backbone: ResNet-FPN
The authors propose a single network with two branches. The backbone encodes features at different resolution levels, and a decoder (FPN) is applied on top. The first branch is Mask R-CNN, whose original paper also uses ResNet-FPN as a backbone. The second branch is a lightweight semantic segmentation branch, described in the following.
Semantic Segmentation
This branch merges the levels of the feature pyramid by upsampling and summation. Starting with the coarsest level, a $3 \times 3$ convolution followed by group norm, ReLU, and bilinear upsampling is applied repeatedly until a resolution of $1/4$ is reached. This procedure is carried out for each pyramid level, and the results are summed element-wise. A $1 \times 1$ convolution, upsampling to the original resolution, and a softmax classification follow. This branch predicts all stuff classes plus one 'other' class that covers the pixels belonging to things.
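To make the merge concrete, the following minimal sketch counts how many upsampling stages each pyramid level would need to reach the common 1/4 scale. It assumes that each stage doubles the resolution and that the pyramid levels sit at 1/32, 1/16, 1/8, and 1/4 of the input size; the function names are hypothetical, and how the finest level (zero stages here) is processed is left open.

```python
def upsampling_stages(level_stride, target_stride=4):
    """Count the 2x upsampling stages needed to go from 1/level_stride to 1/target_stride."""
    if level_stride < target_stride or level_stride % target_stride != 0:
        raise ValueError("level must be at least as coarse as the target scale")
    ratio = level_stride // target_stride
    if (ratio & (ratio - 1)) != 0:
        raise ValueError("scale ratio must be a power of two")
    # ratio == 2 ** stages, so the stage count is log2(ratio)
    return ratio.bit_length() - 1


def upsampling_schedule(level_strides=(32, 16, 8, 4)):
    """Map each assumed pyramid stride to its number of upsampling stages."""
    return {stride: upsampling_stages(stride) for stride in level_strides}


schedule = upsampling_schedule()
for stride, stages in schedule.items():
    # Each stage stands for: 3x3 conv -> group norm -> ReLU -> 2x bilinear upsampling
    print(f"1/{stride} level: {stages} upsampling stage(s)")
```

After these stages every level has the same spatial size, which is what makes the element-wise sum possible.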
Training
Pre-trained: ImageNet
Losses: Mask R-CNN (classification, bounding box, mask), segmentation (cross-entropy)
All losses are summed, with a separate weight per branch. These weights are additional hyper-parameters that need tuning.
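The weighted sum can be sketched as follows. The function name, argument names, and default weights are hypothetical placeholders, not values taken from the paper; the sketch only shows one weight scaling the three Mask R-CNN terms and another scaling the semantic term.

```python
def panoptic_loss(loss_cls, loss_box, loss_mask, loss_seg,
                  weight_instance=1.0, weight_semantic=0.5):
    """Combine the per-branch losses into one training objective.

    The default weights are illustrative assumptions and would need tuning.
    """
    instance_loss = loss_cls + loss_box + loss_mask  # Mask R-CNN branch
    return weight_instance * instance_loss + weight_semantic * loss_seg


# Example with made-up loss values
total = panoptic_loss(loss_cls=0.2, loss_box=0.3, loss_mask=0.5, loss_seg=2.0)
print(total)
```

Because the two branches share one backbone, the ratio of the two weights controls how strongly each task shapes the shared features.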
Fusion
The paper uses merging heuristics:
- Resolve overlaps between different instances based on confidence score
- Resolve disagreement between the instance and semantic branches in favor of the instance output
- Remove regions labeled 'other' or that have a small area
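The three heuristics above can be sketched on a toy example. This is a simplified illustration under stated assumptions, not the authors' implementation: instances are given as pixel sets with a score, the label id `OTHER = 0`, the `VOID` marker, the output tuple layout, and the `min_stuff_area` threshold are all hypothetical, and stuff area is counted per class over the pixels left after instance assignment.

```python
from collections import Counter

OTHER = 0    # assumed label id of the 'other' class
VOID = None  # assumed marker for pixels that receive no label


def merge_panoptic(instances, semantic, min_stuff_area=2):
    """Merge instance and semantic predictions into one panoptic map."""
    height, width = len(semantic), len(semantic[0])
    panoptic = [[VOID] * width for _ in range(height)]

    # 1) Overlaps between instances: the more confident instance claims the pixel.
    ranked = sorted(instances, key=lambda inst: inst["score"], reverse=True)
    for instance_id, inst in enumerate(ranked, start=1):
        for row, col in inst["pixels"]:
            if panoptic[row][col] is VOID:
                panoptic[row][col] = ("thing", inst["category"], instance_id)

    # 2) Instance output wins: stuff may only fill pixels no instance claimed.
    free = [(r, c) for r in range(height) for c in range(width)
            if panoptic[r][c] is VOID]
    stuff_area = Counter(semantic[r][c] for r, c in free)

    # 3) Drop 'other' pixels and stuff classes with too small an area.
    for r, c in free:
        label = semantic[r][c]
        if label != OTHER and stuff_area[label] >= min_stuff_area:
            panoptic[r][c] = ("stuff", label, 0)
    return panoptic


# Toy 2x3 image: two overlapping instances and a semantic map.
example_semantic = [[1, 1, 0],
                    [1, 2, 0]]
example_instances = [
    {"score": 0.6, "category": 7, "pixels": {(0, 0), (0, 1)}},
    {"score": 0.9, "category": 8, "pixels": {(0, 1), (1, 1)}},
]
example_result = merge_panoptic(example_instances, example_semantic)
for row in example_result:
    print(row)
```

In the toy example, the contested pixel (0, 1) goes to the instance with score 0.9, and the single remaining stuff pixel of class 1 is dropped because it falls below the assumed area threshold.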
Results
Datasets: COCO, Cityscapes
Augmentation: scale-jitter (COCO), cropping + random scales + color augmentation (Cityscapes)
Metrics
Instance Segmentation: AP
Semantic Segmentation: mIoU, fIoU, iIoU
Panoptic Segmentation: PQ
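For reference, panoptic quality (PQ) for a single class is the sum of IoUs over matched segment pairs divided by |TP| + 0.5 |FP| + 0.5 |FN|, where a predicted and a ground-truth segment count as a match when their IoU exceeds 0.5. The helper below is a minimal sketch with hypothetical names; the reported PQ averages this quantity over classes.

```python
def panoptic_quality(matched_ious, num_false_positives, num_false_negatives):
    """Compute PQ for one class from the IoUs of matched segment pairs.

    matched_ious holds one IoU per true-positive match (each above 0.5).
    """
    num_true_positives = len(matched_ious)
    denominator = (num_true_positives
                   + 0.5 * num_false_positives
                   + 0.5 * num_false_negatives)
    if denominator == 0:
        return 0.0  # assumed convention for a class with no segments at all
    return sum(matched_ious) / denominator


# Made-up example: two matches, one unmatched prediction, one missed segment
pq_example = panoptic_quality([0.8, 0.6], num_false_positives=1, num_false_negatives=1)
print(round(pq_example, 4))
```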
Backbone performance
Semantic Segmentation
Multi-Task Training
Panoptic Segmentation
Discussion
Positive Aspects
- Many ablation studies justifying design choices
- Simple design
- Strong baseline
Negative Aspects
- Merging heuristics
- A lot of hyper-parameter tuning is needed
- Only a small "novel" contribution