Panoptic Feature Pyramid Networks

Authors: Alexander Kirillov, Ross Girshick, Kaiming He, Piotr Dollár

Affiliation: Facebook AI Research (FAIR)

Task: Panoptic Segmentation

Links: arXiv

Year: 2019

TL;DR: A single network based on ResNet-FPN is used for panoptic segmentation. The model has two branches (Mask R-CNN plus a lightweight semantic segmentation head) and can serve as a baseline.

Task

The authors propose a simple baseline for panoptic segmentation, a task that unifies semantic and instance segmentation. The single network is built from popular architectures for the two individual tasks.

Model

Backbone: ResNet-FPN

The authors propose a single network with two branches on a shared backbone. The ResNet backbone encodes features at several resolution levels, and FPN is applied on top as a decoder. The first branch is Mask R-CNN, which is itself commonly built on a ResNet-FPN backbone. The second branch is a lightweight semantic segmentation branch, described in the following.

Semantic Segmentation

This branch merges the levels of the feature pyramid by upsampling and summation. Starting with the coarsest level (1/32 scale), a 3×3 convolution followed by group norm, ReLU and 2× bilinear upsampling is applied repeatedly until a resolution of 1/4 is reached. The same procedure is applied to every pyramid level, with fewer upsampling stages for the finer levels, and the results are summed element-wise. A final 1×1 convolution, upsampling to the original resolution and a softmax produce the per-pixel classification. The branch predicts all stuff classes plus one 'other' class that covers the pixels belonging to things.
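The resolution bookkeeping of this branch can be traced with a small sketch. It only follows spatial sizes, not the convolutions themselves. The function names, the 512×512 input and the assumption that the input size is divisible by 32 are illustrative and not taken from the paper; the 1/4 level is assumed to need no upsampling stage.

```python
from typing import Dict, List, Tuple


def upsampling_stages(stride: int, target_stride: int = 4) -> int:
    """Count the 2x upsampling stages needed to go from `stride` to `target_stride`."""
    stages = 0
    while stride > target_stride:
        stride //= 2
        stages += 1
    return stages


def semantic_branch_shapes(
    height: int,
    width: int,
    strides: Tuple[int, ...] = (4, 8, 16, 32),
) -> Dict[int, List[Tuple[int, int]]]:
    """Trace the spatial size of every pyramid level through its upsampling stages.

    Assumes `height` and `width` are divisible by the largest stride.
    """
    trace: Dict[int, List[Tuple[int, int]]] = {}
    for stride in strides:
        h, w = height // stride, width // stride
        sizes = [(h, w)]
        # Each stage stands for: 3x3 conv -> group norm -> ReLU -> 2x bilinear upsampling.
        for _ in range(upsampling_stages(stride)):
            h, w = 2 * h, 2 * w
            sizes.append((h, w))
        trace[stride] = sizes
    return trace


trace = semantic_branch_shapes(512, 512)
# Every level ends at 1/4 of the input resolution, so the element-wise sum is well defined.
final_sizes = {sizes[-1] for sizes in trace.values()}
```

Because all levels end at the same 1/4-scale size, they can be summed before the final 1×1 convolution and the upsampling back to full resolution.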

Training

Pre-trained: ImageNet

Losses: Mask R-CNN (classification, bounding box, mask), segmentation (cross-entropy)

The Mask R-CNN losses and the semantic segmentation loss are summed with a separate weight per branch. These two weights are additional hyper-parameters that need tuning.
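A minimal sketch of this weighted sum, assuming one weight for the instance branch and one for the semantic branch. The function name, argument names and default weights are placeholders chosen for illustration, not values reported here.

```python
def combined_loss(
    loss_cls: float,
    loss_box: float,
    loss_mask: float,
    loss_sem: float,
    weight_instance: float = 1.0,  # placeholder default, needs tuning
    weight_semantic: float = 0.5,  # placeholder default, needs tuning
) -> float:
    """Weighted sum of the instance-branch losses and the semantic-branch loss."""
    instance_loss = loss_cls + loss_box + loss_mask  # Mask R-CNN terms
    return weight_instance * instance_loss + weight_semantic * loss_sem


total = combined_loss(1.0, 2.0, 3.0, 4.0)
```

Changing the ratio of the two weights shifts the balance between instance and semantic quality, which is why the weights have to be tuned per dataset.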

Fusion

The outputs of the two branches are merged into a single panoptic prediction with simple heuristics:

Resolve overlaps between different instances based on their confidence scores
Resolve disagreements between the instance and the semantic branch in favor of the instance output
Remove stuff regions that are labeled 'other' or that fall below an area threshold
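The three heuristics can be sketched on toy data as follows. This is an illustration under stated assumptions, not the paper's implementation: masks are represented as sets of (row, col) pixels, the function name `fuse`, the label string "other", the stuff segment id 0 and the default area threshold are all hypothetical choices.

```python
from typing import Dict, List, Set, Tuple

Pixel = Tuple[int, int]  # (row, col)
OTHER = "other"  # hypothetical name of the semantic branch's catch-all class


def fuse(
    instances: List[Tuple[float, str, Set[Pixel]]],
    semantic: Dict[Pixel, str],
    min_stuff_area: int = 2,  # hypothetical threshold
) -> Dict[Pixel, Tuple[str, int]]:
    """Merge both branch outputs into a map pixel -> (class name, segment id).

    Instance segments get ids 1, 2, ... in order of confidence; stuff gets id 0.
    """
    panoptic: Dict[Pixel, Tuple[str, int]] = {}

    # 1) Instance vs. instance: higher-confidence instances claim pixels first.
    ranked = sorted(instances, key=lambda inst: inst[0], reverse=True)
    for segment_id, (_score, cls, mask) in enumerate(ranked, start=1):
        for px in mask:
            if px not in panoptic:
                panoptic[px] = (cls, segment_id)

    # 2) Instance vs. semantic: only pixels not claimed by an instance stay stuff.
    stuff_regions: Dict[str, Set[Pixel]] = {}
    for px, cls in semantic.items():
        if px not in panoptic:
            stuff_regions.setdefault(cls, set()).add(px)

    # 3) Drop 'other' regions and stuff regions below the area threshold.
    for cls, region in stuff_regions.items():
        if cls != OTHER and len(region) >= min_stuff_area:
            for px in region:
                panoptic[px] = (cls, 0)

    return panoptic


instances = [
    (0.9, "person", {(0, 0), (0, 1)}),
    (0.6, "dog", {(0, 1), (0, 2)}),  # overlaps the person at (0, 1)
]
semantic = {
    (0, 0): "other", (0, 1): "other", (0, 2): "grass", (0, 3): "grass",
    (1, 0): "grass", (1, 1): "sky", (1, 2): "other",
}
result = fuse(instances, semantic)
```

In this toy input the contested pixel (0, 1) goes to the higher-scoring person, the grass pixel under the dog is overridden by the instance, the single sky pixel is dropped for being too small, and the unclaimed 'other' pixel stays unlabeled.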