
Panoptic Feature Pyramid Networks

Authors: Alexander Kirillov, Ross Girshick, Kaiming He, Piotr Dollár

Affiliation: Facebook AI Research (FAIR)

Task: Panoptic Segmentation

Links: arXiv

Year: 2019

TL;DR: A single network based on ResNet-FPN is used for panoptic segmentation. The model has two branches (Mask R-CNN and a lightweight semantic segmentation head) and can serve as a baseline.

Task

The authors propose a simple baseline for panoptic segmentation, the task that combines semantic and instance segmentation. The single network builds on popular architectures for the individual tasks.

panfpn-task

Model

Backbone: ResNet-FPN

panfpn-model

The authors propose a single network with two branches. The backbone encodes features at different resolution levels, and a decoder (FPN) is applied on top. The first branch is Mask R-CNN, which in its original variant also uses ResNet-FPN as backbone. The second branch is a lightweight semantic segmentation branch, described in the following.

Semantic Segmentation

panfpn-semantic

This branch merges the levels of the feature pyramid by upsampling and summation. Starting with the coarsest level, a 3 \times 3 convolution followed by group norm, ReLU and bilinear upsampling is applied repeatedly until a resolution of 1/4 is reached. The same procedure is applied to every pyramid level, and the results are summed element-wise. A final 1 \times 1 convolution, upsampling to the original resolution and a softmax then produce the per-pixel class prediction. The branch predicts all stuff classes plus one 'other' class.
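To make the upsampling schedule concrete, here is a minimal sketch that only tracks resolutions, not actual feature maps. It assumes the usual four FPN levels at strides 32, 16, 8 and 4 (the summary itself only states the 1/4 target), and the function name `stage_plan` is hypothetical.

```python
def stage_plan(level_strides=(32, 16, 8, 4), target_stride=4):
    """Map each pyramid level to the strides it passes through.

    Assumption: every stage is conv + group norm + ReLU + 2x bilinear
    upsampling, so each stage halves the stride until the target is reached.
    """
    plan = {}
    for stride in level_strides:
        scales = [stride]
        while scales[-1] > target_stride:
            scales.append(scales[-1] // 2)
        plan[stride] = scales
    return plan


plan = stage_plan()
for stride, scales in plan.items():
    path = " -> ".join(f"1/{s}" for s in scales)
    print(f"level 1/{stride}: {len(scales) - 1} upsampling stage(s), {path}")
```

Under these assumptions the coarsest level needs three upsampling stages and the 1/4 level needs none, after which all maps share one resolution and can be summed element-wise.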

Training

Pre-trained: ImageNet

Losses: Mask R-CNN (classification, bounding box, mask), segmentation (cross-entropy)

The losses of the two branches are summed, each branch with its own weight. These weights are additional hyper-parameters that have to be tuned.
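The weighted sum can be sketched as follows. The function name, argument names and default weights are placeholders chosen for illustration, not values taken from the paper.

```python
def total_loss(cls_loss, box_loss, mask_loss, semantic_loss,
               instance_weight=1.0, semantic_weight=0.5):
    """Combine the per-branch losses into one training objective.

    The three Mask R-CNN terms share one weight; the semantic
    cross-entropy term gets its own. Both weights are hypothetical
    defaults and would need tuning per dataset.
    """
    instance_loss = cls_loss + box_loss + mask_loss
    return instance_weight * instance_loss + semantic_weight * semantic_loss


# Illustrative loss values only, not measured numbers.
example = total_loss(cls_loss=1.0, box_loss=2.0, mask_loss=3.0, semantic_loss=4.0)
print(example)
```

Because both branches share the backbone, the ratio of the two weights decides which task dominates the shared features, which is why it has to be tuned.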

Fusion

The paper uses merging heuristics:

  1. Resolve overlaps between different instances based on their confidence scores
  2. Resolve disagreements between the instance and the semantic branch in favor of the instance output
  3. Remove regions that are labeled 'other' or that have a small area
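The three heuristics can be sketched on toy data as below. This is a simplified illustration, not the authors' implementation: masks are plain sets of pixel coordinates, and the names `merge_panoptic`, `other_label` and `min_stuff_area` as well as the threshold value are assumptions.

```python
def merge_panoptic(instances, semantic, other_label=0, min_stuff_area=2):
    """Fuse instance and semantic predictions into one panoptic map.

    instances: list of dicts with "score", "category" and "pixels" (set of (row, col)).
    semantic:  dict mapping (row, col) to a stuff class id or other_label.
    Returns a dict mapping (row, col) to (category, instance_id); stuff uses id 0.
    """
    panoptic = {}

    # 1. Higher-confidence instances claim contested pixels first.
    ranked = sorted(instances, key=lambda inst: inst["score"], reverse=True)
    for instance_id, inst in enumerate(ranked, start=1):
        for pixel in inst["pixels"]:
            if pixel not in panoptic:
                panoptic[pixel] = (inst["category"], instance_id)

    # 2. Instance output wins: stuff only fills pixels no instance claimed.
    stuff_regions = {}
    for pixel, label in semantic.items():
        if label == other_label or pixel in panoptic:
            continue  # 3a. 'other' pixels are never written as stuff
        stuff_regions.setdefault(label, set()).add(pixel)

    # 3b. Drop stuff regions below the (hypothetical) area threshold.
    for label, pixels in stuff_regions.items():
        if len(pixels) >= min_stuff_area:
            for pixel in pixels:
                panoptic[pixel] = (label, 0)
    return panoptic


instances = [
    {"score": 0.9, "category": 1, "pixels": {(0, 0), (0, 1)}},
    {"score": 0.5, "category": 1, "pixels": {(0, 1), (0, 2)}},
]
semantic = {(0, 0): 10, (0, 1): 10, (0, 2): 10, (0, 3): 10,
            (1, 0): 10, (1, 1): 0, (1, 2): 11}
result = merge_panoptic(instances, semantic)
```

In this toy example the contested pixel (0, 1) goes to the higher-scoring instance, the single-pixel stuff region of class 11 is removed, and the 'other' pixel stays unlabeled.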

Results

Datasets: COCO, Cityscapes

Augmentation: scale-jitter (COCO), cropping + random scales + color augmentation (Cityscapes)

Metrics

Instance Segmentation: AP

Semantic Segmentation: mIoU, fIoU, iIoU

Panoptic Segmentation: PQ
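For reference, PQ (panoptic quality) divides the summed IoU of matched segments by the number of true positives plus half the false positives and half the false negatives. A minimal sketch for a single class follows; the function name and the example numbers are illustrative, and the matching step itself (a predicted and a ground-truth segment match when their IoU exceeds 0.5) is assumed to have happened already.

```python
def panoptic_quality(matched_ious, num_false_positives, num_false_negatives):
    """PQ for one class from the IoUs of already-matched segment pairs.

    matched_ious: IoU of each true-positive (prediction, ground truth) pair.
    Returns 0.0 when there are no segments at all, a convention chosen
    here to avoid dividing by zero.
    """
    num_true_positives = len(matched_ious)
    denominator = (num_true_positives
                   + 0.5 * num_false_positives
                   + 0.5 * num_false_negatives)
    if denominator == 0:
        return 0.0
    return sum(matched_ious) / denominator


# Two matched segments, one unmatched prediction, one missed ground-truth segment.
pq = panoptic_quality([0.8, 0.6], num_false_positives=1, num_false_negatives=1)
print(round(pq, 3))
```

The dataset-level PQ is the average of this per-class value over all classes.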

Backbone performance

panfpn-backbone

panfpn-backbone-performance

Semantic Segmentation

panfpn-semseg

Multi-Task Training

panfpn-multi-task

Panoptic Segmentation

panfpn-panopticresults

panfpn-moreresults

Discussion

Positive Aspects

  • Many ablation studies justifying design choices
  • simple design
  • strong baseline

Negative Aspects

  • merging heuristics
  • considerable hyper-parameter tuning
  • only small "novel" contribution
Edited by Mark Weber