A reading guide about Deep Learning with CNNs

The evolution of the DeepLab family is characteristic of the evolution of FCN-inspired models for image segmentation. DeepLab variants can be found among both naive-decoder and encoder-decoder models. Hence, this guide follows the family, first looking at naive-decoder models and then turning to encoder-decoder models.

The most important contributions of naive-decoder models are the establishment of so-called atrous convolutions and the exploitation of long-range image context for pixel-level prediction. Atrous convolutions are a variant of normal convolutions that enlarge the receptive field without reducing image resolution. The well-known Atrous Spatial Pyramid Pooling (ASPP) module in DeepLab-V2 [4] and later versions combines both: atrous convolutions and long-range image context exploitation. When reading the following literature, focus on how these features develop: atrous convolutions, the ASPP module, and long-range image context exploitation/parsing.
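To make the idea of an atrous convolution concrete, here is a minimal one-dimensional sketch in NumPy. It is an illustration only, not the DeepLab implementation: the function name `atrous_conv1d`, the odd-length kernel, and the zero padding are assumptions made for this example. The kernel keeps its three weights, but its taps are spread `rate` samples apart, so each output value sees a wider stretch of the input while the output stays as long as the input.

```python
import numpy as np

def atrous_conv1d(signal, kernel, rate=1):
    """Cross-correlate `signal` with an odd-length `kernel` whose taps are
    spaced `rate` samples apart (hypothetical helper for illustration).

    Zero padding keeps the output the same length as the input.
    Returns the output and the receptive field of a single output value.
    """
    signal = np.asarray(signal, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    k = len(kernel)
    span = (k - 1) * rate + 1      # receptive field grows with the rate
    pad = span // 2                # symmetric padding (span is odd for odd k)
    padded = np.pad(signal, pad)   # zero padding on both sides
    out = np.zeros_like(signal)
    for i in range(len(signal)):
        # Take every `rate`-th sample; the number of weights stays k.
        taps = padded[i : i + span : rate]
        out[i] = np.dot(taps, kernel)
    return out, span

# An impulse makes the sampling pattern visible in the output.
x = np.zeros(9)
x[4] = 1.0
w = [1.0, 1.0, 1.0]

dense, dense_span = atrous_conv1d(x, w, rate=1)      # ordinary convolution
dilated, dilated_span = atrous_conv1d(x, w, rate=2)  # atrous convolution

# Toy ASPP-like idea: run several rates in parallel and stack the results.
# A real ASPP module uses 2D convolutions with separately learned weights.
pyramid = np.stack([atrous_conv1d(x, w, rate=r)[0] for r in (1, 2, 4)])
```

With rate 1 the three weights cover 3 neighbouring samples; with rate 2 the same three weights cover a span of 5 samples, and neither case shortens the output. Stacking several rates, as in the last line, hints at how an ASPP-style module gathers context at multiple scales from the same feature map.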

Today, the most famous encoder-decoder model is probably the U-Net [5], a CNN that was developed for analyzing medical images. Its clear structure invited many researchers to experiment with and adopt it, and it is well known for its skip connections, which share features between the encoder and decoder paths. Encoder-decoder models focus on enhancing the semantically rich feature maps during upsampling in the decoder with more spatially precise feature maps from the encoder.
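The skip connection described above can be sketched in a few lines of NumPy. This is a simplified illustration under stated assumptions, not the U-Net architecture itself: the helper names, the feature map sizes, and the nearest-neighbour upsampling are chosen for this example, and the convolutions that a real network applies between these steps are left out.

```python
import numpy as np

def max_pool2x(x):
    """2x2 max pooling on a (channels, height, width) array with even sides."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))

def upsample2x(x):
    """Nearest-neighbour upsampling by a factor of 2 in height and width."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def skip_connect(encoder_features, decoder_features):
    """Upsample the coarse decoder features and concatenate them with the
    high-resolution encoder features along the channel axis."""
    return np.concatenate([encoder_features, upsample2x(decoder_features)], axis=0)

rng = np.random.default_rng(0)
enc = rng.random((4, 8, 8))       # encoder feature map (hypothetical size)
bottleneck = max_pool2x(enc)      # coarser map: shape (4, 4, 4)
merged = skip_connect(enc, bottleneck)  # shape (8, 8, 8)
```

The first four channels of `merged` are the untouched encoder features, which carry the precise spatial detail. The last four are the upsampled coarse features, which in a trained network would carry the semantically richer information. A following convolution can then combine both.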

With the literature at hand, you will be able to reflect on modern image segmentation papers and implementations with CNNs. Let’s meet again in Part III, where we will discuss object detection.

[1] Hoeser, T.; Kuenzer, C. Object Detection and Image Segmentation with Deep Learning on Earth Observation Data: A Review-Part I: Evolution and Recent Trends. Remote Sensing 2020, 12(10), 1667. DOI: 10.3390/rs12101667.

[2] Long, J.; Shelhamer, E.; Darrell, T. Fully Convolutional Networks for Semantic Segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2014, 39, 640–651.

[3] Chen, L.C.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation. In Computer Vision–ECCV 2018; Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y., Eds.; Springer International Publishing: Cham, Switzerland, 2018; pp. 833–851.

[4] Chen, L.C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 40, 834–848.

[5] Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015; Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F., Eds.; Springer International Publishing: Cham, Switzerland, 2015; pp. 234–241.