[Paper Summary] UNet 3+: A Full-Scale Connected UNET For Medical Image Segmentation

In medical image analysis, accurately identifying and outlining organs is vital for clinical applications such as diagnosis and treatment planning. The UNet architecture, a widely favoured choice for these tasks, was extended by UNet++, which introduced nested and dense skip connections to improve performance. UNet 3+ goes a step further: it redesigns the skip connections so that every decoder layer sees features from all scales, with the aim of improving segmentation accuracy while using fewer parameters than UNet++.

Research Paper: UNET 3+: A FULL-SCALE CONNECTED UNET FOR MEDICAL IMAGE SEGMENTATION

What is UNet 3+?

UNet 3+ is a U-shaped encoder-decoder architecture built upon its predecessors, UNet and UNet++. It aims to capture both fine-grained details and coarse-grained semantics from full scales. To do so, the paper redesigns the inter-connections between the encoder and the decoder as well as the intra-connections between the decoder layers, and it does this with fewer network parameters, which improves computational efficiency. Additionally, a hybrid loss function contributes to accurate segmentation, especially for organs appearing at varying scales.

The block diagram shows a comparison of UNet, UNet++ and UNet 3+

Issues with Existing Methods

Existing segmentation methods often face challenges in reducing false positives, especially in non-organ images. Traditional approaches employ attention mechanisms or predefined refinement methods like Conditional Random Fields (CRF) at inference. UNet 3+ takes a different route by introducing a classification task to predict the presence of organs in the input image, offering valuable guidance to the segmentation task.

Proposed Method: UNet 3+

UNet 3+ combines multi-scale features by re-designing the skip connections and by applying full-scale deep supervision. The result is a network with fewer parameters that yields a more accurate, position-aware and boundary-enhanced segmentation map.

It consists of the following three components:

Full-scale skip connections
Full-scale deep supervision
Classification-guided Module (CGM)

Full-scale Skip Connections

UNet 3+ addresses the shortcomings of UNet and UNet++ by incorporating both smaller- and same-scale feature maps from the encoder and larger-scale feature maps from the decoder at each decoder layer. This innovative approach captures fine-grained details and coarse-grained semantics in full scales, enhancing the model’s position-awareness and boundary definition. Importantly, UNet 3+ achieves this with fewer parameters, ensuring computational efficiency.

The block diagram shows the construction of the full-scale aggregated feature map of the third decoder layer
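The aggregation described above can be sketched as a small planning function. It assumes a five-level network in which level 1 is the finest resolution, max pooling shrinks the shallower ("smaller-scale") encoder maps, bilinear upsampling enlarges the deeper decoder maps, and every incoming branch is reduced to 64 channels before concatenation. The function names, the level numbering and the 64-channel figure are illustrative assumptions rather than details stated in this summary.

```python
from typing import List, Tuple


def full_scale_sources(level: int, depth: int = 5) -> List[Tuple[str, str, int]]:
    """List the feature maps fused at decoder `level` (1 = finest, depth = coarsest).

    Returns (source, operation, factor) triples, where `factor` is the spatial
    resampling factor needed to match the resolution of `level`.
    """
    if not 1 <= level < depth:
        raise ValueError("decoder levels run from 1 to depth - 1")
    sources = []
    for k in range(1, depth + 1):
        factor = 2 ** abs(level - k)
        if k < level:
            # Shallower encoder map (higher resolution): shrink with max pooling.
            sources.append((f"encoder_{k}", "maxpool", factor))
        elif k == level:
            # Same-scale encoder map: passed through unchanged.
            sources.append((f"encoder_{k}", "identity", 1))
        elif k < depth:
            # Deeper decoder map (lower resolution): enlarge by upsampling.
            sources.append((f"decoder_{k}", "upsample", factor))
        else:
            # Deepest level: the encoder bottleneck acts as the first decoder map.
            sources.append((f"encoder_{k}", "upsample", factor))
    return sources


def fused_channels(depth: int = 5, branch_channels: int = 64) -> int:
    """Channel count after concatenating one branch per level (assumed 64 each)."""
    return depth * branch_channels


if __name__ == "__main__":
    for source, op, factor in full_scale_sources(3):
        print(f"{source:10s} {op:9s} x{factor}")
    print("fused channels:", fused_channels())
```

For the third decoder layer this lists two pooled encoder maps, the same-scale encoder map, and two upsampled deeper maps, so every decoder layer receives exactly one branch per level.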

Full-Scale Deep Supervision

UNet 3+ applies deep supervision at full scale: each decoder stage produces a side output that is supervised by the ground truth. These supervision signals from different scales help the network learn hierarchical representations from the aggregated feature maps and contribute to the accurate delineation of organ boundaries.
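A minimal sketch of the idea, in plain Python, is shown below. It assumes each decoder stage emits a probability map at its own resolution, enlarges that map to the ground-truth size, and sums one loss term per stage. Nearest-neighbour upsampling and binary cross-entropy are stand-ins chosen to keep the example short; they are not claimed to be the paper's exact choices, and all function names are hypothetical.

```python
import math
from typing import List

Grid = List[List[float]]


def upsample_nearest(grid: Grid, factor: int) -> Grid:
    """Repeat every pixel `factor` times along both axes."""
    return [
        [value for value in row for _ in range(factor)]
        for row in grid
        for _ in range(factor)
    ]


def bce(pred: Grid, target: Grid, eps: float = 1e-7) -> float:
    """Mean binary cross-entropy between two equally sized grids."""
    total, count = 0.0, 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            p = min(max(p, eps), 1.0 - eps)  # keep log() finite
            total -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
            count += 1
    return total / count


def deep_supervision_loss(side_outputs: List[Grid], target: Grid) -> float:
    """Sum one loss term per decoder stage, each compared at full resolution."""
    height = len(target)
    total = 0.0
    for side in side_outputs:
        factor = height // len(side)  # assumes square maps with power-of-two ratios
        total += bce(upsample_nearest(side, factor), target)
    return total


if __name__ == "__main__":
    target = [[1.0, 1.0], [0.0, 0.0]]
    sides = [[[0.9, 0.8], [0.2, 0.1]], [[0.5]]]  # a 2x2 and a 1x1 side output
    print(deep_supervision_loss(sides, target))
```

Because every stage contributes its own term, coarse stages are pushed towards the correct overall shape while fine stages are pushed towards correct boundaries.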

Classification-guided Module (CGM)

To tackle false positives, particularly in non-organ images, UNet 3+ introduces a classification-guided module. This involves an additional classification task to predict organ presence in the input image. The classification result guides each segmentation side output, effectively addressing over-segmentation issues by providing corrective guidance.
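The gating step can be illustrated with a short sketch. It assumes the classifier returns two probabilities, `[p_no_organ, p_organ]`, that an argmax turns them into a 0/1 flag, and that this flag is multiplied into every side output. The function names and the two-element probability layout are hypothetical.

```python
from typing import List

Grid = List[List[float]]


def organ_present(class_probs: List[float]) -> int:
    """Argmax over [p_no_organ, p_organ]: 1 if an organ is predicted, else 0."""
    return 1 if class_probs[1] > class_probs[0] else 0


def apply_cgm(side_outputs: List[Grid], class_probs: List[float]) -> List[Grid]:
    """Multiply each side output by the 0/1 classification result."""
    gate = organ_present(class_probs)
    return [
        [[gate * value for value in row] for row in side]
        for side in side_outputs
    ]


if __name__ == "__main__":
    sides = [[[0.9, 0.2], [0.1, 0.7]]]
    print(apply_cgm(sides, [0.8, 0.2]))  # classifier says "no organ"
    print(apply_cgm(sides, [0.1, 0.9]))  # classifier says "organ present"
```

When the classifier predicts that no organ is present, every side output is zeroed, which removes spurious foreground pixels in non-organ slices; otherwise the segmentation passes through unchanged.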

Loss Function

The hybrid loss function employed by UNet 3+ includes a multi-scale structural similarity index (MS-SSIM) loss, which assigns higher weights to fuzzy boundaries. The total loss comprises Focal Loss, MS-SSIM Loss, and Intersection over Union (IoU) Loss, which operate at the pixel, patch, and map levels respectively. Together they capture both large-scale and fine structures with clear boundaries.

Total Loss = Focal Loss + MS-SSIM Loss + IoU Loss
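The sum above might be sketched as follows for binary masks flattened into lists. Focal loss and a soft IoU loss are written out; the MS-SSIM term is taken as a precomputed number because a faithful implementation needs multi-scale windowed statistics that do not fit in a short example. The focusing parameter `gamma = 2.0` is a commonly used default and an assumption here, as are the function names and the equal weighting of the three terms.

```python
import math
from typing import Sequence


def focal_loss(pred: Sequence[float], target: Sequence[int],
               gamma: float = 2.0, eps: float = 1e-7) -> float:
    """Pixel-level term: cross-entropy down-weighted for well-classified pixels."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite
        p_t = p if t == 1 else 1.0 - p   # probability given to the true class
        total -= (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(pred)


def iou_loss(pred: Sequence[float], target: Sequence[int], eps: float = 1e-7) -> float:
    """Map-level term: one minus the soft intersection over union."""
    intersection = sum(p * t for p, t in zip(pred, target))
    union = sum(p + t - p * t for p, t in zip(pred, target))
    return 1.0 - (intersection + eps) / (union + eps)


def hybrid_loss(pred: Sequence[float], target: Sequence[int],
                ms_ssim_loss: float) -> float:
    """Focal + MS-SSIM + IoU, with the MS-SSIM term supplied by the caller."""
    return focal_loss(pred, target) + ms_ssim_loss + iou_loss(pred, target)


if __name__ == "__main__":
    pred = [0.9, 0.8, 0.3, 0.1]
    target = [1, 1, 0, 0]
    print(hybrid_loss(pred, target, ms_ssim_loss=0.05))
```

Keeping the three terms as separate functions makes it easy to inspect which level (pixel, patch, or map) dominates the loss during training.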

Dataset and Implementation

UNet 3+ is evaluated on two organs: the liver, using the dataset from the ISBI LiTS 2017 Challenge, and the spleen, using an ethically approved hospital dataset. Input images are resized to 320 x 320 pixels. Stochastic Gradient Descent (SGD) is employed as the optimizer, with the Dice coefficient serving as the evaluation metric.
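For reference, the Dice coefficient between a predicted and a ground-truth binary mask is twice their overlap divided by the sum of their sizes. The sketch below assumes both masks are flattened into lists of 0/1 values; the function name and the small smoothing constant `eps`, which avoids division by zero for empty masks, are illustrative choices.

```python
from typing import Sequence


def dice_coefficient(pred: Sequence[float], target: Sequence[float],
                     eps: float = 1e-7) -> float:
    """Dice = 2 * |pred AND target| / (|pred| + |target|), in the range [0, 1]."""
    intersection = sum(p * t for p, t in zip(pred, target))
    return (2.0 * intersection + eps) / (sum(pred) + sum(target) + eps)


if __name__ == "__main__":
    pred = [1, 1, 0, 0]
    target = [1, 0, 0, 0]
    print(round(dice_coefficient(pred, target), 4))  # 2 * 1 / (2 + 1)
```

A value of 1 means the two masks coincide exactly, and a value near 0 means they barely overlap.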

Results

Comparison of UNet, UNet++, the proposed UNet 3+ without deep supervision (DS) and UNet 3+ on liver and spleen datasets in terms of Dice metrics. The best results are highlighted in bold. The loss function used in each method is focal loss.

Conclusion

In conclusion, UNet 3+ presents a promising advancement in medical image segmentation, combining innovative architectural modifications, a hybrid loss function, and a classification-guided module. The results suggest improved accuracy, reduced false positives, and enhanced computational efficiency, making UNet 3+ a notable contribution to the field of medical image analysis.