[Paper Summary] UNet 3+: A Full-Scale Connected UNET For Medical Image Segmentation
In medical image analysis, accurately identifying and delineating organs is vital for clinical applications such as diagnosis and treatment planning. The UNet architecture, a widely favoured choice for these tasks, was enhanced by UNet++, which introduced nested and dense skip connections to improve performance. UNet 3+ takes this evolution further, redesigning the skip connections to address the remaining limitations of UNet++ and further improve segmentation accuracy.
Research Paper: UNET 3+: A FULL-SCALE CONNECTED UNET FOR MEDICAL IMAGE SEGMENTATION
What is UNet 3+?
UNet 3+ is a U-shaped encoder-decoder architecture built upon the foundation of its predecessors, UNet and UNet++. It aims to capture both fine-grained details and coarse-grained semantics at full scales. The paper highlights the re-design of inter- and intra-connections between the encoder and the decoder, providing a more comprehensive understanding of organ structures. Additionally, a hybrid loss function contributes to accurate segmentation, especially for organs appearing at varying scales, while a reduced number of network parameters improves computational efficiency.
![The block diagram shows a Comparison of UNet, UNet++ and UNet 3+](https://idiotdeveloper.com/wp-content/uploads/2024/02/unet-unetpp-unet3-1024x376.webp)
Issues with Existing Methods
Existing segmentation methods often face challenges in reducing false positives, especially in non-organ images. Traditional approaches employ attention mechanisms or predefined refinement methods like Conditional Random Fields (CRF) at inference. UNet 3+ takes a different route by introducing a classification task to predict the presence of organs in the input image, offering valuable guidance to the segmentation task.
Proposed Method: UNet 3+
UNet 3+ combines multi-scale features by re-designing the skip connections and adding full-scale deep supervision, which uses fewer parameters yet yields a more accurate, position-aware and boundary-enhanced segmentation map.
It consists of the following three components:
- Full-scale skip connections
- Full-scale deep supervision
- Classification-guided Module (CGM)
Full-scale Skip Connections
UNet 3+ addresses the shortcomings of UNet and UNet++ by feeding each decoder layer with both smaller- and same-scale feature maps from the encoder and larger-scale feature maps from the decoder. This approach captures fine-grained details and coarse-grained semantics across all scales, enhancing the model's position-awareness and boundary definition. Importantly, UNet 3+ achieves this with fewer parameters, ensuring computational efficiency.
![The block diagram shows the construction of the full-scale aggregated feature map of the third decoder layer](https://idiotdeveloper.com/wp-content/uploads/2024/02/full-scale-aggregated-feature-map-3rd-decoder.webp)
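The construction of the third decoder layer shown above can be sketched in PyTorch. This is a minimal illustration, not the authors' code: the channel sizes (64 channels per branch, five branches fused into 320 channels) follow the paper's description, while the module and argument names are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FullScaleDecoder3(nn.Module):
    """Aggregated feature map of decoder layer 3 in a 5-level UNet 3+.
    Channel sizes are illustrative assumptions based on the paper."""
    def __init__(self, enc_channels=(64, 128, 256, 512, 1024), cat_channels=64):
        super().__init__()
        c1, c2, c3, c4, c5 = enc_channels
        # Same- and smaller-scale encoder maps (scales 1-3); scales 1 and 2
        # are max-pooled down to the spatial size of decoder layer 3.
        self.conv_e1 = nn.Conv2d(c1, cat_channels, 3, padding=1)
        self.conv_e2 = nn.Conv2d(c2, cat_channels, 3, padding=1)
        self.conv_e3 = nn.Conv2d(c3, cat_channels, 3, padding=1)
        # Larger-scale maps from decoder 4 and the bottleneck, upsampled.
        self.conv_d4 = nn.Conv2d(5 * cat_channels, cat_channels, 3, padding=1)
        self.conv_e5 = nn.Conv2d(c5, cat_channels, 3, padding=1)
        # Fuse the five 64-channel maps into one 320-channel map.
        self.fuse = nn.Conv2d(5 * cat_channels, 5 * cat_channels, 3, padding=1)

    def forward(self, e1, e2, e3, d4, e5):
        x1 = self.conv_e1(F.max_pool2d(e1, 4))   # scale 1 -> scale 3
        x2 = self.conv_e2(F.max_pool2d(e2, 2))   # scale 2 -> scale 3
        x3 = self.conv_e3(e3)                    # same scale
        x4 = self.conv_d4(F.interpolate(d4, scale_factor=2, mode="bilinear",
                                        align_corners=False))
        x5 = self.conv_e5(F.interpolate(e5, scale_factor=4, mode="bilinear",
                                        align_corners=False))
        return self.fuse(torch.cat([x1, x2, x3, x4, x5], dim=1))
```

Every decoder layer is built the same way, only with different pooling and upsampling factors, which is why the parameter count stays modest: each branch is reduced to 64 channels before fusion.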
Full-Scale Deep Supervision
The adoption of full-scale deep supervision in UNet 3+ allows for better segmentation performance. The approach involves multiple supervision signals from different scales, contributing to the accurate delineation of organ boundaries.
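One common way to realise this, sketched below under assumed channel counts (not taken from the authors' code), is to attach a side-output head to every decoder scale: a 3×3 convolution to a 1-channel map, bilinear upsampling to the input resolution, and a sigmoid, with each output supervised against the ground truth.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DeepSupervisionHeads(nn.Module):
    """One side-output head per decoder scale. The channel counts
    (320 for decoders 1-4, 1024 for the bottleneck) are assumptions."""
    def __init__(self, decoder_channels=(320, 320, 320, 320, 1024)):
        super().__init__()
        self.heads = nn.ModuleList(
            nn.Conv2d(c, 1, 3, padding=1) for c in decoder_channels
        )

    def forward(self, feats, out_size):
        side_outputs = []
        for head, f in zip(self.heads, feats):
            s = head(f)                                   # 3x3 conv -> 1 channel
            s = F.interpolate(s, size=out_size, mode="bilinear",
                              align_corners=False)        # up to full resolution
            side_outputs.append(torch.sigmoid(s))
        return side_outputs  # each is supervised against the ground truth
```

During training, the hybrid loss is applied to every side output; at inference, only the full-resolution output of the last decoder is typically kept.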
![The block diagram of the classification-guided module (CGM)](https://idiotdeveloper.com/wp-content/uploads/2024/02/classification-guided-module.webp)
Classification-guided Module (CGM)
To tackle false positives, particularly in non-organ images, UNet 3+ introduces a classification-guided module. An extra classification head predicts whether an organ is present in the input image, and its result gates each segmentation side output, suppressing over-segmentation on images that contain no organ at all.
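The module follows the sequence the paper describes: dropout, a 1×1 convolution, adaptive max-pooling, and a 2-class prediction whose argmax (0 or 1) multiplies every side output. The sketch below is a plausible rendering of that description; the exact layer hyperparameters are assumptions.

```python
import torch
import torch.nn as nn

class CGM(nn.Module):
    """Classification-guided module: predicts organ presence from the
    deepest encoder feature map and gates each segmentation side output.
    Dropout rate and channel sizes are illustrative assumptions."""
    def __init__(self, in_channels=1024):
        super().__init__()
        self.cls = nn.Sequential(
            nn.Dropout(0.5),
            nn.Conv2d(in_channels, 2, 1),   # 2 classes: no organ / organ
            nn.AdaptiveMaxPool2d(1),
            nn.Flatten(),
        )

    def forward(self, bottleneck, side_outputs):
        logits = self.cls(bottleneck)       # shape (B, 2)
        # argmax -> 0 (no organ) or 1 (organ); gate every side output
        presence = logits.argmax(dim=1).float().view(-1, 1, 1, 1)
        return [s * presence for s in side_outputs], logits
```

The classification branch is trained with a standard binary cross-entropy loss alongside the segmentation losses, so the gating signal improves as training progresses.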
Loss Function
The hybrid loss function employed by UNet 3+ includes a multi-scale structural similarity index (MS-SSIM) loss, which assigns higher weight to fuzzy boundary regions. The total loss comprises Focal Loss, MS-SSIM Loss, and Intersection over Union (IoU) Loss, operating at the pixel, patch, and map levels respectively, so it captures both large-scale structures and fine details with clear boundaries.
Total Loss = Focal Loss + MS-SSIM Loss + IoU Loss
Dataset and Implementation
The evaluation of UNet 3+ uses liver and spleen datasets, obtained from the ISBI LiTS 2017 Challenge and an ethically approved hospital dataset respectively. Input images are resized to 320 × 320 pixels. Stochastic Gradient Descent (SGD) is employed as the optimizer, with the Dice coefficient serving as the evaluation metric.
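For reference, the Dice coefficient on binary masks is straightforward to compute; the thresholding at 0.5 below is a common convention, not a detail stated in the paper.

```python
import torch

def dice_coefficient(pred, target, eps=1e-7):
    """Dice = 2|A∩B| / (|A| + |B|) on binarised prediction masks."""
    pred = (pred > 0.5).float()       # binarise sigmoid probabilities
    inter = (pred * target).sum()
    return ((2 * inter + eps) / (pred.sum() + target.sum() + eps)).item()
```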
Results
![Comparison of UNet, UNet++, the proposed UNet 3+ without deep supervision (DS) and UNet 3+ on liver and spleen datasets in terms of Dice metrics. The best results are highlighted in bold. The loss function used in each method is focal loss.](https://idiotdeveloper.com/wp-content/uploads/2024/02/unet3table1-1024x171.webp)
![Comparison of UNet 3+ and other 5 state-of-the-art methods. The best results are highlighted in bold.](https://idiotdeveloper.com/wp-content/uploads/2024/02/unet3table2.webp)
Conclusion
In conclusion, UNet 3+ presents a promising advancement in medical image segmentation, combining innovative architectural modifications, a hybrid loss function, and a classification-guided module. The results suggest improved accuracy, reduced false positives, and enhanced computational efficiency, making UNet 3+ a notable contribution to the field of medical image analysis.