What is UNET?

UNET is an architecture developed by Olaf Ronneberger and his team at the University of Freiburg in 2015 for biomedical image segmentation. It is a fully convolutional neural network designed to learn from relatively few training samples, and it has become a highly popular approach for semantic segmentation tasks. The architecture builds on the earlier FCN ("Fully Convolutional Networks for Semantic Segmentation") developed by Jonathan Long and his team in 2014.

The diagram of UNET Architecture from the original research paper.

UNET – Network Architecture

The UNET architecture is a U-shaped encoder-decoder network, which consists of four encoder blocks and four decoder blocks that are connected by a bridge. The encoder network, also known as the contracting path, reduces the spatial dimensions and increases the number of filters (feature channels) at each encoder block. Conversely, the decoder network increases the spatial dimensions and reduces the number of feature channels.
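To make the contracting and expanding paths concrete, the sketch below traces feature-map shapes through the network. The helper `unet_shapes` is hypothetical, and its defaults are assumptions: 64 filters in the first block, doubling at every encoder block (the 64 to 1024 progression matches the original paper), a 512×512 input, and padded convolutions that leave height and width unchanged.

```python
def unet_shapes(height, width, base_filters=64, depth=4):
    """Trace (stage, channels, height, width) through a U-shaped network.

    Assumptions: padded 3x3 convolutions (spatial size unchanged),
    2x2 max-pooling in the encoder, 2x2 stride-2 transpose convolutions
    in the decoder.
    """
    if height % 2 ** depth or width % 2 ** depth:
        raise ValueError("height and width must be divisible by 2**depth")
    shapes = []
    filters, h, w = base_filters, height, width
    for i in range(1, depth + 1):
        # Output of the encoder block; this is also the skip feature map.
        shapes.append((f"encoder{i}", filters, h, w))
        h, w = h // 2, w // 2   # 2x2 max-pooling halves height and width
        filters *= 2            # the next block doubles the feature channels
    shapes.append(("bridge", filters, h, w))
    for i in range(depth, 0, -1):
        filters //= 2           # the decoder halves the feature channels
        h, w = h * 2, w * 2     # the transpose convolution doubles the size
        # decoder{i} is the block that receives the skip from encoder{i}.
        shapes.append((f"decoder{i}", filters, h, w))
    return shapes


for name, channels, h, w in unet_shapes(512, 512):
    print(f"{name:>9}: {channels:4d} channels, {h}x{w}")
```

Under these assumptions the bridge sees a 32×32 feature map with 1024 channels, and the last decoder block returns to the input resolution with 64 channels.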


Encoder Network

The encoder network acts as the feature extractor and learns an abstract representation of the input image through a sequence of encoder blocks. Each encoder block consists of two 3×3 convolutions, where each convolution is followed by a ReLU (Rectified Linear Unit) activation function. The ReLU activation function introduces non-linearity into the network, which allows it to learn mappings that a purely linear stack of convolutions could not represent. The output of the second ReLU, taken before pooling, is passed through a skip connection to the corresponding decoder block.
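As a rough illustration of what one encoder block computes, here is a minimal pure-Python sketch of two 3×3 convolutions, each followed by ReLU. It is deliberately simplified: a single channel, zero padding, and hand-set kernels are all assumptions made for readability, whereas a real network uses many channels and learned weights. The function names are hypothetical.

```python
def conv3x3_relu(image, kernel, bias=0.0):
    """One zero-padded 3x3 convolution followed by ReLU on a 2D list.

    As in most deep learning libraries, the "convolution" is implemented
    as cross-correlation (the kernel is not flipped).
    """
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            total = bias
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    y, x = i + di, j + dj
                    if 0 <= y < h and 0 <= x < w:  # zero padding outside
                        total += image[y][x] * kernel[di + 1][dj + 1]
            out[i][j] = max(0.0, total)  # ReLU: negative responses become 0
    return out


def conv_block(image, kernel1, kernel2):
    """Two conv + ReLU steps in sequence (single channel only)."""
    return conv3x3_relu(conv3x3_relu(image, kernel1), kernel2)


identity = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
vertical_edge = [[-1.0, 0.0, 1.0], [-1.0, 0.0, 1.0], [-1.0, 0.0, 1.0]]
image = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]
for row in conv_block(image, vertical_edge, identity):
    print(row)
```

The value returned by `conv_block` corresponds to the feature map that would both feed the pooling step and be kept for the skip connection.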

Next comes a 2×2 max-pooling, which halves the spatial dimensions (height and width) of the feature maps. This reduces the computational cost, because every later layer processes fewer activations; the pooling layer itself has no trainable parameters.
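The sketch below shows 2×2 max-pooling on a small feature map, and then counts the weights and multiply-accumulate operations of a 3×3 convolution before and after pooling. Both helper names and the 256×256, 64-to-128-channel example sizes are assumptions chosen for illustration.

```python
def max_pool_2x2(feature_map):
    """2x2 max-pooling with stride 2 on a 2D list (a trailing odd row or column is dropped)."""
    h, w = len(feature_map), len(feature_map[0])
    return [
        [max(feature_map[i][j], feature_map[i][j + 1],
             feature_map[i + 1][j], feature_map[i + 1][j + 1])
         for j in range(0, w - 1, 2)]
        for i in range(0, h - 1, 2)
    ]


def conv3x3_cost(height, width, in_channels, out_channels):
    """Return (trainable parameters, multiply-accumulates) of a padded 3x3 convolution."""
    params = out_channels * (in_channels * 9 + 1)  # weights plus one bias per filter
    macs = height * width * out_channels * in_channels * 9
    return params, macs


fmap = [[1, 2, 3, 4],
        [5, 6, 7, 8],
        [9, 10, 11, 12],
        [13, 14, 15, 16]]
print(max_pool_2x2(fmap))  # [[6, 8], [14, 16]]

params_before, macs_before = conv3x3_cost(256, 256, 64, 128)
params_after, macs_after = conv3x3_cost(128, 128, 64, 128)
print(params_before == params_after)  # True: weights do not depend on image size
print(macs_before // macs_after)      # 4: a quarter of the arithmetic after pooling
```

Halving height and width leaves the convolution's weight count untouched but cuts the arithmetic of the following layer to a quarter.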

Skip Connections

These skip connections provide additional information that helps the decoder generate better semantic features. They also act as shortcut connections that let gradients flow directly to the earlier layers during backpropagation without degradation. This helps the network learn better representations and improves performance.
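In UNET the skip feature map is joined to the decoder's feature map by concatenation along the channel axis. A minimal sketch, assuming feature maps stored as nested lists in `[channels][height][width]` order; the helper name `concat_channels` and the ordering (decoder features first) are illustrative assumptions.

```python
def concat_channels(decoder_features, skip_features):
    """Concatenate two [channels][height][width] feature maps along the channel axis."""
    dec_size = (len(decoder_features[0]), len(decoder_features[0][0]))
    skip_size = (len(skip_features[0]), len(skip_features[0][0]))
    if dec_size != skip_size:
        raise ValueError("spatial sizes must match before concatenation")
    # Channels are the outermost list, so list concatenation stacks them.
    return decoder_features + skip_features


decoder_features = [[[0.1, 0.2], [0.3, 0.4]] for _ in range(2)]  # 2 channels, 2x2
skip_features = [[[1.0, 2.0], [3.0, 4.0]] for _ in range(3)]     # 3 channels, 2x2
merged = concat_channels(decoder_features, skip_features)
print(len(merged), len(merged[0]), len(merged[0][0]))  # 5 2 2
```

Because the encoder features are copied in unchanged, the convolutions that follow can draw on both the upsampled representation and the higher-resolution detail from the encoder.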

The block diagram of the encoder and the decoder block of the UNET architecture.

The above figure shows the block diagram of the encoder and decoder block used to build the UNET architecture.


Bridge

The bridge connects the encoder and the decoder network and completes the flow of information. It consists of two 3×3 convolutions, where each convolution is followed by a ReLU activation function.

Decoder Network

The decoder network takes the abstract representation generated by the encoder and generates a semantic segmentation mask. Each decoder block starts with a 2×2 transpose convolution that upsamples the feature map; its output is then concatenated with the corresponding skip connection feature map from the encoder block. These skip connections provide features from earlier layers, in particular fine spatial detail, that may otherwise be lost through repeated downsampling and the depth of the network. After this, two 3×3 convolutions are applied, each followed by a ReLU activation function.
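The upsampling step can be sketched in a few lines. The example assumes a stride of 2 (the usual choice for a 2×2 transpose convolution, so that height and width double) and a single channel; `transpose_conv_2x2` is a hypothetical helper, and in a real network the kernel values are learned.

```python
def transpose_conv_2x2(feature_map, kernel):
    """2x2 transpose convolution with stride 2 on a single-channel 2D list.

    Each input value is multiplied by the 2x2 kernel and written into its
    own non-overlapping 2x2 patch of the output, so height and width double.
    """
    h, w = len(feature_map), len(feature_map[0])
    out = [[0.0] * (2 * w) for _ in range(2 * h)]
    for i in range(h):
        for j in range(w):
            for di in range(2):
                for dj in range(2):
                    out[2 * i + di][2 * j + dj] = feature_map[i][j] * kernel[di][dj]
    return out


coarse = [[1.0, 2.0],
          [3.0, 4.0]]
ones = [[1.0, 1.0], [1.0, 1.0]]  # with an all-ones kernel this is plain pixel repetition
for row in transpose_conv_2x2(coarse, ones):
    print(row)
```

After this step the upsampled map has the same height and width as the matching encoder output, which is what makes the concatenation with the skip feature map possible.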

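The output of the last decoder block passes through a 1×1 convolution with sigmoid activation. The sigmoid gives a per-pixel probability, which forms the segmentation mask representing the pixel-wise classification. This setup suits binary (foreground/background) segmentation; for multi-class problems, a softmax over one output channel per class is the usual choice.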
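A 1×1 convolution is simply a weighted sum over the channels at each pixel, so the output layer can be sketched as follows. The `[channels][height][width]` layout, the helper names, and the 0.5 threshold used to turn probabilities into a binary mask are assumptions for illustration.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def segmentation_head(features, weights, bias=0.0, threshold=0.5):
    """1x1 convolution to one channel, sigmoid, then thresholding.

    features: [channels][height][width]; weights: one value per input channel.
    Returns (per-pixel probabilities, binary mask).
    """
    channels, h, w = len(features), len(features[0]), len(features[0][0])
    probs = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            # A 1x1 convolution mixes channels at a single pixel location.
            logit = bias + sum(weights[c] * features[c][i][j] for c in range(channels))
            probs[i][j] = sigmoid(logit)
    mask = [[1 if p >= threshold else 0 for p in row] for row in probs]
    return probs, mask


features = [[[0.0, 2.0]], [[0.0, -4.0]]]  # 2 channels, 1x2 pixels
probs, mask = segmentation_head(features, weights=[1.0, 1.0])
print(probs)
print(mask)  # [[1, 0]]
```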


  • Some researchers prefer to use a batch normalization layer between the convolution layer and the ReLU activation function. Batch normalization reduces internal covariate shift and makes the network more stable during training.
  • Dropout is also sometimes used after the ReLU activation function. It forces the network to learn different representations by dropping out (ignoring) some randomly selected neurons during training. This makes the network less dependent upon particular neurons, which in turn helps it generalize better and reduces overfitting.
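Both techniques are easy to sketch on a flat list of activations. The version below shows their training-time behaviour only and is an illustration rather than any library's implementation: the function names, the `eps` value, and the "inverted" dropout scaling are assumptions. At inference time, batch normalization typically uses running statistics and dropout is switched off.

```python
import math
import random


def batch_norm(values, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize activations to zero mean and unit variance, then scale and shift."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in values]


def dropout(values, rate=0.5, rng=random):
    """Inverted dropout: zero each value with probability `rate`, rescale the survivors."""
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]


activations = [1.0, 2.0, 3.0, 4.0]
normalized = batch_norm(activations)
print(normalized)

rng = random.Random(0)  # fixed seed so the sketch is repeatable
print(dropout([1.0] * 8, rate=0.5, rng=rng))
```

Rescaling the surviving activations by `1 / keep` keeps the expected sum of the layer unchanged, which is why no extra scaling is needed when dropout is turned off.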


In summary, UNET is a widely used architecture originally designed for biomedical image segmentation. The architecture comprises a U-shaped encoder-decoder network that includes four encoder blocks, four decoder blocks, and a bridge that connects the two. The encoder network functions as a feature extractor, learning abstract representations of the input image. The decoder network, in turn, uses these representations to generate a semantic segmentation mask. Furthermore, the skip connections between the encoder and decoder network provide additional information to the decoder and act as a direct channel for the flow of gradients, thereby enhancing the overall performance of the architecture.

Nikhil Tomar

I am an independent researcher in the field of Artificial Intelligence. I love to write about the technology I am working on.
