Introduction to Autoencoders

In today’s article, we are going to discuss a neural network architecture called autoencoders. This article is aimed at Machine Learning and Deep Learning beginners who are interested in getting a brief understanding of the underlying concepts behind autoencoders. So let’s dive in and get familiar with the concept of autoencoders.

In this article, we are going to explore the following topics:

What are autoencoders
Architecture of autoencoders
Loss used in autoencoders
Types of autoencoders
Applications of autoencoders
Conclusion
Sources

What are Autoencoders

Autoencoders are a type of neural network that attempts to reproduce its input at its output as closely as possible. An autoencoder takes an input and transforms it into a reduced representation called the code or embedding. This code or embedding is then transformed back into an approximation of the original input. The code is also called the latent-space representation.

Basic architecture of an autoencoder - CREDIT: https://towardsdatascience.com/applied-deep-learning-part-3-autoencoders-1c083af4d798

Formally, an autoencoder describes a nonlinear mapping from an input to an output through an intermediate representation called the code or embedding.

It is used to efficiently learn the data representation or representation space in an unsupervised manner. The main purpose is to learn a reduced representation of the input data.

Some of the important things to know about autoencoders are:

Data-specific compression: An autoencoder compresses well only data that is similar to what it was trained on. An autoencoder trained on dog photos will not compress photos of human faces well.
Unsupervised: Training an autoencoder does not require labelled data, because the input itself serves as the target. It can be trained on any kind of input data.
Lossy in nature: There is generally some difference between the input and the output of the autoencoder, because the reduced code cannot hold all of the information in the input.

You now know that autoencoders are unsupervised neural networks, so let’s look at what unsupervised learning is.

In unsupervised learning, deep learning models or neural networks are trained on unlabelled data, which allows them to find structure and patterns within the data. This structure helps the neural network learn important features from the data. These features can then be used to train other deep learning models, which can further improve their performance.

Architecture of Autoencoders

Let’s explore the details of the architecture of the autoencoder. An autoencoder consists of three main components:

Encoder
Code or Embedding
Decoder

The encoder compresses the given input into a fixed-dimension code or embedding, and the decoder transforms that code or embedding back into a reconstruction of the original input. The decoder architecture is usually a mirror image of the encoder.

A more detailed architecture of an autoencoder - CREDIT: https://towardsdatascience.com/applied-deep-learning-part-3-autoencoders-1c083af4d798
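To make the three components concrete, here is a minimal sketch of one forward pass in plain Python. Everything in it is an assumption made for illustration: the sizes (8 inputs, a 3-value code), the sigmoid activation, the random untrained weights, and the helper names dense, init_layer, encode and decode. In practice you would build this with a deep learning framework instead.

```python
import math
import random

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def dense(x, weights, biases, activation):
    """Apply one fully connected layer: activation(W x + b)."""
    return [activation(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]

def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (illustrative initialisation)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

rng = random.Random(0)
input_size, code_size = 8, 3  # the code is smaller than the input

enc_w, enc_b = init_layer(input_size, code_size, rng)  # encoder: 8 -> 3
dec_w, dec_b = init_layer(code_size, input_size, rng)  # decoder mirrors it: 3 -> 8

def encode(x):
    return dense(x, enc_w, enc_b, sigmoid)

def decode(code):
    return dense(code, dec_w, dec_b, sigmoid)

x = [rng.random() for _ in range(input_size)]
code = encode(x)                 # reduced representation
reconstruction = decode(code)    # same size as the input
print(len(x), len(code), len(reconstruction))  # 8 3 8
```

Because the weights are untrained, the reconstruction here will not resemble the input; the sketch only shows how the data flows from input to code and back.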

While building an autoencoder, we aim to make sure that the autoencoder does not simply memorize all the information. Instead, it should be constrained to prioritize which information should be kept and which should be discarded. This constraint can be introduced in the following ways:

Reducing the number of units or nodes in the layers.
Adding some noise to the input images.
Adding some regularization.

When you start building your first autoencoder, you will encounter some of the hyperparameters that will affect the performance of your autoencoder. These hyperparameters are discussed here:

Number of layers: You can use as many layers in the encoder and decoder as you require. You can also choose how many nodes or units each layer has. Usually the number of nodes decreases with each successive layer of the encoder and increases again with each successive layer of the decoder.
Number of nodes in the code layer: This layer usually has fewer nodes than the input size. A smaller code layer gives more compression, but it also makes accurate reconstruction harder.
Loss: For the loss function, we generally use Mean Squared Error or Binary Cross Entropy. We are going to learn more about the loss function in the next section.
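As a rough illustration of how these hyperparameters fit together, the sketch below lists the layer widths of an encoder and a mirrored decoder. The function name mirrored_layer_sizes and the example sizes (784, 128, 64, 32) are assumptions chosen for illustration, not values from this article.

```python
def mirrored_layer_sizes(input_size, hidden_sizes, code_size):
    """Return the layer widths of the encoder and of a mirrored decoder."""
    encoder = [input_size] + list(hidden_sizes) + [code_size]
    decoder = encoder[::-1]  # the decoder widens in the reverse order
    return encoder, decoder

encoder, decoder = mirrored_layer_sizes(784, [128, 64], 32)
print(encoder)  # [784, 128, 64, 32]
print(decoder)  # [32, 64, 128, 784]
```

The widths shrink toward the code layer in the encoder and grow back to the input size in the decoder.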

Loss used in Autoencoders

In an autoencoder, two main loss functions are used:

Binary Cross Entropy: It is used when the input values, and therefore the autoencoder outputs, lie between 0 and 1, for example normalized pixel intensities.
Mean Squared Error: The mean squared error is the mean of the squared differences between the autoencoder output (prediction) and the ground truth, which for an autoencoder is the input itself. It is generally used when the values are continuous.
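Both losses can be written in a few lines. The following is a minimal sketch in plain Python, assuming the target and the output are flat lists of numbers of equal length; the function names and the small eps used to avoid taking log(0) are illustrative choices.

```python
import math

def mean_squared_error(target, output):
    """Mean of the squared differences between target and output."""
    return sum((t - o) ** 2 for t, o in zip(target, output)) / len(target)

def binary_cross_entropy(target, output, eps=1e-12):
    """Mean binary cross entropy; target and output values must lie in [0, 1]."""
    total = 0.0
    for t, o in zip(target, output):
        o = min(max(o, eps), 1.0 - eps)  # keep log() away from 0
        total += -(t * math.log(o) + (1.0 - t) * math.log(1.0 - o))
    return total / len(target)

x = [0.0, 1.0]            # input, which is also the target
y = [0.5, 0.5]            # a poor reconstruction
print(mean_squared_error(x, y))    # 0.25
print(binary_cross_entropy(x, y))  # about 0.693, which is ln(2)
```

In both cases a perfect reconstruction of a 0/1 target gives a loss of (almost) zero, and the loss grows as the output drifts away from the input.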

Types of Autoencoders

There are many types of autoencoders, ranging from simple autoencoders to denoising autoencoders, each used for different purposes. Today we are going to discuss the following types of autoencoders:

Vanilla autoencoder
Deep autoencoder
Convolutional autoencoder
Denoising autoencoder
Variational autoencoder

Vanilla Autoencoder

A vanilla autoencoder is the simplest form of autoencoder, also called a simple autoencoder. It consists of only one hidden layer between the input and the output layer, which can limit its reconstruction quality compared to other autoencoders.
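The sketch below trains a tiny autoencoder with a single hidden layer using plain stochastic gradient descent and the mean squared error. It is a simplified illustration under several assumptions: the layers are linear (no activation function and no biases) to keep the gradients short, the toy data is made up so that two numbers fully describe each sample, and the learning rate and epoch count are arbitrary.

```python
import random

rng = random.Random(0)
n_in, n_code, lr = 4, 2, 0.05

# Toy data (made up): each sample repeats two values, so 2 numbers describe it.
data = []
for _ in range(50):
    a, b = rng.random(), rng.random()
    data.append([a, b, a, b])

W_enc = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_code)]
W_dec = [[rng.uniform(-0.5, 0.5) for _ in range(n_code)] for _ in range(n_in)]

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def average_loss():
    """Mean squared reconstruction error over the whole data set."""
    total = 0.0
    for x in data:
        y = matvec(W_dec, matvec(W_enc, x))
        total += sum((yi - xi) ** 2 for yi, xi in zip(y, x)) / n_in
    return total / len(data)

loss_before = average_loss()
for epoch in range(200):
    for x in data:
        h = matvec(W_enc, x)  # code: the single hidden layer
        y = matvec(W_dec, h)  # reconstruction
        # Gradient of the MSE with respect to the reconstruction.
        dy = [2.0 * (yi - xi) / n_in for yi, xi in zip(y, x)]
        # Backpropagate to the code before the decoder weights change.
        dh = [sum(W_dec[i][j] * dy[i] for i in range(n_in)) for j in range(n_code)]
        for i in range(n_in):
            for j in range(n_code):
                W_dec[i][j] -= lr * dy[i] * h[j]
        for j in range(n_code):
            for k in range(n_in):
                W_enc[j][k] -= lr * dh[j] * x[k]
loss_after = average_loss()
print(loss_before, loss_after)  # the second value should be smaller
```

Note that the input x plays both roles here: it is fed to the encoder and it is the target the reconstruction is compared against.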

Deep Autoencoder

In a vanilla autoencoder, we limit ourselves to only one hidden layer. In a deep autoencoder, we can use more than one hidden layer, which can improve the reconstruction quality of the autoencoder and lets us tune it according to the requirements. In deep autoencoders, the encoder and the decoder are both deep neural networks, typically with mirrored layer structures.

Convolutional Autoencoder

Both the vanilla and deep autoencoders use feedforward neural networks (dense layers) to build the model. To work with these autoencoders, image data has to be flattened into a vector, which loses the spatial information. Convolutional layers can take inputs like images without this modification.

The convolutional autoencoder uses convolutional, ReLU and pooling layers in the encoder. In the decoder, the pooling layers are replaced by upsampling layers, which increase the dimensions of the feature maps.
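To show how pooling shrinks a feature map and upsampling grows it back, here is a small sketch in plain Python. It leaves out the convolution and ReLU steps entirely, and it assumes 2x2 max pooling, nearest-neighbour upsampling and an image with even height and width; the function names are illustrative.

```python
def max_pool_2x2(image):
    """Halve height and width by keeping the maximum of each 2x2 block."""
    return [[max(image[r][c], image[r][c + 1],
                 image[r + 1][c], image[r + 1][c + 1])
             for c in range(0, len(image[0]), 2)]
            for r in range(0, len(image), 2)]

def upsample_2x(image):
    """Double height and width by repeating each value (nearest neighbour)."""
    out = []
    for row in image:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out

image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]

pooled = max_pool_2x2(image)    # 2x2 feature map, as in the encoder
restored = upsample_2x(pooled)  # 4x4 again, as in the decoder
print(pooled)    # [[6, 8], [14, 16]]
print(restored)  # [[6, 6, 8, 8], [6, 6, 8, 8], [14, 14, 16, 16], [14, 14, 16, 16]]
```

The restored map has the original size but not the original detail. In a real convolutional autoencoder, the convolutional layers around the upsampling steps are what learn to fill that detail back in.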

For more: Building Convolutional Autoencoder using TensorFlow 2.0

Denoising Autoencoder

A denoising autoencoder adds some noise to the original input image, creating a corrupted version of it. The corrupted images are given to the autoencoder as input, and the autoencoder has to reconstruct the original undistorted image. This makes the autoencoder robust to noise in the input and forces it to learn features from the input data instead of memorizing it.
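The key detail is that the corrupted image is the input while the clean image remains the target. A minimal sketch of building such a training pair, assuming pixel values in [0, 1], Gaussian noise and an illustrative helper named add_gaussian_noise:

```python
import random

def add_gaussian_noise(image, stddev, rng):
    """Return a corrupted copy, with values clipped back into [0, 1]."""
    return [min(1.0, max(0.0, p + rng.gauss(0.0, stddev))) for p in image]

rng = random.Random(0)
clean = [0.0, 0.25, 0.5, 0.75, 1.0]  # a tiny flattened "image"
noisy = add_gaussian_noise(clean, stddev=0.1, rng=rng)

# The network receives `noisy`, but the loss is computed against `clean`.
training_pair = (noisy, clean)
```

Other kinds of corruption, such as randomly setting pixels to zero, follow the same pattern: only the input is corrupted, never the target.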

Variational Autoencoder

The variational autoencoder is one of the most popular types of autoencoder in the machine learning community. What makes it different from other autoencoders is that its code or latent space is continuous, allowing easy random sampling and interpolation.

In a variational autoencoder, the encoder outputs two vectors instead of one: one for the mean and another for the standard deviation of the latent state attributes. These vectors are combined to obtain an encoding sample, which is passed to the decoder for reconstruction.
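One common way to combine the two vectors is to draw standard normal noise and then scale and shift it: z = mean + standard deviation * noise. The sketch below assumes that form; the function name sample_latent and the example values are made up for illustration, and real implementations often have the encoder output the log-variance rather than the standard deviation directly.

```python
import random

def sample_latent(mean, stddev, rng):
    """Draw one encoding sample: z = mean + stddev * eps, with eps ~ N(0, 1)."""
    return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mean, stddev)]

rng = random.Random(0)
mean = [0.5, -1.0, 2.0]    # example encoder output: mean vector
stddev = [0.1, 0.2, 0.05]  # example encoder output: standard deviation vector

z = sample_latent(mean, stddev, rng)  # this sample is passed to the decoder
print(z)
```

Each call produces a slightly different z around the mean, which is what lets nearby points in the latent space decode to similar outputs.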

The variational autoencoder is a more powerful generative model than the other autoencoders. It has applications ranging from generating fake human faces to producing synthetic music.

Applications of Autoencoders

The following are the applications of autoencoders:

Dimensionality reduction
Information retrieval
Data compression
Clustering
Generating new examples

Conclusion

In this article, we have discussed several types of autoencoders: vanilla, deep, convolutional, denoising and variational. Each has its own advantages and disadvantages, which make it suitable for different tasks.

The features learned by an autoencoder can be used for other tasks like image classification or text classification. Autoencoders are also useful for dimensionality reduction or compression of data, which can be important in some applications.