Converting RGB Masks to Class Index Masks in Python

In the world of semantic segmentation, each pixel in an image carries a meaning — a class label that represents an object or region. These labels can be stored in various formats, and one common way is using a multi-class RGB mask, where each class is represented by a unique color. While this format is visually interpretable, it’s not always ideal for training or evaluating machine learning models. In most deep learning workflows, models expect segmentation masks in the form of class index masks, where each pixel’s value corresponds to a class ID (e.g., 0 for background, 1 for person, etc.).

This post walks you through a simple Python implementation that converts an RGB mask to a class index mask and vice versa, using OpenCV and NumPy.

A Quick Recap of the Previous Post

In our previous blog post, Extracting RGB Codes from Multi-Class Segmentation Masks with Python, we explored how to extract unique RGB color codes from a segmentation mask. This helped us understand how many classes are present and which colors represent them.

Now that we can identify these RGB values, it’s time to assign them proper class indices and make them usable for downstream tasks like model training and evaluation.
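As a rough illustration of that step, the sketch below collects the unique colors of a mask and numbers them in sorted order. The helper name build_color_to_class and the tiny demo mask are made up for this example, and numbering by sort order is only an assumption: in a real project the indices should follow the dataset's own label definitions, and a single image may not contain every class.

```python
import numpy as np

def build_color_to_class(mask):
    """Map each unique color in an (H, W, 3) mask to a consecutive class index."""
    # Flatten to a list of pixels and keep the unique rows (sorted lexicographically).
    colors = np.unique(mask.reshape(-1, 3), axis=0)
    # Cast to plain Python ints so the tuples make clean dictionary keys.
    return {tuple(int(c) for c in color): idx for idx, color in enumerate(colors)}

# Tiny synthetic 2x2 mask with three distinct colors (illustrative values only).
demo_mask = np.array(
    [[[0, 0, 0], [70, 70, 70]],
     [[0, 0, 0], [0, 74, 111]]],
    dtype=np.uint8,
)
color_to_class = build_color_to_class(demo_mask)
print(color_to_class)
```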

Why Do We Need to Convert RGB Masks to Class Index Masks?

There are several reasons why this conversion is essential in deep learning workflows:

Model Compatibility: Most segmentation models (like U-Net, DeepLabV3+, etc.) are trained against class indices, not colors. They produce a score per class for every pixel, and taking the highest-scoring class at each pixel gives a 2D array of integers, not a 3D RGB image.

Efficiency: Index masks are more memory-efficient since they use only one channel (grayscale) instead of three (RGB). This is especially useful when dealing with large datasets.

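Loss Function Requirements: Loss functions like CrossEntropyLoss are typically given integer class labels as the target. An RGB mask cannot be passed in directly; its extra channel dimension and color values do not represent class labels, so training will usually fail with a shape or type error.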

Post-Processing & Visualization: While index masks are efficient for training, RGB masks are great for visualization. Hence, after model inference, it’s useful to convert the predicted class index mask back to RGB for human interpretation or qualitative analysis.
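To make the first two points concrete, here is a small NumPy-only sketch. The random scores array is just a stand-in for a model's per-class output, and the sizes are hypothetical; no real model is involved.

```python
import numpy as np

num_classes, height, width = 20, 4, 6  # hypothetical sizes

# Stand-in for a model's per-class scores, shape (classes, height, width).
rng = np.random.default_rng(0)
scores = rng.random((num_classes, height, width))

# The predicted class for each pixel is the index of the highest score.
pred_index_mask = np.argmax(scores, axis=0).astype(np.uint8)

# One uint8 channel versus three: compare the memory footprint.
rgb_equivalent = np.zeros((height, width, 3), dtype=np.uint8)
size_ratio = rgb_equivalent.nbytes / pred_index_mask.nbytes
print(pred_index_mask.shape, size_ratio)
```

The index mask has shape (height, width) and takes a third of the memory of an RGB mask of the same size.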

Code Breakdown and Explanation

Imports

import numpy as np
import cv2
  • numpy is used for numerical operations and efficient array handling.
  • cv2 (OpenCV) is used to read/write images in different formats.

Function to Convert the RGB Mask to Class Index Mask

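def rgb_to_index_mask(rgb_mask, rgb_to_class):
    # Output has the same height and width as the input, but a single channel.
    height, width = rgb_mask.shape[:2]
    class_mask = np.zeros((height, width), dtype=np.uint8)

    for rgb, class_id in rgb_to_class.items():
        # True where all three channels of a pixel equal this color.
        match = np.all(rgb_mask == rgb, axis=-1)
        class_mask[match] = class_id

    return class_mask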

The function converts a 3-channel RGB mask to a 2D class index mask.

  • rgb_mask: Input image (RGB mask as NumPy array).
  • rgb_to_class: Dictionary mapping RGB tuples to class indices.

The function works as follows:

  1. Get the image dimensions (ignores the color channels).
  2. Create a blank class index mask of the same size (single channel), initialized to 0, so any pixel whose color is not in the mapping keeps class 0.
  3. Loop through each RGB-to-class mapping:
    • match: Boolean mask that is True wherever all three channels of a pixel equal the current rgb color.
    • class_mask[match] = class_id: Assign the class ID where the match is true.
  4. Return the resulting class index mask.
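Because the mask starts out as all zeros, a color that is missing from the mapping is silently labeled as class 0. One possible variation, sketched below, fills the mask with a sentinel value first so such pixels can be spotted. The function name rgb_to_index_mask_strict and the sentinel 255 are assumptions made for this example and are not part of the original code.

```python
import numpy as np

IGNORE_INDEX = 255  # hypothetical sentinel for colors missing from the mapping

def rgb_to_index_mask_strict(rgb_mask, rgb_to_class, ignore_index=IGNORE_INDEX):
    height, width = rgb_mask.shape[:2]
    # Start from the sentinel instead of 0 so unmapped pixels stay visible.
    class_mask = np.full((height, width), ignore_index, dtype=np.uint8)
    for rgb, class_id in rgb_to_class.items():
        match = np.all(rgb_mask == np.array(rgb, dtype=rgb_mask.dtype), axis=-1)
        class_mask[match] = class_id
    return class_mask

# 2x2 demo mask; the color (1, 2, 3) is deliberately absent from the mapping.
demo = np.array(
    [[[0, 0, 0], [70, 70, 70]],
     [[1, 2, 3], [0, 0, 0]]],
    dtype=np.uint8,
)
mapping = {(0, 0, 0): 0, (70, 70, 70): 8}
result = rgb_to_index_mask_strict(demo, mapping)
unmapped = int(np.count_nonzero(result == IGNORE_INDEX))
print(result, unmapped)
```

Counting the sentinel pixels is a quick check that the mapping covers every color in a mask. This only works if 255 is not itself used as a class ID.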

Function to Convert the Class Index Mask to RGB Mask

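def index_to_rgb_mask(class_mask, class_to_rgb):
    # Start from an all-black 3-channel image of the same height and width.
    height, width = class_mask.shape
    rgb_mask = np.zeros((height, width, 3), dtype=np.uint8)

    for class_id, rgb in class_to_rgb.items():
        # Paint every pixel of this class with its color.
        rgb_mask[class_mask == class_id] = rgb

    return rgb_mask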

The function converts a 2D class index mask back to a 3-channel RGB image.

  • class_mask: 2D mask with integer class values.
  • class_to_rgb: Dictionary mapping class indices back to RGB tuples.

The function works as follows:

  1. Get image dimensions and initialize a blank RGB image.
  2. For each class ID, find matching pixels in the class mask and assign the corresponding RGB color.
  3. Return the final RGB mask.
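The same result can also be obtained without a Python loop over the pixels of each class, by building a small lookup table and indexing it with the mask. This is an alternative technique rather than the approach used in this post; the name index_to_rgb_mask_lut and the demo values are made up for the sketch.

```python
import numpy as np

def index_to_rgb_mask_lut(class_mask, class_to_rgb):
    # Build a (num_entries, 3) lookup table; row i holds the color of class i.
    palette = np.zeros((max(class_to_rgb) + 1, 3), dtype=np.uint8)
    for class_id, rgb in class_to_rgb.items():
        palette[class_id] = rgb
    # Indexing the table with the mask replaces each class ID by its color.
    return palette[class_mask]

demo_classes = np.array([[0, 1], [2, 1]], dtype=np.uint8)
demo_palette = {0: (0, 0, 0), 1: (0, 74, 111), 2: (0, 220, 220)}
demo_rgb = index_to_rgb_mask_lut(demo_classes, demo_palette)
print(demo_rgb.shape)
```

Note that this version assumes every value in the mask is at most the largest key in the dictionary; a larger class ID would raise an IndexError instead of being left black.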

Main Execution Block

Step 1: Define the RGB-to-Class Mapping

if __name__ == "__main__":
    # Keys are in the channel order returned by cv2.imread (BGR),
    # e.g. (180, 130, 70) is the color whose RGB value is (70, 130, 180).
    rgb_to_class = {
        (0, 0, 0): 0,
        (0, 74, 111): 1,
        (0, 220, 220): 2,
        (20, 20, 20): 3,
        (30, 170, 250): 4,
        (35, 142, 107): 5,
        (60, 20, 220): 6,
        (70, 0, 0): 7,
        (70, 70, 70): 8,
        (81, 0, 81): 9,
        (100, 100, 150): 10,
        (128, 64, 128): 11,
        (142, 0, 0): 12,
        (152, 251, 152): 13,
        (153, 153, 153): 14,
        (153, 153, 190): 15,
        (156, 102, 102): 16,
        (180, 130, 70): 17,
        (230, 0, 0): 18,
        (232, 35, 244): 19
    }

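A dictionary that maps each color tuple to a unique class index (e.g., road, sky, person). The keys must use the same channel order as the loaded mask array; otherwise the colors will not match any pixel.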

The original RGB mask from the GTA5 dataset.

Step 2: Load the RGB Mask

    rgb_mask = cv2.imread('masks/00001.png', cv2.IMREAD_COLOR)

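Reads the segmentation mask image from disk. Note that cv2.imread returns the channels in BGR order rather than RGB, and returns None if the file cannot be read.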
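Since cv2.imread returns BGR, a palette published in RGB order needs its channels reversed before it can serve as dictionary keys for such an array (or the array itself can be reversed instead). A minimal sketch using only NumPy and plain Python; the helper names reverse_channels and reverse_keys are hypothetical and not part of the original code.

```python
import numpy as np

def reverse_channels(mask):
    """Swap BGR <-> RGB by reversing the last axis of an (H, W, 3) array."""
    return np.ascontiguousarray(mask[..., ::-1])

def reverse_keys(color_to_class):
    """Reverse the channel order of every color key in a mapping."""
    return {tuple(reversed(color)): class_id
            for color, class_id in color_to_class.items()}

# Two entries taken from the mapping above, in the order cv2.imread produces.
bgr_keyed = {(180, 130, 70): 17, (60, 20, 220): 6}
rgb_keyed = reverse_keys(bgr_keyed)

# A single BGR pixel, reversed into RGB order.
bgr_pixels = np.array([[[180, 130, 70]]], dtype=np.uint8)
rgb_pixels = reverse_channels(bgr_pixels)
print(rgb_keyed, rgb_pixels.tolist())
```

Either conversion works, as long as the mask array and the dictionary keys end up in the same order.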

Step 3: Convert RGB to Class Index Mask

    class_mask = rgb_to_index_mask(rgb_mask, rgb_to_class)
    cv2.imwrite('results/class_index_mask.png', class_mask)
  • Convert the RGB mask to class indices.
  • Save the resulting grayscale (index) mask to a file.
The class index mask.
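With class IDs between 0 and 19, the saved index mask looks almost black in an ordinary image viewer. For a quick visual check, the values can be stretched across the 0 to 255 range. The helper stretch_for_preview below is a hypothetical addition for viewing only; the unscaled mask is still the one to use for training.

```python
import numpy as np

def stretch_for_preview(class_mask, num_classes):
    # Spread class IDs 0..num_classes-1 across 0..255, for viewing only.
    scale = 255 // max(num_classes - 1, 1)
    # Multiply in a wider dtype to avoid uint8 overflow, then cast back.
    return (class_mask.astype(np.uint16) * scale).astype(np.uint8)

demo_classes = np.array([[0, 1], [10, 19]], dtype=np.uint8)
preview = stretch_for_preview(demo_classes, num_classes=20)
print(preview)
```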

Step 4: Convert Class Index Mask Back to RGB

    class_to_rgb = {v: k for k, v in rgb_to_class.items()}
    rgb_converted = index_to_rgb_mask(class_mask, class_to_rgb)

    cv2.imwrite('results/rgb_mask_back.png', rgb_converted)
  • Reverse the original dictionary to map class indices back to RGB.
  • Convert the index mask back to RGB.
  • Save the result as a color image (it should match the original, provided every color in the original appears in the mapping).
From class index mask back to the RGB mask.
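Rather than comparing the two images by eye, the round trip can be checked numerically. The sketch below repeats the two functions from this post so that it runs on its own, and uses a small synthetic mask instead of a file on disk; the mask values and variable names are illustrative.

```python
import numpy as np

def rgb_to_index_mask(rgb_mask, rgb_to_class):
    height, width = rgb_mask.shape[:2]
    class_mask = np.zeros((height, width), dtype=np.uint8)
    for rgb, class_id in rgb_to_class.items():
        match = np.all(rgb_mask == rgb, axis=-1)
        class_mask[match] = class_id
    return class_mask

def index_to_rgb_mask(class_mask, class_to_rgb):
    height, width = class_mask.shape
    rgb_mask = np.zeros((height, width, 3), dtype=np.uint8)
    for class_id, rgb in class_to_rgb.items():
        rgb_mask[class_mask == class_id] = rgb
    return rgb_mask

# Synthetic 2x2 mask whose colors are all present in the mapping.
mapping = {(0, 0, 0): 0, (0, 74, 111): 1, (70, 70, 70): 8}
original = np.array(
    [[[0, 0, 0], [0, 74, 111]],
     [[70, 70, 70], [0, 74, 111]]],
    dtype=np.uint8,
)

class_mask = rgb_to_index_mask(original, mapping)
class_to_rgb = {v: k for k, v in mapping.items()}
restored = index_to_rgb_mask(class_mask, class_to_rgb)

# True only if every pixel survived the round trip unchanged.
round_trip_ok = np.array_equal(original, restored)
print(round_trip_ok)
```

If the check fails on a real mask, the usual cause is a color in the image that has no entry in the mapping.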

Output Summary

  • class_index_mask.png — Grayscale image with class indices per pixel.
  • rgb_mask_back.png — Reconstructed RGB mask from class indices.

Conclusion

In this post, we explored how to convert multi-class segmentation masks between RGB format and class index format — a crucial step in the preprocessing and postprocessing pipelines of semantic segmentation tasks.

While RGB masks are human-readable and great for visualization, they are not suitable for model training or evaluation. Most deep learning frameworks require class index masks, where each pixel’s value directly corresponds to a class label.

By understanding and implementing these conversions using simple NumPy and OpenCV functions, you can:

  • Prepare your dataset in the right format for model training.
  • Efficiently store and process label masks.
  • Visualize predictions meaningfully after inference.

This conversion bridges the gap between machine-compatible formats and human-friendly visualizations — ensuring your workflow is both efficient and interpretable.

Mastering this conversion technique will greatly simplify your deep learning journey, whether you’re building your own segmentation model or working with custom datasets.
