In the world of semantic segmentation, each pixel in an image carries a meaning — a class label that represents an object or region. These labels can be stored in various formats, and one common way is using a multi-class RGB mask, where each class is represented by a unique color. While this format is visually interpretable, it’s not always ideal for training or evaluating machine learning models. In most deep learning workflows, models expect segmentation masks in the form of class index masks, where each pixel’s value corresponds to a class ID (e.g., 0 for background, 1 for person, etc.).
This post will walk you through a simple Python implementation to convert an RGB mask to a class index mask and vice-versa using OpenCV and NumPy.
A Quick Recap of the Previous Post
In our previous blog post, Extracting RGB Codes from Multi-Class Segmentation Masks with Python, we explored how to extract unique RGB color codes from a segmentation mask. This helped us understand how many classes are present and which colors represent them.
Now that we can identify these RGB values, it’s time to assign them proper class indices and make them usable for downstream tasks like model training and evaluation.
Why Do We Need to Convert RGB Masks to Class Index Masks
There are several reasons why this conversion is essential in deep learning workflows:
Model Compatibility: Most segmentation models (like U-Net, DeepLabV3+, etc.) are trained against class indices, not colors. They output per-class scores for every pixel, and taking the argmax over the class dimension yields a 2D array of integer class IDs, not a 3-channel RGB image.
Efficiency: Index masks are more memory-efficient since they use only one channel (grayscale) instead of three (RGB). This is especially useful when dealing with large datasets.
Loss Function Requirements: Loss functions like CrossEntropyLoss expect integer class labels as the target. Passing an RGB mask as the target does not match that format and will typically raise a shape or dtype error during training.
Post-Processing & Visualization: While index masks are efficient for training, RGB masks are great for visualization. Hence, after model inference, it’s useful to convert the predicted class index mask back to RGB for human interpretation or qualitative analysis.
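To make the efficiency point concrete, here is a minimal sketch that compares the in-memory size of the two formats. The 512x512 resolution is an assumption chosen only for illustration; any size gives the same 3:1 ratio when both arrays use uint8.

```python
import numpy as np

# Hypothetical mask size, chosen only for illustration.
height, width = 512, 512

rgb_mask = np.zeros((height, width, 3), dtype=np.uint8)  # one color (3 bytes) per pixel
index_mask = np.zeros((height, width), dtype=np.uint8)   # one class ID (1 byte) per pixel

print(rgb_mask.nbytes)    # 786432
print(index_mask.nbytes)  # 262144
```

On-disk sizes will differ because PNG compresses both formats, but the in-memory ratio is what matters once batches of masks are loaded for training.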
Code Breakdown and Explanation
Imports
import numpy as np
import cv2
- numpy is used for numerical operations and efficient array handling.
- cv2 (OpenCV) is used to read/write images in different formats.
Function to Convert the RGB Mask to Class Index Mask
def rgb_to_index_mask(rgb_mask, rgb_to_class):
    height, width = rgb_mask.shape[:2]
    class_mask = np.zeros((height, width), dtype=np.uint8)
    for rgb, class_id in rgb_to_class.items():
        match = np.all(rgb_mask == rgb, axis=-1)
        class_mask[match] = class_id
    return class_mask
The function converts a 3-channel RGB mask to a 2D class index mask.
- rgb_mask: Input image (RGB mask as NumPy array).
- rgb_to_class: Dictionary mapping RGB tuples to class indices.
The function works as follows:
- Get the image dimensions (ignoring the color channels).
- Create a blank class index mask of the same size (single channel, uint8, so class IDs must stay within 0 to 255).
- Loop through each RGB-to-class mapping:
  - match: Boolean mask that is True where all three channels of a pixel equal the current rgb color.
  - class_mask[match] = class_id: Assign the class ID wherever match is True.
- Return the resulting class index mask. Pixels whose color is not in the dictionary keep the initial value 0, so they are silently treated as class 0.
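The steps above can be traced on a tiny 2x2 mask. This is only a sketch: the function is repeated so the snippet runs on its own, and the three-color mapping and toy array are hypothetical values made up for the example.

```python
import numpy as np

def rgb_to_index_mask(rgb_mask, rgb_to_class):
    """Same logic as the function above, repeated so the sketch is self-contained."""
    height, width = rgb_mask.shape[:2]
    class_mask = np.zeros((height, width), dtype=np.uint8)
    for rgb, class_id in rgb_to_class.items():
        # Compare every pixel with the color, then require all 3 channels to match.
        match = np.all(rgb_mask == rgb, axis=-1)
        class_mask[match] = class_id
    return class_mask

# Hypothetical three-color mapping, used only for this toy example.
toy_mapping = {(0, 0, 0): 0, (128, 64, 128): 1, (70, 70, 70): 2}

# A 2x2 color mask: shape (2, 2, 3).
toy_mask = np.array(
    [[[0, 0, 0], [128, 64, 128]],
     [[70, 70, 70], [128, 64, 128]]],
    dtype=np.uint8,
)

toy_index = rgb_to_index_mask(toy_mask, toy_mapping)
print(toy_index.tolist())  # [[0, 1], [2, 1]]
```

The comparison `rgb_mask == rgb` broadcasts the 3-element tuple across every pixel, which is why no explicit loop over rows and columns is needed.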
Function to Convert the Class Index Mask to RGB Mask
def index_to_rgb_mask(class_mask, class_to_rgb):
    height, width = class_mask.shape
    rgb_mask = np.zeros((height, width, 3), dtype=np.uint8)
    for class_id, rgb in class_to_rgb.items():
        rgb_mask[class_mask == class_id] = rgb
    return rgb_mask
This function converts a 2D class index mask back to a 3-channel RGB image.
- class_mask: 2D mask with integer class values.
- class_to_rgb: Dictionary mapping class indices back to RGB tuples.
The function works as follows:
- Get image dimensions and initialize a blank RGB image.
- For each class ID, find matching pixels in the class mask and assign the corresponding RGB color.
- Return the final RGB mask.
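As a possible alternative to the per-class loop, the same result can be obtained with a palette lookup table and NumPy integer indexing. This is a different technique from the function above, sketched under the assumption that class IDs are contiguous and start at 0; the palette values and the toy mask are hypothetical.

```python
import numpy as np

# Hypothetical palette: row i holds the color of class i.
palette = np.array(
    [[0, 0, 0],        # class 0
     [128, 64, 128],   # class 1
     [70, 70, 70]],    # class 2
    dtype=np.uint8,
)

class_mask = np.array([[0, 1], [2, 1]], dtype=np.uint8)

# Integer-array indexing replaces every class ID with its palette row in one step.
rgb_mask = palette[class_mask]

print(rgb_mask.shape)            # (2, 2, 3)
print(rgb_mask[1, 0].tolist())   # [70, 70, 70]
```

The lookup avoids one pass over the mask per class, which may matter for large images with many classes, but it raises an IndexError if the mask contains an ID that has no palette row.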
Main Execution Block
Step 1: Define the RGB-to-Class Mapping
if __name__ == "__main__":
    rgb_to_class = {
        (0, 0, 0): 0,
        (0, 74, 111): 1,
        (0, 220, 220): 2,
        (20, 20, 20): 3,
        (30, 170, 250): 4,
        (35, 142, 107): 5,
        (60, 20, 220): 6,
        (70, 0, 0): 7,
        (70, 70, 70): 8,
        (81, 0, 81): 9,
        (100, 100, 150): 10,
        (128, 64, 128): 11,
        (142, 0, 0): 12,
        (152, 251, 152): 13,
        (153, 153, 153): 14,
        (153, 153, 190): 15,
        (156, 102, 102): 16,
        (180, 130, 70): 17,
        (230, 0, 0): 18,
        (232, 35, 244): 19,
    }
A dictionary that maps each color tuple to a unique class index, one per class (e.g., road, sky, person, etc.).

Step 2: Load the RGB Mask
    rgb_mask = cv2.imread('masks/00001.png', cv2.IMREAD_COLOR)
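Reads the color segmentation mask from disk. Note that cv2.imread returns the channels in BGR order, not RGB, so the keys of rgb_to_class must be written as (B, G, R) tuples for the comparison to match.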
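Since cv2.imread returns channels in B, G, R order, a palette that was written down in R, G, B order will not match the loaded array. The sketch below shows two ways to reconcile the two, using plain Python and NumPy; the two palette entries are hypothetical values used only for illustration. OpenCV's cv2.cvtColor with cv2.COLOR_BGR2RGB would be another way to flip the array.

```python
import numpy as np

# Hypothetical palette written in R, G, B order (illustrative values only).
rgb_palette = {(107, 142, 35): 5, (220, 20, 60): 6}

# Option 1: reverse each key so it matches an array loaded in B, G, R order.
bgr_palette = {color[::-1]: class_id for color, class_id in rgb_palette.items()}
print(bgr_palette)  # {(35, 142, 107): 5, (60, 20, 220): 6}

# Option 2: flip the channel axis of the loaded array instead of the palette.
bgr_pixels = np.array([[[35, 142, 107]]], dtype=np.uint8)  # stands in for a loaded mask
rgb_pixels = bgr_pixels[..., ::-1]
print(rgb_pixels[0, 0].tolist())  # [107, 142, 35]
```

Whichever option is used, the palette and the array only need to agree with each other; the conversion functions themselves do not care which order that is.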
Step 3: Convert RGB to Class Index Mask
    class_mask = rgb_to_index_mask(rgb_mask, rgb_to_class)
    cv2.imwrite('results/class_index_mask.png', class_mask)
- Convert the RGB mask to class indices.
- Save the resulting grayscale (index) mask to a file.
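Because the class IDs here only range from 0 to 19, the saved index mask looks almost uniformly black in an image viewer. For a quick visual check, the IDs can be stretched across the 0 to 255 range. This is a sketch for inspection only, with a hypothetical toy mask; the stretched copy should not be used as a training label.

```python
import numpy as np

# Hypothetical index mask containing a few of the 20 class IDs.
class_mask = np.array([[0, 1], [10, 19]], dtype=np.uint8)
num_classes = 20  # matches the 20 entries in the mapping above

# Integer scale factor so the highest class ID stays below 256.
scale = 255 // (num_classes - 1)  # 13

# Widen to uint16 before multiplying to rule out overflow, then narrow again.
preview = (class_mask.astype(np.uint16) * scale).astype(np.uint8)
print(preview.tolist())  # [[0, 13], [130, 247]]
```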

Step 4: Convert Class Index Mask Back to RGB
    class_to_rgb = {v: k for k, v in rgb_to_class.items()}
    rgb_converted = index_to_rgb_mask(class_mask, class_to_rgb)
    cv2.imwrite('results/rgb_mask_back.png', rgb_converted)
- Reverse the original dictionary to map class indices back to RGB.
- Convert the index mask back to RGB.
- Save the result as a color image. It should match the original exactly, provided every color in the original mask has an entry in the dictionary.
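One way to confirm that the round trip is lossless is to compare the arrays directly and, if they differ, list the colors that have no entry in the mapping. The following is a minimal sketch: find_unmapped_colors is a hypothetical helper that is not part of the original script, the two conversion functions are compact restatements of the ones above, and the toy data is made up.

```python
import numpy as np

def rgb_to_index_mask(rgb_mask, rgb_to_class):
    class_mask = np.zeros(rgb_mask.shape[:2], dtype=np.uint8)
    for rgb, class_id in rgb_to_class.items():
        class_mask[np.all(rgb_mask == rgb, axis=-1)] = class_id
    return class_mask

def index_to_rgb_mask(class_mask, class_to_rgb):
    rgb_mask = np.zeros(class_mask.shape + (3,), dtype=np.uint8)
    for class_id, rgb in class_to_rgb.items():
        rgb_mask[class_mask == class_id] = rgb
    return rgb_mask

def find_unmapped_colors(mask, color_to_class):
    """Hypothetical helper: colors present in `mask` but missing from the mapping."""
    pixels = mask.reshape(-1, mask.shape[-1])
    present = {tuple(int(v) for v in row) for row in np.unique(pixels, axis=0)}
    return sorted(present - set(color_to_class))

# Hypothetical mapping and toy mask; (1, 2, 3) is a stray color with no class.
toy_mapping = {(0, 0, 0): 0, (128, 64, 128): 1}
original = np.array(
    [[[0, 0, 0], [128, 64, 128]],
     [[1, 2, 3], [128, 64, 128]]],
    dtype=np.uint8,
)

index_mask = rgb_to_index_mask(original, toy_mapping)
restored = index_to_rgb_mask(index_mask, {v: k for k, v in toy_mapping.items()})

print(np.array_equal(original, restored))           # False: the stray pixel became class 0
print(find_unmapped_colors(original, toy_mapping))  # [(1, 2, 3)]
```

Stray colors like this often come from anti-aliased edges or lossy formats such as JPEG, so running a check of this kind before training can catch masks that would otherwise be silently mislabeled as class 0.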

Output Summary
- class_index_mask.png — Grayscale image with class indices per pixel.
- rgb_mask_back.png — Reconstructed RGB mask from class indices.
Conclusion
In this post, we explored how to convert multi-class segmentation masks between RGB format and class index format — a crucial step in the preprocessing and postprocessing pipelines of semantic segmentation tasks.
While RGB masks are human-readable and great for visualization, they are not suitable for model training or evaluation. Most deep learning frameworks require class index masks, where each pixel’s value directly corresponds to a class label.
By understanding and implementing these conversions using simple NumPy and OpenCV functions, you can:
- Prepare your dataset in the right format for model training.
- Efficiently store and process label masks.
- Visualize predictions meaningfully after inference.
This conversion bridges the gap between machine-compatible formats and human-friendly visualizations — ensuring your workflow is both efficient and interpretable.
Mastering this conversion technique will greatly simplify your deep learning journey, whether you’re building your own segmentation model or working with custom datasets.