computer vision Archives - Idiot Developer

Multiclass Segmentation in PyTorch using U-Net

Posted on 2nd July 20252nd July 2025 by Nikhil Tomar

Semantic segmentation is a crucial task in computer vision that involves labeling each pixel in an image with its corresponding class. In this blog post, we’ll dive into building a multiclass semantic segmentation pipeline using the U-Net architecture with PyTorch. Our goal is to segment different types of weeds from Continue Reading

Converting RGB Mask to Class Index Masks in Python

Posted on 29th June 202529th June 2025 by Nikhil Tomar

In the world of semantic segmentation, each pixel in an image carries a meaning — a class label that represents an object or region. These labels can be stored in various formats, and one common way is using a multi-class RGB mask, where each class is represented by a unique Continue Reading

YOLO: From Real-Time to State-of-the-Art Object Detection

Posted on 14th June 202518th June 2025 by Nikhil Tomar

The You Only Look Once (YOLO) series has revolutionized object detection since its inception in 2015. Developed initially by Joseph Redmon and colleagues, YOLO redefined speed and efficiency in computer vision by transforming detection into a single regression problem. Unlike earlier two-stage detectors (e.g., R-CNN), which required multiple passes over Continue Reading

GradCAM and its Implementation in PyTorch

Posted on 1st March 20251st March 2025 by Nikhil Tomar

Deep learning models, especially convolutional neural networks (CNNs), often function as black boxes, making it difficult to interpret their decision-making processes. Gradient-weighted Class Activation Mapping (GradCAM) is a powerful technique used to visualize and understand these models by highlighting the regions of an image that contribute most to a prediction. Continue Reading

GradCAM with TensorFlow: Interpreting Neural Networks with Class Activation Maps

Posted on 26th February 202520th June 2025 by Nikhil Tomar

Deep learning models, particularly convolutional neural networks (CNNs), are widely used for image classification, object detection, and various computer vision tasks. However, these models are often referred to as “black boxes” due to their complex decision-making processes. To interpret these decisions and understand what parts of an image influence the Continue Reading

[Paper Summary] EMCAD: Efficient Multi-scale Convolutional Attention Decoding for Medical Image Segmentation

Posted on 14th September 202414th September 2024 by Nikhil Tomar

This post will analyze the research paper “EMCAD: Efficient Multi-scale Convolutional Attention Decoding for Medical Image Segmentation.” We will discuss the problems with existing medical image segmentation methods and how the given method (EMCAD) solves these issues. What is EMCAD? EMCAD is a newly developed efficient multi-scale convolutional attention decoder Continue Reading

What is Image Captioning?

Posted on 29th August 2024 by Nikhil Tomar

In recent years, the field of artificial intelligence (AI) has seen remarkable advancements, particularly in how machines can understand and describe visual content. One of the fascinating developments in this area is image captioning, where AI models are trained to generate descriptive captions for images. This technology, often referred to Continue Reading

Read Video Files Using OpenCV Python

Posted on 25th August 202425th August 2024 by Nikhil Tomar

Reading and processing video files is a common task in computer vision, and OpenCV makes it easy to work with video data. In this article, we’ll go through the process of reading and displaying video files using OpenCV. Whether you’re working on a video analysis project or want to learn Continue Reading

Image Masking with OpenCV AddWeighted

Posted on 13th August 202415th August 2024 by Nikhil Tomar

Image masking is a powerful technique used in image processing to manipulate specific parts of an image while leaving other areas untouched. This is particularly useful in applications like object detection, image segmentation, and photo editing. In this tutorial, we’ll explore how to perform image masking using OpenCV addWeighted function. Continue Reading

ResUNet++ Implementation in TensorFlow

Posted on 20th April 202420th April 2024 by Nikhil Tomar

In this article, we will study the ResUNet++ architecture and implement it using the TensorFlow framework. ResUNet++ is a medical image segmentation architecture built upon the ResUNet architecture. It takes advantage of Residual Networks, Squeeze and Excitation blocks, Atrous Spatial Pyramidal Pooling (ASPP), and attention blocks. What is ResUNet++? Debesh Continue Reading