This article defines semantic segmentation, explains how it works, and compares it to other image segmentation techniques.
If you’ve ever used a filter on Instagram or TikTok, you’ve employed semantic segmentation from the palm of your hand. But this computer vision technique goes far beyond digital makeup and mustaches. You’ll find it hard at work in hospitals, farms, and even Teslas. In the following article, you’ll learn more about how semantic segmentation works, its importance, and how to do it yourself.
Semantic segmentation identifies, classifies, and labels each pixel within a digital image. Pixels are labeled according to the semantic features they have in common, such as color or placement. Semantic segmentation helps computer systems distinguish between objects in an image and understand their relationships. It’s one of three subcategories of image segmentation, alongside instance segmentation and panoptic segmentation.
Instance segmentation expands upon semantic segmentation by assigning class labels and differentiating between individual objects within those classes.
Example:
| Semantic segmentation | Instance segmentation |
|---|---|
| Dogs | Yellow dog, brown dog |
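To make the distinction concrete, here is a minimal Python sketch of how the two outputs might be represented as per-pixel label maps (the tiny 4x4 arrays and label values are invented purely for illustration):

```python
import numpy as np

# Semantic segmentation: every "dog" pixel shares one class label.
semantic_map = np.array([
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
])  # 0 = background, 1 = dog

# Instance segmentation: the same dog pixels, but each dog
# gets its own instance ID so the two animals stay distinct.
instance_map = np.array([
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [0, 0, 0, 2],
    [2, 2, 0, 0],
])  # 0 = background, 1 = first dog, 2 = second dog
```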
Panoptic segmentation is a hybrid technique combining semantic and instance segmentation for a unified, interpreted view; hence, the prefix pan, meaning “all.” The panoptic segmentation process places objects into the following two categories:
Things. In the context of computer vision, “things” are quantifiable objects with defined shapes, for example, vehicles, people, animals, and trees.
Stuff. “Stuff” describes objects lacking defined shapes that computer vision can identify by material or texture. Examples include bodies of water, mountain ranges, and the sky.
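One way to picture panoptic output is as a per-pixel pair of class label and instance ID, where “stuff” classes carry no instance ID. A hypothetical sketch (this encoding is illustrative, not a standard format):

```python
import numpy as np

# Hypothetical panoptic encoding: each pixel stores (class_id, instance_id).
# "Stuff" classes (e.g., sky) use instance_id 0 because they aren't counted as objects.
class_map = np.array([[2, 2], [1, 1]])     # 2 = sky ("stuff"), 1 = dog ("thing")
instance_map = np.array([[0, 0], [1, 1]])  # sky has no instance; the dog is instance 1

panoptic = np.stack([class_map, instance_map], axis=-1)  # shape (H, W, 2)
```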
Image classification is typically a form of supervised machine learning: models are trained to recognize objects in images using labeled example photos. This process initially depended on raw pixel data alone, which is highly sensitive to variations in camera focus, lighting, and angle. Introducing convolutional neural networks (CNNs) made it possible for models to extract individual features and infer which objects they represent.
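As a rough illustration of this approach, here is a minimal PyTorch sketch of a CNN classifier in which convolutional layers extract features before a final layer assigns class scores (the layer sizes and 10-class output are arbitrary assumptions):

```python
import torch
import torch.nn as nn

class TinyClassifier(nn.Module):
    """Minimal CNN: convolutional layers extract features, a linear layer classifies."""
    def __init__(self, num_classes=10):  # 10 classes is an arbitrary choice
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),  # learn local edge/texture features
            nn.ReLU(),
            nn.MaxPool2d(2),                             # downsample for spatial invariance
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),                     # pool to one value per feature map
        )
        self.classifier = nn.Linear(32, num_classes)

    def forward(self, x):
        x = self.features(x)                   # (N, 32, 1, 1)
        return self.classifier(x.flatten(1))   # (N, num_classes) class scores

# One forward pass on a dummy 64x64 RGB image:
logits = TinyClassifier()(torch.randn(1, 3, 64, 64))
```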
Semantic segmentation models take this approach a step further. After an input image passes through the neural network, the model produces a color-coded map in which each color represents a different class label. These defined spatial features help computers identify boundaries between objects and distinguish foreground subjects from the background. The process typically involves three steps:
1. Classification. Pixels in an image are assigned a class label representing particular objects.
2. Localization. Objects are outlined with a bounding box. A bounding box is a line drawn around the perimeter of an object.
3. Segmentation. In the localized image, pixels are grouped using a segmentation mask. A segmentation mask reduces noise by separating one portion of an image from the rest. One way to visualize segmentation masking is to imagine sliding a piece of black construction paper with a hole cut out over an image to isolate specific portions.
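To make the masking step concrete, here is a short NumPy sketch that isolates one class from a predicted label map and builds the color-coded map described above (the label values and palette colors are arbitrary assumptions):

```python
import numpy as np

# Hypothetical predicted label map: 0 = background, 1 = person, 2 = car.
labels = np.array([
    [0, 0, 1, 1],
    [0, 1, 1, 0],
    [2, 2, 0, 0],
    [2, 2, 0, 0],
])

# Segmentation mask: True only where the "person" class was predicted.
person_mask = labels == 1

# Apply the mask to an image: pixels outside the mask are zeroed out,
# like sliding black construction paper with a cutout over the photo.
image = np.random.rand(4, 4, 3)              # stand-in for a real RGB image
isolated = image * person_mask[..., None]    # keep only "person" pixels

# Color-coded map: paint each class with its own color.
palette = np.array([[0, 0, 0], [255, 0, 0], [0, 0, 255]])  # background, person, car
color_map = palette[labels]                  # shape (4, 4, 3)
```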
Photography and social media filters. Many camera effects and filters in social media applications like Instagram and TikTok rely on semantic segmentation. For example, a filter identifies the placement of the eyes before applying virtual sunglasses. Semantic segmentation also lets cameras separate a subject from the background for portrait-style effects.
Medical imaging analyses. AI segmentation models trained on medical imagery can perform automated analysis to measure and detect anomalies on a pixel level. By highlighting and mapping anatomical features, segmentation enhances visualization for more precise identification of tumors and other irregularities.
Agriculture. Farmers employ AI and semantic segmentation to automate maintenance and manage the health of their crops. Computer vision technology helps farmers quickly detect at-risk portions of their fields to eradicate pests or contain infections.
Self-driving cars. Autonomous vehicles rely heavily on semantic segmentation to identify obstacles, analyze road conditions, and map surroundings.
Many different tools and models exist that you can use to perform semantic segmentation. If you’d like step-by-step guidance throughout your project, consider the Semantic Segmentation with Amazon SageMaker Guided Project on Coursera. You’ll visualize and prepare data for model training via a split-screen web browser environment. To complete this advanced-level project, you’ll need experience with Python programming, deep learning concepts, and AWS. Consider the resources in the following sections if you want to start a semantic segmentation project independently.
Data sets for semantic segmentation are typically huge and complex. The more diverse the labels in a data set, the better the model can learn. Here are a few commonly used segmentation data sets:
Microsoft Common Objects in Context (MS COCO). MS COCO is a large-scale data set used for captioning, key-point detection, object detection, and segmentation. It includes over 320,000 images with a wide variety of annotations that have been refined through community feedback.
Cityscapes Dataset. The central focus of this data set is the semantic understanding of city and street scenes. It includes 30 different classes, 25,000 annotated images, dense semantic segmentation, and instance segmentation for people and vehicles.
ScanNet. ScanNet is an RGB-D video data set with 2D and 3D data. It comprises 2.5 million indoor views in 1,513 scenes with semantic segmentation annotations and surface reconstructions.
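If you work in PyTorch, the torchvision library includes a loader for the Cityscapes data set. A minimal sketch, assuming you have already registered and downloaded the files to a local ./cityscapes folder:

```python
from torchvision import datasets, transforms

# Assumes the Cityscapes files have been downloaded to ./cityscapes
# (the data set requires free registration at cityscapes-dataset.com).
dataset = datasets.Cityscapes(
    root="./cityscapes",
    split="train",
    mode="fine",              # use the finely annotated subset
    target_type="semantic",   # return per-pixel class labels
    transform=transforms.ToTensor(),
)

image, target = dataset[0]    # an image tensor and its semantic label mask
```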
Semantic segmentation models classify objects in images at the pixel level. The list below includes a few popular segmentation models:
Pyramid Scene Parsing Network (PSPNet). PSPNet uses a pyramid parsing module to discern multi-level features for a more comprehensive context of an image. It’s capable of processing global and local information.
Fully Convolutional Network (FCN). FCNs replace the dense (fully connected) layers of traditional CNNs with convolutional layers, reducing the parameter count and shortening the training process.
SegNet. SegNet is a semantic segmentation model comprising an encoder network, a decoder network, and a classification layer.
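To experiment without training from scratch, torchvision also ships pretrained segmentation models, including an FCN with a ResNet-50 backbone. A minimal inference sketch (in newer torchvision versions the pretrained flag is replaced by a weights argument):

```python
import torch
from torchvision.models.segmentation import fcn_resnet50

# Load an FCN with a ResNet-50 backbone; weights download on first use.
model = fcn_resnet50(pretrained=True).eval()

# Dummy batch standing in for a normalized RGB image.
batch = torch.randn(1, 3, 224, 224)

with torch.no_grad():
    output = model(batch)["out"]      # (1, num_classes, 224, 224) per-pixel scores

prediction = output.argmax(dim=1)     # (1, 224, 224): one class label per pixel
```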
If you’re new to the field of computer vision, consider enrolling in an online course like the Image Processing for Engineering and Science Specialization from MathWorks. You’ll gain a foundational understanding of image processing and analysis techniques.
DeepLearning.AI offers an intermediate-level course, Advanced Computer Vision with TensorFlow, to build upon your existing knowledge of image segmentation using TensorFlow.
If you’re ready to dive straight into a semantic segmentation project, the Semantic Segmentation with Amazon SageMaker Guided Project walks you through the entire process.