Computer Vision | Browser-based | No API required | Pixel-level prediction | Educational demo

Image Segmentation Studio

Explore how machine learning models divide images into meaningful regions by predicting labels at the pixel level.

This tool demonstrates image segmentation. Instead of only saying what is in an image, segmentation shows where different regions are by creating masks.

What is Image Segmentation?

Image segmentation is a computer vision task where a model assigns a label to pixels or regions in an image. Instead of giving one label to the whole image, segmentation produces a mask that shows which pixels belong to a person, background, object, road, sky, animal, or another category.

If an image contains a person standing in front of a background, image segmentation can separate person pixels from background pixels. In a road scene, segmentation can label the road, cars, pedestrians, sky, buildings, and trees.

How does Image Segmentation work?

1. The input image is resized and converted into pixel values.
2. A neural network processes the image.
3. Instead of predicting one class for the whole image, the model predicts a class for each pixel or region.
4. The output is a segmentation mask.
5. The mask can be drawn over the original image using colors or transparency.
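
The per-pixel prediction step can be sketched as an argmax over class scores. This is a minimal illustration, not the demo's actual code: the function name `argmaxMask` and the flat score layout (`scores[p * numClasses + c]`) are assumptions made for the example.

```typescript
// Turn per-pixel class scores into a segmentation mask of class IDs.
// Assumed layout: scores[p * numClasses + c] is the score of class c at pixel p.
function argmaxMask(
  scores: Float32Array,
  numPixels: number,
  numClasses: number
): Uint8Array {
  if (scores.length !== numPixels * numClasses) {
    throw new Error("scores length must equal numPixels * numClasses");
  }
  const mask = new Uint8Array(numPixels);
  for (let p = 0; p < numPixels; p++) {
    let bestClass = 0;
    let bestScore = -Infinity;
    for (let c = 0; c < numClasses; c++) {
      const score = scores[p * numClasses + c];
      if (score > bestScore) {
        bestScore = score;
        bestClass = c;
      }
    }
    mask[p] = bestClass; // one class ID per pixel
  }
  return mask;
}
```

For two pixels and three classes with scores `[0.1, 0.7, 0.2, 0.6, 0.3, 0.1]`, the first pixel is assigned class 1 and the second class 0.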

What is a segmentation mask?

A segmentation mask is an image-like output where each pixel represents a predicted category. For example, person pixels may be white and background pixels may be black. In multi-class segmentation, each class can be shown in a different color.
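
One way to picture this is to map each class ID in the mask to a color. The sketch below assumes the mask is a flat `Uint8Array` of class IDs; the `maskToRgba` name, the palette, and the magenta fallback are hypothetical choices for illustration.

```typescript
type Rgb = [number, number, number];

// Hypothetical fallback color for class IDs that the palette does not cover.
const FALLBACK_COLOR: Rgb = [255, 0, 255];

// Convert a mask of class IDs into RGBA pixel data (4 bytes per pixel).
// palette[classId] gives the color used for that class.
function maskToRgba(mask: Uint8Array, palette: Rgb[]): Uint8ClampedArray {
  const rgba = new Uint8ClampedArray(mask.length * 4);
  for (let i = 0; i < mask.length; i++) {
    const [r, g, b] = palette[mask[i]] ?? FALLBACK_COLOR;
    rgba[i * 4] = r;
    rgba[i * 4 + 1] = g;
    rgba[i * 4 + 2] = b;
    rgba[i * 4 + 3] = 255; // fully opaque
  }
  return rgba;
}

// Binary example from the text: background (0) is black, person (1) is white.
const binaryPalette: Rgb[] = [
  [0, 0, 0],
  [255, 255, 255],
];
const coloredMask = maskToRgba(new Uint8Array([0, 1]), binaryPalette);
```

In a browser, an array like `coloredMask` could be wrapped in an `ImageData` object and drawn onto a canvas.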

Visible notes

- Image segmentation predicts regions at the pixel level. It is more detailed than object detection because it estimates object shape rather than only drawing a bounding box.
- This browser demo is educational. Results may be inaccurate for images that differ from the model's training data.
- Images are processed locally in the browser and are not uploaded to a server.

Types of Image Segmentation

Semantic, instance, panoptic, and person/background

Type	Meaning	Example
Semantic Segmentation	Labels each pixel by class, but does not separate individual objects of the same class.	All cars are labeled as car.
Instance Segmentation	Labels each pixel and separates each individual object instance.	Car 1, Car 2, and Car 3 are separated.
Panoptic Segmentation	Combines semantic and instance segmentation: every pixel gets a class label, and countable objects also get an instance ID.	Road and sky are labeled while each person and car is separated.
Person/Background Segmentation	Separates a person from the background.	Used in video calls for background blur.
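
The gap between semantic and instance segmentation can be illustrated with a naive post-processing step: given a semantic mask, group touching pixels of one class into connected regions. This is only a sketch under stated assumptions. Real instance segmentation models predict instances directly, and this heuristic merges objects that touch each other. The function name `labelInstances` and the 4-connectivity rule are choices made for the example.

```typescript
// Split one class of a semantic mask into connected regions (4-connectivity).
// Returns a label map (0 = pixel is not of this class, 1..count = region ID).
function labelInstances(
  mask: Uint8Array,
  width: number,
  height: number,
  classId: number
): { labels: Int32Array; count: number } {
  if (mask.length !== width * height) {
    throw new Error("mask length must equal width * height");
  }
  const labels = new Int32Array(mask.length);
  const stack: number[] = [];
  let count = 0;
  for (let start = 0; start < mask.length; start++) {
    if (mask[start] !== classId || labels[start] !== 0) continue;
    count++; // found an unlabeled pixel: start a new region
    labels[start] = count;
    stack.push(start);
    while (stack.length > 0) {
      const p = stack.pop() as number;
      const x = p % width;
      const y = (p - x) / width;
      // Left, right, up, down neighbours; -1 marks "outside the image".
      const neighbours = [
        x > 0 ? p - 1 : -1,
        x < width - 1 ? p + 1 : -1,
        y > 0 ? p - width : -1,
        y < height - 1 ? p + width : -1,
      ];
      for (const n of neighbours) {
        if (n >= 0 && mask[n] === classId && labels[n] === 0) {
          labels[n] = count;
          stack.push(n);
        }
      }
    }
  }
  return { labels, count };
}
```

For a 5x1 mask `[1, 1, 0, 1, 1]` with class 1 meaning "car", the semantic mask says only "car" four times, while this labeling separates the pixels into two regions.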

Choose an image: PNG, JPG, JPEG, or WEBP. Max 12 MB.

Current demo model

This page uses a browser-based MediaPipe Image Segmenter with a semantic segmentation model. It predicts pixel classes such as background, person, car, dog, road-scene objects, and other categories from its training data.
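
Once a semantic model has produced a mask, a page like this can report which classes appear in it. The sketch below assumes the mask is available as a flat `Uint8Array` of class IDs and that a list of label names indexed by class ID is known. The `summarizeMask` name and the label list are hypothetical and do not reflect the demo model's real label set.

```typescript
interface ClassSummary {
  classId: number;
  label: string;
  pixelCount: number;
  fraction: number; // share of all pixels in the image
}

// Count how many pixels each class covers and list the classes that appear,
// largest region first. labels[classId] is an assumed name lookup.
function summarizeMask(mask: Uint8Array, labels: string[]): ClassSummary[] {
  const counts = new Array<number>(256).fill(0); // Uint8 IDs are 0..255
  for (let i = 0; i < mask.length; i++) {
    counts[mask[i]] += 1;
  }
  const summary: ClassSummary[] = [];
  for (let classId = 0; classId < counts.length; classId++) {
    const pixelCount = counts[classId];
    if (pixelCount === 0) continue; // class not present in this image
    summary.push({
      classId,
      label: labels[classId] ?? `class ${classId}`,
      pixelCount,
      fraction: pixelCount / mask.length,
    });
  }
  return summary.sort((a, b) => b.pixelCount - a.pixelCount);
}
```

Filtering out the background entry from the result gives the foreground classes detected in the image.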

Comparison

Image Segmentation vs Other Computer Vision Tasks

Task	Main Question	Output	Example
Image Classification	What is this image?	One label + confidence	Dog
Object Detection	Where are the objects?	Bounding boxes + labels	Box around dog
Image Segmentation	Which pixels belong to each region?	Pixel-level mask	Exact dog outline
Object Identification	Which specific object/person is this?	Identity/name	This is Person A
Object Verification	Does this match the reference?	Yes/No or similarity score	Does this face match the ID photo?
Pose Estimation	Where are the body joints?	Keypoints/skeleton	Elbows, knees, shoulders

Segmentation is more detailed than detection because it does not just draw a rectangle. It estimates the actual shape of the object or region.
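
The difference can be made concrete by deriving a bounding box from a mask and comparing pixel counts. This is a minimal sketch: the `boxFromMask` name and the flat, row-major mask layout are assumptions for the example.

```typescript
interface Box {
  xMin: number;
  yMin: number;
  xMax: number;
  yMax: number;
}

// Find the tightest box around all pixels of one class, and compare how many
// pixels the mask covers with how many pixels the box covers.
function boxFromMask(
  mask: Uint8Array,
  width: number,
  height: number,
  classId: number
): { box: Box | null; maskPixels: number; boxPixels: number } {
  let xMin = width;
  let yMin = height;
  let xMax = -1;
  let yMax = -1;
  let maskPixels = 0;
  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) {
      if (mask[y * width + x] !== classId) continue;
      maskPixels++;
      if (x < xMin) xMin = x;
      if (x > xMax) xMax = x;
      if (y < yMin) yMin = y;
      if (y > yMax) yMax = y;
    }
  }
  if (maskPixels === 0) {
    return { box: null, maskPixels: 0, boxPixels: 0 };
  }
  const boxPixels = (xMax - xMin + 1) * (yMax - yMin + 1);
  return { box: { xMin, yMin, xMax, yMax }, maskPixels, boxPixels };
}
```

For a thin diagonal object in a 3x3 image, the mask covers 3 pixels while the bounding box covers all 9, so the box includes 6 pixels that do not belong to the object.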

Try This in Class

Student tasks

Upload a simple image with one clear object.
Upload an image with a person and background.
Upload a crowded image with multiple objects.
Upload a dark or blurry image.
Compare original image, mask-only view, and overlay view.
Adjust mask opacity and observe how the interpretation changes.
Discuss why pixel-level prediction is harder than classification.
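
For the opacity task, it may help to see how an overlay view can be computed. The sketch below blends a colored mask over the original image with `output = (1 - opacity) * original + opacity * color` on non-background pixels. The `blendOverlay` name, the RGBA layout, and the choice to leave background pixels untouched are assumptions for illustration, not a description of this page's code.

```typescript
// Blend a colored mask over the original image.
// original and overlay hold RGBA bytes (4 per pixel); mask holds one class ID
// per pixel. Pixels whose class equals backgroundId are copied unchanged.
function blendOverlay(
  original: Uint8ClampedArray,
  overlay: Uint8ClampedArray,
  mask: Uint8Array,
  opacity: number,
  backgroundId: number = 0
): Uint8ClampedArray {
  if (original.length !== mask.length * 4 || overlay.length !== original.length) {
    throw new Error("original and overlay must hold 4 bytes per mask pixel");
  }
  const alpha = Math.min(1, Math.max(0, opacity)); // keep opacity in [0, 1]
  const out = new Uint8ClampedArray(original.length);
  for (let p = 0; p < mask.length; p++) {
    const isForeground = mask[p] !== backgroundId;
    for (let ch = 0; ch < 3; ch++) {
      const i = p * 4 + ch;
      out[i] = isForeground
        ? (1 - alpha) * original[i] + alpha * overlay[i]
        : original[i];
    }
    out[p * 4 + 3] = original[p * 4 + 3]; // keep the original alpha channel
  }
  return out;
}
```

At opacity 0 the result equals the original image, and at opacity 1 the foreground is fully replaced by the mask color. Intermediate values trade off how visible the underlying image and the predicted regions are.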

Discussion questions

Why is segmentation more detailed than object detection?
What is the difference between a bounding box and a mask?
Why might segmentation fail on unusual images?
How could segmentation be used in medicine?
How could segmentation be used in self-driving cars?
What privacy issues arise when using webcam-based segmentation?
Why should browser-based processing be preferred for sensitive images?

Medical imaging

Tumor segmentation, organ segmentation, and cell segmentation.

Autonomous driving

Road, lane, pedestrian, car, and sign segmentation.

Video conferencing

Background blur and background replacement.

Agriculture

Plant disease region segmentation and crop/weed separation.

Satellite imaging

Land cover, water, forest, and urban area segmentation.

Digital forensics

Separating foreground/background regions or identifying manipulated regions.

Creative tools

Background removal and image editing.

Technical Notes

Library: @mediapipe/tasks-vision
Task: Image segmentation
Input: Image, video frame, or canvas
Output: Segmentation mask
Execution: Browser/client-side
Privacy: Images remain in the browser
Limitation: The model only recognizes categories it was trained to segment
