Image Segmentation Studio
Explore how machine learning models divide images into meaningful regions by predicting labels at the pixel level.
What is Image Segmentation?
Image segmentation is a computer vision task where a model assigns a label to pixels or regions in an image. Instead of giving one label to the whole image, segmentation produces a mask that shows which pixels belong to a person, background, object, road, sky, animal, or another category.
If an image contains a person standing in front of a background, image segmentation can separate person pixels from background pixels. In a road scene, segmentation can label the road, cars, pedestrians, sky, buildings, and trees.
How does Image Segmentation work?
- The input image is resized to the size the model expects and converted into an array of numeric pixel values.
- A neural network processes the image.
- Instead of predicting one class for the whole image, the model predicts a class for each pixel or region.
- The output is a segmentation mask.
- The mask can be drawn over the original image using colors or transparency.
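The per-pixel prediction step above can be sketched as an argmax over class scores. This is a minimal illustration, not the demo's actual code: the function name `argmaxMask`, the pixel-major score layout, and the toy class ids are all assumptions made for the example.

```typescript
// Hypothetical helper: turn per-pixel class scores into a category mask.
// Assumed layout: scores[p * numClasses + c] is the score of class c at pixel p.
function argmaxMask(
  scores: Float32Array,
  numPixels: number,
  numClasses: number
): Uint8Array {
  if (scores.length !== numPixels * numClasses) {
    throw new Error("scores length must equal numPixels * numClasses");
  }
  const mask = new Uint8Array(numPixels);
  for (let p = 0; p < numPixels; p++) {
    let best = 0;
    let bestScore = scores[p * numClasses];
    for (let c = 1; c < numClasses; c++) {
      const s = scores[p * numClasses + c];
      if (s > bestScore) {
        bestScore = s;
        best = c;
      }
    }
    mask[p] = best; // one class id per pixel
  }
  return mask;
}

// Toy 2x2 image with 3 assumed classes (0 = background, 1 = person, 2 = car).
const toyScores = new Float32Array([
  0.8, 0.1, 0.1, // pixel 0
  0.2, 0.7, 0.1, // pixel 1
  0.1, 0.6, 0.3, // pixel 2
  0.3, 0.3, 0.4, // pixel 3
]);
const toyMask = argmaxMask(toyScores, 4, 3);
console.log(Array.from(toyMask).join(",")); // 0,1,1,2
```

A classifier would run this argmax once for the whole image; a segmentation model effectively runs it once per pixel, which is why the output is a mask rather than a single label.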
What is a segmentation mask?
A segmentation mask is an image-like output where each pixel represents a predicted category. For example, person pixels may be white and background pixels may be black. In multi-class segmentation, each class can be shown in a different color.
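To make the color mapping concrete, here is a small sketch that converts a mask of class ids into RGBA pixel data. The palette, the class ids, and the function name `maskToRgba` are hypothetical and chosen only to mirror the white-person, black-background example above.

```typescript
type RGB = [number, number, number];

// Hypothetical palette: the array index is the class id.
const palette: RGB[] = [
  [0, 0, 0],       // 0: background -> black
  [255, 255, 255], // 1: person -> white
  [255, 0, 0],     // 2: some other class -> red
];

// Convert a mask of class ids into RGBA bytes (4 bytes per pixel).
function maskToRgba(mask: Uint8Array, colors: RGB[]): Uint8ClampedArray {
  const rgba = new Uint8ClampedArray(mask.length * 4);
  for (let i = 0; i < mask.length; i++) {
    // Class ids without a palette entry fall back to the background color.
    const [r, g, b] = colors[mask[i]] ?? colors[0];
    rgba[i * 4] = r;
    rgba[i * 4 + 1] = g;
    rgba[i * 4 + 2] = b;
    rgba[i * 4 + 3] = 255; // fully opaque
  }
  return rgba;
}

const smallMask = new Uint8Array([0, 1, 1, 2]);
const pixels = maskToRgba(smallMask, palette);
console.log(Array.from(pixels.slice(4, 8)).join(",")); // 255,255,255,255
```

In a browser, an array like `pixels` could be wrapped in an `ImageData` object and drawn to a canvas with `putImageData`, given the mask's width and height.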
Visible notes
- Image segmentation predicts regions at the pixel level. It is more detailed than object detection because it estimates object shape rather than only drawing a bounding box.
- This browser demo is educational. Results may be inaccurate for images that differ from the model's training data.
- Images are processed locally in the browser and are not uploaded to a server.
Types of Image Segmentation
Semantic, instance, panoptic, and person/background
| Type | Meaning | Example |
|---|---|---|
| Semantic Segmentation | Labels each pixel by class, but does not separate individual objects of the same class. | All cars are labeled as car. |
| Instance Segmentation | Labels each pixel and separates each individual object instance. | Car 1, Car 2, and Car 3 are separated. |
| Panoptic Segmentation | Combines semantic and instance segmentation. | Road and sky are labeled while each person and car is separated. |
| Person/Background Segmentation | Separates a person from the background. | Used in video calls for background blur. |
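The difference between semantic and instance output can be illustrated with a toy sketch. Note the substitution: real instance segmentation models predict instances directly, whereas the code below uses classical 4-connected component labeling on a semantic mask, which only approximates the idea and merges objects that touch. The function name `splitIntoRegions` and the toy mask are assumptions.

```typescript
// Label 4-connected regions of one class in a semantic mask.
// In the result, 0 means "not this class" and 1..count is a region id.
function splitIntoRegions(
  mask: Uint8Array,
  width: number,
  height: number,
  classId: number
): { regions: Int32Array; count: number } {
  if (mask.length !== width * height) {
    throw new Error("mask length must equal width * height");
  }
  const regions = new Int32Array(width * height);
  let count = 0;
  for (let start = 0; start < mask.length; start++) {
    if (mask[start] !== classId || regions[start] !== 0) continue;
    count++;
    regions[start] = count;
    const stack: number[] = [start];
    while (stack.length > 0) {
      const p = stack.pop() as number;
      const x = p % width;
      const y = (p - x) / width;
      const neighbors: number[] = [];
      if (x > 0) neighbors.push(p - 1);
      if (x < width - 1) neighbors.push(p + 1);
      if (y > 0) neighbors.push(p - width);
      if (y < height - 1) neighbors.push(p + width);
      for (const n of neighbors) {
        if (mask[n] === classId && regions[n] === 0) {
          regions[n] = count;
          stack.push(n);
        }
      }
    }
  }
  return { regions, count };
}

// 5x3 semantic mask where 1 = car: two cars separated by background.
const semantic = new Uint8Array([
  1, 1, 0, 1, 1,
  1, 1, 0, 1, 1,
  0, 0, 0, 0, 0,
]);
const { regions, count } = splitIntoRegions(semantic, 5, 3, 1);
console.log(count); // 2
console.log(Array.from(regions).join(",")); // 1,1,0,2,2,1,1,0,2,2,0,0,0,0,0
```

The semantic mask only says "car" for every car pixel; the region array adds the "Car 1" versus "Car 2" distinction that instance segmentation provides.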
Results and interpretation
Segmentation output
- Model status: Loading Image Segmenter in the browser...
- Input type: No input selected
- Segmentation mode: Not run yet
- Output type: Overlay
- Number of classes detected: 0
- Active class labels: No foreground classes detected yet
This page uses a browser-based MediaPipe Image Segmenter with a semantic segmentation model. It predicts pixel classes such as background, person, car, dog, road-scene objects, and other categories from its training data.
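Values such as the number of classes detected and the active class labels can be derived from a category mask. The sketch below is one plausible way to compute them, not the page's actual implementation. It assumes the mask is a `Uint8Array` of class ids, that id 0 is background and is excluded from the count, and that a label list is available; the names `summarizeMask`, `demoLabels`, and `demoMask` are hypothetical.

```typescript
// Summarize which foreground classes appear in a category mask.
// Assumption: class id 0 is background and does not count as "detected".
function summarizeMask(
  mask: Uint8Array,
  labels: string[]
): { classCount: number; activeLabels: string[] } {
  const seen = new Set<number>();
  for (let i = 0; i < mask.length; i++) {
    if (mask[i] !== 0) seen.add(mask[i]);
  }
  const ids = Array.from(seen).sort((a, b) => a - b);
  // Fall back to a generic name if the label list has no entry for an id.
  const activeLabels = ids.map((id) => labels[id] ?? `class ${id}`);
  return { classCount: ids.length, activeLabels };
}

const demoLabels = ["background", "person", "car"]; // illustrative label list
const demoMask = new Uint8Array([0, 0, 2, 2, 1, 0]);
const summary = summarizeMask(demoMask, demoLabels);
console.log(summary.classCount); // 2
console.log(summary.activeLabels.join(", ")); // person, car
```

With `@mediapipe/tasks-vision`, the mask array and label list would typically come from the segmenter's result and its label metadata rather than from hand-written arrays.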
Comparison
Image Segmentation vs Other Computer Vision Tasks
| Task | Main Question | Output | Example |
|---|---|---|---|
| Image Classification | What is this image? | One label + confidence | Dog |
| Object Detection | Where are the objects? | Bounding boxes + labels | Box around dog |
| Image Segmentation | Which pixels belong to each region? | Pixel-level mask | Exact dog outline |
| Object Identification | Which specific object/person is this? | Identity/name | This is Person A |
| Object Verification | Does this match the reference? | Yes/No or similarity score | Does this face match the ID photo? |
| Pose Estimation | Where are the body joints? | Keypoints/skeleton | Elbows, knees, shoulders |
Segmentation is more detailed than detection because it does not just draw a rectangle. It estimates the actual shape of the object or region.
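A short sketch can make the box-versus-mask difference measurable. It computes the tightest bounding box around one class in a mask and compares the box area with the number of mask pixels. The helper names and the toy 4x4 mask are assumptions made for illustration.

```typescript
interface Box {
  xMin: number;
  yMin: number;
  xMax: number;
  yMax: number;
}

// Tightest axis-aligned box around all pixels of classId, or null if none.
function boundingBox(
  mask: Uint8Array,
  width: number,
  height: number,
  classId: number
): Box | null {
  let xMin = width;
  let yMin = height;
  let xMax = -1;
  let yMax = -1;
  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) {
      if (mask[y * width + x] !== classId) continue;
      if (x < xMin) xMin = x;
      if (x > xMax) xMax = x;
      if (y < yMin) yMin = y;
      if (y > yMax) yMax = y;
    }
  }
  return xMax < 0 ? null : { xMin, yMin, xMax, yMax };
}

// Number of pixels labeled with classId.
function countPixels(mask: Uint8Array, classId: number): number {
  let n = 0;
  for (let i = 0; i < mask.length; i++) {
    if (mask[i] === classId) n++;
  }
  return n;
}

// 4x4 mask with a diagonal object (class 1).
const shape = new Uint8Array([
  0, 0, 0, 1,
  0, 0, 1, 1,
  0, 1, 1, 0,
  0, 0, 0, 0,
]);
const box = boundingBox(shape, 4, 4, 1);
const maskArea = countPixels(shape, 1);
const boxArea =
  box === null ? 0 : (box.xMax - box.xMin + 1) * (box.yMax - box.yMin + 1);
console.log(maskArea, boxArea); // 5 9
```

Here the box covers 9 pixels but only 5 belong to the object, so a detector's rectangle would include 4 pixels that the mask correctly excludes.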
Try This in Class
Student tasks
- Upload a simple image with one clear object.
- Upload an image with a person and background.
- Upload a crowded image with multiple objects.
- Upload a dark or blurry image.
- Compare original image, mask-only view, and overlay view.
- Adjust mask opacity and observe how the interpretation changes.
- Discuss why pixel-level prediction is harder than classification.
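For the opacity task above, the overlay can be thought of as alpha blending between the original pixel and a mask color. The following is a minimal sketch under stated assumptions: RGBA image data in a `Uint8ClampedArray`, a mask where 0 means background, and a hypothetical function name `blendOverlay`. The demo itself may blend differently.

```typescript
// Blend a single overlay color onto every non-background pixel.
// opacity = 0 leaves the image unchanged; opacity = 1 shows only the mask color.
function blendOverlay(
  image: Uint8ClampedArray, // RGBA bytes, 4 per pixel
  mask: Uint8Array,         // one class id per pixel, 0 = background (assumed)
  color: [number, number, number],
  opacity: number
): Uint8ClampedArray {
  if (image.length !== mask.length * 4) {
    throw new Error("image must have 4 bytes for each mask pixel");
  }
  const a = Math.min(1, Math.max(0, opacity)); // clamp to [0, 1]
  const out = new Uint8ClampedArray(image);    // copy; the input is not modified
  for (let i = 0; i < mask.length; i++) {
    if (mask[i] === 0) continue; // background pixels stay as they are
    for (let ch = 0; ch < 3; ch++) {
      out[i * 4 + ch] = (1 - a) * image[i * 4 + ch] + a * color[ch];
    }
  }
  return out;
}

// Two gray pixels; only the second is covered by the mask.
const grayImage = new Uint8ClampedArray([
  100, 100, 100, 255,
  100, 100, 100, 255,
]);
const overlayMask = new Uint8Array([0, 1]);
const blended = blendOverlay(grayImage, overlayMask, [200, 0, 50], 0.5);
console.log(Array.from(blended).join(",")); // 100,100,100,255,150,50,75,255
```

At low opacity the original image dominates and mask errors are easy to miss; at high opacity the mask shape is clear but the underlying image detail is hidden, which is why the setting changes how the result is interpreted.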
Discussion questions
- Why is segmentation more detailed than object detection?
- What is the difference between a bounding box and a mask?
- Why might segmentation fail on unusual images?
- How could segmentation be used in medicine?
- How could segmentation be used in self-driving cars?
- What privacy issues arise when using webcam-based segmentation?
- Why should browser-based processing be preferred for sensitive images?
Medical imaging
Tumor segmentation, organ segmentation, and cell segmentation.
Autonomous driving
Road, lane, pedestrian, car, and sign segmentation.
Video conferencing
Background blur and background replacement.
Agriculture
Plant disease region segmentation and crop/weed separation.
Satellite imaging
Land cover, water, forest, and urban area segmentation.
Digital forensics
Separating foreground/background regions or identifying manipulated regions.
Creative tools
Background removal and image editing.
Technical Notes
- Library: @mediapipe/tasks-vision
- Task: Image segmentation
- Input: Image, video frame, or canvas
- Output: Segmentation mask
- Execution: Browser/client-side
- Privacy: Images remain in the browser
- Limitation: The model only recognizes categories it was trained to segment