MuhammadLab
AI Tools · Browser-based · Webcam AI · Real-time landmarks · Gesture recognition · Student demo

Hand Pose Detection

Use your webcam to see how an AI model detects hand landmarks in real time.

Webcam preview

Click Start Camera to enable your webcam.

Camera controls

Start webcam and track hands

The demo does not start the webcam automatically.

Privacy note

  • Webcam frames are processed locally in the browser.
  • No images/video are uploaded to a server by this demo.
  • The camera does not start automatically; click “Start Camera”.
  • You can stop the camera at any time.
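The start/stop behaviour in the privacy note can be sketched as a small controller. This is a minimal sketch, not the demo's actual code: the `CameraController` class, its state names, and the `StreamLike`/`TrackLike` types are hypothetical. In a browser, the request function would typically be `() => navigator.mediaDevices.getUserMedia({ video: true })`, called only from a click handler; stopping every track of the returned stream releases the webcam.

```typescript
// Hypothetical structural types: a browser MediaStream satisfies StreamLike,
// so this sketch compiles and runs without the DOM typings.
interface TrackLike {
  stop(): void;
}
interface StreamLike {
  getTracks(): TrackLike[];
}

type CameraState = "not started" | "running" | "stopped" | "error";

class CameraController {
  state: CameraState = "not started";
  private stream: StreamLike | null = null;

  // Call only from an explicit user action (e.g. a "Start Camera" click),
  // so the webcam is never opened automatically on page load.
  async start(request: () => Promise<StreamLike>): Promise<void> {
    if (this.stream !== null) return; // already running
    try {
      this.stream = await request();
      this.state = "running";
    } catch {
      // Permission denied, no camera found, or another request failure.
      this.state = "error";
    }
  }

  // Stopping every track releases the webcam.
  stop(): void {
    if (this.stream === null) return; // nothing to stop
    this.stream.getTracks().forEach((track) => track.stop());
    this.stream = null;
    this.state = "stopped";
  }
}
```

The state values correspond roughly to the page's camera status text, such as “Camera not started”.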

Model & camera status

Shows whether the hand model is still loading or ready.

Detected hands

The number of hands found in the latest frame.

Performance

The number of frames processed so far.

Camera

Whether the camera is running and whether webcam permission has been requested.

Interpretation

What the model is doing now, described in plain language.

Left hand and right hand

Whether each hand is currently detected.

Visual controls

Landmarks & skeleton

Adjust what you see, and use the confidence threshold to hide uncertain points.

A higher threshold hides more low-confidence keypoints.

Gesture recognition (educational)

From landmarks to simple gestures

This demo focuses on landmark detection first. The gesture indicator below is a simple rule-based educational example that uses relative landmark positions and distances (not a trained gesture classifier).

Current gesture

Based on the latest detected landmarks.

How it works (rules)

  • Open hand: fingertips appear above the lower finger joints (with the hand held upright).
  • Fist: fingertips appear below the lower finger joints, curled toward the palm.
  • Pointing: the index finger is more extended than the other fingers.
  • Pinch: the thumb tip is close to the index fingertip.
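The four rules can be turned into a small function. This is a hedged sketch under several assumptions: landmarks follow the 21-point layout used by MediaPipe Hands (0 = wrist, 4 = thumb tip, 8/12/16/20 = fingertips, 6/10/14/18 = the middle joints of the four fingers), coordinates are image pixels with y growing downward, and the hand is held upright. The names `classifyGesture` and `pinchRatio` and the 0.25 value are illustrative choices, not settings taken from this demo.

```typescript
// Assumption: 21 landmarks in MediaPipe Hands order, pixel coordinates, y grows downward.
interface Keypoint {
  x: number;
  y: number;
}

type Gesture = "open hand" | "fist" | "pointing" | "pinch" | "unknown";

// Fingertip and middle-joint (PIP) indices in the assumed 21-point layout.
const TIP = { thumb: 4, index: 8, middle: 12, ring: 16, pinky: 20 };
const PIP = { index: 6, middle: 10, ring: 14, pinky: 18 };
const FINGERS = ["index", "middle", "ring", "pinky"] as const;

function distance(a: Keypoint, b: Keypoint): number {
  const dx = a.x - b.x;
  const dy = a.y - b.y;
  return Math.sqrt(dx * dx + dy * dy);
}

// A finger counts as extended when its tip is above (smaller y than) its middle joint.
function isExtended(kp: Keypoint[], finger: (typeof FINGERS)[number]): boolean {
  return kp[TIP[finger]].y < kp[PIP[finger]].y;
}

// Rule-based gesture guess; pinchRatio is a hypothetical tuning parameter.
function classifyGesture(kp: Keypoint[], pinchRatio = 0.25): Gesture {
  if (kp.length < 21) return "unknown";

  // Pinch: thumb tip close to index tip, measured relative to palm size
  // (wrist to middle-finger base) so the rule works near or far from the camera.
  const palmSize = distance(kp[0], kp[9]);
  if (palmSize > 0 && distance(kp[TIP.thumb], kp[TIP.index]) < pinchRatio * palmSize) {
    return "pinch";
  }

  const extended = FINGERS.map((f) => isExtended(kp, f));
  const count = extended.filter((e) => e).length;
  if (count === FINGERS.length) return "open hand"; // every fingertip above its joint
  if (count === 0) return "fist"; // every fingertip below its joint
  if (extended[0] && count === 1) return "pointing"; // only the index finger extended
  return "unknown";
}
```

Checking the pinch first is a design choice: a pinch can happen with the other fingers either open or curled, so it would otherwise be misread as an open hand or a fist.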

Step-by-step pipeline

How hand pose detection works

This pipeline highlights the flow from webcam input to landmark points and gesture cues.

  1. Webcam frame input: a webcam provides live video frames for real-time processing.
  2. Frame preprocessing: the browser prepares each frame for the model (resizing and reformatting as needed).
  3. Hand region detection: the model looks for hand regions in the frame.
  4. Landmark prediction: the model predicts key points on each hand, such as the wrist and finger joints.
  5. Confidence scoring: each landmark can include a confidence score (how sure the model is).
  6. Skeleton drawing: when points are confident enough, the demo draws a hand skeleton.
  7. Coordinates & table: coordinates update continuously so students can inspect the numbers.
  8. Gesture understanding: simple gestures can be estimated from relative landmark positions.
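Steps 2 to 6 can be summarized as one per-frame function. In this minimal sketch the model is hidden behind a hypothetical `Detector` function type, and the `Hand` shape (a handedness label plus keypoints with optional scores) is an assumption; real hand-pose libraries define their own output formats and usually return results asynchronously. The summary fields mirror what the status panels report: hand count, left/right detection, and how many keypoints pass the confidence threshold.

```typescript
// Hypothetical output shape; the demo's real model API may differ.
interface Keypoint {
  x: number;
  y: number;
  z?: number;
  score?: number; // confidence in [0, 1], if the model provides one
}
interface Hand {
  handedness: "Left" | "Right";
  keypoints: Keypoint[];
}

// Stand-in for steps 2-5: preprocess the frame, find hands, predict landmarks and scores.
type Detector<F> = (frame: F) => Hand[];

interface FrameSummary {
  handCount: number;
  leftDetected: boolean;
  rightDetected: boolean;
  drawablePoints: number; // keypoints confident enough to draw (step 6)
}

// A point with no score is treated as drawable; otherwise it must be above the threshold.
function isConfident(k: Keypoint, threshold: number): boolean {
  return k.score === undefined || k.score > threshold;
}

function processFrame<F>(detect: Detector<F>, frame: F, threshold: number): FrameSummary {
  const hands = detect(frame);
  const drawablePoints = hands.reduce(
    (total, h) => total + h.keypoints.filter((k) => isConfident(k, threshold)).length,
    0,
  );
  return {
    handCount: hands.length,
    leftDetected: hands.some((h) => h.handedness === "Left"),
    rightDetected: hands.some((h) => h.handedness === "Right"),
    drawablePoints,
  };
}
```

In a browser, a loop driven by `requestAnimationFrame` could call a function like this once per video frame (step 1), passing the `<video>` element as the frame and the slider value as `threshold`.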

Landmark coordinates

Coordinates will appear after detection

Start the camera and show your hand in good lighting.

Hand landmark map

Common landmark groups

Hand pose detection predicts key points on your hand. By connecting the landmarks within and between these groups, we can draw a skeleton that shows the hand's pose.

Wrist

The base landmark, where the hand meets the arm. It anchors the overall pose and motion.

Thumb joints

Thumb landmarks capture grip and pointing angles.

Index finger joints

Index landmarks describe pointing and fine finger movement.

Middle finger joints

Middle landmarks help describe overall hand posture.

Ring finger joints

Ring landmarks contribute to hand shape and curl.

Pinky finger joints

Pinky landmarks help complete the hand pose estimate.
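The landmark groups can be encoded as index lists, and the skeleton as pairs of landmark indices. This sketch assumes the 21-point MediaPipe Hands layout (wrist = 0, then four points per digit from base to tip); other models may number landmarks differently. `buildConnections` and `skeletonSegments` are hypothetical helper names. A bone is kept only when both of its endpoints pass the confidence threshold, which is one way to implement "draw when points are confident enough".

```typescript
interface Keypoint {
  x: number;
  y: number;
  score?: number;
}

// Assumed 21-point layout: wrist, then four landmarks per digit from base to tip.
const LANDMARK_GROUPS: Record<string, number[]> = {
  wrist: [0],
  thumb: [1, 2, 3, 4],
  index: [5, 6, 7, 8],
  middle: [9, 10, 11, 12],
  ring: [13, 14, 15, 16],
  pinky: [17, 18, 19, 20],
};

// Palm edges link the wrist and the finger bases.
const PALM_EDGES: [number, number][] = [
  [0, 1], [0, 5], [0, 17], [5, 9], [9, 13], [13, 17],
];

// Each digit becomes a chain of consecutive joints, e.g. 5-6, 6-7, 7-8 for the index finger.
function buildConnections(): [number, number][] {
  const edges: [number, number][] = PALM_EDGES.slice();
  for (const digit of ["thumb", "index", "middle", "ring", "pinky"]) {
    const joints = LANDMARK_GROUPS[digit];
    for (let i = 0; i + 1 < joints.length; i++) {
      edges.push([joints[i], joints[i + 1]]);
    }
  }
  return edges;
}

interface Segment {
  from: Keypoint;
  to: Keypoint;
}

// Keep only bones whose two endpoints exist and are confident enough to draw.
function skeletonSegments(kp: Keypoint[], threshold: number): Segment[] {
  const ok = (k: Keypoint | undefined) =>
    k !== undefined && (k.score === undefined || k.score > threshold);
  return buildConnections()
    .filter(([a, b]) => ok(kp[a]) && ok(kp[b]))
    .map(([a, b]) => ({ from: kp[a], to: kp[b] }));
}
```

Each returned segment could then be drawn on the canvas overlay with the 2D context's `moveTo` and `lineTo` methods.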

Real-time AI

Why it updates continuously

Your webcam provides many frames per second. The model runs inference repeatedly on those frames, and the canvas overlay redraws with the latest hand landmarks after each result. Faster devices can process more frames per second, so tracking looks smoother; lighting, distance from the camera, and occlusion affect how accurate each prediction is.
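A "frames processed" counter and a frames-per-second estimate can be computed from a timestamp taken once per inference. This is a minimal sketch; `FrameStats` and its one-second sliding window are assumptions, not the demo's code. In a browser the timestamp would typically come from `performance.now()`.

```typescript
// Counts processed frames and estimates frames per second over a sliding window.
class FrameStats {
  framesProcessed = 0;
  private readonly windowMs: number;
  private times: number[] = [];

  constructor(windowMs = 1000) {
    this.windowMs = windowMs;
  }

  // Call once per processed frame with a timestamp in milliseconds.
  tick(nowMs: number): void {
    this.framesProcessed++;
    this.times.push(nowMs);
    // Forget frames older than the window (the newest frame always stays).
    while (nowMs - this.times[0] > this.windowMs) {
      this.times.shift();
    }
  }

  // Frame rate over the window: (frames - 1) intervals divided by the elapsed time.
  fps(): number {
    if (this.times.length < 2) return 0;
    const elapsed = this.times[this.times.length - 1] - this.times[0];
    return elapsed > 0 ? ((this.times.length - 1) * 1000) / elapsed : 0;
  }
}
```

A slower device calls `tick` less often, so the estimate drops; that is the smoothness difference between devices expressed as a number.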

Browser-based AI helps keep frames local: this demo does not upload video to a server.

How it works

Hand landmarks, then skeleton lines, then (simple) gesture cues

Hand pose detection is a computer vision task where an AI model identifies important landmark points on a hand, such as the wrist, fingertips, and finger joints.

A landmark is a predicted key point. By connecting landmarks, we can draw a hand skeleton that represents the pose.

Gesture recognition can be built on top of landmark positions. This page focuses on landmark detection first; gesture cues shown here are simple rule-based educational examples.

The model is pre-trained and runs inference in the browser. It is not learning from your webcam during the demo.

Landmark coordinates explained

How to read the table

X coordinate

Horizontal position of the landmark in the video frame.

Y coordinate

Vertical position of the landmark in the video frame.

Z coordinate

Estimated depth (relative distance), if the model provides it. If missing, the demo shows “--”.

Confidence threshold

Only keypoints with confidence above this value are drawn (and treated as “shown” in the table).
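The table columns can be produced from each keypoint with a small formatter. This sketch assumes keypoints carry `x`, `y`, and optional `z` and `score` fields; `toRow`, `TableRow`, and the one-decimal rounding are hypothetical choices. It follows the rules described for the table: a missing z is shown as “--”, and a point counts as shown only when its confidence is above the threshold.

```typescript
interface Keypoint {
  x: number;
  y: number;
  z?: number;
  score?: number;
}

interface TableRow {
  index: number;
  x: string;
  y: string;
  z: string;
  shown: boolean;
}

// Format a coordinate for display, using "--" when the model gives no value.
function formatCoord(v: number | undefined, digits: number): string {
  return v === undefined ? "--" : v.toFixed(digits);
}

// Build one table row; "shown" is true only when the confidence is above the threshold
// (points without a score are treated as shown).
function toRow(kp: Keypoint, index: number, threshold: number, digits = 1): TableRow {
  return {
    index,
    x: formatCoord(kp.x, digits),
    y: formatCoord(kp.y, digits),
    z: formatCoord(kp.z, digits),
    shown: kp.score === undefined || kp.score > threshold,
  };
}
```

Because the comparison is strict, a point whose confidence exactly equals the threshold is hidden, matching the "above this value" wording.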

Real-time inference means predictions repeat across frames while the webcam is running.

Privacy note

Keep the webcam local

Webcam frames are processed locally in your browser. This demo does not upload video to a server.

You can stop the camera at any time. The page does not identify you; it only estimates hand landmarks.

Limitations & ethics

Understand the risks before using tracking

  • The model can lose tracking.
  • Poor lighting can reduce accuracy.
  • Fast movement can cause unstable landmarks.
  • Hands partly outside the frame may not be detected.
  • Gloves, occlusion, unusual angles, and overlapping hands reduce accuracy.
  • Hand tracking should be used carefully in surveillance, identity-related, workplace, or high-stakes settings.
  • This demo is for learning and teaching only.

Student learning outcomes

What you will learn

  • Understand what hand pose detection means.
  • Understand landmarks and skeleton connections.
  • Understand real-time webcam inference.
  • Understand how landmark coordinates can support gesture recognition.
  • Understand the difference between classification, detection, landmark estimation, and pose tracking.
  • Understand privacy considerations for webcam-based AI.
  • Learn how pre-trained computer vision models can run in the browser.

Final reminder

The model is pre-trained and performs inference only. It does not learn from your webcam during this demo.