MuhammadLab
Computer Vision | Browser-based | No AI model needed | Confusion matrix | Precision / Recall / F1 | Student lab

Confusion Matrix and Model Evaluation Visualizer

Paste true labels and predicted labels, then compute confusion matrices and core evaluation metrics for computer vision classifiers.

This page teaches how computer vision models are evaluated after prediction. Students can paste model outputs from any classification task and inspect not just overall accuracy, but also which classes get confused, where false positives appear, and why precision and recall tell different stories.

Why evaluation matters

A computer vision model cannot be judged from a single prediction. Students need to compare many predictions against ground-truth labels to understand overall accuracy, the kinds of mistakes the model makes, and class-level weaknesses.

What this page measures

This visualizer calculates accuracy, precision, recall, F1-score, false positives, false negatives, and a confusion matrix so students can inspect model quality after prediction.

What to discuss

High accuracy can still hide poor performance on minority classes. The confusion matrix and per-class metrics reveal whether the model is mixing up specific categories.
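The point about minority classes can be made concrete with a small sketch. The data below is hypothetical (it is not the example used on this page): 95 "cat" images, 5 "bird" images, and a degenerate model that always predicts the majority class.

```python
# Hypothetical imbalanced test set: 95 "cat" images and 5 "bird" images.
y_true = ["cat"] * 95 + ["bird"] * 5
# A degenerate model that always predicts the majority class.
y_pred = ["cat"] * 100

# Accuracy: fraction of samples where the prediction matches the true label.
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Recall for "bird": of the truly-bird samples, how many were predicted "bird"?
bird_total = sum(t == "bird" for t in y_true)
bird_found = sum(t == "bird" and p == "bird" for t, p in zip(y_true, y_pred))
bird_recall = bird_found / bird_total

print(f"accuracy    = {accuracy:.3f}")     # 0.950
print(f"bird recall = {bird_recall:.3f}")  # 0.000
```

The model scores 95% accuracy while never finding a single bird, which is the kind of failure that per-class metrics are meant to expose.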

Input parsing: labels can be pasted one per line or separated by commas, semicolons, or tabs.
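One way such parsing could work is sketched below. This is not the page's actual implementation: the function name `parse_labels`, the trimming of surrounding spaces, and the dropping of empty entries are all assumptions made for illustration.

```python
import re


def parse_labels(text: str) -> list[str]:
    """Split pasted text on newlines, commas, semicolons, or tabs."""
    parts = re.split(r"[\n\r,;\t]+", text)
    # Assumption: trim surrounding spaces and drop empty entries
    # (for example, the one produced by a trailing newline).
    return [p.strip() for p in parts if p.strip()]


true_labels = parse_labels("cat, dog;bird\tcat\nbird\n")
print(true_labels)       # ['cat', 'dog', 'bird', 'cat', 'bird']
print(len(true_labels))  # 5
```

Whatever parser is used, the true and predicted lists must end up the same length, since each prediction is compared against the label at the same position.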
Current counts: true = 10, predicted = 10

Confusion Matrix

True \ Pred   cat   dog   bird
cat           2     1     0
dog           1     2     0
bird          1     0     3
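A minimal sketch of how a matrix like the one above can be built from two label lists. The lists here are reconstructed from the counts in the matrix; the order of the samples is an assumption and does not affect the result.

```python
from collections import Counter

classes = ["cat", "dog", "bird"]

# Label lists reconstructed from the matrix counts (sample order is assumed).
y_true = ["cat"] * 3 + ["dog"] * 3 + ["bird"] * 4
y_pred = [
    "cat", "cat", "dog",            # true cat: 2 correct, 1 predicted as dog
    "cat", "dog", "dog",            # true dog: 1 predicted as cat, 2 correct
    "cat", "bird", "bird", "bird",  # true bird: 1 predicted as cat, 3 correct
]

# Count how often each (true, predicted) pair occurs.
pair_counts = Counter(zip(y_true, y_pred))

# Rows are true classes, columns are predicted classes.
matrix = [[pair_counts[(t, p)] for p in classes] for t in classes]

for name, row in zip(classes, matrix):
    print(f"{name:<5}", row)
```

Diagonal cells are correct predictions; every off-diagonal cell names a specific confusion, such as the one true bird that was predicted as cat.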

Overall Metrics

Accuracy: 0.700
Macro Precision: 0.722
Macro Recall: 0.694
Macro F1: 0.698
Weighted F1: 0.714
Samples: 10
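The overall figures can be reproduced from the confusion matrix with a short sketch. Variable names are illustrative, and returning 0.0 when a denominator is zero is an assumption about how undefined metrics are handled.

```python
classes = ["cat", "dog", "bird"]
matrix = [[2, 1, 0],   # rows: true class
          [1, 2, 0],   # columns: predicted class
          [1, 0, 3]]

n = sum(sum(row) for row in matrix)
accuracy = sum(matrix[i][i] for i in range(len(classes))) / n

precisions, recalls, f1s, supports = [], [], [], []
for i in range(len(classes)):
    tp = matrix[i][i]
    predicted = sum(row[i] for row in matrix)  # column sum: times class i was predicted
    support = sum(matrix[i])                   # row sum: times class i was the true label
    p = tp / predicted if predicted else 0.0   # assumption: 0.0 when undefined
    r = tp / support if support else 0.0
    f1 = 2 * p * r / (p + r) if (p + r) else 0.0
    precisions.append(p)
    recalls.append(r)
    f1s.append(f1)
    supports.append(support)

# Macro average: every class counts equally, regardless of size.
macro_precision = sum(precisions) / len(classes)
macro_recall = sum(recalls) / len(classes)
macro_f1 = sum(f1s) / len(classes)
# Weighted average: each class counts in proportion to its support.
weighted_f1 = sum(f * s for f, s in zip(f1s, supports)) / n

print(f"accuracy={accuracy:.3f} macroP={macro_precision:.3f} "
      f"macroR={macro_recall:.3f} macroF1={macro_f1:.3f} weightedF1={weighted_f1:.3f}")
# accuracy=0.700 macroP=0.722 macroR=0.694 macroF1=0.698 weightedF1=0.714
```

Weighted F1 is higher than macro F1 here because bird, the best-performing class, also has the largest support (4 of 10 samples).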

Error Totals

False positives: 3
False negatives: 3

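These are one-vs-rest counts summed across classes. They show how often the model predicts a class incorrectly (false positive) or misses a class it should have predicted (false negative). In single-label classification every wrong prediction is a false positive for the predicted class and a false negative for the true class, so the two totals are always equal and both match the number of misclassified samples.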
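A sketch of the one-vs-rest counting, starting from the confusion matrix shown earlier. For each class, false positives are the rest of its column and false negatives are the rest of its row; names are illustrative.

```python
classes = ["cat", "dog", "bird"]
matrix = [[2, 1, 0],   # rows: true class
          [1, 2, 0],   # columns: predicted class
          [1, 0, 3]]

fp_total = 0
fn_total = 0
for i, name in enumerate(classes):
    tp = matrix[i][i]
    fp = sum(row[i] for row in matrix) - tp  # predicted as this class, but truly another
    fn = sum(matrix[i]) - tp                 # truly this class, but predicted as another
    fp_total += fp
    fn_total += fn
    print(f"{name:<5} TP={tp} FP={fp} FN={fn}")

print("total FP =", fp_total)  # 3
print("total FN =", fn_total)  # 3
```

Both totals equal the number of off-diagonal samples, because each misclassified sample is counted once as a false positive and once as a false negative.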

Per-Class Metrics

Class   Precision   Recall   F1      TP   FP   FN   Support
cat     0.500       0.667    0.571   2    2    1    3
dog     0.667       0.667    0.667   2    1    1    3
bird    1.000       0.750    0.857   3    0    1    4
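Each row of the table follows from its TP, FP, and FN counts alone. The sketch below uses a hypothetical helper named `prf` and, as an assumption, reports 0.0 whenever a metric would divide by zero.

```python
def prf(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) from one-vs-rest counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # Equivalent to the harmonic mean of precision and recall.
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return precision, recall, f1


# (TP, FP, FN) per class, taken from the table above.
counts = {"cat": (2, 2, 1), "dog": (2, 1, 1), "bird": (3, 0, 1)}

rows = {}
for name, (tp, fp, fn) in counts.items():
    p, r, f1 = prf(tp, fp, fn)
    rows[name] = (p, r, f1, tp + fn)  # support = TP + FN
    print(f"{name:<5} P={p:.3f} R={r:.3f} F1={f1:.3f} support={tp + fn}")
```

Bird is a useful case to discuss: its precision is perfect because nothing else was mistaken for a bird, yet its recall is only 0.750 because one real bird was missed.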

How To Read It

  • Accuracy is the fraction of all predictions that were correct, but it can hide weak performance on small classes.
  • Precision answers: when the model predicts a class, how often is it right?
  • Recall answers: when a class is truly present, how often does the model find it?
  • F1-score balances precision and recall, which is useful when students want one summary score per class.
  • The confusion matrix shows exactly which labels are being mixed up, which is often the most useful diagnostic after model prediction.
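For reference, the definitions in the list above can be written as formulas. The notation is a convention chosen here, not taken from the page: $TP_c$, $FP_c$, and $FN_c$ are the one-vs-rest counts for class $c$, and $N$ is the number of samples.

```latex
\begin{align*}
\text{Precision}_c &= \frac{TP_c}{TP_c + FP_c}, &
\text{Recall}_c &= \frac{TP_c}{TP_c + FN_c}, \\[4pt]
F1_c &= \frac{2\,\text{Precision}_c\,\text{Recall}_c}{\text{Precision}_c + \text{Recall}_c}, &
\text{Accuracy} &= \frac{\sum_c TP_c}{N}.
\end{align*}

% Worked example for the class "cat" (TP = 2, FP = 2, FN = 1):
\begin{align*}
\text{Precision}_{\text{cat}} &= \frac{2}{2 + 2} = 0.500, \qquad
\text{Recall}_{\text{cat}} = \frac{2}{2 + 1} \approx 0.667, \\[4pt]
F1_{\text{cat}} &= \frac{2 \cdot 0.500 \cdot 0.667}{0.500 + 0.667} \approx 0.571.
\end{align*}
```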