Computer Vision | Browser-based | No AI model needed | Confusion matrix | Precision / Recall / F1 | Student lab

Confusion Matrix and Model Evaluation Visualizer

Paste true labels and predicted labels, then compute confusion matrices and core evaluation metrics for computer vision classifiers.

This page teaches how computer vision models are evaluated after prediction. Students can paste model outputs from any classification task and inspect not just overall accuracy, but also which classes get confused, where false positives appear, and why precision and recall tell different stories.

Why evaluation matters

A computer vision model cannot be judged from a single prediction. Students need to compare many predictions against ground truth labels to understand accuracy, mistakes, and class-level weaknesses.

What this page measures

This visualizer calculates accuracy, precision, recall, F1-score, false positives, false negatives, and a confusion matrix so students can inspect model quality after prediction.

What to discuss

High accuracy can still hide poor performance on minority classes. The confusion matrix and per-class metrics reveal whether the model is mixing up specific categories.
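
To make this concrete, here is a minimal Python sketch with made-up data: 95 `cat` images, 5 `bird` images, and a hypothetical model that always predicts `cat`. The labels and counts are illustrative assumptions, not output from this page.

```python
# Hypothetical imbalanced data set: 95 cats, 5 birds.
true_labels = ["cat"] * 95 + ["bird"] * 5
# Hypothetical model that always predicts the majority class.
pred_labels = ["cat"] * 100

pairs = list(zip(true_labels, pred_labels))

# Overall accuracy: fraction of predictions that match the true label.
accuracy = sum(t == p for t, p in pairs) / len(pairs)

# Recall for the minority class: fraction of true birds that were found.
bird_total = sum(t == "bird" for t, _ in pairs)
bird_found = sum(t == "bird" and p == "bird" for t, p in pairs)
bird_recall = bird_found / bird_total

print(accuracy, bird_recall)  # 0.95 0.0
```

The model scores 95% accuracy while never finding a single bird, which is exactly the situation that per-class recall exposes.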

True labels | Predicted labels

Input parsing: labels can be pasted one per line or separated by commas, semicolons, or tabs.
Current counts: true = 10, predicted = 10
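
The parsing rule above could be sketched in Python roughly as follows. The function name `parse_labels`, the whitespace trimming, and the dropping of empty entries are assumptions for illustration; the page's own implementation may differ.

```python
import re


def parse_labels(text: str) -> list[str]:
    """Split pasted text on newlines, commas, semicolons, or tabs."""
    parts = re.split(r"[,;\t\r\n]+", text)
    # Assumption: trim surrounding whitespace and ignore empty entries.
    return [p.strip() for p in parts if p.strip()]


# Mixed separators in one paste still give a clean list.
true_labels = parse_labels("cat, dog; bird\ncat\tdog")
pred_labels = parse_labels("cat,cat,bird,dog,dog")

# The two lists can only be compared pairwise if their lengths match.
lengths_match = len(true_labels) == len(pred_labels)
print(true_labels, lengths_match)
```

Checking that both counts agree before computing anything avoids silently pairing a prediction with the wrong ground truth label.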

Confusion Matrix

True \ Pred	cat	dog	bird
cat	2	1	0
dog	1	2	0
bird	1	0	3
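
A matrix like this can be built by counting (true, predicted) pairs. The sketch below is one minimal way to do it in Python. The two label lists are hypothetical: they were chosen so that they reproduce the table above and are not necessarily the labels pasted into the page. Ordering classes by first appearance is also an assumption.

```python
def confusion_matrix(true_labels, pred_labels):
    """Return (classes, matrix) with rows = true class, columns = predicted class."""
    if len(true_labels) != len(pred_labels):
        raise ValueError("label lists must have the same length")
    # Assumption: classes are ordered by first appearance (true labels first).
    classes = list(dict.fromkeys(list(true_labels) + list(pred_labels)))
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(true_labels, pred_labels):
        matrix[index[t]][index[p]] += 1
    return classes, matrix


# Hypothetical labels that reproduce the 3x3 table above.
true_labels = ["cat"] * 3 + ["dog"] * 3 + ["bird"] * 4
pred_labels = ["cat", "cat", "dog",           # true cat
               "cat", "dog", "dog",           # true dog
               "cat", "bird", "bird", "bird"]  # true bird

classes, matrix = confusion_matrix(true_labels, pred_labels)
for name, row in zip(classes, matrix):
    print(name, row)
```

Each diagonal cell counts correct predictions for a class. Every off-diagonal cell is a specific confusion, for example the one `bird` image that was predicted as `cat`.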

Overall Metrics

Accuracy

0.700

Macro Precision

0.722

Macro Recall

0.694

Macro F1

0.698

Weighted F1

0.714

Samples

10
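
The overall figures above can be reproduced from the per-class TP, FP, and FN counts listed in the Per-Class Metrics table on this page. The sketch below shows one way to do that in Python and highlights the difference between macro averaging (every class counts equally) and weighted averaging (each class counts in proportion to its support). Returning 0.0 when a denominator is zero is an assumed convention, not something the page states.

```python
# (TP, FP, FN) per class, copied from the Per-Class Metrics table.
counts = {"cat": (2, 2, 1), "dog": (2, 1, 1), "bird": (3, 0, 1)}


def safe_div(num, den):
    # Assumption: report 0.0 instead of failing on an empty denominator.
    return num / den if den else 0.0


per_class = {}
for name, (tp, fp, fn) in counts.items():
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    f1 = safe_div(2 * precision * recall, precision + recall)
    support = tp + fn  # number of true samples of this class
    per_class[name] = (precision, recall, f1, support)

n_classes = len(per_class)
total = sum(s for _, _, _, s in per_class.values())

accuracy = sum(tp for tp, _, _ in counts.values()) / total
macro_precision = sum(p for p, _, _, _ in per_class.values()) / n_classes
macro_recall = sum(r for _, r, _, _ in per_class.values()) / n_classes
macro_f1 = sum(f for _, _, f, _ in per_class.values()) / n_classes
weighted_f1 = sum(f * s for _, _, f, s in per_class.values()) / total

print(f"{accuracy:.3f} {macro_precision:.3f} {macro_recall:.3f} "
      f"{macro_f1:.3f} {weighted_f1:.3f}")
```

Weighted F1 comes out higher than macro F1 here because `bird`, the class with the best F1, also has the largest support.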

Error Totals

False positives

3

False negatives

3

These are aggregated one-vs-rest counts across classes. They help students understand how often the model predicts a class incorrectly or misses a class when it should have predicted it.
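
As a rough illustration of the one-vs-rest counting, the Python sketch below derives per-class and total false positives and false negatives from the confusion matrix shown earlier on this page (rows are true classes, columns are predicted classes). Variable names are illustrative.

```python
classes = ["cat", "dog", "bird"]
matrix = [[2, 1, 0],   # true cat
          [1, 2, 0],   # true dog
          [1, 0, 3]]   # true bird
n = len(classes)

# False positives for class k: predicted as k (column k) but not truly k.
fp = [sum(matrix[i][k] for i in range(n)) - matrix[k][k] for k in range(n)]
# False negatives for class k: truly k (row k) but predicted as something else.
fn = [sum(matrix[k]) - matrix[k][k] for k in range(n)]

total_fp = sum(fp)
total_fn = sum(fn)
print(dict(zip(classes, fp)), dict(zip(classes, fn)), total_fp, total_fn)
```

In a single-label task every misclassified sample is a false negative for its true class and a false positive for the predicted class, so the two totals are equal to each other and to the number of errors.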

Per-Class Metrics

Class	Precision	Recall	F1	TP	FP	FN	Support
cat	0.500	0.667	0.571	2	2	1	3
dog	0.667	0.667	0.667	2	1	1	3
bird	1.000	0.750	0.857	3	0	1	4
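
The rows of this table follow directly from the confusion matrix. A minimal Python sketch, assuming rows are true classes and columns are predicted classes, and assuming 0.0 is reported when a denominator is zero:

```python
classes = ["cat", "dog", "bird"]
matrix = [[2, 1, 0],
          [1, 2, 0],
          [1, 0, 3]]
n = len(classes)


def safe_div(num, den):
    # Assumption: 0.0 when a class is never predicted or never present.
    return num / den if den else 0.0


rows = {}
for k, name in enumerate(classes):
    tp = matrix[k][k]
    predicted_as_k = sum(matrix[i][k] for i in range(n))  # column sum
    support = sum(matrix[k])                              # row sum
    fp = predicted_as_k - tp
    fn = support - tp
    precision = safe_div(tp, predicted_as_k)
    recall = safe_div(tp, support)
    f1 = safe_div(2 * precision * recall, precision + recall)
    rows[name] = (f"{precision:.3f}", f"{recall:.3f}", f"{f1:.3f}",
                  tp, fp, fn, support)

for name, values in rows.items():
    print(name, *values, sep="\t")
```

Precision divides by the column sum (everything predicted as the class), while recall divides by the row sum (everything that truly is the class). That is why `cat` has lower precision than recall: it absorbs mistakes from both other classes.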

How To Read It

Accuracy tells how many predictions were correct overall, but it can hide weak performance on small classes.
Precision answers: when the model predicts a class, how often is it right?
Recall answers: when a class is truly present, how often does the model find it?
F1-score is the harmonic mean of precision and recall, so it is high only when both are high. It is useful when students want one summary score per class.
The confusion matrix shows exactly which labels are being mixed up, which is often the most useful diagnostic after model prediction.