MuhammadLab
Machine Learning | EDA | Browser-based | Student demo

Interactive data analysis explorer

Univariate, Bivariate, and Multivariate Analysis Explorer

Understand how data analysis helps us study patterns before building machine learning models.

Before we build a machine learning model, we need to understand the data. Data is not just numbers in a table. Each column may contain useful information about people, objects, events, or behaviours.

Why do we study data patterns?

Machine learning models learn from patterns in data. If the data contains errors, missing values, strange values, weak relationships, or irrelevant variables, the model may perform poorly.

Exploring the data first helps us answer questions such as:

  • What does each variable look like?
  • Are there missing or unusual values?
  • Are two variables connected?
  • Which variables may help predict an outcome?
  • Are some variables repeating the same information?
  • Is the dataset suitable for machine learning?

Data analysis is the first step before machine learning because it helps us understand what the model will learn from.

1. Univariate Analysis

Univariate analysis means analysing one variable at a time.

If we analyse only the exam scores of students, we are doing univariate analysis.

Questions it helps answer

  • What is the average exam score?
  • What is the highest score?
  • Are there missing values?
  • Are there unusual values or outliers?

Mean | Median | Minimum | Maximum | Range | Standard deviation | Frequency counts | Histogram | Bar chart | Box plot

Univariate analysis is like looking at one column in a dataset and asking: "What does this variable look like by itself?"
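
As a minimal sketch of these univariate statistics, the Python below summarises the exam scores of the eight students listed in the dataset preview on this page. The `describe` helper is a hypothetical name, and using the population standard deviation (dividing by n) is an assumption; the sample formula (dividing by n - 1) would give a slightly larger value.

```python
import statistics

# Exam scores of the eight previewed students (A to H).
exam_scores = [55, 70, 82, 45, 64, 88, 50, 79]


def describe(values):
    """Return basic univariate statistics for a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "minimum": min(values),
        "maximum": max(values),
        "range": max(values) - min(values),
        # Assumption: population standard deviation (divides by n).
        "std_dev": statistics.pstdev(values),
        "unique": len(set(values)),
    }


summary = describe(exam_scores)
for name, value in summary.items():
    print(f"{name}: {value}")
```

Because only eight of the ten rows are previewed, these figures differ slightly from the ten-row summary the explorer reports.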

2. Bivariate Analysis

Bivariate analysis means analysing two variables at the same time.

If we analyse study hours and exam scores together, we are doing bivariate analysis.

Questions it helps answer

  • Do students who study more get higher scores?
  • Is age related to income?
  • Are two categories related?
  • Is the relationship positive, negative, or weak?

Scatter plot | Correlation | Group comparison | Cross-tabulation | Box plot by category | Grouped bar chart | Simple linear regression

Bivariate analysis is like looking at two columns and asking: "Are these two variables connected?"
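
One common way to put a number on "are these two variables connected?" is the Pearson correlation coefficient. The sketch below computes it for study hours and exam scores, using the eight students from the dataset preview. The function names and the 0.3 / 0.7 cut-offs for "weak", "moderate", and "strong" are illustrative assumptions, not fixed rules.

```python
import math

# Values for the eight previewed students (A to H).
study_hours = [2, 4, 5, 1, 3, 6, 2, 5]
exam_scores = [55, 70, 82, 45, 64, 88, 50, 79]


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    # Note: a column with no variation would make this a division by zero.
    return covariation / math.sqrt(spread_x * spread_y)


def describe_relationship(r):
    """Turn a correlation into a short label (thresholds are illustrative)."""
    direction = "positive" if r >= 0 else "negative"
    if abs(r) >= 0.7:
        strength = "strong"
    elif abs(r) >= 0.3:
        strength = "moderate"
    else:
        strength = "weak"
    return f"{strength} {direction} relationship"


r = pearson(study_hours, exam_scores)
print(f"r = {r:.2f}: {describe_relationship(r)}")
```

A value close to +1 means the two variables rise together, a value close to -1 means one falls as the other rises, and a value near 0 means there is little linear relationship.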

3. Multivariate Analysis

Multivariate analysis means analysing three or more variables at the same time.

If we analyse study hours, attendance, sleep hours, screen time, and exam score together, we are doing multivariate analysis.

Questions it helps answer

  • Which variables are most important?
  • Can we predict exam score using several variables?
  • Are some variables strongly connected?
  • Which variables should we use in a machine learning model?

Multiple regression | Logistic regression | Correlation matrix | Heatmap | PCA | Clustering | Decision trees | Feature importance

Multivariate analysis is like looking at many columns together and asking: "How do these variables work together?"
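
A correlation matrix is one of the simplest multivariate tools: it holds the correlation of every numerical column with every other. The sketch below builds one for the eight students in the dataset preview and flags input columns that are strongly connected to each other. The 0.9 threshold and the variable names are assumptions made for illustration.

```python
import math

# Numerical columns for the eight previewed students (A to H).
columns = {
    "Study Hours": [2, 4, 5, 1, 3, 6, 2, 5],
    "Attendance Percentage": [70, 85, 90, 60, 75, 95, 65, 88],
    "Sleep Hours": [6, 7, 8, 5, 6, 8, 4, 7],
    "Screen Time": [5, 3, 2, 7, 4, 1, 6, 3],
    "Exam Score": [55, 70, 82, 45, 64, 88, 50, 79],
}


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariation / math.sqrt(spread_x * spread_y)


names = list(columns)
matrix = {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}

# Print the matrix, one row per variable.
for a in names:
    cells = " ".join(f"{matrix[a][b]:6.2f}" for b in names)
    print(f"{a:<22}{cells}")

# Flag pairs of input columns that may be repeating the same information.
THRESHOLD = 0.9  # illustrative cut-off, not a fixed rule
TARGET = "Exam Score"
strong_pairs = [
    (a, b, matrix[a][b])
    for i, a in enumerate(names)
    for b in names[i + 1:]
    if TARGET not in (a, b) and abs(matrix[a][b]) >= THRESHOLD
]
for a, b, r in strong_pairs:
    print(f"{a} and {b} are strongly connected (r = {r:.2f})")
```

A heatmap is simply this matrix drawn with colours. With only eight rows the correlations are very rough estimates, so treat the flagged pairs as something to investigate rather than a final answer.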

Comparison table

Type of Analysis | Number of Variables | Main Question | Example | Common Visualisation
Univariate | One variable | What does this variable look like? | Exam score only | Histogram, bar chart, box plot
Bivariate | Two variables | Are these two variables related? | Study hours vs exam score | Scatter plot, box plot, correlation
Multivariate | Three or more variables | How do many variables work together? | Study hours + attendance + sleep predicting exam score | Heatmap, regression, PCA, feature importance

Interactive Analysis Explorer

Explore a sample dataset or upload your own

Choose univariate, bivariate, or multivariate analysis. The tool detects column types, updates the selectors, calculates statistics, and explains what the results mean.

Rows: 10 | Columns: 7 | Numerical columns: 5 | Categorical/text columns: 2

Univariate analysis: Exam Score

One variable by itself

Mean: 64.5 | Median: 67 | Minimum: 40 | Maximum: 88
Range: 48 | Std. deviation: 15.61 | Missing values: 0 | Unique values: 10

Histogram / distribution

40-48: 2
48-56: 2
56-64: 0
64-72: 2
72-80: 2
80-88: 2

The selected variable has an average value of 64.5. The lowest value is 40 and the highest value is 88. This helps us understand the overall distribution before comparing it with other variables.
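
A histogram like the one above can be sketched by splitting the range between the minimum and maximum into equal-width bins and counting the values in each. The code below uses the reported minimum (40) and maximum (88) as bin edges with six bins, but only the eight exam scores visible in the dataset preview, so its counts will not match the ten-row histogram exactly. The `histogram` function and its handling of the top edge are assumptions about how such a tool might work.

```python
# Exam scores of the eight previewed students (A to H).
exam_scores = [55, 70, 82, 45, 64, 88, 50, 79]


def histogram(values, low, high, bins):
    """Count how many values fall into each of `bins` equal-width intervals."""
    width = (high - low) / bins
    counts = [0] * bins
    for value in values:
        if not low <= value <= high:
            continue  # ignore values outside the chosen range
        index = int((value - low) // width)
        # The maximum value sits on the top edge, so put it in the last bin.
        index = min(index, bins - 1)
        counts[index] += 1
    return [(low + i * width, low + (i + 1) * width, counts[i]) for i in range(bins)]


score_bins = histogram(exam_scores, low=40, high=88, bins=6)
for start, end, count in score_bins:
    print(f"{start:g}-{end:g}: {count} {'#' * count}")
```

Each bin includes its lower edge and excludes its upper edge, except the last bin, which also includes the maximum.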

Dataset preview

Sample student dataset - showing first 8 rows

Sample data
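Student Name | Study Hours | Attendance Percentage | Sleep Hours | Screen Time | Exam Score | Pass/Fail
A | 2 | 70 | 6 | 5 | 55 | Pass
B | 4 | 85 | 7 | 3 | 70 | Pass
C | 5 | 90 | 8 | 2 | 82 | Pass
D | 1 | 60 | 5 | 7 | 45 | Fail
E | 3 | 75 | 6 | 4 | 64 | Pass
F | 6 | 95 | 8 | 1 | 88 | Pass
G | 2 | 65 | 4 | 6 | 50 | Fail
H | 5 | 88 | 7 | 3 | 79 | Pass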

Student Name: categorical | Missing: 0 | Unique: 10
Study Hours: numerical | Missing: 0 | Unique: 6
Attendance Percentage: numerical | Missing: 0 | Unique: 10
Sleep Hours: numerical | Missing: 0 | Unique: 5
Screen Time: numerical | Missing: 0 | Unique: 8
Exam Score: numerical | Missing: 0 | Unique: 10
Pass/Fail: categorical | Missing: 0 | Unique: 2
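
A column summary of this kind can be sketched in a few lines of Python. The code below reads the eight previewed rows as CSV text, treats a column as numerical when every non-empty value parses as a number, and counts missing and unique values. This rule and the function names are assumptions for illustration; the explorer's actual detection logic is not shown on this page.

```python
import csv
import io

# The eight previewed rows, written out as CSV text.
CSV_TEXT = """Student Name,Study Hours,Attendance Percentage,Sleep Hours,Screen Time,Exam Score,Pass/Fail
A,2,70,6,5,55,Pass
B,4,85,7,3,70,Pass
C,5,90,8,2,82,Pass
D,1,60,5,7,45,Fail
E,3,75,6,4,64,Pass
F,6,95,8,1,88,Pass
G,2,65,4,6,50,Fail
H,5,88,7,3,79,Pass
"""


def is_number(text):
    """Return True if the text can be read as a number."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def profile_columns(rows):
    """Summarise type, missing count, and unique count for each column."""
    profile = {}
    for name in rows[0]:
        values = [row[name].strip() for row in rows]
        present = [value for value in values if value != ""]
        # Assumption: a column is numerical only if all present values parse.
        numeric = bool(present) and all(is_number(value) for value in present)
        profile[name] = {
            "type": "numerical" if numeric else "categorical",
            "missing": len(values) - len(present),
            "unique": len(set(present)),
        }
    return profile


rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
column_profile = profile_columns(rows)
for name, info in column_profile.items():
    print(f"{name}: {info['type']} | Missing: {info['missing']} | Unique: {info['unique']}")
```

Some unique counts are lower here than in the summary above because only eight of the ten rows are previewed.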

Why is this important in machine learning?

Machine learning models learn patterns from data. If we do not understand the data first, we may build a poor model.

Analysing the data before modelling helps us:

  • Understand the distribution of each variable
  • Find relationships between variables
  • Detect outliers and errors
  • Identify useful features
  • Remove irrelevant or duplicated variables
  • Understand possible bias in the dataset
  • Choose the right machine learning model
  • Explain results more clearly

For example, before predicting exam score, we should first check whether study hours, attendance, sleep, and screen time are related to the score.

Important: Correlation does not always mean causation

If two variables are related, it does not always mean one variable causes the other. Other factors such as attendance, previous knowledge, sleep, and motivation may also affect the result.

1. Start with one variable

Use univariate analysis first to understand each column separately. This helps detect missing values, unusual values, high variation, and general patterns.

2. Then compare two variables

Use bivariate analysis to check whether two variables are related. This helps identify useful predictors and possible relationships.

3. Finally study many variables together

Use multivariate analysis when the outcome depends on many factors. This is closer to real machine learning because models usually use multiple input features.

Check Your Understanding

Mini quiz

1. If we analyse only exam score, what type of analysis is it?

2. If we analyse study hours and exam score together, what type of analysis is it?

3. If we analyse study hours, attendance, sleep, and exam score together, what type of analysis is it?

4. Which visualisation is commonly used for two numerical variables?

5. Why is data analysis important before machine learning?

Summary

Univariate analysis studies one variable by itself. Bivariate analysis studies the relationship between two variables. Multivariate analysis studies three or more variables together.

These methods are important because machine learning depends on patterns in data. Before building a model, we should understand individual variables, relationships between variables, and the combined effect of multiple variables. A good machine learning workflow starts with data understanding before model building.