MuhammadLab
Natural Language Processing

Natural Language Generation

Build intuition for language models by training a small n-gram predictor, typing a context, and inspecting the exact probability calculation behind each suggested next word.

Current best prediction

students

2.26% estimated probability

Model context

trigram context "processing helps" + bigram context "helps" + unigram

1. Type the context

The last words become the evidence used for prediction.

2. Train the mini corpus

Edit the examples and watch the predictions change.

Prediction ranking

Top next words sorted by estimated probability.

Rank  Word      Count signal  Estimated probability
1     students  1             2.26%
2     and       8             1.31%
3     models    6             1.15%
4     learning  5             1.07%
5     to        5             1.07%
6     next      4             0.99%
7     the       4             0.99%
8     word      4             0.99%

What the model learns

An n-gram model counts which words follow a context in the training corpus. The more often a word follows that context, relative to the alternatives, the higher its estimated probability.
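
The counting step described above can be sketched in a few lines of Python. This is a minimal illustration that assumes a bigram model (one word of context), whitespace tokenization, and a made-up toy corpus. The function names are hypothetical and are not taken from this page's implementation.

```python
from collections import Counter, defaultdict


def train_bigram_counts(sentences):
    """Count how often each word follows each one-word context."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        # Pair every token with the token that comes right after it.
        for context, next_word in zip(tokens, tokens[1:]):
            counts[context][next_word] += 1
    return counts


def next_word_probabilities(counts, context):
    """Turn raw follower counts into relative frequencies (no smoothing)."""
    followers = counts.get(context, Counter())
    total = sum(followers.values())
    if total == 0:
        return {}  # the context was never seen in training
    return {word: n / total for word, n in followers.items()}


# Hypothetical toy corpus, not the page's actual training text.
corpus = [
    "language models predict the next word",
    "language models learn from text",
    "language helps students",
]
counts = train_bigram_counts(corpus)
probs = next_word_probabilities(counts, "language")
```

Here "models" follows "language" in two of three cases, so it receives two thirds of the probability mass. A context that never appears returns an empty result, which is the gap smoothing is meant to fill.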

Why smoothing matters

Small corpora miss many possible word combinations. Laplace smoothing gives every vocabulary word a small chance instead of treating unseen words as impossible.
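
As a rough sketch of the idea above, add-one (Laplace) smoothing adds a pseudo-count to every vocabulary word before normalizing: P(word | context) = (count + alpha) / (total + alpha * V), where V is the vocabulary size. The vocabulary, counts, and function name below are illustrative assumptions. The page's own estimates may also mix several context lengths, which this sketch does not attempt.

```python
from collections import Counter


def laplace_probability(follower_counts, vocabulary, word, alpha=1.0):
    """Smoothed estimate: (count + alpha) / (total + alpha * vocabulary size)."""
    total = sum(follower_counts.values())
    return (follower_counts[word] + alpha) / (total + alpha * len(vocabulary))


# Hypothetical four-word vocabulary and follower counts for a single context.
vocabulary = {"students", "models", "learning", "word"}
follower_counts = Counter({"students": 1})

p_seen = laplace_probability(follower_counts, vocabulary, "students")   # (1 + 1) / (1 + 4)
p_unseen = laplace_probability(follower_counts, vocabulary, "models")   # (0 + 1) / (1 + 4)
total_mass = sum(
    laplace_probability(follower_counts, vocabulary, w) for w in vocabulary
)
```

The unseen word "models" gets a nonzero probability, and the smoothed probabilities over the whole vocabulary still sum to 1.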

How LLMs extend this

Modern transformers go beyond counting nearby words. They learn embeddings and attention patterns, so a much longer context can influence the next token.
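
To make the attention idea slightly more concrete, here is a heavily simplified sketch of scaled dot-product attention for a single query, written in plain Python. The two-dimensional vectors are made up for illustration. Real transformers learn their embeddings and use learned projections, multiple heads, and many layers, none of which appear here.

```python
import math


def softmax(scores):
    """Convert raw scores into weights that are positive and sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]  # subtract max for stability
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Weight each value by how well its key matches the query."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    mixed = [
        sum(w * value[i] for w, value in zip(weights, values))
        for i in range(len(values[0]))
    ]
    return weights, mixed


# Hypothetical embeddings for three context tokens (assumed, not learned).
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
query = [1.0, 0.0]

weights, mixed = attend(query, keys, values)
```

Every context token receives some weight regardless of how far back it sits, which is how distant words can still shape the next-token prediction.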