Speech Recognition
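Speech-to-text using your browser's built-in speech recognition. Record, transcribe, inspect speech patterns, detect filler words, and export results. There is no backend server and no API key, and this page never receives your audio, although the browser's speech engine may send it to the browser vendor's servers for recognition.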
Tap to record
Audio waveform — live during recording
Live transcript
Ready · Press the microphone to start
Final text shown in black · interim results in grey italic
How automatic speech recognition (ASR) works
A simplified six-stage view of how your voice becomes text (many modern systems merge several of these stages into a single end-to-end neural network)
Microphone Input (PCM / 16 kHz) → Feature Extraction (MFCC / FFT) → Acoustic Model (Deep Neural Network) → Language Model (N-gram / Transformer) → Beam Search Decoder → Text Output (Final + Interim)
Step 1 · Microphone Input
The microphone turns sound waves into an analog electrical signal, which is then digitized into raw audio samples, typically 16,000 or more per second (16 kHz).
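To make the digitization step concrete, here is a minimal Python sketch assuming 16 kHz, 16-bit signed PCM, a common format for speech (browsers often capture at 44.1 or 48 kHz and resample). A synthesized tone stands in for the microphone, and the helper names (`synth_tone`, `to_pcm16`, `pack_pcm16_le`) are illustrative, not part of this page.

```python
import math
import struct
from typing import List

SAMPLE_RATE = 16_000  # samples per second, the rate named in the pipeline above
MAX_INT16 = 32_767    # largest value of a signed 16-bit PCM sample


def synth_tone(freq_hz: float, duration_s: float, sample_rate: int = SAMPLE_RATE) -> List[float]:
    """Hypothetical stand-in for the microphone: sample a pure tone at evenly spaced instants."""
    n_samples = int(round(duration_s * sample_rate))
    return [0.5 * math.sin(2 * math.pi * freq_hz * n / sample_rate) for n in range(n_samples)]


def to_pcm16(samples: List[float]) -> List[int]:
    """Quantize floats in [-1.0, 1.0] to signed 16-bit integers, clipping out-of-range values."""
    return [int(round(max(-1.0, min(1.0, s)) * MAX_INT16)) for s in samples]


def pack_pcm16_le(pcm: List[int]) -> bytes:
    """Serialize samples as little-endian 16-bit PCM (2 bytes per sample)."""
    return struct.pack(f"<{len(pcm)}h", *pcm)


tone = synth_tone(440.0, 0.010)  # 10 ms of a 440 Hz tone
pcm = to_pcm16(tone)
raw = pack_pcm16_le(pcm)
print(len(pcm), len(raw))        # prints: 160 320
```

At 16 kHz, 10 ms of audio is 160 samples, and at 2 bytes per sample that is 320 bytes, which is why one second of 16-bit mono speech is about 32 KB.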
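Step 2 · Feature Extraction
Audio is split into short overlapping frames, typically 25 ms long with a new frame every 10 ms. Each frame's spectrum is computed with an FFT, mapped onto the perceptual mel scale, and compressed into Mel-Frequency Cepstral Coefficients (MFCCs): about a dozen numbers describing the frame's spectral shape, roughly what the sound "sounds like" at that instant.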
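The feature-extraction chain (framing, window, FFT, mel filterbank, log, DCT) can be sketched in plain Python. This is a textbook-style MFCC under common assumptions (Hamming window, 512-point FFT, 26 mel filters, 13 coefficients, unnormalized DCT-II, no pre-emphasis); it is not the feature code of any particular browser engine, and real engines may use log-mel features directly.

```python
import cmath
import math


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)  # common HTK-style mel formula


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def frame_signal(x, sr, frame_ms=25, hop_ms=10):
    """Overlapping frames: 400-sample frames with a 160-sample hop at 16 kHz."""
    frame_len, hop = int(sr * frame_ms / 1000), int(sr * hop_ms / 1000)
    if len(x) < frame_len:
        return []
    n_frames = 1 + (len(x) - frame_len) // hop
    return [x[i * hop : i * hop + frame_len] for i in range(n_frames)]


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    tw = [cmath.exp(-2j * math.pi * k / n) * odd[k] for k in range(n // 2)]
    return [even[k] + tw[k] for k in range(n // 2)] + [even[k] - tw[k] for k in range(n // 2)]


def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters spaced evenly on the mel scale from 0 Hz to sr/2."""
    mels = [hz_to_mel(sr / 2) * i / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int((n_fft + 1) * mel_to_hz(m) / sr) for m in mels]
    bank = [[0.0] * (n_fft // 2 + 1) for _ in range(n_filters)]
    for j in range(n_filters):
        left, center, right = bins[j], bins[j + 1], bins[j + 2]
        for k in range(left, center):   # rising edge (empty if center == left)
            bank[j][k] = (k - left) / (center - left)
        for k in range(center, right):  # falling edge (empty if right == center)
            bank[j][k] = (right - k) / (right - center)
    return bank


def mfcc(x, sr=16_000, n_fft=512, n_filters=26, n_coeffs=13):
    bank = mel_filterbank(n_filters, n_fft, sr)
    features = []
    for frame in frame_signal(x, sr):
        n = len(frame)
        windowed = [s * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1))) for i, s in enumerate(frame)]
        spectrum = fft(windowed + [0.0] * (n_fft - n))  # zero-pad the frame to n_fft
        power = [abs(c) ** 2 / n_fft for c in spectrum[: n_fft // 2 + 1]]
        # Log energy in each mel band, floored to avoid log(0) on silence
        log_e = [math.log(max(sum(w * p for w, p in zip(f, power)), 1e-10)) for f in bank]
        # DCT-II decorrelates the band energies; keep the first n_coeffs
        features.append([
            sum(e * math.cos(math.pi * i * (m + 0.5) / n_filters) for m, e in enumerate(log_e))
            for i in range(n_coeffs)
        ])
    return features


sr = 16_000
tone = [math.sin(2 * math.pi * 440 * n / sr) for n in range(sr // 10)]  # 0.1 s test tone
feats = mfcc(tone, sr)
print(len(feats), len(feats[0]))  # prints: 8 13
```

With a 25 ms frame and 10 ms hop, 0.1 s of audio (1,600 samples) yields 1 + (1600 − 400) // 160 = 8 frames, each summarized by 13 coefficients.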
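Step 3 · Acoustic Model
A neural network maps each frame's features to probabilities over phonemes, the basic sound units of speech (for example /h/-/ɛ/-/l/-/oʊ/ in "hello"); many modern systems predict characters or subword units instead. The network is trained on thousands of hours of transcribed speech.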
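The sketch below shows only the shape of that computation, assuming a tiny feed-forward network with random (untrained) weights, no bias terms, and a made-up five-label phoneme set: one frame of 13 features goes in, and a probability for each phoneme comes out. Production acoustic models are far larger (recurrent, convolutional, or transformer layers) and are trained on labeled speech; `ToyAcousticModel` is hypothetical and says nothing about the engine this page relies on.

```python
import math
import random

PHONEMES = ["sil", "h", "ɛ", "l", "oʊ"]  # toy label set; real systems use ~40 phonemes or subword units


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class ToyAcousticModel:
    """Hypothetical: one ReLU hidden layer plus a softmax output. Weights are random, i.e. UNTRAINED."""

    def __init__(self, n_features=13, n_hidden=32, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.gauss(0, 0.3) for _ in range(n_features)] for _ in range(n_hidden)]
        self.w2 = [[rng.gauss(0, 0.3) for _ in range(n_hidden)] for _ in range(len(PHONEMES))]

    def frame_posteriors(self, features):
        hidden = [max(0.0, sum(w * x for w, x in zip(row, features))) for row in self.w1]
        logits = [sum(w * h for w, h in zip(row, hidden)) for row in self.w2]
        return dict(zip(PHONEMES, softmax(logits)))


model = ToyAcousticModel()
frame = [0.1 * i for i in range(13)]  # stand-in for one frame of 13 MFCCs
posteriors = model.frame_posteriors(frame)
best = max(posteriors, key=posteriors.get)
print(best, round(sum(posteriors.values()), 6))
```

Whatever the weights, the softmax guarantees the outputs form a probability distribution over the labels; training is what makes the right phoneme receive most of that probability.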
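Step 4 · Language Model
Some phrases sound nearly identical, such as "ice cream" and "I scream". The language model scores each candidate word sequence by how likely it is in real language, so the surrounding context settles the ambiguity: "a bowl of ice cream" is far more probable than "a bowl of I scream".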
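A bigram model with add-one (Laplace) smoothing is about the simplest language model that shows this effect. The sketch assumes a tiny hypothetical training corpus; real systems train n-gram or neural models on vastly more text.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Count history tokens and bigrams, adding <s> / </s> sentence markers."""
    unigrams, bigrams, vocab = Counter(), Counter(), set()
    for s in sentences:
        tokens = ["<s>"] + s.lower().split() + ["</s>"]
        vocab.update(tokens)
        unigrams.update(tokens[:-1])             # every token that precedes another
        bigrams.update(zip(tokens, tokens[1:]))  # adjacent (previous, word) pairs
    return unigrams, bigrams, len(vocab)


def log_prob(sentence, model):
    """Add-one smoothed bigram log-probability: sum of log P(word | previous word)."""
    unigrams, bigrams, v = model
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    return sum(
        math.log((bigrams[(prev, word)] + 1) / (unigrams[prev] + v))
        for prev, word in zip(tokens, tokens[1:])
    )


corpus = [  # hypothetical training text
    "I want a bowl of ice cream",
    "we ate ice cream",
    "a bowl of ice cream please",
    "I scream when I see spiders",
]
lm = train_bigram(corpus)
print(log_prob("a bowl of ice cream", lm) > log_prob("a bowl of I scream", lm))  # prints: True
```

Both candidates share "a bowl of"; the difference comes from "of → ice" and "ice → cream" having been seen in training while "of → I" never was.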
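Step 5 · Beam Search Decoder
The decoder cannot score every possible transcript, so it extends many partial hypotheses in parallel and, after each step, keeps only the highest-scoring few (the "beam") while pruning the rest. It outputs the complete sequence with the best combined acoustic and language-model score.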
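Here is a minimal beam search over a toy word-level lattice, assuming each step offers a few candidate words with acoustic log-probabilities and that a small hypothetical bigram table supplies language-model log-probabilities (unseen pairs get a fixed penalty). Real decoders work frame by frame over phonemes or subwords, but the keep-the-top-k pruning is the same idea.

```python
from typing import Dict, List, Tuple

Step = Dict[str, float]                # candidate word -> acoustic log-probability
Bigram = Dict[Tuple[str, str], float]  # (previous word, word) -> LM log-probability


def beam_search(steps: List[Step], lm: Bigram, beam_width: int,
                lm_weight: float = 1.0, unk_logprob: float = -10.0) -> Tuple[List[str], float]:
    beams: List[Tuple[List[str], float]] = [(["<s>"], 0.0)]
    for candidates in steps:
        expanded = []
        for words, score in beams:
            for word, acoustic in candidates.items():
                lm_score = lm.get((words[-1], word), unk_logprob)
                # Combined score = acoustic evidence + weighted language-model evidence
                expanded.append((words + [word], score + acoustic + lm_weight * lm_score))
        # Pruning: keep only the beam_width best partial hypotheses
        expanded.sort(key=lambda hyp: hyp[1], reverse=True)
        beams = expanded[:beam_width]
    words, score = beams[0]
    return words[1:], score  # drop the <s> marker


# Acoustics slightly prefer "I scream"; the LM strongly prefers "ice cream".
steps = [{"ice": -1.0, "I": -0.9}, {"cream": -1.0, "scream": -0.9}]
lm = {("<s>", "ice"): -2.0, ("<s>", "I"): -1.0, ("ice", "cream"): -0.1, ("I", "scream"): -3.0}

best2, _ = beam_search(steps, lm, beam_width=2)
best1, _ = beam_search(steps, lm, beam_width=1)
print(best2, best1)  # prints: ['ice', 'cream'] ['I', 'scream']
```

With a beam of 1 (greedy decoding), "I" wins the first step and "ice" is pruned, so the better overall sentence can never be recovered; a wider beam keeps it alive long enough for the language model to pay off.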
Step 6 · Text Output
The winning sequence is delivered as text. Interim (partial) results stream in live as you speak and may be revised as more audio arrives; once a result is marked final, it is locked in.
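One way to keep the two kinds of results apart is sketched below, assuming a hypothetical event shape of `(text, is_final)` pairs, loosely modeled on the way browser speech APIs flag each result as final or not. Final segments accumulate and are never revised, while each interim result simply replaces the previous guess; this maps onto the black (final) and grey italic (interim) rendering in the live transcript.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TranscriptBuffer:
    """Hypothetical helper: locked-in final segments kept separate from the latest interim guess."""
    final_segments: List[str] = field(default_factory=list)
    interim: str = ""

    def on_result(self, text: str, is_final: bool) -> None:
        if is_final:
            self.final_segments.append(text.strip())  # locked in; never revised
            self.interim = ""                         # the guess it replaces is discarded
        else:
            self.interim = text.strip()               # each interim result overwrites the last

    @property
    def final_text(self) -> str:
        return " ".join(self.final_segments)


buf = TranscriptBuffer()
for text, is_final in [("hello", False), ("hello wor", False), ("hello world", True), ("how are", False)]:
    buf.on_result(text, is_final)
print(buf.final_text, "|", buf.interim)  # prints: hello world | how are
```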
Browser compatibility & privacy
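Best supported in Chrome and Edge. Safari works on macOS and iOS. Firefox does not enable speech recognition by default. This page has no server and never receives or stores your audio, but recognition is performed by the browser's built-in speech service, which may send audio to the vendor's servers (Chrome uses Google's servers, and Safari may route audio through Apple's), so an internet connection may be required.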