Transcript vs subtitles
A transcript is a written record of spoken content. Subtitles are timed text overlaid on a video. Same words, different deliverable.
In depth
A transcript is the written form of spoken content, usually delivered as a text file or document, optionally with speaker labels and timestamps marking each paragraph or sentence. Subtitles are timed text segments designed to be displayed on screen alongside a video, with strict reading-speed limits and short cue durations. Transcripts are for reading; subtitles are for watching. The words are the same, but the output, formatting, and use case differ.
When to use it
Want a written record people will read on its own — show notes, blog repurposing, search indexing, legal record? Make a transcript. Want text on screen that helps viewers follow the video? Make subtitles. Many workflows generate both from a single Whisper pass.
Frequently asked
Can I turn a transcript into subtitles?
Only if you also have the original audio for timing — or you accept rough timing from the text alone. The Text-to-SRT tool does the latter (estimating timing by reading speed). For accurate subtitles, run the audio through a transcription tool that outputs timed cues directly.
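As a rough illustration of estimating timing from text alone, here is a minimal Python sketch. It is not the Text-to-SRT tool's actual implementation: the function names, the sentence-splitting rule, the 15 characters-per-second reading speed, and the one-second minimum cue length are all assumptions chosen for the example.

```python
import re


def format_timestamp(total_ms: int) -> str:
    """Format a millisecond offset as an SRT timestamp (HH:MM:SS,mmm)."""
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{ms:03d}"


def text_to_srt(text: str, chars_per_second: float = 15.0, min_ms: int = 1000) -> str:
    """Turn plain text into SRT cues with timing guessed from reading speed.

    Assumptions (illustrative only): one cue per sentence, cues play
    back to back, and each cue lasts len(sentence) / chars_per_second.
    """
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

    cues = []
    start_ms = 0
    for index, sentence in enumerate(sentences, start=1):
        duration_ms = max(min_ms, round(len(sentence) / chars_per_second * 1000))
        end_ms = start_ms + duration_ms
        cues.append(
            f"{index}\n"
            f"{format_timestamp(start_ms)} --> {format_timestamp(end_ms)}\n"
            f"{sentence}\n"
        )
        start_ms = end_ms

    # Cues are separated by a blank line.
    return "\n".join(cues)


if __name__ == "__main__":
    print(text_to_srt("Hello there. This is a test sentence here!"))
```

Timing produced this way only tracks how long the text takes to read, not when the words were spoken, so the cues drift from the audio wherever the speaker pauses or speeds up.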
Can I turn subtitles into a transcript?
Yes — strip the timestamps from an SRT or VTT and you have a near-transcript. The SRT to plain text converter does this. Add speaker labels and paragraph breaks for a polished result.
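Stripping the timestamps can be sketched in a few lines of Python. This is an illustrative example for SRT input only, not the converter's actual code; the function name `srt_to_text` is hypothetical, and a VTT file would additionally need its `WEBVTT` header removed.

```python
import re

# Matches an SRT timing line such as "00:00:01,200 --> 00:00:04,100".
TIMING_LINE = re.compile(
    r"^\d{2}:\d{2}:\d{2}[,.]\d{3}\s*-->\s*\d{2}:\d{2}:\d{2}[,.]\d{3}"
)


def srt_to_text(srt: str) -> str:
    """Return the cue text of an SRT file as one space-joined string."""
    pieces = []
    # Cues are separated by blank lines.
    for block in re.split(r"\n\s*\n", srt.replace("\r\n", "\n").strip()):
        lines = [ln.strip() for ln in block.split("\n") if ln.strip()]
        # Drop the cue number only when it directly precedes a timing line,
        # so a caption that happens to be a bare number is not lost.
        if len(lines) >= 2 and lines[0].isdigit() and TIMING_LINE.match(lines[1]):
            lines = lines[1:]
        # Drop the timing line itself.
        if lines and TIMING_LINE.match(lines[0]):
            lines = lines[1:]
        pieces.extend(lines)
    return " ".join(pieces)
```

The result is a single run of text; speaker labels and paragraph breaks still have to be added by hand or by a later step.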
What does a 'verbatim' transcript include?
Every utterance — uhms, false starts, repetitions, filler words. Most published transcripts use 'clean verbatim': the same content, with filler words removed for readability. Subtitles are almost always clean verbatim plus condensation to fit reading-speed limits.
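To make the difference concrete, here is a deliberately small Python sketch of one part of a clean-verbatim pass: removing filler words. The filler list and the function name `clean_verbatim` are assumptions for illustration. Real clean-verbatim editing also handles false starts and repetitions, which this sketch does not attempt.

```python
import re

# A small, assumed list of fillers; real style guides differ on what counts.
FILLERS = re.compile(r"\b(?:um+|uh+|uhm+|erm*|hmm+)\b,?\s*", re.IGNORECASE)


def clean_verbatim(text: str) -> str:
    """Remove common filler words and tidy up the leftover whitespace.

    Capitalization is left untouched, so a sentence that began with a
    filler will start in lower case and needs a manual fix.
    """
    cleaned = FILLERS.sub("", text)
    return re.sub(r"\s{2,}", " ", cleaned).strip()


if __name__ == "__main__":
    print(clean_verbatim("Um, so I think, uh, we should start."))
```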
SRT: The most common subtitle file format. Plain text with numbered cues and HH:MM:SS,mmm timestamps.
Closed captions: Captions that the viewer can toggle on or off, typically delivered as a separate text track embedded in or alongside the video.
Whisper: OpenAI's open-source automatic speech recognition model. The de facto baseline for AI subtitle generation and the engine behind many modern caption tools.