Force-aligned subtitles
Subtitles where the existing text is automatically time-aligned to the audio, instead of being transcribed from scratch.
In depth
Force alignment takes a known subtitle text and matches each word to the point in the audio where it is spoken. This is useful when you have a manually typed transcript or pre-translated subtitles and need timing without re-transcribing. Force alignment is faster and often more accurate than full ASR because the text is already known to be correct; only the timing has to be inferred. Some Whisper-based tools offer force alignment as an alternative to standard transcription.
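One common way to infer that timing is a dynamic-programming search. An acoustic model scores every short audio frame against every token of the known text, and the aligner picks the in-order path through those scores with the highest total. The sketch below is a minimal Viterbi-style version under stated assumptions: `log_probs` is a hypothetical frame-by-token log-probability matrix from some acoustic model, there are no blank or silence tokens, and every token covers at least one frame. Real aligners typically work on characters or phonemes and handle silence, so treat this as an illustration of the idea rather than any tool's implementation.

```python
import math
from typing import List, Tuple


def force_align(log_probs: List[List[float]], tokens: List[int]) -> List[Tuple[int, int]]:
    """Viterbi-style forced alignment of a known token sequence to audio frames.

    log_probs[t][k] is the log-probability that frame t belongs to token id k
    (assumed to come from some acoustic model). tokens is the known text as
    token ids, in spoken order. Returns one (start_frame, end_frame) pair per
    token, with end_frame exclusive.
    """
    T, N = len(log_probs), len(tokens)
    if N == 0 or T < N:
        raise ValueError("need at least one token and one frame per token")
    neg_inf = -math.inf
    # score[t][j]: best total log-probability of any path that puts frame t on token j
    score = [[neg_inf] * N for _ in range(T)]
    # moved[t][j]: True if the best path entered token j at frame t (came from token j - 1)
    moved = [[False] * N for _ in range(T)]
    score[0][0] = log_probs[0][tokens[0]]  # the first frame must belong to the first token
    for t in range(1, T):
        for j in range(N):
            stay = score[t - 1][j]
            advance = score[t - 1][j - 1] if j > 0 else neg_inf
            score[t][j] = max(stay, advance) + log_probs[t][tokens[j]]
            moved[t][j] = advance > stay
    # Backtrack from the last frame, which must belong to the last token.
    spans = [[0, T] for _ in range(N)]
    j = N - 1
    for t in range(T - 1, 0, -1):
        if moved[t][j]:
            spans[j][0] = t      # token j starts at frame t
            spans[j - 1][1] = t  # so token j - 1 ends there
            j -= 1
    return [(start, end) for start, end in spans]
```

Frame spans become timestamps by multiplying by the frame hop of whatever model produced `log_probs` (for example, a hypothetical 20 ms hop gives seconds = frame × 0.02). Because the token order is fixed by the known text, the search only decides where each word starts and ends, which is why alignment never changes the words themselves.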
When to use it
Use force alignment when you already have an accurate transcript (a translation, manually typed text, or a broadcast script) and only need the timing filled in. Skip it when you don't have the transcript yet; transcribe the audio instead.
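Once each word has a start and end time, filling in an existing cue's timing is mechanical: the cue starts when its first word starts and ends when its last word ends, and the cue text is kept exactly as written. A minimal sketch, assuming the aligner returns hypothetical `(word, start_seconds, end_seconds)` tuples for the words of one cue; `srt_timestamp` and `timed_cue` are illustrative names, not part of any library.

```python
from typing import List, Tuple

# (word, start_seconds, end_seconds), as an aligner might report it (assumed shape)
Word = Tuple[str, float, float]


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def timed_cue(index: int, text: str, words: List[Word]) -> str:
    """Build one SRT cue, keeping the original text and taking timing from aligned words."""
    if not words:
        raise ValueError("cue has no aligned words")
    start = words[0][1]   # first word's onset
    end = words[-1][2]    # last word's offset
    return f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
```

For example, `timed_cue(1, "Hello world", [("Hello", 1.2, 1.5), ("world", 1.6, 2.05)])` yields a cue running from `00:00:01,200` to `00:00:02,050` with the text unchanged.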
Frequently asked
How accurate is force alignment compared to transcription?
The text is significantly more accurate, because it is supplied rather than recognized. Timing accuracy is comparable to direct ASR: word-level timestamps from force alignment are typically within ±50 ms of the true onset.
Can I force-align a translated SRT?
Yes, but only if the translated text is in the language spoken in the audio. Force-aligning an English translation to Spanish audio won't work, because alignment requires the text to be in the audio's spoken language.
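To check the ±50 ms onset figure on your own material, compare aligned word onsets against hand-marked reference onsets and count how many fall inside the tolerance. A minimal sketch, assuming both lists hold onsets in seconds for the same words in the same order; `fraction_within` and its 50 ms default are illustrative, not a standard metric implementation.

```python
from typing import Sequence


def fraction_within(predicted: Sequence[float], reference: Sequence[float],
                    tol_s: float = 0.050) -> float:
    """Share of words whose predicted onset is within tol_s seconds of the reference onset."""
    if not reference or len(predicted) != len(reference):
        raise ValueError("need equal-length, non-empty onset lists")
    hits = sum(1 for p, r in zip(predicted, reference) if abs(p - r) <= tol_s)
    return hits / len(reference)
```

With predicted onsets `[1.00, 2.03, 3.20]` and reference onsets `[1.02, 2.00, 3.00]`, two of the three words fall within 50 ms, so the function returns 2/3.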
ASR (automatic speech recognition): Software that converts spoken audio into text. Whisper, AssemblyAI, Deepgram, and Google Speech-to-Text are all ASR engines.
Whisper: OpenAI's open-source automatic speech recognition model. The de facto baseline for AI subtitle generation and the engine behind many modern caption tools.
Word-level timestamps: Timing data that marks the start and end of every word, not just every cue. The foundation for karaoke captions and word-by-word reveal animations.