How long does it take to caption a video?
Short answer
AI captioning: 10–60 seconds for a 5–20 minute video. Manual captioning: 4–6 minutes per minute of video. Hybrid (AI + edit): 30 seconds plus 5–10% of video length.
Detail
Captioning time depends on the method. AI captioning with Whisper-class models takes roughly 5% of the video's runtime, so a 20-minute video transcribes in about a minute. Manual captioning takes 4–6 minutes of work per minute of video for clean speech, and longer for accented or fast speech. The practical 2026 workflow is hybrid: let AI generate the first pass, then spend 5–10% of the video's runtime hand-correcting proper nouns, technical terms, and audio artifacts.
| Method | Time per minute of video |
|---|---|
| AI captioning (Whisper) | ~3–5 seconds |
| AI + hand-edit | ~6–11 seconds (AI pass plus edits at 5–10% of runtime) |
| Manual transcription | 4–6 minutes |
| Professional service (Rev, 3Play) | Turnaround-based: same-day human delivery |
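As a rough illustration, the arithmetic behind these estimates can be sketched in a few lines of Python. The rates come from the Detail paragraph above (about 5% of runtime for AI, another 5–10% for hand-editing, and 4–6 minutes of work per minute of video for manual captioning). The function and constant names are made up for this example, and the results are ballpark figures, not measurements.

```python
# Rates taken from the Detail paragraph; all names here are illustrative.
AI_FRACTION = 0.05            # AI transcription: roughly 5% of runtime
EDIT_FRACTION = (0.05, 0.10)  # hand-correction: 5-10% of runtime
MANUAL_RATIO = (4.0, 6.0)     # manual: 4-6 minutes of work per minute of video


def estimate_minutes(video_minutes: float) -> dict[str, tuple[float, float]]:
    """Return (low, high) time estimates in minutes for each captioning method."""
    ai = video_minutes * AI_FRACTION
    return {
        # AI alone: a single point estimate, reported as an identical low/high pair.
        "ai": (ai, ai),
        # Hybrid: the AI pass plus the hand-correction pass.
        "hybrid": (
            ai + video_minutes * EDIT_FRACTION[0],
            ai + video_minutes * EDIT_FRACTION[1],
        ),
        # Manual: typing everything by hand, assuming clean speech.
        "manual": (
            video_minutes * MANUAL_RATIO[0],
            video_minutes * MANUAL_RATIO[1],
        ),
    }


if __name__ == "__main__":
    for method, (low, high) in estimate_minutes(20).items():
        print(f"{method}: {low:.1f} to {high:.1f} minutes")
```

For a 20-minute video this works out to about 1 minute for AI alone, 2 to 3 minutes for the hybrid workflow, and 80 to 120 minutes for manual captioning.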