Speaker label
Speaker label / speaker ID
A text prefix or color cue that identifies who is speaking. Used in interviews, podcasts, and SDH captions.
In depth
Speaker labels identify who is talking in multi-speaker captions. Conventions vary: bracketed text labels ([JOHN]) for SDH and broadcast, color-coding for short-form social, or the speaker's initials at the start of each cue. Labels can be assigned automatically through speaker diarization. Whisper on its own transcribes speech but does not diarize, so ASR pipelines typically pair it with a separate diarization model (pyannote.audio is a common choice). Accuracy is roughly 85-95% on clean two-speaker audio and drops with overlapping speech or noise. Skip speaker labels when there is only one speaker or when the picture makes it obvious who is talking.
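As a rough illustration of the bracketed-label convention, the sketch below turns diarized segments into SRT cues. It assumes the diarization step has already produced segments with a speaker ID, start and end times in seconds, and text. The `Segment` class, the `to_srt` function, and the `names` mapping are hypothetical names for this example, not part of any ASR library. The label is added only when the speaker changes, and is dropped entirely for single-speaker input. The punctuation after the bracket is a house-style choice.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Segment:
    speaker: str   # diarization ID such as "SPEAKER_00" (assumed input shape)
    start: float   # start time in seconds
    end: float     # end time in seconds
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: list, names: Optional[dict] = None) -> str:
    """Render segments as SRT, adding a bracketed label on each speaker change."""
    names = names or {}
    # Labels add nothing when only one speaker is present, so skip them.
    use_labels = len({seg.speaker for seg in segments}) > 1
    cues = []
    previous = None
    for index, seg in enumerate(segments, start=1):
        text = seg.text.strip()
        if use_labels and seg.speaker != previous:
            # Fall back to the raw diarization ID if no display name is mapped.
            label = names.get(seg.speaker, seg.speaker).upper()
            text = f"[{label}] {text}"
        previous = seg.speaker
        timing = f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}"
        cues.append(f"{index}\n{timing}\n{text}")
    return "\n\n".join(cues) + "\n"


if __name__ == "__main__":
    demo = [
        Segment("SPEAKER_00", 0.0, 1.5, "Hello."),
        Segment("SPEAKER_01", 1.5, 3.0, "Hi there."),
        Segment("SPEAKER_01", 3.0, 4.25, "How are you?"),
    ]
    print(to_srt(demo, {"SPEAKER_00": "John", "SPEAKER_01": "Mary"}))
```

Labeling only on speaker change keeps consecutive cues from the same person uncluttered. A stricter SDH workflow might repeat the label on every cue instead.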
When to use it
Use speaker labels when there are multiple speakers and neither the audio nor the picture makes clear who is talking. They are required for SDH and accessibility-grade captions.
Frequently asked
Should speaker labels go in brackets or at the line start?
Both are accepted. Netflix uses brackets ([JOHN]: Hello) for accessibility content. Short-form social often skips text labels entirely and uses color-coded captions per speaker.
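For the color-coded approach, one possible route is WebVTT, whose voice spans (`<v Name>`) record the speaker in the cue without printing a text label, so a player stylesheet can color each voice. The sketch below is a minimal example under that assumption. The `to_webvtt` function and its tuple input are hypothetical, and speaker names are assumed to be plain text with no markup characters.

```python
import html


def vtt_timestamp(seconds: float) -> str:
    """Format seconds as a WebVTT timestamp (HH:MM:SS.mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def to_webvtt(cues) -> str:
    """Build a WebVTT document from (speaker, start, end, text) tuples.

    Each cue is wrapped in a voice span so the speaker travels with the
    cue as metadata instead of as visible text.
    """
    lines = ["WEBVTT", ""]
    for speaker, start, end, text in cues:
        lines.append(f"{vtt_timestamp(start)} --> {vtt_timestamp(end)}")
        # Escape &, < and > so cue text is not parsed as WebVTT markup.
        lines.append(f"<v {speaker}>{html.escape(text, quote=False)}</v>")
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_webvtt([
        ("John", 0.0, 1.5, "Hello & welcome."),
        ("Mary", 1.5, 3.0, "Thanks."),
    ]))
```

On the player side, a CSS rule such as `::cue(v[voice="John"]) { color: cyan; }` can then give each speaker a color. Support for cue styling varies between players, so it is worth testing on the target platform.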
Speaker diarization
The process of identifying which speaker said which words in an audio recording. Critical for interviews, podcasts, and any multi-speaker content.
SDH
Subtitles that include speaker labels and non-speech audio cues such as [music] and [door slams], so deaf and hard-of-hearing viewers get the full experience.
Closed captions
Captions that the viewer can toggle on or off, typically delivered as a separate text track embedded in or shipped alongside the video.