Speaker label

Speaker label / speaker ID

A text prefix or color cue that identifies who is speaking. Used in interviews, podcasts, and SDH captions.

In depth

Speaker labels identify who is talking in multi-speaker captions. Conventions vary: bracketed text labels ([JOHN]) for SDH and broadcast, color-coding for short-form social video, or speaker initials at the start of each cue. Labels can be assigned automatically through speaker diarization. Whisper does not diarize on its own, so it is usually paired with a separate diarization model; accuracy is roughly 85-95% on clean two-speaker audio and lower when speakers overlap or the recording is noisy. Skip speaker labels when only one person speaks or when the picture makes the speaker obvious.
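As a rough illustration of the bracket convention, the sketch below turns already-diarized segments into SRT cues. The `Segment` shape and the helper names are hypothetical, not the output format of any particular ASR tool, and labeling only on a speaker change is one common choice rather than a fixed rule.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One diarized caption segment (hypothetical shape for this sketch)."""
    start: float   # seconds
    end: float     # seconds
    speaker: str   # e.g. "John"
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: list[Segment]) -> str:
    """Render segments as SRT, adding a bracketed label on each speaker change."""
    # With a single speaker, labels add nothing, so they are skipped entirely.
    multi_speaker = len({s.speaker for s in segments}) > 1
    cues = []
    previous = None
    for index, seg in enumerate(segments, start=1):
        text = seg.text
        # Assumption: label only when the speaker differs from the previous cue.
        if multi_speaker and seg.speaker != previous:
            text = f"[{seg.speaker.upper()}] {text}"
        previous = seg.speaker
        cues.append(
            f"{index}\n{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}\n{text}"
        )
    return "\n\n".join(cues) + "\n"
```

Calling `to_srt` on a John/Mary exchange labels the first cue for each turn and leaves follow-up cues from the same speaker unlabeled; a single-speaker list comes out with no labels at all.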

When to use it

Use speaker labels when there are multiple speakers and the picture alone doesn't show who is talking, for example off-screen dialogue or a wide group shot. They are required for SDH and other accessibility-grade captions.

Frequently asked

Should speaker labels go in brackets or at the line start?

Both are accepted. Netflix uses bracketed speaker IDs for its SDH (accessibility) content. Short-form social video often skips text labels entirely and uses color-coded captions per speaker.
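For color-coding, one option is WebVTT's voice span (`<v Name>`), which tags each cue with its speaker so a player can style speakers differently. The sketch below is a minimal illustration under that assumption; the function names and the tuple layout are hypothetical, and speaker names are assumed to contain no markup characters.

```python
from html import escape


def vtt_timestamp(seconds: float) -> str:
    """Format seconds as a WebVTT timestamp, HH:MM:SS.mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def to_vtt(cues: list[tuple[float, float, str, str]]) -> str:
    """Render (start, end, speaker, text) tuples as WebVTT with voice spans."""
    blocks = ["WEBVTT"]
    for start, end, speaker, text in cues:
        # Escape &, < and > so cue text cannot be mistaken for WebVTT tags.
        safe_text = escape(text, quote=False)
        blocks.append(
            f"{vtt_timestamp(start)} --> {vtt_timestamp(end)}\n"
            f"<v {speaker}>{safe_text}</v>"
        )
    return "\n\n".join(blocks) + "\n"
```

In a browser, a stylesheet rule such as `::cue(v[voice="John"]) { color: yellow; }` can then color each speaker, though support for cue styling varies between players.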

Related terms

Speaker diarization

The process of identifying which speaker said which words in an audio recording. Critical for interviews, podcasts, and any multi-speaker content.

SDH

Subtitles that include speaker labels and non-speech audio cues like [music], [door slams], so deaf and hard-of-hearing viewers get the full experience.

Closed captions

Captions that the viewer can toggle on or off, typically delivered as a separate text track encoded into or alongside the video.
