
Captions for podcast clips that earn the swipe.

Podcast video moves slower than vlog or sketch content. Captions are not optional — they're how viewers triage whether the conversation is worth their attention.

Aspect ratio: 9:16 (1080×1920) for TikTok/Reels/Shorts; 16:9 for YouTube long-form
Resolution: 16:9 source at 1920×1080; 9:16 clips at 1080×1920
Font size: 56–68px on a 1080-wide canvas
Safe zone: Vertical clips inherit the destination platform's safe zone (TikTok 18%, Reels 22%, Shorts 13%). Captions sit best at 55–65% from top to leave room for both speakers' faces in a side-by-side cut.
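
As a rough sketch of the arithmetic above, the snippet below converts the 55–65% caption band into pixel rows on a 1920-tall frame and checks it against each platform's safe zone. It assumes the quoted percentages are the share of frame height reserved at the bottom for platform UI, which the text above does not state explicitly. The function and constant names are illustrative only.

```python
FRAME_HEIGHT = 1920  # 9:16 clip at 1080x1920

# Assumption: share of frame height reserved at the BOTTOM for platform UI.
BOTTOM_SAFE_ZONE = {"tiktok": 0.18, "reels": 0.22, "shorts": 0.13}

# Recommended caption band, measured from the top of the frame.
CAPTION_BAND = (0.55, 0.65)


def caption_band_px(frame_height=FRAME_HEIGHT, band=CAPTION_BAND):
    """Return the caption band as (top_row, bottom_row) in pixels."""
    top, bottom = band
    return round(frame_height * top), round(frame_height * bottom)


def clears_safe_zone(platform, frame_height=FRAME_HEIGHT, band=CAPTION_BAND):
    """True if the band ends above the platform's reserved bottom area."""
    _, band_bottom = caption_band_px(frame_height, band)
    lowest_safe_row = frame_height * (1 - BOTTOM_SAFE_ZONE[platform])
    return band_bottom <= lowest_safe_row


print(caption_band_px())  # (1056, 1248)
for name in BOTTOM_SAFE_ZONE:
    print(name, clears_safe_zone(name))
```

Under that reading, the band ends at row 1248, above even the most restrictive limit listed (Reels, at roughly row 1498).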

Why captions matter on vertical podcast clips

Podcast clips compete in feeds against high-energy vlog content. Captions are the equalizer — a static talking-head clip with crisp captions outperforms a flashy unfocused clip 9 times out of 10.

Recommended style

Speaker-color highlights (one accent per speaker) help viewers track conversational flow. Stick to two colors maximum. Keep the font calmer than vlog content, since podcast viewers tend to read along.

The vertical podcast clip captioning playbook

  1. Cut your clip from the master recording
    30–90 seconds is the sweet spot. Pick a single argument or punchline that stands without context.
  2. Reframe to 9:16
    Side-by-side speaker layout for two-person podcasts. Single speaker on the top half for solo or interview cuts.
  3. Caption with speaker colors
    Assign a single highlight color per speaker. Viewers track who's talking without speaker labels cluttering the frame.
  4. Export MP4 + matching SRT
    MP4 for cross-posting. SRT for the long-form YouTube upload, which YouTube indexes for search.
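
The speaker-color step can be sketched as a small mapping from speaker to accent color, capped at two speakers. This is a minimal illustration, not any editor's actual data model: the `Cue` type, the `assign_speaker_colors` helper, and the hex values are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical accent palette: one color per speaker, two at most.
ACCENT_COLORS = ["#FFD60A", "#5AC8FA"]


@dataclass
class Cue:
    start: float   # seconds from clip start
    end: float
    speaker: str
    text: str


def assign_speaker_colors(cues, palette=ACCENT_COLORS):
    """Map each speaker to one accent color, in order of first appearance."""
    colors = {}
    for cue in cues:
        if cue.speaker in colors:
            continue
        if len(colors) >= len(palette):
            raise ValueError(
                f"more than {len(palette)} speakers; recut or merge speakers"
            )
        colors[cue.speaker] = palette[len(colors)]
    return colors


cues = [
    Cue(0.0, 2.1, "host", "So why did you quit?"),
    Cue(2.1, 5.4, "guest", "Because the numbers stopped making sense."),
    Cue(5.4, 6.0, "host", "Meaning?"),
]
print(assign_speaker_colors(cues))
```

Raising an error on a third speaker keeps the two-color limit explicit instead of silently reusing a color.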
Do
  • Cut the clip around a single sentence-long hook. Captioning every clip you post forces you to find the hook first.
  • Caption every clip identically — the same font, size, position. Podcast clips perform best as a recognizable visual pattern.
  • Always upload the matching SRT to the long-form YouTube episode. The transcript ranks for the conversation's keywords.
  • Test 30s, 60s, 90s cuts of the same clip. Length sensitivity varies by topic.
Don’t
  • Don't use word-by-word reveals for slow conversations. The reveal pace clashes with the speaker's cadence.
  • Don't crop both speakers into a single 9:16 frame side-by-side without resizing. Faces should be at least 25% of frame height.
  • Don't ship podcast clips without captions. The category lives or dies on text-readable hooks.
  • Don't trust auto-transcribed speaker names. Whisper occasionally guesses at proper nouns, so hand-correct them.

Frequently asked

What's the best length for a podcast clip?

30–90 seconds. Anything under 30s rarely lands the punchline; anything over 90s loses the swipe-feed audience. 60s is the median best performer.

Should podcast captions identify each speaker?

Use color, not text labels. A single accent color per speaker is enough — text labels eat captioning real estate and viewers lose the punchline.

What font size for podcast clips on mobile?

56–68px on a 1080-wide canvas. Slightly larger than vlog content because the speaker's face takes more attention budget than in a one-shot vlog.
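
If the export canvas is not 1080 pixels wide, one plausible approach is to scale the range in proportion to width. The sketch below assumes linear scaling, which the answer above does not state, and `scale_font_range` is a hypothetical helper name.

```python
BASE_WIDTH = 1080
BASE_FONT_RANGE = (56, 68)  # px, as recommended for a 1080-wide canvas


def scale_font_range(canvas_width, base_range=BASE_FONT_RANGE):
    """Scale the recommended font range in proportion to canvas width."""
    factor = canvas_width / BASE_WIDTH
    low, high = base_range
    return round(low * factor), round(high * factor)


print(scale_font_range(720))   # (37, 45)
print(scale_font_range(2160))  # (112, 136)
```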

Can I use the same captions on the long-form episode?

Yes — export an SRT from the master and upload it to YouTube Studio. The clip captions are a subset of the master SRT.
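
Because the clip captions are a subset of the master SRT, they can in principle be derived from it by keeping the cues inside the clip window and shifting their timestamps to start at zero. The following is a minimal sketch using only the standard library. It assumes a plain, well-formed SRT with no positioning tags, and `clip_srt` is a hypothetical helper, not part of any product API.

```python
import re

TIMESTAMP = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")


def parse_ts(ts):
    """Convert 'HH:MM:SS,mmm' to milliseconds."""
    h, m, s, ms = map(int, TIMESTAMP.fullmatch(ts.strip()).groups())
    return h * 3_600_000 + m * 60_000 + s * 1000 + ms


def format_ts(ms):
    """Convert milliseconds back to 'HH:MM:SS,mmm'."""
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def clip_srt(master_srt, clip_start_ms, clip_end_ms):
    """Keep cues overlapping the clip window, re-timed from zero."""
    out = []
    for block in re.split(r"\n\s*\n", master_srt.strip()):
        lines = block.splitlines()
        if len(lines) < 3:  # need index, timing line, and text
            continue
        start_s, end_s = lines[1].split(" --> ")
        start, end = parse_ts(start_s), parse_ts(end_s)
        if end <= clip_start_ms or start >= clip_end_ms:
            continue  # cue lies entirely outside the clip
        # Trim cues that straddle the clip boundary, then shift to zero.
        start = max(start, clip_start_ms) - clip_start_ms
        end = min(end, clip_end_ms) - clip_start_ms
        text = "\n".join(lines[2:])
        out.append(
            f"{len(out) + 1}\n{format_ts(start)} --> {format_ts(end)}\n{text}"
        )
    return "\n\n".join(out) + "\n"


master = (
    "1\n00:00:01,000 --> 00:00:03,000\nIntro\n\n"
    "2\n00:01:00,500 --> 00:01:02,000\nThe hook\n\n"
    "3\n00:02:00,000 --> 00:02:05,000\nOutro\n"
)
print(clip_srt(master, 60_000, 90_000))
```

For a clip cut from 1:00 to 1:30 of the master, only the second cue survives, renumbered as cue 1 and starting 500 ms into the clip.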

Should I add background music to a captioned clip?

Quiet bed music (30 dB below speech) helps with feed retention. Loud music fights the captions for attention.
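
For editors that take a linear gain multiplier instead of decibels, the 30 dB offset converts with the standard amplitude formula, gain = 10^(dB/20). The sketch below assumes the music and speech tracks were normalized to the same level before mixing; `db_to_gain` is an illustrative name.

```python
def db_to_gain(db):
    """Convert a decibel offset to a linear amplitude multiplier."""
    return 10 ** (db / 20)


music_gain = db_to_gain(-30)
print(f"{music_gain:.4f}")  # 0.0316
```

A bed 30 dB below speech therefore plays at roughly 3% of the speech amplitude.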

Caption your next vertical podcast clip in seconds.
Free for the first 5 minutes. No card required.