
Captioning style guide

Drew Neil edited this page Jul 6, 2015 · 9 revisions

These are the house styles for captioning Peer to Peer videos.

Guidelines at a glance

  • transcribe all significant audio content (spoken words, speaker ID, and any relevant sounds)
  • caption text should not exceed 42 characters per line
  • each caption should contain no more than two lines of text
  • characters-per-second (CPS) should not exceed 15
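These limits can be checked mechanically. Here is a minimal Python sketch, assuming each caption is available as its text plus start and end times in seconds. The name check_caption and the way characters are counted (every character except line breaks) are assumptions made for illustration; Aegisub's own count may differ.

```python
MAX_CHARS_PER_LINE = 42
MAX_LINES = 2
MAX_CPS = 15


def check_caption(text, start, end):
    """Return a list of guideline violations for one caption.

    text is the caption text with one caption line per text line;
    start and end are display times in seconds.
    """
    problems = []
    lines = text.splitlines()

    if len(lines) > MAX_LINES:
        problems.append(f"{len(lines)} lines (max {MAX_LINES})")

    for line in lines:
        if len(line) > MAX_CHARS_PER_LINE:
            problems.append(
                f"line has {len(line)} characters (max {MAX_CHARS_PER_LINE})"
            )

    # Assumption: every character except line breaks counts towards CPS.
    chars = sum(len(line) for line in lines)
    duration = end - start
    if duration <= 0:
        problems.append("caption has no duration")
    elif chars / duration > MAX_CPS:
        problems.append(f"{chars / duration:.1f} CPS (max {MAX_CPS})")

    return problems
```

An empty list means the caption meets all three limits. A non-empty list is a prompt to take another look, not an automatic failure, since the CPS limit is a guideline.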

Transcribe all significant audio content

Captions are like subtitles that have been tailored for deaf and hard of hearing people. Wikipedia makes this distinction:

"Subtitles" assume the viewer can hear but cannot understand the language or accent. "Captions" aim to describe to the deaf and hard of hearing all significant audio content – spoken dialogue and non-speech information such as the identity of speakers – along with any significant music or sound effects.

When creating captions for Peer to Peer videos, remember that they should make the content accessible for deaf people. Any sound or spoken word that is relevant to our hearing audience should be transcribed in the captions. If in doubt, try watching the video with captions enabled and sound muted and see if it makes sense!

Use [silence] when nothing is spoken for a while

If there are no captions, we can assume that there is no significant audio content. It's reasonable to let a second or so go by without captions, but if too much time passes, deaf viewers may start to wonder if the captions have stopped working. Don't leave them guessing. If 3 seconds go by without any speech, insert a caption containing the text [silence]. If the silence continues for a long period, display the [silence] caption for 3 seconds, then show no captions until the next time something is spoken. This reassures deaf viewers that they aren't missing out on information.
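The rule can be sketched in Python. This is only an illustration under stated assumptions: captions are (start, end, text) tuples in seconds, sorted by start time, the video starts at 0, and the [silence] caption is placed at the start of the gap (this page doesn't say exactly where it should go). The name add_silence_captions is hypothetical.

```python
SILENCE_THRESHOLD = 3.0  # seconds without speech before [silence] is needed
SILENCE_DURATION = 3.0   # seconds the [silence] caption stays on screen


def add_silence_captions(captions):
    """Insert a [silence] caption into every gap of 3 seconds or more.

    captions is a list of (start, end, text) tuples sorted by start time.
    Assumption: [silence] is shown from the start of the gap for 3 seconds,
    after which nothing is shown until the next caption.
    """
    result = []
    previous_end = 0.0
    for start, end, text in captions:
        if start - previous_end >= SILENCE_THRESHOLD:
            result.append(
                (previous_end, previous_end + SILENCE_DURATION, "[silence]")
            )
        result.append((start, end, text))
        previous_end = max(previous_end, end)
    return result
```

For example, with captions ending at 5 seconds and resuming at 10 seconds, this sketch adds a [silence] caption from 5 to 8 seconds and leaves 8 to 10 seconds empty.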

Use inaudible when you can't discern what's spoken

With recorded speech it's sometimes hard to tell what the speaker is saying. If you can't make out the words, use inaudible as a placeholder.

We'll get a second opinion when the technical reviewers proofread the captions. If our tech reviewer is also unable to hear what's being spoken, they'll replace inaudible with [inaudible]. If our tech reviewer can understand what's being spoken, they'll replace inaudible with the word(s) that they hear.

In the first draft of captions (before proofreading), use inaudible instead of [inaudible]. In the final draft of captions (which we publish for our customers), all instances of the word inaudible should appear in square brackets: [inaudible]. There's one exception to this rule: if the word 'inaudible' is spoken by the people in the video, then it should appear without the square brackets!
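Before publishing a final draft, it can help to list every caption that still contains the word inaudible without square brackets. The Python sketch below does only that. It can't tell a leftover placeholder from a word that was really spoken, so a person still has to decide each case. The function name and the plain list of caption strings are assumptions for illustration.

```python
import re

# Matches the word "inaudible" when the opening or the closing bracket is missing.
UNBRACKETED = re.compile(
    r"(?<!\[)\binaudible\b|\binaudible\b(?!\])",
    re.IGNORECASE,
)


def find_unbracketed_inaudible(captions):
    """Return (index, text) pairs for captions a reviewer should look at."""
    return [
        (index, text)
        for index, text in enumerate(captions)
        if UNBRACKETED.search(text)
    ]
```

Anything this returns is either a placeholder the tech reviewer hasn't resolved yet, or a genuine use of the spoken word, which should stay as it is.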

Don't omit spoken phrases just because they're visible on screen

If someone reads text aloud that can be seen elsewhere on screen, that text should still be transcribed in the captions. For example, say we have this passage:

    [Camera shows Drew Neil and Tom Stuart]
DN: So Tom would you like to start by
    reading out the problem?
TS: Sure, it says:
    [Camera angle changes to show Tom's screen]

Tom's screen shows a written problem statement, which he then reads aloud word for word. The captions should faithfully reproduce Tom's speech. It does mean that the words will be visible on screen in two places at once, but that's not a problem. The viewer can choose to read either text at their own pace.

In our first draft captions for this video, the captions stopped when the text appeared on screen. That's inconsiderate to our deaf viewers.

A hearing viewer watching the video (without captions enabled) might tune out the sound of Tom's voice and read the text at their own pace, or they might listen closely and read along with Tom. A deaf viewer watching the video with captions should be able to make the same choice.

Characters per second (CPS)

If a caption is displayed too briefly then it will be hard to read. The more text a caption contains, the longer it should be displayed for. Measuring characters per second (CPS) provides a useful guideline for how long a caption should be visible.

In Aegisub, there's a CPS column which shows the value for each individual caption:

[Screenshot: Aegisub displays CPS for each caption, highlighting cells in red if they exceed 15 CPS]

If the CPS value is 15 or less, the cell has a white background, meaning that the caption will be visible for long enough to read all of the text. If the CPS value is 16 or more, the background turns pink or red, indicating that the caption will not be visible for long enough to read.

15 characters per second is a guideline, but it won't always be achievable. If the dialogue on screen is delivered very quickly, then it may be necessary to let the CPS value go higher. For quick readers, a CPS value as high as 25 may be tolerable. Slower readers always have the option of playing the video at reduced speed, which gives them longer to read each caption.
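As a worked example, CPS is the number of characters divided by the number of seconds the caption is on screen, so the shortest comfortable display time is the character count divided by the target CPS. The Python sketch below assumes every character except line breaks is counted; Aegisub's own count may differ, and the function names are hypothetical.

```python
GUIDELINE_CPS = 15    # house guideline from this page
FAST_READER_CPS = 25  # may be tolerable for quick readers


def count_chars(text):
    """Count every character except line breaks (an assumption)."""
    return len(text.replace("\n", ""))


def min_duration(text, cps=GUIDELINE_CPS):
    """Shortest display time in seconds that keeps the caption at or under cps."""
    return count_chars(text) / cps


caption = "So Tom would you like to start by\nreading out the problem?"
print(count_chars(caption))                              # 57
print(round(min_duration(caption), 2))                   # 3.8
print(round(min_duration(caption, FAST_READER_CPS), 2))  # 2.28
```

So this two-line caption should stay on screen for about 3.8 seconds to meet the guideline, and roughly 2.3 seconds is the least it could get if the dialogue is too fast to allow more.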

Resources