Accessibility in video and media is increasingly important as more of our communication moves online. Accessibility barriers arise when visual or audio information is unavailable to viewers because of blindness, low vision, deafness or hearing loss, or difficulty focusing on and comprehending content.

Some viewers depend on speech recognition software to operate their devices. People without disabilities also benefit from accessibility features in many situations: in loud environments such as shopping malls and airports, or where silence is expected, such as libraries, schools, and museums. Poor lighting and insufficient contrast between text and background color likewise make content hard to see for anyone.

71% of students without hearing difficulties use captions, primarily to help them focus and retain information. It’s estimated that 85% of Facebook videos are watched without sound.

There are four basic methods to address needs for accessibility: captions, transcripts, descriptive transcripts, and audio descriptions.

Captions and Transcripts

Captioning is a process where spoken content within a video is converted into a text file, split into chunks, and time-coded to synchronize with video frames. In addition, captions should identify speakers, and include descriptions of sound effects, such as music, traffic noise, and sounds that are not visually apparent.
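As a rough illustration of the chunking and time-coding described above, the following Python sketch writes a list of caption cues in the WebVTT format, a common caption file format for web video. The `Cue` class and the `format_timestamp` and `to_webvtt` helpers are hypothetical names introduced only for this example, and the sketch assumes the text has already been split into chunks with known start and end times.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Cue:
    """One caption chunk (hypothetical structure for this sketch)."""
    start: float                   # seconds from the start of the video
    end: float                     # seconds from the start of the video
    text: str                      # spoken words or a sound description
    speaker: Optional[str] = None  # None marks a non-speech sound


def format_timestamp(seconds: float) -> str:
    """Format seconds as a WebVTT timestamp (HH:MM:SS.mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def to_webvtt(cues: List[Cue]) -> str:
    """Serialize cues as WebVTT text, identifying speakers and sounds."""
    lines = ["WEBVTT", ""]
    for cue in cues:
        lines.append(f"{format_timestamp(cue.start)} --> {format_timestamp(cue.end)}")
        if cue.speaker:
            # A WebVTT voice span names the speaker of the cue.
            lines.append(f"<v {cue.speaker}>{cue.text}")
        else:
            # Square brackets are a common convention for sound effects.
            lines.append(f"[{cue.text}]")
        lines.append("")  # a blank line ends each cue
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_webvtt([
        Cue(0.0, 2.5, "Welcome to the course.", speaker="Maria"),
        Cue(2.5, 4.0, "upbeat music"),
    ]))
```

Real captioning tools also handle line length, reading speed, and cue positioning, which this sketch leaves out.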

When video is pre-recorded, it is best to provide captions within a video as well as a separate transcript. For videos, captions enable people who are deaf or hard-of-hearing to see the visual content and read the captions at the same time. For audio-only, captions enable people who are hard-of-hearing to get the richness of listening to the audio and fill in what they don’t hear well by reading the captions.

Transcription is the process of converting speech or audio into a written, plain-text document. Because the output is plain text, the transcript itself carries no timing information. Captions and transcripts often contain the same information. Transcripts are needed to provide access for people who are deaf-blind and depend on synthesized speech or Braille. People without disabilities also use transcripts for a variety of reasons.
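Because captions and transcripts often carry the same information, a basic transcript can be derived from a caption file by dropping the timing. The Python sketch below does this for simple WebVTT input. The function name `webvtt_to_transcript` is hypothetical, and the sketch assumes one line of text per cue with no cue identifiers, NOTE blocks, or styling.

```python
import re

# Matches a WebVTT voice span such as "<v Maria>Hello" or "<v Maria>Hello</v>".
VOICE_TAG = re.compile(r"^<v\s+([^>]+)>(.*?)(?:</v>)?$")


def webvtt_to_transcript(vtt: str) -> str:
    """Strip the header and timing lines, keeping speaker names and text."""
    transcript_lines = []
    for raw_line in vtt.splitlines():
        line = raw_line.strip()
        # Skip blank lines, the file header, and cue timing lines.
        if not line or line.startswith("WEBVTT") or "-->" in line:
            continue
        match = VOICE_TAG.match(line)
        if match:
            # Turn the voice span into a "Speaker: text" line.
            transcript_lines.append(f"{match.group(1)}: {match.group(2)}")
        else:
            transcript_lines.append(line)
    return "\n".join(transcript_lines)


if __name__ == "__main__":
    sample = (
        "WEBVTT\n\n"
        "00:00:00.000 --> 00:00:02.500\n"
        "<v Maria>Welcome to the course.\n\n"
        "00:00:02.500 --> 00:00:04.000\n"
        "[upbeat music]\n"
    )
    print(webvtt_to_transcript(sample))
```

A transcript produced this way still benefits from human review, for example to merge short cues into readable paragraphs.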

Descriptive Transcripts and Audio Descriptions

Descriptive transcripts delivered as text are needed for most videos to be accessible to people who are deaf-blind. People who have difficulty processing auditory information and people who cannot focus and comprehend auditory or visual information benefit from descriptions of information presented in a video. Descriptive transcripts are also used by people without disabilities.

Text-based descriptions have been around for many years. They contain the text of the dialogue (indicating who is speaking it), as well as descriptions of other non-speech sounds and situations happening while a video is playing.
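One way to picture a text-based description is as dialogue and descriptions interleaved in the order they occur. The Python sketch below merges the two by time. The `Entry` type and the `build_descriptive_transcript` function are hypothetical names for this example, and the sample lines are invented for illustration.

```python
from typing import List, NamedTuple, Optional


class Entry(NamedTuple):
    """A timed line of dialogue or description (hypothetical structure)."""
    time: float                    # seconds from the start of the video
    text: str
    speaker: Optional[str] = None  # None marks a sound or visual description


def build_descriptive_transcript(dialogue: List[Entry],
                                 descriptions: List[Entry]) -> str:
    """Interleave dialogue and descriptions in playback order."""
    merged = sorted(dialogue + descriptions, key=lambda entry: entry.time)
    lines = []
    for entry in merged:
        if entry.speaker:
            # Dialogue indicates who is speaking.
            lines.append(f"{entry.speaker}: {entry.text}")
        else:
            # Descriptions are set apart in square brackets.
            lines.append(f"[{entry.text}]")
    return "\n".join(lines)


if __name__ == "__main__":
    dialogue = [
        Entry(1.0, "Let's begin.", speaker="Maria"),
        Entry(6.0, "Any questions?", speaker="Maria"),
    ]
    descriptions = [
        Entry(0.0, "Maria walks to the whiteboard."),
        Entry(4.0, "A door slams off screen."),
    ]
    print(build_descriptive_transcript(dialogue, descriptions))
```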

Audio descriptions are a newer technology that can supplement captioning. They add recorded narration to the audio track, describing what is happening visually on the screen. Descriptions are often recorded in a voice that contrasts with the narrator's. Both people who are blind and people without disabilities who are only listening to the video benefit from this approach.

Under the WCAG guidelines, accessibility requirements for video and audio differ depending on whether the content is pre-recorded or live, and on whether it is video with audio, video without audio (video-only), or audio-only.
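To make that distinction concrete, the Python sketch below models the lookup as a small table keyed by media type and timing. The names `WCAG_MEDIA_CRITERIA` and `criteria_for` are hypothetical. The entries are a simplified summary of the Level A and AA success criteria under WCAG 2.x Guideline 1.2 (Time-based Media), with Level AAA criteria left out, so treat it as a starting point and check the specification before relying on it.

```python
from typing import Dict, List, Tuple

# Keys are (media_type, timing). Values summarize WCAG 2.x Level A and AA
# success criteria only; Level AAA criteria are intentionally omitted.
WCAG_MEDIA_CRITERIA: Dict[Tuple[str, str], List[str]] = {
    ("audio_only", "prerecorded"): [
        "1.2.1 Audio-only and Video-only (Prerecorded), Level A",
    ],
    ("video_only", "prerecorded"): [
        "1.2.1 Audio-only and Video-only (Prerecorded), Level A",
    ],
    ("video_with_audio", "prerecorded"): [
        "1.2.2 Captions (Prerecorded), Level A",
        "1.2.3 Audio Description or Media Alternative (Prerecorded), Level A",
        "1.2.5 Audio Description (Prerecorded), Level AA",
    ],
    ("video_with_audio", "live"): [
        "1.2.4 Captions (Live), Level AA",
    ],
    # No Level A or AA criterion is listed for these in this summary.
    ("audio_only", "live"): [],
    ("video_only", "live"): [],
}


def criteria_for(media_type: str, timing: str) -> List[str]:
    """Return the summarized criteria for one kind of media."""
    key = (media_type, timing)
    if key not in WCAG_MEDIA_CRITERIA:
        raise ValueError(f"Unknown media type or timing: {key}")
    return WCAG_MEDIA_CRITERIA[key]


if __name__ == "__main__":
    for criterion in criteria_for("video_with_audio", "prerecorded"):
        print(criterion)
```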

