Multimedia Accessibility

Multimedia content such as video and audio enriches the digital experience for website visitors. The following practices for captions, audio descriptions, and transcripts will help you create equivalent experiences for all users.


Captions convert audio dialogue and sounds into text that appears on a video, synchronized with the audio. They can be added to a pre-recorded video or to live video in real time. Captions provide equal access and effective communication for people who are Deaf or hard of hearing, and are essential to their full participation. Captions also enhance the experience for non-native language speakers, people with learning disabilities, and anyone in a noisy environment.

Accuracy requirement for captions

To ensure the videos you produce and post at Harvard are accessible and conform to WCAG 2.1 guidelines, they should have accurate captions that include proper punctuation, speaker identification, and the identification of sounds other than speech. Professional captioning vendors are excellent resources for providing budget-friendly, high-quality captions with a quick turnaround. While auto-captioning tools continue to make significant progress in accuracy, they are not yet sufficient to meet the accuracy standard of Harvard’s Policy without substantial editing for quality assurance.

Captions for pre-recorded video

There are several approaches to adding post-production captions. Select an approach that’s right for your project based on resources available, timeline, and workflow.

At a Glance: Approaches for adding captions to pre-recorded video

Approach | Cost | Accuracy | Effort | Vendors
Professional vendor (recommended) | Fee-for-service | High | Minimal editing required | 3Play Media (Harvard’s preferred vendor) or Rev
Auto-generated and manual editing | Free | Variable | Substantial editing required | YouTube, Microsoft Stream, or IBM Watson

Professional vendor (recommended approach)

Use a captioning service, such as 3Play Media (Harvard’s preferred vendor) or Rev, that provides both the captions or transcription and their synchronization to your video. This fee-for-service approach is highly recommended, as it provides the greatest accuracy and a quick turnaround with minimal use of internal resources and effort. Consider giving the vendor a “cheat sheet” of speaker names, terminology, or other notes that might help in producing accurate captions.

See Harvard’s discounted 3Play Media pricing and request an account if you do not already have one. Find step-by-step instructions or watch a short video on how to log in to 3Play with HarvardKey.

Get Started with Harvard's Captioning Vendors

Auto-generated and manual editing

Captions generated by artificial intelligence (AI) vary in accuracy. This approach works best when the video shows a single person speaking clearly with little background noise. Accuracy drops if multiple people speak with overlapping dialogue, if there is background noise in the video, or if specialized terminology is used.

To ensure captions meet the accuracy standards of Harvard’s Policy, any captions produced with AI need to be edited before the video is posted online. If you use an AI tool (such as YouTube, Microsoft Stream, or IBM Watson), you can then make corrections with a caption-editing tool (such as CADET or Amara) and synchronize the text with your video. While this approach is low-cost, it requires more time spent on editing and proofreading for accuracy. Watch this video on how to edit captions in YouTube Studio from Harvard Library’s User Research Center.

Captions for live videos and events

Live captions convert audio dialogue and sounds into text that appears on a video in real time. They are commonly provided for events and meetings that are streamed over the internet, and for in-person meetings. Learn more about providing live captions for events.

Zoom offers automatically generated live captions for every meeting, and meeting hosts are encouraged to turn them on as an inclusive practice. See “Turning on live transcripts in Zoom.”

VITAC is Harvard’s preferred vendor for professional live captioning. View pricing and request services for professional live captioning with VITAC (HarvardKey required).


Audio files posted on Harvard websites should include a transcript: a non-synchronized text version of the audio material, provided adjacent to the video or audio file. The transcript should include speaker identification and non-speech sounds, such as [Professor], [doorbell rings], [cough], and [jazz music]. Like captions, transcripts provide equal access for people who are Deaf or hard of hearing. They also benefit non-native language speakers and anyone in a noisy environment, and they have the added benefit of being searchable.

Transcripts for audio files can be produced using the same methods as captions: either through a professional vendor or with AI tools and manual editing. The transcript can be linked from or displayed alongside the audio file. Transcripts must meet the same accuracy requirements as captions, and should include proper punctuation, speaker identification, and the identification of sounds other than speech.
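
As a rough illustration of the transcript conventions described above, the following Python sketch turns a list of caption cues into a non-synchronized transcript with speaker identification and bracketed non-speech sounds. It is a minimal sketch, not a Harvard or vendor tool: the Cue class, its field names, and the cues_to_transcript function are hypothetical names introduced only for this example, and a real workflow would start from a caption file exported by your vendor or caption editor.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cue:
    """One caption cue. Hypothetical structure, for illustration only."""
    start_seconds: float     # used only to put cues in order
    speaker: Optional[str]   # None marks a non-speech sound
    text: str                # spoken words, or a sound such as "doorbell rings"


def cues_to_transcript(cues: list[Cue]) -> str:
    """Build a plain-text transcript: timing is dropped, speakers and sounds are kept."""
    lines: list[str] = []
    previous_speaker: Optional[str] = None
    for cue in sorted(cues, key=lambda c: c.start_seconds):
        if cue.speaker is None:
            # Non-speech sound: bracket it on its own line.
            lines.append(f"[{cue.text}]")
            previous_speaker = None
        elif cue.speaker == previous_speaker:
            # Same speaker continues: extend their current line.
            lines[-1] += " " + cue.text
        else:
            # New speaker: start a line with the bracketed speaker label.
            lines.append(f"[{cue.speaker}] {cue.text}")
            previous_speaker = cue.speaker
    return "\n".join(lines)


if __name__ == "__main__":
    example = [
        Cue(0.0, "Professor", "Welcome to the lecture."),
        Cue(2.5, "Professor", "Today we cover captions."),
        Cue(5.0, None, "doorbell rings"),
        Cue(6.0, "Student", "Sorry I'm late."),
    ]
    print(cues_to_transcript(example))
```

The output still needs the same human review as any caption file: the sketch only rearranges text and does not check its accuracy or punctuation.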

Audio Description

Videos should be carefully scripted or edited so that all important content is accessible through the audio track. Common examples of visual-only content that can be spoken aloud include title cards and speaker identification. Because audio description is much more expensive than captioning, and many of the most prevalent examples can be made accessible in standard scripting and editing workflows, we recommend planning for these methods before seeking to outsource audio description.

If a video cannot be scripted or edited this way, any important information that’s presented visually can be described in a separate narration track using a technique called audio description. Usually this is a spoken audio track that describes what is visually happening in a video. (Think of it like alternative text for videos.) The narration plays during the natural pauses in the video. Audio description provides equal access to people with vision loss as well as people with cognitive disabilities. See this example of a described video. 3Play Media and VITAC both provide audio description services.

Social Media Accessibility Best Practices

Social media accounts at Harvard should make every effort to ensure the content they share is accessible to all audiences. Accessible content is more inclusive, and Harvard flagship channels prioritize promoting it. Harvard Public Affairs & Communications has provided guidance for accessibility on social media in general, and specifically on Twitter, Facebook, Instagram, and LinkedIn.