Auto-generated captions with manual editing
Captions generated by Artificial Intelligence (AI) provide varying degrees of accuracy. This approach works best when the video features a single person speaking clearly with little background noise. Accuracy drops when multiple speakers overlap, when there is background noise in the video, or when specialized terminology is used.
To ensure captions meet the accuracy standards of Harvard’s Policy, any captions produced with AI must be edited before the video is posted online. If you use an AI tool (such as YouTube, Microsoft Stream, or IBM Watson), you can then make corrections with a caption-editing tool (such as CADET or Amara) and synchronize the text with your video. While this approach is low-cost, it requires more time for editing and proofreading. Watch this video on how to edit captions in YouTube Studio from Harvard Library’s User Research Center.
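As a rough illustration of how edited captions stay synchronized with a video, a corrected caption file (tools such as YouTube, CADET, and Amara can export standard formats like WebVTT) can be attached to a web video with a standard HTML track element. This is a minimal sketch; the file names are placeholders, not real Harvard assets:

    <!-- Illustrative only: replace the placeholder file names with your own video and caption files. -->
    <video controls width="640">
      <source src="lecture.mp4" type="video/mp4">
      <!-- The edited caption file carries the timestamps that keep the text in sync with the video. -->
      <track kind="captions" src="lecture-captions.vtt" srclang="en" label="English" default>
    </video>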
Captions for live videos
Live captions convert audio dialogue and sounds into text that appears on a video in real time. They are commonly provided for events and meetings, whether streamed over the internet or held in person. Learn more about providing live captions for events.
VITAC is Harvard’s preferred vendor for professional live captioning. View pricing and request services for professional live captioning with VITAC (HarvardKey required).
Transcripts
Audio files posted on Harvard websites should include a transcript: a non-synchronized text version of the audio content provided adjacent to the video or audio file. The transcript should include speaker identification and non-speech sounds, such as [Professor], [doorbell rings], [cough], [jazz music], etc. Like captions, transcripts provide equal access for people with hearing impairments. They also benefit non-native speakers of the language and anyone in a noisy environment, and they have the added benefit of being searchable.
Transcripts for audio files can be produced using the same methods as captioning: either through a professional vendor or with AI tools and manual editing. The transcript can be linked or displayed alongside the audio file. Transcripts must meet the same accuracy requirements as captions and should include proper punctuation, speaker identification, and the identification of non-speech sounds.
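As a minimal sketch of linking or displaying a transcript alongside an audio file, a page might embed the audio player with the transcript link directly beneath it. The file names here are placeholders:

    <!-- Illustrative only: file names are placeholders. -->
    <audio controls src="episode-12.mp3"></audio>
    <p>
      <a href="episode-12-transcript.html">Read the full transcript</a>
    </p>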
Audio Description
Videos should be carefully scripted or edited so that all important content is accessible through the audio track. Common examples include title cards and speaker identification.
If this is not possible, any important information that is presented only visually can be described in a separate narration track using a technique called audio description. This is usually a spoken audio track that describes what is visually happening in a video (think of it as alternative text for video), and it plays during the natural pauses in the program's audio. Audio description provides equal access to people with vision loss as well as people with cognitive disabilities. Because audio description is much more expensive than captioning, and many of the most prevalent needs can be met through standard scripting and editing workflows, we recommend planning for these methods before seeking to outsource audio description. See this example of a described video. 3Play Media and VITAC both provide audio description services.
Social Media Accessibility Best Practices
Social media accounts at Harvard should make every effort to ensure the content they share is accessible to all audiences. Accessible content is not only more inclusive; Harvard’s flagship channels also prioritize promoting content that is accessible. Harvard Public Affairs & Communications has provided guidance for accessibility on social media in general, and specifically for Twitter, Facebook, Instagram, and LinkedIn.