Captions for Videos
Captions convert audio dialogue and sounds into text that appears on a video, synchronized with the audio. They can be added to a pre-recorded video or to live video in real time.
Captions provide equal access and effective communication for Deaf and hard of hearing people and are essential to their full participation. Captions can also enhance the experience for non-native language speakers, people with learning disabilities, and anyone in a noisy environment.
Accuracy requirement for captions
To ensure the videos you produce and post at Harvard are accessible and conform to WCAG 2.1 guidelines, they should have accurate captions that include proper punctuation, speaker identification, and the identification of sounds other than speech. Professional captioning vendors are excellent resources for providing budget-friendly, high-quality captions with a quick turnaround. While auto-captioning tools continue to make significant progress in accuracy, they are not yet sufficient to meet the accuracy standard of Harvard’s Policy without substantial editing for quality assurance.
Captions for pre-recorded video
There are several approaches to adding post-production captions. Select an approach that’s right for your project based on resources available, timeline, and workflow.
| Approach | Cost | Accuracy | Effort | Vendors |
|---|---|---|---|---|
| Professional vendor (recommended) | Fee-for-service | High | Minimal editing required | 3Play Media (Harvard’s preferred vendor) |
| Auto-generated and manual editing | Free | Variable | Substantial editing required | |
Professional vendor (recommended approach)
Use a captioning service such as 3Play Media (Harvard’s preferred vendor) that provides both the captions or transcript and their synchronization to your video. This fee-for-service approach is highly recommended: it provides the greatest accuracy and a quick turnaround time with minimal use of internal resources and effort. Consider giving the vendor a “cheat sheet” of speaker names, terminology, or other notes that might help produce accurate captions.
See Harvard’s discounted 3Play Media pricing and request an account if you do not already have one. Find step-by-step instructions or watch a short video on how to log in to 3Play with HarvardKey.
Auto-generated and manual editing
Captions generated by artificial intelligence (AI) vary in accuracy. This approach works best when the video shows a single person speaking clearly with little background noise. Accuracy drops if multiple people speak with overlapping dialogue, if there is background noise in the video, or if specialized terminology is used.
To ensure captions meet the accuracy standards of Harvard’s Policy, any captions produced with AI must be edited before the video is posted online. If you use an AI tool (such as YouTube, Microsoft Stream, or IBM Watson), you can then make corrections with a caption-editing tool (such as CADET or Amara) and synchronize the text with your video. While this approach is low-cost, it requires more time for editing and proofreading. Watch this video from Harvard Library's UX team on how to edit captions in YouTube Studio.
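As a rough illustration of the proofreading step, the sketch below scans an exported caption file and lists cues that look unedited, so a human editor knows where to start. It assumes the captions were exported in SubRip (SRT) format, and the names `Cue`, `parse_srt`, and `cues_needing_review` are hypothetical helpers written for this example, not part of any tool named above. The check is a crude heuristic (spoken text with no punctuation at all) and does not replace a full human review.

```python
import re
from dataclasses import dataclass


@dataclass
class Cue:
    index: int
    start: str
    end: str
    text: str


# SRT timing line, e.g. "00:00:01,000 --> 00:00:03,500"
TIMING = re.compile(r"(\d{2}:\d{2}:\d{2},\d{3}) --> (\d{2}:\d{2}:\d{2},\d{3})")


def parse_srt(srt_text: str) -> list[Cue]:
    """Split SRT text into cues; blocks without a valid header are skipped."""
    cues = []
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        if len(lines) < 2 or not lines[0].isdigit():
            continue
        match = TIMING.match(lines[1])
        if match is None:
            continue
        text = " ".join(lines[2:])
        cues.append(Cue(int(lines[0]), match.group(1), match.group(2), text))
    return cues


def cues_needing_review(cues: list[Cue]) -> list[tuple[int, str]]:
    """Return (cue index, reason) pairs for cues that look unedited."""
    flagged = []
    for cue in cues:
        # Ignore bracketed sound descriptions such as "[applause]".
        spoken = re.sub(r"\[[^\]]*\]", "", cue.text).strip()
        if not cue.text:
            flagged.append((cue.index, "empty cue"))
        elif spoken and not re.search(r"[.,?!;:]", spoken):
            flagged.append((cue.index, "no punctuation"))
    return flagged


# Hypothetical sample data for illustration only.
SAMPLE = """1
00:00:01,000 --> 00:00:03,500
[upbeat music]

2
00:00:04,000 --> 00:00:07,000
welcome to the library orientation

3
00:00:07,500 --> 00:00:10,000
DR. LEE: Thank you for joining us.
"""

if __name__ == "__main__":
    for index, reason in cues_needing_review(parse_srt(SAMPLE)):
        print(f"Cue {index}: {reason}")
```

In the sample, only the second cue is flagged, because it contains spoken text with no punctuation. A cue that passes this check can still contain misheard words or lack speaker identification, so every cue should still be read against the audio.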
Academic video captioning
3Play Media offers discounted pricing on 24- and 48-hour turnaround times for captioning recorded academic content. 3Play Projects can be linked to Panopto or Kaltura accounts, so professional captions can be applied efficiently to a recorded file. If you are interested, fill out the online form at the 3Play-Harvard Getting Started page and indicate your interest in course video captioning. Questions can be directed to Digital Accessibility Services at digitalaccessibility@harvard.edu.
Captioning of sensitive data
Harvard affiliates who need to caption or transcribe sensitive data can do so via Harvard’s agreement with 3Play Media, which includes provisions for Data Security Level 4, Personally Identifiable Information (PII), and HIPAA. 3Play takes extra precautions to keep this content secure, so PII projects are billed at a slightly higher rate, though they still receive discounted pricing under the agreement. To request this type of 3Play Project, indicate the requirement on the Getting Started with 3Play form. Learn more about 3Play Media Security or contact Digital Accessibility Services with questions.
Captions for live videos and events
Live captions convert audio dialogue and sounds into text that appears on a video in real time. They are commonly provided for events and meetings that are streamed over the internet or for in-person meetings. Learn more about when to provide live captions for events.
Zoom offers automatically generated live transcripts for every meeting, and meeting hosts are encouraged to turn on live captions in Zoom as an inclusive practice. Note: If captions have been requested as an accommodation for an event, or if Harvard's Digital Accessibility Policy requires your event to be live captioned, auto-generated captions are not sufficient. A professional vendor should be used to provide live captions in such cases.
View pricing and request services for professional live captioning (HarvardKey required)