Captions for Videos

Captions convert audio dialogue and sounds into text that appears on a video, synchronized with the audio. They can be added to a pre-recorded video or to live video in real time.

Captions provide equal access and effective communication for Deaf and hard of hearing people, making them essential for full participation. Captions also enhance the experience for non-native speakers, people with learning disabilities, and anyone in a noisy environment.

Accuracy requirement for captions

To ensure the videos you produce and post at Harvard are accessible and conform to WCAG 2.1 guidelines, they should have accurate captions that include proper punctuation, speaker identification, and the identification of sounds other than speech. Professional captioning vendors are excellent resources for providing budget-friendly, high-quality captions with a quick turnaround. While auto-captioning tools continue to make significant progress in accuracy, they are not yet sufficient to meet the accuracy standard of Harvard’s Policy without substantial editing for quality assurance.
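To make the requirements above concrete, here is a minimal sketch of how speaker identification and non-speech sounds might be represented in a caption file. It assumes the WebVTT format, which this page does not name; the `Cue` class, the helper functions, and the sample speaker are hypothetical and only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Cue:
    """One caption cue (hypothetical structure for this sketch)."""
    start: float                   # start time in seconds
    end: float                     # end time in seconds
    text: str                      # punctuated caption text, or a sound such as "[applause]"
    speaker: Optional[str] = None  # speaker name, if the cue is speech


def format_timestamp(seconds: float) -> str:
    """Format seconds as a WebVTT timestamp (hh:mm:ss.ttt)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def to_webvtt(cues: list[Cue]) -> str:
    """Serialize cues as WebVTT text, using voice tags for speaker identification."""
    lines = ["WEBVTT", ""]
    for cue in cues:
        lines.append(f"{format_timestamp(cue.start)} --> {format_timestamp(cue.end)}")
        # A WebVTT voice tag (<v Name>) marks who is speaking.
        lines.append(f"<v {cue.speaker}>{cue.text}" if cue.speaker else cue.text)
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_webvtt([
        Cue(0.0, 2.5, "Welcome, everyone.", speaker="Dr. Rivera"),
        Cue(2.5, 4.0, "[applause]"),
    ]))
```

Writing sounds in square brackets is a common captioning convention rather than something this page specifies; a professional vendor or caption-editing tool would normally produce a file like this for you.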

Captions for pre-recorded video

There are several approaches to adding post-production captions. Select an approach that’s right for your project based on resources available, timeline, and workflow.

At a Glance: Approaches for adding captions to pre-recorded video
Approach | Cost | Accuracy | Effort | Vendors
Professional vendor (recommended) | Fee-for-service | High | Minimal editing required | 3Play Media (Harvard’s preferred vendor) or Rev
Auto-generated and manual editing | Free | Variable | Substantial editing required | YouTube, Microsoft Stream, or IBM Watson

Professional vendor (recommended approach)

Use a captioning service such as 3Play Media (Harvard’s preferred vendor) or Rev, which provides both the captions or transcript and their synchronization to your video. This fee-for-service approach is highly recommended because it provides the greatest accuracy and a quick turnaround time with minimal use of internal resources and effort. Consider providing the vendor with a “cheat sheet” of speaker names, terminology, or other notes that might help produce accurate captions.

See Harvard’s discounted 3Play Media pricing and request an account if you do not already have one. Find step-by-step instructions or watch a short video on how to log in to 3Play with HarvardKey.

Get Started with Harvard's Captioning Vendors

Auto-generated and manual editing

Captions generated by artificial intelligence (AI) vary in accuracy. This approach works best when the video shows a single person speaking clearly with little background noise. Accuracy drops when multiple people speak over one another, when there is background noise in the video, or when specialized terminology is used.
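One rough way to gauge how much editing an auto-generated caption track needs is to compare it against a human-corrected transcript of the same passage. The sketch below computes word error rate (WER), a common speech-recognition metric; using WER here is an assumption, since this page does not say how accuracy is measured, and the function names are hypothetical.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance between the two texts, divided by the reference length.

    reference:  the human-corrected transcript
    hypothesis: the auto-generated captions
    """
    ref = tokenize(reference)
    hyp = tokenize(hypothesis)
    if not ref:
        raise ValueError("reference transcript is empty")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # word missing from the captions
                curr[j - 1] + 1,     # extra word in the captions
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    corrected = "Welcome to the lecture on cell biology."
    auto = "welcome to the lecture on sell biology"
    # One substituted word out of seven reference words.
    print(f"WER: {word_error_rate(corrected, auto):.2%}")
```

This metric deliberately ignores punctuation, speaker identification, and non-speech sounds, so a low error rate does not by itself show that captions meet the accuracy requirement; it only helps estimate how much editing remains.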

To ensure captions meet the accuracy standards of Harvard’s Policy, any captions produced with AI must be edited before the video is posted online. If you use an AI tool (such as YouTube, Microsoft Stream, or IBM Watson), you can then make corrections with a caption-editing tool (such as CADET or Amara) and synchronize the text with your video. While this approach is low-cost, it requires more time for editing and proofreading. Watch this video on how to edit captions in YouTube Studio from Harvard Library’s User Research Center.

Academic video captioning

3Play Media offers discounted pricing on 24- and 48-hour turnaround times for captioning recorded academic content. 3Play Projects can easily be linked to Panopto or Kaltura accounts, allowing professional captions to be applied efficiently to a recorded file. If you are interested, fill out the online form at the 3Play-Harvard Getting Started page and indicate your interest in course video captioning. Questions can be directed to Digital Accessibility Services at digitalaccessibility@harvard.edu.

Captioning of sensitive data

Harvard affiliates who need to caption or transcribe sensitive data can do so through Harvard’s agreement with 3Play Media, which includes provisions for Data Security Level 4, Personally Identifiable Information (PII), and HIPAA. 3Play takes extra precautions to keep this content secure, which results in a slightly higher rate for PII projects, although these projects still receive discounted pricing under the agreement. To request this type of 3Play Project, indicate the requirement on the Getting Started with 3Play form. Learn more about 3Play Media Security or contact Digital Accessibility Services with questions.

Captions for live videos and events

Live captions convert audio dialogue and sounds into text that appears on a video in real time. They are commonly provided for events and meetings that are streamed over the internet or for in-person meetings. Learn more about when to provide live captions for events.

Zoom offers automatically generated live transcripts for every meeting, and meeting hosts are encouraged to turn on live captions in Zoom as an inclusive practice. Note: If live transcripts or captions have been requested as an accommodation for an event, or if Harvard’s Digital Accessibility Policy requires your event to be live captioned, auto-generated captions are not sufficient. A professional vendor such as VITAC must be used to provide live captions in such cases.

VITAC is Harvard’s preferred vendor for professional live captioning. View pricing and request services for professional live captioning with VITAC (HarvardKey required).
