Add Multilingual Closed Captions to your Livestream

Clevercast allows (multilingual) closed captions to be added to an existing live stream in several ways:

  • Transcription in realtime: humans can use Clevercast to type a transcription in realtime via their browser, using a shorthand or regular keyboard, resulting in closed captions in the video player.
  • Speech-to-text conversion with near-realtime correction: the speech in a live video stream can be automatically converted into closed captions by using a speech-to-text service (AI). Before they are shown in the video player (or auto-translated into other languages), the closed captions can be corrected by a human editor using a browser. This is a unique solution. It allows for accurate closed captions in any number of languages without professional transcribers: anyone who speaks the language can do the correction. This way, high-quality closed captions are possible on a limited budget.
  • Automatic multilingual translation: closed captions resulting from transcription and speech-to-text conversion can be automatically translated into closed captions in other languages.

All types of closed captions can be combined with audio translations through remote simultaneous interpretation. Any number of languages is possible. Combining speech-to-text and manual transcription in the same live stream is currently not possible.

Clevercast delivers a high-quality, global live stream to your viewers using adaptive streaming. The embedded player’s closed caption menu allows your viewers to choose their preferred language at any point in the live stream. Clevercast player provides the best possible viewing experience on any device and platform. All closed captions are recorded and can be used for Video on-Demand.

If you require a fully hosted solution, you can also use Clevercast Webinar with multilingual closed captions. If your content is pre-recorded, you may want to use pseudo-live streaming, which also supports closed captions and audio translations.

How does it work?

» Create a T@H Event in Clevercast. Copy the player’s embed code to your site or platform.

» Add a speech-to-text or transcription language. For speech-to-text, you can set a ‘speech context’ with words and phrases likely to be used in the video.

» Optionally, add languages for automatically translated closed captions and/or audio translations.

» Set the event to ‘preview’ and send an RTMP or SRT broadcast to Clevercast. When using manual transcription, use the Transcription Room. When using speech-to-text, you can use the Correction Room for editing captions.

» Press the ‘Start Event’ button when the action is about to start. The stream is automatically displayed in the player. Viewers can select a closed caption language in the CC menu.

Fragment from Sintel by the Blender Foundation. Press the headset menu to change the language.

How does it work?

» Create a T@H Event in Clevercast. Choose the language spoken in the live stream as default language.

» Add the speech-to-text and/or manual transcription language. For speech-to-text, create a ‘speech context’ with names and terminology that are likely to be used.

» Add automatic translations based on the speech-to-text and/or transcription languages.

» Set the event to ‘preview’ and send an RTMP or SRT broadcast to Clevercast. When using manual transcription, interpreters can see the stream and transcribe it.

» Press the ‘Start Event’ button when the action is about to start. Go to the site where you placed the embed code. Use the player’s CC menu to turn on closed captions.

A multilingual closed captions account has the same features as other multilingual Clevercast accounts.

Features specific to closed captions accounts are:
» The ability to configure live events with closed captions.
» The ability to automatically generate closed captions through speech-to-text conversion.
» A Correction Room interface for editing speech-to-text captions before they are shown, and sending hints to the speech-to-text service.
» A Transcription Room interface for human transcription and conversion to closed captions in real time.
» The ability to automatically generate closed captions for extra languages through real time text-to-text conversion.
» The display and language selection of multilingual closed captions in Clevercast player.
» Server-side recording of the video with closed captions and (optionally) conversion to and hosting as Video on-Demand.

Accuracy of closed captions

If closed captions are the result of manual transcription, the quality depends on the human transcriber. If the transcription is accurate, the automatic translations will generally be good as well.

The accuracy of captions generated through (translated) speech-to-text mainly depends on:
» The clarity of the audio and of the speaker (e.g. articulation, speed, accent, dialect)
» The speaker’s language: speech-to-text conversion usually works better for more common languages (e.g. English, Spanish)
» Word usage: if many technical or infrequent words are used, this often has a negative effect. Names and abbreviations are also often not recognized.

The accuracy can be improved by defining a speech context, which is a set of words and phrases that Clevercast can pass on to the speech-to-text engine. This can be updated during the live stream.

We expect the accuracy to keep improving in the future. The language models used by Clevercast are constantly evolving.

Frequently Asked Questions

General questions about platform, player, streaming, plans …?
See our multilingual streaming and FAQ pages.

How many closed caption languages are possible?
There is no technical limit. In practice, the number of languages depends on your plan.

Is it possible to combine closed captions with audio translations in the same live stream?
Yes, if you are using Translate@Home. In that case, viewers can select both an audio language and closed captions in the player.

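Is it possible to have speech-to-text and transcription in the same live stream?
Currently not. Combining speech-to-text and manual transcription in the same live stream is not yet possible.
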
Do the live streams with closed captions have a delay?
Live streams with human transcription have the standard HLS delay of about 18 seconds (like other live streams).
Live streams with speech-to-text have a delay of about 2 minutes, necessary for good speech-to-text results and to allow for human correction.

Can speech-to-text be used if multiple languages are spoken in the floor audio?
Currently not. Speech-to-text services expect the language and dialect to be set in advance. In a future version, Clevercast will make it possible to adjust this during a live stream.

What are the costs? How can I order?
Use our price calculator to get a quote for a monthly plan. To order, send us the quote number and we’ll send back an invoice. For more info, see our pricing FAQ.

What are ‘auto-captioning hours’? How are they calculated?
Unless you only use manual transcription, your plan will include a number of ‘auto-captioning hours’. They are the total time of speech-to-text conversion and auto-translation used. You can calculate this by taking the duration of your broadcast to Clevercast and multiplying it by the number of auto-caption languages. When using our main and backup server simultaneously, the number of hours will double.

For example, if you broadcast for 1 hour to a single streaming server and have 3 caption languages that are automatically generated, you’ll use 3 hours.
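
The calculation described above can be sketched in a few lines of Python. This is only an illustrative estimate based on the rule stated in this FAQ, not an official Clevercast tool: the function name and parameters are hypothetical, and actual billing (for example rounding) may differ.

```python
def auto_captioning_hours(broadcast_minutes: float,
                          auto_caption_languages: int,
                          use_backup_server: bool = False) -> float:
    """Estimate the auto-captioning hours used by one broadcast.

    Hypothetical helper: it applies the rule from the FAQ
    (broadcast time x number of automatically generated caption
    languages, doubled when main and backup server are used together).
    """
    if broadcast_minutes < 0 or auto_caption_languages < 0:
        raise ValueError("minutes and language count must not be negative")
    # Broadcasting to main and backup server simultaneously doubles usage.
    servers = 2 if use_backup_server else 1
    return broadcast_minutes * auto_caption_languages * servers / 60


# 1 hour to a single server with 3 auto-generated caption languages
print(auto_captioning_hours(60, 3))                          # 3.0
# Same broadcast sent to main and backup server simultaneously
print(auto_captioning_hours(60, 3, use_backup_server=True))  # 6.0
```

Languages that are only transcribed manually are not counted in `auto_caption_languages`, since they involve neither speech-to-text conversion nor auto-translation.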

Getting started

Setting up a live stream with closed captions is easy. See our getting started guide for more info.

See also our manuals for real time transcribers and speech-to-text correctors.
