Multilingual Live Captions

Clevercast lets you add accurate closed captions in multiple languages, through remote transcription and artificial intelligence. The result is a global live stream with captions that can be watched on every device and platform.

Get accurate closed captions in your live stream

Clevercast's unique solution allows you to vastly improve the accuracy and readability of closed captions. By combining artificial intelligence and remote transcription by professional captioners, Clevercast makes it possible to generate highly accurate closed captions for any number of languages. If necessary, you can rely on us to source the right captioners and help you manage your event.

Speech-to-text conversion

Automatic Speech Recognition (ASR) technology is used to automatically generate the captions. Near-realtime human corrections are optional.

Text-to-text translation

AI translation of the initial closed captions into any number of languages. If the initial captions are accurate, the quality of the translated captions will also be good.

Human transcription

Captioners can use a stenotype keyboard or re-speaking software to add live captions in a browser. Near-realtime corrections by a second person are optional.

Best-in-class closed captions

The combination of real-time transcription and machine translation currently offers the best-in-class solution for multilingual closed captions.

It assumes that closed captions for an initial language are generated through real-time transcription. All other languages are automatically translated. Since this is text-to-text translation, the accuracy will be high.

If two transcribers work together, one of them can make corrections to the transcribed captions, just before they are translated and displayed in the live stream. This further increases the accuracy and readability of all captions.

Unique closed caption technology

Clevercast’s unique technology to generate closed captions leverages the latency that comes with the HTTP Live Streaming protocol. By slightly increasing it, Clevercast is able to produce multilingual closed captions that are more accurate and more readable than any other solution on the market.


Automatic conversion and translation

Clevercast ensures that ASR and AI services have a more comprehensive context at the time of speech-to-text conversion and text-to-text translation. This results in significantly greater accuracy.


Longer phrases and sentences in the video player

The Clevercast player shows entire phrases and sentences, which makes the closed captions easier to read and understand. Alternatively, you can choose to show the live audio transcript as rolling text in a separate widget.


Unlimited number of languages

By combining transcription of a single source language with automatic text-to-text translations, you can easily make accurate closed captions available for a large number of languages.


Broad support for real-time transcription

Clevercast supports stenotype keyboards and re-speaking software to facilitate remote transcription and increase accuracy. Our user interfaces allow transcribers to collaborate and obtain optimal results.


Just-in-time correction for flawless captions

By slightly increasing the latency, Clevercast lets remote editors make last-minute adjustments to the closed captions. This way, the source for automatic translation to other languages is also error-free.


Broadcast from everywhere

Clevercast doesn’t require expensive on-site hardware. You can broadcast with any encoder or use an in-browser studio like Streamyard. You can even restream a Zoom, Teams or WebEx meeting.

Closed captions as a Managed Service or SaaS Solution

Clevercast can be used as a SaaS platform. For those who prefer it, we also offer it as a managed service. We partner with leading language service providers to source professional transcribers and correctors. We can provide a transcriber for most languages and subjects, if requested in a timely manner.

Managed Service

We will source transcribers, help you manage the event and provide assistance during the live stream. This way, we ensure an optimal viewing experience with closed captions of the best possible quality.

Self-Service Solution

Clevercast is a SaaS platform, allowing you to hire your own remote transcribers. You can order a plan depending on your needs. We offer premium support for a guaranteed response time and service level.

Choose the best option for your budget

Clevercast has several options to generate closed captions. While hiring real-time transcribers is more expensive than using automatic speech-to-text conversion, it will also lead to more accurate captions. However, since the field of AI is continuously evolving, this may change in the future.

Combination of real-time transcription and machine translation

In this scenario, the closed captions for an initial language are generated through real-time transcription.

Captions for all other languages can be automatically generated through machine translation. Since this is text-to-text conversion, the accuracy of the translations will be high.

Currently, we consider this the best value for money.

Moreover, if two transcribers work together, one of them can make corrections to the transcribed captions just before they are translated and displayed in the live stream. This further increases the accuracy and readability of both the initial and translated captions.

Real-time transcription for multiple languages

Clevercast allows real-time transcription for an unlimited number of languages. This allows you to fully control the quality of each caption language.

Since the cost of hiring transcribers is a very significant part of the total cost, this option will be more expensive than a single transcription language with auto-translation.

Combining multiple transcription languages with (other) machine translation languages is also possible.

Speech-to-text conversion and machine translation, with just-in-time corrections

Clevercast supports the best-in-class Automatic Speech Recognition (ASR) technology. This requires that the floor audio is clear – no background noise, good articulation, lack of dialect – and only contains a single language. Recognition of names, acronyms and technical terms can be improved by providing a speech context.

In this case, Clevercast introduces a two-minute delay to ensure that the ASR engine has the full speech context at its disposal, allowing for a better interpretation of words and sentences.

Even then, the quality of speech-to-text conversion will not be at the level of human transcription. To improve it, Clevercast allows a human editor to make just-in-time corrections to the captions. Unlike real-time transcription, which requires highly skilled professionals, making corrections doesn’t require any specific training.

Captions for additional languages can be automatically generated via text-to-text translation.

Other options

If the live event is (partially) scripted, the transcription interface can also be used to add captions written out in advance (contact us for more info).

If the live stream can be recorded in advance, we strongly recommend using a simulive stream.

Why choose Clevercast?

Extensive feature set

Clevercast has all necessary features for live and on-demand video streaming, management, distribution, monetization and analytics. Whatever your project needs are, we’ve got you covered.

Our customizable HTML5 player can be easily embedded into any device and platform. Just copy the embed code from Clevercast.

Combine with simultaneous interpretation

Closed captions can be added to any live stream with on-site or remote simultaneous interpretation. Viewers can choose both an audio translation and a closed caption language. Transcribers can listen to the audio translations in real time.

Branded multilingual video player

Our responsive HTML5 player can be styled as desired. It allows you to display a poster image before the livestream, show interactive messages in an overlay, and much more. Works perfectly in any browser on desktop and mobile.

Full live stream redundancy

Clevercast supports a fully redundant set-up. Our player automatically detects if the main stream becomes unavailable and switches to the backup stream. This way, the live stream won’t drop out if there is an encoder or local network issue.

Cloud recording

Clevercast makes a server-side recording of the multilingual live stream, which can be downloaded. All caption languages can be downloaded as WebVTT files. This allows you to upload them to YouTube or social media channels for on-demand viewing.

Limit stream accessibility

You can determine who can watch your live stream by configuring whitelists and blacklists for countries, domains and IP addresses. Different settings are possible for each live stream.

Detailed analytics

Our dashboard informs you in real time how many viewers are watching and from which country. After the live stream ends, it provides detailed insights into the behaviour of your viewers.

Conversion to Video on-Demand

The cloud recording of your live stream can easily be converted to Video on-Demand. The VoD player with closed captions can be added to your site or platform by just copying the embed code from Clevercast.

Adaptive Bitrate Streaming

Flawless HD streaming to global audiences

Clevercast starts where other remote interpreting solutions stop. Rather than targeting a limited number of participants in a controlled environment, our live streams are open to an unlimited number of global viewers.

They are delivered through the Akamai CDN with edge servers all over the world.


Clevercast automatically transcodes your broadcast to multiple resolutions for adaptive bitrate streaming.

This allows for full HD streaming, while also delivering smooth streams to viewers with small screens or poor internet connections. Clevercast also supports redundant setups with automatic failover by the player.

Frequently Asked Questions

How many closed caption languages are possible?

There is no technical limit. In practice, the number of languages depends on your plan.

Can Clevercast provide captioners and/or correctors for my event?

Yes. We partner with leading language service providers to source professional captioners and correctors. We can provide a captioner for most languages and subjects, if requested in a timely manner.

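Is it possible to combine closed captions with audio translations in the same live stream?

Yes. Closed captions can be added to any live stream with on-site or remote simultaneous interpretation. Viewers can choose both an audio translation and a closed caption language.
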
Do live streams with closed captions have a delay? Are captions always in sync with the audio?

If you use real-time transcription, the live stream has the standard HLS delay of about 18 seconds (like any other live stream). When you also use near-real-time correction by a second transcriber, the delay increases to 1 minute.

If you use speech-to-text conversion, the live stream has a delay of about 2 minutes, which is necessary to improve accuracy and readability of the captions and allows for near real-time correction.

No matter what the delay is, captions are always in sync with the video and audio of the live stream.
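
As a rough illustration, the delays mentioned above can be written down as a small lookup. This is only a sketch: the mode names and the function are hypothetical and not part of Clevercast, and the values are the approximate figures quoted in this answer.

```python
from enum import Enum


class CaptionMode(Enum):
    """Hypothetical names for the captioning set-ups described above."""
    REAL_TIME_TRANSCRIPTION = "real_time_transcription"
    REAL_TIME_WITH_CORRECTION = "real_time_with_correction"
    SPEECH_TO_TEXT = "speech_to_text"


# Approximate live stream delay per set-up, in seconds (figures from the text).
APPROX_DELAY_SECONDS = {
    CaptionMode.REAL_TIME_TRANSCRIPTION: 18,     # standard HLS delay
    CaptionMode.REAL_TIME_WITH_CORRECTION: 60,   # second transcriber corrects
    CaptionMode.SPEECH_TO_TEXT: 120,             # extra context for ASR
}


def approximate_delay_seconds(mode: CaptionMode) -> int:
    """Return the approximate stream delay for a captioning set-up."""
    return APPROX_DELAY_SECONDS[mode]


print(approximate_delay_seconds(CaptionMode.SPEECH_TO_TEXT))
```

Whatever the value, the delay applies to the stream as a whole, which is why captions stay in sync with the video and audio.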

Can the look and feel of captions in the player be adjusted?

Yes, this is possible to some extent.

Can captions be displayed outside of the video player?

Yes. It is possible to embed a separate widget, together with the player. In the widget, the captions are shown as continuous text.

Are the live captions recorded? Can they be downloaded afterwards?

Yes, all live captions are recorded in the cloud. You can download them afterwards as WebVTT files. Or you can publish a Video on-Demand with captions, hosted by Clevercast.
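
If you want to process a downloaded caption file yourself, the sketch below shows one way to read simple WebVTT cues in Python. It is a minimal example under assumptions: the sample text is invented for illustration, the function names are hypothetical, and the parser only handles basic cues (no styling, regions or cue settings).

```python
import re

# A cue timing line looks like "00:00:01.000 --> 00:00:04.000".
# The hours part is optional in WebVTT.
CUE_TIMING = re.compile(
    r"^((?:\d{2,}:)?\d{2}:\d{2}\.\d{3})\s+-->\s+((?:\d{2,}:)?\d{2}:\d{2}\.\d{3})"
)

# Invented sample content, only for demonstration.
SAMPLE = """WEBVTT

00:00:01.000 --> 00:00:04.000
Welcome to the live stream.

00:00:04.500 --> 00:00:07.250
Captions are available in several languages.
"""


def to_seconds(timestamp: str) -> float:
    """Convert 'hh:mm:ss.ttt' or 'mm:ss.ttt' to seconds."""
    parts = timestamp.split(":")
    hours = int(parts[0]) if len(parts) == 3 else 0
    return hours * 3600 + int(parts[-2]) * 60 + float(parts[-1])


def parse_webvtt(text: str) -> list[tuple[float, float, str]]:
    """Return (start, end, caption text) for each simple cue in the file."""
    cues = []
    # Blocks (header, cues, notes) are separated by blank lines.
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            match = CUE_TIMING.match(line.strip())
            if match:
                start, end = (to_seconds(t) for t in match.groups())
                cues.append((start, end, "\n".join(lines[i + 1:])))
                break
    return cues


for start, end, caption in parse_webvtt(SAMPLE):
    print(f"{start:7.2f} -> {end:7.2f}  {caption}")
```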

What are the costs? How can I order?

If you are using Clevercast as a SaaS solution (without premium support), you can use our price calculator to get a quote for a monthly plan. To order, send us the quote number and we’ll send back an invoice. For more info, see our pricing page.

If you want us to source transcribers and/or correctors, please contact us well in advance and describe your needs in some detail. After a virtual meeting (usually), we will provide you with a quote. The cost depends greatly on the duration of the live stream. Also keep in mind that professional transcribers usually work in pairs.

What are 'auto-captioning hours'? How are they calculated?

Unless you only use manual transcription, your plan will include a number of ‘auto-captioning hours’. They are the sum of the time used for speech-to-text conversion and text-to-text translation. You can calculate them by taking the time you broadcast to Clevercast, multiplied by the number of auto-caption languages. When using our main and backup server simultaneously (for live stream redundancy), the number of hours doubles.

For example, if you broadcast for 1 hour to a single streaming server and have 3 caption languages that are automatically generated, you’ll use 3 hours.

Please note that this is based on the number of hours you broadcast to Clevercast. So the minutes will also count while your event status is ‘preview’ or ‘paused’.
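
The calculation described above can be sketched in a few lines of Python. The function and parameter names are hypothetical and only illustrate the arithmetic; the price calculator remains the reference for actual plans.

```python
def auto_captioning_hours(broadcast_minutes: float,
                          auto_caption_languages: int,
                          redundant: bool = False) -> float:
    """Estimate the auto-captioning hours used by one broadcast.

    broadcast_minutes counts all time you broadcast to Clevercast,
    including time spent in 'preview' or 'paused' status.
    """
    if broadcast_minutes < 0 or auto_caption_languages < 0:
        raise ValueError("minutes and languages must not be negative")
    # Broadcasting to main and backup server at the same time doubles usage.
    servers = 2 if redundant else 1
    return broadcast_minutes * auto_caption_languages * servers / 60


print(auto_captioning_hours(60, 3))                  # 3.0
print(auto_captioning_hours(60, 3, redundant=True))  # 6.0
```

The first call reproduces the example above: 1 hour of broadcasting with 3 automatically generated caption languages uses 3 hours. With live stream redundancy, the same event uses 6 hours.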

What is the accuracy of automatic speech-to-text conversion?

The accuracy of captions generated through speech-to-text conversion mainly depends on:
» The clarity of the audio (e.g. volume, background noises) and of the speaker (e.g. articulation, speed, accent, dialect).
» The speaker’s language: speech-to-text conversion usually works better for more common languages (e.g. English, Spanish).
» Word usage: if many technical or infrequent words are used, this often has a negative effect. Names and abbreviations are also often not recognized.

Speech-to-text conversion currently assumes that the same language is spoken throughout the live stream. If multiple languages are spoken, you will need real-time transcription.

The accuracy can be improved by defining a speech context, which is a set of words and phrases that Clevercast can pass on to the speech-to-text engine. This can be updated during the live stream.

Although Clevercast is the most accurate speech-to-text solution on the market, we strongly recommend using a human editor to make just-in-time corrections. This doesn’t have to be a professional or someone with experience in the matter. Even if the number of corrections is limited (e.g. filler words, self-correction by the speaker, technical terms, names, abbreviations) this can make the captions much easier to understand for your viewers. If you are providing multilingual captions, this also ensures that the source for the translations is more accurate.

We expect the accuracy of speech-to-text to keep improving in the future. The best-in-class Automatic Speech Recognition (ASR) technology used by Clevercast is constantly evolving.

Get Started Now

Start live streaming today with a solution of choice. No credit card required.

Or contact us for more info.