Multilingual Live Captions

Clevercast lets you add very accurate closed captions in multiple languages, through the latest AI technology. Remote human transcription and near real-time correction are also available. The result is a global live stream with captions that can be watched on every device and platform.

Add the most accurate closed captions to your live stream

Clevercast's unique solution allows you to vastly improve the accuracy and readability of closed captions, compared to other solutions on the market. By using artificial intelligence and/or remote transcription, Clevercast makes it possible to generate highly accurate closed captions for any number of languages. If necessary, you can rely on us to source the right captioners and help you manage your event.

Speech-to-text conversion

Automatic Speech Recognition (ASR) technology is used to automatically generate very accurate captions. Near real-time human corrections are optional.

Text-to-text translation

AI translation of the initial closed captions into any number of languages. If the initial captions are accurate, the quality of the translated captions will also be very good.

Human transcription

Captioners can use a stenotype keyboard or re-speaking software to add live captions through their browser. Near real-time corrections by a second person are optional.

Trusted by global brands and companies

AI Powered Live Captioning

Auto-generated livestream captions with the highest accuracy

Our state-of-the-art Automatic Speech Recognition (ASR) models provide you with highly accurate and readable AI-generated captions, which can be translated into any number of languages. All this in real-time.

On the right is an unedited recording of the same live stream with auto-generated captions in Clevercast, YouTube and Vimeo. Want to see it in action? Sign up for a free trial or contact us.

Unique closed caption technology

Clevercast’s unique technology to generate closed captions leverages the latency that comes with the HTTP Live Streaming protocol. By slightly increasing it, Clevercast is able to produce multilingual closed captions that are more accurate and more readable than any other solution on the market.


Automatic speech-to-text conversion and translation

Clevercast ensures that ASR and AI services have a more comprehensive context at the time of speech-to-text conversion and text-to-text translation. This results in significantly greater accuracy.


Longer phrases and sentences in the video player

The Clevercast player shows entire phrases and sentences, which makes the closed captions easier to read and understand. Alternatively, you can choose to show the live audio transcript as rolling text in a separate widget.


Unlimited number of languages

By using automatic text-to-text translations, you can easily make accurate live captions available for almost every language on the planet. If the initial captions are good, the translated captions will also be accurate.


Best-in-class AI and ASR technology

Thanks to extensive R&D, we’ve managed to include the best ASR models for speech-to-text conversion. This guarantees an accuracy that goes far beyond other solutions on the market.


Just-in-time correction for flawless captions

By slightly increasing the latency, Clevercast lets remote editors make last-minute adjustments to the closed captions. This way, the source for automatic translation to other languages will also be error-free.


Broad support for real-time transcription

Clevercast supports stenotype keyboards and re-speaking software to facilitate remote transcription and increase accuracy. Our user interfaces allow transcribers to collaborate and obtain optimal results.

Closed captions as a Managed Service or SaaS Solution

Clevercast can be used as a SaaS platform. For those who prefer it, we also offer it as a managed service. We partner with leading language service providers to source professional transcribers and correctors. We can provide a transcriber for most languages and subjects, if requested in a timely manner.

Managed Service

We will source transcribers (if required), help you manage the event and provide assistance during the live stream. This way, we ensure an optimal viewing experience with closed captions of the best possible quality.

Self-Service Solution

Clevercast is a SaaS platform, allowing you to hire your own remote transcribers. You can order a plan depending on your needs. We offer premium support for a guaranteed response time and service level.

Choose the best option for your budget

Clevercast has several options to generate (very) accurate closed captions. Which option is best for you depends on a number of factors such as your budget, the language(s) spoken in the floor audio, the amount of jargon, the degree of accuracy wanted …

AI-generated captions, optionally translated into multiple languages

Clevercast supports the best-in-class Automatic Speech Recognition (ASR) technology, which results in highly accurate closed captions. Since AI is going through a revolution, we see accuracy improve almost daily.

In this case, Clevercast introduces a two-minute delay (between broadcast and live stream) to ensure that the ASR engine has the full speech context at its disposal, resulting in correct interpretation of words and sentences.

If you want to strive for perfection, you can still correct the AI-generated captions before they are added to the live stream. In general, this isn’t necessary, except maybe for some names and acronyms.

This is the most budget-friendly option since you don’t have to hire professional captioners. The (optional) correction can be done remotely by anyone with a standard keyboard and mouse.

This is currently a great choice for live streams in a popular language (e.g., English, Spanish). For other languages, the accuracy may be lower, but we expect rapid improvement in the near future.

Real-time transcription for multiple languages

It is also possible to hire professional captioners for every caption language.

This is undoubtedly the most expensive option. We do not recommend it – since all other options are much more cost-effective – except perhaps for high-profile events that require full control.

Combining real-time transcription for some languages with machine translation for others is also possible.

When hiring professional transcribers for multiple languages through us, we can offer a volume discount.



Combination of real-time transcription and machine translation

In this scenario, the closed captions for the initial language are generated through real-time transcription by professional captioners. This way, you can ensure that all captions are accurate and easy to read, regardless of the language of the live stream, the presence of complex jargon …

It is the ideal solution for high-profile and/or very technical events such as medical conferences, where one wishes to maintain control over exact wording. Or for events with multiple languages and lots of dialogue.

Captions for extra languages can be automatically generated through machine translation. Since this is text-to-text conversion, the accuracy of the translations will be high.

This option requires a higher budget, since you have to hire professional captioners (either by yourself or through us).

If two transcribers work together, one of them can make corrections to the transcribed captions just before they are translated and displayed in the live stream. This further improves the quality of both the initial and translated captions.

Scripted events and other options

If the live event is (partially) scripted, our remote interface can also be used to add captions to a live stream that are written out in advance. If the speaker improvises, the operator can still make real-time changes to the captions. This is a great option for live streams where the scenario is largely predetermined.

If the live stream is entirely recorded in advance, we strongly recommend using a simulive stream.

Have a different scenario or don’t know which option is best? Don’t hesitate to contact us.


Why choose Clevercast?

Extensive feature set

Clevercast has all necessary features for live and on-demand video streaming, management, distribution, monetization and analytics. Whatever your project needs are, we’ve got you covered.

Our customizable HTML5 player can be easily embedded into any device and platform. Just copy the embed code from Clevercast.

Combine with simultaneous interpretation

Closed captions can be added to any live stream with on-site or remote simultaneous interpretation. Viewers can choose both an audio translation and a closed caption language. Transcribers can listen to the audio translations in real time.

Branded multilingual video player

Our responsive HTML5 player can be styled as desired. It allows you to display a poster image before the livestream, show interactive messages in an overlay, and much more. Works perfectly in any browser on desktop and mobile.

Full live stream redundancy

Clevercast supports a fully redundant set-up. Our player automatically detects if the main stream becomes unavailable and switches to the backup stream. This way, the live stream won’t drop out if there is an encoder or local network issue.

Cloud recording

Clevercast makes a server-side recording of the multilingual live stream, which can be downloaded. All caption languages can be downloaded as WebVTT files. This allows you to upload them to YouTube or social media channels for on-demand viewing.

Limit stream accessibility

You can determine who can watch your live stream by configuring white and blacklists for countries, domains and IP addresses. Different settings are possible for each live stream.

Detailed analytics

Our dashboard informs you in real time how many viewers are watching and from which country. After the live stream ends, it provides detailed insights into the behaviour of your viewers.

Conversion to Video on-Demand

The cloud recording of your live stream can easily be converted to Video on-Demand. The VoD player with closed captions can be added to your site or platform by just copying the embed code from Clevercast.

Adaptive Bitrate Streaming

Flawless HD streaming to global audiences

Clevercast starts where other remote interpreting solutions stop. Rather than targeting a limited number of participants in a controlled environment, our live streams are open to an unlimited number of global viewers.

They are delivered through the Akamai CDN with edge servers all over the world.


Clevercast automatically transcodes your broadcast to multiple resolutions for adaptive bitrate streaming.

This allows for full HD streaming, while also delivering smooth streams to viewers with small screens or poor internet connections. Clevercast also supports redundant setups with automatic failover by the player.

Frequently Asked Questions

What is the accuracy of automatic speech-to-text conversion?

The accuracy of speech-to-text conversion has improved drastically, thanks to the use of the best AI and ASR technology on the market.

The accuracy of captions generated through speech-to-text conversion mainly depends on the language being spoken in the live stream: popular languages are more accurate. Other factors, such as the speaker's speed, articulation and dialect, or the use of jargon, names and acronyms, reduce accuracy only to a very limited extent or only in very extreme cases.

Speech-to-text conversion currently assumes that the same language is spoken throughout the live stream. If multiple languages are spoken, you still need real-time transcription for now. We will soon offer other solutions for this scenario.

Even though the accuracy is very high, there is always room for improvement. You can improve it further by having a human operator make just-in-time corrections to the AI-generated captions. The operator doesn’t have to be a professional or someone with experience in the matter.

We expect the accuracy of speech-to-text to continue to improve in the future. The best-in-class ASR technology used by Clevercast is constantly evolving.

How many closed caption languages are possible?

Unlimited. In practice, it depends on your plan.

Can Clevercast provide captioners and/or correctors for my event?

Yes. We partner with leading language service providers to source professional captioners and correctors. We can provide captioners for most languages and subjects, if requested in a timely manner.

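Is it possible to combine closed captions with audio translations in the same live stream?

Yes. Closed captions can be added to any live stream with on-site or remote simultaneous interpretation. Viewers can choose both an audio translation and a closed caption language.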


Do live streams with closed captions have a delay? Are captions always in sync with the audio?

If you use real-time transcription, the live stream has the standard HLS delay of about 18 seconds (like any other live stream). When you also use near real-time correction by a second transcriber, the delay increases to 1 minute.

If you use speech-to-text conversion, the live stream has a delay of about 2 minutes, which is necessary to improve accuracy and readability of the captions and allows for near real-time correction.

No matter what the delay is, captions are always in sync with the video and audio of the live stream.
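
As a rough illustration, the delays mentioned above can be summarized in a small lookup. This is only a sketch based on the figures in this answer: the function name and the method labels ("transcription", "speech_to_text") are hypothetical and not part of any Clevercast API, and the values are approximate.

```python
def expected_delay_seconds(method: str, with_correction: bool = False) -> int:
    """Approximate live stream delay, using the figures quoted in this FAQ.

    `method` is a hypothetical label: "transcription" for real-time human
    transcription, "speech_to_text" for AI-generated captions.
    """
    if method == "transcription":
        # Standard HLS delay of about 18 seconds; about 1 minute when a
        # second transcriber makes near real-time corrections.
        return 60 if with_correction else 18
    if method == "speech_to_text":
        # About 2 minutes, which already leaves room for correction.
        return 120
    raise ValueError(f"unknown captioning method: {method!r}")


print(expected_delay_seconds("transcription"))                        # real-time transcription only
print(expected_delay_seconds("transcription", with_correction=True))  # with a second transcriber
print(expected_delay_seconds("speech_to_text"))                       # AI-generated captions
```

Whatever the delay, the captions stay in sync with the video and audio; the delay only affects how far the stream lags behind the broadcast.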

Can the look and feel of captions in the player be adjusted?

Yes, this is possible to some extent.

Can captions be displayed outside of the video player?

Yes. It is possible to embed a separate widget, together with the player. In the widget, the captions are shown as continuous text.

Are the live captions recorded? Can they be downloaded afterwards?

Yes, all live captions are recorded in the cloud. You can download them afterwards as WebVTT files. Or you can publish a Video on-Demand with captions, hosted by Clevercast.
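
If you want to process a downloaded caption file yourself, the sketch below shows one way to read cues from WebVTT text in Python. It is a minimal example, not a full WebVTT parser: it ignores cue settings and styling, and the sample captions are made up for illustration rather than taken from a real Clevercast download.

```python
import re

# Matches a WebVTT cue timing line such as "00:00:01.000 --> 00:00:04.000".
# The hours part is optional, as allowed by the WebVTT format.
TIMING = re.compile(
    r"(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})\s+-->\s+"
    r"(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})"
)


def _to_seconds(hours, minutes, seconds, millis):
    return int(hours or 0) * 3600 + int(minutes) * 60 + int(seconds) + int(millis) / 1000


def parse_webvtt(text):
    """Return a list of (start_seconds, end_seconds, caption_text) tuples."""
    cues = []
    # Blocks (header, notes, cues) are separated by blank lines.
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        for index, line in enumerate(lines):
            match = TIMING.match(line.strip())
            if match:
                parts = match.groups()
                start = _to_seconds(*parts[:4])
                end = _to_seconds(*parts[4:])
                cues.append((start, end, "\n".join(lines[index + 1:])))
                break  # blocks without a timing line (e.g. the header) are skipped
    return cues


# Made-up sample content, for illustration only.
SAMPLE = """WEBVTT

00:00:01.000 --> 00:00:04.000
Welcome to the live stream.

00:00:04.500 --> 00:00:07.250
Captions are available in several languages.
"""

for start, end, caption in parse_webvtt(SAMPLE):
    print(f"{start:.2f}s to {end:.2f}s: {caption}")
```

A file that parses cleanly this way should also be suitable for upload to platforms that accept WebVTT captions.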

What are the costs? How can I order?

If you are using Clevercast as a SaaS solution (without premium support), you can use our price calculator to get a quote for a monthly plan. To order, send us the quote number and we’ll send back an invoice. For more info, see our pricing page.

If you want us to source transcribers and/or correctors, please contact us well in advance and describe your needs in some detail. After a virtual meeting (usually), we will provide you with a quote. The cost depends greatly on the duration of the live stream. Also keep in mind that professional transcribers usually work in pairs.

What are 'auto-captioning hours'? How are they calculated?

Unless you only use manual transcription, your plan will include a number of ‘auto-captioning hours’. They are the total time spent on speech-to-text conversion and text-to-text translation. To calculate them, multiply the time you broadcast to Clevercast by the number of auto-caption languages. When you use our main and backup servers simultaneously (for live stream redundancy), the number of hours doubles.

For example, if you broadcast for 1 hour to a single streaming server and have 3 caption languages that are automatically generated, you’ll use 3 hours.

Please note that this is based on the number of hours you broadcast to Clevercast. So the minutes will also count while your event status is ‘preview’ or ‘paused’.
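
The calculation described above can be sketched in a few lines of Python. The function and parameter names are hypothetical, and how partial hours are rounded is not stated here, so this sketch simply returns a fractional number of hours.

```python
def auto_captioning_hours(broadcast_minutes: float,
                          auto_caption_languages: int,
                          redundant: bool = False) -> float:
    """Estimate auto-captioning hours from the rule described in this FAQ.

    broadcast_minutes: total minutes broadcast to Clevercast, including
        time spent in 'preview' or 'paused' status.
    auto_caption_languages: number of automatically generated caption languages.
    redundant: True when broadcasting to the main and backup server at once.
    """
    if broadcast_minutes < 0 or auto_caption_languages < 0:
        raise ValueError("minutes and language count must not be negative")
    servers = 2 if redundant else 1  # redundancy doubles the usage
    return broadcast_minutes * auto_caption_languages * servers / 60


# The example from the text: 1 hour, one server, 3 auto-generated languages.
print(auto_captioning_hours(60, 3))                  # 3.0
# The same broadcast sent to the main and backup server simultaneously.
print(auto_captioning_hours(60, 3, redundant=True))  # 6.0
```

Remember to count the full broadcast time, not just the time the event is live, when estimating the hours you need.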

Get Started Now

Start live streaming today with a solution of choice. No credit card required.

Or contact us for more info.