Multilingual Live Captions
Clevercast lets you add very accurate closed captions in multiple languages, through the latest AI technology. Remote human transcription and near real-time correction are also available. The result is a global live stream with captions that can be watched on every device and platform.
Revolutionizing AI powered live captions
Clevercast's unique solution allows you to vastly improve the accuracy and readability of captions during a live stream, compared to all other solutions on the market. For the first time, you can simply rely on AI to add highly accurate captions to your live stream. You can even reach 100% accurate captions with minimal effort, through our innovative real-time correction interface. We can also provide captioners and correctors for you.
AI Powered Live Captions
Automatic Speech Recognition (ASR) technology is used to generate very accurate captions. A correction interface or service is available to obtain 100% accuracy.
AI Translation of Live Captions
AI translation of the initial closed captions into any number of languages. If the initial captions are accurate, quality of the translated captions will also be excellent.
Real-time Human Transcription
Captioners can use our interface with a stenotype keyboard or re-speaking software to add live captions in their browser. Corrections by a second person are optional.
AI Powered Live Captioning
Auto-generated livestream captions with the highest accuracy
Our state-of-the-art Automatic Speech Recognition (ASR) models provide you with highly accurate and readable AI-generated captions, which can be translated into any number of languages. All this in real-time.
Unique closed caption technology
Clevercast’s revolutionary live captions technology leverages the latency that comes with the HTTP Live Streaming protocol. By slightly increasing it, Clevercast has set a new standard for multilingual closed captions that are automatically generated and translated by AI engines.
Automatic speech-to-text conversion and translation
Clevercast ensures that ASR and AI services have a more comprehensive context at the time of speech-to-text conversion and text-to-text translation. This results in significantly greater accuracy.
Intelligent and innovative pre- and post-processing
Clevercast sends optimal audio input to the AI engines (e.g. fragment start/end, sound quality, background noise) and intelligently processes the output so it is converted into easily readable and synchronous captions.
Longer phrases and sentences in the video player
The Clevercast player shows entire phrases and sentences, which makes the closed captions easier to read and understand. Alternatively, you can choose to show the live audio transcript as rolling text in a separate widget.
Best-in-class AI and ASR technology
Thanks to extensive R&D, we’ve managed to include the best ASR models for speech-to-text conversion. This guarantees an accuracy that goes far beyond other solutions on the market.
Unlimited number of languages
By using automatic text-to-text translations, you can easily make accurate live captions available for almost every language on the planet. If the initial captions are good, the translated captions will also be accurate.
Just-in-time correction for flawless captions
By slightly increasing the latency, Clevercast lets remote editors make last-minute adjustments to the closed captions. This way, the source for automatic translation to other languages will also be error-free.
Closed captions as a Managed Service or SaaS Solution
Clevercast can be used as a SaaS platform. For those who prefer it, we also offer it as a managed service. We partner with leading language service providers to source professional transcribers and AI correctors. We can provide them for most languages and subjects, if requested in a timely manner.
Clevercast is a SaaS platform, allowing you to use our AI solutions independently. You can also hire correctors and captioners yourself. We offer premium support for a guaranteed response time and service level.
We can source AI correctors and/or captioners, help you manage the event and provide assistance during the live stream. This way, we ensure an optimal viewing experience with closed captions of the best possible quality.
Choose the best option for your budget
Clevercast has several options to generate very accurate closed captions. Which option is best for you depends on a number of factors such as your budget, the language(s) spoken in the floor audio and the degree of accuracy you want.
AI powered live captions
Clevercast supports the best-in-class Automatic Speech Recognition (ASR) technology, which results in 99+% accurate closed captions for frequently used languages.
To achieve 99+% accuracy, Clevercast stretches the HLS latency slightly to ensure that the AI engine has the maximum speech context at its disposal. This is the most budget-friendly option.
This will be more than adequate in most cases, unless you are going for absolute perfection or multiple languages are spoken in your live stream.
Real-time correction for AI powered captions
If you want 100% accurate closed captions, we recommend using AI combined with our real-time correction interface (e.g. to correct badly pronounced names). This lets you read and correct the captions before they are added to the live stream.
There is no additional cost for using the corrector interface. It can be used remotely in a browser by anyone with a standard keyboard and mouse. You don’t need any prior knowledge.
Of course, we can also hire professional correctors for you, so you don’t have to worry about anything. Their cost is lower than that of live captioners, who have to go through much longer and more difficult training.
Scripted events and other options
If the live event is (partially) scripted, our remote interface can also be used to add captions to a live stream that are written out in advance. If the speaker improvises, the operator can still make real-time changes to the captions. This is a great option for live streams where the scenario is largely predetermined.
If the live stream is entirely recorded in advance, we strongly recommend using a simulive stream.
Do you have a different scenario, or are you unsure which option is best? Don’t hesitate to contact us.
AI powered translation to multiple languages
In case of multilingual captions, the captions for an initial language are either generated through AI or through real-time human captioning. All other languages are the result of real-time AI translation, using the initial captions as its source.
The accuracy of the additional languages mostly depends on the accuracy of the source transcription. Since this is AI text-to-text conversion, an accurate source will result in 99.9+% accurate translations.
Note: the source of the AI translations is the transcription after real-time correction. So if you use our correction interface, this will also have an impact on the accuracy of the translated captions.
Remote human captioning
In this scenario, the closed captions are generated by professional captioners through Clevercast’s interface for real-time transcription. This way, the captioners ensure that the closed captions are accurate.
Usually, two captioners will work together on a single language. In that case, Clevercast allows one of them to make corrections to the other’s transcription before the captions are used and (optionally) translated.
You can hire the captioners yourself or through us. This is the most expensive option, as the cost of hiring captioners is usually a lot higher than the cost of our SaaS platform. It is typically used in case of high-profile events that require full control.
Note: it is possible to combine real-time transcription for some languages with AI translation for others. When hiring captioners for multiple languages through us, we can offer a volume discount.
Why choose Clevercast?
Extensive feature set
Clevercast has all necessary features for live and on-demand video streaming, management, distribution, monetization and analytics. Whatever your project needs are, we’ve got you covered.
Our customizable HTML5 player can be easily embedded into any device and platform. Just copy the embed code from Clevercast.
Combine with simultaneous interpretation
Closed captions can be added to any live stream with on-site or remote simultaneous interpretation. Viewers can choose both an audio translation and closed captions. Transcribers can listen to the audio translations in real time.
Branded multilingual video player
Our responsive HTML5 player can be styled as desired. It allows you to display a poster image before the livestream, show interactive messages in an overlay, and much more. Works perfectly in any browser on desktop and mobile.
Full live stream redundancy
Clevercast supports a fully redundant set-up. Our player automatically detects if the main stream becomes unavailable and switches to the backup stream. This way, the live stream won’t drop out if there is an encoder or local network issue.
Clevercast makes a server-side recording of the multilingual live stream, which can be downloaded. All caption languages can be downloaded as WebVTT files. This allows you to upload them to YouTube or social media channels for on-demand viewing.
Limit stream accessibility
You can determine who can watch your live stream by configuring white and blacklists for countries, domains and IP addresses. Different settings are possible for each live stream.
Our dashboard informs you in real time how many viewers are watching and from which country. After the live stream ends, it provides detailed insights into the behaviour of your viewers.
Conversion to Video on-Demand
The cloud recording of your live stream can easily be converted to Video on-Demand. The VoD player with closed captions can be added to your site or platform by just copying the embed code from Clevercast.
Adaptive Bitrate Streaming
Flawless HD streaming to global audiences
Clevercast starts where other remote interpreting solutions stop. Rather than targeting a limited number of participants in a controlled environment, our live streams are open to an unlimited number of global viewers.
They are delivered through the Akamai CDN with edge servers all over the world.
Clevercast automatically transcodes your broadcast to multiple resolutions for adaptive bitrate streaming.
This allows for full HD streaming, while also delivering smooth streams to viewers with small screens or poor internet connections. Clevercast also supports redundant setups with automatic failover by the player.
Frequently Asked Questions
What is the accuracy of automatic speech-to-text conversion?
The accuracy of speech-to-text conversion has improved drastically, thanks to the use of the best AI and ASR technology on the market.
AI powered captions are 99+% accurate for commonly used languages like English, Spanish, French, German, Italian, Portuguese, Dutch, Japanese and others. For lesser-used languages, the accuracy will be somewhat lower.
Factors such as speaking speed, articulation and dialect of the speaker or word usage like jargon and acronyms only reduce accuracy in extreme cases (and only to a very limited extent).
Even though the accuracy is very high, there is always room for improvement. You can do this by using a human operator to make just-in-time corrections to the AI-generated captions. The operator doesn’t have to be a professional or someone with experience in the matter.
Note: Clevercast supports AI powered captions for live streams with multiple languages. However, this does affect accuracy. In such cases, we strongly recommend using our real-time correction interface to correct inaccuracies.
We expect the accuracy of speech-to-text to continue to improve in the near future. The best-in-class ASR technology used by Clevercast is constantly evolving.
How many closed caption languages are possible?
There is no technical limit. In practice, the number of languages depends on your plan.
Can Clevercast provide captioners and/or correctors for my event?
Yes. We partner with leading language service providers to source professional captioners and correctors. We can provide them for most languages and subjects, if requested in a timely manner.
Is it possible to combine closed captions with audio translations in the same live stream?
Yes. Closed captions can be added to any live stream with on-site or remote simultaneous interpretation, and viewers can choose both an audio translation and closed captions.
Do live streams with closed captions have a delay? Are captions always in sync with the audio?
If you use AI powered captions, the live stream has a delay of about one minute, compared with the standard HTTP Live Streaming (HLS) delay of about 18 seconds. The extra delay is necessary to improve the accuracy and readability of the captions, and it allows for near real-time correction.
If you use human transcription, the live stream has the standard HLS delay of about 18 seconds (like any other live stream). When you also use near real-time correction by a second transcriber, the delay increases to about one minute.
No matter what the delay is, captions are always in sync with the video and audio of the live stream.
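As a rough illustration of the delays described above, the following Python sketch maps a captioning set-up to its approximate stream delay. The names `CaptionMode` and `approximate_delay_seconds` are hypothetical and not part of any Clevercast API; the values are simply the approximate figures quoted in this answer.

```python
from enum import Enum


class CaptionMode(Enum):
    """How the source captions are produced (hypothetical names)."""
    AI = "ai"
    HUMAN = "human"


# Approximate figures taken from the FAQ text; real delays may vary.
STANDARD_HLS_DELAY_S = 18   # "about 18 seconds"
EXTENDED_DELAY_S = 60       # "about one minute"


def approximate_delay_seconds(mode: CaptionMode,
                              second_person_correction: bool = False) -> int:
    """Return the approximate live stream delay in seconds.

    AI captions always use the extended delay. Human transcription uses
    the standard HLS delay unless a second transcriber corrects the
    captions in near real time.
    """
    if mode is CaptionMode.AI:
        return EXTENDED_DELAY_S
    if second_person_correction:
        return EXTENDED_DELAY_S
    return STANDARD_HLS_DELAY_S


if __name__ == "__main__":
    for mode in CaptionMode:
        for corrected in (False, True):
            delay = approximate_delay_seconds(mode, corrected)
            print(f"{mode.value}, correction={corrected}: ~{delay} s")
```

Whatever value applies, the captions are delayed together with the video and audio, so they stay in sync.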
Can the look and feel of captions in the player be adjusted?
Yes, this is possible to some extent.
Can captions be displayed outside of the video player?
Yes. It is possible to embed a separate widget, together with the player. In the widget, the captions are shown as continuous text.
Are the live captions recorded? Can they be downloaded afterwards?
Yes, all live captions are recorded in the cloud. You can download them afterwards as WebVTT files. Or you can publish a Video on-Demand with captions, hosted by Clevercast.
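If you want to process a downloaded caption file yourself, a minimal Python sketch for reading WebVTT cues could look like this. It uses only the standard library and handles just the basic cue layout (an optional identifier line, a timing line, then text); it is an illustration rather than a complete WebVTT parser, and the names `Cue` and `parse_webvtt` as well as the sample captions are made up for the example.

```python
import re
from typing import List, NamedTuple


class Cue(NamedTuple):
    start: float  # seconds
    end: float    # seconds
    text: str


# Timing line such as "00:00:01.000 --> 00:00:04.000" (hours are optional).
_TIMING = re.compile(
    r"(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})\s+-->\s+"
    r"(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})"
)


def _to_seconds(hours, minutes, seconds, millis) -> float:
    return int(hours or 0) * 3600 + int(minutes) * 60 + int(seconds) + int(millis) / 1000


def parse_webvtt(content: str) -> List[Cue]:
    """Extract cues from WebVTT text; blocks without a timing line are skipped."""
    cues = []
    normalized = content.replace("\r\n", "\n").strip()
    for block in re.split(r"\n\s*\n", normalized):
        lines = block.split("\n")
        for index, line in enumerate(lines):
            match = _TIMING.match(line.strip())
            if match:
                groups = match.groups()
                text = "\n".join(lines[index + 1:]).strip()
                cues.append(Cue(_to_seconds(*groups[:4]), _to_seconds(*groups[4:]), text))
                break
    return cues


SAMPLE = """WEBVTT

00:00:01.000 --> 00:00:04.000
Welcome to the live stream.

2
00:00:04.500 --> 00:00:07.250
Thank you for joining us.
"""

if __name__ == "__main__":
    for cue in parse_webvtt(SAMPLE):
        print(f"{cue.start:.3f} -> {cue.end:.3f}: {cue.text}")
```

For production use, a dedicated WebVTT library is a safer choice, since the format also allows notes, styles, regions and cue settings that this sketch ignores.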
What are the costs? How can I order?
If you are using Clevercast as a SaaS solution (without premium support), you can use our price calculator to get a quote for a monthly plan. To order, send us the quote number and we’ll send back an invoice. For more info, see our pricing page.
If you want us to find captioners and/or correctors, please contact us well in advance and describe your needs in some detail. Usually after a virtual meeting, we will provide you with a quote. The cost depends greatly on the duration of the live stream. Also keep in mind that professional captioners usually work in pairs.
What are 'auto-captioning hours'? How are they calculated?
Auto-captioning hours are used when closed captions are generated by speech-to-text conversion or text-to-text translation. Usage doesn’t depend on the number of AI captioning languages. For example, if you broadcast for 1 hour to a single streaming server and have 3 caption languages that are automatically generated, you will use 1 auto-captioning hour.
Please note that this is based on the number of hours you broadcast to Clevercast. So auto-captioning hours will also be counted while your event status is ‘preview’ or ‘paused’.
When you broadcast to our main and backup server simultaneously (for live stream redundancy) the number of auto-captioning hours will double.
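The rules above can be summarized in a small Python sketch. This is only an illustration of the arithmetic as described in this answer, not an official billing formula; the function name `auto_captioning_hours` and its parameters are hypothetical.

```python
def auto_captioning_hours(broadcast_hours: float,
                          ai_caption_languages: int,
                          redundant: bool = False) -> float:
    """Estimate the auto-captioning hours used by one broadcast.

    Assumptions taken from the FAQ text:
    - usage is based on the time you broadcast to Clevercast, including
      time spent in 'preview' or 'paused' status;
    - the number of AI caption languages does not change usage, as long
      as at least one language is generated or translated automatically;
    - broadcasting to the main and backup server simultaneously doubles usage.
    """
    if broadcast_hours < 0 or ai_caption_languages < 0:
        raise ValueError("hours and language count must be non-negative")
    if ai_caption_languages == 0:
        return 0.0  # no automatic captions, so no auto-captioning hours
    servers = 2 if redundant else 1
    return broadcast_hours * servers


if __name__ == "__main__":
    # 1 hour, 3 automatically generated languages, single server
    print(auto_captioning_hours(1.0, 3))
    # same broadcast sent to main and backup server
    print(auto_captioning_hours(1.0, 3, redundant=True))
```

Here `broadcast_hours` should cover the whole time the encoder is sending to Clevercast, not just the time the event is live for viewers.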