Add Subtitles with Speech-to-Text

With Cinamaker, you can add Subtitles to your multi-camera video recordings and live streams using AI-powered Speech-to-Text. Customize Subtitles while you are live or in the in-app editor: change position, fonts, text colors, background, speed, and more.


How it works

This document describes how to create and include Subtitles using Speech-to-Text from the Cinamaker Media panel.

- Add Subtitles as a Media object in your production

- Configure Subtitles like any Cinamaker Media object

- Begin a session with one or more audio sources selected for Speech-to-Text transcription

- Show or hide the Speech-to-Text transcription as Subtitles during your session by toggling the eyeball icon next to the Subtitles media object


Add Subtitles from the Media Panel

1. Launch Cinamaker Director Studio and start a New Session 

2. From the Main interface go to the Media Tab and click/tap Add (you will see a list of available Media options)

3. Select Subtitles from the list of Media options

4. The Subtitles configuration popup will appear



See the sections below for how to configure Sources, Languages, Font, Fill, and Advanced settings.


5. When the configuration is finished, click/tap Save (upper-right corner of the popup window)


6. When you are ready to activate Subtitles through Speech-to-Text, tap the eyeball icon next to the Subtitles media object (tap it again to disable them). Cinamaker will begin transcribing the audio from the selected audio source(s).

7. Subtitles will populate the Program screen as your session proceeds and the dialogue flows.


Speech-to-Text Configurations

Cinamaker Speech-to-Text can be configured in a number of ways. Read the following to determine the best way to create Subtitles for your unique needs.


NOTE: The quality of the audio input is a key factor in successful Speech-to-Text recognition. For optimal results, monitor the volume levels and each person's proximity to their microphone. Headphones are the best way to monitor audio.


Audio Source Configuration:

Mixed audio: All Audio Sources

By default, speech recognition and transcription will be applied to each audio source and generate Subtitles in real time.


Individual audio: Selected Audio Sources

Alternatively, you can select individual audio sources. Transcription is then applied only to those sources, generating Subtitles in real time.

- Speech recognition is applied to all audio/video sources, including pre-recorded video objects, only when Mixed audio is selected.

- When the volume level of an audio/video source is 0 (the source is muted), that source is not heard in the live audio mix but is still transcribed.

- You cannot select Mixed audio while individual Sources are selected.


- A list of available audio/video sources is displayed under Sources. By default, speech recognition and transcription are applied to every audio source, generating Subtitles in real time.

- When one or more audio sources are selected, speech recognition, transcription, and Subtitles are applied only to the selected audio/video source(s).

- When one or more Sources are selected, pre-recorded video objects are not transcribed.

- To transcribe a pre-recorded video that is played as a Media object, select Mixed audio.
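
The selection rules above can be sketched in a few lines of Python. This is only an illustration of the documented behavior, not Cinamaker code: the Source class, its fields, and the sources_to_transcribe function are hypothetical names. It follows the rule that pre-recorded video is transcribed only with Mixed audio, and it assumes that volume plays no part in deciding what is transcribed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    """Hypothetical stand-in for an audio/video source in a session."""
    name: str
    prerecorded: bool = False  # True for a pre-recorded video Media object
    volume: float = 1.0        # 0.0 means muted in the live audio mix


def sources_to_transcribe(sources, selected_names=None):
    """Return the names of the sources that would be transcribed.

    selected_names=None (or empty) models "Mixed audio: All Audio Sources".
    A non-empty collection models "Individual audio: Selected Audio Sources".
    """
    if not selected_names:
        # Mixed audio: every source, including pre-recorded video objects.
        # Assumption: volume is not modeled for Mixed audio in this sketch.
        return [s.name for s in sources]
    chosen = set(selected_names)
    # Individual audio: only the selected sources, never pre-recorded video.
    # Volume is ignored on purpose: a muted source is still transcribed.
    return [s.name for s in sources if s.name in chosen and not s.prerecorded]


session = [
    Source("Host mic"),
    Source("Guest mic", volume=0.0),         # muted, but still transcribed
    Source("Intro clip", prerecorded=True),  # needs Mixed audio
]
print(sources_to_transcribe(session))
# ['Host mic', 'Guest mic', 'Intro clip']
print(sources_to_transcribe(session, ["Guest mic", "Intro clip"]))
# ['Guest mic']
```

In the second call, the muted guest microphone is still transcribed, while the pre-recorded clip is skipped because individual Sources are selected.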


Display title

When this option is selected, the Source name is displayed before the transcribed text on the Main output screen once the configuration is saved. It is not visible on the Preview screen.



Languages

Default language: English (United States).


Supported voice recognition languages: Danish, Dutch, English, English (Australia), English (United Kingdom), English (India), English (New Zealand), English (United States), Flemish, French, French (Canada), Italian, German, Norwegian, Polish, Portuguese, Portuguese (Brazil), Portuguese (Portugal), Spanish, Spanish (Latin America), Ukrainian.


The language is not detected automatically.



NOTE: You can assign a different language to each Subtitles object.


Fonts & Fills

Font

Default font size: 10


Fill

Fill options are the same as for Text Media objects.


Advanced

Changes to the Advanced configuration become visible after you save them and return to the Main screen.



Subtitles direction

Default value: Up. Transcribed blocks of text default to the Up direction.



Display time per average word

The display time for a single transcribed word can be configured.

Values: 0.01s to 0.09s


Minimum display time

The minimum display time for a phrase can be configured between 2.0s and 10.0s.
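
To get a feel for how these two settings might interact, here is a rough Python sketch. It is an assumption, not Cinamaker's documented behavior: it supposes that a phrase stays on screen for its word count multiplied by the per-word display time, but never for less than the minimum display time. The function and constant names are hypothetical. Only the value ranges come from this article.

```python
# Ranges taken from the Advanced settings described above (in seconds).
PER_WORD_RANGE = (0.01, 0.09)    # "Display time per average word"
MIN_DISPLAY_RANGE = (2.0, 10.0)  # "Minimum display time"


def check_range(value, bounds, label):
    """Raise ValueError if value falls outside the documented range."""
    low, high = bounds
    if not low <= value <= high:
        raise ValueError(f"{label} must be between {low}s and {high}s, got {value}s")
    return value


def phrase_display_time(phrase, per_word, min_display):
    """Estimate how long a phrase stays on screen, in seconds.

    ASSUMPTION (not confirmed by the product documentation):
    duration = max(min_display, word_count * per_word)
    """
    check_range(per_word, PER_WORD_RANGE, "Display time per average word")
    check_range(min_display, MIN_DISPLAY_RANGE, "Minimum display time")
    word_count = len(phrase.split())
    return max(min_display, word_count * per_word)


print(phrase_display_time("Welcome to the show", per_word=0.09, min_display=2.0))
# 2.0
```

Under this assumed formula, the minimum display time governs short phrases, and the per-word value matters only for long ones.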