Improve Speech-to-Text

  • Problem
  • Updated 5 years ago
Speech-to-Text is a great feature. I have a video that is 35 minutes long with quality audio from a live Internet event. The text conversion is far from accurate (about 6 on a 10-point scale). Unfortunately, with a video this long it does not make sense to edit every line by hand. I would like to see the speech-to-text engine improved.

It can be trained for your voice, but not others from pre-recorded

rgletter

  • 19 Posts
  • 3 Reply Likes

Posted 7 years ago


Dave Gehman

  • 76 Posts
  • 5 Reply Likes
AFAIK, this uses Windows Speech Recognition... assuming you're not on a Mac... no idea what Apple uses.

What version of Windows, assuming it's Windows, are you using?

By happenstance a few days ago, I tried applying it to narration by the president of our company, and Windows Speech Recognition in Windows 7 was almost flawless. (My computer, of course, is not trained for Scott's voice at all, which is what made the result very surprising).

kayakman, Champion

  • 6863 Posts
  • 2188 Reply Likes
You may be able to significantly improve the accuracy by doing STT on short, phrase-based audio segments instead of the whole project.

In CS8, this requires that the audio clip be defined by splits and then selected; I use the right-click context menu to do this, with excellent results.
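
For anyone who wants to see the "short, phrase-based segments" idea outside of Camtasia, here is a minimal Python sketch that finds split points at pauses. It is only an illustration: the function name `split_on_silence`, the threshold, and the sample values are all made up for the example, and Camtasia itself relies on manual splits on the timeline, not on anything like this.

```python
def split_on_silence(samples, threshold=0.02, min_silence=4):
    """Return (start, end) index pairs for runs of speech.

    A run ends once `min_silence` consecutive samples fall below
    `threshold`. `end` is exclusive. All names and defaults here are
    hypothetical, chosen only for illustration.
    """
    segments = []
    start = None      # index where the current speech run began
    quiet_run = 0     # consecutive quiet samples seen inside a run
    for i, sample in enumerate(samples):
        if abs(sample) >= threshold:
            if start is None:
                start = i
            quiet_run = 0
        elif start is not None:
            quiet_run += 1
            if quiet_run >= min_silence:
                # Close the segment at the first quiet sample of the pause.
                segments.append((start, i - quiet_run + 1))
                start = None
                quiet_run = 0
    if start is not None:
        # Audio ended mid-run; trim any trailing quiet samples.
        segments.append((start, len(samples) - quiet_run))
    return segments


# Toy amplitude values: two "phrases" separated by a pause.
audio = [0, 0, 0.5, 0.6, 0.4, 0, 0, 0, 0, 0.3, 0.2, 0, 0]
phrases = split_on_silence(audio, threshold=0.02, min_silence=3)
```

Each pair in `phrases` marks one candidate clip to run STT on by itself; a short pause inside a phrase (fewer than `min_silence` quiet samples) does not cause a split.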

Michael Rana II

  • 18 Posts
  • 0 Reply Likes
In the short time I've been using Camtasia, I have found that the text conversion is more like a 3 out of 10. It's to the point where I only use the STT so that the software will define where a speaker starts and stops in dialogue. I would say my own captions are better than what Camtasia's engine produces.

As far as training the engine, I am captioning video that was already recorded from an event.

I see kayakman mentioned training it based on short phrases.  The video I am captioning now is from a speaker who is frequently on film, so having his text translated accurately would be optimal.  To clarify:  Do I just import footage from this speaker, and repeatedly have it translate short phrases?  How does the software 'learn?'  Do you have to save settings?

kayakman, Champion

  • 6863 Posts
  • 2188 Reply Likes
since you are STTing for another speaker, you'll have to train using his voice; I believe you can set profiles for different speakers; see Windows Help

the best approach is to repeatedly STT the same spoken words, correcting as you go along

suggest you put the other speaker's audio on the timeline, break it up with splits into short, logical sections, right click a section, and apply speech-to-text; then correct any errors; then move on

repeat until the STT engine gets it mostly right

this might be helpful ...
http://www.screencast.com/t/vuXgkgBMo4f



Michael Rana II

  • 18 Posts
  • 0 Reply Likes
kayak, I see how you applied the STT.  If you have to correct the text Camtasia outputs, how do you apply those changes to the STT engine?

I take it, that you just have to STT the dialogue, correct it if needed, then remove the track, STT again, and repeat?

kayakman, Champion

  • 6863 Posts
  • 2188 Reply Likes
although I do not "know" how the STT engine actually works, I suspect that ...

as the STT engine listens to a phrase, it compares that audio signature to its internal database of word signatures, selects the best fit, and types out what it thinks was the word [or phrase] spoken

when you immediately correct any mistakes, the STT engine makes note of what you type vs. its original pick, and adjusts its behavior accordingly, probably creating an alternate database of signatures vs. words for the current "profile"

I have never had to specifically apply "changes" to the STT engine as it seems to do so on its own

with STT, lots of practice makes for [more near] perfect results; lots of STT actions = better STT engine behavior; I found that the more STT I did [just myself] the better the STT engine did over time

the quality of the audio also makes a big difference [background noise, consistent narration style, etc]; and, the clearer the narration, the faster STT seemed to self-train

basically, it usually takes a lot of STTing the same speaker to get a high accuracy rate
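
To make the guess above concrete, here is a toy Python sketch of "pick the best-fit signature, then learn from corrections." It is purely an illustration of that mental model, not how Windows Speech Recognition or Camtasia actually work; the class `ToyProfile`, its methods, and the two-number "signatures" are all invented for the example.

```python
import math


class ToyProfile:
    """A made-up per-speaker profile: word -> list of example signatures."""

    def __init__(self, base):
        # Start from one stock signature per word (hypothetical data).
        self.examples = {word: [tuple(sig)] for word, sig in base.items()}

    def recognize(self, signature):
        """Return the word whose stored signature is closest (best fit)."""
        best_word, best_dist = None, math.inf
        for word, sigs in self.examples.items():
            for sig in sigs:
                dist = math.dist(sig, signature)
                if dist < best_dist:
                    best_word, best_dist = word, dist
        return best_word

    def correct(self, signature, word):
        """Record a user correction so similar audio matches `word` later."""
        self.examples.setdefault(word, []).append(tuple(signature))


profile = ToyProfile({"caption": (1.0, 0.0), "captain": (0.0, 1.0)})

heard = (0.4, 0.6)                       # how this speaker says "caption"
first_guess = profile.recognize(heard)   # closest stock signature wins
profile.correct(heard, "caption")        # user fixes the caption text
second_guess = profile.recognize((0.45, 0.55))
```

In this toy, the first guess is wrong because the speaker's pronunciation sits closer to the stock "captain" signature; after one correction, a similar sound resolves to "caption". That matches the experience that repeated STT plus correction on the same speaker gradually improves results.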

Michael Rana II

  • 18 Posts
  • 0 Reply Likes
I see how you create profiles, but it wants me to use a microphone connected to a computer (which is how most do it, and probably how Camtasia is designed).

I did create a new profile with the speaker's name.  I have captioned at least two videos from his speeches, so far.  I'm taking a guess: I would have to play the audio from his speech from a set of PC speakers, and have some external microphone 'hear it' in order to train the STT engine.

Have I over-complicated it?

kayakman, Champion

  • 6863 Posts
  • 2188 Reply Likes
not sure; I've not had to STT another's voice

but the speakers-to-mic approach should work as long as the room is otherwise quiet

another approach you could take would be to narrate the content yourself, STT your audio clips, delete your clips, and keep the STTed captions with the original speaker's voice[?]

might be faster if the other approach is a real time burner

Michael Rana II

  • 18 Posts
  • 0 Reply Likes
kayak, I appreciate your suggestions. I'm going to try playing the already-recorded audio through speakers and using an external microphone to pick it up as the voice that trains the STT engine. It obviously won't be a natural voice, but I figure some voice (and the quality is fairly good) is better than nothing.

If this works, I'm likely going to create separate profiles for each speaker.  Is there a way to tell Camtasia which one to use?

kayakman, Champion

  • 6863 Posts
  • 2188 Reply Likes
I'm not aware of a CS switch for this, but I think it's possible in Windows Control Panel?

Michael Rana II

  • 18 Posts
  • 0 Reply Likes
I'm not sure I'd know where to look.

Either way, thanks for your help!

kayakman, Champion

  • 6863 Posts
  • 2188 Reply Likes
try control panel/speech recognition/advanced speech options [I'm on Win 7]
