Audiate not ready for prime time

  • Idea
  • Updated 3 weeks ago

Apr 30, 2020, 1:11:43 PM EDT

Audiate looks like a really nice product if all you are doing is recording audio files. However, I use Camtasia to record screen captures while I tell my customers what I am doing and why. It appears that the only way to use Audiate at this time would be to record the video with no audio, then play the video back while recording my audio track in Audiate, then bring the Audiate output into Camtasia and try to sync it with the video track. If I used Audiate's editing features, I would then have to edit my video to sync everything back up. Does this sound like too much work to you? It does to me. Send me another notice when Audiate is integrated with Camtasia 2020.



Tim Frost

  • 77 Posts
  • 55 Reply Likes
  • VERY DISAPPOINTED

Posted 2 months ago


Jake Pechtel, Strategy Lead - Camtasia

  • 19 Posts
  • 5 Reply Likes
Actually, the best workflow for Audiate right now is to script and record your audio first, edit it, and then import it into Camtasia. I understand that isn't for everyone. More iterations will come, and if Audiate ends up working for you, great. If it doesn't... also great.

As always, we appreciate your constant disappointment ;) 

Bob Rast

  • 6 Posts
  • 0 Reply Likes
I have to agree. Audiate's voice recognition isn't ready for prime time. I use Camtasia to edit webinars for a trade association. They average 45 to 60 minutes, and much of the editing is excising vocal hesitations, ums, ahs, etc. I have a fairly high-end Dell system and tried Audiate on a 50-minute audio clip. It took about 50 minutes of grinding before Audiate hung at 93 percent complete. I tried a 25-minute clip, which took about half an hour to process. There were so many misunderstood words that it would take as long to edit the text as it would to just edit the audio within Camtasia.
I applaud TechSmith for developing Audiate, because it would be a real time saver if the AI voice recognition were further along.
I can see that if you were using Audiate to record your voice for an off-the-cuff voice-over of an existing video, it could be useful for cleaning up a short audio clip.
But at a subscription price of $30 a month, it isn't worth it at present. And even if it worked as advertised, that's a pretty steep price for the benefit.

kayakman, Champion

  • 8020 Posts
  • 2774 Reply Likes
when you recorded the webinar, did you capture system audio?

Audiate seems to do a much better job of transcribing system audio [as exported from a Camtasia project] than with other options

kayakman, Champion

  • 8020 Posts
  • 2774 Reply Likes
Audiate Transcription Accuracy From Camtasia Recorder System Audio At Half Speed 2020-05-13
https://www.screencast.com/t/2KDUmcOGkhs


paulwilliamengle

  • 420 Posts
  • 155 Reply Likes
Kayakman, if Audiate’s value proposition is “our product accurately transcribes audio,” then the accuracy needs to be up to par for both microphone and system audio.

Bob Rast

  • 6 Posts
  • 0 Reply Likes
My trade association client uses WebEx to capture both audio and video. Generally, any audio I've exported from WebEx files and edited within Camtasia has been pretty close to the live quality of the WebEx sound. I agree with paulwilliamengle that if you're going to say the product accurately transcribes audio, it needs to do better than my limited experience so far.
I look forward to improvement. I still think the subscription cost will be too high even if transcription gets better.

kayakman, Champion

  • 8020 Posts
  • 2774 Reply Likes
my approach ...

make your screencast, narrating as you capture; edit the project as necessary

then export just the audio as a WAV [on Share menu], and import into Audiate

I have very good success with this approach

Jake Pechtel, Strategy Lead - Camtasia

  • 19 Posts
  • 5 Reply Likes
Yeah this absolutely works.

In some instances, and perhaps with the content Tim is referencing (complex visual material like multi-source camera shoots, talking-head video, etc.), editing the audio (specifically removing things) can make for a long day in the video editing chair.

Audiate isn't creating any work that doesn't already exist today if you export and edit audio in a dedicated app; you'd still have to spend some time editing back in Camtasia to sync the audio and video. We are hoping to do something to make this process a bit less time-consuming.

chip.maguire

  • 5 Posts
  • 0 Reply Likes
Shouldn't the timestamps in the SRT file remain synced with the original Camtasia project that the audio was exported from? Even if I delete something in Audiate, I would expect the timeline not to change. However, it seems to actually delete the audio and shift the timeline!

Is the alternative that one has to use silence instead of deleting, simply muting the audio while keeping the timelines synced?

Jake Pechtel, Strategy Lead - Camtasia

  • 19 Posts
  • 5 Reply Likes
Cheekiness aside - I should add, we DO want to address the audio + video editing problem for the use cases where audio and video are created simultaneously, or when visuals are created before audio. I don't have a timeline on that but if you've got specific ways you would expect that to work, please share. We're looking to collect feedback there and understand what users are hoping for. 

Basic Camtasia / Audiate sharing is planned, but it won't be that intelligent... yet. 

chip.maguire

  • 5 Posts
  • 0 Reply Likes
I am glad to hear that better integration is planned for the future. Being a long-time user of Camtasia, I expected that editing would keep the timeline and would just result in an edit list (comparable to the way that a Camtasia project file keeps track of clips). Given that the Audiate transcription file exported in JSON format has the start time for each recognized word, I would expect Audiate to be able to export an SRT file that preserves the same timing as the original.

I understand that the edited audio could end up longer than the original recording, but in that case, when it is imported, I would expect the other tracks to be shifted to enlarge the slot into which the new audio is fitted (even if the video frames are simply repeated rather than fully interpolated). Otherwise, the audio could be resampled to fit into its original duration (this handles the case of a small increase or decrease in duration).

At least that is my thinking when working with material recorded together with PowerPoint slides. Moreover, one has the chance to resynchronize/time-shift at the start of each new slide/keyframe. For this reason, I want to combine the Camtasia project data regarding keyframes from the "toc" with the SRT file. This makes my manual editing of the SRT file easier (as I can more easily find the slide that I need to look at when correcting the text), and it also lets me be cleverer about starting a new caption at the start of a slide [it is a little annoying to have a caption that runs over from one slide to the next when there is a content transition].
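For example, given a transcription export that lists each recognized word with its start time, regenerating an SRT that keeps the original timing could look roughly like the following sketch (the "words"/"text"/"start" field names are my assumption about the export layout, not Audiate's documented schema):

import json

def fmt(t):
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = int(round(t * 1000))
    h, ms = divmod(ms, 3600000)
    m, ms = divmod(ms, 60000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def words_to_srt(json_path, srt_path, words_per_caption=10):
    # Assumed layout: a top-level "words" list whose entries carry "text"
    # and "start" (seconds); adjust the keys to the real export format.
    with open(json_path, encoding="utf-8") as f:
        words = json.load(f)["words"]
    with open(srt_path, "w", encoding="utf-8") as out:
        for i in range(0, len(words), words_per_caption):
            chunk = words[i:i + words_per_caption]
            start = chunk[0]["start"]
            # End the caption where the next chunk begins, or pad the last one.
            if i + words_per_caption < len(words):
                end = words[i + words_per_caption]["start"]
            else:
                end = chunk[-1]["start"] + 2.0
            out.write(f"{i // words_per_caption + 1}\n")
            out.write(f"{fmt(start)} --> {fmt(end)}\n")
            out.write(" ".join(w["text"] for w in chunk) + "\n\n")

Something along those lines would keep the captions aligned to the original Camtasia timeline regardless of what is later cut in Audiate.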

An additional reason for better integration is to take the knowledge of the words on a keyframe and use them during recognition, or at least when post-processing the resulting text. This is especially true for names and domain-specific vocabulary, both of which are rather common in my videos of course lectures.

To follow up on what was said in https://feedback.techsmith.com/techsmith/topics/some-general-ideas-for-audiate
I export the SRT file to an XLSX file to be able to do computations on the times, or to match up SRT files after editing with Audiate. I edit the SRT files using Emacs (the dynamic abbreviation package (dabbrev) really speeds up typing), then I drop the content into Word and run Grammarly, then I apply Grammarly's corrections to the SRT file in Emacs; as a side effect, I get a formatted and edited transcription in Word (with the content split apart using the slide numbers as headings). My future plan is to use this in combination with exports of the slides to wikipages, adding the transcribed narration to the bottom of each page. In this way, students can use a tool such as ReadSpeaker to do text-to-speech on a slide (rather than being tied to the linear structure of the audio/video recording).

Tools for SRT to XLSX and the reverse can be found at: https://github.com/gqmaguirejr/Camtasia_and_Audiate_tools
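For anyone curious, the SRT-to-XLSX direction is conceptually simple; a minimal sketch (assuming the openpyxl package, and not identical to the tools in the repository above) might look like:

from openpyxl import Workbook  # pip install openpyxl

def srt_seconds(ts):
    """Convert 'HH:MM:SS,mmm' to seconds."""
    hms, ms = ts.split(",")
    h, m, s = hms.split(":")
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000.0

def srt_to_xlsx(srt_path, xlsx_path):
    wb = Workbook()
    ws = wb.active
    ws.append(["caption #", "start (s)", "end (s)", "text"])
    with open(srt_path, encoding="utf-8") as f:
        blocks = f.read().strip().split("\n\n")
    for block in blocks:
        lines = block.splitlines()
        start, _, end = lines[1].partition(" --> ")
        # One row per caption; multi-line captions are joined with a space.
        ws.append([int(lines[0]), srt_seconds(start),
                   srt_seconds(end.strip()), " ".join(lines[2:])])
    wb.save(xlsx_path)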

I have already built some tools so that I can compute various types of index files over the wikipages; see find_keyords_phrase_in_files.py and create_page_from_json.py
at https://github.com/gqmaguirejr/Canvas-tools
A future addition would be to compute a pointer into the audio/video file for each entry in the index - thus from the index you could not only go to the relevant wikipage but also go directly to the correct place in an audio/video clip.


chip.maguire

  • 5 Posts
  • 0 Reply Likes
A few things that I found useful in the JSON of the tscproj file are the value of "mediaStart" (in my case this was 3074) and the "toc", which holds the keyframes: each entry has an "endTime", a "time" (which seems to be the same as "endTime"), and a "value" containing the PowerPoint slide's heading. Using the value of "videoFormatFrameRate", one can convert endTime/time (which are given in terms of frame numbers) to seconds.
So I built a spreadsheet with the columns:
slide #, frame # in SRT file, time in SRT file (seconds), time in SRT file converted to hh:mm:ss format, frame number from endTime, endTime (seconds), endTime (in hh:mm:ss format), slide heading (from "value" above).

Comparing the frame number estimated from the SRT file against endTime minus mediaStart, I expected the numbers to match. However, they seem to be off by ~100 frames. Is there a reason for this difference in the time bases?
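For reference, the arithmetic I am doing is roughly the following sketch (30 fps is only an example value for "videoFormatFrameRate", and I am assuming "mediaStart" is expressed in frames like the "toc" entries are):

frame_rate = 30      # from "videoFormatFrameRate"
media_start = 3074   # from "mediaStart"

def frames_to_seconds(frame, subtract_media_start=True):
    f = frame - media_start if subtract_media_start else frame
    return f / frame_rate

def hhmmss(seconds):
    h, rem = divmod(int(round(seconds)), 3600)
    m, s = divmod(rem, 60)
    return f"{h:02d}:{m:02d}:{s:02d}"

# e.g. compare a keyframe's "endTime" (in frames) against the SRT caption time:
print(hhmmss(frames_to_seconds(93074)))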

Once I had a better understanding of where things were, I could use my pair of programs to convert the SRT file produced by Audiate into a spreadsheet, edit it, convert it back into an SRT file, and then add it as captions. Among the spreadsheet edits that proved useful was being able to take another SRT file (produced when I had to split the WAV file into pieces because the audio exceeded the fixed limit on the duration that Audiate can process in one go) and put the two SRT files back together. I had thought that, after using a split in Camtasia to divide the file into pieces, I could just insert one SRT after the other in the spreadsheet, renumber the captions, and shift the starting times. Unfortunately, I found that I needed to shift the second SRT's time values by ~12.5 seconds to get the captions to align. With a third piece of the WAV file, I had to shift its time offset by -12 seconds to get the captions synchronized with the audio.
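In rough outline, the merge step amounts to something like this sketch (the per-file offsets, such as the ~12.5 s nudge, were found empirically per clip):

def to_seconds(ts):
    """'HH:MM:SS,mmm' -> seconds."""
    h, m, s_ms = ts.split(":")
    s, ms = s_ms.split(",")
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000.0

def to_stamp(t):
    """Seconds -> 'HH:MM:SS,mmm'."""
    ms = int(round(t * 1000))
    h, ms = divmod(ms, 3600000)
    m, ms = divmod(ms, 60000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def merge_srt(files_and_offsets, out_path):
    """files_and_offsets: list of (path, offset_in_seconds) pairs."""
    captions, n = [], 0
    for path, offset in files_and_offsets:
        with open(path, encoding="utf-8") as f:
            blocks = f.read().strip().split("\n\n")
        for block in blocks:
            lines = block.splitlines()
            start, _, end = lines[1].partition(" --> ")
            n += 1  # renumber the captions across all input files
            captions.append("\n".join(
                [str(n),
                 f"{to_stamp(to_seconds(start) + offset)} --> "
                 f"{to_stamp(to_seconds(end.strip()) + offset)}"]
                + lines[2:]))
    with open(out_path, "w", encoding="utf-8") as out:
        out.write("\n\n".join(captions) + "\n")

# e.g. merge_srt([("part1.srt", 0.0), ("part2.srt", part1_duration + 12.5)], "all.srt"),
# where each offset places that part on the combined timeline.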

Another useful edit was changing the end time at which a caption stops being displayed. I found the times produced by Audiate were too short to accommodate the manual edits I made to the caption contents, so I used the spreadsheet to calculate how long I had until the start of the next caption, subtracted 10 ms, and used that as the time to stop displaying the caption.
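That adjustment can be expressed as a small helper (reusing to_seconds()/to_stamp() from the merge sketch above; "blocks" is the list of caption blocks obtained by splitting the SRT text on blank lines):

def extend_caption_ends(blocks, gap=0.010):
    """Push each caption's end time to 10 ms before the next caption starts."""
    out = []
    for i, block in enumerate(blocks):
        lines = block.splitlines()
        start, _, end = lines[1].partition(" --> ")
        if i + 1 < len(blocks):
            next_start = blocks[i + 1].splitlines()[1].split(" --> ")[0]
            end = to_stamp(to_seconds(next_start) - gap)
        lines[1] = f"{start} --> {end.strip()}"
        out.append("\n".join(lines))
    return out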

So, is there an easier way to do the above? Can one import multiple SRT files into Camtasia and then shift their placement?

Is there a way to use the JSON data in the Audiate transcript file to get better estimates of the caption duration?

Has anyone post-processed the JSON transcript files to do systematic correction of the recognition alternatives? For example, the initials of my university are "KTH", but Audiate seems to render this as "KGH" - for several different speakers. So I was thinking of a small tool to manipulate the project file to make such corrections, and similarly for people's names.
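As a starting point, I imagine the correction tool being little more than a table of regex substitutions applied to the transcript text, e.g. (the "words"/"text" layout is my assumption about the transcript JSON, to be adjusted to whatever the file actually contains):

import json
import re

# Misrecognized term -> correction; names and other domain vocabulary go here.
CORRECTIONS = {
    r"\bKGH\b": "KTH",
}

def correct_transcript(json_path, out_path):
    # Assumed layout: a "words" list whose entries have a "text" field;
    # adjust to whatever the Audiate project/transcript file actually uses.
    with open(json_path, encoding="utf-8") as f:
        data = json.load(f)
    for word in data.get("words", []):
        for pattern, replacement in CORRECTIONS.items():
            word["text"] = re.sub(pattern, replacement, word["text"])
    with open(out_path, "w", encoding="utf-8") as out:
        json.dump(data, out, ensure_ascii=False, indent=2)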