I produce captions in my videos by pasting a copy of my audio script into a caption and using Sync Captions to split it up. At the end of the process, I have everything sort of in the right place, which is way better than trying to add captions one at a time, but I still spend 5x the duration of the video (or more) cleaning up errors where I clicked at the wrong time or clicked on the wrong word. I find it very difficult to click accurately during real-time playback. It would be so much easier, with fewer mistakes, if any or all of the following were possible:
- The font in the captions field was larger, reducing the possibility of clicking on the wrong word. It would not have to be much larger. Even 1.5-2 times larger would be helpful and it would still be possible to see 15 or so lines of text in the captions field.
- There was some way to view the Captions panel (the one you click in during Sync Captions) centrally above the audio track, by undocking or repositioning it. The waveform in the audio is an excellent visual cue for when to click to create the next split, but it is hard to simultaneously watch both the tiny words way up near the top left-hand corner and the playhead scrolling across the waveform way down near the bottom right-hand corner.
- The playback speed during editing could be made slower. That would make it easier to click on the right word at the right time because there would be more time to mentally associate the peaks in the waveform with the words in the Captions panel.
- There was an option for the tracks to scroll across the timeline with the playhead staying in the center, or towards the right side of the screen. As with the slower playback speed, this would make it easier to identify which peaks in the waveform go with which words in the Captions panel.