Processing Speech-to-Text with GPT-3

Dec 06, 2022 by theo

I've been experimenting with using GPT-3 to process speech-to-text transcripts. In their raw form, these transcripts contain no line breaks or paragraph breaks, and because they are direct transcriptions of my speech, they don't read the way I would normally write. I have written a small Python script that feeds these unprocessed transcripts into GPT-3. GPT-3 cannot be run locally, so the script relies on external API calls.

The script works in two stages. It first makes one API call to split the text into individual paragraphs, and then it makes a separate API call for each paragraph to correct the grammar, style, and spelling. I opted for the two-part approach because, based on my experimentation, GPT-3 doesn't handle large blocks of text very well. Splitting the text into paragraphs is one of the best techniques I've found to stop GPT-3 from removing too much text without replacing it, or from adding totally new text.
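The two-stage flow described above can be sketched roughly as follows. This is a minimal illustration and not the actual script: the prompt wording, the `process_transcript` function, and the `complete` callable (a stand-in for whatever function sends a prompt to GPT-3 and returns its text) are all assumptions made for the example. `fake_complete` is a hypothetical offline substitute for the model so the control flow can be run without an API key.

```python
from typing import Callable, List

# Hypothetical prompt wording; the real script's prompts are not shown in this post.
SPLIT_PROMPT = (
    "Split the following transcript into paragraphs. "
    "Separate paragraphs with a blank line and do not change any words.\n\n"
)
FIX_PROMPT = (
    "Correct the grammar, style, and spelling of the following paragraph "
    "while keeping the wording as close to the original as possible.\n\n"
)


def process_transcript(raw: str, complete: Callable[[str], str]) -> str:
    """Two-stage cleanup: one call to split, then one call per paragraph."""
    # Stage 1: a single call whose only job is to insert paragraph breaks.
    split_text = complete(SPLIT_PROMPT + raw)
    paragraphs: List[str] = [p.strip() for p in split_text.split("\n\n") if p.strip()]

    # Stage 2: one call per paragraph, so each request stays small.
    cleaned = [complete(FIX_PROMPT + p).strip() for p in paragraphs]
    return "\n\n".join(cleaned)


def fake_complete(prompt: str) -> str:
    """Offline stand-in for the model (assumption, for demonstration only)."""
    if prompt.startswith(SPLIT_PROMPT):
        body = prompt[len(SPLIT_PROMPT):]
        # Pretend the model starts a new paragraph after every sentence.
        return body.replace(". ", ".\n\n")
    body = prompt[len(FIX_PROMPT):]
    # Pretend the model's only correction is capitalising the first letter.
    return body[0].upper() + body[1:]


if __name__ == "__main__":
    print(process_transcript("first sentence here. second sentence here.", fake_complete))
```

Passing the model call in as a parameter keeps the splitting and per-paragraph logic separate from any particular API client, so in real use `complete` would be replaced with a function that calls GPT-3.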

From what I can tell, the script keeps things faithful to what I originally dictated, whilst still improving the grammar and handling much of the editing I would otherwise have to do to make a speech-to-text transcript usable on my blog. I think it's helpful because it reduces a lot of the error-prone aspects of using speech-to-text to write.

The script can be found here: https://gogs.theopjones.blog/theo/LittleScripts/src/master/transcribefolder.py (this post is just the output of this workflow, with minimal additional editing).
