Theo's Site

Writing about technology, self-hosting, and things I find interesting.

Note: This post is 3 years old and may no longer reflect current thinking or accurate information.

Voice Typing Wrapper Around Whisper

by theo

I just wrote a voice typing wrapper around Whisper. It types what I say as keyboard input, and it creates a system tray icon for turning dictation on and off.

https://github.com/theopjones/voice-typing

(I just created it, it might have bugs, only tested on Linux)

I'm not sure how much more time I want to invest in this little project. I'm not an expert in this type of technology, or in AI in general, so I'm not sure I'd do a very good job of developing it further.

I think right now I have a very interesting proof of concept. But while testing it, I have run into a few bugs and glitches. And I definitely don't get the same level of accuracy voice typing with this tool that I'd get by pre-recording my voice and feeding it in all at once. But it is IMHO a more convenient way to write documents than recording one big audio file.

Internally, it breaks the audio up into small snippets and transcribes each snippet automatically. This doesn't work well with the underlying model, because it isn't consistent with the assumptions Whisper makes.

Whisper's ability to handle grammar and punctuation more or less assumes that it is dealing with long blocks of audio that it can go through all at once.
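To make the snippet approach concrete, here is a minimal sketch of the general idea, not the actual code from the repository. The names `split_into_snippets` and `dictate`, the 5-second snippet length, and the `transcribe` callback are all assumptions made up for illustration. The one thing taken from Whisper itself is the 16 kHz sample rate it expects.

```python
from typing import Callable, Iterator, List, Sequence

SAMPLE_RATE = 16_000  # Whisper expects 16 kHz mono audio
SNIPPET_SECONDS = 5.0  # hypothetical snippet length, not taken from the project


def split_into_snippets(
    samples: Sequence[float],
    snippet_seconds: float = SNIPPET_SECONDS,
    sample_rate: int = SAMPLE_RATE,
) -> Iterator[Sequence[float]]:
    """Yield consecutive fixed-length slices of the audio; the last may be shorter."""
    size = int(snippet_seconds * sample_rate)
    if size <= 0:
        raise ValueError("snippet length must cover at least one sample")
    for start in range(0, len(samples), size):
        yield samples[start:start + size]


def dictate(
    samples: Sequence[float],
    transcribe: Callable[[Sequence[float]], str],
) -> str:
    """Transcribe each snippet on its own and join the results with spaces.

    `transcribe` is a stand-in for whatever speech-to-text call is used.
    Each call only sees one snippet, so the model has no context from the
    snippets before or after it.
    """
    pieces: List[str] = []
    for snippet in split_into_snippets(samples):
        text = transcribe(snippet).strip()
        if text:  # skip snippets that came back empty (e.g. silence)
            pieces.append(text)
    return " ".join(pieces)
```

Because every snippet is transcribed in isolation, a sentence that straddles two snippets gets punctuated twice, once by each call. In a real setup, `transcribe` would presumably wrap a call into the Whisper model, but that wiring is left out here.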

When I dictate a lot at once, it tends to jumble the grammar and punctuation. This is pretty easy to correct, but the text does look a bit weird until I'm done correcting it. I'm actually writing this post with it right now.

The method I am using to feed it a constant stream of audio feels like a jury-rig rather than an actual solution to the underlying problem.

In its current state, it comes pretty close to meeting my immediate need for a dictation/voice typing program. I mostly just want some reasonably accurate way to write short to medium-sized documents.
