HN Theater @HNTheaterMonth

The best talks and videos of Hacker News.

Hacker News Comments on
Using Python to Code by Voice

Next Day Video · Youtube · 54 HN points · 61 HN comments
HN Theater has aggregated all Hacker News stories and comments that mention Next Day Video's video "Using Python to Code by Voice".
Youtube Summary
Tavis Rudd: Two years ago I developed a case of Emacs Pinkie (RSI) so severe my hands went numb and I could no longer type or work. Desperate, I tried voice recognition. At first programming with it was painfully slow but, as I couldn't type, I persevered. After several months of vocab tweaking and duct-tape coding in Python and Emacs Lisp, I had a system that enabled me to code faster and more efficiently by voice than I ever had by hand.

In a fast-paced live demo, I will create a small system using Python, plus a few other languages for good measure, and deploy it without touching the keyboard. The demo gods will make a scheduled appearance. I hope to convince you that voice recognition is no longer a crutch for the disabled or limited to plain prose. It's now a highly effective tool that could benefit all programmers.

Hacker News Stories and Comments

All the comments and stories posted to Hacker News that reference this video.
Like this: https://www.youtube.com/watch?v=8SkdfdXWYaI ? Here you traverse the AST, but the idea is similar, I think.
“I think there is a world market for maybe five computers.” - Thomas Watson

I bet if we use our imaginations, we’ll think of a lot of places where using voice to code could come in handy.

Personally, I’ve been waiting for it for a few decades.

The creator of Tcl has RSI and has been using voice since the late 1990s:

https://web.stanford.edu/~ouster/cgi-bin/wrist.php

Thought we were really close 10 years ago when Tavis Rudd developed a system:

https://youtu.be/8SkdfdXWYaI

GitHub seems to be more high-level. It figures out the syntax and what you actually want to write.

This would help if you barely knew the language.

Time to learn Rust or Scala with a little help from machine learning.

mkl
> “I think there is a world market for maybe five computers.” - Thomas Watson

This statement was probably never made. The closest thing to it came 10 years after the quote is usually supposed to have happened, and was about a single model of a single machine: https://geekhistory.com/content/urban-legend-i-think-there-w...

darkwater
> GitHub seems to be more high-level. It figures out the syntax and what you actually want to write.

To me, it looks like it's feeding your voice input to Copilot, which then generates the code output just as before. So the same strengths and weaknesses of Copilot apply (and you can probably mimic it locally with a voice input method you control: just dictate comments for Copilot).

I think you're talking about this video: https://youtu.be/8SkdfdXWYaI?t=1049
Quequau
Yes! That's the talk.
Some people have been doing it for a long time.

https://www.youtube.com/watch?v=8SkdfdXWYaI

tmtvl
Yeah, when I saw Emily Shea's talk* I became aware of the accessibility-unfriendly code I normally put out. I still find myself writing bad code, but at least now I'm aware of what to watch out for.

* https://www.youtube.com/watch?v=Mz3JeYfBTcY

Are you thinking perhaps of this [1] PyCon talk by Tavis Rudd, who had RSI and a rotator cuff injury? He demos a voice command system that's all short nonsense syllables, including with a few humorous video clips interspersed, and then explains it after the demo.

[1] https://www.youtube.com/watch?v=8SkdfdXWYaI (demo starts 9 minutes in)

Jan 14, 2022 · throwanem on Life at 800 MHz
You refer to Tavis Rudd's PyCon 2013 demo: https://youtu.be/8SkdfdXWYaI
One of my back-burner projects is putting some esp32s or similar around the house with microphones, streaming whatever audio they hear to Dragon NaturallySpeaking like in this video https://www.youtube.com/watch?v=8SkdfdXWYaI#t=9m5s , but hooked up to home assistant scenes / automations instead of emacs commands.
I injured one of my fingers for a few months a few years back.

I know I tried several things.

I recall that there were apps that would let you type with one hand: the keyboard is treated as mirrored, so one finger position stands for two letters (your left index finger normally types 'f' and your right index finger types 'j'). With the software, you'd use one hand and the corresponding finger (say, always typing 'f' whenever you wanted either an 'f' or a 'j'), and it would look up what you typed and figure out the word you meant. I think this falls into that class: http://www.onehandkeyboard.org
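A minimal sketch of how that kind of half-keyboard disambiguation might work (the mirror map follows the usual half-QWERTY layout, and the tiny word list is a toy stand-in, not from any of the products mentioned):

```python
# Each physical key stands for itself or its mirror-image key on the
# other half of the keyboard; a dictionary lookup picks candidate words.
MIRROR = dict(zip("qwertasdfgzxcvb", "poiuy;lkjh/.,mn"))
MIRROR.update({v: k for k, v in MIRROR.items()})

WORDS = {"fun", "jun", "join", "fish"}  # toy dictionary for illustration

def candidates(typed):
    """Expand each keypress to {key, mirror(key)} and return dictionary words."""
    options = [{ch, MIRROR.get(ch, ch)} for ch in typed]

    def expand(i, prefix):
        if i == len(typed):
            yield prefix
            return
        for ch in options[i]:
            yield from expand(i + 1, prefix + ch)

    return sorted(w for w in expand(0, "") if w in WORDS)
```

Typing "frb" with only the left hand could mean "fun" or "jun", and a real system would then rank those by frequency or context.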

I looked into various hardware assistive devices, and was amazed that they were extremely expensive. I believe this one-handed keyboard falls into that category: https://tipykeyboard.com/en/home/

I looked into typing by voice. On my Mac, you could type a letter at a time -- so painful! Someone came up with an incredible way to code -- see https://www.youtube.com/watch?v=8SkdfdXWYaI -- but as I recall, on a Mac it required having a Windows box to do the voice recognition (the Mac's built-in recognition is decent, but didn't have good enough APIs, IIRC), buying Dragon NaturallySpeaking (for something like $500!), and using a little glue to make it all work together.

I wondered if any phone keyboard could be commandeered to become useful -- something like using Swype on your phone and having it routed to your computer. Alternatively, the Mac has assistive technologies built-in (and I imagine other OSes do too) where you can control the entire machine using one button -- including doing mouse drags and such -- but it looks so incredibly tedious that it would only be something to pursue if you had no alternative.

As I recall, needing to type as a programmer and not wanting to spend tons of money (I would have if I'd had a permanent injury, mind you), I found I could type with one and a half hands, slowly, leaning on intellisense/code completion like crazy; that worked poorly but was workable.

This is actually an amazing talk[1] about doing voice coding in Emacs, over 8 years ago.

[1] https://www.youtube.com/watch?v=8SkdfdXWYaI

There was a great talk from 2013 about coding with voice commands: https://www.youtube.com/watch?v=8SkdfdXWYaI

It's by a developer who developed RSI and had to find another way to write code. He uses a combination of Dragon and custom Python scripts to control Emacs.

The fascinating bit for me was the language he created around text navigation and manipulation. Lots of custom short words to optimise the amount of speaking he actually had to do.

Really worth a watch for anyone interested in this. If you want a quick demo, this part of the video is fairly representative: https://youtu.be/8SkdfdXWYaI?t=1034

https://youtu.be/8SkdfdXWYaI?t=600

this guy is already there: Slurp slap scratch buff yank

skratlo
I now get the joke about Emacs and OS
You should talk to Tavis Rudd, maybe. https://youtu.be/8SkdfdXWYaI
WesolyKubeczek
Nah, I don't want dictation, I want keypresses :)
Xah first wrote this over a decade ago.

He’s had some RSI issues, and the real problem is simply typing in any form.

I remember reading several of his articles thinking that surely within a decade, we’d have better voice assist for programming.

Tavis Rudd’s demo looked promising in 2013:

https://youtu.be/8SkdfdXWYaI

I’m not sure what the best solution is. Here are a few that I know of:

https://serenade.ai/

https://talonvoice.com/

https://voxhub.io/silvius

First saw this "Using Python to Code by Voice" talk almost 8 years ago.

https://youtu.be/8SkdfdXWYaI

I’m disappointed that we’ve gone nowhere in that time.

Anyone with early symptoms could use voice as an option, or partial option.

lunixbochs
We've gone places in that time. There's Talon (my full time project), Dragonfly/Caster, and Serenade at least. Voicecode also came and went.

My goal with Talon is to give people at all ability/pain levels a high quality free keyboard/mouse alternative - you can use it preventatively, or if you start to feel discomfort, or if you can't type at all.

I also linked some more recent talks / blog posts in this comment: https://news.ycombinator.com/item?id=26118864

The author links to this other project, which I’ve never heard of:

https://serenade.ai/

He also references Tavis Rudd’s viral voice coding video, which is already 7 years old.

https://www.youtube.com/watch?v=8SkdfdXWYaI

The future is arriving slower than I would have guessed. I thought we’d be developing on the iPad using voice, gestures, and eye tracking by now.

Can I at least get “build and run” with my voice soon?

CoffeePython
I'm very surprised none of the videos on the serenade landing page have audio.

I would love to see/hear what it's like. I'd suspect adding audio to those videos would be a win on conversions.

tmacwilliam
Thanks for the suggestion! We recently revamped our homepage, and we're in the process of recording some new videos. Will keep that in mind!

In the meantime, here's a video that a developer using Serenade recently made about their first experience! https://www.youtube.com/watch?v=Pc-EbY1fRWk

Tavis Rudd, "Using Python to Code by Voice", 2013: https://www.youtube.com/watch?v=8SkdfdXWYaI . Old but fun.
Aug 17, 2020 · arijun on Art of chording
I'm surprised that there seem to be no examples anywhere of anyone seriously using chorded typing for programming. After cheap chorded keyboards entered the market and the associated software became free, I assumed it would only be a matter of time before something like Tavis Rudd's incredible demo of voice coding [1] came out. Perhaps the fact that it hasn't speaks to how difficult it is to get real speed improvements with a chorded setup?

[1] https://www.youtube.com/watch?v=8SkdfdXWYaI

dpflug
One example I know of is the creator of Picolisp.

https://www.youtube.com/watch?v=z_01ha1uS6Y https://www.software-lab.de/penti.html

He does his development work on a tablet with chorded keyboard software.

morinted
I'm the author of Art of Chording—I program full-time with steno in JavaScript (working mainly with React.)

I'd love ideas on how to demonstrate coding in steno. I struggle with it sometimes because the slowest part about coding is not the input rate… it's the brain. I guess if people are looking to code "quicker"… it's not the rate of input that one would want to explore. I will say that writing comments became a lot easier when the words started to just flow onto the screen.

Here are my existing videos:

Unscripted: https://www.youtube.com/watch?v=711T2simRyI

Scripted: https://www.youtube.com/watch?v=RBBiri3CD6w

Looking forward to any suggestions on how to improve or even just the big open questions you'd want answered on this subject.

pimlottc
I would imagine that one of the difficulties with using chording for programming is that so many of the terms used are not normal words. I suppose you can just create new chords for language keywords or common APIs, but it just seems like there are so many possible unique terms you might need to type at any given moment. How do you handle long function names, snake case, camel case? Has it changed the way you name your own variables and functions?
Aug 17, 2020 · monsieurbanana on Art of chording
Maybe lisp could shine here?

While writing lisp code in classic editors, you already shouldn't be editing or navigating code with classic commands, instead you will be using structured editing.

Of course it isn't limited to lisp, but languages with C-like syntax are harder to use for structured editing.

I highly recommend watching this video, where the speaker uses voice commands to write both in emacs-lisp and in python.

https://youtu.be/8SkdfdXWYaI?t=540

Aug 03, 2020 · danielbarla on Bill English has died
I wonder if sub-vocal will happen anytime soon, and if it does, I am still not sure how it will solve voice being sub-optimal for some types of text. One example is plain old programming, which needs a fairly elaborate setup. The best example I've been able to find is Tavis Rudd's presentation [1], which seems to be a decent solution.

I have tried to more or less replicate this setup, but found it less satisfactory. Aside from it demanding near-perfect audio quality to work remotely well in general (as well as a heavy time investment, initial and ongoing), you frequently run into edge cases where it just doesn't want to understand something you're saying. After the 5th or 6th mis-transcription of some specific phrase, you tend to just give up and reach for the keyboard, RSI be damned. (And this as a native English-speaker, with a relatively neutral accent.)

I can definitely imagine better fleshed out solutions to either voice recognition for programming, or programming languages whose syntax lends themselves to easier and more productive voice recognition. I could see some definite productivity enhancements if it were achieved.

[1] https://www.youtube.com/watch?v=8SkdfdXWYaI

mncharity
One upside of AR is sensor fusion, so components can be individually less robust. Those contending (mis)transcriptions might be selected among by an eye-tracked glance, rather than reaching for a keyboard, or even an "eep alt 3".
skj
Regarding sub-optimality...

Currently our programming is optimized for keyboard input. If sub-vocal instruction becomes "a thing", I suspect we would spend some collective effort to create a programming language optimized for that mode of input.

I think the only reason we don't have real languages using touch input is because of the lack of bandwidth for that signal.

Tavis Rudd gave an inspiring talk about solving this problem.

https://www.youtube.com/watch?v=8SkdfdXWYaI

Aug 11, 2019 · 1 points, 0 comments · submitted by susam
Tavis Rudd's Using Python to Code by Voice [0] is also interesting.

Looking up that URL, I came across the more recent Coding by Voice with Dragonfly [1]. Haven't seen that one yet, but just quickly skimmed through. Unfortunately the live demos are silent (you can't hear speaker Boudewijn Aasman talk to the computer), but it seems worth checking out if you're interested in setting up some recent voice recognition software with your own custom voice command syntax.

[0] https://youtu.be/8SkdfdXWYaI [1] https://youtu.be/P5DCDiCv4TE

I have been lucky not to be impacted by hand or wrist pain. I did experience chronic back pain for years due to a pinched sciatic nerve; it can be debilitating and depressing. Chronic pain is a large precursor to suicide; if you have it, you need to get it fixed.

My interest in this is from a FutureOfProgramming perspective, how can we interact with computers in different, higher order ways. What does the Mother of All Demos [0] look like in 2020?

Tavis Rudd, "Coding by Voice" (2013) [1]: this uses Dragon in a Windows VM to handle the speech-to-text. This was the original "termie slash slap", a kind of Pootie Tang crossed with structural editing.

Coding by Voice with Open Source Speech Recognition, from HOPE XI (2016) [2]

Another writeup [3] that outlines Dragon and a project I hadn't heard of called Silvius [4]

It looks like most of these systems rely on Dragon, and Dragon on Windows at that due to not having the extension APIs on Mac/Linux. Are there any efforts to use the Mac's built in STT or the Cloud APIs [5,6]?

[0] Mother of All Demos https://www.youtube.com/watch?v=yJDv-zdhzMY

[1a] https://www.youtube.com/watch?v=8SkdfdXWYaI

[1b] https://youtu.be/8SkdfdXWYaI?t=510

[2] https://www.youtube.com/watch?v=YRyYIIFKsdU

[3] https://blog.logrocket.com/programming-by-voice-in-2019-3e18...

[4] http://www.voxhub.io/silvius

[5] https://cloud.google.com/speech-to-text/

[6] https://aws.amazon.com/transcribe/

caspar
Talon actually uses Mac's built in STT if you don't have Dragon.

James from http://handsfreecoding.org/ was working on a fork of Dragonfly[0] to add support for Google's speech recognition, but I'm not sure if he still is. There are several barriers to that working well though: additional latency really hurts, API usage costs and (as far as I know) an inability to specify a command grammar (Dragonfly/Vocola/Talon all let you use EBNF-like notation to define commands, which are preferentially recognized over free-form dictation).

[0]: https://github.com/dictation-toolbox/dragonfly
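The "command grammar preferred over free-form dictation" idea mentioned above can be sketched roughly like this (a toy illustration using plain Python regexes as stand-ins for the EBNF-like rules; the command names and output markers are made up, not Dragonfly's actual API):

```python
import re

# Each recognized utterance is first tried against a small command
# grammar; anything that doesn't match falls through to dictation.
COMMANDS = [
    (re.compile(r"^go (up|down) (\d+)$"), lambda d, n: f"<move {d} {n} lines>"),
    (re.compile(r"^slap$"), lambda: "<newline>"),
]

def interpret(utterance):
    for pattern, action in COMMANDS:
        m = pattern.match(utterance)
        if m:
            return action(*m.groups())
    return utterance  # free-form dictation fallback
```

The real engines compile the grammar into the recognizer itself, so command phrases are favored at the acoustic level, not just post-processed like this, which is part of why latency and cloud APIs are a poor fit.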

tmacwilliam
We’re working on a new voice coding app called Serenade [1] that sounds like what you’re describing. We tried cloud speech APIs, but found the accuracy wasn’t good enough for common programming words like "enum" or "attr". We found that using our speech engine (based on Kaldi [2]) designed specifically for coding gave us better accuracy.

Serenade also enables you to use natural English voice commands like "add method hello" or "move value to function". Not only is this often faster than typing, but it also means you don’t have to memorize the syntax details of every language or many editor shortcuts. Our app is still early, but we think the future of programming is working with these higher-level inputs rather than typing out all code by hand.

We’re looking for people to give feedback, so if anyone is interested in giving it a try, you can download Serenade at [3] or email me at [email protected].

[1] https://serenade.ai

[2] http://kaldi-asr.org

[3] https://serenade.ai/download

This might be the programmer in question. Pretty impressive.

https://www.youtube.com/watch?v=8SkdfdXWYaI

melling
There are several other newer resources:

https://github.com/melling/ErgonomicNotes/blob/master/progra...

The emacs macros for voice coding talk:-

https://youtu.be/8SkdfdXWYaI

I believe this is the PyCon 2013 talk to which TFA refers:

https://www.youtube.com/watch?v=8SkdfdXWYaI

Yeah, that’s one of the common things people claim. Someone has said pushups would do the trick.

Tavis Rudd, in this Programming by Voice video, thought his rock climbing had helped prevent issues before he got RSI.

https://youtu.be/8SkdfdXWYaI

I’ve cataloged lots of solutions:

https://github.com/melling/ErgonomicNotes

Your comment reminded me of an interesting video I saw a while back. The video is a talk [0] by a programmer explaining and demonstrating how he developed and used a voice recognition system to program in emacs, after a significant RSI injury.

The commands and syntax he uses to interact with the system might be a decent approximation of the custom English syntax you suggest.

[0]: https://youtu.be/8SkdfdXWYaI?t=537

Mar 04, 2018 · gknoy on Show HN: Emacs Anywhere
Tavis Rudd uses voice recognition to do his editing (in Emacs, no less):

https://www.youtube.com/watch?v=8SkdfdXWYaI

The key is that he basically made up his own language to handle some things like special characters.

zamber
Thanks for this. Will keep it in mind when RSI kicks in.

Here's a synopsis of the talk http://ergoemacs.org/emacs/using_voice_to_code.html

Couldn't find the repo for the dragon stuff he promised to publish on his GH profile :/.

Edit: Found this summary he re-tweeted recently https://medium.com/bambuu/state-of-voice-coding-2017-3d2ff41....

First, it doesn't feel unpleasant as I do it. There are a few reasons why it might look that way.

I have very little practice at this point, especially considering I'm still changing my grammars regularly, so the editing video there is more a technical demo of the software's performance in spite of my own awkwardness while learning. That was my first and only take. I'm basically learning to type again, with cache misses as I try to remember commands and keep track of my position (like trying to remember where keys are, or the next note in a song as you pluck it out).

It's also possible the most accurate form of voice input does not sound like natural language. Finally, it's a different mental exercise to copy text that is already written versus writing and editing my own code, so I'm triply out of my element when recording these early videos.

Compare to Tavis Rudd's video from 2013, where he pauses between every command: https://youtu.be/8SkdfdXWYaI?t=1050

I've recorded my own version of his demo today for comparison: https://youtu.be/wt4PR5j7vBE

I won't be switching any time soon, but it's certainly possible: https://www.youtube.com/watch?v=8SkdfdXWYaI
Nov 17, 2017 · 31 points, 8 comments · submitted by tosh
jsjolen
I purchased voicecode.io for 300 bucks. A few years later and no Windows or Linux release, too bad!
singularity2001
In order to code with your voice it is helpful to have a syntax which is voice friendly. Now ported to python: https://github.com/pannous/angle
tosh
Demo starts about 9min into the talk
wiredfool
I was really excited by this when I saw it 4 years back, but the promised open source release never happened.
explorigin
This is really cool, but he never released his code (https://github.com/tavisrudd). However there are others: https://github.com/simianhacker/code-by-voice https://github.com/calmofthestorm/aenea https://github.com/dictation-toolbox
jsjolen
He kind of did. I asked him and he sent me a copy, I'm pretty sure I'm not allowed to send it along however.
explorigin
Literally his response on Youtube:

For those who are interested in trying something like this, please see https://github.com/dictation-toolbox or google for 'github voice code'. I've been releasing tarballs of my code privately to those who already have natlink/dragonfly working but it is so custom to my setup that you wouldn't be able to use it directly. Others, who appear to have much more time than I, have taken up the banner on this.

----

That's not really "released" in most people's understanding. I don't fault him for not starting a project that he doesn't have time to maintain, but he kind of started something and left people hanging. I think the better way to handle this would have been to just release his code and label it as abandoned. Even if it is custom, others (people who are literally hurting) would have had the time and motivation to make it reusable.

melling
There is another open source project for this: https://github.com/dictation-toolbox

A bunch of other resources are here: https://github.com/melling/ErgonomicNotes/blob/master/progra...

I love this concept. Anyone know of software (other than Dragon) that does this with voice recognition? I could see myself using context-aware voice commands in my day-to-day! I remember seeing some talks a few years back about a guy who used the Dragon API and Python to do all his programming, due to his carpal tunnel being very bad.

EDIT: programmer's name is Tavis Rudd https://www.youtube.com/watch?v=8SkdfdXWYaI

May 16, 2017 · 1 points, 0 comments · submitted by hultner
This guy took the first step. It's a great watch, even though it's kind of long.

https://www.youtube.com/watch?v=8SkdfdXWYaI

IMHO, I don't think it would fly. The most difficult parts of programming are too abstract for that.

You may be interested in the guy that did all his coding with voice for a while because he couldn't use his hands, and still uses it to this day about half of the time. https://www.youtube.com/watch?v=8SkdfdXWYaI

melling
Yes, I already know many people don't think it will work. I have that url, along with a few others in my Programming by Voice notes:

https://github.com/melling/ErgonomicNotes/blob/master/progra...

No one said that you have to do everything. Start with what's possible then iterate.

See this: https://www.youtube.com/watch?v=8SkdfdXWYaI

> In a fast-paced live demo, I will create a small system using Python, plus a few other languages for good measure, and deploy it without touching the keyboard. The demo gods will make a scheduled appearance. I hope to convince you that voice recognition is no longer a crutch for the disabled or limited to plain prose. It's now a highly effective tool that could benefit all programmers.

Turns out it's doable, you just need to invent your own language.

Here's voice input for emacs: [1], [2]

And for vim (inspired by [2]): [3]

I'm sure gesture tracking would probably not be too hard, using any of a number of VR input devices now on the market. Eye tracking would be harder, as I'm not aware of any consumer-level eye-tracking technologies out there. You could certainly have head-tracking pretty easily, though.

Now integrating this in a way that would actually be fluid, natural, and useful is another story, though. Also, I'm personally not sure how many big body movements I want to be doing while using my computer (unless I'm trying to exercise, play a game, draw, conduct music, or lose weight).

Ideally, I'd want to be as motionless as possible while being as comfortable as possible and while having maximum flexibility and efficiency of input. Keyboards are pretty good for that.

[1] - https://www.youtube.com/watch?v=77zPOyMmMPQ

[2] - https://www.youtube.com/watch?v=8SkdfdXWYaI

[3] - https://www.youtube.com/watch?v=TEBMlXRjhZY

melling
[2] is Thomas Ballinger - Terminal whispering - PyCon 2015. Is that right?

VoiceCode is a $300 product. http://voicecode.io

Is it worth the money?

pmoriarty
Sorry. That was the wrong video. Fixed.
Here's an example, at least for code. "Using Python to Code by Voice" https://www.youtube.com/watch?v=8SkdfdXWYaI
I have been suffering from RSI in both arms for a decade now. It started after coding 10+ hour days working at a game company. The pain ebbs and flows with usage, but the inflammation and tightness has never disappeared completely.

Over the years, I've met a number of people who also suffer from RSI in various forms. Fortunately, the majority of people I talk to suffer for some period of time but do eventually recover, and are able to continue on with their lives, having to be careful to avoid future flareups. A handful though, including myself, seem to fall into the "for life" category.

I've tried everything you can think of to varying results:

  * 15+ doctors in various specialties
  * Physical therapy
  * Massage therapists
  * Chiropractors
  * Acupuncture
  * Most every ergonomic keyboard and pointing device on the market
  * Various desk/chair/keyboard tray ergonomic adjustments
  * Addressing TMS (See books by Dr. Sarno)
  * Taking a year off of work (multiple times)
  * NSAIDs
This has resulted in me spending a lot of time looking into other career possibilities, but I keep ending up back writing code to some extent.

Voice coding is literally the first thing that comes to everyone's mind when you tell them you have arm pain, but it's a dead end in my mind. I tried to follow in the footsteps of Tavis Rudd, but I just ended up with voice strain from talking too much. You might have better luck though:

https://www.youtube.com/watch?v=8SkdfdXWYaI

At one point in my life, an intern who knew how to code typed for me. No voice rec system will ever be as good as a human who knows how to code... and even dictating this way was slow and frustrating, though at least my voice wasn't strained.

Someone else here mentioned TMS. A good friend of mine who was suffering for months with arm pain was greatly helped by looking into and addressing it. I recommend Dr Sarno's book "The Mindbody Prescription". Even though it wasn't a cure for me personally, I did find it helpful to mitigate some of the pain.

As far as creating a community goes, I've never found reading about others suffering to be any help or comfort. If anything I've found it better to not dwell on it and just move on with life.

OTOH, a resource that offered suggestions and opportunities on how to make the most of the capabilities that are still available might be nice. Knowing there's still a lot that one can accomplish, even if you can't code for 8+ hours a day, 5 days a week.

Nothing out of the box as far as I know. You'll have to DIY.

Have a look at CMUSphinx/Pocketsphinx [1]. I wrote a comment about training it for command recognition in a previous discussion[2].

It supports BNF-grammar-based training too [3], so I've a vague idea that it may be possible to use your programming language's BNF to make it recognize language tokens. I haven't tried this out though.

Either way, be prepared to spend some time on the training. That's the hardest part with sphinx.

Also, have you seen this talk for doing it on Unix/Mac [4]? He does use natlink and dragonfly, but perhaps some concepts can be transferred to sphinx too?

[1]: http://cmusphinx.sourceforge.net/ [2]: https://news.ycombinator.com/item?id=11174762 [3]: http://cmusphinx.sourceforge.net/doc/sphinx4/edu/cmu/sphinx/... [4]: https://www.youtube.com/watch?v=8SkdfdXWYaI
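For reference, CMUSphinx's grammar-based mode consumes JSGF files, so a hand-rolled command grammar for a few editing commands could look something like this (the rule names and vocabulary are made up for illustration, not taken from any existing setup):

```
#JSGF V1.0;
grammar editing;
public <command> = <action> <object>;
<action> = delete | copy | select;
<object> = word | line | function;
```

The recognizer then only has to choose among utterances the grammar can generate ("delete word", "select function", ...), which is far more robust than open-vocabulary dictation.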

Welcome to the future (of 2013): https://youtu.be/8SkdfdXWYaI?t=9m5s https://youtu.be/8SkdfdXWYaI?t=16m22s

Don't forget to see the entire talk (28 minutes): https://www.youtube.com/watch?v=8SkdfdXWYaI

shpx
One important note is that your throat gets tired talking, just like your hands do typing.
dcvuob
I think it is an issue of convenience. Anecdotally, I have never had trouble doing either of those for 8 hours straight (and more).

Voice will need some serious software support if it is going to take off. There is zero chance that users will learn those mnemonics, just as now almost nobody knows how to touch type. The usage will be more in the form of general commands, than specifying every action like we do it now with the keyboard+mouse.

Sep 10, 2015 · 2 points, 0 comments · submitted by JetSpiegel
The better alternative might be something more like this: https://www.youtube.com/watch?v=8SkdfdXWYaI#t=9m
Apr 14, 2015 · 1 points, 0 comments · submitted by beeworker
There was an interesting talk at PyCon 2013 about programming using only your voice: https://www.youtube.com/watch?v=8SkdfdXWYaI Might be worth looking into for your case.

It's worth mentioning that he built this for himself due to severe RSI in his hands. There was another talk on using a stenographic keyboard to program, and I remember he stood up and said he wished he'd known about it earlier, as it's another approach for people who suffer from RSI, since there's less typing.

[This talk by Tavis Rudd](https://www.youtube.com/watch?v=8SkdfdXWYaI) about a system he used when his hands were afflicted with RSI is pretty interesting.

Not entirely a no-hands approach, but I can't seem to remember the developer who made his first app in the hospital by typing with one/two fingers. He wrote a blogpost about his process that was really inspiring.

wcbeard10
After hearing this talk, I used this repo as an inspiration to get started on my own setup: https://github.com/dictation-toolbox/aenea

Works well for me, but with some friction from reliance on a Windows VM.

melling
Would be nice if Dragon could enhance their Mac product to be on par with their PC product. Even with Apple's recent resurgence it doesn't always get first class treatment.
feybay
I think the quality difference might be due to a lot of government jobs, where the people who use Dragon are on Windows.
mistercow
Ever since I first saw that talk, I've been checking his github page to see if he's pushed his code yet. Not that I'm judging; if I had a dime for every project I totally intended to push to github "once I clean up the duct tape", and then didn't, I'd probably have, like, a dollar.
melling
Other people have pushed source code to github so you can stop waiting.

http://thespanishsite.com/public_html/org/ergo/programming_b...

Dec 28, 2014 · m48 on Look, no hands
I also have hand problems. I use speech recognition to type everything, including code, and a game controller to move around the mouse. The controller I'm currently using, a PS4 controller, has a trackpad in the middle, so I was able to mash that against my face and give her technique a shot.

It works better than you'd think. The PS4 trackpad isn't exactly brilliant, but I can move around the mouse and click on what I want to with some accuracy. Of course, the trackpad on the controller is very small and not very accurate, so it's not really practical for artwork or anything. But, with a better, larger trackpad, I can imagine this technique actually working. I might give it a shot at some point.

I am a bit worried about the inevitable neck and nose pain, though. I wish she had gone into a little more detail about how she avoids that. Maybe she just has a neck of steel?

For the curious, these are some other resources I've found about people working around RSI. Most of these are about using Dragon NaturallySpeaking to code by voice, since that's what I'm most interested in, but I think it's still interesting.

There really needs to be a list somewhere for open-source workarounds to disabilities. To the best of my knowledge, there really isn't one.

Natlink + Dragon NaturallySpeaking:

(NatLink, which lets you make custom speech commands for Dragon in Python, is currently being developed at http://qh.antenna.nl/unimacro/index.html, but that site's pretty incomprehensible. The original author's site is at http://www.synapseadaptive.com/joel/welcomeapage.htm. It's pretty out of date, but explains the fundamentals of the system better, I think.)

https://www.youtube.com/watch?v=8SkdfdXWYaI (don't bother looking around, the source code of this was never released)

https://github.com/simianhacker/code-by-voice

https://github.com/tgrosinger/aenea-grammars

Libraries for using Dragon NaturallySpeaking on Linux with VMs:

https://github.com/dictation-toolbox/aenea

https://github.com/russell/damselfly

It seems a post about this kind of thing pops up about every other month or so. I'm thinking of showing off my system here when I polish it up a bit. It's not nearly as complicated as some of these other ones, but I'm beginning to get pretty close to normal typing speed coding by voice.

iamcreasy
> don't bother looking around, the source code of this was never released

But the video does have a comment (1 week old) from Tavis Rudd linking to a GitHub repository of his code: https://github.com/dictation-toolbox.

And about your setup, please do post it! :)

m48
I'm not completely sure it's his code. In his comments, he says it is for "those who are interested in trying something like this," and the readmes of most of the repos thank Tavis Rudd for inspiration, which would be a little strange if he wrote them. It might be based on his system, though, since he said he had privately sent some people his code. I haven't looked into it too much, though.

You can find my setup on GitHub fairly easily right now (my screenname's the same), but at the moment a good portion of the code is embarrassingly terrible. Making voice commands to code while using those voice commands to code is predictably awkward. Besides, I want to write some documentation on how to get it working and stuff.

How about this versus coding by voice? https://www.youtube.com/watch?v=8SkdfdXWYaI
First, these are two different problems to solve. Voice recognition and deep learning are different fields.

Is training really the issue for voice recognition? It's a problem that has been nearly solved for over a decade. Last year I saw this impressive use of Dragon NaturallySpeaking for the PC, running in a VM on a Mac, that pretty much worked for coding by voice.

https://www.youtube.com/watch?v=8SkdfdXWYaI

The developer mentioned that he didn't have any luck with Sphinx.

Xah Lee summarized the talk here: http://ergoemacs.org/emacs/using_voice_to_code.html

IshKebab
I've tried to use sphinx, but the problem was lack of training data (you have to supply it yourself pretty much!). It did have some data that was supposed to recognise numbers, but it didn't work (I mean, it ran, but the recognition was awful even when it only had to pick between 10 options).

Training is a huge issue for voice recognition. It's the only way Google and Apple have managed to take voice recognition from "works 80% of the time, which is still bad enough to be totally unusable" to "this actually works!". Maybe you don't remember how bad voice recognition was 10 years ago.

To give you an idea how important it is, on OSX you have the option to download data to improve offline voice recognition. It's something like 500 MB. And that's the result of the training.

mikeash
I think there may be some confusion as to what "training" means. When it comes to voice recognition, it makes me (and I suspect others) think of the older software which required a user to read a bunch of text to it to train the software to your specific voice before it could do any kind of decent job understanding you. Now, everything is speaker-agnostic and works out of the box for anybody. Different kind of training.
walterbell
Thanks for that pointer to the Python library which integrates with Dragon/Nuance to enable arbitrary commands, https://pypi.python.org/pypi/dragonfly/
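The core idea behind a library like dragonfly is mapping spoken phrases to text or keystrokes. Here's a minimal plain-Python sketch of that mapping idea; the command names and snippets are invented for illustration and this is not dragonfly's actual API, which registers grammars with the speech engine itself:

```python
# Toy sketch of the phrase-to-text mapping at the heart of voice coding
# tools. The commands below are made-up examples, not a real grammar.
VOICE_COMMANDS = {
    "new function": "def ():",
    "for loop": "for  in :",
    "import numpy": "import numpy as np",
}

def dispatch(phrase: str) -> str:
    """Return the text a recognized phrase would insert into the editor,
    falling back to literal dictation for unknown phrases."""
    return VOICE_COMMANDS.get(phrase, phrase)
```

In a real setup, the recognition engine produces the phrase and the mapped output is sent to the active window as simulated keystrokes.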
bane
https://www.youtube.com/watch?v=KyLqUf4cdwc
melling
This video is not going to convince anyone to use voice recognition.
syllogism
...Of course deep learning is not voice recognition. But the most recent advances in speech recognition have been from deep learning models, which have come from academia.

The system in the video you link is single speaker, closed vocabulary. You need massive training data for multi-speaker, open vocabulary.

You might be interested in watching this talk:

https://www.youtube.com/watch?v=8SkdfdXWYaI

Although, if I recall correctly, he uses Dragon Naturally Speaking, which is a commercial product.

Jun 14, 2014 · melling on Life After Losing an Arm
Have you tried any voice recognition? This guy manages to program with Python and Dragon by creating macros in Emacs.

http://m.youtube.com/watch?v=8SkdfdXWYaI

Voice recognition is probably good enough now and could be useful if support were just built into the tools.

mtrimpe
Has he released his source code for that yet though? From what I've heard it's still not available anywhere.

As for voice recognition; it's pretty good but not nearly perfect yet. When you can limit the vocabulary though (as you can in most programming contexts) it does already perform rather well.
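One reason a limited vocabulary helps so much: instead of transcribing arbitrary speech, the system only needs to snap a noisy hypothesis onto the nearest entry in a small command set. A minimal sketch of that idea, using fuzzy string matching as a stand-in for acoustic scoring (the command list is hypothetical):

```python
import difflib

# A small, closed command vocabulary (illustrative only).
COMMANDS = ["open file", "save buffer", "next window", "kill line", "undo"]

def match_command(heard: str, cutoff: float = 0.6):
    """Snap a (possibly misrecognized) transcription onto the closest
    known command, or return None if nothing is close enough."""
    matches = difflib.get_close_matches(heard.lower(), COMMANDS, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

With only a handful of candidates, even a badly garbled hypothesis like "save buffet" resolves unambiguously to "save buffer", which is why closed-vocabulary coding grammars feel so much more reliable than open dictation.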

Jun 13, 2014 · cmsmith on Life After Losing an Arm
See [1] for what a developer can do when exposed to this exact problem.

[1] http://www.youtube.com/watch?v=8SkdfdXWYaI#t=558

Jun 07, 2014 · goblin89 on AirType
> "This is how you don't fuck up your body while working"

As an aside, “Using Python to code by voice”[0] is quite an inspiring presentation on the topic.

[0] http://www.youtube.com/watch?v=8SkdfdXWYaI#t=542

johnzim
That talk is why I switched to a standing desk and changed my keyboards.
Mar 03, 2014 · 2 points, 0 comments · submitted by fortepianissimo
> I'd bet that, unless you had an optimized "dialect" for coding or a language designed to be efficiently spoken, a keyboard will beat a human speaking code every time.

More or less what this guy did: http://youtu.be/8SkdfdXWYaI

It seems to me that languages with a more regular syntax are going to have a dramatic advantage here.

Thanks. I've actually been using Dragon Naturally Speaking for awhile now. It's pretty fantastic on Windows, but only so-so on OS X, which is my primary OS these days. Unfortunately it really isn't that great for code, even in the best case: http://www.youtube.com/watch?v=8SkdfdXWYaI
melling
Pretty cool. I read Ousterhout's link mentioned above. At the very bottom he mentions a tool: a2x, which he uses to transmit dictated text to a Unix box. This should work on the Mac.

http://www.cl.cam.ac.uk/a2x-voice/a2x-faq.html

lastofus
I saw that, though I didn't think it would be of any use on OS X since it doesn't use X Windows for the GUI, and what support it did have for X Windows was dropped in 10.6 I think.
Jan 14, 2014 · 2 points, 0 comments · submitted by WestCoastJustin
Tavis Rudd does something similar with python and emacs.

Using Python to Code by Voice: http://www.youtube.com/watch?v=8SkdfdXWYaI

Aug 27, 2013 · 2 points, 0 comments · submitted by 4midori
Aug 26, 2013 · 1 point, 0 comments · submitted by pie
Jun 06, 2013 · 1 point, 0 comments · submitted by msvan
May 06, 2013 · thomasjames on VimSpeak
A similar project using Emacs and really clever use of Python and Dragon Naturally Speaking SDK was my favorite presentation at PyCon this past year.

https://www.youtube.com/watch?v=8SkdfdXWYaI

Mar 24, 2013 · 6 points, 0 comments · submitted by spdy
Mar 22, 2013 · 4 points, 0 comments · submitted by oskarth
HN Theater is an independent project and is not operated by Y Combinator or any of the video hosting platforms linked to on this site.