Speech recognition is the ability of a computer to recognize the spoken word.
“Alexa: read me something interesting from Innowiki.”
“Duh human, everything on Innowiki is interesting or it wouldn’t be there.”
Today, inexpensive pocket-sized phones connect to centralized servers and understand the spoken word in countless languages. Not so long ago, that was science fiction.
Background
Star Trek in 1966, the HAL 9000 of 2001: A Space Odyssey in 1968, Westworld in 1973, and Star Wars in 1977 all assumed computers would understand the spoken word. What they missed is that people would become so fast with other input devices, especially keyboards, that speaking would come to be viewed as an inefficient input method.
The first real speech recognition system actually predates these science-fiction depictions. In 1952, three Bell Labs scientists created a system, “Audrey,” which recognized a voice speaking digits. A decade later, IBM researchers launched “Shoebox,” which recognized 16 English words.
In 1971, DARPA intervened with the “Speech Understanding Research” (SUR) program, aimed at a system that could understand 1,000 English words. Researchers at Carnegie Mellon created “Harpy,” which understood a vocabulary comparable to that of a three-year-old child.
Research continued. In the 1980s, the “Hidden Markov Model” (HMM) proved a major breakthrough. Computer scientists realized that computers need not understand what a person was saying; they only needed to listen to sounds and look for statistical patterns. By the 1990s, faster and less expensive CPUs brought speech recognition to the masses with software like Dragon Dictate. BellSouth created the voice portal phone-tree system which, unfortunately, frustrates and annoys people to this day.
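To make the pattern-matching idea behind the HMM concrete, here is a minimal sketch in Python. It is a toy illustration, not the method used by any system named in this article. The two hidden sounds ("S" and "EE"), the two acoustic labels ("hiss" and "tone"), and every probability are invented for the example. The function uses the Viterbi algorithm, a standard way to find the most likely sequence of hidden states in an HMM.

```python
import math

def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden-state sequence for the observations."""
    # best[t][s] = (log-probability of the best path ending in s at time t,
    #               the state that path came from at time t-1)
    first = observations[0]
    best = [{s: (math.log(start_p[s]) + math.log(emit_p[s][first]), None)
             for s in states}]
    for obs in observations[1:]:
        column = {}
        for s in states:
            # Pick the predecessor that gives the highest-scoring path into s.
            prev, score = max(
                ((p, best[-1][p][0] + math.log(trans_p[p][s])) for p in states),
                key=lambda pair: pair[1],
            )
            column[s] = (score + math.log(emit_p[s][obs]), prev)
        best.append(column)
    # Start from the best final state and follow the back-pointers.
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for column in reversed(best[1:]):
        state = column[state][1]
        path.append(state)
    return list(reversed(path))

# Hypothetical toy model: all names and numbers below are assumptions.
states = ["S", "EE"]
start_p = {"S": 0.6, "EE": 0.4}
trans_p = {"S": {"S": 0.7, "EE": 0.3}, "EE": {"S": 0.2, "EE": 0.8}}
emit_p = {"S": {"hiss": 0.9, "tone": 0.1}, "EE": {"hiss": 0.2, "tone": 0.8}}
heard = ["hiss", "hiss", "tone", "tone"]

print(viterbi(heard, states, start_p, trans_p, emit_p))
# prints ['S', 'S', 'EE', 'EE']
```

The program never knows what the word "see" means. It only finds the sequence of sounds that best explains what was heard, which is the insight the paragraph above describes.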
DARPA stepped back in during the 2000s, sponsoring multi-language speech recognition systems.
Rapid Advancement
However, a major breakthrough came from the private sector. Google released a service called “Google 411” that let people dial Google and look up telephone numbers for free. Callers would speak to a computer that guessed what they said; an operator would then answer, check the computer’s accuracy, and deliver the phone number. The real purpose of the system was to better train computers with a myriad of voices, including difficult-to-decipher names. Eventually, this evolved into Google’s voice recognition software, still in use today.
Speech recognition continues to advance in countless languages. Especially for English, the systems are nearing perfection. They are fast, accurate, and require relatively little computer processing power.
In 2019, anybody can speak to a computer, though unless their hands are busy doing something else, most prefer not to.