Searching for speech technology’s holy grail

Published Wed, Mar 28 2012, 9:21 AM EDT. Updated Fri, May 1 2020, 10:38 AM EDT.

Heesun Wee, @heesunwee

Telephone your credit-card company, health insurer or just about any big consumer-facing company, then speak into the receiver: for new accounts, say “new”; for billing, say “billing.” Forget it. You shout and stumble through the phone maze and often land back at the directory’s start.

Recognize this experience? Somehow, despite some of technology’s most awesome achievements (tablets! the remote control!), voice recognition remains anathema. We still can’t talk to computers like Captain Picard on “Star Trek: The Next Generation.”

Turns out building voice recognition to acknowledge a “yes” or “no” — let alone complex conversations — isn’t so easy.

“There are 200 ways to say ‘yes’: sure, okay, yeah, uh huh,” to name a few, says Alex Rudnicky, a speech recognition expert and researcher at Carnegie Mellon University in Pittsburgh, Penn. “Getting that right is surprisingly difficult.”
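To see why even a one-word answer is hard, consider a toy sketch in Python. It assumes the caller's speech has already been turned into text, and its word lists are invented for illustration; it is not how any commercial recognizer works.

```python
import re

# Hypothetical, deliberately tiny vocabularies (assumptions for this sketch).
AFFIRMATIVE = {"yes", "sure", "okay", "ok", "yeah", "yep", "uh huh"}
NEGATIVE = {"no", "nope", "nah", "uh uh"}


def classify_reply(transcript: str) -> str:
    """Map an already-transcribed reply to 'yes', 'no' or 'unknown'."""
    # Lowercase and replace punctuation with spaces so "Uh-huh!" becomes "uh huh".
    cleaned = re.sub(r"[^a-z ]+", " ", transcript.lower())
    # Collapse repeated spaces left behind by the substitution.
    cleaned = " ".join(cleaned.split())
    if cleaned in AFFIRMATIVE:
        return "yes"
    if cleaned in NEGATIVE:
        return "no"
    return "unknown"
```

A reply such as “Uh-huh!” lands in the “yes” bucket, but a perfectly ordinary “you bet” falls through to “unknown.” Every phrasing the list misses is another caller sent back to the start of the menu.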

The foundations of most voice technology used today date back to the ’70s. But with the proliferation of smartphones and new voice technology, devices like Apple’s iPhone 4S have upped the ante on voice recognition.

Growing, Billion-Dollar Business

At the 2012 Consumer Electronics Show in Las Vegas, manufacturers and Nuance Communications unveiled TVs with voice-activated functionality.

In fact, Nuance helps power the Siri “personal assistant” system on the Apple iPhone 4S. Nuance translates spoken words into text. Siri then analyzes the text, figures out what it means and translates the words to intended actions.

Voice recognition is a lucrative yet decentralized market that spans everything from toys to healthcare. The voice-related enterprise space alone, which includes automated customer service, is a roughly $10 billion market, says Richard Mack, vice president of communications for Nuance. Clinical documentation is an estimated $15 billion market.

Mobile and consumer voice products including cars and electronics are estimated at $5 billion, adds Mack. That figure doesn’t even include TVs or third-party developers, who are just gaining traction in the growing voice-recognition space.

Next-generation speech recognition “is really going to be ubiquitous,” says Seth Rosenblatt, a senior editor for CNET, which highlights tech trends and consumer-product reviews. But it won't happen overnight.

Game Changer: The iPhone 4S Siri

Back in October 2011 at a corporate Apple presentation, executive Scott Forstall demonstrated the new iPhone 4S. He asked the smartphone: what’s the weather like? Within a few seconds a weather chart — specific to his location — appeared on the smartphone screen. The phone also offered an audio voice response.

The iPhone 4S recognized not only the specific words but also their context. This is known as flexible, natural-language processing, and it is speech technology’s holy grail. No matter how many ways you ask about the weather (“Do I need a raincoat today?”), the system understands your meaning and context to offer the correct response.
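The idea of mapping many phrasings to one meaning can be sketched in a few lines of Python. This is a crude keyword-overlap illustration, not Siri's or Nuance's actual method; the intent names and cue words below are assumptions made up for the example.

```python
# Hypothetical cue words per intent (invented for illustration only).
INTENT_CUES = {
    "weather": {"weather", "rain", "raincoat", "umbrella", "forecast", "sunny"},
    "directions": {"directions", "route", "navigate"},
}


def guess_intent(utterance: str) -> str:
    """Return the intent whose cue words overlap the utterance the most."""
    # Split on whitespace, then trim common punctuation and lowercase each word.
    words = {w.strip("?.,!").lower() for w in utterance.split()}
    best, best_hits = "unknown", 0
    for intent, cues in INTENT_CUES.items():
        hits = len(words & cues)
        if hits > best_hits:
            best, best_hits = intent, hits
    return best
```

Both “What's the weather like?” and “Do I need a raincoat today?” resolve to the weather intent here, because each contains one of the listed cue words. A request with no listed word comes back as “unknown,” which hints at how much more a real system has to understand.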

Apple’s Forstall asked for local directions, food recommendations and a reminder to call his wife, and Siri responded flawlessly. The presentation was recorded on video and went viral among gadget fans.

“It was one of the most amazing things I had seen,” says Paul McFedries, a tech expert and author of a new book, “iPhone 4S Portable Genius.”

Great Expectations

But soon after the new iPhone hit shelves in October 2011, some consumers experienced less success. Transcripts of dictated texts, for example, sometimes appeared jumbled. Consensus emerged that the voice technology, though ground-breaking, remained a work in progress.

Part of Siri’s mixed reception stems from Forstall’s demonstration.

“That great demo set expectations up pretty high,” says author McFedries, also a technical writer. “A lot of people don’t realize it’s a beta product,” he says. In a rare Apple move, the tech giant launched Siri as a developing beta product.

So why release Siri before it’s ready? Apple needs consumers to road-test the technology so that it can collect their data and fine-tune its product. “The way voice recognition works, it needs a lot of data,” McFedries says. “That’s the only way to get this to be a really good product.”

Tech, Auto Sectors Chase Voice Technology

In fact, more tech companies including Google are chasing advanced voice recognition, a space dominated for years by Nuance and its Dragon line of products.

Advanced voice technology is spreading fast on smartphones. In addition to Apple, Android phones feature built-in free access to Google’s network-based speech recognition. Windows-powered mobile devices offer built-in free access to Microsoft’s network-based speech recognition.

“Voice is capable of doing some very difficult things as Siri has demonstrated,” says Bill Meisel, editor of Speech Strategy News, an industry newsletter.

Meisel says voice technology will revolutionize the tech ecosystem the way the graphical user interface forever changed personal computing. (A graphical user interface allows users to interact with electronic devices through images, rather than text commands.)

Cars are getting better speech recognition, too. Voice prompts like “What’s playing?” will trigger music and media choices. The auto makers investing in new speech recognition include GM’s Cadillac unit, Kia Motors America and Ford Motor, according to Speech Strategy News.

Better Customer Service Ahead?

So with all this new technology, is automated customer service improving? The short answer is yes.

Mack of Nuance says the cycle between research and development and the marketplace has shrunk to several years, compared with a decade-long gap in the past. This time around, advances including natural-language processing are reaching critical mass faster.

For example, more automated call centers today begin with an open-ended “How can I help you?” versus a menu of specific questions.

Businesses and scientists, meanwhile, continue to chase more sophisticated speech recognition as portrayed on “Star Trek.”

Says author McFedries, “There is a certain generation of geeks, who grew up with that and the romance of that is too much to resist.”