Microsoft researchers are trying to humanize virtual assistants by studying multilingual speech

HYDERABAD, India — Voice-powered virtual assistants underpinned by artificial intelligence, like Apple's Siri, Amazon's Alexa, Google Assistant and Microsoft's Cortana, are becoming regular fixtures in people's lives. They're present in homes, on phones and watches, and in cars, delivering driving directions, weather updates, meeting reminders and the occasional joke when prompted.

Beyond that, they remain limited in their ability to hold conversations with users the way real people might. Efforts are under way, however, to use machine learning and real-time big data analytics to help virtual assistants understand multiple languages, accents, contexts and nuances so they can hold more human-like conversations.

International Data Corporation predicts that global spending on cognitive and AI solutions will rise significantly over the next several years, and could achieve a compound annual growth rate of 54.4 percent through 2020.

Microsoft, for example, is turning to an unlikely group to bridge the gap between humans and machines: bilinguals. Of specific interest is the practice of code-mixing, in which speakers switch back and forth between multiple languages in a single sentence or conversation. It's commonly found in multilingual societies.

A handful of Microsoft researchers in India started Project Mélange, where they are studying the use of code-mixing among Indians online. They are trying to figure out how virtual assistants might be taught to respond to a user switching between, for example, English and Hindi in a conversation.

"Compartmentalization of mixed languages (by multilinguals) have gone away with each coming generation," Kalika Bali, a researcher at Microsoft, told CNBC in an interview. "So younger people use mixed languages in more and more phases of their lives."

"To have a digital assistant, something like Cortana, you have to be able to understand (the user base)," she said, adding that current systems aren't trained to pick up multiple languages in a single conversation.

Microsoft's philosophy is that virtual assistants should not be too intrusive on a user's space and time, and only show up when they need to, Sundar Srinivasan, general manager of the AI division in India at Microsoft, said in a recent interview. Srinivasan's team works on Cortana and Microsoft's search engine Bing at the company's development center in Hyderabad.

"A personal assistant is only as effective as you want her to be," he said.

Cortana is available on Windows 10 and on selected smartphones running the Windows operating system. It has about 140 million monthly active users, compared with about 500 million devices currently running Windows 10. The program is not available in many regions, but Srinivasan said Microsoft plans to expand it into other markets and onto more devices in the future.

By comparison, Google Assistant is available on both the Android and iOS platforms, which together account for about 99.8 percent of all smartphones in the world.

Srinivasan explained that Cortana was not like other Microsoft products such as Office, where the company could add features as and when needed. Instead, Cortana "needs to really understand human beings, and the bar for that is very high because that means we have to train the assistant with lots of human speech data," he said.

"It's going to take us time."

Bali said the researchers' code-mixing study was inspired by observations from an anthropologist who was looking at the use of technology among Urdu-speaking youth in Hyderabadi slums. The youths were found to be using the internet to befriend and interact with girls from Brazil, who primarily spoke Portuguese. English, however, was the primary language for that communication. That piqued Bali's interest, and she asked to see the data the anthropologist had collected.

"When I started seeing this data, I saw that not only were they using this very pidgin English kind of thing, but (they were) effectively communicating with each other," Bali said, referring to the type of code-mixing that was happening in these conversations.

The team looks at every aspect of code-mixing, including text, speech, understanding and recognition. Bali said they also look at generational variations and why people switch between languages in a single conversation — for example, sometimes it's for humor and at other times it's to change topics.

But it may take years before a voice-powered virtual assistant can eloquently switch back and forth between multiple languages to respond to users. The biggest challenge now for Bali and the researchers is getting access to adequate data sets for study.

"Trying to solve just mono-lingual natural language processing and understanding has taken years. Now we're talking about mixed stuff and where are we going to get data from?" Bali said. "Because everything is data-driven. And what are we going to do? It just seemed like a very difficult problem to solve but we were all so excited about this that we just went on."

Currently, the team uses data collected from Twitter to study how users switch between languages. Studies, Bali said, have already shown that Indian men who speak English and Hindi tend to switch to the latter when they express negative sentiments or abuse. Women conversing in English, however, tend to stick with that language even when the content takes a negative turn.
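
To make the idea of a language switch concrete, here is a minimal Python sketch of how switch points in a short, romanized Hindi-English message might be counted. It is not the Microsoft team's method, which the article does not describe. The word lists, the function names `tag_tokens` and `count_switches`, and the sample sentence are all assumptions made up for illustration, and a real system would learn language labels from annotated data rather than look words up.

```python
import re

# Toy word lists, assumed purely for illustration. They are far too small
# for real use and stand in for a trained word-level language identifier.
HINDI_WORDS = {"hai", "nahi", "kya", "bahut", "aaj", "mujhe", "yaar", "accha"}
ENGLISH_WORDS = {"the", "is", "movie", "was", "good", "but", "today",
                 "i", "am", "tired", "very"}


def tag_tokens(text):
    """Label each word 'hi', 'en', or 'other' using the toy word lists."""
    tokens = re.findall(r"[a-z']+", text.lower())
    tagged = []
    for token in tokens:
        if token in HINDI_WORDS:
            tagged.append((token, "hi"))
        elif token in ENGLISH_WORDS:
            tagged.append((token, "en"))
        else:
            tagged.append((token, "other"))
    return tagged


def count_switches(tagged):
    """Count language changes between consecutive words with a known label."""
    langs = [lang for _, lang in tagged if lang != "other"]
    return sum(1 for prev, cur in zip(langs, langs[1:]) if prev != cur)


# Hypothetical code-mixed message, not taken from any data set.
sample = "Movie was good but yaar bahut tired today"
tagged = tag_tokens(sample)
print(tagged)
print(count_switches(tagged))
```

In this sample the speaker moves from English into Hindi and back again, so the sketch reports two switch points. Words missing from both lists are skipped instead of guessed at, which keeps the count conservative.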

Teaching machines to interpret code-mixing could also lead to advances in opinion mining and customization, and to a better interpretation of nuance and context, Bali said.

"I think this would definitely help to bridge the gap in the human-computer interaction. The fact that you can actually talk to a machine the way you would normally talk to your friend is something we still need to wrap our heads around."