Google parent Alphabet's DeepMind division has made what it claims is a big leap forward in making computers sound more human.
The artificial intelligence start-up, bought by Google for £400 million ($532 million) in 2014, outlined in a blog post on Thursday how its new technology can make computer-generated speech sound more natural, and said it performs better than Google's current text-to-speech systems.
"Allowing people to converse with machines is a long-standing dream of human-computer interaction," DeepMind's blog said.
Until now, generating speech with computers – known as text-to-speech (TTS) – has typically relied on a process in which a large database of speech fragments is recorded from one person and then combined to form sentences. This makes modifying voices or adding emotion to computer-generated speech difficult.
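The fragment-stitching approach described above can be sketched in a few lines. This is an illustrative toy, not DeepMind's or Google's actual code: the fragment database, unit names, and audio values are all hypothetical stand-ins for a real recorded-speech library.

```python
# Toy sketch of concatenative TTS: a hypothetical database maps speech
# units to pre-recorded audio samples, and a sentence is built by
# stitching those fragments end to end.

# Hypothetical fragment database: unit name -> recorded audio samples
fragment_db = {
    "he": [0.1, 0.3, 0.2],
    "llo": [0.4, 0.1, -0.2],
    "world": [-0.1, 0.2, 0.5],
}

def synthesize(units):
    """Concatenate pre-recorded fragments into one waveform."""
    waveform = []
    for unit in units:
        waveform.extend(fragment_db[unit])
    return waveform

audio = synthesize(["he", "llo", "world"])
print(len(audio))  # 9 samples stitched together
```

Because every fragment is a fixed recording of one speaker, changing the voice or its emotion means re-recording the whole database – the limitation the article notes.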
WaveNet, an artificial intelligence (AI) system developed by DeepMind, instead models raw audio waveforms directly, and can generate any kind of audio using neural networks – a form of AI that tries to simulate the human brain.
To model waveforms, WaveNet works with audio sampled 16,000 times per second, generating each sample one at a time to produce speech.
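Generating audio one sample at a time at that rate can be sketched as follows. The neural network is replaced here by a toy sine-wave oscillator; only the sample-by-sample generation loop and the 16,000 Hz rate come from the article, everything else is an illustrative assumption.

```python
import math

SAMPLE_RATE = 16_000  # the article's 16,000 samples per second

def next_sample(history):
    """Stand-in for the neural network: predicts the next audio sample
    from all previous ones. A 440 Hz sine wave replaces the real model."""
    return math.sin(2 * math.pi * 440 * len(history) / SAMPLE_RATE)

def generate(seconds):
    """Generate audio one sample at a time, each conditioned on the past."""
    samples = []
    for _ in range(int(seconds * SAMPLE_RATE)):
        samples.append(next_sample(samples))
    return samples

audio = generate(0.01)  # 10 ms of audio
print(len(audio))       # 160 samples: 16,000 per second, one at a time
```

The loop makes clear why this approach is computationally expensive: one second of speech requires 16,000 sequential model predictions.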
DeepMind said that its English and Mandarin Chinese computer-generated speech systems perform better than Google's existing ones.
"As you can hear from these samples, a single WaveNet is able to learn the characteristics of many different voices, male and female. To make sure it knew which voice to use for any given utterance, we conditioned the network on the identity of the speaker," DeepMind said in the blog post.
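Conditioning on speaker identity, as the quote describes, commonly means feeding a speaker ID alongside the audio input so one set of network weights can produce many voices. The sketch below shows that pattern with a one-hot encoding; the speaker names, feature shapes, and encoding choice are assumptions for illustration, not details DeepMind has published here.

```python
# Sketch of speaker conditioning: a one-hot speaker ID is appended to the
# audio features at each step, so a single network learns many voices.
# The model itself is omitted; only the conditioning pattern is shown.

SPEAKERS = ["female_1", "male_1", "male_2"]  # hypothetical speaker set

def one_hot(speaker):
    """Encode a speaker name as a one-hot vector."""
    return [1.0 if s == speaker else 0.0 for s in SPEAKERS]

def model_input(audio_features, speaker):
    # Each step's input combines audio features with the speaker identity,
    # telling the shared network which voice to use for this utterance.
    return audio_features + one_hot(speaker)

x = model_input([0.2, -0.1], "male_1")
print(x)  # [0.2, -0.1, 0.0, 1.0, 0.0]
```

The same mechanism explains the next paragraph's point: extra inputs such as emotion or accent labels can be appended in exactly the same way.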
The company added that it could add additional inputs into its AI model such as emotions or accents to make the speech "even more diverse and interesting".
Text-to-speech is a technology that companies from Apple to Microsoft are interested in, as it could be critical to making digital personal assistants such as Siri or Cortana smarter and more human-like.
DeepMind rose to fame earlier this year when it created an AI system to beat a champion Go player.