Computers find it difficult to understand the English language. This is where Google's Parsey McParseface comes in.
The U.S. search giant is releasing the code for two pieces of software, SyntaxNet and Parsey McParseface, free for developers to use. Together, they will allow a developer's program or app to understand written English, something that companies from Facebook to Google have placed great emphasis on.
Google's move is significant because artificial intelligence (AI), and natural language understanding in particular, is very difficult for computers and even harder for developers to code. By opening up these capabilities, Google is giving developers the chance to integrate natural language understanding into their apps.
As users ask AI systems on their phones – such as Google Now or the bots in Facebook Messenger – to carry out more tasks, the ability of those systems to understand the English language and execute the correct response is crucial.
Parsey McParseface is a so-called "English parser" which has been trained to understand the language. SyntaxNet is a "syntactic parser", trained to understand the syntax of a sentence.
Take the sentence "Alice saw Bob". The two work together to decipher that Alice and Bob are nouns while "saw" is the verb.
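To make that concrete, here is a minimal sketch of how a parse of "Alice saw Bob" could be stored in a program. It is not SyntaxNet's actual output format: the `Token` class, its field names and the helper functions are hypothetical, and the relation labels (`nsubj` for subject, `dobj` for direct object) are assumed from common dependency-parsing conventions.

```python
from dataclasses import dataclass


@dataclass
class Token:
    index: int      # position of the word in the sentence
    text: str       # the word itself
    pos: str        # part-of-speech tag
    head: int       # index of the word this one depends on, -1 for the root
    relation: str   # label of the link to the head (assumed label set)


# Hand-written parse of "Alice saw Bob"; a real parser would produce this.
parse = [
    Token(0, "Alice", "NOUN", 1, "nsubj"),
    Token(1, "saw", "VERB", -1, "ROOT"),
    Token(2, "Bob", "NOUN", 1, "dobj"),
]


def root(tokens):
    """Return the token that has no head, i.e. the main verb here."""
    return next(t for t in tokens if t.head == -1)


def dependents(tokens, head_index, relation):
    """Return the words attached to a given head with a given relation."""
    return [t.text for t in tokens
            if t.head == head_index and t.relation == relation]


main_verb = root(parse)
subjects = dependents(parse, main_verb.index, "nsubj")
objects = dependents(parse, main_verb.index, "dobj")
print(f"{subjects} {main_verb.text} {objects}")
```

Once a sentence is stored this way, questions such as "who saw whom?" become simple lookups on the verb's dependents.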
But Parsey McParseface is also able to break down and understand a more complex sentence, such as the example Google gives: "Alice, who had been reading about SyntaxNet, saw Bob in the hallway yesterday."
Google said that being able to understand this allows the system to surface answers to a question such as, "what had Alice been reading about?"
Parsing is the process by which software breaks down a sentence in order to understand its words and context. But it is difficult for machines to do.
"One of the main problems that makes parsing so challenging is that human languages show remarkable levels of ambiguity. It is not uncommon for moderate length sentences - say 20 or 30 words in length - to have hundreds, thousands, or even tens of thousands of possible syntactic structures," Google wrote in a blog post.
"A natural language parser must somehow search through all of these alternatives, and find the most plausible structure given the context."
SyntaxNet applies so-called "neural networks" to the ambiguity problem. Google explains that the software reads a sentence and comes up with multiple hypotheses as to which reading of the sentence is the most plausible in a specific context. The highest-ranked hypothesis is the one that is eventually surfaced to the user.
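The idea of keeping several hypotheses alive and ranking them can be sketched with a toy beam search. This is an illustration only, not SyntaxNet's code: the `ARC_SCORE` table is a hand-set, hypothetical stand-in for the scores a neural network would produce, and the shift/left-arc/right-arc moves are an assumed, textbook-style way of building a parse one step at a time.

```python
from typing import NamedTuple


class State(NamedTuple):
    stack: tuple     # indices of words waiting to be attached
    next_word: int   # index of the next unread word
    arcs: tuple      # (head, dependent) links chosen so far
    score: float     # running plausibility of this hypothesis


# Hypothetical scores for (head tag, dependent tag); not learned values.
ARC_SCORE = {("VERB", "NOUN"): 1.0, ("NOUN", "VERB"): -1.0,
             ("NOUN", "NOUN"): -0.5}


def arc_score(tags, head, dep):
    return ARC_SCORE.get((tags[head], tags[dep]), 0.0)


def successors(state, tags):
    """Yield every hypothesis reachable from `state` in one move."""
    stack, i, arcs, score = state
    if i < len(tags):                       # SHIFT: read the next word
        yield State(stack + (i,), i + 1, arcs, score)
    if len(stack) >= 2:
        below, top = stack[-2], stack[-1]
        # LEFT-ARC: the top word becomes the head of the one below it
        yield State(stack[:-2] + (top,), i, arcs + ((top, below),),
                    score + arc_score(tags, top, below))
        # RIGHT-ARC: the word below becomes the head of the top word
        yield State(stack[:-1], i, arcs + ((below, top),),
                    score + arc_score(tags, below, top))


def is_final(state, tags):
    return state.next_word == len(tags) and len(state.stack) == 1


def beam_parse(tags, beam_size=4):
    """Return finished hypotheses, best first, keeping `beam_size` per step."""
    beam, finished = [State((), 0, (), 0.0)], []
    while beam:
        candidates = []
        for state in beam:
            for nxt in successors(state, tags):
                (finished if is_final(nxt, tags) else candidates).append(nxt)
        beam = sorted(candidates, key=lambda s: s.score,
                      reverse=True)[:beam_size]
    return sorted(finished, key=lambda s: s.score, reverse=True)


# "Alice saw Bob" as part-of-speech tags
hypotheses = beam_parse(["NOUN", "VERB", "NOUN"])
best = hypotheses[0]
print(sorted(best.arcs), best.score)
```

With these toy scores, the top-ranked hypothesis makes "saw" (word 1) the head of both "Alice" and "Bob", while lower-ranked hypotheses attach the words in less plausible ways.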
Google said that Parsey McParseface was 94 percent accurate when it read randomly drawn newswire sentences.
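The article does not spell out how that accuracy figure is calculated. One common way to score a parser, assumed here purely for illustration, is to count the share of words that were attached to the correct head word. The function name and the example data below are hypothetical.

```python
def attachment_score(gold_heads, predicted_heads):
    """Fraction of words whose predicted head matches the reference head."""
    if len(gold_heads) != len(predicted_heads):
        raise ValueError("both parses must cover the same words")
    if not gold_heads:
        raise ValueError("cannot score an empty sentence")
    correct = sum(g == p for g, p in zip(gold_heads, predicted_heads))
    return correct / len(gold_heads)


# "Alice saw Bob": head index for each word, -1 marks the root.
gold = [1, -1, 1]        # reference parse: both nouns depend on "saw"
predicted = [1, -1, 0]   # made-up parser output that attaches "Bob" to "Alice"

score = attachment_score(gold, predicted)
print(f"{score:.0%} of words attached correctly")
```

Here two of the three words have the right head, so the made-up parser output scores two thirds on this sentence.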
The search giant said that there were no studies on how human performance compares, but that it knows "from our in-house annotation projects that linguists trained for this task agree in 96-97 percent of the cases", suggesting the system is "approaching human performance". But Google has bigger ambitions.
"Our work is still cut out for us: we would like to develop methods that can learn world knowledge and enable equal understanding of natural language across all languages and contexts," the company said.