Why everyone is talking about an image generator released by an Elon Musk-backed A.I. lab 

Key Points
  • OpenAI has trained a piece of software, known as Dall-E, to generate images from short text captions.
  • It demoed how the AI could create armchairs in the shape of avocados and baby daikon radishes wearing tutus.
  • Dall-E comes just a few months after OpenAI announced it had built a text generator called GPT-3.
SpaceX founder Elon Musk looks on at a post-launch news conference after the SpaceX Falcon 9 rocket, carrying the Crew Dragon spacecraft, lifted off on an uncrewed test flight to the International Space Station from the Kennedy Space Center in Cape Canaveral, Florida, March 2, 2019.
Mike Blake | Reuters

Armchairs in the shape of avocados and baby daikon radishes wearing tutus are among the quirky images created by a new piece of software from OpenAI, an Elon Musk-backed artificial intelligence lab in San Francisco.

OpenAI trained the software, known as Dall-E, to generate images from short text captions. Dall-E is a 12-billion-parameter model trained on a dataset of images and their captions found on the internet.

The lab said Dall-E — a portmanteau of Spanish surrealist artist Salvador Dali and Wall-E, a small animated robot from the Pixar movie of the same name — had learned how to create images for a wide range of concepts.

OpenAI showed off some of the results in a blog post published on Tuesday. "We've found that it [Dall-E] has a diverse set of capabilities, including creating anthropomorphized versions of animals and objects, combining unrelated concepts in plausible ways, rendering text, and applying transformations to existing images," the company wrote.

Dall-E is built on a neural network, a computing system loosely inspired by the human brain that can spot patterns and recognize relationships in vast amounts of data.

While neural networks have generated images and videos before, Dall-E stands out because it creates images from written descriptions, whereas most earlier image generators did not take text as their input.

Synthetic videos and images have become so sophisticated in recent years that it is now hard for humans to distinguish between what is real and what is computer-generated. Generative adversarial networks (GANs), which pit two neural networks against each other, have been used to create fake videos of politicians, for example.

OpenAI acknowledged that Dall-E has the "potential for significant, broad societal impacts," adding that it plans to analyze how models like Dall-E "relate to societal issues like economic impact on certain work processes and professions, the potential for bias in the model outputs, and the longer term ethical challenges implied by this technology."

GPT-3 successor

Dall-E comes just a few months after OpenAI announced it had built a text generator called GPT-3 (Generative Pre-trained Transformer), which is also underpinned by a neural network.

The language-generation tool can produce human-like text on demand, and it became unusually well known for an AI program when people realized it could write its own poetry, news articles and short stories.

"Dall-E is a Text2Image system based on GPT-3 but trained on text plus images," Mark Riedl, associate professor at the Georgia Tech School of Interactive Computing, told CNBC.

"Text2Image is not new, but the Dall-E demo is remarkable for producing illustrations that are much more coherent than other Text2Image systems I've seen in the past few years."

OpenAI has been competing with firms like DeepMind and the Facebook AI Research group to build general-purpose algorithms that can perform a wide range of tasks at human level and beyond.

Researchers have built AIs that can play complex games like chess and the Chinese board game Go, translate one human language into another, and spot tumors in mammograms. But getting an AI system to show genuine "creativity" remains a big challenge for the industry.

Riedl said the Dall-E results show it has learned how to blend concepts coherently, adding that "the ability to coherently blend concepts is considered a key form of creativity in humans."

"From the creativity standpoint, this is a big step forward," Riedl added. "While there isn't a lot of agreement about what it means for an AI system to 'understand' something, the ability to use concepts in new ways is an important part of creativity and intelligence."

Neil Lawrence, the former director of machine learning at Amazon Cambridge, told CNBC that Dall-E looks "very impressive."

Lawrence, who is now a professor of machine learning at the University of Cambridge, described it as "an inspirational demonstration of the capacity of these models to store information about our world and generalize in ways that humans find very natural."

He said: "I expect there will be all sorts of applications of this type of technology, I can't even begin to imagine. But it's also interesting in terms of being another pretty mind-blowing technology that is solving problems we didn't even know we actually had."

'Doesn't advance the state of AI'

Not everyone is that impressed by Dall-E, however.

Gary Marcus, an entrepreneur who sold a machine-learning start-up to Uber in 2016 for an undisclosed sum, told CNBC that it's interesting but it "doesn't advance the state of AI."

He also pointed out that it hasn't been open sourced and that the company hasn't yet published an academic paper on the research.

Marcus has previously questioned whether some of the research published by rival lab DeepMind in recent years should be classified as "breakthroughs."

OpenAI was set up as a non-profit with a $1 billion pledge from a group of founders that included Tesla CEO Elon Musk. In February 2018, Musk left the OpenAI board, but he continues to donate to and advise the organization.

OpenAI restructured as a "capped-profit" company in 2019 and raised another $1 billion from Microsoft to fund its research. GPT-3 is set to be OpenAI's first commercial product, and Reddit has signed up as one of its first customers.