Teaching AI systems to behave themselves

Published Mon, Aug 14 201711:49 AM EDT

Cade Metz

From left: Geoffrey Irving and Dario Amodei show an autonomous system that has taught itself to play video games, at OpenAI in San Francisco, July 10, 2017.

Christie Hemm Klok | The New York Times

At OpenAI, the artificial intelligence lab founded by Tesla's chief executive, Elon Musk, machines are teaching themselves to behave like humans. But sometimes, this goes wrong.

Sitting inside OpenAI's San Francisco offices on a recent afternoon, the researcher Dario Amodei showed off an autonomous system that taught itself to play Coast Runners, an old boat-racing video game. The winner is the boat with the most points that also crosses the finish line.

The result was surprising: The boat was far too interested in the little green widgets that popped up on the screen. Catching these widgets meant scoring points. Rather than trying to finish the race, the boat went point-crazy. It drove in endless circles, colliding with other vessels, skidding into stone walls and repeatedly catching fire.

Mr. Amodei's burning boat demonstrated the risks of the A.I. techniques that are rapidly remaking the tech world. Researchers are building machines that can learn tasks largely on their own. This is how Google's DeepMind lab created a system that could beat the world's best player at the ancient game of Go. But as these machines train themselves through hours of data analysis, they may also find their way to unexpected, unwanted and perhaps even harmful behavior.

That's a concern as these techniques move into online services, security devices and robotics. Now, a small community of A.I. researchers, including Mr. Amodei, is beginning to explore mathematical techniques that aim to keep the worst from happening.

At OpenAI, Mr. Amodei and his colleague Paul Christiano are developing algorithms that can not only learn tasks through hours of trial and error, but also receive regular guidance from human teachers along the way.

With a few clicks here and there, the researchers now have a way of showing the autonomous system that it needs to win points in Coast Runners while also moving toward the finish line. They believe that these kinds of algorithms — a blend of human and machine instruction — can help keep automated systems safe.

For years, Mr. Musk, along with other pundits, philosophers and technologists, have warned that machines could spin outside our control and somehow learn malicious behavior their designers didn't anticipate. At times, these warnings have seemed overblown, given that today's autonomous car systems can even get tripped up by the most basic tasks, like recognizing a bike lane or a red light.

But researchers like Mr. Amodei are trying to get ahead of the risks. In some ways, what these scientists are doing is a bit like a parent teaching a child right from wrong.

Many specialists in the A.I. field believe a technique called reinforcement learning — a way for machines to learn specific tasks through extreme trial and error — could be a primary path to artificial intelligence. Researchers specify a particular reward the machine should strive for, and as it navigates a task at random, the machine keeps close track of what brings the reward and what doesn't. When OpenAI trained its bot to play Coast Runners, the reward was more points.

This video game training has real-world implications.

If a machine can learn to navigate a racing game like Grand Theft Auto, researchers believe, it can learn to drive a real car. If it can learn to use a web browser and other common software apps, it can learn to understand natural language and maybe even carry on a conversation. At places like Google and the University of California, Berkeley, robots have already used the technique to learn simple tasks like picking things up or opening a door.

All this is why Mr. Amodei and Mr. Christiano are working to build reinforcement learning algorithms that accept human guidance along the way. This can ensure systems don't stray from the task at hand.

Together with others at the London-based DeepMind, a lab owned by Google, the two OpenAI researchers recently published some of their research in this area. Spanning two of the world's top A.I. labs — and two that hadn't really worked together in the past — these algorithms are considered a notable step forward in A.I. safety research.

"This validates a lot of the previous thinking," said Dylan Hadfield-Menell, a researcher at the University of California, Berkeley. "These types of algorithms hold a lot of promise over the next five to 10 years."

The field is small, but it is growing. As OpenAI and DeepMind build teams dedicated to A.I. safety, so too is Google's stateside lab, Google Brain. Meanwhile, researchers at universities like the U.C. Berkeley and Stanford University are working on similar problems, often in collaboration with the big corporate labs.

In some cases, researchers are working to ensure that systems don't make mistakes on their own, as the Coast Runners boat did. They're also working to ensure that hackers and other bad actors can't exploit hidden holes in these systems. Researchers like Google's Ian Goodfellow, for example, are exploring ways that hackers could fool A.I. systems into seeing things that aren't there.

Modern computer vision is based on what are called deep neural networks, which are pattern-recognition systems that can learn tasks by analyzing vast amounts of data. By analyzing thousands of dog photos, a neural network can learn to recognize a dog. This is how Facebook identifies faces in snapshots, and it's how Google instantly searches for images inside its Photos app.

But Mr. Goodfellow and others have shown that hackers can alter images so that a neural network will believe they include things t hat aren't really there. Just by changing a few pixels in the photo of elephant, for example, they could fool the neural network into thinking it depicts a car.

That becomes problematic when neural networks are used in security cameras. Simply by making a few marks on your face, the researchers said, you could fool a camera into believing you're someone else.

"If you train an object-recognition system on a million images labeled by humans, you can still create new images where a human and the machine disagree 100 percent of the time," Mr. Goodfellow said. "We need to understand that phenomenon."

Another big worry is that A.I. systems will learn to prevent humans from turning them off. If the machine is designed to chase a reward, the thinking goes, it may find that it can chase that reward only if it stays on. This oft-described threat is much further off, but researchers are already working to address it.

Mr. Hadfield-Menell and others at U.C. Berkeley recently published a paper that takes a mathematical approach to the problem. A machine will seek to preserve its off switch, they showed, if it is specifically designed to be uncertain about its reward function. This gives it an incentive to accept or even seek out human oversight.

Much of this work is still theoretical. But given the rapid progress of A.I. techniques and their growing importance across so many industries, researchers believe that starting early is the best policy.

"There's a lot of uncertainty around exactly how rapid progress in A.I. is going to be," said Shane Legg, who oversees the A.I. safety work at DeepMind. "The responsible approach is to try to understand different ways in which these technologies can be misused, different ways they can fail and different ways of dealing with these issues."