Facebook A.I. researchers figured out how to make avatars look like they're playing music

  • Facebook relied on YouTube videos of piano and violin recitals to teach artificial neural networks about the connection between body gestures and music.
  • From there, they got the systems to generate videos of avatars from audio alone.
  • The researchers think the observed connection could be "very promising" for augmented reality and virtual reality.

Facebook artificial intelligence researchers have come up with a novel way to make cartoonish avatars look like they're really playing musical instruments. The work could lead to fascinating augmented reality or virtual reality experiences in the future.

In a new paper, Facebook research scientists Eli Shlizerman and Ira Kemelmacher-Shlizerman and collaborators Lucio Dery and Hayden Schoen describe how they trained AI systems using YouTube videos of piano and violin recitals. They then used the trained systems to make avatars move their hands and fingers over fake instruments based solely on audio recordings, with the help of Apple's ARKit, its augmented reality software for developers.

Teaching machines to understand how people move is an active area of AI research, and one that researchers at other technology companies, including Google and Microsoft, have also explored. But that work generally relies on video feeds. The achievement here was to work from audio alone, even if the results weren't perfectly realistic.

"We believe the correlation between audio to human body is very promising for a variety of applications in VR/AR and recognition," the researchers wrote.

Facebook continues to push VR through its Oculus branch, including with the Spaces app that represents users with avatars, and earlier this month the company gave developers new tools to build AR features for Facebook's apps.

The researchers believe they could improve the accuracy of the avatars in the future, partly by drawing on MIDI files recorded from people playing music or on data from sensors attached to musicians.