Microsoft developers cheered on Thursday when the company unveiled its latest image-recognition technology, which is coming soon to Windows 10 devices. The moment underscored Microsoft's investment in artificial intelligence (AI) and its determination to keep up with Alphabet and Apple.
Using the new Story Remix app -- which is meant to replace the standard-issue Windows 10 Photos app -- Microsoft executive Lorraine Bardeen showed how users can add animations on top of videos that then move along with the action. She selected a video of a girl dribbling a soccer ball and then superimposed a fireball on top of the soccer ball as it was sailing toward the net.
In addition to editing videos, the app will let people search for photos and videos based on the people, places and things they contain -- just like the Google Photos app that came out in 2015 and Apple's most recent Photos app from last year. Microsoft began working on the app a year and a half ago.
"What you're basically seeing is everyone in the industry is taking advantage of deep learning and how it's revolutionizing how we build products," said Chris Pratley, a Microsoft corporate vice president, in an interview. Deep learning is a type of AI that involves training computers on data such as photos and allowing the computers to make inferences about new data.
The app relies on several Microsoft technologies. For one thing, it uses the Cognitive Toolkit, Microsoft's open-source deep learning framework. Microsoft trained the system on a shared cluster of graphics processing units (GPUs) in its Azure cloud, and it drew on the company's research group for technology that can recognize objects in videos and then track them.
The app works because Microsoft has a large amount of data to train on. For example, the company looked to prisons for images of people as they aged over time. Users also granted Microsoft access to photos and videos to help the company's engineers build out the system.
"We certainly don't use anyone's data to train on unless they give us permission," Pratley said. Microsoft employees then helped refine the neural network by confirming that the tags applied to photos were correct.
Once Microsoft has sufficiently trained the neural network, the model runs locally on Windows 10 PCs, using those computers' own processing power.
"This is one of the things that makes PCs great: I can use local computation and the GPU to analyze every frame of a video," Pratley said. "Once we know pictures of a soccer ball in every frame, it's trivial to re-render video. The hard part was even knowing it was an object and then computing all of that. That's very hard to do on a phone."
Microsoft has previously used deep learning to translate spoken words in Skype and to write descriptions of photos in PowerPoint presentations. Now the company is applying the technology to a wider swath of personal content. Over time the system might well get smarter with the help of users' donated data, Pratley said.