Yahoo is giving a critical piece of internal technology to the world -- just like it did with Hadoop

Key Points
  • Yahoo is open-sourcing an internal tool called Vespa, which it uses for content recommendations, ad serving, and executing certain searches.
  • Vespa is arguably Yahoo's biggest open-source software release since Hadoop in 2009, which formed the basis for two now-public companies, Hortonworks and Cloudera.
  • Companies like Amazon, Facebook, and Google could find it useful.

Oath, the Verizon-owned parent company of Yahoo, is releasing for free some of its most important internal software, which the company has long used to make recommendations, target ads and execute searches.

The Vespa software solves a common but surprisingly difficult problem: quickly figuring out what to show a user in response to input, like when they type text into a box. Oath uses it in around 150 applications, including Flickr, Yahoo Mail and the main Yahoo search engine (specifically for components like entities, local results, images and answers to questions). It handles 3 billion native ad requests every day.

"The typical case is you don't know what you want to serve, but you have 20 billion pictures and you want to find the right ones," Jon Bratseth, a distinguished architect at Yahoo who led Vespa's development, told CNBC in an interview.

Vespa, which is now live on GitHub with an Apache 2.0 open-source license, can easily be added to different applications, making it suitable for use at big companies like Amazon, Facebook and Google that need to do different kinds of processing on different sets of data.

The release is Yahoo's most important open-source release since it open-sourced the code for the Hadoop big data software in 2006. Hadoop has since come to be at the center of two public companies, Cloudera and Yahoo spin-off Hortonworks. Today, people at many companies can contribute to technology that's still widely used at Yahoo and build their own systems using Hadoop.

How Yahoo built it

Big tech companies regularly open-source their software. But if there's powerful software at the heart of a company's biggest revenue centers, it can take a while to come out into the open, and Vespa is no different.

Vespa dates back to the early 2000s. Yahoo already had web search technology, first through a partnership with Google and later through its 2002 Inktomi acquisition. What Yahoo didn't have was technology for delivering search results and recommendations on content that falls outside traditional web search results.

In 2003 Yahoo acquired Overture, which included AltaVista as well as a lesser-known search engine called AllTheWeb.com. After the deal, the roughly 30 AllTheWeb employees were given a year to build software that could perform certain functions quickly before web pages were shown to end users. The system also needed to be easy to set up, run and tweak, so that it could be applied to a variety of applications without much trouble.

Around 2005, the AllTheWeb team worked with Yahoo's shopping team to adopt the new system. It required less management time, freeing up staffers to build new features.

"After that, we had a proven use case -- and that was a complicated one," Bratseth said. "More and more teams in Yahoo started using our system by themselves, because it made business sense. They would offload a lot of the problems they had to take care of themselves."

So Bratseth's team started expanding Vespa's capabilities. They made it capable of handling input other than users' strings of text; over time it could also personalize content based on what users had clicked on in the past, which is valuable when users haven't typed in anything. They also changed Vespa so that it could take direction from machine-learning algorithms.


For the last five years, the Vespa group, based in the Norwegian city of Trondheim, has gone through the code and rewritten different parts to make the whole thing work better, Bratseth said.

They still have more work to do -- they're looking to integrate Vespa deeply with TensorFlow, the Google-led open-source artificial intelligence software framework.

Last week, while Bratseth was visiting Oath headquarters in California, he met with people from Yahoo's Flurry division. They wanted to start using Vespa because they believed it could power a new revenue-generating feature: letting mobile app makers run "house ads" that promote their own offerings.

Poola Sreenivas, a software engineering director who has been at Yahoo for nine years, asked Bratseth -- one of 10 or so AllTheWeb employees who remain at Oath today -- if there was a way to see how long it takes for Vespa to do its thing. That matters because at web scale, every extra millisecond that a consumer has to wait counts.

Vespa can hook into Oath's internal monitoring software, Bratseth said, but it also has a way to connect with third-party monitoring tools. "So you can do metrics and so on," Bratseth said.

Now people outside Oath can use the code as they wish, and get help from the very people who made it.
