Three months after acquiring Yahoo, Verizon is giving away the secrets of Yahoo’s search engine. Today, Oath, the Verizon-owned company born of the merger between AOL and Yahoo, released the source code of a data-crunching tool called Vespa, which has long powered search and other features across the Yahoo empire. Now that it’s open source, any company or individual can use or modify Vespa to power its own products or websites.
Open sourcing its search technology might sound a little quaint, given that these days Yahoo actually uses Microsoft’s Bing to power most of its web searches. But Vespa underlies searches within Yahoo, on sites like Flickr, which hosts millions of images. Yahoo also uses Vespa to power related-article recommendations and ad-targeting on many Yahoo-branded sites, including Yahoo News, Yahoo Sports, Yahoo Finance, and its advertising network. Oath systems architect Jon Bratseth says Vespa processes billions of requests per day.
Vespa’s history traces back to the Norwegian search engine AlltheWeb, which Yahoo acquired in 2003. After the acquisition, the AlltheWeb team started retooling its search technology into a more general-purpose tool that Yahoo developers could use internally to power different applications. The code has been almost completely rewritten since those early days.
Oath VP of engineering for big data Peter Cnudde says that by making Vespa open source, the company hopes to replicate the benefits it has reaped from supporting Hadoop, an open-source software framework for managing big data. Yahoo hired Hadoop co-creator Doug Cutting in 2006, and paid other engineers to work on it as well. Eventually, Hadoop was adopted by the likes of Facebook, Twitter, eBay, and many others, whose employees added features and fixed bugs. As more people used Hadoop, it became easier for Yahoo to recruit people who were already familiar with the software. Cnudde says Oath hopes Vespa will follow the same path.
Hadoop isn’t as good as Vespa for returning real-time results. And many real-time processing tools, such as Apache Storm, aren’t designed to serve results to end users. So Oath uses Vespa, Hadoop, and Storm together. Until now, Vespa hasn’t been available to developers outside of Oath, Yahoo, and Yahoo Japan.
“We would have loved to do it earlier,” says Cnudde. “But open source doesn’t come for free. You have to write the documentation, make sure it’s acceptable, and be ready to manage a community.”
It’s unclear whether there’s demand for Vespa outside of Oath. Hadoop was born open source, and came along just as companies needed it. But most large-scale internet companies have already solved the web-search problems that Vespa was designed to address. Plus, there are several open-source search engines available, including Solr and Elasticsearch. And let’s face it: the Yahoo brand has seen better days. But for new and growing companies, Vespa might just fill an important niche.