Google Caffeine: 9/11 marked a turning point
Web indexing transitions from months to seconds
By Nancy Gohring | Published: 14:58, 10 June 2010
Newshounds and marketers may still debate whether Google's Caffeine, which now delivers search results from updated sites within seconds, is fast enough. But it wasn't too long ago that it was acceptable for Google to update its index only once every 30 days.
It turns out that the attacks of September 11, 2001, marked a turning point in Google's progression toward offering near-real-time web updates with Caffeine, its latest web indexing system, which was introduced on Tuesday. The events spurred Google to focus more on immediacy.
On September 11, Google News didn't exist. The search giant - which at the time had only been around for three years - wasn't returning the latest news reports as they appeared online.
But immediately after the attacks, CNN.com and other news sites had trouble keeping up with demand. Because Google could still reach those sites and had the bandwidth to support the visitors, it began posting cached versions of them, said Matt Cutts, head of Google's web spam team. "Over the course of several hours, we had useful content where people couldn't otherwise get to it because other sites were down so much," he said.
Google would have taken down the cached sites if the original sites had requested it, he said.
Google's experience of that demand after September 11 led, in part, to the creation of Google News. It was also the impetus for a renewed focus on immediacy at the company, he said.
"That was a real wake-up call, where we said we have to pay a lot of attention to freshness. We knew that before, but we thought 30 days was pretty good," he said.
With Caffeine, Google indexes updated information as soon as it crawls it. Under the previous system, Google crawled a fraction of the web every night and then indexed the new information in a batch. Before that, Google updated its index every 30 days, and in its earliest days it did so only every four months.
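The difference between the two approaches can be illustrated with a toy example. The sketch below is not Google's implementation; the InvertedIndex class and both crawl functions are hypothetical names invented for illustration, and the "crawl" is just a list of (URL, text) pairs. It shows only the ordering: batch indexing makes nothing searchable until the whole crawl is finished, while incremental indexing makes each page searchable as soon as it is fetched.

```python
from collections import defaultdict


class InvertedIndex:
    """Toy word -> URLs index (hypothetical, for illustration only)."""

    def __init__(self):
        self.postings = defaultdict(set)  # word -> set of URLs containing it
        self.doc_words = {}               # URL -> words seen on the last crawl

    def index_page(self, url, text):
        """Add or refresh one page, dropping words it no longer contains."""
        new_words = set(text.lower().split())
        for word in self.doc_words.get(url, set()) - new_words:
            self.postings[word].discard(url)
        for word in new_words:
            self.postings[word].add(url)
        self.doc_words[url] = new_words

    def search(self, word):
        return sorted(self.postings.get(word.lower(), set()))


def crawl_batch(index, pages):
    """Batch style: finish the whole crawl first, then index everything."""
    crawled = list(pages)          # nothing is searchable during this step
    for url, text in crawled:
        index.index_page(url, text)


def crawl_incremental(index, pages):
    """Incremental style: index each page the moment it is crawled."""
    for url, text in pages:
        index.index_page(url, text)
        yield url                  # the page is already searchable here
```

Stepping through crawl_incremental one page at a time shows the first page turning up in searches before the second has been crawled at all, which is the behaviour the article attributes to Caffeine.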
This week at Search Marketing Expo in Seattle, Google announced that Caffeine is now live. The effect is already visible in search results.
One blog decided to test it. After posting a story with an unusual word in the headline, I4U News found that the story appeared in just over a minute in Google search results.
Information included in Google searches that is indexed immediately using Caffeine will say that it was posted "seconds ago", Cutts said.
Not every change on every site will appear immediately, though. Google looks at factors such as PageRank to determine which sites to crawl faster, Cutts said. It also checks news sites and blogs more often than other sites, he said.
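One way to picture that kind of prioritisation is a queue ordered by when each site is next due for a crawl. The sketch below is an assumption-laden illustration, not a description of Google's scheduler: the recrawl_interval formula, the one-day base interval, the tenfold speed-up for news sites and blogs, and the CrawlScheduler class are all made up for the example. The only ideas taken from the article are that higher-ranked sites are crawled sooner and that news sites and blogs are checked more often.

```python
import heapq


def recrawl_interval(rank_score, is_news_or_blog):
    """Hypothetical rule: seconds to wait before crawling a site again.

    rank_score is an assumed 0.0-1.0 stand-in for a ranking signal.
    """
    base = 86400.0                              # assumed baseline: one day
    clamped = min(max(rank_score, 0.0), 1.0)
    interval = base * (1.0 - clamped) + 60.0    # never less than a minute
    if is_news_or_blog:
        interval /= 10.0                        # assumed boost for fresh sources
    return interval


class CrawlScheduler:
    """Min-heap of (due_time, url); the earliest due site comes out first."""

    def __init__(self):
        self._heap = []

    def schedule(self, url, now, rank_score, is_news_or_blog=False):
        due = now + recrawl_interval(rank_score, is_news_or_blog)
        heapq.heappush(self._heap, (due, url))

    def next_due(self, now):
        """Pop and return every URL whose due time has passed."""
        due_urls = []
        while self._heap and self._heap[0][0] <= now:
            due_urls.append(heapq.heappop(self._heap)[1])
        return due_urls
```

With these made-up numbers, a highly ranked news site comes due within about 15 minutes, while a low-ranked static site waits most of a day.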
Google is also beginning to take advantage of new tools to find out when sites are updated. PubSubHubbub is an open protocol that blogs are now using to essentially ping Google when their sites are updated. Google can then add the updated page to its index.
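For readers curious what such a ping looks like, here is a minimal sketch of the publisher side. In the PubSubHubbub specification, a publisher notifies a hub with a form-encoded POST carrying hub.mode=publish and hub.url set to the updated feed. The function names, the hub address hub.example.com and the feed address are placeholders invented for this example, and how Google consumes such pings internally is not covered by the article.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_publish_ping(hub_url, topic_url):
    """Build (but do not send) a PubSubHubbub 'publish' notification."""
    body = urlencode({
        "hub.mode": "publish",   # tells the hub that new content is available
        "hub.url": topic_url,    # the feed that was just updated
    }).encode("ascii")
    return Request(
        hub_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def send_publish_ping(hub_url, topic_url, timeout=10):
    """Send the ping and return the HTTP status code the hub answers with."""
    with urlopen(build_publish_ping(hub_url, topic_url), timeout=timeout) as resp:
        return resp.status


if __name__ == "__main__":
    # Placeholder addresses; a real blog would use its own hub and feed URLs.
    ping = build_publish_ping("https://hub.example.com/",
                              "https://blog.example.com/feed.xml")
    print(ping.get_method(), ping.full_url)
    print(ping.data.decode("ascii"))
```

Building the request separately from sending it keeps the example runnable without network access; a blog platform would typically call something like send_publish_ping right after publishing a post.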