Digging into Google's search algorithms

I read with interest the Times's recent article digging further into Google's search algorithms and its search quality team. I was most surprised by the number (about 200) and specificity of the factors they tweak when finding and ordering search results. For example, they adjust searches for 'french revolution' to behave more like searches for the exact phrase than for the two separate words, so that results aren't crowded with pages about the recent French elections, whose candidates favored the word 'revolution', at the expense of pages about the 1789 event. With a system of this magnitude, I would think the difficulty of making changes without unintended consequences would be staggering. I'm also amazed that they can keep adding algorithms and tweaking factors while still serving results in fractions of a second.
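
To make the 'french revolution' example concrete, here's a toy sketch of the difference between treating a query as two independent words and rewarding pages where the words appear together as a phrase. It's my own illustration, not how Google actually does it: the `score` function, the `phrase_boost` weight of 2.0, and the two sample documents are all made up.

```python
def tokenize(text):
    # Naive tokenizer: lowercase and split on whitespace (no punctuation handling).
    return text.lower().split()


def contains_phrase(tokens, phrase):
    # True if the phrase's tokens appear adjacent and in order somewhere in tokens.
    n = len(phrase)
    return any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))


def score(doc, query, phrase_boost=2.0):
    tokens = tokenize(doc)
    terms = tokenize(query)
    # Bag-of-words part: count how many query terms appear anywhere in the doc.
    base = sum(1 for t in terms if t in tokens)
    # Phrase part (hypothetical weighting): multiply the score when the terms
    # appear together as an exact phrase.
    if contains_phrase(tokens, terms):
        base *= phrase_boost
    return base


docs = [
    "Candidates promise a revolution in French politics",
    "The French Revolution began in 1789",
]
ranked = sorted(docs, key=lambda d: score(d, "french revolution"), reverse=True)
for d in ranked:
    print(score(d, "french revolution"), d)
```

Both documents contain both words, so a pure word match scores them equally (2 each); only the phrase boost pushes the 1789 page (4.0) above the election page.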

It's common knowledge that the heart of Google's technology (or at least the part of it they are willing to publish white papers about) is making it easy and reliable to massively parallelize the computation of searches and the storage of its index of the web. But I would think there's a limit to the kinds of operations that can benefit from being spread across hundreds of computers. It makes sense to use parallel computing to cut an operation from years to hours. But if you're using it to go from seconds to microseconds, or less, the gains are eventually going to be swallowed by overhead such as network communication between servers. Some of the factors the article mentions probably escape this limitation. For example, the 'freshness' of a web page sounds like something that can be stored in the index and computed when the index is rebuilt, rather than during the 0.25 seconds a user spends waiting for results. But others, like whether a search term is a brand name like 'Apple' or a non-famous person's name, definitely have to be worked out within those 0.25 seconds. I wonder how much of the search quality team's effort goes not only into coming up with meaningful tweak factors, but also into devising clever ways of computing them very efficiently.
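
Here's a back-of-the-envelope sketch of why overhead eventually swallows the benefit. It assumes a deliberately simple cost model of my own: the work splits perfectly evenly across servers, and fanning out and merging results adds a fixed overhead (a made-up 1 ms network round trip). Real systems are far more complicated, so treat the numbers as illustrative only.

```python
def parallel_time(work_s, servers, overhead_s):
    # Toy model (an assumption): perfectly divisible work plus a fixed
    # fan-out/merge overhead that only applies when more than one server is used.
    if servers == 1:
        return work_s
    return work_s / servers + overhead_s


def speedup(work_s, servers, overhead_s):
    # How many times faster than running the whole job on one machine.
    return work_s / parallel_time(work_s, servers, overhead_s)


ONE_YEAR_S = 365 * 24 * 3600
OVERHEAD_S = 0.001  # hypothetical 1 ms network round trip

# A year-long job on 1,000 servers: overhead is negligible, speedup is ~1000x.
print(speedup(ONE_YEAR_S, 1000, OVERHEAD_S))

# A 100-microsecond job on 1,000 servers: the 1 ms overhead dominates,
# so the "parallel" version is about 10x *slower* than one machine.
print(speedup(0.0001, 1000, OVERHEAD_S))
```

Under this model the parallel time can never drop below `overhead_s`, which is the floor I'm speculating about: any factor that has to be computed per query has to fit inside that budget, whereas something like freshness can be paid for at indexing time.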

After reading this article I'm definitely going to pay more attention to my Google results and how they might reflect the methods it describes.

Comments (1)

scott:

jay-

i have been trying to figure out how to email you or call you and then i realized, "jay has a blog. i should just post a comment." i am living in park slope for a little over a month. have you tried these ramen places in the east village? we should go to setagaya.

-scott
