Monday, May 19, 2014

400 billion... The Wayback Machine now has more pages than there are stars in our galaxy (and here's how they manage that)

High Scalability - A Short On How the Wayback Machine Stores More Pages than Stars in the Milky Way

How does the Wayback Machine work? Now with over 400 billion webpages indexed, allowing the Internet to be browsed all the way back to 1996, it's an even more compelling question. I've looked several times but I've never found a really good answer.

Here's some information from a thread on Hacker News. It starts with mmagin, a former Archive employee:

...

image

..."

How awesome is that? If you're interested in the story behind the storage/indexing/etc used by the Wayback Machine, read this...

No comments:

Post a Comment

NOTE: Anonymous Commenting has been turned off for a while... The comment spammers are just killing me...

ALL comments are moderated. I will review every comment before it will appear on the blog.

Your comment WILL NOT APPEAR UNTIL I approve it. This may take some hours...

I reserve, and will use, the right to not approve ANY comment for ANY reason. I will not usually, but if it's off topic, spam (or even close to spam-like), inflammatory, mean, etc, etc, well... then...

Please see my comment policy for more information if you are interested.

Thanks,
Greg

PS. I am proactively moderating comments. Your comment WILL NOT APPEAR UNTIL I approve it. This may take some hours...