Thursday, October 05, 2006

Download Wikipedia?

Brad Smith ::: MSFT - How to download Wikipedia

"So you're looking for some dummy data?  Well how about downloading the wikipedia???!! 

There are over 2 million pages on the wikipedia.  Don't try to crawl the site, it won't let you.  No robots allowed!

Go to http://download.wikipedia.org and you'll see a list of all the databases.  If you're looking for the English one it's "enwiki".  Then you can choose to download a whole bunch of stuff ... but the file you generally want to download is "pages-articles.xml.bz2"..."
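Once you have pages-articles.xml.bz2, you don't have to unpack the whole thing to disk before poking at it. Here's a minimal Python sketch that streams the compressed file and pulls out page titles. It assumes the dump uses MediaWiki's XML export format, where each article is a `<page>` element with a `<title>` child. The namespace URI changes between export versions, so the code compares tag names without it. The function names `iter_page_titles` and `local_name` are my own and purely hypothetical.

```python
import bz2
import xml.etree.ElementTree as ET
from typing import Iterator


def local_name(tag: str) -> str:
    """Strip a namespace such as '{http://www.mediawiki.org/xml/export-0.10/}page' to 'page'."""
    return tag.rsplit("}", 1)[-1]


def iter_page_titles(path: str) -> Iterator[str]:
    """Yield the title of every <page> in a bz2-compressed MediaWiki XML dump.

    Assumption: the file follows MediaWiki's export format, where each article
    is a <page> element containing a <title> element. The namespace URI differs
    between export versions, so tags are compared by local name only.
    """
    # bz2.open decompresses on the fly, so the dump is streamed, not unpacked to disk.
    with bz2.open(path, "rb") as stream:
        context = ET.iterparse(stream, events=("start", "end"))
        _, root = next(context)  # the first event is the start of the root element
        for event, elem in context:
            if event != "end" or local_name(elem.tag) != "page":
                continue
            title = next(
                (child.text or "" for child in elem if local_name(child.tag) == "title"),
                "",
            )
            yield title
            # Drop finished pages from the tree so memory use stays bounded.
            root.clear()
```

For example, `sum(1 for _ in iter_page_titles("enwiki-pages-articles.xml.bz2"))` would count the pages (that local filename is just an example). Clearing the root after each page is the usual trick for keeping `iterparse` from building the entire multi-gigabyte tree in memory.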

I don't know why, but downloading Wikipedia just seems too cool not to do... (so of course I am ;)

For "safe" sample/dummy data, I've used/downloaded Project Gutenberg files in the past too...

2 comments:

  1. Hi Greg,

    I wanted to ask you something that is completely unrelated to your entry. I need your 80s expertise. Do you remember a video from the early 80s (probably '83 or '84) that had really cool-looking mermaids in it? They were draped over rocks in the ocean and had clam shells they used as makeup compacts. I don't remember the song or the name of the band, but they were kind of New Order-ish. I think the first line of the song goes something like "Waves washing over me..." but don't quote me on that. Believe it or not, I have this somewhere on a Betamax videotape, but I don't know where, and I definitely don't have the Betamax machine any longer. Any help would be greatly appreciated.

    Lori May

  2. I seem to vaguely remember that video, but that's about it.

    I did the google/YouTube thing (as I'm sure you did too) but didn't see anything that struck a chord...

    I did find this page, http://www.nwoutpost.com/videos.asp, but I don't know if it will help any (I was looking for just a text list of music videos from the '80s...)

    I wish I could have been of more help...


NOTE: Anonymous commenting has been turned off for a while... The comment spammers are just killing me...

ALL comments are moderated. I will review every comment before it will appear on the blog.

Your comment WILL NOT APPEAR UNTIL I approve it. This may take some hours...

I reserve, and will use, the right not to approve ANY comment for ANY reason. I usually won't, but if it's off-topic, spam (or even close to spam), inflammatory, mean, etc., etc., well... then...

Please see my comment policy for more information if you are interested.

Thanks,
Greg
