Greg Hewgill (ghewgill) wrote,

wikimedia dumps

The Wikimedia download server was recently offline for a few months because of disk space issues. It has been back up and running for a few weeks now, and it just finished dumping the English Wiktionary database. I have downloaded the database and refreshed my Wiktionary-to-dict gateway to 2008-11-12 (the previous update was 2008-06-13).

The English Wikipedia dump is currently in progress and, as of today, has an estimated completion date in June 2009. I expect it to be done sooner than that, because the first articles dumped are the oldest ones, and those usually have more revisions. This seems to cause the time estimation algorithm to overestimate the amount of time remaining (it was estimating July 2009 last week).
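To illustrate why an estimate like this would drift earlier over time, here is a minimal sketch in Python. I don't know how the dump scripts actually compute their estimate, so this assumes a naive linear extrapolation (average cost per page so far, multiplied by the total page count), and the workload is made up: page `i` is given `total_pages - i` revisions, with dump time proportional to revision count. The names `naive_total_estimate` and `simulate` are hypothetical.

```python
def naive_total_estimate(elapsed, pages_done, total_pages):
    """Linear extrapolation: assume every remaining page costs
    the same as the average page dumped so far."""
    return elapsed * total_pages / pages_done


def simulate(total_pages, checkpoints):
    """Dump pages oldest-first and record the naive estimate of total
    run time at each checkpoint.

    Assumption (made up for illustration): page i has
    (total_pages - i) revisions, and cost is proportional to revisions,
    so the oldest pages are the most expensive.
    """
    costs = [total_pages - i for i in range(total_pages)]
    actual_total = sum(costs)

    results = []
    elapsed = 0
    for pages_done, cost in enumerate(costs, start=1):
        elapsed += cost
        if pages_done in checkpoints:
            estimate = naive_total_estimate(elapsed, pages_done, total_pages)
            results.append((pages_done, estimate, actual_total))
    return results


if __name__ == "__main__":
    for pages_done, estimate, actual in simulate(1000, {100, 500, 900}):
        print(f"after {pages_done} pages: estimated total {estimate:.0f}, "
              f"actual total {actual}")
```

Under these assumptions, every checkpoint's estimate is above the actual total, and the estimates shrink as the dump reaches the cheaper, newer pages, which matches the estimate moving from July to June.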

  • 2 comments

decibel45

November 15 2008, 17:02:21 UTC

What's sad is that, going by their usage stats the last time I looked, if they used Postgres they should be able to run all updates off one server and load-balance reads to read-only slaves. That would mean they could just use pg_dump and be done with it.

Shit, we have an 800GB database and could certainly dump it in less than a week.

ghewgill

November 15 2008, 19:16:53 UTC

I think most of the time is actually taken by running compression - they either bzip2 or 7zip all output files. But it's hard to tell because everything is conflated into one run.

Not that I'm saying using Postgres wouldn't be a good idea. :)