Slashdot Mirror


Have 100GB Free? Host Your Own Copy of Wikipedia, With Images

First time accepted submitter gnosygnu writes "Want your own copy of English Wikipedia with images? Got 100 GB of disk space? Then open-source app XOWA may be of interest to you. The project released torrents yesterday for the 2013-11-04 version of English Wikipedia. There's 100 GB of sqlite databases containing 13.9 million pages, and 3.7 million images — readable from any Windows, Linux, or Mac OS X system. Image downloads for other wikis are building, but you can still use XOWA to read the text-only version for other wikis like Wiktionary, Wikisource, Wikiquote and 660 more. Next time you find yourself stranded without the internet, you can pull out your own copy of Wikipedia for use."

28 of 151 comments

  1. Article Ownership by Russ1642 · · Score: 5, Funny

    It comes with software that automatically reverts your edits and insults you.

    1. Re:Article Ownership by bradorsomething · · Score: 5, Funny

      It comes with software that automatically reverts your edits and insults you.

      Citation Needed.

    2. Re:Article Ownership by sjwt · · Score: 5, Funny

      It comes with software that automatically reverts your edits and insults you.

      Citation Needed.

      1 http://news.slashdot.org/comments.pl?sid=4488409&cid=45527247

      --
      You have 5 Moderator Points!
      Which Helpless Linux zealot/MS basher do you want to mod down today?
  2. Finally! by lagomorpha2 · · Score: 4, Funny

    Finally I can have my own version of wikipedia so I can correct all those changes I haven't been allowed to enter into the official version!

    1. Re:Finally! by BringsApples · · Score: 2

      You say that and laugh, but wait until someone that manages their own DNS, and with an evil intention gets a good idea...

      --
      Politics; n. : A religion whereby man is god.
    2. Re:Finally! by K. S. Kyosuke · · Score: 2

      Finally I can have my own version of wikipedia so I can correct all those changes I haven't been allowed to enter into the official version!

      Or you could just switch to using Conservapedia.

      --
      Ezekiel 23:20
    3. Re:Finally! by Arancaytar · · Score: 4, Insightful

      That's pretty much impossible to get into now (as a new editor), because you're either banned for being too sane to pass ideological purity, or banned for being so insane you're mistaken for a troll.

    4. Re:Finally! by lgw · · Score: 2

      Ah, a Wikieditor/fanboy. Admit it: you will be torrenting this 100GB copy just so you can delete every article, then do it all again.

      --
      Socialism: a lie told by totalitarians and believed by fools.
    5. Re:Finally! by SimonTheSoundMan · · Score: 2

      You've always been able to download every page and image. Am I missing something?

      http://dumps.wikimedia.org/

  3. It's that time of year again by Anonymous Coward · · Score: 2, Funny

    Does it include the seasonal donation nag banners?

    Holidays are coming! Holidays are coming!

  4. Re:Google Fiber by MrDoh! · · Score: 2

    Alas, the terms and conditions will forbid you running a server to do this. They'll want you to use one of their cloud servers to do it (that kinda makes more sense to put something like that further upstream).

    --
    Waiting for an amusing sig.
  5. Re:No internet connection required! by parkinglot777 · · Score: 2

    You are right. That's a silly summary they put on. They should say something like 'No Internet connection required while browsing/searching through the wiki' (one of their features).

    Navigate between offline wikis. Click on "Look up this word in Wiktionary" and instantly view the page in Wiktionary.

  6. Quite a bit smaller than I'd have thought. by caveat · · Score: 3, Interesting

    I'd have put en.wikipedia at at least a couple of terabytes. Not inconceivably large, but with some housecleaning I could actually get 100GB free.

    --

    Facts do not cease to exist because they are ignored. - Aldous Huxley
    1. Re:Quite a bit smaller than I'd have thought. by CastrTroy · · Score: 2

      I'm thinking this must be compressed data. Clicking through, it says that there are 20 GB of text data, and 13.9 million articles. This only gives 1.4 KB per article. Which seems extremely small, especially if you're getting all the formatting data. Also remember, I'm pretty sure this doesn't contain all the revision data, only the current version of each article, so the amount of data at Wikipedia would have to be quite a bit larger.

      --

      Anthropic principle: We see the universe the way it is because if it were different we would not be here to see it.
    2. Re: Quite a bit smaller than I'd have thought. by O('_')O_Bush · · Score: 2

      Well, if they were pulling only text content, 1.4kB would actually be pretty close to correct. Using average characters/word, 1.4kB would be 350 words of text, which is not far off the estimated 400 words/article as calculated in 2005. I'd expect now it would be 450/article, but still not unreasonable depending on the types of articles added since 2005 (i.e., if every town has their own 1 sentence blurb).
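
For anyone who wants to re-run the arithmetic in this sub-thread, here is a rough sketch. It assumes decimal gigabytes and about 4 bytes per word, which is the ratio implied by turning 1.4 kB into 350 words; neither assumption is confirmed by the XOWA project.

```python
# Back-of-the-envelope check of the figures quoted in this thread.
TEXT_BYTES = 20e9      # "20 GB of text data" (decimal gigabytes assumed)
ARTICLES = 13.9e6      # "13.9 million" pages
BYTES_PER_WORD = 4     # assumption: ratio implied by 1.4 kB -> 350 words

bytes_per_article = TEXT_BYTES / ARTICLES
words_per_article = bytes_per_article / BYTES_PER_WORD

print(f"{bytes_per_article / 1000:.1f} kB per article")
print(f"about {words_per_article:.0f} words per article")
```

Raising BYTES_PER_WORD to 6 (five letters plus a space) would lower the word estimate, so the 350-word figure is probably on the generous side.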

      --
      while(1) attack(People.Sandy);
  7. Re:Rats. It won't QUITE fit on a microSD card... by vux984 · · Score: 5, Funny

    Rats. It won't QUITE fit on a microSD card...

    Just exclude the star trek / star wars related entries; that should pare it down. And besides we all have it all committed to memory anyway right? :p

  8. legitimizing torrents by stenvar · · Score: 4, Insightful

    That's a good thing. The more we use torrents for the distribution of legitimate content, the more such distribution methods will become legitimized.

  9. Re:Rats. It won't QUITE fit on a microSD card... by Anonymous Coward · · Score: 3, Informative

    ...yet. But I guess most phones won't easily read sqlite databases yet, either. I suppose it won't kill me to lug around a full-sized SD card.

    Still looking forward to the library-of-Congress-on-a-card from Rainbows End.

    Most phones _won't_? Four out of five smartphones today have sqlite preinstalled and ready for use: http://developer.android.com/reference/android/database/sqlite/package-summary.html
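
As a small illustration of how portable sqlite files are, the sketch below lists the tables in a database using only Python's standard-library sqlite3 module. The in-memory database and the table name "page" are stand-ins made up for the example; they are not XOWA's actual schema.

```python
import sqlite3


def list_tables(conn):
    """Return the names of all tables in an open SQLite connection."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [name for (name,) in rows]


# Stand-in database: the table "page" and its columns are hypothetical,
# not XOWA's actual schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("INSERT INTO page (title) VALUES ('ISO 8601')")
conn.commit()

tables = list_tables(conn)
print(tables)
```

To inspect a real file, replace ":memory:" with the path to a downloaded .sqlite3 database; list_tables() makes no assumption about what it contains.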

  10. Re:As a long long time editor... by Sarten-X · · Score: 2

    And yet you commented only 16 minutes after the AC...

    --
    You do not have a moral or legal right to do absolutely anything you want.
  11. Re:As a long long time editor... by dmbasso · · Score: 2

    As a long long time editor...

    Look at the quality of information.

    I agree, you did a terrible job. Please, quit editing!

    --
    `echo $[0x853204FA81]|tr 0-9 ionbsdeaml`@gmail.com
  12. Re:2013-11-04 by NoNonAlphaCharsHere · · Score: 2

    YYYY-MM-DD is the only date scheme where filenames sort ASCIIbetically. Kinda useful if you have a lot of copies of something.

  13. Re:No internet connection required! by jellomizer · · Score: 4, Funny

    I prefer ZModem myself.

    But if you don't have that you can probably use XModem.

    --
    If something is so important that you feel the need to post it on the internet... It probably isn't that important.
  14. Re:2013-11-04 by QilessQi · · Score: 5, Informative

    Actually, ISO 8601 dates (YYYY-MM-DD) are unambiguous: far better than the ambiguous AA/BB/YYYY notation, since Americans interpret it as MM/DD/YYYY but in some other countries it's regarded as DD/MM/YYYY.

    As an added plus, a lexical sorting of YYYY-MM-DD dates is also a temporal sorting. Not so with either of the other two formats.

    http://en.wikipedia.org/wiki/ISO_8601
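
The sorting claim is easy to check. A minimal sketch, using three made-up dump dates, that compares a plain string sort of YYYY-MM-DD and MM/DD/YYYY against true chronological order:

```python
from datetime import date

# Hypothetical dump dates, deliberately out of order.
dumps = [date(2013, 11, 4), date(2012, 12, 1), date(2013, 2, 10)]

iso = [d.isoformat() for d in dumps]            # YYYY-MM-DD
us = [d.strftime("%m/%d/%Y") for d in dumps]    # MM/DD/YYYY

chronological = sorted(dumps)

# Sorting the ISO strings as plain text reproduces chronological order.
iso_sorted = sorted(iso)
iso_matches = iso_sorted == [d.isoformat() for d in chronological]

# Sorting the US-style strings as plain text does not.
us_sorted = sorted(us)
us_matches = us_sorted == [d.strftime("%m/%d/%Y") for d in chronological]

print(iso_sorted, iso_matches)
print(us_sorted, us_matches)
```

The ISO list comes back in date order, while the US-style list puts the 2012 dump last because the string comparison looks at the month first.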

  15. When was that version copied? by hcs_$reboot · · Score: 2

    I hope it's when the previous pope (Ben #16) was pictured as Master Yoda in Wikipedia.. missed that :-)

    --
    Slashdot, fix the reply notifications... You won't get away with it...
  16. Re:Rats. It won't QUITE fit on a microSD card... by jeffb (2.718) · · Score: 2

    Yeah, I was misremembering the line:

    "The British Museum and Library, as digitized and databased by the Chinese Informagical Coalition. The haptics and artifact data are lo-res, to make it all fit on one data card. But the library section is twenty times as big as what Max Huertas sucked out of UCSD. Leaving aside things that never got into a library, that's essentially the record of humanity up through 2000. The whole premodern world."

    128PB, 97% in use.

  17. Don't Panic by Covalent · · Score: 4, Interesting

    Next year or so 100GB phones will be commonplace...and you will have your Hitchhiker's Guide.

    Truly amazing times we live in.

    --
    Great warrior...hrmph! Wars not make one great.
  18. Revisions? by hendrikboom · · Score: 3, Interesting

    Presumably the wikipedia is under revision control.
    Does this give you the whole thing so that you can forever after sync with the master?
    Or just the most recent versions of the articles?
    Should there be a bittorrent for syncing huge revision control data bases?

  19. already did this ( today, text version only ) by vikingpower · · Score: 2

    just pulled the most recent english-language wikipedia dump, and made elasticsearch ( via the wikipedia river plugin ) run over it. 13.9 million entries now on a small server, answering times ~ couple-of-millisecond order. elasticsearch rocks !

    --
    Religious speak to God. Insane are spoken to by God. When all shut up, one can finally hear Shostakovich in peace