Slashdot Mirror


Building a Fast Wikipedia Offline Reader

ttsiod writes "An internet connection is not always at hand. I wanted to install Wikipedia on my laptop to be able to carry it along with me on business trips. After trying and rejecting the normal (MySQL-based) procedure, I quickly hacked a much better one over the weekend, using open source tools. Highlights: (1) Very fast searching. (2) Keyword (actually, title words) based searching. (3) Search produces multiple possible articles, sorted by probability (you choose amongst them). (4) LaTeX based rendering for mathematical equations. (5) Hard disk usage is minimal: space for the original .bz2 file plus the index built through Xapian. (6) Orders of magnitude faster to install (a matter of hours) compared to loading the 'dump' into MySQL — which, if you want to enable keyword searching, takes days."
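The approach in the summary (keep the compressed .bz2 dump as-is, index only the title words, return several candidates ranked by how well they match) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the author's code: a plain in-memory dictionary stands in for the Xapian index, the ranking is a simple count of matching title words, and the function names and the three-page sample dump are made up for the example.

```python
import bz2
import io
import re
from collections import defaultdict

# Assumption: page titles appear as <title>...</title> on a single line.
TITLE_RE = re.compile(r"<title>(.*?)</title>")


def build_title_index(dump_stream):
    """Map each lowercase title word to the set of titles containing it."""
    index = defaultdict(set)
    for line in dump_stream:
        match = TITLE_RE.search(line)
        if match:
            title = match.group(1)
            for word in re.findall(r"\w+", title.lower()):
                index[word].add(title)
    return index


def search(index, query):
    """Return candidate titles, best match (most query words hit) first."""
    scores = defaultdict(int)
    for word in re.findall(r"\w+", query.lower()):
        for title in index.get(word, ()):
            scores[title] += 1
    # Sort by descending score, then alphabetically for a stable order.
    return sorted(scores, key=lambda t: (-scores[t], t))


# Hypothetical miniature "dump", compressed in memory instead of read from disk.
SAMPLE = (
    "<page><title>Coffee</title></page>\n"
    "<page><title>Coffee production</title></page>\n"
    "<page><title>Tea</title></page>\n"
)
compressed = bz2.compress(SAMPLE.encode("utf-8"))

# The dump is streamed straight out of the bz2 data; it is never unpacked to disk.
with bz2.open(io.BytesIO(compressed), mode="rt", encoding="utf-8") as stream:
    index = build_title_index(stream)

results = search(index, "coffee production")
print(results)
```

A real index over the full dump would also need to record where each article sits inside the compressed file so it can be fetched without decompressing everything; that part is left out here.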

27 of 208 comments

  1. Wow! by ferrocene · · Score: 2, Funny

    After doing all that, I think you may have missed your flight! :)

    --
    Most folk'll never lose a toe, and then again some folk'll...
  2. Re:Why? by rabblerabble · · Score: 5, Funny

    I'll bite... Unfortunately, I don't have a basement, so there are times that I am required to venture into the outer realm that happens to be heated by the big ball of gas known as Sol, as opposed to a pump ;P Seriously though, this is exactly what I have been looking for. What better way to show up your friends when they cry "You're wrong, Google it!" knowing that there is no connection possible within twenty miles. Next time I'm drunk at the beach and someone wants to pretend to know the history of coffee harvesting, it's on.

  3. Ho-Hum ... by jabberwock · · Score: 5, Funny
    What, no auto update? No User Agreement? No disabled features that are enabled by a mammoth key? No product registration?


    Let us know when you're ready for prime time ... ;-)

    1. Re:Ho-Hum ... by MichaelSmith · · Score: 2, Funny

      How do you keep the data up to date without downloading the entire 2.9G again?

      Not too hard if you have a sub-etha net connection handy. Better check that the article about The Earth which you have been working on hasn't been cut down to two words though.

  4. Take that, Mr Obviously A. Troll! by ampathee · · Score: 5, Funny

    Programmers shouldn't be wasting time on these trivial, pointless projects. We need their work in other more important projects!
    Hah! I'm going to start work on (let's see..) a random lolcat generator now, just to piss you off.
    1. Re:Take that, Mr Obviously A. Troll! by SoapDish · · Score: 2, Funny

      Make sure to write it in LOL-CODE! (http://lolcode.com/)

    2. Re:Take that, Mr Obviously A. Troll! by MarkRose · · Score: 4, Funny

      You mean something like lolcatgenerator.com? Looks like someone already tackled that important project! lol

      --
      Be relentless!
  5. Just settle it the old way by EmbeddedJanitor · · Score: 4, Funny

    Kick sand in their face!

    --
    Engineering is the art of compromise.
    1. Re:Just settle it the old way by rabblerabble · · Score: 3, Funny

      The goggles would work then. Your logic is flawed.

  6. I hope by Nikron · · Score: 4, Funny
    That you don't dump the wiki at a bad time.

    George W Bush

    Is a dick head!!!!11

    --
    Disclaimer: Disregard the above post.
    1. Re:I hope by Anonymous Coward · · Score: 5, Funny

      You mean before someone makes it inaccurate again?

      Oh, nevermind, I see the problem:

      George W Bush

      Is a dick head!!!!11

      should be

      George W Bush

      Is a dick head!!!!!!

      Man, those out to mess with the content are getting more and more subtle...

  7. But... by Anonymous Coward · · Score: 2, Funny

    What's the point of it if there are no vandals or flame wars to make it interesting?

  8. Hitchhiker's guide here we come! by Brietech · · Score: 5, Funny

    Combine this and one of the new E-ink ebook readers, make it pretty rugged, slap a solar panel on the back and man. . . you have something really close to a genuine hitchhiker's guide to the galaxy. Ah, I love where technology is heading =)

    --
    I'm perfect in every way, except for my humility.
    1. Re:Hitchhiker's guide here we come! by Sneftel · · Score: 4, Funny

      As long as hitchhikers primarily need to know how to evolve a Pikachu into a Raichu, and how Benjamin Disraeli has been referenced in pop culture.

      --
      The opinions stated herein do not necessarily represent those of anybody at all. Deal with it.
    2. Re:Hitchhiker's guide here we come! by RandomWhiteMan · · Score: 5, Funny

      You laugh now, but just wait until you're stranded in the middle of Blackheath, England, needing a ride from a conservative British History Scholar who has his son with him playing Pokemon Gold. Won't be so smug then, will you? I bet you won't even have your towel on you when this all goes down.

    3. Re:Hitchhiker's guide here we come! by Gromius · · Score: 5, Funny

      Yes, it's a perfect fit. Particularly as Wikipedia has now supplanted the Encyclopedia Britannica in many places as the standard repository of all knowledge and wisdom. Although it has many omissions and contains much that is apocryphal, or at least wildly inaccurate, it scores over the older, more pedestrian work in two important ways.

      1. It is slightly cheaper
      2. It has the words "You can copy and edit me for free" inscribed in large friendly letters in the license.

      Also like the guide, although it cannot hope to be useful or informative on all matters, it does make the reassuring claim that where it is inaccurate, it is at least definitively inaccurate :)

    4. Re:Hitchhiker's guide here we come! by nstlgc · · Score: 5, Funny

      Just so we're clear, you can make Pikachu evolve into Raichu by using the Thunder Stone (which makes sense, since they're Electric Pokémon). However, due to the emotional value Pikachu has to trainers, most of them choose not to evolve him. Some Pokémon games even plain don't allow this. I hope this was helpful.

      --
      I'm Rocco. I'm the +5 Funny man.
  9. Only 2 days huh by Anonymous Coward · · Score: 2, Funny

    I was able to build this in two days, most of which were spent searching for the appropriate tools. Simply unbelievable... toying around with these tools and writing less than 200 lines of code, and... presto!
    Give that man a job at Google.
    1. Re:Only 2 days huh by Anonymous Coward · · Score: 1, Funny

      Don't you mean ChaCha?

    2. Re:Only 2 days huh by dmdavis · · Score: 2, Funny

      Sorry, but he never states that his product is in beta.

  10. Re:Uh.... by Gazzonyx · · Score: 4, Funny

    I love programming useless things just for the challenge.

    Have you ever worked on a project called "Clippey", by chance?
    No, he said he has a love for programming, not a seething hatred for users. Besides, everyone knows programmers only hate admins. ;) On behalf of the programmers, I'd like to say that this isn't true; we love our admins. Who else makes sure that our connections*&#^$: Connection Reset By Peer
    --

    If I mod you up, it doesn't necessarily mean I agree with what you've said, sorry.

  11. Re:Uh.... by Gazzonyx · · Score: 2, Funny

    So you can settle trivial arguments with your friends when away from an internet connection, duh!

    (Or to always have something to read on your laptop while traveling - this is what I would use it for)
    I bet you're quite the ladies' man, huh?
    Sorry, I couldn't resist!
    --

    If I mod you up, it doesn't necessarily mean I agree with what you've said, sorry.

  12. Re:Just hope you don't get an effed image. by gad_zuki! · · Score: 3, Funny

    Yes, the paper encyclopedias are missing all the anime trivia. Christ, it's embarrassing to see "references in pop culture" sections which just spell out every geeky guy stereotype. I don't know why those people don't get banned. Everything in existence has an anime reference. That is unsettling.

  13. Linda Mack! by Anonymous Coward · · Score: 1, Funny

    I would be concerned that Slimvirgin and the other intelligence agent(s) might not be able to revert and ban the edits I would be making offline. Maybe Jimbo can give them authority to come rough me up at home and beat my LCD with a hammer.

    http://yro.slashdot.org/article.pl?sid=07/07/27/1943254

  14. What?? by icydog · · Score: 5, Funny

    TFA is:

    1. Not a thinly-veiled attempt to advertise a crappy product
    2. Not bashing Microsoft
    3. Not about somebody who is trolling open-source (i.e. SCO)
    4. Not about Bush taking away all our rights and ending freedom
    5. Not about voting fraud and the end of democracy/America/the world
    6. Not decrying Vista DRM and its ties to the MAFIAA
    7. Posted on Slashdot

    Furthermore, TFA is interesting and informative.

    Am I in heaven?

    1. Re:What?? by Pollardito · · Score: 4, Funny

      it'll get posted again tomorrow just to maintain expectations

  15. Re:Just hope you don't get an effed image. by ZzzzSleep · · Score: 2, Funny
    Blatantly stolen from David Morgan-Mar.

    In many of the more relaxed corners of the Outer Eastern Rim of the Internet, Wikipedia has already supplanted the great Encyclopaedia Britannica as the standard repository of all knowledge and wisdom, for though it has many omissions and contains much that is apocryphal, or at least wildly inaccurate, it scores over the older, more pedestrian work in two important respects.

    First, it is slightly cheaper; and secondly it has the words "anyone can edit" inscribed in large friendly letters on its cover.