Slashdot Mirror


Building a Fast Wikipedia Offline Reader

ttsiod writes "An internet connection is not always at hand. I wanted to install Wikipedia on my laptop to be able to carry it along with me on business trips. After trying and rejecting the normal (MySQL-based) procedure, I quickly hacked a much better one over the weekend, using open source tools. Highlights: (1) Very fast searching. (2) Keyword (actually, title words) based searching. (3) Search produces multiple possible articles, sorted by probability (you choose amongst them). (4) LaTeX based rendering for mathematical equations. (5) Hard disk usage is minimal: space for the original .bz2 file plus the index built through Xapian. (6) Orders of magnitude faster to install (a matter of hours) compared to loading the 'dump' into MySQL — which, if you want to enable keyword searching, takes days."
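For illustration, highlights (2) and (3) above (searching on title words, then listing several candidate articles with the best match first) can be sketched roughly as below. This is not the submitter's code and it does not use Xapian: it is a plain-Python stand-in, and every name in it (`tokenize`, `build_title_index`, `search`, the sample titles) is hypothetical. The score is simply the fraction of query words found in a title, a crude substitute for the probability ranking a real search engine would provide.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase a string and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_title_index(titles):
    """Map each title word to the set of titles that contain it."""
    index = defaultdict(set)
    for title in titles:
        for word in tokenize(title):
            index[word].add(title)
    return index


def search(index, query, limit=10):
    """Return (title, score) pairs, best match first.

    The score is the fraction of distinct query words present in the
    title. This is an assumption made for the sketch, not Xapian's
    actual weighting scheme.
    """
    words = set(tokenize(query))
    if not words:
        return []
    hits = defaultdict(int)
    for word in words:
        for title in index.get(word, ()):
            hits[title] += 1
    # Sort by match count (descending), then by title for stable output.
    ranked = sorted(hits.items(), key=lambda item: (-item[1], item[0]))
    return [(title, count / len(words)) for title, count in ranked[:limit]]


# Hypothetical sample titles, for illustration only.
TITLES = [
    "Fourier transform",
    "Fast Fourier transform",
    "Laplace transform",
    "Joseph Fourier",
]
INDEX = build_title_index(TITLES)

if __name__ == "__main__":
    for title, score in search(INDEX, "fast fourier"):
        print(f"{score:.2f}  {title}")
```

A query such as "fast fourier" returns every title sharing at least one word with it, so the reader picks from a short ranked list rather than needing the exact article name.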

2 of 208 comments

  1. Re:2X by Brian+Gordon · · Score: 5, Informative

    Ahaha, 2.9GB? That's the text alone. Images will net you more than 200GB more. And yes, you do need a LAMP/WAMP stack and a working MediaWiki, but it wouldn't take 'days'; it would take a few hours max. Also is this guy aware that wikipedia is available on DVD already?

  2. Re:2X by TubeSteak · · Score: 5, Informative

    "Also is this guy aware that wikipedia is available on DVD already?"

    Are you aware that the link you pointed to (1) is not the same thing as the link (2) the author pointed to?
    (1) http://schools-wikipedia.org/
    (2) http://download.wikimedia.org/enwiki/latest/

    (1) is 4625 articles hand-picked for school-age children, hence the website name.
    (2) is a straight dump of Wikipedia.

    Just imagine my surprise when the schools-wikipedia website didn't have the wiki article on Goatse!
    --
    [Fuck Beta]
    o0t!