Slashdot Mirror


Squeezing a Wikipedia Snapshot Onto an 8GB iPhone

blackbearnh writes with this excerpt from O'Reilly Radar: "Think about Wikipedia, what some consider the most complete general survey of human knowledge we have at the moment. Now imagine squeezing it down to fit comfortably on an 8GB iPhone. Sound daunting? Well, that's just what Patrick Collison's Encyclopedia iPhone application does. App Store purchasers of Collison's open source application can browse and search the full text of Wikipedia when stuck in a plane, or trapped in the middle of nowhere (or, as defined by AT&T coverage...)"

35 of 169 comments

  1. Survey of Human Knowledge? by benwiggy · · Score: 3, Insightful
    "Wikipedia, what some consider the most complete general summary of human knowledge we have at the moment."

    There. Fixed that for you.

    1. Re:Survey of Human Knowledge? by L4t3r4lu5 · · Score: 5, Funny

      "Wikipedia, what some consider the most complete general summary of human knowledge[citation needed] we have at the moment."

      There. Fixed that for you.

      There. Fixed that for you.

      --
      Finally had enough. Come see us over at https://soylentnews.org/
    2. Re:Survey of Human Knowledge? by mikael_j · · Score: 4, Interesting

      In all seriousness, I'm starting to get extremely annoyed by what is IMHO flagrant abuse of the [citation needed] tag on Wikipedia. I don't know how many times I've seen it used in situations where it just wasn't needed. And I don't mean in "But anyone who spends all day working on FOO knows that BAR!" situations but more along the lines of "The earth orbits the sun[citation needed]." or even better "Sir NameOfArticle was in his day frequently regarded as a national hero in $COUNTRY.[citation needed]. <Six paragraphs that detail, with plenty of sources, exactly how famous Sir NameOfArticle was.>".

      I've actually begun wondering if maybe there are certain individuals who are deliberately trolling Wikipedia by adding [citation needed] in places where it just doesn't belong and then sit around giggling as they read the discussion pages of various articles they've messed with.

      /Mikael

      --
      Greylisting is to SMTP as NAT is to IPv4
    3. Re:Survey of Human Knowledge? by larry+bagina · · Score: 2, Insightful

      It also seems completely random and arbitrary. If they need citations, then they need citations on every sentence/idea/paragraph that isn't general knowledge. Maybe a bot goes through and randomly adds them.

      --
      Do you even lift?

      These aren't the 'roids you're looking for.

    4. Re:Survey of Human Knowledge? by dimeglio · · Score: 2, Insightful

      Look, this is how it works: I'm asked to find out what COBIT is. Naturally, I google it and find a hit in Wikipedia. From there, I get a fairly comprehensive idea of what it might be - with external links that I don't bother to click. I then explain to the team what COBIT is and tie it in with our business objectives. The team then might want to investigate further or get certification if it is a requirement for the job.

      Now, I'm not sure what you mean by trust. I do trust that the information I gathered on COBIT is as accurate as I needed it to be at that time. Now, if I am an MD and need to prescribe a drug to a patient, I certainly would not trust Wikipedia for the dosage. I'd look it up in a medical reference on that medication and published by the drug company.

      So for 90% of decisions, IMO Wikipedia is no worse than using last year's edition of the Encyclopedia Britannica.

      --
      Views expressed do not necessarily reflect those of the author.
    5. Re:Survey of Human Knowledge? by Anonymous Coward · · Score: 2, Funny

      "[[Wikipedia]], what some[who?] consider[weasel words] the most complete general summary of [[human knowledge]][which?][citation needed] we[who?] have at the moment[weasel words]."

      There. Fixed that for you.

      There. Fixed that for you.

      There. Fixed that for you.

    6. Re:Survey of Human Knowledge? by An+Onerous+Coward · · Score: 3, Insightful

      [citation needed]

      Seriously though, it's a really useful meme to tap in the middle of a debate. It's a reminder that those who seek to convince must bring evidence, and that anyone can post anything they like. When used to tag a claim that a) is very unexpected or counterintuitive, b) should be citable, and c) is central to the opponent's argument, then demand the citation. If it's tangential or a matter of opinion, then yeah, it's bastardish.

      --

      You want the truthiness? You can't handle the truthiness!

    7. Re:Survey of Human Knowledge? by atraintocry · · Score: 2, Insightful

      If there is a pattern, it's that the person who put [citation needed] didn't necessarily agree with the preceding statement, but either (a) didn't want it to turn into a conflict or (b) didn't feel like doing the research and (perhaps rightly) decided that the person making the claim should do it.

      I think that in the aggregate they turn the tone of WP into something that's very passive-aggressive. But individually they are harmless, just pointing out the obvious ("here is a statement that is unverified").

      Where I see lots of [citation needed]s is in articles that tend to be biographical or concerning an artistic work or work of entertainment. Average Pop Star's #1 fan will copy a bunch of stuff from APS-fan-forums.org and someone else will come along and think, "what is all of this (crummy) original research doing here?"

      If they deleted the material, other forum members will keep reverting. But if they add [cn], most people know that if they're going to remove that tag then they'd better have a citation handy.

      Obviously in very popular or contentious articles, they don't stay there as long, because more people are willing to go out and find citations that match their point of view. The only way to trump someone you disagree with in WP-land is by finding more evidence. Which is exactly how it should be. So despite their passive-aggressive side, I tend to see the [cn] as a sign that the system is working somewhat, albeit slowly.

      But seriously I'd nuke half of the articles on WP if I had the authority. WP can take page views away from a site that actually *is* accurate. And there are still copy-paste jobs going on. A WP article, by virtue of being able to draw from multiple sources and have multiple editors, should be more accurate than the sources it draws from. When it's not, it does people a disservice since it's going to show up first in Google whether or not it's any good.

  2. Not a Problem by Blue+Stone · · Score: 4, Funny

    This is easily doable.

    Once you trim the earth reference down to "Mostly harmless".

    --
    Corporation, n. An ingenious device for obtaining individual profit without individual responsibility. - Ambrose Bierce
    1. Re:Not a Problem by Dishevel · · Score: 2, Funny

      This is easily doable.

      Once you trim the earth reference down to "Mostly harmless".

      What reason did you have for adding "Mostly"?

      --
      Why is it so hard to only have politicians for a few years, then have them go away?
    2. Re:Not a Problem by LifesABeach · · Score: 2, Funny

      And the words, "Don't Panic" should be easily viewable.

  3. What a total geek.... by ColdWetDog · · Score: 4, Insightful

    1. Goes to foreign country - one that he has never visited before
    2. Doesn't have wireless access.
    3. Instead of wandering about the country he spends most of his time programming ("Then basically, I spent a significant fraction of my time there in Japan, again, in 2007 writing those applications") an application so he can look up stuff about the country he isn't spending much time actually visiting.

    I bow before you sir. Awesome.

    --
    Faster! Faster! Faster would be better!
    1. Re:What a total geek.... by pzs · · Score: 4, Interesting

      You're right that this guy has flown the geek flag pretty high here; however, at least it's to some useful purpose. There are all kinds of facts about a country that are quite hard to discover just wandering about in it, and Wikipedia would be the ideal candidate to answer them.

      Last time I went on holiday (to Australia) I came back with a dozen questions I wanted answering, just because I didn't have internet access while I was out there; Wikipedia access would answer many of these questions. Examples:

      • I heard that Beds Are Burning was about the Australian aborigines - I never knew this before and wanted to look up more details on it.
      • As a result of that, I wanted to know far more information about how well aborigines were integrated in Australia at the moment. Answer: badly, but again hard to find out just by wandering around in Australia and difficult to raise with a random Aussie.
      • Australia is experiencing a lot of drought at the moment, but while we were in Sydney, it rained quite a few times. I wanted to know more about the drought and what parts of the country it was affecting.
      • ...

      I could answer these questions by going into an internet cafe, but this isn't always possible. A portable Wikipedia sounds like a great idea.

    2. Re:What a total geek.... by dbcad7 · · Score: 2, Insightful

      When I go on vacation to a country like that, I will buy a travel guide. I also spend a great deal of time researching prior to traveling. Having Wikipedia available 24/7 would be nice if I went off the grid that I had planned, but not life changing. Wikipedia is not that great as a travel guide. It "might" cover some things to see, and rarely things to do, but is more geared towards questions like those you asked in your post. Those types of questions are easily answered later without taking anything away from your trip. What good is it to you though if you visit a city that has a bar with mermaids swimming in a shark tank and the wiki entry tells you nothing about it? Sure, you will know the annual rainfall and population density, but you're going to be pissed when you meet someone later who's been there and asks you about the mermaid shark tank bar.

      --
      waiting for ad.doubleclick.net
  4. Oblig by TinBromide · · Score: 4, Funny

    xkcd comic reference

    Yeah, pretty much you're turning your iphone into a hitch hiker's guide to earth, or at least america and europe if you can manage to squeeze wiki-travel onto it.

    --
    Is it sad that I am more likely to recognize you and your posts by your sig than your name or UID?
  5. Re:iPhone apps for computers by Filip22012005 · · Score: 5, Funny

    That problem has recently been solved. With the recent addition of sms-sharing, you could use any iPhone remotely.

    --
    When the policeman of the tie, rule you violate, hello punishment of the kitty?
  6. Nothing new by Hrshgn · · Score: 4, Informative

    This is nothing new. Wikipedia has been available for several years now in MDict format: http://www.octopus-studio.com/product.en.htm

    1. Re:Nothing new by Sentry21 · · Score: 2, Insightful

      Given the trouble Patrick had squeezing down a full DB dump of Wikipedia to fit into 2GB (for the app store), I find it impossible to believe that the 162 MB files I've found so far for Wikipedia in MDict format are anywhere near the full text (which Patrick's app is).

    2. Re:Nothing new by tomthepom · · Score: 2, Informative

      You're not looking hard enough. Wikipedia has also been available in Tomeraider format for a while now.

  7. Better by Anonymous Coward · · Score: 2, Informative

    And for those preferring accuracy and editorial responsibility :

    http://www.ipodnn.com/articles/08/02/27/britannica.on.iphone/

  8. Re:iPhone apps for computers by FlyingBishop · · Score: 4, Interesting

    FTFA:

    But I released the code to this application; it was open source from the very start. So it was pretty easy for them to take it and to port it to the OLPC.

    Already done.

    However, I'm not sure that I want precisely what this iPhone app is. It strips out references, and from the sound of things also the discussion pages. For about 1/2 of the articles I read, I check the discussion pages to see what's really going on. Also he says he strips a lot of the metadata, and obviously images, none of which are things I'd want to give up (some of the metadata might be superfluous, but if I'm copying Wikipedia onto my computer, I want to copy Wikipedia onto my computer.)

    I understand there are licensing issues with images, but even so, the SVG ought to be safe. And that wouldn't add as much of a disk space hit as the gifs, etc.

    One of the other issues is the timing of Wikipedia dumps. They only do text-only dumps, and according to the article they only happen once every few months. It would be nice to implement an image review policy, and figure out a way to allow for mirrors (or just some increased bandwidth at Wikipedia HQ) so that we can actually have the entire English Wikipedia, regularly snapshotted and compressed, available for download. And really, for that kind of thing a 3-month or even yearly turnaround would be well worth the wait.

  9. Wikipedia has an entry on the Kama Sutra ... by Daniel+Dvorkin · · Score: 4, Funny

    ... so clearly this app will never make it through Apple's review process.

    --
    The correlation between ignorance of statistics and using "correlation is not causation" as an argument is close to 1.
  10. Re:Another step closer by TheRaven64 · · Score: 4, Insightful

    No. The Kindle supports online access to Wikipedia, but this requires a network connection. The iPhone supports the same. A while ago someone created a cut-down version of Wikipedia which you could browse completely offline on the iLiad. It sounds like someone has ported this to the iPhone, and because it's now on the iPhone it's news.

    Putting Wikipedia snapshots on portable devices is interesting. I don't really see why you'd do it with an iPhone; the iLiad takes CF cards, so you can just keep a 16GB CF card for Wikipedia and not fill up space you'd otherwise use for something else, but the iPhone's storage isn't expandable so it's a strange thing to want to do. The text of Wikipedia is not that big. A complete (uncompressed) copy is 200GB, but that includes all revision history and user pages. The current version of the English Wikipedia is around 4GB of text. This leaves another 4GB for filling up with images.

    --
    I am TheRaven on Soylent News
  11. Re:Nice! by Starayo · · Score: 3, Informative

    The filesize of the app is about 2GB. Pretty amazing!

    I'd be grabbing it right now if I didn't only have ~350MB of free space left on my iPhone...

    Would be a great app for iPod Touch users.

    --
    Ezekiel 23:20
  12. XML Compression by firefarter · · Score: 4, Interesting

    So, I'm reading here that they convert the XML into proprietary metadata and compress that.

    Why not use EXI (Efficient XML Interchange) http://www.w3.org/XML/EXI/ which has been tested as more efficient than gzip and requires less memory to parse? Especially since the XML processing can remain the same, since the nodeset is the same.
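
To put a rough number on how much plain XML markup compresses, here is a minimal sketch. EXI is not in the Python standard library, so this uses zlib (the DEFLATE algorithm behind gzip) as the baseline the comment compares against. The `<page>` records are made-up sample data, not the real Wikipedia dump schema, and `compression_ratio` is a hypothetical helper name.

```python
import zlib


def compression_ratio(data: bytes, level: int = 9) -> float:
    """Return compressed size divided by original size (smaller is better)."""
    compressed = zlib.compress(data, level)
    return len(compressed) / len(data)


# Hypothetical sample: 1000 tiny XML records with heavily repeated markup.
sample_xml = "".join(
    f"<page><title>Article {i}</title><text>Body of article {i}.</text></page>"
    for i in range(1000)
).encode("utf-8")

ratio = compression_ratio(sample_xml)
print(f"{len(sample_xml)} bytes of XML compress to {ratio:.1%} of the original size")
```

Real article text is far less repetitive than this sample, so the ratio on an actual dump would be worse. The sketch only illustrates that tag overhead is cheap to squeeze out with a general-purpose compressor, which is the bar a schema-aware format like EXI would have to beat.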

  13. EXCELLENT app, but limited by jbarr · · Score: 2, Interesting

    I've been using this app for quite a while on my 1st gen iPod Touch, and it works and works well. It's amazing just how many articles it has. Other than some cosmetic and minor feature issues, the only real limitation is that Apple limits data file size to 2GB, so there is an obvious limit as to how much can go into the file. But it is amazingly complete. No images, no fancy tables--just text articles at your fingertips.

    If you Jailbreak your iPhone/iPod Touch, then an excellent alternative is the Wiki2Touch app. Unfortunately, it seems that it's been pretty much abandoned in development, so it may be hit-or-miss if it works on OS v3.x. This implementation was REALLY slick. It provided a 4GB data file (that was much more complete) and a small Web server. You enabled the Web server, fired up Safari, and pointed it to a local URL. The app presented quick and very readable articles. And if you went to the trouble to download and process, you could also add about 4GB of image files to make things more complete (on a larger-capacity device, of course.)

    Here's a review that I posted for both apps just over a year ago on my iPod Touch Tips site:
    http://jimstips.com/ipod-touch-tips/ipod-touch-review-wikpedia-on-your-ipod-touch.html

    In both cases, the main complaint is updating. In order to update the data file, you have to re-download the data, and depending on the app, you are typically at the mercy of the developer to provide an update. Otherwise, you have to download, index, and install the HUGE files yourself.

    If you absolutely HAVE to have updated, offline data, check out the Wikipanion app. It's a nice compromise.
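
The Wiki2Touch setup described above (a small local Web server that Safari points at) can be sketched roughly like this. This is not Wiki2Touch's actual code: the in-memory `ARTICLES` dictionary, the `render_article` helper, and port 8080 are all assumptions made for illustration, and a real app would read articles from its compressed data file instead.

```python
import html
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import unquote

# Hypothetical stand-in for the on-device article store.
ARTICLES = {
    "Wikipedia": "Placeholder article text.",
    "Kama Sutra": "Placeholder article text.",
}


def render_article(path: str) -> tuple[int, str]:
    """Map a URL path such as /Kama_Sutra to an (HTTP status, HTML page) pair."""
    title = unquote(path.lstrip("/")).replace("_", " ")
    body = ARTICLES.get(title)
    if body is None:
        return 404, f"<h1>Not found: {html.escape(title)}</h1>"
    return 200, f"<h1>{html.escape(title)}</h1><p>{html.escape(body)}</p>"


class ArticleHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, page = render_article(self.path)
        payload = page.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port: int = 8080) -> None:
    """Serve articles on localhost only; blocks until interrupted."""
    HTTPServer(("127.0.0.1", port), ArticleHandler).serve_forever()
```

Calling `serve()` and opening http://127.0.0.1:8080/Kama_Sutra in a browser would show the placeholder page. Binding to 127.0.0.1 keeps the server reachable only from the device itself.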

    --
    My mom always said, "Jim, you're 1 in a million." Given the current population, there are 7000 of me. God help us all!
  14. Re:Complete human knowledge? by Daniel+Dvorkin · · Score: 4, Insightful

    [citation needed]

    I'm not really kidding. Your anti-Wikipedia rant is entertaining, but it doesn't provide any substance. Speaking for myself, when I go to Wikipedia for a refresher on something I already know about, I'm generally pleased with the quality of the results, which makes me think that the articles on subjects I don't know much about are likely to be pretty good too.

    Your line about "political correctness and facts washed out of existence by human insecurities" provides a clue as to what really bothers you about Wikipedia: reality's well-known liberal bias. Unless you can provide specific examples, with citations, it's reasonable to assume that the Wikipedia groupmind knows more about the way things really work than some random dude on /.

    --
    The correlation between ignorance of statistics and using "correlation is not causation" as an argument is close to 1.
  15. Wikibooks is for how-to guides by tepples · · Score: 3, Informative

    Hell, I was flipping through an encyclopedia from the 40's, and under "Dynamite", it had detailed instructions on how to MAKE it yourself

    Wikipedia doesn't have how-to guides. If you want that, use Wikibooks.

  16. Re:Profits by twoshortplanks · · Score: 3, Informative

    He is; It's detailed on the info for the app in iTunes. Since you need iTunes to read that, I'll simply post a screenshot: http://img.skitch.com/20090703-e7kkm8i7f4wdq9ir92td898wr3.jpg (skitch may eventually delete that image after a while...)

    --
    -- Sorry, I can't think of anything funny to say here.
  17. Advice by DissociativeBehavior · · Score: 2, Funny

    App Store purchasers of Collison's open source application can browse and search the full text of Wikipedia when stuck in a plane

    This page is not recommended when you're stuck in a plane...

  18. Warning: 3 major problems with this app. by Anonymous Coward · · Score: 5, Informative

    I bought this application 6 months ago and there are 3 major problems with it:
    1) The search function is broken because you need to type the exact word (prefix)
    2) This is plain text: no pictures and no tables so most articles with "list" are useless
    3) No update mechanism so the dump used will be outdated soon.
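
Point 1 is what you would expect from a lookup over a sorted title index, which is cheap on a phone but only matches from the start of a title. A minimal sketch of that behaviour follows. It is an assumption about how such a search might work, not the app's actual implementation, and `prefix_search` and the sample titles are made up.

```python
import bisect


def prefix_search(sorted_titles: list[str], prefix: str, limit: int = 10) -> list[str]:
    """Return up to `limit` titles starting with `prefix`, via binary search."""
    start = bisect.bisect_left(sorted_titles, prefix)
    results = []
    for title in sorted_titles[start:start + limit]:
        if not title.startswith(prefix):
            break  # titles are sorted, so no later title can match either
        results.append(title)
    return results


# Hypothetical title index; a real one would hold millions of entries.
titles = sorted(["Kama Sutra", "Kindle", "Wikibooks", "Wikimedia Foundation", "Wikipedia"])

print(prefix_search(titles, "Wiki"))   # matches from the start of the title
print(prefix_search(titles, "pedia"))  # a mid-word term finds nothing
```

Matching "pedia" inside "Wikipedia" would need a full-text or substring index, which costs far more storage than a sorted list of titles.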

  19. Web Version by SnarfQuest · · Score: 2, Funny

    Is there a version of this that will run in a web browser? Anyone have a link?

    --
    Who would win this election: Andrew Weiner vs Andrew Weiner's weiner.
  20. Re:Profits by Me!+Me!+42 · · Score: 3, Interesting

    I wonder exactly what "portion of the proceeds" go through to the Wikimedia Foundation?
    I hate when companies don't just come out and say it explicitly. It makes me think they might just be paying a penny on the dollar so they can play the "philanthropy" card. I like that Target Corp clearly states that "5% of our profits" go to charity (admittedly, much of this may be in the form of product donations, but still.)
    http://en.wikipedia.org/wiki/Target_Corporation#Philanthropy

    --
    -- My apologies if the above facts contain any opinions, or vice versa! --
  21. Re:Nice! by SuperKendall · · Score: 2, Informative

    So stick a bigger SD card in it already.

    He can't, can you just loan him your mobile SD capable device that can run the app?

    Oh that's right...

    --
    "There is more worth loving than we have strength to love." - Brian Jay Stanley
  22. Just use the mobile-formatted version by daemonenwind · · Score: 4, Informative

    try this link from your mobile phone:
    http://wapedia.mobi/en/

    That way you get the whole thing, up-to-date, and with no trouble or major memory usage.