Have 100GB Free? Host Your Own Copy of Wikipedia, With Images
First time accepted submitter gnosygnu writes "Want your own copy of English Wikipedia with images? Got 100 GB of disk space? Then open-source app XOWA may be of interest to you. The project released torrents yesterday for the 2013-11-04 version of English Wikipedia. There's 100 GB of sqlite databases containing 13.9 million pages, and 3.7 million images — readable from any Windows, Linux, or Mac OS X system. Image downloads for other wikis are building, but you can still use XOWA to read the text-only version for other wikis like Wiktionary, Wikisource, Wikiquote and 660 more. Next time you find yourself stranded without the internet, you can pull out your own copy of Wikipedia for use."
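Since the summary says the download is plain sqlite databases, any stock sqlite client should be able to open them. A minimal sketch in Python, assuming nothing about XOWA's actual schema: it only asks sqlite itself which tables a file contains. The file name in the usage note is hypothetical.

```python
import sqlite3
from contextlib import closing
from pathlib import Path


def list_tables(db_path):
    """Return the table names in a sqlite file, opened read-only."""
    # mode=ro avoids creating an empty database if the path is wrong.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    with closing(sqlite3.connect(uri, uri=True)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]
```

Usage would look like list_tables("some-xowa-file.sqlite3"), where that file name is made up for illustration.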
It comes with software that automatically reverts your edits and insults you.
Finally I can have my own version of wikipedia so I can correct all those changes I haven't been allowed to enter into the official version!
Does it include the seasonal donation nag banners?
Holidays are coming! Holidays are coming!
...yet. But I guess most phones won't easily read sqlite databases yet, either. I suppose it won't kill me to lug around a full-sized SD card.
Still looking forward to the library-of-Congress-on-a-card from Rainbows End.
Alas, the terms and conditions will forbid you running a server to do this. They'll want you to use one of their cloud servers to do it (that kinda makes more sense to put something like that further upstream).
Waiting for an amusing sig.
When the supercold storm blasts through your town, your device will freeze. And I'll still be able to read the pages of my Universalis as I tear them to burn them for heat.
You are right. That's a silly summary they put on. They should say something like 'No Internet connection required while browsing/searching through the wiki' (one of their features).
Navigate between offline wikis. Click on "Look up this word in Wiktionary" and instantly view the page in Wiktionary.
I'd have put en.wikipedia at a couple of terabytes at least. Not inconceivably large, but with some housecleaning I could actually get 100GB free.
Facts do not cease to exist because they are ignored. - Aldous Huxley
I suggest a website like say wikipedia.org
Wow .. you must be a walking encyclopedia or have a lot of spare time ... 19 minutes of clicking on 'random' failed to turn up anything I could claim to know about in depth.
Respect to you !
Time for bed, said Zebedee - boing
That's a good thing. The more we use torrents for the distribution of legitimate content, the more such distribution methods will become legitimized.
And yet you commented only 16 minutes after the AC...
You do not have a moral or legal right to do absolutely anything you want.
This is a news-for-nerds site. It’s reasonable to assume dates are in ISO format. :)
As a long long time editor...
Look at the quality of information.
I agree, you did a terrible job. Please, quit editing!
`echo $[0x853204FA81]|tr 0-9 ionbsdeaml`@gmail.com
YYYY-MM-DD is the only date scheme where filenames sort ASCIIbetically. Kinda useful if you have a lot of copies of something.
I prefer ZModem myself.
But if you don't have that you can probably use XModem.
If something is so important that you feel the need to post it on the internet... It probably isn't that important.
Actually, ISO 8601 dates (YYYY-MM-DD) are unambiguous: far better than the ambiguous AA/BB/YYYY notation, since Americans interpret it as MM/DD/YYYY but in some other countries it's regarded as DD/MM/YYYY.
As an added plus, a lexical sorting of YYYY-MM-DD dates is also a temporal sorting. Not so with either of the other two formats.
http://en.wikipedia.org/wiki/ISO_8601
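The sorting claim is easy to check. A quick sketch in Python, using a handful of made-up sample dates: it sorts the formatted strings as plain text and compares that order with the true chronological order.

```python
from datetime import date

# Hypothetical sample dates, deliberately out of order.
DATES = [date(2013, 11, 4), date(2012, 1, 15), date(2013, 2, 12), date(1999, 12, 2)]


def sorts_chronologically(fmt):
    """True if sorting the formatted strings as text matches date order."""
    as_text = sorted(d.strftime(fmt) for d in DATES)
    by_date = [d.strftime(fmt) for d in sorted(DATES)]
    return as_text == by_date


iso_ok = sorts_chronologically("%Y-%m-%d")  # YYYY-MM-DD
us_ok = sorts_chronologically("%m/%d/%Y")   # MM/DD/YYYY
eu_ok = sorts_chronologically("%d/%m/%Y")   # DD/MM/YYYY
```

With these samples only the YYYY-MM-DD strings come out in date order; the other two put the 1999 date in the wrong place.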
Koans and fables for the software engineer
> XOWA is a free, open-source application that lets you download Wikipedia to your computer. No internet connection required!
This is supremely impressive; download Wikipedia without an internet connection!
Cereal boxes have more accurate information than slashdot too.
Year, month, day, so it can be sorted easily.
If something is so important that you feel the need to post it on the internet... It probably isn't that important.
WANTZ UNCYCLOPEDIA
"Flyin' in just a sweet place,
Never been known to fail..."
Be... without internet? *screams*
Gamingmuseum.com: Give your 3D accelerator a rest.
I hope it's when the previous pope (Ben #16) was pictured as Master Yoda in Wikipedia.. missed that :-)
Slashdot, fix the reply notifications... You won't get away with it...
This will be great for offline/remote/low speed situations. Imagine being on a merchant ship or even a cruise ship with a pricey connection package. Scientific expeditions etc.
How about preloading it on OLPC?
What if your high school kid can't do his homework without getting distracted online, but says he needs Wikipedia for research? Bam, here's your air-gapped PC, son.
what I call backup on the cloud
You can't run a commercial(non-personal) server. But you can run a server for friends and family. http://arstechnica.com/information-technology/2013/10/google-fiber-now-explicitly-permits-home-servers/
Next year or so 100GB phones will be commonplace...and you will have your Hitchhiker's Guide.
Truly amazing times we live in.
Great warrior...hrmph! Wars not make one great.
Presumably the wikipedia is under revision control.
Does this give you the whole thing so that you can forever after sync with the master?
Or just the most recent versions of the articles?
Should there be a bittorrent for syncing huge revision control databases?
Because all offline wikipedia readers require you to download the wikipedia dump, and the english wiki isn't dumped that often, and this is wiki converted to HTML with downscaled images as far as I can understand.
just pulled the most recent english-language wikipedia dump, and made elasticsearch ( via the wikipedia river plugin ) run over it. 13.9 million entries now on a small server, answering times ~ couple-of-millisecond order. elasticsearch rocks !
Religious speak to God. Insane are spoken to by God. When all shut up, one can finally hear Shostakovich in peace
http://www.wps.com/FidoNet/source/Fido-FidoNet/Fido-12u-29Oct1991/MYLIB/ZMODEM.C
Enjoy.
Yeah, even worse, the scripting runtime on Windows auto-parses AA/BB/YYYY into Date types, but it defaults to USA regardless of system locale... unless it can't be interpreted as a valid date.
If you enter
12/02/1999
That's the second of December, regardless of actual system locale...
13/02/1999
And that's the 13th of February (possibly just in locales like GB).
Not sure if this has ever been fixed, but it was a royal PITA when I used to do ASP classic pages.
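For anyone who hasn't been bitten by this, here is a rough Python sketch of the behavior as described above: try month-first, and only fall back to day-first when month-first is not a valid date. This is an illustration of the described rule, not the actual Windows scripting runtime, and the function name is made up.

```python
from datetime import datetime


def parse_us_first(text):
    """Parse AA/BB/YYYY as MM/DD/YYYY, falling back to DD/MM/YYYY."""
    for fmt in ("%m/%d/%Y", "%d/%m/%Y"):
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue  # not valid in this order, try the next one
    raise ValueError(f"not a valid date in either order: {text!r}")


ambiguous = parse_us_first("12/02/1999")  # valid month-first, so December wins
forced = parse_us_first("13/02/1999")     # month 13 is invalid, so day-first
```

The nasty part is that the same input column silently mixes both interpretations, depending on whether the first number happens to be 12 or less.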
You can put it on a neighbornode without violating any terms of service. Your internet connection would only be needed to download updates.
Any sufficiently unpopular but cohesive argument is indistinguishable from trolling.
I've been mirroring a local copy of Wikipedia for a long time, with images. What's new about this app compared to the dozens of others that already do this?
I was wondering when I could replace my CD of Encarta 96.
But I thought SQL wasn't webscale wtf?
Korma: Good
Wikipedia is only so entertaining if you are stranded somewhere with no other way to pass the time.
Now, if they give us a torrent of the complete TVTropes site....
That's ALL it takes up?? My goodness! Wikipedia can fit on my largest USB drive?? haha.. I expected it to be in the multi-TB range!
I agree that ISO 8601 is much better, but people will still put the year last in informal usage no matter how much you try to convince them otherwise. Among the countries that I've visited (not an exhaustive list obviously), only the US (usually) uses "/" as the separator. The others usually use "." or "-". And only the US has the month first. So an informal convention that usually works for me when there is ambiguity is to interpret "/" as meaning month first, anything else day first.
Is two more than X!
I am missing checksums to verify the download. It seems sourceforge has the tendency to change stuff.
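If checksums were published, verifying a download would only take a few lines. A sketch in Python, assuming a SHA-256 hex digest obtained from a source you trust; the file name and the expected digest in the usage note are placeholders, not real values from the project.

```python
import hashlib


def sha256_of_file(path, block_size=1024 * 1024):
    """Hash a file in blocks so a multi-GB download need not fit in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def matches_checksum(path, expected_hex):
    """Compare the computed digest with a published one, ignoring case."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

Usage would be something like matches_checksum("downloaded-file", "<published sha256>"), with both arguments filled in from wherever the checksum is actually posted.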
nosig today
I've seen some online specifications in the format YY/MM/DD or maybe it's YY/DD/MM, practically impossible to determine for the past 14 years.
Vintage computer adverts: http://www.vintageadbrowser.com/computers-and-software-ads
This is really a cool thing to have as an option. 100GB isn't that much today when a TB might cost you 30 bucks (rather surprised it's that small), and with how 'vulnerable' everything is on the net today it wouldn't hurt to have an archive before the next takedown notice or commercial buy-out (or shutdown due to loss of funding).
---- Booth was a patriot ----
Yes. Plenty of ways to do that. Google is your friend.
You can even buy a handheld piece of hardware ( that runs forth! ) if you like. http://www.thewikireader.com/
---- Booth was a patriot ----
How is this different from Wikitaxi, which has been available for years? http://www.yunqa.de/delphi/doku.php/products/wikitaxi/index
Dumps for Wikitaxi typically don't have images. Though it is a great tool.
Can I have a slightly smaller copy without the images and references?
Use Wikitaxi (Windows only, works in Wine): http://www.yunqa.de/delphi/doku.php/products/wikitaxi/index
Get dumps from here:
http://dumps.wikimedia.org/enwiki/
look for: pages-articles.xml.bz2
You have to process the dump. One I did earlier in the year resulted in a 15GB file.
How do I download it if I don't have an internet connection? Does this require special hardware?
Order Wikipedia on DVD, from Wikipedia themselves. http://dumps.wikimedia.org/dvd.html
If all you want is an offline Wikipedia reader, just use Kiwix. It uses the ZIM format which was created specifically for offline use and runs on Win/Mac/Linux/Android or anything else if you want to compile it yourself.
While the full English Wikipedia ZIM sans pictures is a bit old (January 2012), it has the benefit of being only 10GB and split up into 2GB chunks so it will fit on a FAT32 device like your phone's SD card.
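The chunking matters because FAT32 cannot hold a single file of 4 GiB or more. If you ever need to split some other large dump the same way, a rough Python sketch follows. The numbered-suffix naming here is my own assumption for illustration and is not the naming Kiwix uses for its split ZIM files.

```python
from pathlib import Path


def split_file(source, chunk_size=2 * 1024**3, block_size=1024 * 1024):
    """Split source into numbered parts of at most chunk_size bytes each."""
    source = Path(source)
    parts = []
    with source.open("rb") as src:
        index = 0
        while True:
            # Hypothetical naming scheme: dump.bin.000, dump.bin.001, ...
            part_path = source.with_name(f"{source.name}.{index:03d}")
            written = 0
            with part_path.open("wb") as dst:
                while written < chunk_size:
                    block = src.read(min(block_size, chunk_size - written))
                    if not block:
                        break  # reached the end of the source file
                    dst.write(block)
                    written += len(block)
            if written == 0:
                part_path.unlink()  # nothing left, drop the empty part
                break
            parts.append(part_path)
            index += 1
    return parts
```

Concatenating the parts in order gives back the original file, so the split is lossless.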
That was a typo. Was supposed to say 10 minutes. Maybe not long enough !
Time for bed, said Zebedee - boing
Add the UK to that list.
No colour or religion ever stopped the bullet from a gun
Direct Dialup connection. That is how I downloaded files before I had Internet access.
If something is so important that you feel the need to post it on the internet... It probably isn't that important.