archive.org · Domains · Slashdot Mirror

The Larrikin-Wowser Nexus by acb · 2005-12-08 22:59 · Score: 1 · on Australian Senator Wants to Censor the Net

Australia has always had a tradition of repressive, authoritarian government and arbitrary authority. After all, it was a penal colony, and a military outpost of the British Empire, holding the line, and standards had to be enforced. Up until the 1960s or 1970s, a lot of things which would be OK in London or New York were strictly beyond the pale in the big cities of Australia. Australian puritanism (or "wowserism") doesn't have the evangelical, light-on-a-hill idealism of the American variety, but tends to be more of a what-will-the-neighbours-think conservatism.

Mind you, Australia also has an equally old opposite tradition of borderline contempt for authority and propriety, commonly called "larrikinism". This is a country where an armed robber is a national hero, an unofficial (and far more popular) national anthem is about a sheep thief, and, more recently, there were (unofficial) national moments of silence and memorials for an Australian executed in Singapore for smuggling a huge quantity of heroin. The larrikin streak has left its mark on Australian culture in a number of areas, from an old and ongoing tradition of political mischief to highly developed scenes for activities such as stencil graffiti and urban exploration.

The downside of the larrikin-wowser dynamic is that there is not much of a centre, and not much of a tradition of liberalism and civil society. Since the 1970s, Australia has become more liberal and cosmopolitan, though that was never enshrined in anything like a bill of rights. Consequently, now that a hard-right government has got into power, all the de facto institutions of liberalism are being swept away like so many sandcastles on a beach, and the old authoritarianism is showing through.

LMA by NickSD · 2005-12-08 12:17 · Score: 3, Informative · on Review of the Squeezebox

Here's the URL: http://www.archive.org/audio/

Re:WikiLyrics by syukton · 2005-12-07 15:40 · Score: 1 · on Music Should Be Heard But Not Understood

I hit up archive.org to see if they had indexed the site to any depth. No luck in my preliminary search, but I did find this:

http://web.archive.org/web/19990125090702/http://lyrics.ch/

Note the date. Jan 25, 1999. I can't believe it was that long ago.

libtool by Wikipedia · 2005-12-05 10:04 · Score: 0 · on Searchable C/C++ DB surpasses 275 million lines

You might be interested in this:
http://www.advogato.org/article/85.html
which links to the open-source metrics:
http://orbiten.org/ofss/01.html
which is dead but is still on the archive:
http://orbiten.org/ofss/01.html (The link doesn't work!@!#@!@@!)

Here is the first table:

Table 1: Top 10 authors ranked by contribution of code

Author                                        % of total
free software foundation, inc                     11.231
sun microsystems, inc                              1.848
the regents of the university of california        1.359
gordon matzigkeit                                  1.216
paul houle                                         1.042
thomas g. lane                                     0.782
the massachusetts institute of technology          0.762
ulrich drepper                                     0.559
lyle johnson                                       0.528
peter miller                                       0.525

Re:What? by ozmanjusri · 2005-12-04 15:42 · Score: 1 · on Film Documents Software Creation

Ahh, "Blockbuster Movie Syndrome": Everything put on film must be exciting.

Yeah, to me the formulaic blockbuster movies are boring as batshit. Some of the technical films in the Prelinger Archives (http://www.archive.org/details/prelinger) are much more interesting. How can you go past a film like this:

Personal Hygiene (Part I) - U.S. Army
Military training drama showing how the residents of a barracks convince a sloppy soldier to clean up his act. With many folk songs on cleanliness.

It's unintentionally hilarious, and there are thousands like it in the collection.

Copy of the javascript? by ctenet · 2005-12-04 12:44 · Score: 1 · on RISK on Google Maps Shut Down

Does anyone have a copy of version 0.9.7 of http://www.ashotoforangejuice.com/jsrisk.js ? (I think that was the latest version before it was taken down.)
I tried looking for it on http://web.archive.org/web/*/ashotoforangejuice.com/* , but obviously had no luck. Is there any chance that Google might be caching it? They certainly have the HTML (which is pretty much worthless) here: http://216.239.51.104/search?q=cache:RrdLpS5Pm5IJ:ashotoforangejuice.com/gmrisk.html+site:ashotoforangejuice.com&hl=en&client=firefox-a , but I don't know whether Google caches JavaScript files.

By the way, I have 0.9.5 if anyone wants it.

Re:The Dead == The Man by Doc+Ruby · 2005-12-02 20:48 · Score: 1 · on The Grateful Dead vs. Archive.org

You might be swapping "David Gans and Friends" soundboards. But like practically all of us, I'm not. So none of the music we're exchanging was made by Gans. FWIW, the soundboards of Gans' bands are still up on Archive.org, because there's no market for them. Without the free distribution, Gans wouldn't have the marketing or the same audience, and wouldn't make the same money off the stuff he can charge for.

Re:Be Like Mojo by Braino420 · 2005-12-02 05:56 · Score: 1 · on The Grateful Dead vs. Archive.org

There are many more. No Mojo Nixon though, maybe you should email him about this outlet for his music.

The Deadhead headache for the Internet Archive by Animats · 2005-12-02 05:41 · Score: 1 · on The Grateful Dead vs. Archive.org

From what I hear from some Internet Archive people, the Deadheads have become a headache. The Grateful Dead stuff is a tiny percentage of the Internet Archive, which has petabytes of data, including multiple copies of the whole World Wide Web. But the Deadheads are hogging the bandwidth, and because they hit the same stuff over and over, the Archive bogs down. The Archive was designed as a library, without a big caching front end to handle high traffic to a few files. So concentrated traffic in one area slows it down.

The Archive now offers files for streaming, which is a bandwidth hog for music files. People keep playing them again and again. (Especially Deadheads, who are notorious for listening to the same content repeatedly. Possibly due to drug-induced memory degradation.) This is interfering with other queries.

bt.etree.org by ackdesha · 2005-12-02 03:38 · Score: 1 · on The Grateful Dead vs. Archive.org

Many taper-friendly bands choose not to allow their shows to be posted to the Live Music Archive.
See the list of those that have opted out here (after the accepted and pending list):
http://www.archive.org/audio/etree-band-showall.php
Phish is a good example. They do allow fans to trade their recordings on bt.etree.org as well as other places. You can buy soundboards from their website. I don't think that makes them greedy or in the same class as Metallica and others.
That said... the Dead archive on etree is just amazing and I hope it stays. I encourage anyone who has never gotten into the Dead to download some of the higher-rated shows and give them a chance. Great music to code to.

Jerry wanted the music to be free... by digitaldc · 2005-12-02 02:01 · Score: 5, Informative · on The Grateful Dead vs. Archive.org

"once we're done with [the music], you can have it." - Jerry Garcia
Bassist Phil Lesh echoed that sentiment, quoting Garcia in an interview with Charlie Rose on CBS's 60 Minutes in 2004: "Jerry put it the best, as he frequently did: 'Let 'em have it. When we play it, we're done with it.'"
from: http://www.archive.org/iathreads/post-view.php?id=49496

The Dead also released a disclaimer about their live music:
MP3 STATEMENT TO MP3 SITE OPERATORS
The Grateful Dead and our managing organizations have long encouraged the purely non-commercial exchange of music taped at our concerts and those of our individual members. That a new medium of distribution has arisen - digital audio files being traded over the Internet - does not change our policy in this regard.
Our stipulations regarding digital distribution are merely extensions of those long-standing principles and they are as follows:
No commercial gain may be sought by websites offering digital files of our music, whether through advertising, exploiting databases compiled from their traffic, or any other means.
All participants in such digital exchange acknowledge and respect the copyrights of the performers, writers and publishers of the music.
This notice should be clearly posted on all sites engaged in this activity.
We reserve the ability to withdraw our sanction of non-commercial digital music should circumstances arise that compromise our ability to protect and steward the integrity of our work.

Jerry Garcia did not care about people taping or downloading their music; he thought any live show could be shared and traded by anyone for their personal use, but not copied and sold for profit. I would think the rest of the band would respect his wishes. Long live Jerry.
http://www.people4peace.net/pix/people4peace/jerry-garcia.jpg

Night of the Living Dead by Anonymous Coward · 2005-12-02 01:52 · Score: -1, Offtopic · on The Grateful Dead vs. Archive.org

Get it at archive.org:
http://www.archive.org/details/night_of_the_living_dead

Grateful Dead no longer share-friendly by SuperKendall · 2005-12-01 20:28 · Score: 1 · on First RIAA Lawsuit to Head to Trial

Recently Archive.org was asked to pull the recordings of the Grateful Dead it had been hosting, all of them fan recordings.

Archive.org was allowed to resume hosting microphone recordings for shows - but the soundboard recordings (which were all made by fans, not the Dead) are now only allowed to be streamed.

This implies then that if you are sharing soundboard recordings you are doing so against the wishes of the Dead.

Read the spirited comments on the matter here. Some fans are thankful that they are allowed partial freedoms; others are upset that all the effort that went into fan soundboard recordings is being withheld from the people who made it.

There is also a petition to sign here to let the band know how you feel about them going back on their principles.

I looked just now by DanTheLewis · 2005-12-01 16:54 · Score: 1 · on First RIAA Lawsuit to Head to Trial

http://www.archive.org/audio/etreelisting-browse.php?collection=etree&cat=Grateful%20Dead

I noticed when the Grateful Dead shows went off but I didn't know why. Now it shows there are 1100 shows back there (still not all of them back). Maybe putting the other 1500 back on is why the archive has been running slowly today.

I'm having trouble clicking through the link now, so who knows what the deal is.

Support share-friendly artists by DanTheLewis · 2005-12-01 16:00 · Score: 2, Interesting · on First RIAA Lawsuit to Head to Trial

The answer to the downloading conundrum is easy.

1. Go to http://www.archive.org/audio/etreelisting-browse.php . All the music is legal, live concert, artist permitted, and free. Download Grateful Dead, 311, G Love and Special Sauce, Cracker, Glen Phillips, Andrew Bird, the Ditty Bops and so on to your heart's content.

2. Listen to commercial-free streaming audio via iTunes (Radio) and other internet media.

3. Reward the artists whose work you enjoy this way by going to their concerts. Reward any artists whose albums you can hear from front to back for free, like Nickel Creek on CMT.com and the Ditty Bops on dittybops.com.

Re:Fear more than greed by hackstraw · 2005-12-01 09:27 · Score: 1 · on RIAA vs Linux and DVDs

This is about power. The record companies want to dictate how you use their product. They cannot get over the idea that once you purchase something it no longer belongs to them.

If this is true, then they just don't get music (as if they ever did or cared to).

Music is like language; it is a part of _our_ culture, not the record execs' power trip. Sure, a record company can produce a random artist who looks good and can produce a couple of bubble gum hits, but everybody over 15 knows that is not music, and it will be forgotten except for later releases like "Greatest Hits of the '90s" and a memory on the Billboard list. If you don't believe me, go back and look at the "hits" from the 60s and see how many of them are songs that you know, or whether many of those songs are what you think of as 60s-era music.

Music that lasts, lasts for a reason. Look at http://www.archive.org/audio/ for tons of music that is freely available. Look at some of the music trading sites on the net like http://www.dimeadozen.org/. We love music, and it has been a part of the human experience since the first guy beat 2 sticks together.

Think of the South Park episode that shows the poor starving record exec with his mansion and private plane or whatever they showed. That is not music. That is business. Both will survive, regardless of there being a "record business".

Re:Slightly easier to build... by Hatta · 2005-11-29 06:24 · Score: 1 · on Air Guitar That Actually Plays!

If you're interested, check out this Flecktones show with guest thereminist Pamela Kurstin.

Re:pixelfest by Xzzy · 2005-11-28 12:12 · Score: 3, Interesting · on Pictures by Hive Mind

That is pretty cool. I did something of a similar bent a few years back, though with different goals. I wanted to see if people were capable of participating in an art project without being asshats.

http://web.archive.org/web/20021011144257/http://tru7h.org/society/

Short version is, they couldn't. There were some cool things a few people did (that link is one example), but it was always done by one person and some scripts, rather than a group.

Don't have it up anymore, the way I stored the data was pretty inefficient and was too expensive in terms of CPU time to keep available.

Re:Control by Alsee · 2005-11-26 11:45 · Score: 1 · on Kazaa Forced To Modify Search Engine

I think you misspelled 'archive', chuckle.

Even after fixing that rather ...interesting... misspelling, the link still appears to be broken. Here's a substitute search link with results for the group.


Re:Extension of the Blogging Culture by Eric+Giguere · 2005-11-25 07:14 · Score: 2, Insightful · on Podcasting Hacks

Yes and no. Unlike blogs, podcasts are mostly one-way, with none of the commenting, tagging and cross-linking that characterizes blogging. Podcasting is another form of content syndication. And yes, the technology is so simple now (I use a Yamaha UW500, a USB audio/MIDI recorder) that anyone with a computer can record themselves doing all kinds of things and slap it out on the Internet for anyone to hear. (A hint to save you some bandwidth: if what you're doing is distributable under a Creative Commons license, you can have the Internet Archive host it for you.)

Recording is easy. The tricky part is figuring out how to best build your feed. Besides the standard RSS tags, look at the iTunes extensions.

Eric
Just put out my first (long!) podcast

Re:Weak by chazwurth · 2005-11-23 03:54 · Score: 4, Interesting · on Hollywood Buddies up with Bram Cohen

Bram Cohen has in fact condoned piracy, at least until mid-2003. Check out this little piece, now removed from his website but still accessible via the Wayback Machine: http://web.archive.org/web/20030602145959/bitconjurer.org/a_technological_activists_agenda.html

"I build systems to disseminate information, commit digital piracy, synthesize drugs, maintain untrusted contacts, purchase anonymously, and secure machines and homes...I refuse to work on technology to track users, analyze usage patterns, watermark information, censor, detect drug use, or eavesdrop. I am not naive enough to think any of those technologies could enable a 'compromise'."

He was the last person I'd have expected to deal with the MPAA, given what his rhetoric used to be.

Meta-Modding by Morosoph · 2005-11-23 02:12 · Score: 1 · on How Long to Crack an 'Encrypted' HD?

Insightful (thanks for the reminder, temojen!)

Re:a work of love by Doc+Ruby · 2005-11-21 12:13 · Score: 1 · on 5000 Cylinder Recordings Placed Online

Do they have anything to do with Archive.org's 78s archive? Because I'd love to see a unified archive, with a choice of whichever conservator's GUI I prefer.

Nice of y'all to join us by realinvalidname · 2005-11-21 04:58 · Score: 1 · on How To Write Unmaintainable Code

This is news? Was the poster not aware that Roedy's unmaintainable code doc has been growing for at least five years? http://web.archive.org/web/*/http://mindprod.com/unmain.html

Re:Out of Touch with an Old Reality by rewinn · 2005-11-21 04:48 · Score: 1 · on The World of Competitive Gaming

>In another 2000 years...

Hmmmm, perhaps I should have said "relatively" perfect & enduring. If the half-life of an AOL disk is 20 years, there will still be several thousand of those buggers functional in 4006, bearing a usable but embarrassing browser.

There is a fundamental difference between physical books and electronic media. In 2000 years, nearly all paper books will have cycled through the biosphere a dozen times, which destroys the information on them. In contrast, archived web pages will very likely still exist as information. Perhaps they will be readable only via old browsers, but those browsers are themselves only information, similarly archived and available to researchers who care to figure them out.

There is no need to wikify or continually update works such as the Rubaiyat, which reached their final form long ago. Translations of course require continual updating as language changes over the centuries, but the original text of Beowulf is the same today as it was 200 years ago (plus or minus findings of new texts).

Sounds like Cringely saw a Petabox by Animats · 2005-11-20 06:42 · Score: 5, Insightful · on Google's Secret Plans For All That Dark Fiber?

The Internet Archive's Petabox is a petabyte of storage in a shipping container. Each rack holds 100 terabytes, and power consumption is 6 kW per rack. Capricorn builds them for the Internet Archive.

Sounds like Google is trying that out.

There's nothing that exotic about this. The military builds racks of electronics into shipping containers all the time. It's mostly a cable management and maintenance access problem. You have to be able to do everything from the front of the rack, which requires some design work but isn't rocket science.

The Petabox? by volsung · 2005-11-20 06:34 · Score: 1 · on Google's Secret Plans For All That Dark Fiber?

This sounds just like the Petabox being designed by the Internet Archive folks. The projected specs are (ripped from the linked page):

Low power-- 6kWatts per rack, and 60kWatts for the whole system
High density-- 100 Terabytes per rack
Local computing to process the data-- 800 low-end PCs
Multi-OS possible, linux standard
Colocation friendly-- requires our own rack to get 100TB/rack, or 50TB in a standard rack
Shipping container friendly-- Able to be run in a 20' by 8' by 8' shipping container
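As a quick sanity check, the quoted specs hang together: a 60 kW budget at 6 kW per rack implies 10 racks, 10 racks at 100 TB each is 1000 TB (one decimal petabyte), and 800 PCs over 10 racks is 80 per rack. A minimal C sketch of that arithmetic, assuming the whole system is built from identical full-density racks (the function names are made up for illustration):

```c
/* Figures from the quoted Petabox spec. Assumption: every rack is an
   identical full-density (100 TB) rack drawing the full 6 kW. */
enum {
    KW_PER_RACK = 6,
    KW_TOTAL    = 60,
    TB_PER_RACK = 100,
    PCS_TOTAL   = 800
};

/* Number of racks implied by the total power budget. */
int petabox_racks(void)
{
    return KW_TOTAL / KW_PER_RACK;
}

/* Total raw capacity in terabytes (1000 TB = 1 decimal petabyte). */
int petabox_total_tb(void)
{
    return petabox_racks() * TB_PER_RACK;
}

/* Low-end PCs housed in each rack. */
int petabox_pcs_per_rack(void)
{
    return PCS_TOTAL / petabox_racks();
}
```

At the 50 TB-per-rack figure quoted for standard colocation racks, the same petabyte would take twice as many racks.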

Re:Finally! by slavemowgli · 2005-11-20 01:41 · Score: 1 · on Beginner's Guide to Quantum Entanglement

Not to mention SAP R/3 administration for dummies (which does, in fact, exist!) and Vertex Operator Algebras for dummies (which unfortunately doesn't).

Re:Riddled with errors and unsupported statements. by Halo1 · 2005-11-20 01:34 · Score: 1 · on The Guardian On Intellectual Property

until recently it was entirely clear to the law. Things could have owners and ideas could not.

This is baloney. It's been quite a while since the constitution was written, and right there in Article 1 section 8 clause 8 is the statement by the framers that is the basis for our patent system. Ideas could be owned in 1789, and long before that as well, as England also had a patent system.

Patents (originally) were/are not monopolies on ideas, but on inventions. Those are not quite the same. And originally, all such "inventions" were limited to the physical world. It is only fairly recently that patent offices and courts have started extending what can be protected by patent to the immaterial world.

Even with the latest reform, the USPTO is still paying lip service to the original principle by demanding a "Useful, Concrete, and Tangible Result". Of course, in practice it doesn't exclude much anymore (you always want to monopolise real-world actions in the end, and every innovation in the abstract can be applied to the real world if that includes things like "provide a commercial benefit").

And the main problem with these extensions is that they are not based on economic needs, but simply pushed by a small in-crowd who stand to gain from them.

Not to mention the fact that money is an idea, equitable servitudes are ideas, usufructs are ideas, loans are ideas, contracts are ideas, and, now this will really blow your mind --
options on options...

I think you're extending the term "idea" beyond the context in which the author used it. That's easy of course, since "idea" has no legal definition and can be interpreted quite broadly. My interpretation of the article is that the author used idea in a more abstract sense, as in "the idea of using money instead of property", "the idea of lending money" etc.

In this world, size is no protection. It just makes you a more succulent target for enemy lawyers.

I would just like to point out that both sides have lawyers -- this makes it sound like lawyers are the enemy. In fact, lawyers are just the guys that help their clients get what they deserve under the law.

But in general, society is better off when fewer lawyers are needed. After all (and please don't take this personally), all money that goes into lawyers is money which cannot be invested in useful things (like R&D). It's an overhead cost. And by creating more "rights" you automatically increase the number of lawsuits, license agreements, etc.

I'm not saying that a world without rights or lawyers would be ideal, but on the other hand extending rights and adding more rights does increase the overhead and at a certain point starts reducing the overall "justice" and "efficiency" of the system.

People with more money have always been able to hire better lawyers in our legal system, and that problem has nothing to do with intellectual property.

It is an argument to balance the situations in which you may need a lawyer though.

The system is supposed to work this way. It incentivizes companies to research and patent things as fast as they can, pushing the limits of technology, and then disclosing them to the public.

That's the theory, but in practice it doesn't always work that way. Witness e.g. Machlup already saying in the fifties:

If one does not know whether a system "as a whole" (in contrast to certain features of it) is good or bad, the safest "policy conclusion" is to "muddl

Re:Plan 9?! by Anonymous Coward · 2005-11-19 20:16 · Score: 0 · on Space.com's Top 10 Space Movies of All Time

Plan 9 is no longer available from Archive.org
see here
and here
wiki page

This movie was made in 1959, just 46-47 years ago, so naturally it's not yet public domain. Please wait an additional 500 years and we'll reconsider.

Some gems from Archive.Org. by MsGeek · 2005-11-18 18:27 · Score: 3, Interesting · on 5000 Cylinder Recordings Placed Online

http://www.archive.org/audio/audiolisting-browseartists.php?collection=78rpm

A lot of these are transfers from the flat Diamond Discs, not the cylinders dubbed from Diamond Discs. Some of those transfers are pretty freakin' amazing. Lots of history here. Hear Irving Berlin sing. Hear why people raved about Enrico Caruso...makes Pavarotti and Domingo sound like punters. Hear Fanny Brice do her schtick. A lot of what is referred to as "Jazz" is actually more like Ragtime. But that can be pretty amazing too.

I came here looking for cartoony music that had passed into the public domain for my upcoming podcast series The Cartoon Geeks. There's lots of it here. Here's the tune that's going to be the theme music. Yowza yowza.

Re:Our style! by cwry · 2005-11-16 15:37 · Score: 1 · on What Workplace Coding Practices Do You Use?

You are correct that it's a good rule of thumb to just never use identifiers that start with an underscore, but there are exceptions.

From http://web.archive.org/web/20040209031039/http://oakroadsystems.com/tech/c-predef.htm#Groups:

Respect that first entry in the table below: never make up any identifier that starts with an underscore.

(Actually, you can legally use an identifier that starts with an underscore if the second character is a lower-case letter or a digit, and the identifier is used inside a function or a function prototype or as a structure member or label. Easier just not to use leading underscores!)

The parent post uses them inside a function and the second character is lower-case.
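To make the quoted rule concrete, here is a small C sketch (the `pair` type and `pair_sum` function are made up for illustration) that uses leading underscores only where the rule allows them: a structure member, a prototype parameter, a block-scope variable and a label, each with a lowercase second character. It relies on the C standard reserving underscore-prefixed names only at file scope, while `_X` and `__x` style names are reserved everywhere.

```c
/* Structure members have their own name space, so _first/_second are OK. */
struct pair {
    int _first;
    int _second;
};

/* A parameter name in a prototype has prototype scope: _p is OK. */
int pair_sum(const struct pair *_p);

int pair_sum(const struct pair *p)
{
    /* Block-scope variable inside a function, lowercase second char: OK. */
    int _total = p->_first + p->_second;

    if (_total >= 0)
        goto _done;     /* Labels have function scope: _done is OK. */
    _total = 0;         /* Clamp negative sums to zero. */
_done:
    return _total;
}

/* NOT OK: a file-scope name such as `int _count;`, or anything starting
   with an underscore plus an uppercase letter (`_Count`) or a double
   underscore (`__count`), in any scope. */
```

This compiles cleanly, but as the quote says, it's easier just not to use leading underscores at all.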

Free music by bahgheera · 2005-11-13 11:52 · Score: 0 · on Sony's EULA Worse Than Its Rootkit?

Ah.. there's too much free (legal!) music on the internet nowadays to warrant shelling out cash for 'pop' music from the big dogs.

http://freealbums.blogsome.com/
http://www.archive.org/audio/netlabels.php
http://www.magnatune.com/

Laters...

What are they doing that is so expensive? by jbn-o · 2005-11-12 18:52 · Score: 1 · on A Tool to Tally Podcast Listeners

There are ways to cut down the costs—archive.org (the Internet Archive) will host any file, allow unlimited downloads, and mirror it internationally over reasonably fast connections for free. 6GB of transfer and 400MB of storage space can be had online for $12/month (and I'm guessing plenty of /. readers know better deals than that). This is certainly a lot of storage for some fixed (X)HTML+CSS and an RSS file. If one can reliably get free Internet access whenever one needs to upload files, one could make a nice site that is regularly updated and features an RSS feed for less than $80/month.

So, I'm not entirely convinced that one needs to have ads here.

OCA and PG scratching each others' backs by TTK+Ciar · 2005-11-12 08:23 · Score: 2, Insightful · on Human-Powered Internet Archive Book Project

The focuses of OCA and PG are really quite different: PG is most interested in preserving the essential information of a book (i.e., its text), while OCA's interest is in preserving the form of the book (i.e., its fonts, page format, coloration, even down to the yellowing of the pages). That having been said, there's a lot each can do for the other (and has!).

The Archive has archived most of PG's material, because even though the Books department of The Archive is focussed mostly on preserving books, The Archive as a whole is interested in preserving just about any information it can, and the PG data is definitely of interest.

When The Archive's Scribe software processes the book images into its various formats (jpg, djvu, pdf, flippy, et al), it OCRs the book's text. This text then becomes part of generating some of the other formats. It will be really trivial for PG to obtain this text for any book it wants to incorporate into their dataset.

qv: intlepisode00jamearch. The interesting files here are intlepisode00jamearch.txt which is just the OCR'd text, and intlepisode00jamearch_djvu.xml which is the OCR'd text with layout information (which has been useful to me in developing software which auto-corrects some OCR errors -- where the text is on the page often offers valuable hints for choosing the right heuristic for guessing the right text).

A quick side note on the differences between Google's and OCA's efforts that I haven't seen talked about much -- Google's main advantages in their bookscanning efforts are their wealth and fame, while The Archive's main advantages are experience, familiarity, and scanning technology.

Traditional book-scanning technologies are expensive and slow (which makes doing a lot of books, fast, that much more expensive, because you have to hire more people to do more books in parallel), but Google has enough money to throw at the problem that this is less of an issue. Google's fame means they can bring powerful partners onboard with a smile and a handshake, including some of the most prestigious libraries in the nation.

The Archive has been involved in scanning books and making them available online for several years now (qv The Million Book Project). This experience has shaped the processes used in the acquisition and scanning of books, as well as the technology used in their storage, indexing, and presentation. Furthermore, libraries around the world have grown familiar with The Archive over the years. That, and The Archive's good track record, make it a powerful rallying point for partnerships and alliances, and have given it more experience in facilitating such relationships. Finally, partially due to the limits of existing book-scanning solutions, and partially due to The Archive's limited budget, it has facilitated the development of two independent low-cost, reliable, high-quality book-scanning systems: The Scribe (developed in-house at The Archive) and the Kirtas Robot (developed at Kirtas, a Canadian company).

Many of the books scanned for the Million Book Project using traditional scanning methods are really lousy, sometimes to the point of being unreadable. These new scanning systems dramatically improve the quality of the end product, while equally dramatically reducing the cost-per-page. This means that more scanning systems can be purchased for more libraries (avoiding the per-library capital outlay problem), and more books can be scanned more quickly within a given budget.

Obviously, Google and OCA can benefit from co-operation, as each has a lot to offer the other. I'd be surprised if Google didn't join the OCA, eventually, if for no other reason than to gain access to the books of the >100 OCA