Google To Digitize, Make Available British Library's Historical Holdings


Posted by timothy on Monday June 20, 2011 @09:43PM from the to-be-young-was-very-heaven dept.

pbahra writes with part of an excellent story at the WSJ: "The British Library today announced its first partnership with Google, under which Google will digitize 250,000 items from the library's vast collection of work produced between 1700-1870. The Library, the only British institution that automatically receives a copy of every book and periodical to go on sale in the United Kingdom and Ireland, joins around 40 libraries worldwide in allowing Google to digitize part of its collection and make it freely available and searchable online, at books.google.co.uk and the British Library website, www.bl.uk. ... As well as published books, the 1700-1870 collection will also contain pamphlets and periodicals from across Europe. This was a period of political and technological turmoil, covering much of the Industrial Revolution, the French Revolution, the introduction of UK income tax and the invention of the telegraph and railway. All of these topics are covered, as are the quirkier matters of the day, such as the account, from 1775, of a stuffed hippopotamus owned by the Prince of Orange."


But the IMPORTANT question is... by Serious+Lemur · 2011-06-20 21:48 · Score: 3, Funny

What will Apple and Facebook do? They can't afford a British literature gap!
1. Re:But the IMPORTANT question is... by c0lo · 2011-06-20 22:17 · Score: 2
  
  Nah. Wrong question. The really important one is: books being useful to as many as possible? TFA:
  
  Speaking at the official launch, Kristian Jensen, the Library’s head of Arts and Humanities, said: “This process allows books to fulfill their original aim of being useful to as many people as possible.”
  I thought that is already understood: the copyright should be extended forever, for the profit of the grand-grand-...-grand children of the author (too bad if the author sold the rights to the publisher... but it's irrelevant for the usefulness of books, isn't it?).
  Besides, digitization comes with the risk of exposing these "as many" to words, facts and attitudes that are quite sensitive today. I hope that Google will take note: even more recent pieces needed a "translation" to make them politically correct.
  Again: can we let the Tea Party and Michele Bachmann be hurt if indiscriminate digitized papers of the time showed that the founding fathers did own slaves (and, possibly, more than own)?
  </sarcasm>
  
  --
  Questions raise, answers kill. Raise questions to stay alive.
2. Re:But the IMPORTANT question is... by digitig · 2011-06-21 00:11 · Score: 1
  
  So if you cut and paste it then you might be breaching copyright. If you retype the paragraph you are citing then I don't see how you can be. I suspect the real reason for the non-commercial clause is to stop people publishing and selling paper versions directly from the digitized versions.
  
  --
  Quidnam Latine loqui modo coepi?
3. Re:But the IMPORTANT question is... by buchner.johannes · 2011-06-21 00:12 · Score: 3, Informative
  
  Here is a talk by librarian Brewster Kahle on book archiving. He created the Internet Archive (archive.org).
  With Google, it's important to make a contract so that the content is really open to all.
  
  --
  NB: The message above might reflect my opinion right now, but not necessarily tomorrow or next year.
4. Re:But the IMPORTANT question is... by sgt+scrub · 2011-06-21 02:48 · Score: 1
  
  Again: can we let the Tea Party and Michele Bachmann [nowpublic.com] be hurt if indiscriminate digitized papers of the time showed that the founding fathers did own slaves (and, possibly, more than own [monticello.org])?
  It will work more to their benefit. Books during those times were filled with Christian fanaticism and bigotries. In fact, it will be hard to tell Bachmann, and quotes from any t.b/rep, from the books.
  
  --
  Having to work for a living is the root of all evil.
He who controls the past by Anonymous Coward · 2011-06-20 21:52 · Score: 1

Controls the future..
great! by Anonymous Coward · 2011-06-20 21:52 · Score: 1

No doubt there'll be plenty of "ZOMG GOOGLE IS TAKING OVER" comments but this is brilliant. There's so much archived information in Britain that is supposedly public but actually costs a fortune to research as you have to travel to wherever it's stored then pay an archivist to take you into the vault and find the papers etc.
1. Re:great! by Intrepid+imaginaut · 2011-06-20 23:09 · Score: 1
  
  Absolutely, I'm delighted at this. On a much tinier scale I've been poring over the reefs of old notes I have to create the rpg in the sig there, I wish someone would offer to digitise that lot for me :/ I keep catching myself looking for the search button in the paper notebooks.
2. Re:great! by martin-boundary · 2011-06-20 23:14 · Score: 1, Interesting
  
  Brilliant is a tall order. Judging from the quality of the scans of old books that are available on Google already, this will be a waste of time.
  Older documents and books are notoriously difficult to scan - as it gets old, the paper starts to disintegrate and the ink fades away, and because the books are valuable, people have to be much more careful how they open and handle them.
  Bottom line is that old books need to be scanned at much higher resolution AND the blotches and broken characters have to be cleaned up much better than when scanning books from the last decade only. Google won't do that - they're more interested in quantity and speed than in quality.
  I expect most of the books will be unusable and will have to be redone at some point in the future. I don't know why they bother (*).
  (*) British Library, that is.
3. Re:great! by digitig · 2011-06-21 00:08 · Score: 2
  
  If you are in the UK, your local library should be able to get hold of copies of most British Library material for you, for quite a small fee. Yes, it's slow, and the small fees would build up if you need to access a lot of different things, but the information was already more accessible than you suggest. This is still a great step forward, though.
  
  --
  Quidnam Latine loqui modo coepi?
4. Re:great! by martin-boundary · 2011-06-21 00:08 · Score: 1
  
  No, I'd rather the BL do it properly themselves however many years it takes, instead of Google's wham bam thank you ma'am approach. Any job worth doing for future generations is a job worth doing right.
  Moreover, it's a necessity. If the scans are shit, then they can't be OCR'd, so all you have is pictures anyway.
  It *can* be done, lots of libraries around the world have done proof of concept pilot runs going back to the 90s even, and you can find their collections on the web if you look.
5. Re:great! by Panoptes · 2011-06-21 01:38 · Score: 1
  
  A small, but important, point: 'blotches and broken characters' are precisely what interest bibliographers and researchers, for whom these scans will be of immense value.
Now I am intrigued... by rts008 · 2011-06-20 21:53 · Score: 1

What about the Prince of Orange and a stuffed hippopotamus?
Inquiring minds want to know.
What does one do with a stuffed hippo?

--
Down With Slashdot BETA!!! I've been around the corner and seen the oliphant; you can only abuse me from your perspecti
1. Re:Now I am intrigued... by chill · 2011-06-20 22:22 · Score: 2
  
  More to the point, what did the Princes of Green, Red, White and Mauve think? And what about the Marquis of Heliotrope?
  
  --
  Learning HOW to think is more important than learning WHAT to think.
2. Re:Now I am intrigued... by pieterbos · 2011-06-20 22:44 · Score: 3, Informative
  
  Put it in your cabinet of curiosities of course, and show it to visitors. What else would you ever do with it? The title Prince of Orange is held by the crown prince of the Netherlands. It refers to the French city called 'Orange'. The title still exists, but is not a claim of any sort on the city of Orange, which is part of France. See Wikipedia for the rather strange history of the term.
3. Re:Now I am intrigued... by Intrepid+imaginaut · 2011-06-20 23:06 · Score: 1
  
  Ride it around lashing it with a switch of course. Ah the joys of inbreeding.
4. Re:Now I am intrigued... by dkleinsc · 2011-06-20 23:48 · Score: 1
  
  I don't know, but the Fresh Prince was jiggy with it, and Prince didn't have The Time to comment about it.
  
  --
  I am officially gone from /. Long live http://www.soylentnews.com/
5. Re:Now I am intrigued... by SMoynihan · 2011-06-20 23:56 · Score: 5, Informative
  
  Indeed, and the title is older than the English word "orange" itself. The word was introduced to English in the early 1500s (just in time for Shakespeare to complain about its lack of rhyme...), and is named after the fruit. Prior to this, the colour was "geoluhread" (yellow-red). Note, we don't call it "carrot", as (yellow-red) carrots were developed in the 1700s.
  Now, the house of Orange comes from the city, originally "Arausio", in southern France. This was named for the local Celtic water God of the same name.
  Being Irish, I admit I find it somewhat ironic that the "Orange-men" are originally termed for a pagan, Celtic god...
6. Re:Now I am intrigued... by sgt+scrub · 2011-06-21 02:52 · Score: 1
  
  Stuffed Hippopotamus? Is that 1700 goatse?
  
  --
  Having to work for a living is the root of all evil.
7. Re:Now I am intrigued... by adavies42 · 2011-06-21 03:04 · Score: 1
  
  Note, we don't call it "carrot", as (yellow-red) carrots were developed in the 1700s.
  and popularized as a symbol of dutch patriotism, iirc
  
  --
  Media that can be recorded and distributed can be recorded and distributed.
  -kfg
8. Re:Now I am intrigued... by sgt+scrub · 2011-06-21 03:08 · Score: 1
  
  Now, the house of Orange comes from the city, originally "Arausio", in southern France. This was named for the local Celtic water God of the same name.
  Thanks for pointing that out. I looked at it and thought, Arausio was a Gaul camp. Now to figure out why the Celts were in southern Gaul during a period of time when most everyone was trying to get away from the Romans.
  
  --
  Having to work for a living is the root of all evil.
9. Re:Now I am intrigued... by tehcyder · 2011-06-21 03:44 · Score: 1
  
  Being Irish, I admit I find it somewhat ironic that the "Orange-men" are originally termed for a pagan, Celtic god...
  But that's entirely irrelevant to the current use of the term, which relates purely to the time after William of Orange; it has no connection with the original Celtic god.
  
  You might as well say it is ironic that Christians worship on a Sunday, which is named after the ancient Sun god.
  
  --
  To have a right to do a thing is not at all the same as to be right in doing it
10. Re:Now I am intrigued... by SMoynihan · 2011-06-21 05:44 · Score: 1
  
  But that's entirely irrelevant to the current use of the term, which relates purely to the time after William of Orange; it has no connection with the original Celtic god.
  You might as well say it is ironic that Christians worship on a Sunday, which is named after the ancient Sun god.
  Begging your pardon (and ignoring the conflation of Christ with Sun gods in early Romano-Christian history); I think the comparison might be more apt if a group of Christians worshipped on Thursday, a day named after Thor, so named themselves Thursians.
  Personally, I would find that ironic - perhaps it's that extra step of actually naming yourself after the deity.
  However, your mileage may vary.
  On a related note, I find it somewhat amusing that many Christians (in my experience) would term saying "Christ" as blasphemy, and think of it as something akin to a surname - not knowing it as the transliteration of the simple Greek "Christos" (anointed one).
Not the only one... by metageek · 2011-06-20 21:54 · Score: 3, Informative

This is not the only British library that gets all publications; the National Library of Wales (http://www.llgc.org.uk/) also gets all publications that are published in the UK (and there is likely one in Scotland too).

--
metageek
1. Re:Not the only one... by Webspit · 2011-06-20 21:59 · Score: 2
  
  Technically no - I re-read the article - only the BL automatically gets a copy. The Welsh, like Oxford, have to request one within a year. The other difference is that the copy sent to the BL has to be the same as the best edition, whereas the rest are fobbed off with the same edition as the one currently most popular.
2. Re:Not the only one... by mdransfield · 2011-06-20 22:02 · Score: 1
  
  As usual, it's slightly more complicated: http://www.legaldeposit.org.uk/background.html
3. Re:Not the only one... by jcupitt65 · 2011-06-20 22:05 · Score: 4, Informative
  
  Actually the BL really is the only one to automatically get all publications. Five other libraries are entitled to a free copy upon request.
  http://en.wikipedia.org/wiki/Legal_deposit#United_Kingdom
  I know Cambridge gets everything with an ISBN, and from your post it sounds like Wales and Scotland do too. Things like PhD theses only go to the BL though.
4. Re:Not the only one... by Geeky · 2011-06-20 22:21 · Score: 3, Interesting
  
  Interesting, as it's covered by law in the UK. I wonder how it would apply to self-published books, such as books sold through the likes of Blurb or Lulu.
  Those companies are not UK based, so are not covered by the legislation. However, if I (as a UK resident) published a book, for sale to the public, via Lulu, would I be classed as publisher in terms of this legislation?
  
  --
  Sigs are so 1990s. No way would I be seen dead with one.
5. Re:Not the only one... by illtud · 2011-06-21 11:18 · Score: 1
  
  Things like PhD theses only go to the BL though.
  No, at the National Library of Wales we get the theses from the universities in Wales:
  http://www.llgc.org.uk/index.php?id=4653
  So they don't get everything from the UK (I'm not sure what Scotland does, they have their own National Library).
  We've started harvesting e-theses from university repositories as part of the ETHOS project (see link in the url above), the BL will however harvest them on from us (subject to agreement with the originating uni), so they'll get a more complete collection of those.
  ps - the BL should have a copy of all material covered by Legal Deposit, but even they have a 'reminder' office that has to chase up publishers, but they have it a lot easier than the rest of us Legal Deposit libraries, who have to put in a claim for each item.
Its worth pointing out... by Richard_at_work · 2011-06-20 21:56 · Score: 1

From the article:

The new collection will contain only works that are out of copyright under European law.
Google are approaching it correctly this time.
1. Re:Its worth pointing out... by Anonymous Coward · 2011-06-20 22:03 · Score: 1
  
  Will the digitized copies contain a 'copyright Google' watermark?
Re:Yes, by ciderbrew · 2011-06-20 22:12 · Score: 1

Sorry, I only have "the pile would reach to the moon and back x amount" or number of double decker buses jumped by and or Eddie Kidd / Evel Knievel my mate Dave.
Re:Finally, us mere mortals may have a glimpse by Richard_at_work · 2011-06-20 22:21 · Score: 1

Considering the items involved that require you to have a readers pass, yes of course it is difficult - they are one of a kind items, often needing to be handled in specific ways and treated with extreme respect, costing millions of pounds to restore, thousands of pounds to store and cannot be replaced. They are exactly the items that need a gate keeper to look after them.
Legal deposit by Martin+Spamer · 2011-06-20 22:42 · Score: 1

Legal deposit covers printed material; digital publications (newspapers, scholarly journals, software including games) and online material are covered by a voluntary scheme.
1. Re:Legal deposit by Geeky · 2011-06-21 04:11 · Score: 1
  
  Coming back to this late, but Lulu and Blurb are basically print on demand services, so we're not talking digital books. Lulu even lets you get an ISBN number for your book.
  
  --
  Sigs are so 1990s. No way would I be seen dead with one.
Re:Yes, by sonamchauhan · 2011-06-20 22:43 · Score: 1

No, no ... in terms of cricket pitches.
Or, in multiples of 'Playing fields of Eton'
Re:Finally, us mere mortals may have a glimpse by Anonymous Coward · 2011-06-20 23:00 · Score: 1

ALL items in the British Library require a Reader's Pass to view, except for the limited stock that they retain for inter-library loan.
This is regardless of their provenance or rarity.
Re:How do they do it? by mccalli · 2011-06-20 23:27 · Score: 2

I worked at a company that did the same for the French National Library, about fifteen to eighteen years ago. To go through your questions:

We had a mix of temps and perms, mostly temp scanner operators and perm developers.

Professionals - yes, there were clauses in the contract about how much we paid if things were damaged.

Team size? Smaller than you might think - we had about ten at its peak. Around the clock - not quite, but there were definitely early and late shifts.

We used then-flash Bell & Howell scanners with expensive document feeders to avoid ripping the papers. We used Kofax image processing cards with a staggering 1MB of VRAM (yes - feel the power...) and super-powerful PCs too (486DX2 66MHz). We stored the resulting TIFFs on a vast network server (a Network 3 1GB machine called Leviathan. Inconceivably it ran out of space so we bought a second called Behemoth). The actual process was to guillotine the books and feed them through the scanners; some books would then be restitched. In the case of rare books we'd photograph them instead (and then scan the photo - this predates digital cameras).

Yes, we then OCR'd them, and the contract stipulated that x pages in 100 had to then be proof-read.

Clearly the tech is now completely outclassed, but I'd be surprised if the contract and physical side has changed much. Am not terribly surprised to hear the British Library have taken the best part of two decades to catch up, we were talking to them at the time and they were terribly, terribly slow to see the potential in this.

Cheers,
Ian
This is going to be incredibly great by davide+marney · 2011-06-20 23:37 · Score: 3, Insightful

The 18th century saw the birth of both the Industrial Age and the Age of Enlightenment. This was a time of profound change on a global scale that easily rivals the impact of our own information age.
You may ask what is the point in studying history -- who cares about the impact of steam power, for example? Here's the thing: although technology improves over time, people basically remain the same. By understanding the dislocation of farmers to factories in 1750, you can gain insight into the dislocation of national workers to global workers today.
To get access to literally every single published work from this period is going to be amazing. Bravo UK and Google!

--
"We receive as friendly that which agrees with, we resist with dislike that which opposes us" - Faraday
1. Re:This is going to be incredibly great by elrous0 · 2011-06-21 02:13 · Score: 1
  
  people basically remain the same
  Well, yeah, but they smell a lot better now.
  
  --
  SJW: Someone who has run out of real oppression, and has to fake it.
2. Re:This is going to be incredibly great by sgt+scrub · 2011-06-21 03:18 · Score: 1
  
  You may ask what is the point in studying history
  It is entertaining :) http://en.wikipedia.org/wiki/Connections_(TV_series)
  
  --
  Having to work for a living is the root of all evil.
Future Libraries? by Subratik · 2011-06-20 23:53 · Score: 1

I wonder what they will look like... If someone hasn't thought of it before, someone should start drawing up plans for futuristic libraries where instead of checking out paper books you can check out books for your Kindle or some other device... On top of that, I think it would be cool for it to look like a traditional library, but with server racks instead of bookshelves. (This probably just seems cool to me because I'm a nerd. I have a lot of friends who are 'conservative' when it comes to paper books. A lot of the English majors I know treat technology like the Antichrist.)
1. Re:Future Libraries? by MichaelSmith · 2011-06-21 00:01 · Score: 1
  
  I wonder what they will look like... If someone hasn't thought of it before, someone should start drawing up plans for futuristic libraries where instead of checking out paper books you can check out books for your Kindle or some other device... On top of that, I think it would be cool for it to look like a traditional library, but with server racks instead of bookshelves. (This probably just seems cool to me because I'm a nerd. I have a lot of friends who are 'conservative' when it comes to paper books. A lot of the English majors I know treat technology like the Antichrist.)
  I think your electronic library will look like this and the server racks will be located somewhere with cheap power and air conditioning.
  
  --
  http://michaelsmith.id.au
Re:I wish someone would offer to digitise that lot by TaoPhoenix · 2011-06-20 23:54 · Score: 2

Calling your bluff. What state are you in?
For that to happen for free you need to declare the contents of your game system Creative Commons BY-SA which is Attribution-ShareAlike, and avoids the weird tangles regarding ad revenue vs "non commercial".
Then you have to develop the Literacy Pyramid, which is what every single copyright-clueless entity always falls into, proving that they are about the lawyers instead of the writers. The Literacy Pyramid says that you need a base of some 100 Lurkers to get about 7 Enthusiasts. But the output of Enthusiasts may not be to the standards of the Creator or the Skilled Amateur! So then you need to let 100 Enthusiasts stomp around leaving muddy tracks everywhere to get your 7 Skilled Amateurs. So every time Eric Flint whines on the Baen Free Library that "it's too expensive to digitize old works therefore they will never be republished" he's full of ...jellyBaens because it's somehow magically worth paying the lawyers afterward to sue the Enthusiasts as they stomp around.
So are you ready to do a little carpet cleaning to get your game out there?

--
My first Journal Entry ever, in 8 years! http://slashdot.org/journal/365947/aphelion-scifi-fantasy-horror-poetry-webzine
Re:Thesis goes to University Library, not BL by jcupitt65 · 2011-06-21 00:04 · Score: 1

Strange, mine just went to the BL. Perhaps it depends upon the examining institution.
Re:How do they do it? by zevans · 2011-06-21 00:16 · Score: 1

Mod this up, interesting discussion.
I'd guess the answer to 1 and 2 is "it depends." There must be rarities for which a full-on expert is required with white gloves and a wand (and in their spare time they supplement their income as street magicians.)
The proofreading is at least partly through reCAPTCHA. "Currently, we are helping to digitize old editions of the New York Times and books from Google Books." http://www.google.com/recaptcha/learnmore

--
"... and more and more now there are all kinds of electronic goodies available" -- Pink Floyd 1972
Re:How do they do it? by N+Monkey · 2011-06-21 00:17 · Score: 1

I worked at company that did the same for the French National Library, about fifteen to eighteen years ago. To go through your questions: ...
Actual process was to guillotine the books and feed them through the scanners, some books would then be restitched. In the case of rare books we'd photograph them instead (and then scan the photo - this predates digital cameras).
I thought that Google had tech that could scan the pages of an original book and automatically compensate for any curvature. IIRC** it did something like flash a test pattern onto the page to determine how to straighten the final image.
**but it was a while ago I read this so could easily be mistaken.
Out-of-copyright works by SirGarlon · 2011-06-21 00:18 · Score: 1

So my question is, since the original material is in the public domain (copyright expired), is Google's digitized copy in the public domain as well?

--
[Sir Garlon] is the marvellest knight that is now living, for he destroyeth many good knights, for he goeth invisible.
1. Re:Out-of-copyright works by Amorymeltzer · 2011-06-21 02:02 · Score: 2
  
  Possibly. In the US, the Bridgeman v. Corel case decided that copies of public domain works are not copyrightable, but that of course has no bearing in the UK. There is a sense there that the ruling is reasonable, but straight-up copies are definitely deemed copyrighted works thanks to (imho, inane) concepts like lighting and photogenicity. In this case, nobody's likely to complain, and surely not Google, but image copyright in the UK lies in the act of taking the photo and not generally in the creativity involved therein.
  
  --
  I live in constant fear of the Coming of the Red Spiders.
Re:Finally, us mere mortals may have a glimpse by digitig · 2011-06-21 00:24 · Score: 1

The BL blows on about adding to "our shared heritage" but the truth is that they are notoriously fickle and arbitrary about issuing Reader's Passes to actually use their collection.
It's automatic if you are doing a postgraduate degree.

I have had my application for a pass refused as my research justification was deemed "insufficiently scholarly", even after I had spent 10 minutes being interviewed by the secretary. The average man on the street who wanders in to their London campus will be in for a rude shock.
You don't accept the possibility that your research justification might have been insufficiently scholarly?

Even if the staff judge you to be worthy enough to view their precious possessions you have to jump through hoops just to reserve the item.
You ask the person on the information desk to reserve it for you, or you log in to the electronic catalogue (on-site or on-line), look the item up, press the "reserve" button, and select the reading room to which you want it delivered. If you consider that to be jumping through hoops then it says a lot for the academic standard you are likely to achieve.

Whenever I finally publish the fruits of my work I will happily flout the Legal Deposit Libraries Act and refuse to provide BL a copy.
And nothing of value was lost, I suspect.

--
Quidnam Latine loqui modo coepi?
let's see what's actually happening by Hazel+Bergeron · 2011-06-21 00:30 · Score: 1

The British Library has just handed the copyright on a load of uncopyrighted work to Google, and Google in return gets exclusive commercial rights to the work. This is awful. And for only £6 million, by their estimate, they could have done it themselves - considering the broad range of interested parties, donations could easily raise that amount. Their effort would be far better, too, if the standards of Google's old archives are anything to go by.
This is just another example of the British "public private partnership", where one guy does an under-the-table deal with another guy to do something seemingly simple and relatively inexpensive in an unnecessarily convoluted and costly manner, ending up with a product/service far worse than it could otherwise have been.
The guilty party is the British people for allowing the government to engage in an ongoing sale of the country.
Fuck off, Google. It was OK when all you wanted to do is control the future - the future's not that interesting, if the last three decades can be extrapolated - but now you want to control the past.
1. Re:let's see what's actually happening by Ksevio · 2011-06-21 03:25 · Score: 1
  
  No, they didn't hand over any copyrights, it even states that all digital rights revert back to the Library. Google already has the expertise to scan these books and the infrastructure to distribute them.
  Basically they saved taxpayers £6 million plus whatever the hosting and distribution costs would be AND the books are now easily accessible to anyone in the world! Do you really think the British Library could have done a better job than Google in house?
Re:Article and Summary wrong... by flimflammer · 2011-06-21 00:30 · Score: 1

This has been pointed out and proven wrong a dozen times already in the comments. Only the British Library gets one automatically, the other libraries may request a free copy.
Re:How do they do it? by hackertourist · 2011-06-21 00:34 · Score: 1

I've been involved with a similar project in the Netherlands. We found that commercial OCR engines had a high error rate on these old documents. We ended up having each document OCR'ed twice: once by software, once by having a sweatshop in India manually type up the document. The Indians had a lower error rate than the OCR software. By combining the two sources we could achieve an error rate low enough to comply with the project spec.
The project was unusual in that the documents were an index (of the minutes of parliament meetings); this meant it was full of words without context (incl. loads of names), and part of the information was in numbers, so we couldn't use a spelling checker to increase accuracy.
Using a spelling checker on century-old documents is iffy anyway, since you need one that has the then-current vocabulary instead of modern spelling.
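A minimal sketch of that two-source idea, in Python. This is not the project's actual tooling: the function name `compare_transcriptions` and the sample strings are made up for illustration, and it assumes both transcriptions are plain text that can be split on whitespace. It aligns the OCR output with the hand-typed version word by word and collects the places where they disagree, so that only those need a human check.

```python
from difflib import SequenceMatcher


def compare_transcriptions(ocr_text, typed_text):
    """Align two transcriptions word by word and report where they differ.

    Returns (agreed, disagreements): the number of words both sources
    agree on, and a list of (ocr_words, typed_words) pairs for every
    stretch where the two sources differ.
    """
    ocr_words = ocr_text.split()
    typed_words = typed_text.split()

    # autojunk=False: never let difflib discard frequent words as "junk".
    matcher = SequenceMatcher(a=ocr_words, b=typed_words, autojunk=False)

    agreed = 0
    disagreements = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            agreed += i2 - i1
        else:
            # "replace", "delete" or "insert": the sources disagree here.
            disagreements.append((ocr_words[i1:i2], typed_words[j1:j2]))
    return agreed, disagreements


if __name__ == "__main__":
    # Hypothetical index entry; the OCR engine misread "314" as "3l4".
    ocr = "Minutes of 12 March 1847, p. 3l4"
    typed = "Minutes of 12 March 1847, p. 314"
    agreed, diffs = compare_transcriptions(ocr, typed)
    print(agreed, diffs)
```

Where both sources agree, the word is probably right, since software and typists tend to make different kinds of mistakes. That matters most for names and numbers, where a spelling checker is no help.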
Re:How do they do it? by martin-boundary · 2011-06-21 00:37 · Score: 1

Heh, reCAPTCHA isn't exactly foolproof. There's more spambots solving them per minute than humans. So if a human gets it right but two spambots already agreed on a wrong answer, guess what the system does...
Re:Finally, us mere mortals may have a glimpse by Lincolnshire+Poacher · 2011-06-21 01:02 · Score: 1

Hi there,
Do you hold a BL Reader Pass? Actually they're also now available to undergraduates, but since I am 20 years out of Uni that's not much help to me either
> You don't accept the possibility that your research justification might have been insufficiently scholarly?
"A history of astro-navigation" may not be Earth-shatteringly exciting, but who are the BL to judge its merit? I had a case for research work, I showed that pamphlets they held were not available elsewhere but my application was denied for no reason other than the secretary was grumpy that day. She could provide no objective explanation.
> And nothing of value was lost, I suspect.
Exactly the attitude expressed by the BL.
Re:Article and Summary wrong... by Panoptes · 2011-06-21 01:45 · Score: 1

The Brotherton at Leeds University is also a copyright library.
Re:How do they do it? by mccalli · 2011-06-21 02:06 · Score: 1

"I thought that Google had tech that could scan the pages of an original book and automatically compensate for any curvature. IIRC** it did something like flash a test pattern onto the page to determine how to straighten the final image."

We did that too - the Kofax card and driver software could take care of deskewing and it did a reasonably good job. Again, this was a while ago so I imagine things have improved but it wasn't too bad.

Cheers,
Ian
Re:Finally, us mere mortals may have a glimpse by digitig · 2011-06-21 02:27 · Score: 1

Hi there,
Do you hold a BL Reader Pass?
Yes.

Actually they're also now available to undergraduates, but since I am 20 years out of Uni that's not much help to me either
They're available to anybody who can make the case for one, irrespective of study level. It's just that doing postgrad studies is one of the objective criteria that automatically makes the case.

"A history of astro-navigation" may not be Earth-shatteringly exciting, but who are the BL to judge its merit?
They are the people appointed with the task of making that judgement.

I had a case for research work, I showed that pamphlets they held were not available elsewhere but my application was denied for no reason other than the secretary was grumpy that day. She could provide no objective explanation.
In other words, you failed to make the case and it's somebody else's fault. There is a set of objective criteria to decide whether somebody can get a card. If you fail those tests then you get a second chance with an interview and a subjective judgement. It's meaningless to complain that she could "provide no objective explanation". You'd already failed the objective tests.

> And nothing of value was lost, I suspect.
Exactly the attitude expressed by the BL.
So you are still failing to make your case.

--
Quidnam Latine loqui modo coepi?
Don't do drugs by Snaller · 2011-06-21 02:31 · Score: 1

Here is a tip: Don't do drugs before you post rants on slashdot.

--
If Google really cared they would fix Android Chrome to reflow text, instead of discriminating
Re:Finally, us mere mortals may have a glimpse by Jeremy+Erwin · 2011-06-21 03:37 · Score: 1

That's strange. The Library of Congress gives readers passes out to most anybody who applies.
Re:I wish someone would offer to digitise that lot by Intrepid+imaginaut · 2011-06-21 03:39 · Score: 1

I genuinely have no idea what you're talking about?
Re:Finally, us mere mortals may have a glimpse by tehcyder · 2011-06-21 03:53 · Score: 1

I will happily flout the Legal Deposit Libraries Act and refuse to provide BL a copy.
What with that and your user name, you're on two strikes. Just as well you're not in the US, or the next time you crossed the road without looking properly you'd be off to prison for thirty years.

--
To have a right to do a thing is not at all the same as to be right in doing it
Re:Article and Summary wrong... by tehcyder · 2011-06-21 03:56 · Score: 1

No, you're wrong..

This has already been answered by several people who either knew or could be bothered (like me) to spend ten seconds on Google.

--
To have a right to do a thing is not at all the same as to be right in doing it
Re:How do they do it? by tehcyder · 2011-06-21 03:58 · Score: 1

Do they use automated machines, scanning beds, or wands?
No, they're transcribing everything by hand using quill pens and ink, then typesetting it on proper hot metal presses, then finally photographing each page with an 11 x 14 plate camera and emailing the images one page at a time to everyone who has a gmail account.

--
To have a right to do a thing is not at all the same as to be right in doing it
Re:I wish someone would offer to digitise that lot by Intrepid+imaginaut · 2011-06-21 09:23 · Score: 1

Oh okay, I think I get it - the game will be free for all to use and share upon publication, that's in the blog, issue 1. That's not much help though, I've yet to meet the OCR program that can translate my scribbles.