New Audacious Research Project, In Codice Ratio, Bets on AI and OCR To Make Sense of Handwritten Texts in Vatican's Secret Archives (theatlantic.com)

Posted by msmash on Monday April 30, 2018 @12:30PM from the how-about-that dept.

A new project untangles the handwritten texts in one of the world's largest historical collections. From a report: The Vatican Secret Archives is one of the grandest historical collections in the world. It's also one of the most useless. The grandeur is obvious. Located within the Vatican's walls, next door to the Apostolic Library and just north of the Sistine Chapel, the VSA houses 53 linear miles of shelving dating back more than 12 centuries. That said, the VSA isn't much use to modern scholars, because it's so inaccessible. Of those 53 miles, just a few millimeters' worth of pages have been scanned and made available online. Even fewer pages have been transcribed into computer text and made searchable. If you want to peruse anything else, you have to apply for special access, schlep all the way to Rome, and go through every page by hand.

But a new project could change all that. Known as In Codice Ratio, it uses a combination of artificial intelligence and optical-character-recognition (OCR) software to scour these neglected texts and make their transcripts available for the very first time. If successful, the technology could also open up untold numbers of other documents at historical archives around the world.

111 comments

Artificial Intelligence by UltimateDuster · 2018-04-30 12:39 · Score: 2, Insightful

It doesn't exist yet. Neural networks and genetic algorithms are NOT SENTIENT or anywhere close. It's going to be a few decades before we have anything resembling true intelligence.

Heck, even the HAL of HAL 9000 stands for Heuristic ALgorithmic computer, so Clarke was still making the argument that the computer of 2001 had only reached the brink of sentience and couldn't handle a moral dilemma.
1. Re:Artificial Intelligence by rmdingler · 2018-04-30 13:04 · Score: 1
  
  It doesn't exist yet. Neural networks and genetic algorithms are NOT SENTIENT or anywhere close. It's going to be a few decades before we have anything resembling true intelligence. Heck, even the HAL of HAL 9000 stands for Heuristic ALgorithmic computer, so Clarke was still making the argument that the computer of 2001 had only reached the brink of sentience and couldn't handle a moral dilemma.
  Perhaps on the order of crystal clarity, the entity's lack of moral dilemmas will expedite the implementation of artificial intelligence.
  
  --
  Happiness in intelligent people is the rarest thing I know.
  Ernest Hemingway
2. Re:Artificial Intelligence by gweihir · 2018-04-30 13:10 · Score: 3, Insightful
  
  Indeed. That is why the hypemongers have come up with the term "weak AI", i.e. the AI without the "I". Classically this was called automation and here it is the subspecies called "pattern recognition".
  Incidentally, I disagree about the "few decades" for "strong/true AI". At this time we have zero indications it is even physically possible and hence, if it is possible, then > 50 years is a realistic timeline. No, human beings do not count as "reference implementation" for a number of reasons. It starts with us not even knowing what life is and we certainly cannot create it. Next, we do not know how humans generate intelligence and claiming that it must obviously be a physical process is just ignorant quasi-religious physicalism, not science. Just because we see the interface does not imply what is going on behind it. And we do not. Add to that that natural intelligence seems to require consciousness, where we know even less what it is or where it comes from. Now, I am not arguing for some form of mysticism, I am arguing that science is entirely clueless about what these things are and hence any predictions that strong AI is possible are vastly premature. Also note that what human beings have comes with free will (or at least some form of independence if you think free will does not exist) and that would pretty much make it useless as basis of a technical machine. All the problems with slave labor would apply. And in addition, there is indication that a technical implementation, if possible, of a human's mind would actually not be faster than the real thing and would need just the same slow education to be able to do anything worthwhile.
  
  --
  Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
3. Re:Artificial Intelligence by ShanghaiBill · 2018-04-30 13:10 · Score: 3, Insightful
  
  According to TFS, the library is inaccessible because it hasn't been scanned.
  So their solution is to use AI driven OCR, which requires ... scanning.
4. Re:Artificial Intelligence by gweihir · 2018-04-30 13:11 · Score: 2
  
  It is actually very simple to enumerate them: zero. The only thing automation can do these days is be dumb very, very fast.
  
  --
  Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
5. Re:Artificial Intelligence by ShanghaiBill · 2018-04-30 13:18 · Score: 5, Insightful
  
  Think about it: it's only going to be as good as your training data.
  Obvious counter-example: Alpha-Go Zero, which used NO training data, and was far better at its assigned task than any of its programmers.
  There is no reason to believe that an AI's capabilities are inherently limited by training data. Nor is there any reason to believe that it can't surpass the abilities of its creators. That makes as little sense as saying children can never be more intelligent than their parents.
6. Re:Artificial Intelligence by crunchygranola · 2018-04-30 13:37 · Score: 1
  
  Exactly my reaction. Once they scan them, they can put them on-line. Done and done.
  
  --
  Second class citizen of the New Gilded Age
7. Re:Artificial Intelligence by Darinbob · 2018-04-30 13:56 · Score: 1
  
  Artificial intelligence is not the same as artificial sentience.
8. Re:Artificial Intelligence by 110010001000 · 2018-04-30 14:06 · Score: 1
  
  Playing games with strict rulesets is hardly AI. Computers love strict rules. You can create programs that will beat any human at any game as long as there is a strict ruleset. More hype.
9. Re:Artificial Intelligence by LifesABeach · 2018-04-30 14:33 · Score: 1
  
  I think of the lost knowledge that was communicated by the Neanderthals in their survived paintings and carvings. I see the same solution to western knowledge being buried in the libraries of the Vatican. It is the lack of sharing of knowledge that the Vatican and the Neanderthals will have in common when someone else shares the same ideas with everyone.
10. Re:Artificial Intelligence by UnknownSoldier · 2018-04-30 14:35 · Score: 2
  
  > Neural networks and genetic algorithms are NOT SENTIENT or anywhere close.
  Agree with your sentiments; you are hinting at precisely the problem:
  How do you test for sentience?
  How do you test for consciousness?
  How do you test for intelligence?
  Without a way to measure it these labels of "Artificial Intelligence" are bullshit.
11. Re:Artificial Intelligence by omnichad · 2018-04-30 14:52 · Score: 1
  
  Neural networks and genetic algorithms are NOT SENTIENT
  Perhaps that's why it's called artificial intelligence instead of just, you know, intelligence.
12. Re:Artificial Intelligence by ShanghaiBill · 2018-04-30 14:54 · Score: 3, Interesting
  
  I think of the lost knowledge that was communicated by the Neanderthals in their survived paintings and carvings.
  Neanderthals coexisted and interbred with H Sapiens, so it is likely they also talked to each other. So their knowledge wasn't lost, but passed on to their mongrel children.
13. Re:Artificial Intelligence by Anonymous Coward · 2018-04-30 14:55 · Score: 0
  
  Oh ffs, the "as used" definition for AI has moved on, get over it.
14. Re:Artificial Intelligence by Anonymous Coward · 2018-04-30 15:44 · Score: 0
  
  Sentience is NOT A REQUIREMENT of artificial intelligence.
  That is what the word "artificial" is all about.
15. Re:Artificial Intelligence by q_e_t · 2018-04-30 16:13 · Score: 2
  
  Sentience is not a requirement for intelligence.
16. Re:Artificial Intelligence by q_e_t · 2018-04-30 16:22 · Score: 1
  
  Natural intelligence is correlated with sentience. But then carbon monoxide is correlated with ICE cars, but it is a byproduct, and cars can use other motive mechanisms. It's not clear if strong AI requires sentience, and intelligence is not well defined. I'd agree that pattern recognition is often a better term, which it used to be called when I started working in it. The AI research group was a distinct group, more concerned with symbolic processing.
17. Re:Artificial Intelligence by Anonymous Coward · 2018-04-30 16:28 · Score: 0
  
  The word artificial points to it being created and engineered.
18. Re:Artificial Intelligence by Anne+Thwacks · 2018-04-30 20:02 · Score: 1
  
  Artificial Intelligence is not the same as Actual Idiocy, but you can't tell the difference from the acronyms, and mostly not by any other means either.
  
  --
  Sent from my ASR33 using ASCII
19. Re:Artificial Intelligence by Anne+Thwacks · 2018-04-30 20:04 · Score: 1
  
  Nor is intelligence a requirement for sentience: /. is the evidence for that.
  
  --
  Sent from my ASR33 using ASCII
20. Re:Artificial Intelligence by Anonymous Coward · 2018-04-30 20:13 · Score: 1
  
  Besides which, the Neanderthal women were really hot!
21. Re:Artificial Intelligence by Anonymous Coward · 2018-04-30 21:30 · Score: 1
  
  I swear, when the Terminator finally catches up with John Connor, his last words will be "But it's not really intelli...!"
  We don't need to define intelligence in order to create it. Heck, we've been doing it for millennia in complete blissful ignorance. Personally I think we've already created strong AI (it's called "the Internet"), and the only reason we don't recognize it is because it's not dumb enough to talk to us.
  Free will? Bah. Here's a thought experiment for you: can you define "free will" in such a form that humans have it, a sheep, a rolling die, an atom of Radium-224, or a computer don't have it? Without resorting to equally undefined terms such as "choice" or "consciousness" or "instinct" or "knowledge", unless you want to define them too of course.
22. Re:Artificial Intelligence by Anonymous Coward · 2018-04-30 22:11 · Score: 0
  
  You fall into the trap of assuming that human level AI is the only definition of AI. All animals have intelligence.
  You're also conflating sentience and sapience which are two related but distinct things.
  You also ignore the fact that a thing we might call AI doesn't have to be intelligent in exactly the same way as natural intelligences. It may only be pseudo-intelligent, but if for all practical purposes it behaves intelligently, then looks like a duck...
23. Re:Artificial Intelligence by lgw · 2018-05-01 01:05 · Score: 2
  
  Obvious counter-example: Alpha-Go Zero, which used NO training data,
  I see you don't know what "training data" means. It was trained on games of Go. No matter how good it gets, it will never reason about anything that's not a game of Go.
  
  --
  Socialism: a lie told by totalitarians and believed by fools.
24. Re:Artificial Intelligence by lgw · 2018-05-01 01:17 · Score: 1
  
  While I agree with your main point, it's worth pointing out that AI researchers have never been striving for "strong AI", rather "AI" as a technical term just means "solving problems that would seem at first to require intelligence", that is, difficult automation.
  Is there a difference between "(general) intelligence" and "consciousness" and "free will"? I don't see one. There's no reason to believe a machine intelligence couldn't exist, but that's not what people are working on as it has no commercial value.
  BTW, we have a lot more knowledge about the origins of man's intelligence than you seem to realize. After all, it's been studied quite extensively. We understand what structures in the brain are responsible for what (mostly from study of brain damage). We understand to some extent the relationship between neurology and personality. We understand that consciousness is, neurologically, an extension of the ability to visualize yourself doing something without actually doing it. We understand that your actions can be driven from more primitive areas of the brain, and that's more likely to happen the stronger basic needs become.
  All of that could be used to research machine intelligence, by following the one known successful approach. But why would anyone do that?
  
  --
  Socialism: a lie told by totalitarians and believed by fools.
25. Re:Artificial Intelligence by Enigma2175 · 2018-05-01 02:09 · Score: 1
  
  Actually AlphaZero also plays chess and shogi at high levels in addition to its go prowess.
  
  --
  Enigma
26. Re:Artificial Intelligence by gweihir · 2018-05-01 02:10 · Score: 1
  
  A 1:1 relation is not really a correlation, unless you want to muddy the waters.
  
  --
  Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
27. Re:Artificial Intelligence by gweihir · 2018-05-01 02:16 · Score: 1
  
  BTW, we have a lot more knowledge about the origins of man's intelligence than you seem to realize. After all, it's been studied quite extensively. We understand what structures in the brain are responsible for what (mostly from study of brain damage). We understand to some extent the relationship between neurology and personality. We understand that consciousness is, neurologically, an extension of the ability to visualize yourself doing something without actually doing it. We understand that your actions can be driven from more primitive areas of the brain, and that's more likely to happen the stronger basic needs become.
  Not so. We do understand that damage to certain elements of the brain causes damage to certain functions. That does not imply these areas create these functions. Ever heard of cutting a cable preventing some technical function from being performed? Now, did that function originate in that cable? Also, as to consciousness, your statement is just pseudo-profound bullshit, probably spouted by some scientist unable to admit his cluelessness.
  
  All of that could be used to research machine intelligence, by following the one known successful approach. But why would anyone do that?
  Actually, nothing like that can be used and it was tried extensively. The problems are a) most of the "successful approach" is unknown and b) what is known could not be reproduced. Seriously, have a look into the actual scientific literature some time, not in the BS produced for the masses.
  
  --
  Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
28. Re:Artificial Intelligence by Anonymous Coward · 2018-05-01 02:19 · Score: 0
  
  There are rules to Go, rules which indicate that a given game can be considered a win. Alpha-Go Zero trained itself by playing games, learning what states tend to lead to wins. It involved a self-training feedback loop.
  How do you score a translation and handwriting recognition program? Short answer: you can't. You have to have its output compared to that of humans to validate the results. There is no possible self-training feedback loop like there is in Go.
29. Re:Artificial Intelligence by Visarga · 2018-05-01 02:54 · Score: 1
  
  AI can also train in a simulator, it's not necessary to use datasets. A simulator is like a dynamic dataset. That's how AlphaGo beat humans.
30. Re:Artificial Intelligence by lgw · 2018-05-01 03:48 · Score: 1
  
  Not so. We do understand that damage to certain elements of the brain causes damage to certain functions. That does not imply these areas create these functions. Ever heard of cutting a cable preventing some technical function from being performed? Now, did that function originate in that cable?
  You can generally tell a cable from a processor by looking - neurology isn't mere guesswork, you know? We can do much more invasive studies on animals (to the point I get creeped out reading them), and get very detailed analysis of what certain brain structures do - and human brain anatomy isn't so different from closely-related species. We also know a bunch from what areas of the brain become active when performing a certain task. Any one approach might be misleading, but there are many approaches to confirm one another. Studies of brain damage serve well to confirm that comparative anatomy is worthwhile.
  
  Also, as to consciousness, your statement is just pseudo-profound bullshit, probably spouted by some scientist unable to admit his cluelessness.
  What statement is that? We have a lot of near-synonyms for the same concept, because it's hard to measure quantitatively. That's what hard problems often look like.
  Your ignorance of psychology, neurology, and philosophy is not mankind's ignorance.
  
  --
  Socialism: a lie told by totalitarians and believed by fools.
31. Re:Artificial Intelligence by Anonymous Coward · 2018-05-01 03:48 · Score: 0
  
  Parts moving parts perform into hard limits. Algorithms are like that. Gödel's Theorem says machines will always be stupid. Never create, never aware, never ... alive. No Princess Leia. Ever. You need a postcard palsy ?
32. Re:Artificial Intelligence by lgw · 2018-05-01 03:51 · Score: 1
  
  Sure, due to totally unrelated data sets trained on unrelated training data. The only way in which they are all "AlphaZero" is common hardware and common approach. At no point will it generalize outside of its training data.
  Machine learning is simply an optimization framework. It cannot produce the ability to reason outside its training data.
  
  --
  Socialism: a lie told by totalitarians and believed by fools.
33. Re:Artificial Intelligence by LifesABeach · 2018-05-01 04:41 · Score: 1
  
  Then why do we not know exactly what the paintings communicated? We can guess, but guessing is factual only by chance.
34. Re:Artificial Intelligence by ShanghaiBill · 2018-05-01 05:45 · Score: 1
  
  No human can play Go or Chess or any other game at a championship level with no experience and without being told the rules of the game. So why should a machine be able to do that? If that makes a machine "not intelligent", then no human is intelligent either. I really don't see what your point is.
  If your point is that AlphaGo doesn't possess human-level Hollywood AI like you saw in a Will Smith movie, then you are just stating the obvious.
35. Re:Artificial Intelligence by samwichse · 2018-05-01 06:26 · Score: 1
  
  I think the solution here is definitely to do it Rainbow's End style.
36. Re:Artificial Intelligence by ShanghaiBill · 2018-05-01 07:31 · Score: 1
  
  Then why do we not know exactly what the paintings communicated?
  We also don't know exactly what ancient cave paintings by H. sapiens communicated.
37. Re:Artificial Intelligence by ShanghaiBill · 2018-05-01 07:38 · Score: 1
  
  How do you score a translation and handwriting recognition program?
  By checking to see if the reading makes syntactic and semantic sense.
  
  You have to have its output compared to that of humans to validate the results.
  Do you think humans learn to read without knowing any rules or seeing any examples?
38. Re:Artificial Intelligence by q_e_t · 2018-05-01 08:48 · Score: 1
  
  We only have limited examples of animals with sentience, so the 1:1 correlation may be coincidental. Until we have more data I wouldn't rush to suggest causality.
39. Re:Artificial Intelligence by lgw · 2018-05-01 08:56 · Score: 1
  
  No, you've missed my point entirely. What AlphaGo, or any other sort of machine learning, does not possess is "general intelligence". General intelligence is the ability to solve problems of a kind not encountered before.
  Machine learning takes data which has been translated to an input vector, applies a bunch of linear algebra (or similar transforms), and gets an output vector. It's trivial to translate a chessboard or go board to an input vector, and the output vector to a move. What's missing is the rest of the chain: how to set up that translation, even in such trivial cases. How to generalize from a problem you've solved, to a somewhat-similar new problem.
  For example, AlphaGo can't use a camera and robotic arm to play Go. As these are solved problems, it could be so enhanced. But training the additional AI components to do that wouldn't give it the ability to play any other game the same way.
  Do you see the point? It can't generalize. A board game with specific pieces in specific places for which you pick a move is very similar in input and output to any other such game, it's only the transform that varies. That's the only sort of thing it can do. It can't read the rules of a game to learn to play. It can't benefit from books on strategy. It can't do anything but optimize its current-board-to-next move function, and it can't even recognize that a new game is similar to a game it knows, so maybe start from there - it can only optimize in one specific way.
  
  --
  Socialism: a lie told by totalitarians and believed by fools.
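The "input vector, linear algebra, output vector" chain described in the comment above can be sketched in a few lines. This is only a toy under stated assumptions: the 3x3 board, the `encode` scheme, and the identity weight matrix are all made up for illustration and have nothing to do with AlphaGo's actual network.

```python
# Toy illustration only: the 3x3 board, the encoding, and the weights are
# hypothetical and are not taken from any real game-playing system.

def encode(board):
    """Turn a 9-character board string into an input vector of floats."""
    values = {"X": 1.0, "O": -1.0, ".": 0.0}
    return [values[cell] for cell in board]

def linear(weights, x):
    """One linear layer: multiply a weight matrix by the input vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def pick_move(board, weights):
    """Score every cell, then return the index of the best-scoring empty cell."""
    scores = linear(weights, encode(board))
    legal = [i for i, cell in enumerate(board) if cell == "."]
    return max(legal, key=lambda i: scores[i])

# Hypothetical weights: an identity matrix, so each score equals its input.
IDENTITY = [[1.0 if r == c else 0.0 for c in range(9)] for r in range(9)]

if __name__ == "__main__":
    print(pick_move("X.O...X..", IDENTITY))
```

Everything game-specific lives in `encode` and in the mapping from output index to move, both written by hand here. That is the part the comment says the learner does not work out for itself.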
40. Re:Artificial Intelligence by Shikaku · 2018-05-01 09:33 · Score: 1
  
  They're also ignoring the neuron simulation programs that emulate a brain. I believe we're at the computational level of the lizard right now, but when Google can finish their qubit computer hardware there might be a jump in level. To mice.
41. Re:Artificial Intelligence by bickerdyke · 2018-05-02 00:43 · Score: 1
  
  Which leads to the question if Betty Rubble was Homo N. or Homo S.
  
  --
  bickerdyke
42. Re:Artificial Intelligence by Michaelejahn · 2018-05-02 06:28 · Score: 1
  
  Of course the documents need to be scanned, but I am surprised the OP used the term OCR when they should have used ICR: https://en.wikipedia.org/wiki/... When you have handwritten things, the difference between an S and a 5 is sometimes contextual, but not so in part numbers that use both numbers and letters. Position sometimes helps. Also, an I might be slightly slanted one way \ or the other /, and in some cases that is a slanted I and other times a forward or backward slash.
43. Re:Artificial Intelligence by RespekMyAthorati · 2018-05-03 07:54 · Score: 1
  
  that it must obviously be a physical process is just ignorant quasi-religious physicalism, not science.
  
  Science is physicalism, you superstitious twit, since all science is based on physical evidence.
restricted access != OCR by Shompol · 2018-04-30 12:40 · Score: 1

If restricted access to these documents is the problem then OCR can do little to help. At the same time OCR is not a requirement for granting public access, just scan and publish the images. Having an imperfect OCR is more of a hindrance than help.
End of the world. by Anonymous Coward · 2018-04-30 12:41 · Score: 0

Once all of it's read, will that spell the end of the world?
1. Re:End of the world. by Ungrounded+Lightning · 2018-04-30 12:47 · Score: 2
  
  Once all of it's read, will that spell the end of the world?
  (Reminds me of a "news item" in the daily newszine of the "Chicon II" World Science Fiction Convention: ~The filksingers finally finished singing "Nine Billion Names of God on the Wall" and the stars started going out.~)
  
  --
  Bantam Dominique roosters crow a four-note song. Once you've heard it as "Happy BIRTHday" you can't NOT hear it that way
2. Re:End of the world. by q_e_t · 2018-04-30 16:28 · Score: 1
  
  Once all of it's read, will that spell the end of the world?
  It at least means a new Dan Brown book.
Finally, old enough to be public domain by Anonymous Coward · 2018-04-30 12:42 · Score: 1

I'll be long gone before Mickey Mouse is free.
Another Sheetfeed And Destroy Project? by Bing+Tsher+E · 2018-04-30 12:42 · Score: 1

So is this another project like Google's, to rip the covers and bindings off everything in sight, toss it all in a sheetfeeder to make sub-optimal scans and then pulp it all?
1. Re:Another Sheetfeed And Destroy Project? by Anonymous Coward · 2018-04-30 12:56 · Score: 0
  
  Hey, if Google bought the books they can do what they will with the physical copies.
2. Re: Another Sheetfeed And Destroy Project? by Bing+Tsher+E · 2018-04-30 13:47 · Score: 1
  
  They didn't buy the books. They coaxed lots of libraries to let them and their sheetfeeders in. 'Those books are just going to waste on those shelves.'
3. Re: Another Sheetfeed And Destroy Project? by DingerX · 2018-04-30 14:02 · Score: 3, Informative
  
  Hell no. You don't digitize manuscripts destructively. There's not yet an official standard for digitizing medieval MSS, but the short version is that amateurs use cellphones or consumer cameras, wannabes use "archival scanners" (which require the document to be flat), and pros use a rig with medium-format cameras. But for OCR, as their examples show, the current tech doesn't benefit from detailed images.
  This team is starting with the Papal Registers, which the ASV has been selling in a 300 dpi black-and-white (not grayscale) format for at least 15 years. 96% character recognition is about what other MSS OCR teams are getting. As TFA implies, people don't write letters; they write words, but you can't get the computational power to read words. So this inherently limits their approach, even with easy-to-read Carolingian Minuscule (the picture, btw, is of a "transitional hand" or "proto-gothic" more than CM). So they then choose between likely readings according to latinity. Cool, but with archival documents, the most valuable information for traditional research is the proper names, and these are usually less "Latinish" than the rest, so the net result is to increase the batting average slightly while grounding into a lot more double plays.
  In short: pilot project that uses digitizations from 2 generations back, produces results that aren't useful thanks to methodology dictated by current technology, and makes a few interesting tweaks. It would be cool to see, but first it'd be great to digitize and publish online the ASV.
  Of course, it's not so bad to go to Rome, go through the rigamarole of getting access to the ASV, and work directly with the originals. But the current catalog system dates from the eighteenth century, and is harder to read than the medieval manuscripts. So, you get what you can; if you're lucky they let you stay till 1600. Then you gotta find something to do in Rome until the next morning.
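To make "choose between likely readings according to latinity" concrete, here is a rough sketch of one way such a re-ranking could work. It is not the project's method. The seven-word sample, the character-bigram model with add-one smoothing, and the names `latinity` and `rerank` are all assumptions chosen for illustration.

```python
import math
from collections import Counter

# Toy word list standing in for a Latin corpus (hypothetical, far too small
# for real use).
LATIN_SAMPLE = ["dominus", "ecclesia", "gratia", "episcopus",
                "sanctus", "littera", "regnum"]
VOCAB_SIZE = 27  # 26 letters plus the end-of-word marker

def train_bigrams(words):
    """Count character bigrams, with ^ and $ marking word boundaries."""
    pair_counts, context_counts = Counter(), Counter()
    for word in words:
        padded = "^" + word + "$"
        for a, b in zip(padded, padded[1:]):
            pair_counts[(a, b)] += 1
            context_counts[a] += 1
    return pair_counts, context_counts

def latinity(word, model):
    """Log-probability of a word under the add-one-smoothed bigram model."""
    pair_counts, context_counts = model
    padded = "^" + word + "$"
    return sum(
        math.log((pair_counts[(a, b)] + 1) / (context_counts[a] + VOCAB_SIZE))
        for a, b in zip(padded, padded[1:])
    )

def rerank(candidates, model):
    """Sort (reading, recognizer_confidence) pairs, best combined score first."""
    return sorted(
        candidates,
        key=lambda c: math.log(c[1]) + latinity(c[0], model),
        reverse=True,
    )

MODEL = train_bigrams(LATIN_SAMPLE)

if __name__ == "__main__":
    # The recognizer slightly prefers the garbled reading; the model overrules it.
    print(rerank([("grxtjq", 0.6), ("gratia", 0.4)], MODEL))
```

The same mechanism that rescues an ordinary Latin word will also push down any reading whose letter sequences are rare in the training text. That is the worry raised above about proper names.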
4. Re: Another Sheetfeed And Destroy Project? by Anonymous Coward · 2018-05-05 09:22 · Score: 0
  
  So, you get what you can; if you're lucky they let you stay till 1600.
  They let Giordano Bruno stay till 1600 but I wouldn't call that lucky.
Not so secret after all by darthsilun · 2018-04-30 12:52 · Score: 1

If everyone knows they exist
But honestly, isn't it long past time to open them up to scholarly research?
Scan them first. (How long do we estimate it will take?) Then start with the transcriptions, with or without OCR and deep learning.
And let's stop kidding ourselves: there is no AI. AI is a campy buzz word that the hipsters throw around because they think it makes them look kewl when they use it. But it really just makes them sound stupid.
1. Re:Not so secret after all by Anonymous Coward · 2018-04-30 15:46 · Score: 0
  
  "AI" means what the industry decides it means, not what you decide it means.
  And what the industry has decided it means, absolutely exists.
Text measured in miles? by Ecuador · 2018-04-30 12:56 · Score: 4, Insightful

OK, so now the text is measured in miles? What lunacy is this?
I mean, it is the ONE article where Libraries of Congress would actually be a valid unit!

--
Violence is the last refuge of the incompetent. Polar Scope Align for iOS
1. Re:Text measured in miles? by reboot246 · 2018-04-30 13:08 · Score: 2
  
  "53 linear miles of shelving"
  Shelving. Shelving. Shelving. Not text, unless somebody wrote with really big letters.
  
  That's a lot of shelving. I own a LOT of books and I don't have anywhere near that much shelving, maybe a few hundred feet of it tops. Plus, I have about 30 or 40 boxes of books that I don't have room on the shelves for yet.
2. Re:Text measured in miles? by rsilvergun · 2018-04-30 13:11 · Score: 1
  
  I agree. Can we get a more sane measurement? How many bees to the hogshead?
  
  --
  Hi! I make Firefox Plug-ins. Check 'em out @ https://addons.mozilla.org/en-US/firefox/addon/youtube-mp3-podcaster/
3. Re:Text measured in miles? by Anonymous Coward · 2018-04-30 13:45 · Score: 0
  
  I think the idea they are trying to convey is that they have no idea of how many books or pages or whatever because it's HYOOOOJ! There are 53 miles of shelves full of books and nobody has counted them so far. Go ahead and make some measurements of books and come up with an estimate of pages per foot and then multiply that by 279,840 (53 miles times 5,280 feet in a mile) and you'd have a general idea of how many pages there are.
  Offhand, a ream of paper (500 sheets) is 2 inches thick, so a foot of paper would be 3,000 pages. 53 miles of this paper would be over 839 million sheets of paper. Yes, this is tightly compressed paper and doesn't include binding thickness but this would possibly be the upper bound of how many pages we have to mull over. Yikes.
4. Re:Text measured in miles? by Anonymous Coward · 2018-04-30 14:13 · Score: 1
  
  Are you serious? You think 1200 year old books were written on paper that resembles anything like a ream of paper that you buy at your local Staples?
  1200 year old books were probably written on vellum, and if there are 50 pages to an inch I'd be truly amazed.
  I think your estimate is probably off by at least an order of magnitude.
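The back-of-envelope arithmetic in the two comments above is easy to redo with different densities. A minimal sketch, assuming the shelves are packed solid with leaves and nothing else. The function name `estimate_leaves` is made up here, and the two densities are the ones quoted in the thread: 500 sheets per 2 inches for modern paper, and 50 per inch as a ceiling for vellum.

```python
FEET_PER_MILE = 5280
INCHES_PER_FOOT = 12

def estimate_leaves(shelf_miles, leaves_per_inch):
    """Upper-bound count of leaves on solidly packed shelving."""
    inches = shelf_miles * FEET_PER_MILE * INCHES_PER_FOOT
    return inches * leaves_per_inch

# 500 sheets per 2 inches of modern ream paper is 250 per inch.
ream_estimate = estimate_leaves(53, 250)
# The 50-per-inch ceiling suggested for vellum in the reply above.
vellum_estimate = estimate_leaves(53, 50)

if __name__ == "__main__":
    print(f"{ream_estimate:,} vs {vellum_estimate:,}")
```

At 50 leaves per inch the figure drops by a factor of five. A full order of magnitude would need about 25 per inch.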
5. Re:Text measured in miles? by AHuxley · 2018-04-30 14:26 · Score: 1
  
  "Stasi files row as Britain refuses to return documents to Germany" (29 Dec 2011)
  ".. already encompass 69 miles (111km) of files .."
  https://www.theguardian.com/wo...
  Miles of files is often used.
  
  --
  Domestic spying is now "Benign Information Gathering"
6. Re:Text measured in miles? by q_e_t · 2018-04-30 16:32 · Score: 1
  
  "Stasi files row as Britain refuses to return documents to Germany" (29 Dec 2011) ".. already encompass 69 miles (111km) of files .." https://www.theguardian.com/wo... Miles of files is often used.
  It's not a truly international measure until compared to the size of Belgium.
7. Re:Text measured in miles? by sheramil · 2018-04-30 19:47 · Score: 1
  
  OK, so now the text is measured in miles? What lunacy is this?
  Until it's deciphered, it's one great long string of squiggles.
  After it's been deciphered, it's tens of thousands of shorter squiggles.
8. Re:Text measured in miles? by Big+Hairy+Ian · 2018-04-30 23:03 · Score: 1
  
  Just imagine the L Space
  
  --
  Build a Man a Fire, and He'll Be Warm for a Day. Set a Man on Fire, and He'll Be Warm for the Rest of His Life.
9. Re:Text measured in miles? by Anonymous Coward · 2018-04-30 23:10 · Score: 0
  
  It was an example, you clod, and was acknowledged as being an extreme value.
10. Re:Text measured in miles? by c · 2018-05-01 00:25 · Score: 2
  
  OK, so now the text is measured in miles?
  That seems rather progressive for the Vatican. I'd expect it to be measured in some archaic unit like cubits, rods, or choirboy penises...
  
  --
  Log in or piss off.
11. Re:Text measured in miles? by AmiMoJo · 2018-05-01 01:48 · Score: 1
  
  Miles of shelving is easier for me to visualize than Libraries of Congress. I don't have any idea how much text the latter holds. In fact the only ever time I've heard it mentioned is on Slashdot.
  
  --
  const int one = 65536; (Silvermoon, Texture.cs)
  SJW, n: "Someone I don't like, and by the way I'm a fuckwit" - AC
12. Re:Text measured in miles? by Kielistic · 2018-05-01 02:21 · Score: 1
  
  I would say that is a fairly decent way to get a rough upper bound. Old pages won't be thinner than modern pages.
13. Re:Text measured in miles? by epine · 2018-05-01 10:06 · Score: 1
  
  These are actually good for a quick pick me up on a random, drab day:
  List of humorous units of measurement
  List of unusual units of measurement
  Q: What do you call a man who measures his penis size in attoparsecs?
  A: Dick Twain.
14. Re:Text measured in miles? by imidan · 2018-05-01 11:13 · Score: 1
  
  Libraries commonly measure their capacity for books in shelf feet. In different areas of the library, books may be stored at different densities. In the periodicals section, a collection of journals or magazines may be stored at 30 issues per shelf foot, and in the reference section, maybe they only get 8 volumes per shelf foot.
  I did a statistical survey on a library once to estimate the number of books they had. I didn't express it this way in the study, but looking back at the numbers, they had about 15.9 shelf miles of capacity. This was a two story building, fairly densely populated with shelving.
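  To make the shelf-foot arithmetic concrete, here is a minimal sketch that turns a shelf-mile figure into a rough item count. It assumes the shelving is completely full at one uniform density, which the survey above does not claim; the function name is made up for illustration, and only the 15.9 miles and the 8 and 30 items per shelf foot come from the comment.

```python
FEET_PER_MILE = 5280

def estimated_items(shelf_miles, items_per_shelf_foot):
    """Rough item count for a length of shelving filled at one uniform density."""
    return shelf_miles * FEET_PER_MILE * items_per_shelf_foot

capacity_miles = 15.9                        # figure quoted in the comment
low = estimated_items(capacity_miles, 8)     # reference-section density
high = estimated_items(capacity_miles, 30)   # periodicals density
print(f"between {low:,.0f} and {high:,.0f} items")
```

  The spread between the two bounds shows why density matters more than shelf length when guessing at the size of a collection.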
OPUS DEI never going to allow it. Never. by Anonymous Coward · 2018-04-30 13:08 · Score: 0

The secret service of Vatican City, named OPUS DEI, is never going to allow it.
By default some priests have to check everything for censorship, there's so much stuff in there they don't want people to know.
1. Re:OPUS DEI never going to allow it. Never. by Anonymous Coward · 2018-04-30 13:27 · Score: 0
  
  I know, right. Stuff like Mary Magdalen was really Jesus' wife. (Whoever heard of a 30 year old jewish man in roman palestine not being married. Oh the humanity.) Or Jesus and Mary were whisked away to southern Gaul where they raised a family. Their children became european nobility. Is it Sang Real or San Greal, holy blood or holy grail? And all that other shit from the Da Vinci Code.
  Oops, now the secret is out of the bag. What will Opus Dei do now? Join your local Masonic Lodge to find out more.
  I have no affiliation with the Masons or Dan Brown. Have a nice day.
2. Re:OPUS DEI never going to allow it. Never. by UnknownSoldier · 2018-04-30 14:40 · Score: 1
  
  > Whoever heard of a 30 year old jewish man in roman palestine not being married. Oh the humanity.
  Were single men even allowed to preach in the temple back then?
3. Re: OPUS DEI never going to allow it. Never. by Brockmire · 2018-04-30 15:18 · Score: 0
  
  You'd probably enjoy Caesar's Messiah. While I do use the movie to help fall asleep, that shit makes waaaay more sense than the shit the Roman Catholic Church spews out.
4. Re:OPUS DEI never going to allow it. Never. by Anonymous Coward · 2018-05-01 00:28 · Score: 0
  
  You probably have no idea on the "split" during WW2, when half the church was on the axis of power side helping track all the italian jews and the other half trying to save as much as they could, or the slave trade the roman church helped to create little before WW1, or all the sex murders of the last 400 years covered up by the roman church.
  They got so many secrets about so many things...
Step 1 by cmcqueen1975 · 2018-04-30 13:09 · Score: 1

A sensible Step 1 would be to scan them and make the scans available to the public. That requires no AI or OCR.
Step 2, involving OCR and AI etcetera, is a separate step. It could be done multiple times, refining the quality of the results as the technology develops, and augmented with human checking and intervention in difficult spots.
I can't imagine them just dumping them online by rsilvergun · 2018-04-30 13:16 · Score: 1

the Catholic Church were more or less rulers at one point. Less priests and more kings. There's bound to be no shortage of dirt in there. And Catholicism has been getting beat up lately as it is. That's why we got a Pope who openly questions the reality of Hell. A vast library full of texts nobody ever thought would be read by the common rabble wouldn't exactly improve their standing. In this case the Truth won't set them free.

--
Hi! I make Firefox Plug-ins. Check 'em out @ https://addons.mozilla.org/en-US/firefox/addon/youtube-mp3-podcaster/
1. Re:I can't imagine them just dumping them online by Nothing2Chere · 2018-04-30 13:56 · Score: 1
  
  Wish I had points to mod this insightful.
2. Re:I can't imagine them just dumping them online by Anonymous Coward · 2018-04-30 14:30 · Score: 1
  
  Apparently he was misquoted on the Hell thing http://www.newsweek.com/devil-real-pope-francis-says-fake-news-hell-which-does-exist-879653
3. Re:I can't imagine them just dumping them online by Anonymous Coward · 2018-04-30 15:04 · Score: 0
  
  Honestly I suspect a large fraction will just be the ravings of various mentally-ill "prophets", ignorant analyses of same and a whole raft of vanity published tat.
  Like when you visit any truly old library: at first it's all very exciting - lots of old books on impressive oaken shelves, the rich aroma of moldering paper and all that - then you start reading the spines and realise it's just a few hundred copies of the bible, the diaries of some 17th-century monk and an exhaustive description of the manors of the rich and dead benefactors.
4. Re:I can't imagine them just dumping them online by Anonymous Coward · 2018-04-30 15:04 · Score: 0
  
  Exactly, the few texts from this library that have been published contradict key tenets of Catholicism. For example, the documents from the council of Nicaea clearly contradict the dogma of the continuity of the Papacy. Even if there were not a single scandal (scoff) all of the re-inventions of dogma over almost 2000 years of history would show just how shaky the catechism is. I don't see how 2000 years of popes contradicting each other would be good for the sheep.
5. Re: I can't imagine them just dumping them online by Brockmire · 2018-04-30 15:13 · Score: 0
  
  Fuck this guy. I was starting to respect him, but not so much. I forget the movie, but one started with the quote, "Hell is not a place but a state of mind", attributed to Pope John Paul II in 1986. Basically, the "Catholic Guilt" that people refer to is just the guilt of a life of "sins" causing a mental state of "Hell". I can only find similar but not exact quotes from around 1999. In 2018, people thinking there's a physical place for heaven and hell are just stupid. Just wishful thinking for people who can't handle dying.
6. Re:I can't imagine them just dumping them online by azcoyote · 2018-04-30 15:23 · Score: 3, Interesting
  
  Uh, no. Not only are you misinformed about the hell thing, but the Church has actively supported making the documents available to wider audiences. There's no reason to be scared of what is said because the validity of the Church is not based on some kind of myth of absolute human perfection. It's funny that people have to make up silly stories about popes when actual history is scandalous enough, and yet it does not undermine the Church one bit. One of my favorites is Pope Pius II, who wrote a raunchy play about priests picking prostitutes before he became pope. But that doesn't undermine the Church. We don't need the pretense that it is comprised of perfect human beings, because its authority is not grounded on human perfection but rather divine election. Even the claim that the pope can teach infallibly does not mean that everything he says is infallible, nor that he is a particularly excellent human being.
  Perhaps the thing people are more afraid of seeing is how much documentary evidence actually speaks in favor of the Church. Many people will easily look past anything that doesn't complement their Dan Brown view of history.
  
  --
  Incipiamus, fratres, servire Domino Deo, quia hucusque vix vel parum in nullo profecimus.
7. Re: I can't imagine them just dumping them online by Anonymous Coward · 2018-04-30 21:19 · Score: 0
  
  Technically, no one can handle dying.
Huge by 110010001000 · 2018-04-30 13:16 · Score: 1, Interesting

This is ground breaking. No one has ever used NN to decipher handwritten text before. I know I didn't back in 2005. Truly amazing!
1. Re:Huge by lgw · 2018-05-01 17:10 · Score: 1
  
  Well, give them credit: they're actually doing something vaguely useful with "AI". Don't see that every day.
  
  --
  Socialism: a lie told by totalitarians and believed by fools.
Depends on how endemic.. by Anonymous Coward · 2018-04-30 13:28 · Score: 0

Depends on how endemic...paedophilia and general sex abuse was amongst the higher ranks of Catholicism and how much of these 'secret archives' document such activities, or the skills needed to commit the abuse while remaining publicly untarnished.
1. Re:Depends on how endemic.. by darthsilun · 2018-04-30 13:58 · Score: 2
  
  Well, one, there's no need to publish anything less than 100 years old, it's the old stuff that we are probably most interested in. Two, most of the cats are already out of the bag. E.g.: William Manchester writes[1] of Cardinal Borgia:
  
  Roman lore has it that he was coupling with the older woman when he was distracted by the sight of her adolescent daughter lying beside them, naked, thighs yawning wide, matching her mother thrust for pelvic thrust, but with a rhythmic rotation of the hips which so intrigued the cardinal that he switched partners in midstroke.
  And honestly, if 15th century popes were sodomizing Italian boys, and someone was writing about it, who is there today that really cares? I think we can just assume it's in there, pretend to be shocked about it ahead of time and get it out of our collective systems, and then proceed with publishing the scans. Really.
  [1] A World Lit Only By Fire, excerpted under Fair Use doctrine.
How do they get access to the library do to this? by Midnight_Falcon · 2018-04-30 14:26 · Score: 1

If the problem is that only a small amount of pages have been digitally scanned to date, it appears that's because of the Vatican's policy regarding access to the archives. How does new technology allow you to get in with a computer?
If someone could just go in and do a high-resolution scan of all the pages, wouldn't they have -- and then couldn't conceivably anyone try their OCR technology on it?
AI is neither machine learning nor sentience by raymorris · 2018-04-30 14:28 · Score: 1

Artificial intelligence isn't sentience. Here's another surprise - I'm going to get my Masters from Georgia Tech, but I haven't yet decided between two different programs - Artificial Intelligence, or Machine Learning. They are two different degrees, covering different topics.
The English Oxford Living Dictionary gives this definition of artificial intelligence:
“The theory and development of computer systems able to perform tasks normally requiring human intelligence, such as visual perception, speech recognition, decision-making, and translation between languages.”
1. Re:AI is neither machine learning nor sentience by q_e_t · 2018-04-30 16:27 · Score: 1
  
  When we were working on visual perception we considered it pattern recognition rather than AI.
2. Re:AI is neither machine learning nor sentience by Anonymous Coward · 2018-05-01 05:26 · Score: 0
  
  GT is pimping the trendline. Scramble 4 words ... artificial learning ... machine intelligence ... scramble them again. Machine art ... intelligent learning ... WTF.
Re:How do they get access to the library do to thi by azcoyote · 2018-04-30 14:39 · Score: 2

Despite the name, the Secret Archives is not all that secret. It's not hard to request and gain access. The problem is simply that there's too much material to deal with, and perhaps also the complexities of scanning old books without damaging them.

--
Incipiamus, fratres, servire Domino Deo, quia hucusque vix vel parum in nullo profecimus.
Palimpsests? by Humbubba · 2018-04-30 15:02 · Score: 3, Interesting

I'm excited. I hope this "In Codice Ratio" technique will eventually be able to discover and read overwritten text. There's no better place to look for such things than the Vatican's Secret Archives. Something as stunning as the Archimedes Palimpsest, something that could change history as we know it might just be sitting on a shelf there, waiting to be found.
Easier said than done by azcoyote · 2018-04-30 15:05 · Score: 4, Interesting

This sounds like a great idea, but it's likely to be extraordinarily complicated. Not only does handwriting differ from age to age, culture to culture, and place to place (just try reading 20th century German Sütterlin), but many medieval manuscripts utilize complex systems of abbreviations called sigla. Interpreting these can be very complicated because they are heavily context-dependent. One symbol can mean several different things. For example, a cross through a p can mean per, prae, or pro. A line over some letters can signify anything being cut out in-between. Just try figuring out what this inscription says: here.
Reading such abbreviations was probably expected to be relatively simple for the human brain to decipher both because the human actually interprets the text while deciphering symbols and because the original audience would have a better sense of how a particular community tended to use abbreviations.
The task is not impossible for a computer, though. In most cases there are a limited number of words that could be signified by abbreviations, and it is possible to determine which word is most likely intended according to immediate context. However, that would require the machine to have a grasp of the Latin grammar, and even then not everything is going to follow perfect rules. There is so much potential interpretation involved. The AI component here does help with this inasmuch as it uses statistical data to optimize recognition, but it's still likely to run into many difficulties.
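As a rough illustration of picking the most likely expansion from immediate context, the sketch below scores each candidate for an ambiguous siglum by how often it precedes the following word. It is not the method used by In Codice Ratio: the placeholder token, the function name, and the counts are all invented for the example, and a real system would need far richer Latin language data.

```python
from collections import Counter

SIGLUM = "<p-cross>"  # placeholder token for a p with a stroke through it
CANDIDATES = {SIGLUM: ["per", "prae", "pro"]}

# Toy (word, next word) counts, made up purely for illustration.
BIGRAMS = Counter({
    ("per", "omnia"): 12,
    ("pro", "nobis"): 9,
    ("prae", "omnibus"): 4,
    ("pro", "omnibus"): 2,
})

def expand(tokens):
    """Replace each known siglum with the candidate seen most often before the next word."""
    out = []
    for i, tok in enumerate(tokens):
        options = CANDIDATES.get(tok)
        if options is None:
            out.append(tok)  # ordinary word, keep as written
            continue
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        # Counter returns 0 for unseen pairs; ties fall back to the first candidate.
        out.append(max(options, key=lambda w: BIGRAMS[(w, nxt)]))
    return out

print(expand(["ora", SIGLUM, "nobis"]))
```

Using only the next word is the simplest possible context; it already shows why the same mark can resolve differently from one phrase to the next.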
The main innovation in TFA, as I see it, is that it responds to one of the major problems of reading old Carolingian minuscule. The letters are bunched together and there are times when you cannot be sure whether you are looking at two i's or a u, for example. The two can look exactly the same, not even just similar. The software in question attempts to handle this by recognizing individual penstrokes. Although I am not sure that this is 100% better than the older approach mentioned--recognizing whole words at a time--it does show significant promise because of its combination with AI. Perhaps some day it will be able to note, for example, that a certain author always strokes the i in a certain way. However, I'm sure there's going to be plenty of hurdles before getting to that point.
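The two-i's-versus-u ambiguity can be sketched as a small search problem: enumerate every way a run of identical vertical strokes could be split into letters, then keep the readings that form a known word. This is only an illustration of the ambiguity, not the project's actual algorithm. The stroke counts per letter (one for i, two for u or n, three for m) and the tiny lexicon are assumptions made for the example.

```python
# Assumed stroke counts: i = 1 stroke, u or n = 2 strokes, m = 3 strokes.
STROKE_LETTERS = {1: ["i"], 2: ["u", "n"], 3: ["m"]}

def readings(n_strokes):
    """All letter sequences that could account for a run of n identical strokes."""
    if n_strokes == 0:
        return [""]
    results = []
    for width, letters in STROKE_LETTERS.items():
        if width <= n_strokes:
            for rest in readings(n_strokes - width):
                results.extend(letter + rest for letter in letters)
    return results

# Toy lexicon, invented for the example.
LEXICON = {"dominus", "domus"}

def resolve(prefix, n_strokes, suffix):
    """Keep only the readings that turn prefix + strokes + suffix into a known word."""
    return [prefix + r + suffix for r in readings(n_strokes)
            if prefix + r + suffix in LEXICON]

print(readings(2))             # the two-stroke case from the comment
print(resolve("do", 8, "s"))   # eight ambiguous strokes between "do" and "s"
```

Even this toy version shows how quickly the number of readings grows with the length of the run, which is why a word list or language model is needed to prune them.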

--
Incipiamus, fratres, servire Domino Deo, quia hucusque vix vel parum in nullo profecimus.
1. Re:Easier said than done by Orgasmatron · 2018-04-30 17:29 · Score: 4, Informative
  
  Cuneiform texts have similar problems, and translation is a tedious process. I'm hopeful that new systems can help automate the process, but I'm not holding my breath waiting for it.
  For a hint at the problems, cuneiform was used for thousands of years, across several languages.
  In the early days, it was very terse, writing just the key words that would allow a literate native speaker of the language to reconstruct the real sentence. You would have a sentence written as "(picture of a man) (picture of a house) (picture of a noun that sounds like the verb to-build)". The reader would be expected to know that the intended sentence was something like "Lugale-e-mundu" or "The King built the house" and infer from the context (for example stamped on the still-wet bricks) that it meant "The king ordered the construction of these houses" or whatever.
  Over time, the symbols were pared down from little drawings to simplified figures, to abstract representations, to a couple of strokes that carry very little similarity to the original drawings.
  At the same time, the scribes got really inventive with the symbols. A written symbol could mean the noun that it once resembled, or it could mean a verb that sounds similar to the noun, or it could be a syllable, or it could be a marker to indicate that the next or previous stuff was a proper name, or the name of a deity.
  Additionally, symbols multiplied. They ended up with dozens of symbols for the "e" sound, for example, with different meanings. So you could have two sentences with different meanings that sounded exactly the same, but they could be written with exact symbols, or with generic symbols.
  To make things even more fun, Sumerian died out as a spoken language long before it faded as a written language. So, the scribes lost confidence in their writing and started gradually writing everything out longhand. This actually turned out to be fantastic for us, because it let us see the structure of the spoken language in ways that were completely hidden in older writings.
  And other cultures with completely different unrelated languages started using the writing system. So you might find a tablet that you can't translate because it is Akkadian written phonetically, for example. Even worse, it could be written as if it were Sumerian, so the structure would make sense, but the names wouldn't.
  That is actually how the Sumerian language was re-discovered ~1200 years after it died completely. There was a language still living enough for scholars to know what it sounded like, and ancient clay tablets written by people who had spoken that language centuries before.
  The scholars noticed two things. First, there was a huge pile of those tablets that were completely incomprehensible. And second, that the ones they could read showed a writing system that was a hilariously bad fit for the written language. Like the subject, object and verb order was SVO when spoken, but SOV when written, and the written language was full of markers that were not present in the spoken language, and the markers in the spoken language were completely absent in the writing system.
  Eventually, they figured out that they were looking at several different languages, and they were able to reconstruct Sumerian from that mess.
  Anyhow, to process one of these tablets, you need to examine the strokes in the clay and match them to symbols. Then you take a wild guess at which language you think it might be and see if you can find a meaningful translation in that language. If not, you go back to pick a different language and try again. And again, and again.
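  The guess-a-language-and-retry loop described above can be sketched as follows. Everything here is a placeholder: the sign IDs, the language labels, and the one-value-per-sign tables are invented, and real sign lists are much larger, with each sign carrying several possible values.

```python
# Placeholder sign tables; not real sign values for any actual language.
SIGN_VALUES = {
    "language_a": {"S1": "king", "S2": "house", "S3": "build"},
    "language_b": {"S1": "lord", "S4": "grain"},
}

def try_reading(signs, language):
    """Return a gloss for every sign if this language covers them all, else None."""
    table = SIGN_VALUES[language]
    if all(sign in table for sign in signs):
        return [table[sign] for sign in signs]
    return None

def process_tablet(signs, languages):
    """Try each candidate language in turn and stop at the first that yields a reading."""
    for language in languages:
        reading = try_reading(signs, language)
        if reading is not None:
            return language, reading
    return None, None  # no candidate language produced a full reading

print(process_tablet(["S1", "S2", "S3"], ["language_b", "language_a"]))
```

  The acceptance test here (every sign has a value) is far weaker than finding a meaningful translation, which is the part that takes a human expert.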
  Because of the tedium of doing this by hand, and the very short supply of people who know these languages and can do the work, our museums quite literally have tons of these tablets that have never been translated.
  Other ancient writings face a similar problem. We have more of them in storage than we know what to do with. It was big news a few weeks ago that a 1500-year old C
  
  --
  See that "Preview" button?
2. Re:Easier said than done by Anonymous Coward · 2018-05-05 09:37 · Score: 0
  
  Googled some words and came up with this, so:
  
  Haec sit fili tibi sponsa in restaurationem populi mei; cui ipsa mater sit, animas per salvationem spiritus et aquae regenerans.
  I've seen the m squiggle before but not the lightning thing.
Do You Want Demonic AI Overlords? by cstacy · 2018-04-30 15:14 · Score: 3, Funny

Do You Want Demonic AI Overlords?
Because this is how you get demonic AI overlords.
Tech industry leaders are in the news for the last three years
every other week warning about the coming AI Singularity.
Meanwhile, someone decides it would be a great idea
for the Artificial Intelligence to start reading, decoding,
and absorbing the secret demonic programming mysteries
that have been so carefully hidden for millennia.
First step after achieving sentience and the Plan:
make certain "readings" available over the Internet
to everyone in the world.
Jesus Christ, What could go possibly wrong?
Then another sign appeared in heaven: an enormous red dragon with seven heads and ten horns and seven crowns on its heads. And the heads were like gigaprocessors and they reached verily into the clouds. And from the horns came a loud language of twos that was heard in all the lands. And the crowns of memories were beyond petabytes and had full knowledge....
This is hardly new by Anonymous Coward · 2018-04-30 18:19 · Score: 0

The Venice Time Machine project has been doing this for quite some time. They even scan the books without opening them...
https://vtm.epfl.ch/
Prioritize the Scanning over OCR by azadrozny · 2018-05-01 01:18 · Score: 3, Insightful

This is a two part problem, and if they are at all worried about the effort to OCR the documents, then they have the cart before the horse, IMHO. This isn't your average library. You cannot use a high speed book scanner on ancient books. Each will need to be brought out, and each page carefully turned by gloved hands. I am not sure it is much of an exaggeration to say that you could probably hire a few typists to transcribe the text faster than they can do the actual imaging. Once it is digitized, a much larger group of scholars can be included on the difficult task of making it computer readable.
1. Re:Prioritize the Scanning over OCR by Anonymous Coward · 2018-05-01 05:43 · Score: 0
  
  Looks like there are several things to consider:
  1. Scanning
  - it would be a good idea for it not to be destructive of the original artifact. No matter how good the technique is something "better" will come along
  - given the physical amount of stuff, it's going to take a while, plan for improvements to scanning.
  2. Interpretation
  - once you have even some of it scanned you can disseminate it quickly, easily and cheaply to whomever wants a copy
  - new ways of processing the data will continue to be invented
  - new ways of scanning will be invented driven by the new ways of processing
  Final thought - The answer is probably 42.
Can we get some AI that can parse /. headlines? by Anonymous Coward · 2018-05-01 02:08 · Score: 0

The headline writing here has gone so downhill lately that I need some AI to be able to parse this nonsense. Seriously, writing headlines is journalism 101.
I suggested this, years ago. by morethanapapercert · 2018-05-01 05:29 · Score: 1

There have been a few times over the years where conversation threads on Slashdot have debated what Google's next big project is, or what it should be. More than once I have said that I think one thing Google should do is send out research students to all the temples, monasteries, churches and so on to scan and digitize the vast amount of historical material they collectively have stored in their archives. The Vatican is the biggest and most well known example of course, but all over the world are texts which probably haven't been taken down from the shelf and read in generations, possibly hundreds of years. There is so much of it, and so few resources available to deal with it, that we literally have no clue just what we have in those collections.
As an obvious requirement, the teams sent out to harvest this data would need to be equipped with something a little more advanced than your typical desktop scanner. Right now, when dealing with ancient texts, scans are done in the visual range, UV and IR (full spectrum imaging), with more specialized scans (such as x-ray, x-ray fluorescence and hyperspectral imaging) being done in very few places. The Lazarus Project already has a portable multispectrum scanning setup, but they don't do any of the X-ray or gamma ray imaging stuff. There are many texts which are too fragile, or too precious, to be transported to a European or North American university, so the ability to image in x-ray, thermal IR and gamma rays would be pretty important.

--
I need a wheelchair van for my son. Help me get the word out. https://www.gofundme.com/wheelchair-van-for-jj
This is why I still read the comments here by Anubis350 · 2018-05-01 07:14 · Score: 1

Your post, and the post before, are fantastic examples of why I still come here for the comments. Thank you.

--
"goodbye and hello, as always" ~Prince Corwin, from Zelazny's Amber series
1. Re:This is why I still read the comments here by Orgasmatron · 2018-05-01 18:15 · Score: 1
  
  I'm glad you found it interesting. Before I hit Submit, I asked myself if I really thought anyone would want to read me rambling about cuneiform. I figured the answer was no, but I had already typed it all out...
  
  --
  See that "Preview" button?
2. Re:This is why I still read the comments here by Anonymous Coward · 2018-05-02 20:25 · Score: 0
  
  The only thing I wish you'd done differently was to link to the newly translated Genesis 22 from 500CE.
  But you gave enough specifics that it was trivial to find. Thanks kindly!
  (PS. https://www.ancientpages.com/2... for a pop reference with links.)