New Audacious Research Project, In Codice Ratio, Bets on AI and OCR To Make Sense of Handwritten Texts in Vatican's Secret Archives (theatlantic.com)
A new project untangles the handwritten texts in one of the world's largest historical collections. From a report: The Vatican Secret Archives is one of the grandest historical collections in the world. It's also one of the most useless. The grandeur is obvious. Located within the Vatican's walls, next door to the Apostolic Library and just north of the Sistine Chapel, the VSA houses 53 linear miles of shelving dating back more than 12 centuries. That said, the VSA isn't much use to modern scholars, because it's so inaccessible. Of those 53 miles, just a few millimeters' worth of pages have been scanned and made available online. Even fewer pages have been transcribed into computer text and made searchable. If you want to peruse anything else, you have to apply for special access, schlep all the way to Rome, and go through every page by hand.
But a new project could change all that. Known as In Codice Ratio, it uses a combination of artificial intelligence and optical-character-recognition (OCR) software to scour these neglected texts and make their transcripts available for the very first time. If successful, the technology could also open up untold numbers of other documents at historical archives around the world.
It doesn't exist yet. Neural networks and genetic algorithms are NOT SENTIENT or anywhere close. It's going to be a few decades before we have anything resembling true intelligence.
Heck, even the HAL of HAL 9000 stands for Heuristically programmed ALgorithmic computer, so Clarke was still making the argument that the computer of 2001 had only reached the brink of sentience and couldn't handle a moral dilemma.
OK, so now the text is measured in miles? What lunacy is this?
I mean, it is the ONE article where Libraries of Congress would actually be a valid unit!
This is a two-part problem, and if they are at all worried about the effort to OCR the documents, then they have the cart before the horse, IMHO. This isn't your average library. You cannot use a high-speed book scanner on ancient books. Each will need to be brought out, and each page carefully turned by gloved hands. I am not sure it is much of an exaggeration to say that you could probably hire a few typists to transcribe the text faster than they can do the actual imaging. Once it is digitized, a much larger group of scholars can be included in the difficult task of making it computer-readable.