Slashdot Mirror


New Audacious Research Project, In Codice Ratio, Bets on AI and OCR To Make Sense of Handwritten Texts in Vatican's Secret Archives (theatlantic.com)

A new project untangles the handwritten texts in one of the world's largest historical collections. From a report: The Vatican Secret Archives is one of the grandest historical collections in the world. It's also one of the most useless. The grandeur is obvious. Located within the Vatican's walls, next door to the Apostolic Library and just north of the Sistine Chapel, the VSA houses 53 linear miles of shelving dating back more than 12 centuries. That said, the VSA isn't much use to modern scholars, because it's so inaccessible. Of those 53 miles, just a few millimeters' worth of pages have been scanned and made available online. Even fewer pages have been transcribed into computer text and made searchable. If you want to peruse anything else, you have to apply for special access, schlep all the way to Rome, and go through every page by hand.

But a new project could change all that. Known as In Codice Ratio, it uses a combination of artificial intelligence and optical-character-recognition (OCR) software to scour these neglected texts and make their transcripts available for the very first time. If successful, the technology could also open up untold numbers of other documents at historical archives around the world.

6 of 111 comments

  1. Artificial Intelligence by UltimateDuster · · Score: 2, Insightful

    It doesn't exist yet. Neural networks and genetic algorithms are NOT SENTIENT or anywhere close. It's going to be a few decades before we have anything resembling true intelligence.

    Heck, even the HAL of HAL 9000 stands for Heuristically programmed ALgorithmic computer, so Clarke was still making the argument that the computer of 2001 had only reached the brink of sentience and couldn't handle a moral dilemma.

    1. Re:Artificial Intelligence by gweihir · · Score: 3, Insightful

      Indeed. That is why the hypemongers have come up with the term "weak AI", i.e. the AI without the "I". Classically this was called automation and here it is the subspecies called "pattern recognition".

      Incidentally, I disagree about the "few decades" for "strong/true AI". At this time we have zero indications that it is even physically possible, and hence, if it is possible, > 50 years is a realistic timeline. No, human beings do not count as a "reference implementation", for a number of reasons.

      It starts with us not even knowing what life is, and we certainly cannot create it. Next, we do not know how humans generate intelligence, and claiming that it must obviously be a physical process is just ignorant quasi-religious physicalism, not science. Seeing the interface does not tell us what is going on behind it, and we cannot see behind it. Add to that the fact that natural intelligence seems to require consciousness, and we know even less about what consciousness is or where it comes from. Now, I am not arguing for some form of mysticism; I am arguing that science is entirely clueless about what these things are, and hence any predictions that strong AI is possible are vastly premature.

      Also note that what human beings have comes with free will (or at least some form of independence, if you think free will does not exist), and that would pretty much make it useless as the basis of a technical machine. All the problems with slave labor would apply. In addition, there are indications that a technical implementation of a human mind, if one is possible, would actually not be faster than the real thing and would need just the same slow education to be able to do anything worthwhile.

      --
      Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
    2. Re:Artificial Intelligence by ShanghaiBill · · Score: 3, Insightful

      According to TFS, the library is inaccessible because it hasn't been scanned.
      So their solution is to use AI driven OCR, which requires ... scanning.

    3. Re:Artificial Intelligence by ShanghaiBill · · Score: 5, Insightful

      Think about it: it's only going to be as good as your training data.

      Obvious counter-example: AlphaGo Zero, which used NO human training data (it learned entirely from self-play), and was far better at its assigned task than any of its programmers.

      There is no reason to believe that an AI's capabilities are inherently limited by training data. Nor is there any reason to believe that it can't surpass the abilities of its creators. That makes as little sense as saying children can never be more intelligent than their parents.

  2. Text measured in miles? by Ecuador · · Score: 4, Insightful

    OK, so now the text is measured in miles? What lunacy is this?
    I mean, it is the ONE article where Libraries of Congress would actually be a valid unit!

    --
    Violence is the last refuge of the incompetent. Polar Scope Align for iOS
  3. Prioritize the Scanning over OCR by azadrozny · · Score: 3, Insightful

    This is a two-part problem, and if they are at all worried about the effort to OCR the documents, then they have the cart before the horse, IMHO. This isn't your average library. You cannot use a high-speed book scanner on ancient books. Each will need to be brought out, and each page carefully turned by gloved hands. I am not sure it is much of an exaggeration to say that you could probably hire a few typists to transcribe the text faster than they can do the actual imaging. Once it is digitized, a much larger group of scholars can be brought in on the difficult task of making it computer-readable.