Slashdot Mirror


New Audacious Research Project, In Codice Ratio, Bets on AI and OCR To Make Sense of Handwritten Texts in Vatican's Secret Archives (theatlantic.com)

A new project untangles the handwritten texts in one of the world's largest historical collections. From a report: The Vatican Secret Archives is one of the grandest historical collections in the world. It's also one of the most useless. The grandeur is obvious. Located within the Vatican's walls, next door to the Apostolic Library and just north of the Sistine Chapel, the VSA houses 53 linear miles of shelving dating back more than 12 centuries. That said, the VSA isn't much use to modern scholars, because it's so inaccessible. Of those 53 miles, just a few millimeters' worth of pages have been scanned and made available online. Even fewer pages have been transcribed into computer text and made searchable. If you want to peruse anything else, you have to apply for special access, schlep all the way to Rome, and go through every page by hand.

But a new project could change all that. Known as In Codice Ratio, it uses a combination of artificial intelligence and optical-character-recognition (OCR) software to scour these neglected texts and make their transcripts available for the very first time. If successful, the technology could also open up untold numbers of other documents at historical archives around the world.

5 of 111 comments

  1. Huge by 110010001000 · · Score: 1, Interesting

    This is groundbreaking. No one has ever used neural networks to decipher handwritten text before. I know I didn't back in 2005. Truly amazing!

  2. Re:Artificial Intelligence by ShanghaiBill · · Score: 3, Interesting

    I think of the lost knowledge that was communicated by the Neanderthals in their surviving paintings and carvings.

    Neanderthals coexisted and interbred with H. sapiens, so it is likely they also talked to each other. So their knowledge wasn't lost, but passed on to their mongrel children.

  3. Palimpsests? by Humbubba · · Score: 3, Interesting

    I'm excited. I hope this "In Codice Ratio" technique will eventually be able to discover and read overwritten text. There's no better place to look for such things than the Vatican's Secret Archives. Something as stunning as the Archimedes Palimpsest, something that could change history as we know it, might just be sitting on a shelf there, waiting to be found.

  4. Easier said than done by azcoyote · · Score: 4, Interesting

    This sounds like a great idea, but it's likely to be extraordinarily complicated. Not only does handwriting differ from age to age, culture to culture, and place to place (just try reading 20th-century German Sütterlin), but many medieval manuscripts use complex systems of abbreviations called sigla. Interpreting these can be very difficult because they are heavily context-dependent: one symbol can mean several different things. For example, a cross through a p can mean per, prae, or pro, and a line over some letters can signal that any number of letters in between have been left out. Just try figuring out what this inscription says: here.

    Such abbreviations were probably expected to be relatively easy for a human reader to decipher, both because a human actually interprets the text while deciphering the symbols and because the original audience would have had a better sense of how a particular community tended to use abbreviations.

    The task is not impossible for a computer, though. In most cases only a limited number of words could be signified by an abbreviation, and it is possible to determine which one is most likely intended from the immediate context. However, that would require the machine to have a grasp of Latin grammar, and even then not everything is going to follow perfect rules; there is so much interpretation involved. The AI component here does help inasmuch as it uses statistical data to optimize recognition, but it's still likely to run into many difficulties.
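    To make the "limited number of candidate words plus immediate context" idea concrete, here is a minimal Python sketch. Everything in it is hypothetical: the `SIGLA` table (with `p-bar` standing in for a p with a stroke through it), the tiny `SENTENCES` corpus, and the `expand`/`expand_all` helpers are illustrative assumptions, not In Codice Ratio's actual method. Simple bigram counts stand in for the grammatical knowledge a real system would need.

```python
from collections import Counter

# Hypothetical sigla table: "p-bar" stands in for a p with a stroke through it,
# which (per the comment above) can expand to per, prae or pro.
SIGLA = {"p-bar": ["per", "prae", "pro"]}

# Toy corpus standing in for statistics gathered from already-transcribed Latin.
SENTENCES = [
    "ora pro nobis",
    "pro nobis peccatoribus",
    "per omnia saecula saeculorum",
    "per christum dominum nostrum",
    "prae ceteris",
]

# Count adjacent word pairs (bigrams) within each sentence.
BIGRAMS = Counter()
for sentence in SENTENCES:
    words = sentence.split()
    BIGRAMS.update(zip(words, words[1:]))


def expand(tokens, i):
    """Return the expansion of tokens[i] best supported by its neighbours."""
    prev_word = tokens[i - 1] if i > 0 else None
    next_word = tokens[i + 1] if i + 1 < len(tokens) else None

    def score(candidate):
        # How often this candidate followed the previous word,
        # plus how often it preceded the next word.
        return BIGRAMS[(prev_word, candidate)] + BIGRAMS[(candidate, next_word)]

    # Ties (including "no evidence at all") fall back to the first listed expansion.
    return max(SIGLA[tokens[i]], key=score)


def expand_all(tokens):
    """Expand every known siglum in a tokenized line, leaving other words alone."""
    return [expand(tokens, i) if tok in SIGLA else tok for i, tok in enumerate(tokens)]


print(expand_all(["ora", "p-bar", "nobis"]))  # ['ora', 'pro', 'nobis']
print(expand_all(["p-bar", "omnia", "saecula"]))  # ['per', 'omnia', 'saecula']
```

    Bigram counts only look one word to each side, which is exactly why the comment's caveat about grammar matters: agreement or idiom spanning several words would be invisible to a scorer this simple.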

    The main innovation in TFA, as I see it, is that it responds to one of the major problems of reading old Carolingian minuscule. The letters are bunched together, and there are times when you cannot be sure whether you are looking at two i's or a u, for example. The two can look exactly the same, not merely similar. The software in question attempts to handle this by recognizing individual pen strokes. Although I am not sure that this is 100% better than the older approach mentioned (recognizing whole words at a time), it does show significant promise because of its combination with AI. Perhaps some day it will be able to note, for example, that a certain author always strokes the i in a certain way. However, I'm sure there are going to be plenty of hurdles before getting to that point.
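    The "two i's or a u" ambiguity can be illustrated with a minimal Python sketch under heavily simplified assumptions: each pen stroke has already been recognized, `|` marks one minim stroke, and i, u, n and m are treated as exactly 1, 2, 2 and 3 identical minims (real scripts have joins and other cues that this ignores). The sketch enumerates every way to split a run of minims into letters, then keeps only the readings found in a tiny lexicon standing in for a statistical language model. `MINIM_LETTERS`, `LEXICON`, `split_minims`, `readings` and `transcribe` are hypothetical names; this is not In Codice Ratio's actual pipeline.

```python
from functools import lru_cache
from itertools import groupby

# Hypothetical glyph model: each of these letters is drawn as 1, 2, 2 or 3 identical
# minim strokes, so a run of minims is ambiguous until it is segmented.
MINIM_LETTERS = {"i": 1, "u": 2, "n": 2, "m": 3}

# Toy lexicon standing in for a real statistical language model.
LEXICON = {"filius", "filios", "filia"}


@lru_cache(maxsize=None)
def split_minims(count):
    """All letter sequences whose minim strokes add up to exactly `count`."""
    if count == 0:
        return ("",)
    out = []
    for letter, strokes in MINIM_LETTERS.items():
        if strokes <= count:
            out.extend(letter + rest for rest in split_minims(count - strokes))
    return tuple(out)


def readings(glyphs):
    """Expand each run of '|' (one minim stroke) into every possible letter sequence."""
    results = [""]
    for is_minim, run in groupby(glyphs, key=lambda g: g == "|"):
        run = "".join(run)
        options = split_minims(len(run)) if is_minim else (run,)
        results = [prefix + option for prefix in results for option in options]
    return results


def transcribe(glyphs):
    """Keep only the readings that are attested words in the lexicon."""
    return [word for word in readings(glyphs) if word in LEXICON]


# "f|l|||s" means: f, one minim, l, three minims, s.
print(readings("f|l|||s"))  # ['filiiis', 'filius', 'filins', 'filuis', 'filnis', 'films']
print(transcribe("f|l|||s"))  # ['filius']
```

    Discarding every unattested reading is the crudest possible choice; a practical system would more plausibly rank readings by probability, since spelling variants and rare words are common in manuscripts.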

    --
    Incipiamus, fratres, servire Domino Deo, quia hucusque vix vel parum in nullo profecimus. (Let us begin, brothers, to serve the Lord God, for until now we have made little or no progress.)

  5. Re:I can't imagine them just dumping them online by azcoyote · · Score: 3, Interesting

    Uh, no. Not only are you misinformed about the hell thing, but the Church has actively supported making the documents available to wider audiences. There's no reason to be scared of what is said, because the validity of the Church is not based on some kind of myth of absolute human perfection. It's funny that people have to make up silly stories about popes when actual history is scandalous enough, and yet it does not undermine the Church one bit. One of my favorites is Pope Pius II, who wrote a raunchy play about priests picking prostitutes before he became pope. But that doesn't undermine the Church. We don't need the pretense that it is made up of perfect human beings, because its authority is grounded not in human perfection but in divine election. Even the claim that the pope can teach infallibly does not mean that everything he says is infallible, nor that he is a particularly excellent human being.

    Perhaps what people are really afraid of seeing is how much of the documentary evidence actually speaks in favor of the Church. Many people will easily look past anything that doesn't fit their Dan Brown view of history.

    --
    Incipiamus, fratres, servire Domino Deo, quia hucusque vix vel parum in nullo profecimus. (Let us begin, brothers, to serve the Lord God, for until now we have made little or no progress.)