Slashdot Mirror


Where Old, Unreadable Documents Go to Be Understood (atlasobscura.com)

From a report: On any given day, from her home on the Isle of Man, Linda Watson might be reading a handwritten letter from one Confederate soldier to another, or a list of convicts transported to Australia. Or perhaps she is reading a will, a brief from a long-forgotten legal case, an original Jane Austen manuscript. Whatever is in them, these documents made their way to her because they have one thing in common: They're close to impossible to read. Watson's company, Transcription Services, has a rare specialty -- transcribing historical documents that stump average readers. Once, while talking to a client, she found the perfect way to sum up her skills.

[...] Since she first started specializing in old documents, Watson has expanded beyond things written in English. She now has a stable of collaborators who can tackle manuscripts in Latin, German, Spanish, and more. She can only remember two instances that left her and her colleagues stumped. One was a Tibetan manuscript, and she couldn't find anyone who knew the alphabet. The other was in such bad shape that she had to admit defeat. In the business of reading old documents, Watson has few competitors. There is one transcription company on the other side of the world, in Australia, that offers a similar service. Libraries and archives, when they have a giant batch of handwritten documents to deal with, might recruit volunteers.

44 comments

  1. AI FTW? by TexasDiaz · · Score: 0

    I assume this is on /. because one day AI might be able to do what she does better? Or will we have been annihilated by our AI overlords before that time?

    1. Re:AI FTW? by Stephan+Schulz · · Score: 5, Insightful

      I would assume it's on /. because it's interesting "stuff that matters"....

      --

      Stephan

    2. Re: AI FTW? by Anonymous Coward · · Score: 0

      If there's some document that nobody has bothered to read or reference for hundreds or thousands of years, can it be said to really matter? Like I give a fuck about some shopping list for a dude two thousand years ago

    3. Re: AI FTW? by viperidaenz · · Score: 1

      Yeah, and the answer to "Who's a convict in Australia?" is "All of them"

    4. Re:AI FTW? by gman003 · · Score: 3, Insightful

      I have noticed a lot of tech/computer nerds have a significant interest in language nerdery. I've seen /. threads devolve into arguments over correct Latin grammar. This certainly piques the interest of people who have a bit of language nerd in them, because it's as much about knowledge of old writing systems and abbreviations as it is ability to look at squiggly lines and pattern-match.

    5. Re: AI FTW? by Anonymous Coward · · Score: 0

      You have successfully trolled me. I was annoyed enough to open a tab and find a link. Yes, documents not referenced for 1000s of years matter. ;-)
      From a sweater wearing librarian.

      https://arstechnica.com/science/2017/08/ancient-tablet-reveals-babylonians-discovered-trigonometry/

    6. Re:AI FTW? by No+Longer+an+AC · · Score: 3, Interesting

      My first thought on seeing the headline was about using technology to read ancient manuscripts which may be too fragile to open or may have even been written on recycled even older manuscripts. They use x-rays and computer imaging to read that which cannot be read by the human eye.

      I've seen a few stories about this over the years.

      Scientists read ancient sealed documents without opening them

      MIT and Georgia Tech develop technology to read books without opening them

      Scientists Read Ancient Hebrew Scroll Without Opening It

      Scanning an Ancient Biblical Text That Humans Fear to Open

      There's lots more out there and note those aren't just 4 different links to the same story.

      But this story is still interesting to me too. I'm sure that the people doing the work in the linked article might be tasked with transcribing or translating the images of pages they can't actually touch.

    7. Re: AI FTW? by bestweasel · · Score: 1

      The last paragraph of the article.

      "Some of the ones I find easier to read, the machine will probably be able to read sooner rather than later," says Watson. "But anything slightly difficult and ... I've seen some documents done by the software, and they just make you laugh. I think I'm safe in my job for a good while yet."

      I can't decide whether AI would be an improvement on the Slashdot editors or if it's already replaced them.

      "Once, while talking to a client, she found the perfect way to sum up her skills."

      What's that then? Not going to tell us? Have to go to the article to find out? ps. It's not worth it.

    8. Re: AI FTW? by arglebargle_xiv · · Score: 3, Funny

      Like I give a fuck about some shopping list for a dude two thousand years ago

      Some 2,000-year old documents can still be informative reading, e.g. the System 7 Unix source code.

    9. Re:AI FTW? by Anonymous Coward · · Score: 2, Insightful

      There's also a lot of practiced physical craft. My wife studied at West Dean College in England, a college dedicated to historical preservation and reconstruction. It includes clock making, tapestry weaving, ceramics, books, and metals conservation. The building is *littered* with amazing historical artifacts, with a wall of ancient weapons that made me drool on the carpet, whimpering "want to play!!!" with some of the lovingly restored specimens.

      Sadly, the craft is rapidly disappearing. There's a glut of lightly trained people in it, but a dearth of funding to keep people employed to get the 20 years of hands-on skills for the most delicate knowledge. And a lot of it hard-won, hard-learned skills from working with hundreds or thousands of less valuable documents over a career, and the senior people refuse to die off. There's going to be a massive purge as they hit forced retirement ages, because they haven't been able to train newer experts. There's been no funding to keep them on staff. If you value books as artistic objects in their own right, as I do, it's enough to make you weep.

    10. Re:AI FTW? by Anonymous Coward · · Score: 1

      The X-Ray stuff is cool. Multi-spectral photography is also great for stuff damaged by fire. But give me a high-quality digital photo and a Curves tool, and I'll show you things you normally wouldn't see.
      The big problem is not fragility. Parchment, the treated animal skins that make up the pages of almost all European mss between the sixth and thirteenth centuries (and most books from the fourteenth), is so durable that, when people could no longer make sense of the handwriting style, they took the pages and used them to bind paper books.
      The problem is that we have hundreds of thousands of books that haven't been properly catalogued, let alone digitized. And all over the world are boxes of fragments taken from book bindings. There are great discoveries to be made.

    11. Re:AI FTW? by mjwx · · Score: 1

      I have noticed a lot of tech/computer nerds have a significant interest in language nerdery. I've seen /. threads devolve into arguments over correct Latin grammar. This certainly piques the interest of people who have a bit of language nerd in them, because it's as much about knowledge of old writing systems and abbreviations as it is ability to look at squiggly lines and pattern-match.

      I wouldn't say so, but nerds do tend to be grammar nazis at least up until the point in their lives that they stop caring about what others think (usually mid 30's, about the same time you unashamedly start listening to the greatest hits of the 60's, 70's and 80's in your car). However we have nothing on the kind of pendants that come from old universities like Cambridge and Oxford. If you would like to see a truly vicious argument over a minor point of Latin grammar computer nerds with an interest in language are severely outclassed (and outnumbered).

      --
      Calling someone a "hater" only means you can not rationally rebut their argument.
    12. Re: AI FTW? by stealth_finger · · Score: 1

      But how do you know if it's a shopping list or an ancient cure for blue ball?

      --
      Wanna buy a shirt?
      https://www.redbubble.com/people/stealthfinger/shop?asc=u
    13. Re:AI FTW? by Grepdashv · · Score: 1

      Those swinging Oxford pendants.

  2. can't beat my doctor by KiloByte · · Score: 5, Funny

    I'd want to see this lady decipher the scribbling of a doctor I visited with foot pain recently. There's the Voynich Manuscript, then there's this.

    --
    The creatures outside looked from Alt-Right to Antifa; but already it was impossible to say which was which.
    1. Re:can't beat my doctor by Anonymous Coward · · Score: 0

      Rx legibility is due to a combination of things. Pharmacists are the key. It's like reading bad sheet music: if you can make out enough notes to identify the song, and you already know how to play it, you can work out the rest. Familiarity goes a long way and favors local docs. A really illegible out-of-state or foreign script may be left for the "master" pharmacist to fill.

      "Your prescription will be ready in a few hours [after we figure out how to read it]"

    2. Re:can't beat my doctor by NoNonAlphaCharsHere · · Score: 2

      Hey, I've got some stuff on 8" floppies that'd give her a run for her money.

    3. Re:can't beat my doctor by viperidaenz · · Score: 1

      If they can't understand it 100%, they simply contact the doctor who prescribed it.
      Getting a prescription wrong can kill someone.

    4. Re:can't beat my doctor by Applehu+Akbar · · Score: 2

      Which is exactly why we should switch from handwritten medical records to online data. Let it give the privacy paranoids fits if they want, but I would rather take the chance on al Qaeda reading the results of my colonoscopy over being killed by an error in a handwritten prescription. And I want that record to contain all medical data that has been accumulated about me.

      But for some reason the medical profession wants to keep their goddamn handwritten files. Perhaps they think it will stave off the threat of competition. This will have to be one more area in which we fall a generation behind the Asians.

    5. Re:can't beat my doctor by viperidaenz · · Score: 1

      I don't remember the last time I saw a handwritten prescription.
      Every doctor I've been to in recent memory prints it out and signs it.

      The rort is the fax fee pharmacies charge. Like it costs them more for you to have your doctor fax the prescription, so they can fill it at their leisure. Instead they charge more for faxing than to have you turn up at the pharmacy and wait for them to do it.

    6. Re:can't beat my doctor by Anonymous Coward · · Score: 0

      Yes, "simply" contact a foreign doctor, in a timezone 12 hours off. If they can't fill it safely, they won't. But the vast majority of scripts they see are similar enough to fulfill unambiguously and experience goes a long way.

    7. Re:can't beat my doctor by Anonymous Coward · · Score: 0

      8" when they are floppy! Imagine what they would be like when erect. Wait! how come you got more than one?

    8. Re:can't beat my doctor by morethanapapercert · · Score: 1
      Where I live, it's already almost exclusively online. The provincial government is doing a lot to encourage adoption of the new digital systems and the two local municipalities are making it a point of pride to be leaders in the use of IT to improve lives. From anecdotes from friends and local family, it seems there are only a small handful of doctors who aren't using the new system. Most of those also happen to be of the "crusty country doctor with his office in the converted first floor of a large house" archetype.

      The system works well enough that, several times now, I have had my prescription filled and waiting before I could cross town from the doctor's office (and it's a small town). My drug card is automated as well, so I don't need any documentation to pick up my non-narcotic prescriptions. (For my narcotics, they swipe my driver's license and have me physically sign their copy of the receipt.)

      I think the problem with adoption boils down to the usual culprits, time and money. The archetypical doctor in his own practice and one or two medical secretaries is generally too busy to afford the time it would take to change a core function. Plus, it requires certified training ($) for every staff member, upkeep on the internet connection ($) licensing for the software ($) and enough cooperation from the local hospitals, imaging centres, blood labs, pharmacies and so on to make it worth while. ($$$$) after all, what's the point in having an online Rx submission system if the pharmacist isn't also part of the network?

      --
      I need a wheelchair van for my son. Help me get the word out. https://www.gofundme.com/wheelchair-van-for-jj
    9. Re:can't beat my doctor by tlhIngan · · Score: 1

      I don't remember the last time I saw a handwritten prescription.
      Every doctor I've been to in recent memory prints it out and signs it.

      Depends on the doctor, it seems. Some have a prescription pad and use it (and honestly, I've been able to read what it says). It's a quarter-letter sheet with the doctor's name preprinted, so there's a lot of space for the doctor to write the prescription in very big block letters. We're talking inch-high block printing (I haven't seen cursive in a long time).

      My other prescriptions were printed onto regular letter paper and signed by the doctor.

      I'm guessing the latter is more common now because the doctor has to enter our medication in a province-wide prescription database (called PharmaNet), so every drug ever prescribed to you is listed (to make sure you're not overprescribed, or to watch out for drug interactions). And since they're entering that information anyways, it's only a small stretch to have the software actually print out the prescription as it's entered in the database - single entry kills two birds with one stone and thus, efficient. I know this because even my dentist asks me if I'm still on the medication (they too need to know in case their drugs cause effects)

    10. Re:can't beat my doctor by Applehu+Akbar · · Score: 1

      I don't remember the last time I saw a handwritten prescription.

      We have a perfectly good standard for digital prescriptions, but most doctors here think that handwriting over fax is the latest tech they wish to use.

  3. Isn't Google's reCAPTCHA in this game? by Khopesh · · Score: 4, Interesting

    The reCAPTCHA service does two things. Verifying a user is a human by offering something that's really hard to automate is the one everybody knows about. The other is an effort to crowdsource understanding of images. This started with decoding the words in scanned books that OCR was having difficulty with.

    There's your competition (though it's admittedly restricted to modern texts, so historical context and historical characters are beyond its scope ... and reCAPTCHA has recently moved on to other forms of image recognition.)

    --
    Use my userscript to add story images to Slashdot. There's no going back.
  4. You call that "unreadable"? by Anonymous Coward · · Score: 1

    Try reading some "intellectual property" in the future!
    Hidden away in some corporate basement. Encrypted, with the key servers shut down long ago...
    Researchers complain that we already have the second dark ages[1], starting with the invention of "copyright"[2].

    THIS is unreadable.

    There was a time, where Germany started to be called "the land of poets and thinkers". It was the time when Germany didn't have such laws but the UK already had. Art thrived and flourished in Germany, and starved in the UK.[3]

    (Let's just hope our systems become powerful enough, the corporations don't live on forever, and they don't use one-time pads.)

    ___
    Note 1: Which is a term referring to the lack of information from that era.
    Note 2: Which should really be called "imaginary distribution monopoly privilege, for the purpose of leeching off of artists and fans without working for it in return".
    Note 3: And Germany still doesn't really have it. They have something that is often confused with copyright, but differs in all key points: It is not a distributor's privilege, but that of the actual creator of the work. It is implicit and not explicit, depending only on the threshold of originality, making (c) marks unnecessary. And it can never be signed away to anyone else. (You can license it, of course. But you can never lose control.) So all the things that copyright states it would do ... i.e. grant a privilege to the actual creators ... but deliberately doesn't.

  5. This is just asking by rsilvergun · · Score: 3, Funny

    to be devoured by some ancient evil or long dead civilization.

    --
    Hi! I make Firefox Plug-ins. Check 'em out @ https://addons.mozilla.org/en-US/firefox/addon/youtube-mp3-podcaster/
    1. Re: This is just asking by bestweasel · · Score: 1

      "Nearly there, it appears to be a prayer, no, more like a summoning. Just can't recognize this group of syllables which appears all over the manuscript. Ah, now I see! It's a name, Cthulhu, mighty Cthulhu."

      All done. Now, what does it sound like?
      "ph'nglui mglw'nafh Cthulhu R'lyeh wgah'nagl fhtagn ...".

  6. I'm guessing ... by fahrbot-bot · · Score: 1

    Where Old, Unreadable Documents Go to Be Understood

    ... Congress, for Bring Your Birth Certificate to Work Day ?

    --
    It must have been something you assimilated. . . .
  7. Gauntlet thrown by Anonymous Coward · · Score: 0

    “Some of the ones I find easier to read, the machine will probably be able to read sooner rather than later,” says Watson. “But anything slightly difficult and ... I’ve seen some documents done by the software, and they just make you laugh. I think I’m safe in my job for a good while yet.”

    Anyone up for this challenge?

  8. Yes, AI would work here by Anonymous Coward · · Score: 0

    This is actually a good application for machine learning. The problem might be finding a sufficient number of datasets for training. Often these people are taking advantage of other cues, like topic or surrounding words. Not sure we have that now for handwriting analysis.

    1. Re:Yes, AI would work here by Anonymous Coward · · Score: 0

      Yep. Context helps immensely for this kind of thing.
      You may not be able to tell what a funny squiggle is when looking at a word on its own, but the same squiggle might appear in a place name or an easy word elsewhere in the document.

  9. Wingdings by Presence+Eternal · · Score: 1

    For MS Works files, just use LibreOffice. Heaven knows Microsoft Office is too incompetently made to handle them.

  10. Try interpreting government legislation... by seniorcoder · · Score: 1

    The US tax code documents would seriously challenge Ms. Linda Watson.
    Not because you cannot read the actual words, more because you cannot understand their meaning.
    Same with most government documents from just about any government.

  11. Some handwriting styles have become illegible by nerdonamotorcycle · · Score: 4, Informative

    There are two handwriting styles in German that are pretty much illegible to modern readers. Sütterlin was taught in the '30s and '40s to people who are alive today, but in 20 years, very few people will be able to read it. I can kinda-sorta read it because my grandmother (b. 1898) wrote letters in it, and my father's (b. 1930) handwriting was this weird combination of Sütterlin and American-style Palmer. Kurrent is even older and was taught to German school children up through the early 20th century. Kurrent's letter forms are, however, closer to the Roman-style alphabet than Sütterlin's.

  12. Machine learning project doing it right now by Anonymous Coward · · Score: 0

    http://transkribus.eu

    You can download the expert client right now and test whether one of the models understands your document (even if it's a scan of a bad microfilm). If you have scanned in material and enough of it is transcribed, you can also train your own model (and they'll help you improve it). A web-based, simpler client is also under development for crowd-sourcing usage and you can try out a development build if you like.

  13. Even online data is not a panacea. by Anonymous Coward · · Score: 0

    My pharmacist GF recounts how many times they STILL have to call the doctor's office for clarification because the office cannot be bothered to fill out the prescription computer forms correctly or provide accurate, non-vague info.

  14. Great! by Brett+Buck · · Score: 1

    Now all we need is someone to decipher Word documents we wrote 2 weeks ago but no longer render properly.

  15. This is why I quit writing in cursive by Solandri · · Score: 1

    (This was back in the 1970s and 1980s, when schoolkids were still being taught cursive.) After considerable thought, I concluded that written text was a WORM operation (write-once read-many).

    Cursive saved time at the write stage (easier to write), at the cost of additional time at the read stage (harder to read). Since the write operation happened only once while the read operation could happen multiple times, I decided saving time at the write stage was usually not worth it - the cumulative extra time wasted at the read stage could easily exceed the time saved at the write stage. And I began writing exclusively in print letters in the 6th grade.
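
The write-once read-many trade-off above can be put into numbers. This is only a sketch: the function name `break_even_reads` and the example figures (seconds saved when writing, extra seconds per read) are hypothetical and do not come from the comment. It returns the smallest number of reads at which the cumulative reading penalty exceeds the one-time writing saving.

```python
import math


def break_even_reads(write_saving: float, read_penalty: float) -> int:
    """Smallest read count n with n * read_penalty > write_saving.

    write_saving: time saved once, at the write stage (assumed unit: seconds).
    read_penalty: extra time spent on every read (same unit).
    """
    if write_saving < 0 or read_penalty <= 0:
        raise ValueError("write_saving must be >= 0 and read_penalty > 0")
    # n must be strictly greater than write_saving / read_penalty.
    return math.floor(write_saving / read_penalty) + 1


if __name__ == "__main__":
    # Hypothetical figures: cursive saves 30 s once, costs 10 s per read.
    print(break_even_reads(30, 10))
```

With these made-up figures the saving is used up exactly at the third read and cursive is a net loss from the fourth read on; a note that is read only once or twice would still favor cursive.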

    1. Re:This is why I quit writing in cursive by Zappy · · Score: 1

      As a kid I never got the hang of it, writing cursive. I found it much easier, and thus faster, to write 'print' letters.

      Reading cursive, even when neatly written, takes great effort; sloppily written cursive might as well be Elvish and is completely unreadable.

      With some notes or letters I get, I can't recognise enough of the writing as letters or words to even understand what the subject is. Then my wife picks them up, starts to make fun of me, and reads them without effort.

    2. Re:This is why I quit writing in cursive by Anonymous Coward · · Score: 0

      It's a question of practice (both reading and writing). I went to school in the 80s-90s, uni in the mid 90s, so a lot of handwriting was involved. Back then I could write reasonably clearly and read most versions of cursive I came across, from cool old-school loopy/swirly stuff to the cramped output of the victims of various fad "new" styles of handwriting in the 70s and 80s. Today I struggle to read the clearest of handwriting and get a cramp in my thumb if I have to write (almost illegibly) for more than a few minutes.

  16. Entropy by Anonymous Coward · · Score: 0

    Long ago, I read a small book on Information & Entropy. It was my first (semi-) formal exposure to information theory and Shannon. The one interesting thing I recall seeing in the book was a table of various (major) languages along with their entropy. By that, the book meant redundancy. The author either said or implied that one of the reasons English was such a 'popular' language might be because it had, by far, the highest redundancy - iirc, German came second (but that I wouldn't swear to). There are a number of comments on /. questioning why this was posted. I see their point, but deciphering a written document is no different than decrypting, is it? Anyway. I wonder how many alternative interpretations her organization usually gives a client. I'd guess only their "best" one - since otherwise her claim that there've been only two that stumped her would be meaningless. And if that's true, then the whole enterprise is like getting a reading on your tea leaves. The idea that there can be only one works in Highlander but not in the real world. Ni wa, ts ntrstng.
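
For readers who want to see how entropy and redundancy relate, here is a rough sketch. It uses the standard definition of redundancy as 1 - H / H_max, where H is the Shannon entropy per letter and H_max is log2 of the alphabet size. The function names and the default alphabet size of 26 are assumptions made for illustration, and the estimate only looks at single-letter frequencies, so it understates the redundancy that comes from spelling, grammar and context.

```python
import math
from collections import Counter


def unigram_entropy(text: str) -> float:
    """Shannon entropy in bits per letter, from single-letter frequencies."""
    letters = [c for c in text.lower() if c.isalpha()]
    if not letters:
        raise ValueError("text contains no letters")
    total = len(letters)
    counts = Counter(letters)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def redundancy(text: str, alphabet_size: int = 26) -> float:
    """Redundancy 1 - H / H_max, with H_max = log2(alphabet_size)."""
    return 1.0 - unigram_entropy(text) / math.log2(alphabet_size)


if __name__ == "__main__":
    sample = "deciphering a written document is no different than decrypting"
    print(f"entropy: {unigram_entropy(sample):.2f} bits per letter")
    print(f"redundancy: {redundancy(sample):.2f}")
```

A short sample like this gives a noisy estimate; comparing languages in any meaningful way would need large corpora and a model that accounts for context, which this sketch does not attempt.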