Where Old, Unreadable Documents Go to Be Understood (atlasobscura.com)
From a report: On any given day, from her home on the Isle of Man, Linda Watson might be reading a handwritten letter from one Confederate soldier to another, or a list of convicts transported to Australia. Or perhaps she is reading a will, a brief from a long-forgotten legal case, an original Jane Austen manuscript. Whatever is in them, these documents made their way to her because they have one thing in common: They're close to impossible to read. Watson's company, Transcription Services, has a rare specialty -- transcribing historical documents that stump average readers. Once, while talking to a client, she found the perfect way to sum up her skills.
[...] Since she first started specializing in old documents, Watson has expanded beyond things written in English. She now has a stable of collaborators who can tackle manuscripts in Latin, German, Spanish, and more. She can only remember two instances that left her and her colleagues stumped. One was a Tibetan manuscript, and she couldn't find anyone who knew the alphabet. The other was in such bad shape that she had to admit defeat. In the business of reading old documents, Watson has few competitors. There is one transcription company on the other side of the world, in Australia, that offers a similar service. Libraries and archives, when they have a giant batch of handwritten documents to deal with, might recruit volunteers.
I assume this is on /. because one day AI might be able to do what she does better? Or will we have been annihilated by our AI overlords before that time?
I'd want to see this lady decipher the scribbling of a doctor I visited with foot pain recently. There's the Voynich Manuscript, then there's this.
The creatures outside looked from Alt-Right to Antifa; but already it was impossible to say which was which.
The reCAPTCHA service does two things. Verifying a user is a human by offering something that's really hard to automate is the one everybody knows about. The other is an effort to crowdsource understanding of images. This started with decoding the words in scanned books that OCR was having difficulty with.
There's your competition (though it's admittedly restricted to modern texts, so historical context and historical characters are beyond its scope ... and reCAPTCHA has recently moved on to other forms of image recognition.)
Try reading some "intellectual property" in the future!
Hidden away in some corporate basement. Encrypted, with the key servers shut down long ago...
Researchers complain that we already have the second dark ages[1], starting with the invention of "copyright"[2].
THIS is unreadable.
There was a time when Germany started to be called "the land of poets and thinkers". It was the time when Germany didn't have such laws but the UK already did. Art thrived and flourished in Germany, and starved in the UK.[3]
(Let's just hope our systems become powerful enough, the corporations don't live on forever, and they don't use one-time pads.)
___
Note 1: Which is a term referring to the lack of information from that era.
Note 2: Which should really be called "imaginary distribution monopoly privilege, for the purpose of leeching off of artists and fans without working for it in return".
Note 3: And Germany still doesn't really have it. They have something that is often confused with copyright, but differs in all key points: It is not a distributor's privilege, but that of the actual creator of the work. It is implicit and not explicit, depending only on the threshold of originality, making (c) marks unnecessary. And it can never be signed away to anyone else. (You can license it, of course. But you can never lose control.) So all the things that copyright states it would do ... i.e. grant a privilege to the actual creators ... but deliberately doesn't.
to be devoured by some ancient evil or long dead civilization.
It must have been something you assimilated. . . .
“Some of the ones I find easier to read, the machine will probably be able to read sooner rather than later,” says Watson. “But anything slightly difficult and I’ve seen some documents done by the software, and they just make you laugh. I think I’m safe in my job for a good while yet.”
Anyone up for this challenge?
This is actually a good application for machine learning. The problem might be finding a sufficient number of datasets for training. Often these people are taking advantage of other cues, like topic or surrounding words. Not sure we have that now for handwriting analysis.
For MS Works files, just use LibreOffice. Heaven knows Microsoft Office is too incompetently made to handle them.
The US tax code documents would seriously challenge Ms. Linda Watson.
Not because you cannot read the actual words, more because you cannot understand their meaning.
Same with most government documents from just about any government.
There are two handwriting styles in German that are pretty much illegible to modern readers. Sütterlin was taught in the '30s and '40s to people who are alive today, but in 20 years, very few people will be able to read it. I can kinda-sorta read it because my grandmother (b. 1898) wrote letters in it, and my father's (b. 1930) handwriting was this weird combination of Sütterlin and American-style Palmer. Kurrent is even older and was taught to German schoolchildren up through the early 20th century. Kurrent's letter forms are, however, closer to the Roman-style alphabet than Sütterlin's.
http://transkribus.eu
You can download the expert client right now and test whether one of the models understands your document (even if it's a scan of a bad microfilm). If you have scanned material and enough of it is transcribed, you can also train your own model (and they'll help you improve it). A simpler, web-based client is also under development for crowdsourcing, and you can try out a development build if you like.
My pharmacist GF recounts how many times they STILL have to call the doctor's office for clarification because the doctors cannot be bothered to fill out the prescription computer forms correctly or provide accurate, non-vague info.
Now all we need is someone to decipher Word documents we wrote 2 weeks ago but no longer render properly.
(This was back in the 1970s and 1980s, when schoolkids were still being taught cursive.) After considerable thought, I concluded that written text was a WORM operation (write-once read-many).
Cursive saved time at the write stage (easier to write), at the cost of additional time at the read stage (harder to read). Since the write operation happened only once while the read operation could happen multiple times, I decided saving time at the write stage was usually not worth it: the cumulative extra time wasted at the read stage could easily exceed the time saved at the write stage. And I began writing exclusively in print letters in the 6th grade.
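To put rough numbers on the write-once/read-many argument, here is a minimal Python sketch of the break-even point. The function name and the timings in the usage line are made up for illustration; nothing here was measured.

```python
import math


def break_even_reads(write_saving_s: float, read_penalty_s: float) -> int:
    """Smallest number of reads at which the cumulative read penalty
    strictly exceeds the one-time saving at the write stage.

    write_saving_s: seconds saved once by writing in cursive (assumed value).
    read_penalty_s: extra seconds each reader spends deciphering it (assumed value).
    """
    if write_saving_s < 0 or read_penalty_s <= 0:
        raise ValueError("saving must be >= 0 and penalty must be > 0")
    # n * penalty > saving  <=>  n > saving / penalty
    return math.floor(write_saving_s / read_penalty_s) + 1


# Hypothetical figures: 60 s saved writing, 20 s lost per read.
print(break_even_reads(60, 20))
```

With those hypothetical figures, three reads only break even and the fourth read puts cursive behind, so any document read more than a handful of times favors print.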
Long ago, I read a small book on Information & Entropy. It was my first (semi-) formal exposure to information theory and Shannon. The one interesting thing I recall seeing in the book was a table of various (major) languages along with their entropy. By that, the book meant redundancy. The author either said or implied that one of the reasons English was such a 'popular' language might be that it had, by far, the highest redundancy - iirc, German came second (but that I wouldn't swear to). There are a number of comments on /. questioning why this was posted. I see their point, but deciphering a written document is no different than decrypting, is it? Anyway. I wonder how many alternative interpretations her organization usually gives a client. I'd guess only their "best" one - since otherwise her claim that there've been only two that stumped her would be meaningless. And if that's true, then the whole enterprise is like getting a reading on your tea leaves. The idea that there can be only one works in Highlander but not in the real world. Ni wa, ts ntrstng.
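For anyone who wants to play with the redundancy idea, here is a rough Python sketch. It is an assumption-laden simplification: it only looks at single-letter frequencies (a first-order estimate), whereas published per-language figures account for longer context, so it will understate redundancy and says nothing about which language ranks highest. The function names and the 26-letter alphabet default are illustrative choices, not anything from the book.

```python
import math
from collections import Counter


def char_entropy(text: str) -> float:
    """First-order Shannon entropy, in bits per letter, of the letters in text."""
    letters = [c for c in text.lower() if c.isalpha()]
    if not letters:
        raise ValueError("text contains no letters")
    total = len(letters)
    counts = Counter(letters)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def redundancy(text: str, alphabet_size: int = 26) -> float:
    """Redundancy R = 1 - H / H_max, where H_max = log2(alphabet_size).

    alphabet_size=26 assumes plain English letters; pass a different
    value for other alphabets.
    """
    return 1.0 - char_entropy(text) / math.log2(alphabet_size)


sample = "deciphering a written document is no different than decrypting"
print(round(char_entropy(sample), 2), round(redundancy(sample), 2))
```

A redundancy near 0 means every letter carries close to the maximum information; the higher it gets, the more of the text (vowels, say) a reader can reconstruct from what remains.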