War and Nookd — eBook Regex Gone Haywire
PerlJedi tips a story that highlights one of the downsides to ebooks. A blogger who recently read Tolstoy's War and Peace on his Nook stumbled upon some odd phrases, such as: "It was as if a light had been Nookd in a carved and painted lantern..." After seeing the word 'Nookd' a few more times, he found a dead-tree version of the book and discovered that the word was supposed to be 'kindled.' Every instance of the word 'kindle' in the ebook had been replaced with 'Nook.'
"The Superior Formatting Publishing version isn’t a Barnes and Noble book, so this isn’t the work of a rogue Nook marketer from B&N. Rather, it’s likely that Superior Formatting Publishing ported its Kindle version of War and Peace over to the Nook — doing a search and replace to make sure that any Kindle references they’d inserted, such as in the advertising at the end of the book about their fine Kindle products, were simply changed to Nook. The unwitting hilarity of a publisher doing a 'find and replace' and accidentally changing the text of a canonical work of Western thought is alarming. Many versions of e-books are from similar outfits, that distribute public domain works formatted for Kindle or Nook at the lowest possible prices. The great democratizing factor of the ebook formats – that anyone can easily distribute – can also mean that readers can never be quite sure that they are viewing the texts as the author intended."
But I went back and searched every kindle and cranny to set every instance of the word back to kindle to fix it.
I'm only human.
My work here is dung.
Tools as amazing as diff and grep would probably astonish them.
"Here Lies Philip J. Fry, named for his uncle, to carry on his spirit"
"I accidentally Western Literature, is that bad?"
It's not just intentional malice you need to look out for but also just pure distilled stupidity.
the preceding comment is my own and in no way reflects the opinion of the Joint Chiefs of Staff
'eBook Regex Gone Haywire'
This is a straightforward substring replace, not a regular expression. A not-completely-stupid regex would at least have converted only \bKindle\b, although obviously even then human oversight would be necessary.
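For anyone who wants to see the difference, here is a minimal sketch in Python. It assumes the publisher ran something like a case-insensitive substring replace; nobody outside the publisher knows the actual tool or command, and the function names and the sample ad sentence are made up for illustration.

```python
import re

def naive_replace(text: str) -> str:
    # Assumed failure mode: case-insensitive substring replace,
    # so the "kindle" inside "kindled" is rewritten too.
    return re.sub("kindle", "Nook", text, flags=re.IGNORECASE)

def bounded_replace(text: str) -> str:
    # \b is a word boundary, so only the standalone word "Kindle"
    # matches; "kindled" has no boundary between "e" and "d".
    return re.sub(r"\bKindle\b", "Nook", text)

sample = ("It was as if a light had been kindled in a carved and "
          "painted lantern. See our other Kindle editions.")

print(naive_replace(sample))    # Tolstoy's sentence is damaged
print(bounded_replace(sample))  # only the ad copy changes
```

Even the bounded version would rewrite a legitimate standalone "Kindle" in the body text, which is why human review is still needed.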
Spelling mistakes, grammatical errors, and stupid comments are intentional.
You could say it's downright medireview.
sic transit gloria mundi
sed -i s/wand/wang/g Harry\ Potter*
Don't blame me, I voted for Kodos
Unless it is in Russian. Any translation runs the risk of not being "as the author intended".
Starships were meant to fly, Hands up and touch the sky - Nicki Minaj
So, this story is definitely an amusing anecdote, but I feel like TFA has the wrong takeaway. The fact is, while this specific issue is obviously e-book related, the overall problem of poor-quality, low-cost public domain publications is in no way specific to e-books. There have always been low-budget publishing houses that print poorly edited, poorly translated versions of public domain works. Spend some time digging around used book sales and you'll find an endless supply of these, most notably from the '60s and '70s.
No, the sad part is full-price books from Amazon with incoherent pagination, horribly over-recompressed JPEGs, and a verdant sea of spelling errors. I'd give Project Gutenberg a pass for those sorts of things except that the majority of PG books I've read are actually pretty well done.
When I'm paying top dollar for a product, I'd like some attempt at quality control....
Faster! Faster! Faster would be better!
They really shouldn't mess with the clbuttics.
:wq
Just more of the same clbuttic errors.
(Hint: "ass" was one of the 13 words.)
But soft, what light through yonder Linux breaks?
It is the east, and Juliet is the Oracle(TM).
Arise, fair Oracle(TM), and kill the envious moon,
Who is already sick and pale with grief
That thou, her maid, art far more fair than she
Has anybody ever been introduced to the wonderful world of the truly dreadful unauthorized variants of canonical texts that were being hacked out while the ink on those texts was barely dry?
$.99 public domain cash-ins are largely schlock (Project Gutenberg isn't world-class critical editions, but they do at least tend to be produced by people who give a damn and aren't just grubbing for cash by releasing quick and dirty repackages), but the quality of the low end of the market for printed works has always been pretty dire. At least, these days, we don't generally see physical problems like crap ink, blunt, used type, or horrid paper stock also being inflicted on the readers in the cheap seats.
Actors and/or audience members cobbling their (often surprisingly good, but not good enough) memory of a new work of Shakespeare into a cut-price unauthorized edition, with some really trippy stuff in those versions... Hack printers buying first editions and setting blunt type as fast and furious as they could, to get their knockoff on the street before the other guy did... Never mind the various editorial mistakes in subsequent prints, bowdlerizations, etc.
Of course, works that started as oral traditions or assembled-by-committee mashes of existing texts are far worse than even the worst horrors of post-Gutenberg hackery. Oh, and let's not even talk about the dark history of situations where translation has been needed...
There's a whole industry, in academia, of 'critical editions' that are distinguished in no small part by the editor actually giving a damn about the sources drawn from, attempting to provide the most accurate reproduction of the original, essays and footnotes illuminating the process of choosing between manuscript A and manuscript B, and how to transliterate manuscript C's character names, and whatnot.
Sure,
I find that paying top dollar is when you are least likely to get quality control. Look at really expensive software as a great example: I have never seen any costing six figures or more that was not a huge pain and did not fail to do its job on a regular basis.
Every novel should have an MD5 hash....
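As a rough sketch of the idea, assuming you already have a digest published by a source you trust, Python's standard hashlib makes the check a few lines. The function name and the sample strings here are hypothetical, and any difference in edition, translation, or even whitespace changes the digest, so this only tells you the bytes differ, not why.

```python
import hashlib

def text_fingerprint(text: str) -> str:
    # Hex MD5 digest of the text encoded as UTF-8.
    # MD5 is fine for spotting accidental changes but is not
    # collision-resistant; hashlib.sha256 is the safer choice
    # if deliberate tampering is a concern.
    return hashlib.md5(text.encode("utf-8")).hexdigest()

original = "It was as if a light had been kindled in a lantern."
altered = "It was as if a light had been Nookd in a lantern."

# A single replaced word yields a different fingerprint.
print(text_fingerprint(original) == text_fingerprint(altered))
```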
It's a dangerous world of low cost ebooks out here
Nah, some of the expensive ebooks are worse; I've seen a number of people complain about e-books of recent high-priced novels where they've clearly OCR-ed the print book rather than use the actual digital text it was created from, because it's full of uncorrected OCR errors or 'corrections' to the OCR errors which are even further from what the text should say.
This is the problem exactly. I can deal with odd formatting from a PG book (though as you say, most are fine); what pisses me off is recent, full price ebooks where there has obviously not been the slightest attempt at editing or typesetting. One I got recently had a consistent problem where quoted text changed font & size after the first paragraph, which is pretty jarring. A full price book on my Nook should be a better experience than PG or scanned & OCR'd pdb were on my old Palm Pilot but sometimes these types of glitches just take you out of the experience & actually seem worse.
The Oatmeal's book "5 Very Good Reasons to Punch a Dolphin in the Mouth" I luckily got out of the library (through Overdrive) - the images are so small as to be unreadable, both on the PC & iPad. If you look at the Play store, there are lots of good reviews, but they're all from Goodreads & such for the paper version. I'm sure it's funny, if you can read it; if I'd paid money for this pile of bits I'd be pissed. Does the publisher not own an iPad or a Kindle Fire? Did they not load it on one single device & say to themselves, "hmm, this really sucks, let's fix it"?
There is a Wikipedia article about this issue:
http://en.wikipedia.org/wiki/Scunthorpe_problem
"The problem was named after an incident in 1996 in which AOL's dirty-word filter prevented residents of the town of Scunthorpe, North Lincolnshire, England from creating accounts with AOL, because the town's name contains the substring cunt.[1] Years later, Google's filters apparently made the same mistake, preventing residents from searching for local businesses that included Scunthorpe in their names.[2]"
There is also a stub article about a specific instance of the replacement effect: http://en.wikipedia.org/wiki/Medireview
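The filtering mistake those articles describe can be sketched like this. It is an illustration only: the one-word blocklist (borrowed from the "clbuttic" hint upthread) and both function names are hypothetical, and none of this is the code any real filter used.

```python
import re

# Hypothetical blocklist for illustration.
BLOCKLIST = {"ass"}

def flagged_by_substring(text: str, blocklist=BLOCKLIST) -> bool:
    # The Scunthorpe-style mistake: flag the text if a blocked
    # word appears anywhere, even inside a longer, innocent word.
    lowered = text.lower()
    return any(word in lowered for word in blocklist)

def flagged_by_whole_word(text: str, blocklist=BLOCKLIST) -> bool:
    # Split into whole words first, then compare each word.
    tokens = re.findall(r"[a-z']+", text.lower())
    return any(token in blocklist for token in tokens)

print(flagged_by_substring("the classics"))   # innocent word is flagged
print(flagged_by_whole_word("the classics"))  # innocent word passes
```

Whole-word matching avoids this particular false positive, but it is still easy to evade and can still misfire, so it does not replace human review.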
You do realize that you can actually post the word "nigga" on slashdot, right?
apparently AKabral is one of many avatars of Ironyman.
oh, and the word being referred to is nigger