War and Nookd — eBook Regex Gone Haywire
PerlJedi tips a story that highlights one of the downsides to ebooks. A blogger who recently read Tolstoy's War and Peace on his Nook stumbled upon some odd phrases, such as: "It was as if a light had been Nookd in a carved and painted lantern..." After seeing the word 'Nookd' a few more times, he found a dead-tree version of the book and discovered that the word was supposed to be 'kindled.' Every instance of the word 'kindle' in the ebook had been replaced with 'Nook.'
"The Superior Formatting Publishing version isn’t a Barnes and Noble book, so this isn’t the work of a rogue Nook marketer from B&N. Rather, it’s likely that Superior Formatting Publishing ported its Kindle version of War and Peace over to the Nook — doing a search and replace to make sure that any Kindle references they’d inserted, such as in the advertising at the end of the book about their fine Kindle products, were simply changed to Nook. The unwitting hilarity of a publisher doing a 'find and replace' and accidentally changing the text of a canonical work of Western thought is alarming. Many versions of e-books are from similar outfits, that distribute public domain works formatted for Kindle or Nook at the lowest possible prices. The great democratizing factor of the ebook formats – that anyone can easily distribute – can also mean that readers can never be quite sure that they are viewing the texts as the author intended."
But I went back and searched every kindle and cranny to set every instance of the word back to kindle to fix it.
I'm only human.
My work here is dung.
sed -i s/wand/wang/g Harry\ Potter*
Don't blame me, I voted for Kodos
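For anyone wondering how a port ends up with "Nookd": here is a minimal Python sketch of the kind of search and replace the summary speculates about. Nobody outside the publisher knows what tool or pattern was actually used, so `naive_port`, `careful_port`, and the advertising sentence are all made up for illustration. The assumption is simply that a case-insensitive substring replace was run over the whole file, and that anchoring on word boundaries and the capitalized brand name would have left Tolstoy alone.

```python
import re

# Sentence quoted in the story, with the original word restored.
sentence = "It was as if a light had been kindled in a carved and painted lantern"
# Hypothetical back-of-book advertising line (not taken from the actual ebook).
ad = "Browse our other fine Kindle products."

def naive_port(text):
    # Case-insensitive substring replace: hits "kindle" inside "kindled" too.
    return re.sub("kindle", "Nook", text, flags=re.IGNORECASE)

def careful_port(text):
    # Case-sensitive, whole-word match: only the capitalized brand name changes.
    return re.sub(r"\bKindle\b", "Nook", text)

print(naive_port(sentence))    # the verb gets mangled
print(careful_port(sentence))  # the novel's text is untouched
print(careful_port(ad))        # the ad copy is still updated
```

Even the careful version would rewrite a sentence that happens to start with the verb "Kindle", so limiting the replace to the publisher's own front and back matter is probably the safer design.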
So, this story is definitely an amusing anecdote, but I feel like TFA has the wrong takeaway. The fact is, while this specific issue is obviously e-book related, the overall problem of poor-quality, low-cost public domain publications is in no way specific to e-books. There have always been low-budget publishing houses that print poorly edited, poorly translated versions of public domain works. Spend some time digging around used book sales and you'll find an endless supply of these, most notably from the '60s and '70s.
No, the sad part is full-price books from Amazon with incoherent pagination, horribly over-recompressed JPEGs and a verdant sea of spelling errors. I'd give Project Gutenberg a pass for those sorts of things, except that the majority of PG books I've read are actually pretty well done.
When I'm paying top dollar for a product, I'd like some attempt at quality control....
Faster! Faster! Faster would be better!
But soft, what light through yonder Linux breaks?
It is the east, and Juliet is the Oracle(TM).
Arise, fair Oracle(TM), and kill the envious moon,
Who is already sick and pale with grief
That thou, her maid, art far more fair than she
There is a Wikipedia article about this issue:
http://en.wikipedia.org/wiki/Scunthorpe_problem
"The problem was named after an incident in 1996 in which AOL's dirty-word filter prevented residents of the town of Scunthorpe, North Lincolnshire, England from creating accounts with AOL, because the town's name contains the substring cunt.[1] Years later, Google's filters apparently made the same mistake, preventing residents from searching for local businesses that included Scunthorpe in their names.[2]"
There is also a stub article about a specific instance of the replacement effect: http://en.wikipedia.org/wiki/Medireview
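The filter failure the Scunthorpe article describes can be sketched in a few lines of Python. This is only an illustration: AOL's actual filter is not public, the one-word `BLOCKLIST` is a mild hypothetical stand-in, and both function names are invented here. The point is the difference between testing for a substring and testing for a whole word.

```python
import re

# Hypothetical one-word blocklist, used only for illustration.
BLOCKLIST = ["hell"]

def substring_filter(name):
    # Rejects any name that contains a blocked word anywhere inside it.
    lowered = name.lower()
    return any(word in lowered for word in BLOCKLIST)

def whole_word_filter(name):
    # Rejects only when a blocked word appears as a standalone word.
    return any(
        re.search(rf"\b{re.escape(word)}\b", name, flags=re.IGNORECASE)
        for word in BLOCKLIST
    )

print(substring_filter("Michelle"))     # innocent name is rejected
print(whole_word_filter("Michelle"))    # innocent name passes
print(whole_word_filter("go to hell"))  # standalone word is still caught
```

The whole-word check trades false positives for false negatives: it stops punishing innocent names, but it will miss a blocked word that is glued to other characters.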