War and Nookd — eBook Regex Gone Haywire
PerlJedi tips a story that highlights one of the downsides to ebooks. A blogger who recently read Tolstoy's War and Peace on his Nook stumbled upon some odd phrases, such as: "It was as if a light had been Nookd in a carved and painted lantern..." After seeing the word 'Nookd' a few more times, he found a dead-tree version of the book and discovered that the word was supposed to be 'kindled.' Every instance of the word 'kindle' in the ebook had been replaced with 'Nook.'
"The Superior Formatting Publishing version isn’t a Barnes and Noble book, so this isn’t the work of a rogue Nook marketer from B&N. Rather, it’s likely that Superior Formatting Publishing ported its Kindle version of War and Peace over to the Nook — doing a search and replace to make sure that any Kindle references they’d inserted, such as in the advertising at the end of the book about their fine Kindle products, were simply changed to Nook. The unwitting hilarity of a publisher doing a 'find and replace' and accidentally changing the text of a canonical work of Western thought is alarming. Many versions of e-books are from similar outfits, that distribute public domain works formatted for Kindle or Nook at the lowest possible prices. The great democratizing factor of the ebook formats – that anyone can easily distribute – can also mean that readers can never be quite sure that they are viewing the texts as the author intended."
But I went back and searched every kindle and cranny to set every instance of the word back to kindle to fix it.
I'm only human.
My work here is dung.
Tools such as diff and grep would probably amaze them.
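For what it's worth, here is a rough sketch of the diff idea in Python, using the standard library's difflib. It assumes the publisher still has the pre-conversion text to compare against; the function name `changed_lines` and the sample text are made up for illustration.

```python
import difflib


def changed_lines(reference: str, candidate: str) -> list:
    """Return only the removed ('-') and added ('+') lines between two texts."""
    diff = list(
        difflib.unified_diff(
            reference.splitlines(),
            candidate.splitlines(),
            fromfile="reference",
            tofile="candidate",
            lineterm="",
            n=0,  # no context lines, only the changes themselves
        )
    )
    # The first two lines are the '---'/'+++' file headers; skip them,
    # then drop the '@@' hunk markers.
    return [line for line in diff[2:] if line.startswith(("+", "-"))]


if __name__ == "__main__":
    reference = (
        "Chapter text before.\n"
        "It was as if a light had been kindled in a carved and painted lantern.\n"
        "Chapter text after."
    )
    # Hypothetical port: the blanket replacement described in the story.
    candidate = reference.replace("kindle", "Nook")
    for line in changed_lines(reference, candidate):
        print(line)
```

Anyone skimming that output before shipping would see that the replacement touched Tolstoy's sentence and not just the ad copy.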
"Here Lies Philip J. Fry, named for his uncle, to carry on his spirit"
"I accidentally Western Literature, is that bad?"
It's not just intentional malice you need to look out for but also just pure distilled stupidity.
the preceding comment is my own and in no way reflects the opinion of the Joint Chiefs of Staff
'eBook Regex Gone Haywire'
This is a straightforward substring replace, not a regular expression. A not-completely-stupid regex would at least have only converted \bKindle\b, although obviously even then human oversight would be necessary.
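To make the difference concrete, here is a minimal Python sketch. It assumes the original replacement ignored case (which would explain "kindled" turning into "Nookd"); nobody outside the publisher knows what tool was actually used, and the function names and the ad sentence in the sample are invented for the example.

```python
import re

SAMPLE = (
    "It was as if a light had been kindled in a carved and painted lantern. "
    "Browse our other Kindle books."
)


def naive_port(text: str) -> str:
    """Case-insensitive substring replace: hits 'kindled' as well as 'Kindle'."""
    return re.sub("kindle", "Nook", text, flags=re.IGNORECASE)


def bounded_port(text: str) -> str:
    """Replace only the capitalised whole word 'Kindle'."""
    return re.sub(r"\bKindle\b", "Nook", text)


if __name__ == "__main__":
    print(naive_port(SAMPLE))
    print(bounded_port(SAMPLE))
```

The bounded version leaves "kindled" alone, but it would still rewrite a sentence that happens to begin with the verb "Kindle", so a human still has to read the results.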
Spelling mistakes, grammatical errors, and stupid comments are intentional.
You could say it's downright medireview.
sic transit gloria mundi
So, this story is definitely an amusing anecdote, but I feel like TFA has the wrong takeaway. The fact is, while this specific issue is obviously e-book related, the overall problem of poor quality, low cost public domain publications is in no way specific to e-books. There have always been low budget publishing houses that print poorly edited, poorly translated versions of public domain works. Spend some time digging around used book sales and you'll find an endless supply of these, most notably from the '60s and '70s.
sed -i s/wand/wang/g Harry\ Potter*
Don't blame me, I voted for Kodos
Unless it is in Russian. Any translation runs the risk of not being "as the author intended".
Starships were meant to fly, Hands up and touch the sky - Nicki Minaj
They really shouldn't mess with the clbuttics.
:wq
Part of the problem is the grotesque need to put advertisement inside everything we do, because sweet Jebus help me if we can't find some way to squeeze another penny of profit off a dead author's moldering corpse. Sadly, this problem isn't going away any time soon. How about this, separate the "Work of Art" from the annoying bits. Literally have them be distinct and separate objects. Leave the art alone. Do not touch it. Keep your grubby mitts off my masterpiece you heathen. Dork with your part as much as you like... it is after all your part. This is about sloppy data management and publishers need to begin to understand the nature of data. That is, if they intend to sell books in an electronic format. All you publishers, please have a brief but productive conversation with a few software and IT folk about how you manage data integrity, and ensure your product doesn't A) Get stepped on by stupid stuff B) Get corrupted by lack of proper data safeguards.
The rest as they say, is business as usual... please proceed, nothing to see here.
Just more of the same clbuttic errors.
(Hint: "ass" was one of the 13 words.)
" can also mean that readers can never be quite sure that they are viewing the texts as the author intended."
As an owner of a publishing company I can assure you the author's intentions are almost never the highest priority. Having read thousands of unedited manuscripts, many by very well known modern authors, I can say with confidence that you don't want to know what the authors originally pooped out.
I once saw the same issue when a db dump was edited. A user 'bend' was replaced with 'ainsleyj' globally - hilarity ensued.
But soft, what light through yonder Linux breaks?
It is the east, and Juliet is the Oracle(TM).
Arise, fair Oracle(TM), and kill the envious moon,
Who is already sick and pale with grief
That thou, her maid, art far more fair than she
Has anybody ever been introduced to the wonderful world of the truly dreadful unauthorized variants of canonical texts that were being hacked out while the ink on those texts was barely dry?
.99 public domain cash-ins are largely schlock (Project Gutenberg isn't world-class critical editions, but they do at least tend to be produced by people who give a damn and aren't just grubbing for cash by releasing quick and dirty repackages); but the quality of the low end of the market for printed works has always been pretty dire. At least, these days, we don't generally see physical problems like crap ink, blunt, used type, or horrid paper stock also being inflicted on the readers in the cheap seats.
Actors and/or audience members cobbling their (often surprisingly good; but not good enough) memory of a new work of Shakespeare into a cut-price unauthorized edition, some really trippy stuff in those versions... Hack printers buying first editions and setting blunt type as fast and furious as they could, to get their knockoff on the street before the other guy did... Never mind the various editorial mistakes in subsequent prints, bowdlerizations, etc.
Of course, works that started as oral traditions or assembled-by-committee mashes of existing texts are far worse than even the worst horrors of post-Gutenberg hackery. Oh, and let's not even talk about the dark history of situations where translation has been needed...
There's a whole industry, in academia, of 'critical editions' that are distinguished in no small part by the editor actually giving a damn about the sources drawn from, attempting to provide the most accurate reproduction of the original, essays and footnotes illuminating the process of choosing between manuscript A and manuscript B, and how to transliterate manuscript C's character names, and whatnot.
Sure,
Every novel should have an MD5 hash....
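A quick sketch of the checksum idea in Python. Note the swap: it uses SHA-256 instead of MD5, since MD5 is no longer considered collision-resistant, though `hashlib.md5` would catch an accidental search-and-replace just as well. The helper name `text_digest` is made up, and the whole thing assumes someone publishes a trusted digest for the exact edition and encoding you are checking.

```python
import hashlib


def text_digest(text: str) -> str:
    """Hex SHA-256 digest of the text, encoded as UTF-8."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    original = "It was as if a light had been kindled in a carved and painted lantern."
    # Hypothetical port: the blanket replacement described in the story.
    ported = original.replace("kindle", "Nook")

    print(text_digest(original))
    print(text_digest(ported))
    print("unchanged" if text_digest(original) == text_digest(ported) else "text was altered")
```

A mismatch only tells you that something changed, not what or where, and every translation, edition, or whitespace cleanup would need its own reference digest.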
It's a dangerous world of low cost ebooks out here
Nah, some of the expensive ebooks are worse; I've seen a number of people complain about e-books of recent high-priced novels where they've clearly OCR-ed the print book rather than use the actual digital text it was created from, because it's full of uncorrected OCR errors or 'corrections' to the OCR errors which are even further from what the text should say.
Cheap, crummy ebook conversions with no editorial checking. This has been going on for years, and it will continue to be a problem for the foreseeable future.
A physical book is costly to produce. It's costly to stock and ship them as well. Given those costs, the additional cost of doing a little editing is insignificant. Ebooks, on the other hand, open up new depths of low cost publishing. It's one of those perverse, ironic results. You'd think that cutting down the reproduction and stocking costs of a book would free up money for other tasks, but in fact what happens is that editing, design and promotion become an opportunity for cutting what is now a more significant proportion of expenses.
As ebooks become the dominant form of book reading, the opportunity arises for marginal publishers to publish books with expenses cut to the bone. Eventually the role of publishers as mediators between the author and public will disappear, and authors will hire editors, story development consultants and designers themselves. Or perhaps literary agents will take the place of traditional publishers, becoming full service business management services for authors. In any case, expect a greater proportion of "published" books to be poorly designed and edited.
Post may contain irony: discontinue use if experiencing mood swings, nausea or elevated blood pressure.
There is a Wikipedia article about this issue:
http://en.wikipedia.org/wiki/Scunthorpe_problem
"The problem was named after an incident in 1996 in which AOL's dirty-word filter prevented residents of the town of Scunthorpe, North Lincolnshire, England from creating accounts with AOL, because the town's name contains the substring cunt.[1] Years later, Google's filters apparently made the same mistake, preventing residents from searching for local businesses that included Scunthorpe in their names.[2]"
There is also a stub article about a specific instance of the replacement effect: http://en.wikipedia.org/wiki/Medireview
"Dead tree version"? Really? Is that kind of asshole-ish snark really justified? If you want to read an Amazon-brand Shakespeare-flavored Licensed Advertisement-Delivery System (tm), go right ahead, but there's no reason to poke fun at actual books, which are significantly less likely to have these kinds of glaring mistakes in them.
I don't respond to AC's.
"Superior Formatting Publishing"'s web site is broken. It consists mostly of "Whoops, looks like there was a problem get the book data from Amazon. Please try again in a moment" and "Amazon API error". Plus a Kindle ad. And "All of our e-books are formatted specifically for the Kindle by an expert in formatting online content using only raw code."
How do we know what the author's intentions are, especially for works whose author has been dead for at least 70 years?
If the author's intentions are not obvious from the text, then you're no better off reading it in the original Russian.
"Convictions are more dangerous enemies of truth than lies."
You do realize that you can actually post the word "nigga" on slashdot, right?
apparently AKabral is one of many avatars of Ironyman.
oh, and the word being referred to is nigger
the preceding comment is my own and in no way reflects the opinion of the Joint Chiefs of Staff