E-Book Copy Protection, For What It's Worth
AudioBooksForFree.Com writes "WHSmith have challenged AudioBooksForFree.Com to break Microsoft Reader e-book protection. It just took 30 minutes." No, they didn't break the encryption; instead, this is just an application of the idea that it's very hard to make something which can be displayed but not copied.
Unfortunately, this method of "decryption" requires MS Reader to be installed on your system, which isn't possible when you're running Linux.
It's nice as a "proof of concept" (although it's by no means new - I have seen a program that gets the contents of MS Reader files more intelligently, by automatically copying-and-pasting every page), but it won't help you to read a .lit file on Linux.
I am a genius; therefore, you suck.
I used to work for a typesetting company on my industrial placement (internship in US terms), and we also produced SGML documents for another company who created audio versions of the files we supplied.
The previous placement student came in handy when the audio book company lost the master password to a whole archive of audio books: he cracked the protection and unlocked the affected files. The other company was run by friends of the management of our company, so there weren't any 'confidentiality agreements' or anything... but I dread to think how the current laws (which weren't implemented then) would have affected us there.
Are you local? There's nothing for you here!
Heh. I also bought that book. But you went through way too much work. The book allowed itself to be printed... heh. So all I did was install a print-to-file driver, and printed the whole thing to PostScript. Perfect copy. And it's simple to go from PostScript to PDF, HTML, or whatever.
For another answer to DRM garbage, Baen, publishers of sci-fi and fantasy books, have the 100% correct idea about eBook copy restriction and encryption:
Don't do it!
They just released the latest book in their Honor Harrington series on Tuesday, and it included a CD with various formats of eBooks of every book in that series and other books that they publish. And best of all, no stupid restrictions. Here's their release about the CD.
I applaud their move, and recommend purchasing this book and others from them (Note: I'm a big fan of the author, David Weber, but not involved with Baen in any way, etc...).
Baen Books, who are known on Slashdot for their Free Library, and who also offer their WebScriptions, both available in several formats including e-books, do not use encryption in the e-books they publish. Roughly, their argument is that it's costly, useless and unfair.
From the 6th Prime Palaver: The Library's track record shows clearly that the traditional "encryption/enforcement" policy which has been followed thus far by most of the publishing industry is just plain stupid, as well as unconscionable from the viewpoint of infringing on personal liberties. (...) the fundamental obstacle to the success of electronic publishing [is] the industry's obsession with encryption. I suggest you read the whole document; it's quite interesting.
On Mac OS X it would have been even easier, since it included print-to-PDF in the standard printing library. There's no step 3 :P
There was a post to abeb on 6/24/2002 entitled "Convert LIT to RTF: ACHIEVED":
- - - - -
Yes, I know, it's supposed to be impossible. Well, it takes some work, but
it's LESS work than scanning from paper, and you can get comparable if not better
results.
I am proud to report that I have successfully converted a Microsoft Reader LIT
format e-book into an HTML book. The book was "Uhura's Song", by Janet Kagan,
and I will post it when I finish editing.
No, I didn't crack the LIT format, or the encryption.
This method was designed to work with *encrypted* e-books; if it's non-encrypted,
a scripting method to copy and paste pages via the clipboard could work.
(Of course, if it's non-encrypted, it's probably easier to just locate the source
material that the LIT was generated from.)
A description of the process follows.
Short description:
Screencap each page of the LIT file into image files. Enhance and enlarge
the image files to improve results. Use OCR software to recognize the text
in the image file. Proof and edit.
Software used:
Windows 2000
Microsoft Reader 2.0 for PC
IrfanView version 3.70
Windows Script Components version 5.6
CuneiForm99
Capturing.wsf script (attached)
Detailed description:
Acquire your LIT book and all the software listed. (You can substitute a
different OCR package if you want, or a different screencap package if you hack
the script.)
Set your display settings to the highest resolution you can, BUT ONLY 256
COLORS. Keeping the color count low minimizes the nasty effects of Cleartype.
Open the book in Microsoft Reader, displaying page 1.
Start IrfanView. Do Options/Capture, selecting these options:
* Capture area: Foreground window - Client area
* Capture method: Hot key F11 (to set the hotkey, click inside the box and
  then press function key 11)
* Capture option: Include mouse cursor (leave unchecked)
* Saving method: Save captured image as file
* Destination directory: (type your desired directory)
* Save as: (Any LOSSLESS type you want. I suggest PNG because it's generally
  smallest. DO NOT USE JPG.)
Click Start.
Start the script. Answer its questions (folder, starting & ending page
number). It will begin capturing pages from MS Reader. It will take up to 1.6
seconds per page, which would be at least 37 pages per minute.
When capturing is done, the script will notify you with a popup.
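The capture loop described above can be sketched roughly as follows. This is an illustrative Python sketch, not the original Capturing.wsf: the function name `capture_pages`, the `send_key` callback, and the use of Page Down ("PGDN") to turn pages are all assumptions. Only the F11 hotkey and the 1.6-second upper bound per page come from the description.

```python
import time
from typing import Callable


def capture_pages(first_page: int,
                  last_page: int,
                  send_key: Callable[[str], None],
                  delay: float = 1.6,
                  sleep: Callable[[float], None] = time.sleep) -> int:
    """Press the capture hotkey on every page, turning the page in between.

    send_key is a caller-supplied function that delivers a keystroke to the
    reader window (hypothetical; how keys are actually sent is up to you).
    Returns the number of pages captured.
    """
    if last_page < first_page:
        raise ValueError("last_page must not be smaller than first_page")

    captured = 0
    for page in range(first_page, last_page + 1):
        send_key("F11")        # hotkey configured in the IrfanView capture dialog
        sleep(delay)           # give the screenshot time to be written to disk
        captured += 1
        if page != last_page:
            send_key("PGDN")   # assumed page-turn key in the reader
    return captured


def estimated_seconds(pages: int, delay: float = 1.6) -> float:
    """Upper bound on total capture time for the given number of pages."""
    return pages * delay


if __name__ == "__main__":
    # Dry run: record the keystrokes instead of sending them anywhere.
    pressed = []
    count = capture_pages(1, 3, pressed.append, delay=0, sleep=lambda s: None)
    print(count, pressed)
```

Passing `send_key` in as a parameter keeps the loop testable without a reader window; a real run would plug in whatever keystroke-sending mechanism your scripting environment provides.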
Go back to Irfanview. Do the following to the files in your capture directory:
* batch rename, using a sensible template name (I used page###)
* batch process with the following Advanced options:
+ crop
This is needed to get the ebook title off the top, and the riffle slider
off the bottom. experiment with a single file to get the crop
dimensions. On my project, the original size was 808x1078; my crop
settings were Xstart 70 width 700 Ystart 70 height 910. Note:
Irfanview has a bug in the batch processing dialog which ignores what
you type for starting Y-coord and uses the same as the starting X-coord.
So set them the same and work from there.
+ Set DPI: 200.
Your OCR software may be different, but mine required that the DPI be
between 200-800. Your screencaps will not have a true DPI number so we
fake it here.
+ Resize: Set new size as percent of original: Width 200% Height 200% You can
experiment with larger resizes. Blowing up the images is absolutely
necessary for OCR software to work; the OCR software needs more pixels to
work with than a regular screencap can give it.
+ Convert to Grayscale
+ Brightness: -40
This gets rid of the pale yellow dotscreen pattern.
+ Contrast: +127
This maximum contrast enhancement converts almost all the grays to
black. You might want to experiment here too to get the best
recognition; I got a lot of recognition errors where "cl" was recognized
as "d". Less contrast might have improved that.
* a SECOND batch process with just this Advanced option:
+ Change color depth: 2 colors (Black/white) (1BPP)
(Do not try combining the batch processes!)
For each batch process, you'll need to either change the extension, change
the folder, or enable "Overwrite Existing Files" in Advanced options (which
I don't recommend).
At this point you have a folder full of b/w screencaps, with everything but the
actual text cropped out.
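For readers who want to see what the batch steps do to the pixels, here is a minimal pure-Python sketch of the same sequence (crop, 2x enlarge, grayscale, darken, boost contrast, reduce to black/white). The formulas are common textbook ones and are assumptions, not IrfanView's actual algorithms; in particular the contrast factor of 4.0 is only a stand-in for the "+127" setting, and all function names are made up for illustration.

```python
from typing import List, Tuple

RGBImage = List[List[Tuple[int, int, int]]]   # rows of (r, g, b) pixels
GrayImage = List[List[int]]                   # rows of 0..255 gray values


def clamp(value: float) -> int:
    """Round to the nearest integer and keep it inside 0..255."""
    return max(0, min(255, int(round(value))))


def crop(img, x: int, y: int, width: int, height: int):
    """Keep only the rectangle starting at (x, y) with the given size."""
    return [row[x:x + width] for row in img[y:y + height]]


def resize_nearest(img, factor: int):
    """Enlarge by an integer factor using nearest-neighbour duplication."""
    out = []
    for row in img:
        wide = [px for px in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def to_gray(img: RGBImage) -> GrayImage:
    """Convert RGB to gray with the usual luma weights."""
    return [[clamp(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row]
            for row in img]


def brightness(img: GrayImage, delta: int) -> GrayImage:
    """Add delta to every pixel (negative values darken)."""
    return [[clamp(p + delta) for p in row] for row in img]


def contrast(img: GrayImage, factor: float) -> GrayImage:
    """Stretch values away from mid-gray (128) by the given factor."""
    return [[clamp((p - 128) * factor + 128) for p in row] for row in img]


def threshold(img: GrayImage, cutoff: int = 128) -> GrayImage:
    """Reduce to two colors: 0 (black) below the cutoff, 255 (white) otherwise."""
    return [[0 if p < cutoff else 255 for p in row] for row in img]


def prepare_for_ocr(img: RGBImage, box: Tuple[int, int, int, int]) -> GrayImage:
    """Run the whole chain; box is (x_start, y_start, width, height)."""
    x, y, w, h = box
    step = crop(img, x, y, w, h)
    step = resize_nearest(step, 2)      # 200% in both directions
    step = to_gray(step)
    step = brightness(step, -40)
    step = contrast(step, 4.0)          # assumed stand-in for "+127"
    return threshold(step)
```

With the crop settings quoted above, the call would be `prepare_for_ocr(image, (70, 70, 700, 910))`. The sketch runs everything in one pass for brevity; it says nothing about why IrfanView needs the color-depth change in a second batch.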
Go into CuneiForm99's Batch Recognition Utility and set it up to recognize all
the images in the folder. (Remember to only put the b/w ones in the batch.)
At the end of the job wizard, go into Recognition options. On the Recognition
tab, clear ALL the checkboxes under Recognition parameters; on the Format tab,
you probably want to uncheck "Font Size" and leave "Italic", "Bold", and
"Paragraph" checked. Now click OK.
Start recognizing.
When you're done, you'll have an RTF that is at least as good as a raw scan of
a paper book. Go proofread and edit it.
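Proofreading can be helped along by flagging likely instances of the "cl" read as "d" error mentioned earlier. The following is a small sketch under stated assumptions: `suspect_cl_words` is a hypothetical helper, and `known_words` stands for whatever word list you have on hand. It only suggests candidates; it does not change the text.

```python
import re
from typing import Iterable, List, Tuple


def suspect_cl_words(text: str,
                     known_words: Iterable[str]) -> List[Tuple[str, str]]:
    """Find unknown words that become known when one "d" is read as "cl".

    Returns (word_as_recognized, suggested_word) pairs in order of appearance.
    """
    known = {w.lower() for w in known_words}
    suggestions = []
    for word in re.findall(r"[A-Za-z]+", text):
        lower = word.lower()
        if lower in known or "d" not in lower:
            continue                      # already a word, or nothing to swap
        for i, ch in enumerate(lower):
            if ch != "d":
                continue
            candidate = lower[:i] + "cl" + lower[i + 1:]
            if candidate in known:
                suggestions.append((word, candidate))
                break                     # one suggestion per word is enough
    return suggestions


if __name__ == "__main__":
    words = ["the", "clock", "struck", "one", "as", "she", "closed", "door"]
    print(suspect_cl_words("The dock struck one as she dosed the door.", words))
```

Note the limitation: if the misread result is itself in the word list (a fuller dictionary would contain "dock"), it is not flagged, so this does not replace reading the text through.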
Someone else mentioned that Windows Media Player prevented screen copy. The reason for this is video overlay. Most graphics cards support overlays as a faster way of writing streams of changing video frames to the display without worrying about the actual window. If you turn graphics acceleration all the way down in WMP, I believe it will play directly to the player window rather than to an overlay, thereby allowing a capture, but most cards won't be able to keep up the same performance that way.
I was on some site looking at satellite images a few months ago (I think TerraServer) and they gave me the option of smaller images, or nice big images with copy protection (which required a plugin download to see them, though still right in the browser). I tried to capture the images then using PrtScrn and got logos of the copy protection with no sat image. It seemed likely that the window showed the logo, then they used video overlay for the actual images.
I wonder why makers of eBook readers don't use overlays in the same manner for this reason. I used MS Reader a while ago and it seemed to let specific titles allow or disallow printing, clipboard copy, and Save As functionality. If they also used overlays they would be much harder to defeat (though of course still not impossible). As it is, it would take less than an hour to automate PrtScrn, OCR/save, and pushing keystrokes to change to the next page. Images are nice, but MS Office XP includes nice OCR now so the tools are mostly at hand!
On the plus side, some of the old versions of RealPlayer allow print screen if you are at full screen.
In RealPlayer 7 and 8 for Windows, I can go to View > Preferences > Performance and turn off "Use optimized video display", and RealPlayer won't use an overlay.
Will I retire or break 10K?
I'm more worried that it took them "just 30 minutes" to find the damn thing....
If you have "Full window dragging" (or whatever it's called on your system) enabled, you can also grab a screenshot by hitting PRNTSCRN while you are dragging the RealPlayer window around. The image in the window switches from overlay to the standard video system while being dragged.
--DennyK
I've been emailing the guy who did this - he hadn't even *heard* of Palladium or the ridiculous laws proposed to close the analog hole. So all of his bold assertions about this stuff ALWAYS and FOREVER being ways to circumvent copy-protection are just so much ill-informed nonsense.
www.sjbaker.org
In order for a monitor to work, it must be viewable
I know that's a blinding flash of the obvious, but the author's point still stands. While you might no longer be able to do digital screen captures via PrintScreen or software, in the worst case you could still take a picture of the screen and OCR it.
He made an extremely good reminder to people that, so long as people are involved, encryption will ultimately fail on some level, because the end product MUST be decrypted for us to use.