Breaking Google's DRM
An anonymous reader writes "Google's new Google Print service (that lets you see scanned pages from printed books) has a pile of advanced browser-disabling DRM in it ('Pages displaying your content have print, cut, copy, and save functionality disabled in order to protect your content.'). This works with JavaScript turned off, even in Free Software browsers. Seth Schoen has posted preliminary notes on some breaks to the DRM (beyond just automating a screenshotting process), including a proposal for a circumventing proxy that would fetch Google Print pages and strip out the DRM. A full exploration of the html obfuscation and DRM employed by Google would be very interesting; certainly the ability for a remote attacker to disable critical browser features like save, right-click, copy and cut against the user's wishes is a major security vulnerability in Moz/Firefox and should be fixed ASAP."
Searching google on book titles returns a Print match if they have the book in their records. Not too many yet, it seems.
Google DRM
To further protect your book content, printing and image copying functions are disabled on all Google Print content pages.
Similarly:
We've put a number of measures in place to prevent the downloading, copying, or printing of your content [...] Pages displaying your content have print, cut, copy, and save functionality disabled in order to protect your content.
I'm surprised at how much effort Google went to here. I would have expected my browser not to be vulnerable to having any of its "functionality disabled", yet, with a recent Firefox, I found that I couldn't
1. print the page to a PostScript file,
2. right-click on the page at all,
3. save the page to disk (the image would somehow not be downloaded at all),
4. view the precious image in Page Info/Media (although I could see which image it was),
5. save the precious image in Page Info/Media,
6. find the precious image in the DOM Inspector (which seemed like the really heavy artillery), although the DOM Inspector did let me see its URL as part of an uninterpreted style definition, and seemed to reveal the trick: defining a style called ".theimg", with the definition
{ background-image:url("http://print.google.com/long url with cryptographic signature"); background-repeat:no-repeat; background-position:center left; background-color:white; }
and then invoking that style inside a tag:
So I tried turning off JavaScript, and I found that I was essentially no better off: right-clicking caused a copy of cleardot.gif, not the .theimg background, to be saved to disk. For some reason, Save Page As.../Web Page (complete) still declined to download the background image at all, even in the absence of JavaScript, as if perhaps the CSS parser in the display logic in Firefox is smarter than the CSS parser in the Save Page As... code.
The two ways I've found so far that work to capture images from Google Print are a screen capture (I used xwd, which of course worked perfectly) and looking in the on-disk cache (ls -lrt .mozilla/firefox/default.*/Cache/[0-9A-F]*). I'm still puzzled about why Page Info and the DOM Inspector won't actually reveal the image referenced in the .theimg style or allow it to be saved.
If you wanted to write a proxy that would make Google Print pages capable of being saved to disk, you would presumably want to match
background-image:url("http://print.google.com/\([^ "]+\)")
(although you'd need to be careful to match only the one in the definition of ".theimg", because it looks like there may be at least one other background-image:url) and then replace
I haven't tried this because it felt like too much work relative to the previous two methods.
Contrary to what I expected, Google Print does not seem to check referer, so it seems to be possible merely to extract the URL from the definition of .theimg, and then to load it directly. Perhaps that will change in the future.
Google must have hired some experts on html image protection or html obfuscation. To be sure, there are lots of other tricks in Google Print that I had never seen before. It is hard to think that the author of that HTML obfuscation was not the subject of Richard Stallman's accidental haiku. It is amusing to think that Mr. Bad's "other" DeCSS might at last be used for some kind of circumvention (although I doubt it, because presumably Google Print simply won't work at all with the CSS removed).
A full exploration of the html obfuscation and DRM employed by Google would be very interesting
I've been looking at this - there's a blog post with some preliminary discussions, and a follow-up giving some ways of getting around it. The short answer is that if you just want to save the image to disk, it's not too hard in a decent browser.
Gerv
Search for "economic development".
Gerv, a Mozilla developer, has a few blog entries that talk about how the Print service tries to stop you from getting to the JPEGs, and how to bypass that.
Google Print, And Clue Barriers
Google Print Hacking Ideas
nostrils
Now they're both mysteriously restricted to general viewing.
Your hair look like poop, Bob! - Wanker.
this is a damn good point.
I copied this from a post I saw earlier on slashdot - I have lost the link but still have the text.
That's why they need the dumb-ass DMCA, because it's impossible to make secure DRM. DRM is not and can never be cryptographically secure because it is not actually a cryptography problem. Cryptography is about keeping secrets away from unauthorized people. That's fairly easy. DRM is about GRANTING people authorized access and GIVING them the key and then attempting to keep what you've given to them a secret from them.
DRM is a schizophrenic and fundamentally impossible task.
All they can do is hide the key obscurely inside the player and hope that no one makes the effort to look at it.
It was written about SACDs, but it applies equally to stopping people copying text. In the long run, DRM won't work. It's just a serious pain in the ass, especially for legitimate users (how can you get fair use if the damn copy/paste functionality is disabled?)
-- james
It's been explained ad nauseam that Google does not archive deleted email indefinitely; deleting just isn't instantaneous, because of the nature of the system.
from the gmail privacy page
It's not tough "DRM"... my university's local online student newspaper equivalent effectively does the same thing.
The World Wide Web is dying. Soon, we shall have only the Internet.
First, turn off JavaScript. Then turn on image dimensions. Right-click on the dimensions for the main image, and click View Background Image.
http://print.google.com/print?id=ULQSG0Zs7vcC&pg=3&img=1&q=mastering+digital+photography&sig=gv2nFptEf0dj7Gzb8eZ4U8UdtUo
is the URL that is used, and surprisingly it is linkable from outside; it doesn't appear to check IPs, browsers, or anything else. (deep link away!)
Gerv, who works for mozilla/bugzilla, already went through this, and found several ways around google's hackery. He then went and summarized the multiple ways to do it in good browsers.
Get Firefox!
How do you mean "begin"? Plenty of books on Kazaa and many of them aren't exactly legal.
And how about Usenet?
Here is an excerpt from a Mozilla blog regarding this. The parent URL of the print.google.com example is http://print.google.com/print?id=ULQSG0Zs7vcC&lpg=3&pg=0_1&sig=O0-GVU5AdfrMmUtu0N5mNM7sUCg.
:-(
.theimg { background-image:url("http://print.google.com/print?id=ULQSG0Zs7vcC&pg=3&img=1&sig=gv2nFptEf0dj7Gzb8eZ4U8UdtUo") }
Next idea: use the DOM Inspector to inspect the entire browser XUL. This means that the context menu will still work. It's more difficult to do, because you can't locate elements by clicking in the content area - it only works for the chrome. Still, we finally track down the clear GIF and delete it. Boom! This time Firefox crashes (taking with it an earlier version of this blog post.)
OK, let's try another approach. Let's find the surrounding in the DOM Inspector, look at its computed style, and copy the URL out of it. Except that the Computed Style view doesn't support copying. Undeterred, and feeling close to the goal, we view the applied styles for the and try and copy the URL out of the individual background style rule.
Success! This works. We can chop off the CSS gubbins, paste the result into a web browser URL bar, and finally get an image we can save.
In fact, you can also get the URL of the page graphic by viewing the source. It turns out that it's not as hard as I made out, because currently, the in question has a sensible class name:
so it's easy to find.
Install the Firebird extension "allow right-click" and do what you want with the images...
And censorship. You forgot their Chinese censorship ;)
Prosperity is only an instrument to be used, not a deity to be worshipped. Calvin Coolidge
Question #5 states:
What can I do with books that I find?
Well, you can browse a few pages, learn more about the topics explored by the book, buy it, or commit a selection to memory. To further protect your book content, printing and image copying functions are disabled on all Google Print content pages.
I don't see the big deal. As long as they let me still use "back", "forward" and "exit" I'll be happy. Sure it sucks that you might have to buy a book or write down your favorite quote, but it's free as in gratis at this point.
Amazon only lets you get about 3 pages into a book and usually you can't leave the introduction.
Get your Unix fortune now!
Although Command-P produced a page with a big white hole where the text was supposed to be, I used the "Activity Viewer" to discover that one of the components of the page was substantially larger than the others. I was able to double-click that particular URL, which opened in a new window, shorn of any nasty DRM.
I am afraid, however, that Apple will face pressure to restrict this rather useful feature. At one time, it could be used to evade Quicktime silliness, but it seems the feature has since been disabled.
(The transparent.gif overlay technique has previously been used by (ahem) vendors of photography, and (of all people) ebay sellers. It's not quite novel.)
Is anyone else getting a 502 error? Has Google really been /.ed? If so, shame on them. Google seems to be losing the thread: first DRM and now system outages, all in one day :(
----
The text of the book is a dynamically generated jpeg.
# telnet print.google.com 80
Trying 64.233.161.118...
Connected to print.google.com (64.233.161.118).
Escape character is '^]'.
GET /print?id=TpUEyu2mTdoC&pg=3&img=1&q=economic+development&sig=Aty75CJmTJeGBo3RuQNDK2rySFw HTTP/1.0
HTTP/1.0 200 OK
Content-Type: image/jpeg
Set-Cookie: PREF=ID=3a4b3c405b55e316:TM=1097254155:LM=1097254155:S=0M__0IuYQEWmHl8g; expires=Sun, 17-Jan-2038 19:14:07 GMT; path=/; domain=.google.com
Server: OFE/0.1
Content-Length: 95942
Date: Fri, 08 Oct 2004 16:49:15 GMT
Connection: Keep-Alive
^@^PJFIF^@^A^A^@^@^A^@^A^@^@^@C^@^H^F^F^G^F^E^H^G^G^G
<snip>
The jpeg can be converted to postscript, which can be converted to text.
This gets one page. If someone could reverse-engineer the "sig" argument I'm sure you could specify a page number.
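For anyone who would rather script the telnet session above than type it, here is a minimal sketch in Python. It sends the same bare HTTP/1.0 GET and splits the reply into status line, headers, and image bytes. The function names (build_request, split_response, fetch) are made up for illustration, and the path still has to carry a valid "sig" value copied from a real page, since nothing here works one out.

```python
import socket


def build_request(path: str) -> bytes:
    """Build a bare HTTP/1.0 GET request like the one typed into telnet."""
    return f"GET {path} HTTP/1.0\r\n\r\n".encode("ascii")


def split_response(raw: bytes):
    """Split a raw HTTP response into (status line, headers dict, body bytes)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    headers = {}
    for line in lines[1:]:
        # Split on the first colon only; cookie values contain more colons.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return lines[0], headers, body


def fetch(host: str, path: str, port: int = 80) -> bytes:
    """Send the request and read until Content-Length is satisfied or the server closes."""
    with socket.create_connection((host, port), timeout=30) as conn:
        conn.sendall(build_request(path))
        raw = b""
        while True:
            chunk = conn.recv(65536)
            if not chunk:
                break
            raw += chunk
            if b"\r\n\r\n" in raw:
                _, headers, body = split_response(raw)
                length = headers.get("content-length")
                # The server answered Keep-Alive above, so stop once the body is complete.
                if length is not None and len(body) >= int(length):
                    break
    return raw
```

Calling split_response(fetch("print.google.com", path)) would then hand back the JPEG bytes as the third element, ready to be written to a file.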
To be honest, it would probably be easier to just check "Economic Development" out from the library.
I also notice the slashdot effect is starting to crush print.google.com.
First the obvious: It's trivial to break "Save Page" if you can require JavaScript. If someone finds a way around their DRM, they will simply go that route.
Second, even without Javascript, CSS offers numerous ways to make saving a webpage a complicated problem. Some browsers also honor cache timeouts when you try to save a page and make revealing roundtrips to the server. You could also trigger alarms based on page frequency. Humans don't read a page per second...
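The page-frequency alarm is easy to picture as a sliding window over request times. This is only a sketch of the idea, not anything Google is known to run; the class name and the thresholds (10 pages in 10 seconds) are invented for illustration.

```python
from collections import deque


class PageRateAlarm:
    """Flag a client that requests more than max_pages pages within window_seconds."""

    def __init__(self, max_pages: int = 10, window_seconds: float = 10.0) -> None:
        self.max_pages = max_pages
        self.window_seconds = window_seconds
        self._times = deque()

    def record(self, timestamp: float) -> bool:
        """Record one page view; return True when the rate looks non-human."""
        self._times.append(timestamp)
        # Forget page views that have fallen out of the window.
        while self._times and timestamp - self._times[0] > self.window_seconds:
            self._times.popleft()
        return len(self._times) > self.max_pages
```

A server would keep one such object per client (per cookie or IP, say) and call record() with the request time on every page view.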
Ultimately however, what you can see you can save. Google doesn't give you plain text, only images which are hardly suitable for OCR, but for some that may be enough. I for one wouldn't want to read a text which is presented at what looks like 50 ppi.
Google ceased to be good in my book when they used the DMCA to take down an rss feed of google news.
I am trolling
Whilst I'm all for breaking DRM that hinders the rights you have to use your content in the way you want - this just looks like breaking DRM to get stuff for free.
Which DRM? I have no DRM installed on my machine. I have agreed to no contracts or EULAs with regard to DRM.
Google sends me some copyrighted information. The copyright law limits what I can do with it (e.g. I cannot republish), but for my own private use I can do pretty much anything I want with it.
That image already exists as a file (or part of a file) on my machine. What Google is doing is trying to prevent me from looking at it in non-approved ways. Well, it can try, but I have no legal or ethical obligations to follow its wishes. If I want to take that image, load it into Photoshop and play with it there, I am completely within my rights.
So, no, I don't see any problems (either legal or ethical) with breaking this pseudo-DRM -- and I am willing to bet it will be breakable very easily -- and using these images however I want within the limits set by the copyright law.
Kaa
Kaa's Law: In any sufficiently large group of people most are idiots.
I just looked at the page source code... they actually did something very similar to this. They create a table cell, set the background image to the book page (it's fed out of their search engine as opposed to being a static image link, so I imagine the backend screens based on http_referer or something), and then stretch a 1x1 transparent gif over the table cell. "Show Image" then shows the transparent gif, and there is no "show background image" since we are over a foreground image.
They also use the standard context-menu disabling Javascript, which IE respects (and Mozilla does as well if you tell it to). Other than this (standard-issue) trick, they aren't doing anything sneaky to the user's browser at all. They could even disable the DRM for non-copyright pages if they wanted to (don't use the transparent cover image, and don't disable the context menu). All in all, it seems like a pretty slick implementation!
Save Maine's economy: write stuff down. All comments are exclusively my own, not my employer.
They became an Evil Company last april
I am trolling
Apparently it's slashdotted. I'm getting a 502 server error when I try to look at the book.
Never underestimate the power of fiber.
In college, an acquaintance of mine and I worked on this concept, and he implemented it. I think his final version took in .png files and outputted HTML for them. They looked perfect, and it even had a little bit of optimization for colspanning if adjacent pixels were the same color. Suffice it to say, yes, it's been tested. Yes, it works. Yes, you would need more memory. :)
...to see what image was the "protected" page. Search the source, it's a CSS background-image. There are two background-images: a thumbnail of the cover and the book page you are viewing.
All you need is a script to retrieve CSS background-images and *poof* goes Google copy protection. It was doomed from the start, anyway.
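Such a script can be very small. Here is a rough sketch in Python that pulls every CSS background-image URL out of a page's source with a regular expression. The sample markup, the ".cover" class name, and the id/sig values are placeholders made up for the example; only ".theimg" comes from the real pages.

```python
import re

# Matches url(...) inside a background-image declaration, with or without quotes.
BACKGROUND_IMAGE_RE = re.compile(
    r"""background-image\s*:\s*url\(\s*["']?([^"')\s]+)["']?\s*\)""",
    re.IGNORECASE,
)


def background_image_urls(page_source: str) -> list:
    """Return every background-image URL found in the page source, in order."""
    return BACKGROUND_IMAGE_RE.findall(page_source)


if __name__ == "__main__":
    # Hypothetical page source; the id and sig values are placeholders.
    sample = """
    <style>
    .cover  { background-image:url("http://print.google.com/print?id=EXAMPLE&pg=0&img=1&sig=AAA") }
    .theimg { background-image:url("http://print.google.com/print?id=EXAMPLE&pg=3&img=1&sig=BBB") }
    </style>
    """
    for url in background_image_urls(sample):
        print(url)
```

Since there are two background images on a page (cover thumbnail and book page), a real script would still have to pick the one attached to .theimg.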
There's an extension called Allow Right-Click to accomplish just that. (Granted, it would be nice if this was integrated into the browser)
It's not that hard to mess with a browser in this way. For example, to hide content when you print is a matter of some CSS2.
@media print {
#content { display: none; }
}
Toss in half a dozen other spoilers such as multi-part mime & redirects (to hide URLs), DOM event handlers (to handle & ignore mouse clicks), transparent gifs (to mangle context menus), transparent DIVs that become opaque when printed and you achieve the desired effect.
They're all surmountable, but I suppose Google wants to be seen to be making a conscious effort to block people from printing out pages.
This is easy to circumvent, at least in X. You can copy text by simply selecting it.
http://print.google.com/print/doc?articleid=y4tfu9YqpnG (sans formatting):
I remember when legal used to mean lawful, now it means some kind of loophole. - Leo Kessler
Couldn't that be fixed with, e.g., Proxomitron?
In Firefox 1.0 PR on Windows, it's Tools -> Options -> Web Features -> Advanced -> "Disable or replace context menus".
And yes, you can right click on a Google image and save it. Well, almost. First, you have to use AdBlock to block the "cleardot.gif" file, the transparent GIF that overlays the image. Then you right-click to "View Background Image". Then you will get the JPEG image of the book's page. You can then right-click the JPEG image and save it where you wish.
If one wanted to make this process a little easier, one could use a proxy server that saved all images that passed through. Of course, the proxy server would have to ignore the No-Cache headers that Google probably puts on the images, but that shouldn't be difficult.
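The saving half of such a proxy could look roughly like this in Python. It is only the piece that decides what to write to disk, not a working proxy: save_if_image is a made-up name, and the caller is assumed to pass in the URL, response headers, and body it has already relayed. Cache-Control and Pragma are simply never looked at, which is all "ignoring No-Cache" amounts to here.

```python
import hashlib
from pathlib import Path
from typing import Mapping, Optional


def save_if_image(url: str, headers: Mapping[str, str], body: bytes,
                  out_dir: Path) -> Optional[Path]:
    """Write the response body into out_dir when the response is an image.

    Cache-Control and Pragma headers are deliberately not consulted.
    """
    content_type = ""
    for name, value in headers.items():
        if name.lower() == "content-type":
            # Drop parameters such as "; charset=..." and normalise case.
            content_type = value.split(";")[0].strip().lower()
    if not content_type.startswith("image/"):
        return None
    extension = content_type.split("/", 1)[1] or "bin"
    # Name the file after a hash of the URL so long query strings are safe.
    file_name = hashlib.sha1(url.encode("utf-8")).hexdigest() + "." + extension
    out_dir.mkdir(parents=True, exist_ok=True)
    target = out_dir / file_name
    target.write_bytes(body)
    return target
```

Hashing the URL for the file name is just one choice; it avoids trouble with the long signed query strings but loses the page number, so a real tool might want to parse "pg=" out of the URL instead.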
DRM (Digital Rights Management) actually manages and enforces permissions based on a user's privileges, per user. Usually this is in lock and key form.
On the other hand, copy protection indiscriminately curtails duplication.
* Set Adblock to "Hide Ads"
* Block: http://print.google.com/images/cleardot.gif
* Prevent websites from changing the context menu: Web features > Advanced
* et voila
If it is that big of a deal that you be able to STEAL someone's copyrighted text through Google, use Print Screen you idiots.
I'd like to see something like this, for instance, in Firefox's security settings near the Javascript permission settings:
Block sites from:
[X] Disabling right-click context menus
In Firefox:
* "Edit" -> "Preferences"
* Select "Web Features"
* Click the "Advanced" button next to "Enable JavaScript"
* Uncheck "Disable or replace context menus"
(This was bug 86193, checked into the code in March. It's in 1.0PR)
As for single-window mode, there are plenty of extensions. Try the one called "Tabbrowser Extensions", for instance.
So:
- Start at the beginning of the book
- Read 3 pages
- Pick a phrase on the third page
- Search for that phrase within the book
- Click the search result for the third page
- Read the next two pages
- Pick a phrase on the fifth page
- Search for that phrase within the book
- Click the search result for the fifth page
- Read the next two pages
- Repeat until end of book
It's irritating, but when you're trying to find a passage in the book and the three-page limit smacks you, you can use this method to get more of the book (or all of it, if you have the patience).
IE. Default settings. No proxy, no modifications. Nothing particularly special about it.
-Load up the book in the browser.
-Click the View menu, select Source.
-Search for "div class=browse"
-Immediately before that, you'll find something like this in a CSS style:
{ background-image:url(http://print.google.com/print?blablahblah");bunch of other stuff;}
-Take that URL, copy and paste it into a new browser window and voila, you have the full size image. Save As or Print on this image works fine. No problems at all.
Seriously, this is trivial to break.
What's not trivial is getting an entire book. How to figure out how to get every page is the tough part. Getting the image itself is a cakewalk. It's just Javascript tricks to break right-clicking and CSS tricks to break direct printing from that window. Saving gets broken because of the tricky CSS using the IMG as a background image. The browser doesn't think to save the image, is all.
- Give a man a fire and he's warm for a day, but set him on fire and he's warm for the rest of his life.
1. Go to a "protected" page, like the sample page.
2. Select the Activity window from Safari.
3. Double click on the largest image, i.e. this page.
4. Do what ever you want with it.
5. Profit!!!
Ok, disable JavaScript. (Set javascript.enabled to false (just double click)) Now you can already right click on the google book.
Ok, so go to a bookpage, this will help finding one: http://www.google.com/search?q=mastering+digital+photography
Next, use the Web Developer extension (you have that one right?) to Display ID & Class details. You will see a class named theimg. Now right click that red little box and "View background image".
I thank you very much.
Hopla
Wells Fargo Online Banking does stuff like that so a printed version of your account history is "printer friendly".
Actually works extremely well, so such things can be used for good.
Just because it CAN be done, doesn't mean it should!
1) Search for "to kill a mockingbird".
2) Click on the book link.
3) View source.
4) Search the source for something like: http://print.google.com/print?id=iGvy3fB-D-QC&pg=5&img=1&q=to+kill+a+mockingbird&sig=KQFFYkYib3kQQGFe9h8nx1JlbIE
5) Go to that URL in your web browser.
6) Save the image.
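Steps 5 and 6 can be sketched as a few lines of Python using the standard library's urllib. The function name save_page_image is made up, and the URL passed in is assumed to be one copied out of the page source in step 4, complete with its sig value. The JPEG check is just a sanity test on the first two bytes.

```python
import urllib.request
from pathlib import Path

# Every JPEG file starts with these two bytes.
JPEG_MAGIC = b"\xff\xd8"


def save_page_image(url: str, destination: Path) -> int:
    """Fetch url and write it to destination if it looks like a JPEG.

    Returns the number of bytes written.
    """
    with urllib.request.urlopen(url, timeout=30) as response:
        data = response.read()
    if not data.startswith(JPEG_MAGIC):
        raise ValueError("response does not look like a JPEG image")
    destination.write_bytes(data)
    return len(data)
```

Something like save_page_image(url_from_step_4, Path("page5.jpg")) would then do the saving, with the file name being whatever you like.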
Actually, it's not Edit -> Preferences, it's
Tools -> Options -> Web features -> Advanced Button -> uncheck "Disable or replace context menus"
Most of the time "Edit" is used to copy, paste, find and undo. Never seen a preference selection in an Edit menu before.
I'll just use my special getting high powers one more time...
Just use the following javascript as a bookmark to move the obscuring image out of the way, then right click to get the context menu, and middle click "view background" to open the image in a new tab for saving.
javascript:for(var i =0;i < document.images.length;i++) { if(document.images[i].src.match('cleardot') == 'cleardot'){document.images[i].width=20;} ;};void('');