Breaking Google's DRM
An anonymous reader writes "Google's new Google Print service (that lets you see scanned pages from printed books) has a pile of advanced browser-disabling DRM in it ('Pages displaying your content have print, cut, copy, and save functionality disabled in order to protect your content.'). This works with JavaScript turned off, even in Free Software browsers. Seth Schoen has posted preliminary notes on some breaks to the DRM (beyond just automating a screenshotting process), including a proposal for a circumventing proxy that would fetch Google Print pages and strip out the DRM. A full exploration of the html obfuscation and DRM employed by Google would be very interesting; certainly the ability for a remote attacker to disable critical browser features like save, right-click, copy and cut against the user's wishes is a major security vulnerability in Moz/Firefox and should be fixed ASAP."
Knowing how to develop stuff like this is not a skill everyone has. This might explain why Google recently hired some browser-type software developers (as discussed on Slashdot).
certainly the ability for a remote attacker to disable critical browser features like save, right-click, copy and cut against the user's wishes is a major security vulnerability in Moz/Firefox and should be fixed ASAP
While I agree it would be nice to fix this from a convenience point of view, and a "it's my computer - it'll do what I want" point of view, how is this a security risk? How do I get a trojan, or lose files, because of an inability to copy & paste on a particular page?
---- One button is the power button, the other is the Bender voice button "Bite My Shiny Metal Ass"
Facts :
i) To display the books, they've got to send that information to the browser, on your machine.
ii) Once it's displayable on your machine, there is *absolutely* no way they can stop a determined person from printing it.
iii) If it's going to work on Open Source browsers, the DRM must be fairly transparent.
iv) If it works on Open Source browsers, someone cleverer than me will modify that browser so that it works as the user intends, rather than as the sender intends. Their only protection is the DMCA, which may stop a US coder from writing/distributing the hacked app, but the rest of us will be laughing.
Frankly, if Google were as smart as they're hyped to be, they'd know this.
Athletic Scholarships to universities make as much sense as academic scholarships to sports teams.
and so begins a new age of literature piracy
TWW
"Encyclopedia" is to "Wikipedia" what "Library" is to "Some people at a bus stop"
Messing with our browsers and DRM
Does this mean that Google is now officially an Evil Company(TM)?
Google DRM
To further protect your book content, printing and image copying functions are disabled on all Google Print content pages.
Similarly:
We've put a number of measures in place to prevent the downloading, copying, or printing of your content [...] Pages displaying your content have print, cut, copy, and save functionality disabled in order to protect your content.
I'm surprised at how much effort Google went to here. I would have expected my browser not to be vulnerable to having any of its "functionality disabled", yet, with a recent Firefox, I found that I couldn't
1. print the page to a PostScript file,
2. right-click on the page at all,
3. save the page to disk (the image would somehow not be downloaded at all),
4. view the precious image in Page Info/Media (although I could see which image it was),
5. save the precious image in Page Info/Media,
6. find the precious image in the DOM Inspector (which seemed like the really heavy artillery), although the DOM Inspector did let me see its URL as part of an uninterpreted style definition, and seemed to reveal the trick: defining a style called ".theimg", with the definition
{ background-image:url("http://print.google.com/long url with cryptographic signature"); background-repeat:no-repeat; background-position:center left; background-color:white; }
and then invoking that style inside a tag:
So I tried turning off JavaScript, and I found that I was essentially no better off: right-clicking caused a copy of cleardot.gif, not the .theimg background, to be saved to disk. For some reason, Save Page As.../Web Page (complete) still declined to download the background image at all, even in the absence of JavaScript, as if perhaps the CSS parser in the display logic in Firefox is smarter than the CSS parser in the Save Page As... code.
The two ways I've found so far that work to capture images from Google Print are a screen capture (I used xwd, which of course worked perfectly) and looking in the on-disk cache (ls -lrt .mozilla/firefox/default.*/Cache/[0-9A-F]*). I'm still puzzled about why Page Info and the DOM Inspector won't actually reveal the image referenced in the .theimg style or allow it to be saved.
If you wanted to write a proxy that would make Google Print pages capable of being saved to disk, you would presumably want to match
background-image:url("http://print.google.com/\([^ "]+\)")
(although you'd need to be careful to match only the one in the definition of ".theimg", because it looks like there may be at least one other background-image:url) and then replace
I haven't tried this because it felt like too much work relative to the previous two methods.
Contrary to what I expected, Google Print does not seem to check referer, so it seems to be possible merely to extract the URL from the definition of .theimg, and then to load it directly. Perhaps that will change in the future.
Google must have hired some experts on HTML image protection or HTML obfuscation. To be sure, there are lots of other tricks in Google Print that I had never seen before. It is hard to think that the author of that HTML obfuscation was not the subject of Richard Stallman's accidental haiku. It is amusing to think that Mr. Bad's "other" DeCSS might at last be used for some kind of circumvention (although I doubt it, because presumably Google Print simply won't work at all with the CSS removed).
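For anyone curious what the matching step of such a proxy might look like, here is a rough Python sketch. It only illustrates the idea in the notes: find the background-image URL inside the ".theimg" rule (and only that rule) and expose it as an ordinary image. The sample markup, the function names, and the choice to append a plain img tag are all assumptions made for illustration; the real pages may be laid out differently.

```python
import re

# Hypothetical sample page; the URL and markup are made up for illustration.
SAMPLE = (
    '<style>.other { background-image:url("http://print.google.com/images/x.gif"); } '
    '.theimg { background-image:url("http://print.google.com/print?id=EXAMPLE&sig=FAKE"); '
    'background-repeat:no-repeat; }</style>'
    '<body><div class="theimg"></div></body>'
)

# Restrict the search to the body of the .theimg rule, because other rules
# may also carry a background-image:url(...).
THEIMG_RULE = re.compile(r'\.theimg\s*\{([^}]*)\}')

# Same shape as the pattern in the notes: everything up to the next space or quote.
BG_URL = re.compile(r'background-image:url\("(http://print\.google\.com/[^ "]+)"\)')


def find_theimg_url(html):
    """Return the background URL from the .theimg rule, or None if absent."""
    rule = THEIMG_RULE.search(html)
    if rule is None:
        return None
    match = BG_URL.search(rule.group(1))
    return match.group(1) if match else None


def add_plain_img(html):
    """Append a normal <img> tag so Save and Print have a real image to work with."""
    url = find_theimg_url(html)
    if url is None:
        return html
    tag = '<img src="%s" alt="page image">' % url
    if "</body>" in html:
        return html.replace("</body>", tag + "</body>", 1)
    return html + tag
```

A real proxy would also have to fetch the page and pass everything else through untouched; that part is left out here.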
... if their DRM can be broken or not.
The point is that it is "good enough" to stop the average person from lifting the material.
If you're determined enough, nothing is going to stop you from getting what you want.
A full exploration of the html obfuscation and DRM employed by Google would be very interesting
I've been looking at this - there's a blog post with some preliminary discussions, and a follow-up giving some ways of getting around it. The short answer is that if you just want to save the image to disk, it's not too hard in a decent browser.
Gerv
Where can we see a sample of this to test whether it actually does these disabling things?
I do agree that this is a security problem. We already have options in some browsers (I use Firefox, for example) to block sites from changing status bar text, changing images, etc. And there was no fuss about that. I think disabling such basic functions as copy, paste, print falls in the same "no-no" category as changing statusbar text, changing images, etc.
A site presents a page in a certain way, but I as the user get to select how I view it, with what functions I want to view it, which parts of the site I want active and which ones I don't. You can't force me to accept what I don't want to accept. If I set my software to ignore part of your site, that's my choice, not yours.
You don't go disabling functions in users' browsers. You let them do that themselves. Conversely, you don't enable stuff the user didn't enable themselves.
Isn't it now about to be illegal to go changing people's browser settings via the use of spyware? Doesn't this come awfully close to doing the same thing? If it changes how my software behaves, it's awfully close to being malware.
i am a soviet space shuttle
Search for "economic development".
gerv, a Mozilla developer, has a few blog entries that talk about how the print service tries to stop you from getting to the JPEGs, and how to bypass that.
Google Print, And Clue Barriers
Google Print Hacking Ideas
nostrils
Now they're both mysteriously restricted to general viewing.
Your hair look like poop, Bob! - Wanker.
this just looks like breaking DRM to get stuff for free.
You are 100% right.
It isn't about "security" or even "fair use"; it's about the ability to cut and paste, save and print someone else's content without their permission.
I could understand if you owned the books but you don't. Sounds like a good way to bite the hand that feeds you.
If you are really concerned with Google messing with your browser... don't go to any Google domain, ever. Add an entry in your HOSTS file for google, froogle, gmail, gbrowser and whatever else you'd like.
It's a free service, free in the sense that you are free not to use it.
Get your Unix fortune now!
this is a damn good point.
I copied this from a post I saw earlier on slashdot - I have lost the link but still have the text.
That's why they need the dumb-ass DMCA, because it's impossible to make secure DRM. DRM is not and can never be cryptographically secure because it is not actually a cryptography problem. Cryptography is about keeping secrets away from unauthorized people. That's fairly easy. DRM is about GRANTING people authorized access and GIVING them the key and then attempting to keep what you've given to them a secret from them.
DRM is a schizophrenic and fundamentally impossible task.
All they can do is hide the key obscurely inside the player and hope that no one makes the effort to look at it.
It was written about SACDs, but it applies equally to stopping people copying text. In the long run, DRM won't work. It's just a serious pain in the ass, especially for legitimate users (how can you get fair use if the damn copy/paste functionality is disabled?)
-- james
They have to show the suits at the publishing houses that they are being responsible, safeguarding the suits' ``intellectual property''. It doesn't really matter whether it actually works, just as it doesn't really matter if the features in the checklist on the box of software work. It's a tool for the salesman to use.
If this feature exists but really doesn't work, then the suits get the illusion that their ``intellectual property'' is protected, and they get free advertising of the try-before-you-buy variety. For this best of all possible worlds scenario, it has to work well enough to fool the suits, but not well enough to stop the rest of us.
Sounds to me as if Google has gotten it to work just about well enough to do a good job for all concerned: Google, us readers, and even the suits.
See what I've been reading.
You are adding to the fire by allowing them to change the definition of copyright. Copyright gives the holder no right to determine how one USES content; it merely gives them a monopoly right over copying the content for distribution. There are some copyright limitations on use, such as public display and the like, but fair use clearly says once you give ME a copy of your work, I can do anything I damn well choose with it.
It already gave me a copy of the work for free. If I choose to burn it, make a hat out of it, or print it out, it's my business.
Burn Hollywood Burn
First, turn off JavaScript. Then turn on image dimensions. Right-click on the dimensions for the main image, and click View Background Image.
http://print.google.com/print?id=ULQSG0Zs7vcC&pg=3&img=1&q=mastering+digital+photography&sig=gv2nFptEf0dj7Gzb8eZ4U8UdtUo
is the URL that is used, and surprisingly it is linkable from outside; it doesn't appear to check IPs, browsers, or anything else. (deep link away!)
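To see what that image URL carries, here is a small Python sketch that just splits it into its query parameters. The URL is the one quoted above with the forum's stray spaces removed; what each parameter means on Google's side (book id, page number, search terms, signature) is only a guess from the names, and describe() is a made-up helper.

```python
from urllib.parse import urlsplit, parse_qs

# The URL from the comment above, with the inserted spaces removed.
URL = ("http://print.google.com/print?id=ULQSG0Zs7vcC&pg=3&img=1"
       "&q=mastering+digital+photography&sig=gv2nFptEf0dj7Gzb8eZ4U8UdtUo")


def describe(url):
    """Split a URL into host, path, and a flat dict of its query parameters."""
    parts = urlsplit(url)
    # parse_qs returns a list per key; each key appears once here, so keep the first.
    params = {key: values[0] for key, values in parse_qs(parts.query).items()}
    return parts.netloc, parts.path, params


host, path, params = describe(URL)
for name in sorted(params):
    print(name, "=", params[name])
```

Note that parse_qs decodes the "+" signs in q back into spaces.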
Gerv, who works for mozilla/bugzilla, already went through this, and found several ways around google's hackery. He then went and summarized the multiple ways to do it in good browsers.
Get Firefox!
I seem to recall them using a similar trick on the official site for Lord of the Rings when it came out.
The World Wide Web is dying. Soon, we shall have only the Internet.
$ wget long url from http://slashdot.org/comments.pl?sid=124900&cid=10470948
Resolving print.google.com... done.
Connecting to print.google.com[64.233.161.118]:80... connected.
HTTP request sent, awaiting response... 403 Forbidden
09:44:53 ERROR 403: Forbidden.
Read Epic the first RPG novel.
It's not a vulnerability at all... Just obfuscation.
The image is set to be a background image, using CSS. As with a background on a table or on a website, the page doesn't let you click on it to act on it directly.
But in the code itself, it's pretty obvious...
An example, of the straight JPEG
Colin Davis
Even within the framework of our eroding copyright laws, fair use allows quoting of copyrighted works. Why should I not be allowed to cut and paste (to prevent distorting a quote)? So I would say this is not an open and shut case.
I understand the necessity for the DRM by Google -- without it their library of content will be severely limited; however, do not paint the actions of everyone attempting to circumvent the DRM with the same brush.
Home Automation & Linux -- now I know I'm a geek
Whilst I'm all for breaking DRM that hinders the rights you have to use your content in the way you want - this just looks like breaking DRM to get stuff for free.
Which DRM? I have no DRM installed on my machine. I have agreed to no contracts or EULAs with regard to DRM.
Google sends me some copyrighted information. The copyright law limits what I can do with it (e.g. I cannot republish), but for my own private use I can do pretty much anything I want with it.
That image already exists as a file (or part of a file) on my machine. What Google is doing is trying to prevent me from looking at it in non-approved ways. Well, it can try, but I have no legal or ethical obligations to follow its wishes. If I want to take that image, load it into Photoshop and play with it there, I am completely within my rights.
So, no, I don't see any problems (either legal or ethical) with breaking this pseudo-DRM -- and I am willing to bet it will be breakable very easily -- and using these images however I want within the limits set by the copyright law.
Kaa
Kaa's Law: In any sufficiently large group of people most are idiots.
What's next, banning cell phone cameras in book stores, or libraries?
This sort of HTML obfuscation abuse is just the beginning. This is a general problem with any sufficiently rich presentation language. There are hundreds of different ways to obfuscate things.
Just wait until MS finally decides to properly support PNG alpha transparency! Combine this with CSS absolute positioning, and you'll start seeing images which are composited from many different layers of semi-translucent images, each of which is just noise on its own. You also have already seen for a long time the cutting up of images into many small pieces.
This could be taken to an extreme as well. With absolute positioning you could do this with text as well as images. Just position each letter on the page separately and randomize the order in which they appear in the HTML stream. Or even worse, use a custom downloaded font, where the glyphs are all randomized, so although it may look like an "A", it's really in the slot for a "Q"...try to cut and paste that.
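The scrambled-font idea is basically a substitution cipher where the font does the decoding. A toy Python sketch, purely to illustrate: the page would store the encoded characters (which is what cut and paste picks up), while a matching custom font would draw the original letter in each slot. No real font is generated here, and make_glyph_map and translate are made-up names.

```python
import random
import string


def make_glyph_map(seed):
    """Build a random permutation of A-Z and its inverse."""
    letters = list(string.ascii_uppercase)
    shuffled = letters[:]
    random.Random(seed).shuffle(shuffled)
    encode = dict(zip(letters, shuffled))   # letter the reader sees -> slot stored in the page
    decode = {slot: letter for letter, slot in encode.items()}  # what the font would draw
    return encode, decode


def translate(text, table):
    """Map each character through the table, leaving unknown characters alone."""
    return "".join(table.get(ch, ch) for ch in text)


encode, decode = make_glyph_map(seed=2004)
stored = translate("TRY TO CUT AND PASTE THAT", encode)
print("what copy/paste sees:", stored)
print("what the font shows: ", translate(stored, decode))
```

Since the mapping has to ship with the font, anyone who bothers to read it out can reverse it, which is the same "give them the key" problem discussed elsewhere in the thread.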
Consider the PDF format as an extreme of where XHTML+CSS+DHTML+PNG can go wrt. obfuscation. Sure, the determined and savvy can always get the text copied out; but that doesn't mean it's not going to be very difficult.
Maybe we should all go back to ASCII and lynx.
This was always intended as a "feel good" feature of the Google Print system so that publishers would feel safer sending tons of books to Google.
The "real" DRM here isn't DRM. As a previous post so astutely pointed out, DRM is schizophrenic by nature: it involves trying to give someone something without *actually* giving it to them.
Google's "real" protection is that the service won't let you view more than a certain percentage of the book in any given month. That percentage is determined by the book's publisher at submission time, anywhere from 20% to 100%.
Even if you can copy/paste/print, you're still only going to get a portion of the book - certainly not enough to replace a valid sale. Disabling that functionality basically returns us to the age of photocopying a few pages of a book/article in a library. Except now we can search, so it's faster.
If one solution is as simple as "grab the data from your browser's cache", this is clearly meant to stop only the "average" user, something that is in very short supply here on /. But it's good enough for Google to run the business, most likely.
Here's to hoping this headline appearing on /. isn't going to spread enough FUD to publishers that would have otherwise sent in their material. Google Print is still in its infancy, and could fail if Google doesn't assert some spin control on the situation, I suppose. Maybe I overestimate /.'s influence.
I don't even see the point to this.
Really, who is going to print out all 600 pages of the newest Tom Clancy book, then go to the effort of binding them together? It'd cost more in paper, ink, time & energy than to just buy the book.
Sure if it were a cooking book or something someone might only want 1 page. But then again, if they want 1 page they can just write it down.
Seems like a big waste of time and money to me, but then again after the IPO they have money to blow.
Most people aren't aware of that workaround. But browsers are supposed to work for the user, not the website designer. "Features" that irritate the user in order to placate designers are antithetical to that concept.
Designers didn't pay for my machine, so why should they have any right to control what I do with it?
autopr0n is like, down and stuff.
If someone's only business model is to put some crap on a website, charge a bunch of money for access, and hope to sit back and watch the cash roll in, I think they will be in for a rude wakeup call.
You're absolutely right.
If that worked, the internet would be full of pornography in a heartbeat.
Oh. Wait a minu..
http://request-header.info
So:
- Start at the beginning of the book
- Read 3 pages
- Pick a phrase on the third page
- Search for that phrase within the book
- Click the search result for the third page
- Read the next two pages
- Pick a phrase on the fifth page
- Search for that phrase within the book
- Click the search result for the fifth page
- Read the next two pages
- Repeat until end of book
It's irritating, but when you're trying to find a passage in the book and the three-page limit smacks you, you can use this method to get more of the book (or all of it, if you have the patience).
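For a sense of how much patience that takes, here is a back-of-the-envelope Python sketch. It assumes the pattern in the steps above holds throughout: three pages on the first read, then each search lands on the last page already read and opens up two new ones. The function name and defaults are made up for illustration, and it ignores any monthly viewing cap.

```python
def searches_needed(total_pages, first_read=3, pages_per_hop=2):
    """Count the searches needed to reach the last page using the search-hop steps."""
    if total_pages <= first_read:
        return 0
    remaining = total_pages - first_read
    # Each search unlocks pages_per_hop new pages; round up for a partial last hop.
    return -(-remaining // pages_per_hop)


for pages in (5, 50, 600):
    print(pages, "pages ->", searches_needed(pages), "searches")
```

Under those assumptions a 600-page book works out to 299 searches, so "if you have the patience" is about right.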