Breaking Google's DRM
An anonymous reader writes "Google's new Google Print service (that lets you see scanned pages from printed books) has a pile of advanced browser-disabling DRM in it ('Pages displaying your content have print, cut, copy, and save functionality disabled in order to protect your content.'). This works with JavaScript turned off, even in Free Software browsers. Seth Schoen has posted preliminary notes on some breaks to the DRM (beyond just automating a screenshotting process), including a proposal for a circumventing proxy that would fetch Google Print pages and strip out the DRM. A full exploration of the html obfuscation and DRM employed by Google would be very interesting; certainly the ability for a remote attacker to disable critical browser features like save, right-click, copy and cut against the user's wishes is a major security vulnerability in Moz/Firefox and should be fixed ASAP."
Knowing how to develop stuff like this is not a skill everyone has. This might explain why Google recently hired some browser-type software developers (as discussed on Slashdot).
certainly the ability for a remote attacker to disable critical browser features like save, right-click, copy and cut against the user's wishes is a major security vulnerability in Moz/Firefox and should be fixed ASAP
While I agree it would be nice to fix this from a convenience point of view, and a "it's my computer - it'll do what I want" point of view, how is this a security risk? How do I get a trojan, or lose files, because of an inability to copy & paste on a particular page?
---- One button is the power button, the other is the Bender voice button "Bite My Shiny Metal Ass"
Facts :
i) To display the books, they've got to send that information to the browser, on your machine.
ii) Once it's displayable on your machine, there is *absolutely* no way they can stop a determined person from printing it.
iii) If it's going to work on Open-Source browsers, the DRM must be fairly transparent.
iv) If it works on Open Source browsers, someone cleverer than me will modify that browser so that it works as the user intends, rather than the sender. Their only protection is the DMCA, which may stop a US coder from writing/distributing the hacked app, but the rest of us will be laughing.
Frankly, if Google were as smart as they're hyped to be, they'd know this.
Athletic Scholarships to universities make as much sense as academic scholarships to sports teams.
and so begins a new age of literature piracy
Whilst I'm all for breaking DRM that hinders the rights you have to use your content in the way you want - this just looks like breaking DRM to get stuff for free.
If that really is the case, then I'm extremely concerned that someone is doing this. Mainly because it adds extra ammunition to those who (wrongly) try to push the line that the only people who want to break DRM are those who want to rip people off.
Avantslash - View Slashdot cleanly on your mobile phone.
Information, by its very nature, is copyable. DRM schemes may stop a casual user from copying information, but it is theoretically impossible to make an invincible DRM system like this due to the very nature of information.
That having been said, Google is smart enough to know this. They have to put what they can in place in order to convince publishers to agree to their system.
Ha, ha! Nobody ever says Italy.
TWW
"Encyclopedia" is to "Wikipedia" what "Library" is to "Some people at a bus stop"
Messing with our browsers and DRM
Does this mean that Google is now officially an Evil Company(TM)?
We're entering an age where all data is passed as objects. OS'es won't have common facilities to save data, merely to access the storage HW. Objects might or might not have facilities to save themselves, depending on their producer. PCs are probably a lost cause, but once phones submerge in the viruspam tide, their OS'es will prove the perfect platform for "trusted computing". Software distributors will control your gizmos, and you won't even be able to turn them off.
--
make install -not war
Google DRM
To further protect your book content, printing and image copying functions are disabled on all Google Print content pages.
Similarly:
We've put a number of measures in place to prevent the downloading, copying, or printing of your content [...] Pages displaying your content have print, cut, copy, and save functionality disabled in order to protect your content.
I'm surprised at how much effort Google went to here. I would have expected my browser not to be vulnerable to having any of its "functionality disabled", yet, with a recent Firefox, I found that I couldn't
1. print the page to a PostScript file,
2. right-click on the page at all,
3. save the page to disk (the image would somehow not be downloaded at all),
4. view the precious image in Page Info/Media (although I could see which image it was),
5. save the precious image in Page Info/Media,
6. find the precious image in the DOM Inspector (which seemed like the really heavy artillery), although the DOM Inspector did let me see its URL as part of an uninterpreted style definition, and seemed to reveal the trick: defining a style called ".theimg", with the definition
{ background-image:url("http://print.google.com/long url with cryptographic signature"); background-repeat:no-repeat; background-position:center left; background-color:white; }
and then invoking that style inside a tag:
So I tried turning off JavaScript, and I found that I was essentially no better off: right-clicking caused a copy of cleardot.gif, not the .theimg background, to be saved to disk. For some reason, Save Page As.../Web Page (complete) still declined to download the background image at all, even in the absence of JavaScript, as if perhaps the CSS parser in the display logic in Firefox is smarter than the CSS parser in the Save Page As... code.
The two ways I've found so far that work to capture images from Google Print are a screen capture (I used xwd, which of course worked perfectly) and looking in the on-disk cache (ls -lrt .mozilla/firefox/default.*/Cache/[0-9A-F]*). I'm still puzzled about why Page Info and the DOM Inspector won't actually reveal the image referenced in the .theimg style or allow it to be saved.
If you wanted to write a proxy that would make Google Print pages capable of being saved to disk, you would presumably want to match
background-image:url("http://print.google.com/\([^ "]+\)")
(although you'd need to be careful to match only the one in the definition of ".theimg", because it looks like there may be at least one other background-image:url) and then replace
I haven't tried this because it felt like too much work relative to the previous two methods.
Contrary to what I expected, Google Print does not seem to check referer, so it seems to be possible merely to extract the URL from the definition of .theimg, and then to load it directly. Perhaps that will change in the future.
Google must have hired some experts on html image protection or html obfuscation. To be sure, there are lots of other tricks in Google Print that I had never seen before. It is hard to think that the author of that HTML obfuscation was not the subject of Richard Stallman's accidental haiku. It is amusing to think that Mr. Bad's "other" DeCSS might at last be used for some kind of circumvention (although I doubt it, because presumably Google Print simply won't work at all with the CSS removed).
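The matching step of the proxy idea above might look roughly like this in Python. This is only a sketch under assumptions: it assumes the page carries the ".theimg" rule in the form described above, the function name theimg_url and the sample markup (including the PLACEHOLDER URL) are made up for illustration, and it only locates the URL rather than rewriting the page.

```python
import re

# Match the body of the ".theimg" rule only, so that any other
# background-image declaration elsewhere in the page is ignored.
THEIMG_RULE = re.compile(r'\.theimg\s*\{([^}]*)\}')
# Inside that rule, capture whatever sits between url(" and ").
BACKGROUND_URL = re.compile(r'background-image:\s*url\("([^"]+)"\)')


def theimg_url(html):
    """Return the background-image URL from the .theimg rule, or None."""
    rule = THEIMG_RULE.search(html)
    if rule is None:
        return None
    url = BACKGROUND_URL.search(rule.group(1))
    return url.group(1) if url else None


# Hypothetical markup shaped like the description above; not real page source.
sample = (
    '<style>'
    '.other { background-image:url("http://print.google.com/images/cleardot.gif"); }'
    '.theimg { background-image:url("http://print.google.com/PLACEHOLDER");'
    ' background-repeat:no-repeat; }'
    '</style>'
)
print(theimg_url(sample))
```

Restricting the search to the rule body first is one way to honour the caveat about other background-image:url declarations on the same page.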
... if their DRM can be broken or not.
The point is that it is "good enough" to stop the average person from lifting the material.
If you're determined enough, nothing is going to stop you from getting what you want.
A full exploration of the html obfuscation and DRM employed by Google would be very interesting
I've been looking at this - there's a blog post with some preliminary discussions, and a follow-up giving some ways of getting around it. The short answer is that if you just want to save the image to disk, it's not too hard in a decent browser.
Gerv
Where can we see a sample of this to test whether it actually does these disabling things?
I do agree that this is a security problem. We already have options in some browsers (I use Firefox, for example) to block sites from changing status bar text, changing images, etc. And there was no fuss about that. I think disabling such basic functions as copy, paste, print falls in the same "no-no" category as changing statusbar text, changing images, etc.
A site presents a page in a certain way, but I as the user get to select how I view it, with what functions I want to view it, which parts of the site I want active and which ones I don't. You can't force me to accept what I don't want to accept. If I set my software to ignore part of your site, that's my choice, not yours.
You don't go disabling functions in users' browsers. You let them do that themselves. Conversely, you don't enable stuff the user didn't enable themselves.
Isn't it now about to be illegal to go changing people's browser settings via the use of spyware? Doesn't this come awfully close to doing the same thing? If it changes how my software behaves, it's awfully close to being malware.
i am a soviet space shuttle
Search for "economic development".
gerv, a mozilla developer, has a few blog entries that talk about how the print service tries to stop you from getting to the jpeg's, and how to bypass that.
Google Print, And Clue Barriers
Google Print Hacking Ideas
nostrils
Now they're both mysteriously restricted to general viewing.
Your hair look like poop, Bob! - Wanker.
... another can undo.
:-)
It seems rather futile to try and restrict what people can do with images on the net. Given that fundamentally it's an open easily-parsed format, and wget is your friend, it ought to be relatively easy to write a harvester, if anyone could be bothered.
And there's the rub. Unless Google publishers are sufficiently stupid (I've not seen much evidence of online stupidity in book publishers to date...) to put significant excerpts from the book online, who'd care if you could download the images?
At the end of the day, the best protection is to make sure that the good information is kept in the book, and the online imagery gives an indication of what you get when you pay for the book. This all presupposes the book is worth buying, of course, and perhaps that's the market they're trying to protect...
I guess this will protect against casual copying by the clueless, and that's probably all they're trying to do, but Google is every tech's favourite lovechild (brought about by those clever marketing peeps, which, er, aren't most tech's favourite people. Well, moving swiftly on...). So Google are popular, and they do something that those tech peeps will react to (DRM), and quick as a flash there are workarounds. Hell, I expect a Firefox plugin by tomorrow! A waste of time, perhaps? Or just another example where the clueful (Mozilla users) have the advantage over the clueless (IE users)?
Simon.
Physicists get Hadrons!
Just put your monitor on a copy machine!
60 percent of the time, my comments are right everytime.
It's not tough "DRM"... my university's local online student newspaper equivalent effectively does the same thing.
The World Wide Web is dying. Soon, we shall have only the Internet.
They have to show the suits at the publishing houses that they are being responsible, safeguarding the suits' ``intellectual property''. It doesn't really matter whether it actually works, just as it doesn't really matter if the features in the checklist on the box of software work. It's a tool for the salesman to use.
If this feature exists but really doesn't work, then the suits get the illusion that their ``intellectual property'' is protected, and they get free advertising of the try-before-you-buy variety. For this best of all possible worlds scenario, it has to work well enough to fool the suits, but not well enough to stop the rest of us.
Sounds to me as if Google has gotten it to work just about well enough to do a good job for all concerned: Google, us readers, and even the suits.
See what I've been reading.
You are adding to the fire by allowing them to change the definition of copyright. Copyright gives the holder no right to determine how one USES content, it merely gives them a monopoly right over copying the content for distribution. There are some copyright limitations on use, such as public displaying and the like, but fair use clearly says once you give ME a copy of your work, I can do anything I damn well choose to it.
It already gave me a copy of the work for free, if I choose to burn it, make a hat out of it, or print it out, it's my business.
Burn Hollywood Burn
First, turn off JavaScript. Then turn on image dimensions. Right click on the dimensions for the main image, and click view background image.
http://print.google.com/print?id=ULQSG0Zs7vcC&pg=3&img=1&q=mastering+digital+photography&sig=gv2nFptEf0dj7Gzb8eZ4U8UdtUo
is the URL that is used, and surprisingly it is linkable from outside, it doesn't appear to check IPs, browsers, or anything else. (deep link away!)
Gerv, who works for mozilla/bugzilla, already went through this, and found several ways around google's hackery. He then went and summarized the multiple ways to do it in good browsers.
Get Firefox!
I seem to recall them using a similar trick on the official site for Lord of the Rings when it came out.
The World Wide Web is dying. Soon, we shall have only the Internet.
$ wget long url from http://slashdot.org/comments.pl?sid=124900&cid=10470948
Resolving print.google.com... done.
Connecting to print.google.com[64.233.161.118]:80... connected.
HTTP request sent, awaiting response... 403 Forbidden
09:44:53 ERROR 403: Forbidden.
Read Epic the first RPG novel.
It's not a vulnerability at all... Just obfuscation.
The image is set to be a background image, using CSS. As with a background on a table or on a web page, you can't click on it to get at it directly.
But in the code itself, it's pretty obvious...
An example, of the straight JPEG
Colin Davis
Question #5 states:
What can I do with books that I find?
Well, you can browse a few pages, learn more about the topics explored by the book, buy it, or commit a selection to memory. To further protect your book content, printing and image copying functions are disabled on all Google Print content pages.
I don't see the big deal. As long as they let me still use "back", "forward" and "exit" I'll be happy. Sure it sucks that you might have to buy a book or write down your favorite quote, but it's free as in gratis at this point.
Amazon only lets you get about 3 pages into a book and usually you can't leave the introduction.
Get your Unix fortune now!
What's next, banning cell phone cameras in book stores, or libraries?
This sort of HTML obfuscation abuse is just the beginning. This is a general problem with any sufficiently rich presentation language. There are hundreds of different ways to obfuscate things.
Just wait until MS finally decides to properly support PNG alpha transparency! Combine this with CSS absolute positioning, and you'll start seeing images which are composited from many different layers of semi-translucent images, each of which is just noise on its own. You also have already seen for a long time the cutting up of images into many small pieces.
This could be taken to an extreme as well. With absolute positioning you could also do this with text as well as images. Just position each letter on the page separately and randomize the order in which they appear in the HTML stream. Or even worse, use a custom downloaded font, where the glyphs are all randomized, so although it may look like an "A", it's really in the slot for a "Q"...try to cut and paste that.
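To make the shuffled-font idea concrete, here is a toy Python model of it. It is a sketch only: no real font is built, the "font" is just a dictionary, and the names make_glyph_map, encode and render and the seed value are invented for the illustration.

```python
import random
import string


def make_glyph_map(seed):
    """Build a random one-to-one mapping from letters to font slots."""
    letters = list(string.ascii_uppercase)
    slots = letters[:]
    random.Random(seed).shuffle(slots)
    return dict(zip(letters, slots))


def encode(text, glyph_map):
    """Text as it would sit in the HTML: each letter replaced by its slot."""
    return "".join(glyph_map.get(ch, ch) for ch in text)


def render(encoded, glyph_map):
    """What the reader sees: the font draws the original letter for each slot."""
    drawn = {slot: letter for letter, slot in glyph_map.items()}
    return "".join(drawn.get(ch, ch) for ch in encoded)


glyph_map = make_glyph_map(seed=2004)
in_html = encode("GOOGLE PRINT", glyph_map)
print(in_html)                     # what cut and paste would hand you
print(render(in_html, glyph_map))  # what appears on screen
```

Note that this is only a letter-substitution cipher, so anyone who can read the screen can rebuild the mapping; it raises the effort of copying text rather than preventing it.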
Consider the PDF format as an extreme of where XHTML+CSS+DHTML+PNG can go wrt. obfuscation. Sure, the determined and savvy can always get the text copied out; but that doesn't mean it's not going to be very difficult.
Maybe we should all go back to ASCII and lynx.
This was always intended as a "feel good" feature of the Google Print system so that publishers would feel safer sending tons of books to Google.
The "real" DRM here isn't DRM. As a previous post so astutely pointed out, DRM is schizophrenic by nature: it involves trying to give someone something without *actually* giving it to them.
Google's "real" protection is that the service won't let you view more than a certain percentage of the book in any given month. That percentage is determined by the book's publisher at submission time, anywhere from 20% to 100%.
Even if you can copy/paste/print, you're still only going to get a portion of the book - certainly not enough to replace a valid sale. Disabling that functionality basically returns us to the age of photocopying a few pages of a book/article in a library. Except now we can search, so it's faster.
If one solution is as simple as "grab the data from your browser's cache" this is clearly meant to only stop the "average" user, something that is in very short supply here on /. But it's good enough for Google to run the business, most likely.
Here's to hoping this headline appearing on /. isn't going to spread enough FUD to publishers that would have otherwise sent in their material. Google Print is still in its infancy, and could fail if Google doesn't assert some spin control on the situation, I suppose. Maybe I overestimate /.'s influence.
I don't even see the point to this.
Really, who is going to print out all 600 pages of the newest Tom Clancy book, then go to the effort of binding them together? It'd cost more in paper, ink, time & energy than to just buy the book.
Sure if it were a cooking book or something someone might only want 1 page. But then again, if they want 1 page they can just write it down.
Seems like a big waste of time and money to me, but then again after the IPO they have money to blow.
Most people aren't aware of that workaround. But browsers are supposed to work for the user, not the website designer. "Features" that irritate the user in order to placate designers are antithetical to that concept.
Designers didn't pay for my machine, why should they have any right to control what I do with it.
autopr0n is like, down and stuff.
1. This is not *your* content.
Let's say that you buy a song/movie and it has DRM which restricts the way you use it - you would be justified in removing the DRM to use it in your own way (provided that you engage in 'fair' use). The content that Google displays in its book search results is *NOT* your media. You do not own it, you have not paid for it and Google is providing it to you as a courtesy. To provide it, they have to ensure that you do not make copies of it since even Google does not own the media to be able to give it away to you. Nothing wrong in restricting your options here.
2. OMG they have control over the browser!
Yes they do not ask you before disabling your browser options. But this does not install a trojan, or do anything permanent with your computer like other sites do. If you do not like the fact that your options have been reduced on that page, all you have to do is hit the back button and scram. (It's like complaining that a particular room in someone else's house is too hot - if you don't like it, get outta there!)
3. The DRM can be disabled.
Sure, it can. If one man can enable it, another man can disable it. The point, as has been noted in several places, on several occasions, is that the average person cannot disable it. And no, you cannot automate the process to get complete books since the guys sitting at Google are not stupid and they will have measures built in to prevent automated downloading of entire books (through whatever strategies - searching repeatedly etc).
And yes, I have to mention this : Google has shown me how to push the limits of HTML and scripting - First with Gmail and now with Google Print - they are doing stuff that looks like pure art to the programmer within me. Hurray for ingenuity!
"When the only tool you own is a hammer, every problem begins to resemble a nail." - Abraham Maslow (1908-1970)
I agree that the content does not belong to me.
However, according to this reasoning, book publishers (and newspaper publishers, and other producers of print media) should have control over lights in my environment, because I'm using them to read their stuff.
I prefer this approach: Part of the "terms of service" of making content publicly available on the World Wide Web is accepting that someone can fetch that content and browse it in any reader they want.
Yes, I know that. But in Acrobat, it's expected that it will behave that way, and Acrobat does explain why. I deal with Acrobat every day, practically.
YOU seem to misunderstand something. It's OK, IN MY OPINION (have your own opinion, but don't screech at me for having one, and I won't scream at you for having yours) to do things like overlay with transparent GIFs, etc. that accomplish the same goal. But don't actively interfere with the user's expectations. If there's an image etc you don't want them to copy, overlay it with a transparent image (tirerack.com does this and it works well) but don't go disabling parts of the browser that the user expects to be there all the time. Who knows what they want/need it for?
IN MY PERSONAL OPINION, the balance I think is best is different than the one you think is best. Don't bitch at me for having a personal opinion and I won't yell at you for having one. Don't like it? Tough shit.
i am a soviet space shuttle
* Set Adblock to "Hide Ads"
* Block: http://print.google.com/images/cleardot.gif
* Prevent websites from changing the context menu: Web features > Advanced
* et voila
Unfortunately, that is the idea behind "trusted" computing. You no longer have full control over your own machine, you can only run applications "trusted" by those controlling the DRM.
This used to be called "Mandatory Access Control" (MAC, as opposed to the kind of multiuser protection most people deal with... "Discretionary Access Control") before Microsoft decided to change the definition of "trust".
As soon as you run an untrusted app, you cannot run a trusted application.
This is one way of doing it. Another way is to create a compartmentalised environment, where applications can not get information from compartments with a higher classification, nor transfer information to compartments of a lower classification.
Ironically, THIS kind of MAC environment under administrative control can be a major security enhancement. You could create a compartment with "untrusted classification"... which would effectively have fewer rights than even a normal application... and force users to run their web browsers and other untrusted applications inside it. Not only couldn't they be attacked through the browser, they couldn't even be suborned or tricked by a social engineering attack into breaking the security (that's the main point of MAC, really). Unfortunately, Windows doesn't seem to have any kind of generic MAC mechanism that could be used this way.
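The compartment rule described above (no getting information from a higher classification, no transferring it to a lower one) can be written as two small checks. This Python sketch is illustrative only: the level names and numbers are invented, and a real MAC system enforces this in the OS rather than in application code.

```python
# Hypothetical classification levels, lowest first.
LEVELS = {"untrusted": 0, "normal": 1, "confidential": 2}


def may_read(subject, obj):
    """No read up: a subject may read only at or below its own level."""
    return LEVELS[subject] >= LEVELS[obj]


def may_write(subject, obj):
    """No write down: a subject may write only at or above its own level."""
    return LEVELS[subject] <= LEVELS[obj]


# A browser confined to the "untrusted" compartment cannot read normal files.
print(may_read("untrusted", "normal"))
# A confidential application cannot leak data into the untrusted compartment.
print(may_write("confidential", "untrusted"))
```

The key property is that the user cannot override either check, which is what separates this from ordinary discretionary permissions.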
So:
- Start at the beginning of the book
- Read 3 pages
- Pick a phrase on the third page
- Search for that phrase within the book
- Click the search result for the third page
- Read the next two pages
- Pick a phrase on the fifth page
- Search for that phrase within the book
- Click the search result for the fifth page
- Read the next two pages
- Repeat until end of book
It's irritating, but when you're trying to find a passage in the book and the three-page limit smacks you, you can use this method to get more of the book (or all of it, if you have the patience).
IE. Default settings. No proxy, no modifications. Nothing particularly special about it.
-Load up the book in the browser.
-Click the View menu, select Source.
-Search for "div class=browse"
-Immediately before that, you'll find something like this in a CSS style:
{ background-image:url("http://print.google.com/print?blablahblah");bunch of other stuff;}
-Take that URL, copy and paste it into a new browser window and voila, you have the full size image. Save As or Print on this image works fine. No problems at all.
Seriously, this is trivial to break.
What's not trivial is getting an entire book. How to figure out how to get every page is the tough part. Getting the image itself is a cakewalk. It's just JavaScript tricks to break right-clicking and CSS tricks to break direct printing from that window. Saving gets broken because of the tricky CSS using the IMG as a background image. The browser doesn't think to save the image, is all.
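For what it's worth, the View Source steps above translate into a few lines of Python. This is a rough sketch, not a tested scraper: the function name image_url_before_browse_div and the sample markup are invented, the page layout is assumed to be as described in the steps, and the pattern would miss URLs that contain a closing parenthesis.

```python
import re

# Accept both url("...") and url(...) since the quoting may vary.
URL_IN_CSS = re.compile(r'background-image:\s*url\(\s*"?([^")]+)"?\s*\)')


def image_url_before_browse_div(source):
    """Return the last background-image URL that appears before the
    'div class=browse' marker, or None if either piece is missing."""
    marker = source.find("div class=browse")
    if marker == -1:
        return None
    matches = URL_IN_CSS.findall(source[:marker])
    return matches[-1] if matches else None


# Hypothetical page source shaped like the steps above; not real output.
sample = (
    '<style>.theimg { background-image:url("http://print.google.com/print?PLACEHOLDER");'
    ' background-repeat:no-repeat; }</style>'
    '<div class=browse></div>'
)
print(image_url_before_browse_div(sample))
```

Taking the last match before the marker mirrors the "immediately before that" step, so earlier background images on the page are skipped.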
- Give a man a fire and he's warm for a day, but set him on fire and he's warm for the rest of his life.
There is no fancy copy protection. There certainly isn't some flaw in Mozilla.
It's simple - the image is done as the background image for an HTML element. There's nothing to stop you linking directly to the content: sample image, for example.
You can't right click on it because it's a background graphic. But you sure as hell could write a robot script that went and downloaded pages.
If they're clever, they'll watermark each image as it is served, so they can tell who's copying what (well, down to the originating IP, anyway).
1. Install Adblock. You should have it for other reasons anyway. :-)
2. Add this URL to its block list:
http://print.google.com/images/cleardot.gif
3. Disable "collapse blocked elements" in Adblock while browsing Google Print.
4. Pick "View Background Image", then "Save Image As..."
I guess someone will come up with a Firefox extension in no time that will just add a context menu option called "Save Background Image as..."
Beware: In C++, your friends can see your privates!