Slashdot Mirror


Hulu Munging HTML With JS To Protect Content

N!NJA writes "Hulu has started encoding the HTML that they send to people's browsers, and then decoding it using JavaScript before rendering it. [...] They then run the character stream through a series of JavaScript functions to convert it back into plain text before pushing it into your browser using DHTML. That's quite a lot of effort just for fun, so I assume that is to stop screen scrapers from parsing content." I really can't understand all this effort. Boxee displayed the Hulu advertising perfectly. I suspect Alec Baldwin is to blame.

9 of 281 comments

  1. Cat & Mouse. by 0100010001010011 · · Score: 5, Informative

    The XBMC guys already made a plugin after the last Hulu change. It'll take a few hours and a new one will be made.

    Especially if you SEND the user all the info they need, how hard is it to decode functions? There are crackers out there that take decoded assembly to figure out how to bypass DRM, what makes Hulu think their implementation will be any more difficult?

    1. Re:Cat & Mouse. by tweek · · Score: 4, Informative

      It has nothing to do with piracy. It has to do with revenue from cable company contracts. The problem the "content providers" had was that via Boxee and other set-top PCs, people could forgo cable altogether and that would be a huge chunk of lost revenue. Hulu is popular but the ad revenue from Hulu is nothing compared to the money the cable companies pay "content providers".

      * I quote "content providers" because Hulu liked to use that phrase when Boxee was shut out. The fact of the matter is that Hulu is co-owned by two of these "content providers" so in essence, Hulu *IS* the "content provider".

      --
      "Fighting the underpants gnomes since 1998!" "Bruce Schneier knows the state of schroedinger's cat"
    2. Re:Cat & Mouse. by Idiomatick · · Score: 3, Informative

      The documents in Alexandria WERE copies. The reason the library was so great was that when people came to port the librarians would copy travelers' stuff. I think it would be kind of impressive if the RIAA DRMed some of their stuff and protected it so well that it disappeared entirely... like top secret documents in the US gov.

    3. Re:Cat & Mouse. by fprintf · · Score: 3, Informative

      Don't bother! I was being totally serious. You see, from a business/MBA standpoint (yes, I know there are very very few of us here on Slashdot.. I think we might be outnumbered by the women) it all looks like stonewalling. And the thing is, the project architect is going to get blamed, probably when he is long gone from the first project, when it takes $500K to add a table to a data warehouse.

      Here is the problem. The MBA type guys don't have a clue what works or doesn't work from an IT perspective. We can only make suggestions of what we want, and encourage folks to seek acceptable alternatives. But if we say we have $5M to do a project, and you say it can't be done for less than $10M we have to trust you. Since we only have $5M you get to recommend either doing the project half-assed, with half the functionality required, or you don't get *any* work and the funding goes to some other project. So what ends up happening is a "multi-year" project is born. "We'll build the foundation and some of the features required and do the rest next year when more funding is available" the project manager will say. And yet, when next year rolls around there is no funding 'cause 100 other projects are requesting priority instead. It is a maddening circle. The MBA types, like myself, blame the IT team for incompetence and failure to deliver when they promise a certain feature set. The IT types blame the MBAs for being inflexible and unrealistic. Finally everyone blames the customer for being too demanding.

      --
      This post brought to you by your friendly neighborhood MBA.
  2. Phase One is Over by wonkavader · · Score: 5, Informative

    TunerFreeMCE couldn't scrape the data. Mission accomplished. Oh, wait... Tada:

    "Update- version 2.6.7 is now available to download to work round this new tactic."

    And now, I suppose, there will be a DMCA attack as phase two.

  3. Huh? by AlterRNow · · Score: 3, Informative

    My father gave me some HTML that was decoded with Javascript. To get the raw HTML was pretty simple IIRC..

    1) Load page in Firefox
    2) Open DOM explorer/inspector
    3) Export as HTML
    4) ???
    5) PROFIT!!

    --
    The disappearing pencil trick. Let me show you it.
    1. Re:Huh? by AKAImBatman · · Score: 4, Informative

      The particular situation here deals with compressed/encoded HTML in an effort to prevent screen-scraping. This leaves two options for screen scrapers:

      Option 1
      1) Figure out how the decoder works
      2) Replicate the decoder functionality in the screen scraper
      3) Parse the decoded HTML
      4) Make changes as the encoding scheme changes
      5) ???
      6) Profit!

      Option 2
      1) Link a Javascript engine like SpiderMonkey, Rhino, V8, or SquirrelFish into the screen scraper
      2) Run the Javascript to decode the HTML
      3) Parse the decoded HTML
      4) ???
      5) Profit!
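
Option 1 can be sketched roughly as follows. This is only an illustration: the thread does not say what encoding Hulu actually uses, so the scheme here (XOR with a one-byte key, then base64) is a made-up stand-in, and the names `KEY`, `encode`, `decode`, `LinkCollector`, and `scrape_links` are hypothetical. The point is the shape of the scraper: the decoder lives in one function, so step 4 means rewriting only that function when the site changes its scheme.

```python
import base64
from html.parser import HTMLParser
from typing import List

# Hypothetical stand-in for a site's obfuscation: XOR each byte with a
# one-byte key, then base64-encode. This is NOT Hulu's actual scheme.
KEY = 0x5A


def encode(html: str, key: int = KEY) -> str:
    """Simulate what the server would send (used only to build sample data)."""
    raw = bytes(b ^ key for b in html.encode("utf-8"))
    return base64.b64encode(raw).decode("ascii")


def decode(payload: str, key: int = KEY) -> str:
    """Step 2: replicate the page's decoder inside the scraper."""
    raw = base64.b64decode(payload)
    return bytes(b ^ key for b in raw).decode("utf-8")


class LinkCollector(HTMLParser):
    """Step 3: parse the decoded HTML; here we only collect href values."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def scrape_links(payload: str) -> List[str]:
    """Decode the obfuscated payload, then extract link targets from it."""
    parser = LinkCollector()
    parser.feed(decode(payload))
    parser.close()
    return parser.links


if __name__ == "__main__":
    page = '<ul><li><a href="/watch/1">Episode 1</a></li></ul>'
    print(scrape_links(encode(page)))
```

Option 2 trades this reverse-engineering work for a heavier dependency: the scraper runs the site's own decoder in an embedded JavaScript engine instead of reimplementing it.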

  4. Re:Dumb question here by ynef · · Score: 5, Informative

    Yes, in fact, HtmlUnit is my preferred browser simulation library in Java for this very reason: it allows you to write very easy to understand Java code, and it uses Rhino as a JavaScript interpreter. Completely brilliant, and yet few people know about it.

  5. Re:Brand dilution guys.... by MightyYar · · Score: 4, Informative

    They are being knuckleheads. Their "website" is analogous to a traditional TV channel and Boxee is analogous to a set-top cable box. You'd still get the Hulu ads, still get the Hulu branding.

    To be fair, it seems like Hulu would very much like to be on Boxee - the distaste for the content providers' policies is palpable on their blog.

    --
    W..w..W - Willy Waterloo washes Warren Wiggins who is washing Waldo Woo.