Hulu Munging HTML With JS To Protect Content
N!NJA writes "Hulu has started encoding the HTML that they send to people's browsers, and then decoding it using JavaScript before rendering it. [...] They then run the character stream through a series of JavaScript functions to convert it back into plain text before pushing it into your browser using DHTML. That's quite a lot of effort just for fun, so I assume it is to stop screen scrapers from parsing content." I really can't understand all this effort. Boxee displayed the Hulu advertising perfectly. I suspect Alec Baldwin is to blame.
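For anyone unfamiliar with the trick, here is a minimal Python sketch of the general idea. The summary does not say how Hulu actually encodes its pages, so the XOR key, the base64 wrapper, and the function names `encode`, `unwrap`, `unscramble`, and `decode` are all hypothetical stand-ins, not Hulu's real scheme.

```python
# Hypothetical obfuscation scheme, NOT Hulu's real one: the server XORs the
# UTF-8 bytes of the HTML with a key and base64-encodes the result; the
# client undoes each step with a chain of small functions.
import base64

KEY = 0x5C  # hypothetical key; a real site could rotate this per request


def encode(html: str) -> str:
    """Server side: turn markup into an opaque character stream."""
    scrambled = bytes(b ^ KEY for b in html.encode("utf-8"))
    return base64.b64encode(scrambled).decode("ascii")


# Client side: the "series of functions" the page runs before rendering.
def unwrap(stream: str) -> bytes:
    """Undo the base64 layer."""
    return base64.b64decode(stream)


def unscramble(data: bytes) -> bytes:
    """Undo the XOR layer (XOR with the same key is its own inverse)."""
    return bytes(b ^ KEY for b in data)


def decode(stream: str) -> str:
    """Run the stages in reverse order and return the original markup."""
    return unscramble(unwrap(stream)).decode("utf-8")
```

A scraper that fetches the page without running the decoder sees only the base64 string, which contains no `<` characters at all, so a naive HTML parser finds nothing; running the same chain recovers the markup byte for byte.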
The XBMC guys already released a plugin after the last Hulu change. Give it a few hours and there will be a new one.
Especially if you SEND the user all the info they need, how hard is it to reverse the decoding functions? There are crackers out there who pore over disassembled code to figure out how to bypass DRM; what makes Hulu think their implementation will be any harder to crack?
TunerFreeMCE couldn't scrape the data. Mission accomplished. Oh, wait... Ta-da:
"Update- version 2.6.7 is now available to download to work round this new tactic."
And now, I suppose, there will be a DMCA attack as phase two.
Yes, in fact, HtmlUnit is my preferred browser simulation library in Java for this very reason: it lets you write very easy-to-understand Java code, and it uses Rhino as its JavaScript interpreter. Completely brilliant, and yet few people know about it.
The particular situation here deals with compressed/encoded HTML in an effort to prevent screen-scraping. This leaves two options for screen scrapers:
Option 1
1) Figure out how the decoder works
2) Replicate the decoder functionality in the screen scraper
3) Parse the decoded HTML
4) Make changes as the encoding scheme changes
5) ???
6) Profit!
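Steps 2 and 3 of Option 1 might look something like the following Python sketch. It assumes a hypothetical XOR-plus-base64 scheme (the key `XOR_KEY` and the names `decode_payload`, `LinkCollector`, and `scrape` are invented for illustration, not taken from Hulu or any scraper); the parsing half uses the standard library's `html.parser.HTMLParser`.

```python
# Option 1 sketch: replicate a (hypothetical) decoder, then parse the result.
# The XOR key and base64 wrapping are ASSUMPTIONS for illustration only;
# a real scraper would mirror whatever the site's JavaScript actually does.
import base64
from html.parser import HTMLParser

XOR_KEY = 0x2A  # hypothetical; step 4 means updating this when the site changes


def decode_payload(payload: str, key: int = XOR_KEY) -> str:
    """Step 2: re-implement the page's decoding functions in Python."""
    scrambled = base64.b64decode(payload)
    return bytes(b ^ key for b in scrambled).decode("utf-8")


class LinkCollector(HTMLParser):
    """Step 3: collect (href, link text) pairs from the decoded HTML."""

    def __init__(self) -> None:
        super().__init__()
        self.links = []   # list of (href, text) tuples
        self._href = None  # href of the <a> we are currently inside, if any
        self._text = []    # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def scrape(payload: str) -> list[tuple[str, str]]:
    """Decode the obfuscated payload and return every link it contains."""
    parser = LinkCollector()
    parser.feed(decode_payload(payload))
    parser.close()
    return parser.links
```

Step 4 is the catch: each time the site changes its encoding, `decode_payload` has to be rewritten to match, which is exactly the maintenance cost that Option 2 avoids by running the site's own JavaScript.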
Option 2
1) Link a JavaScript engine such as SpiderMonkey, Rhino, V8, or SquirrelFish into the screen scraper
2) Run the JavaScript to decode the HTML
3) Parse the decoded HTML
4) ???
5) Profit!
They are being knuckleheads. Their "website" is analogous to a traditional TV channel and Boxee is analogous to a set-top cable box. You'd still get the Hulu ads, still get the Hulu branding.
To be fair, it seems like Hulu would very much like to be on Boxee; their distaste for the content providers' policies is palpable on their blog.