LinkedIn Says It's Illegal To Scrape Its Website Without Permission (arstechnica.com)
A small company called hiQ is locked in a high-stakes battle over web scraping with LinkedIn. It's a fight that could determine whether an anti-hacking law can be used to curtail the use of scraping tools across the web. From a report: HiQ scrapes data about thousands of employees from public LinkedIn profiles, then packages the data for sale to employers worried about their employees quitting. LinkedIn, which was acquired by Microsoft last year, sent hiQ a cease-and-desist letter warning that this scraping violated the Computer Fraud and Abuse Act, the controversial 1986 law that makes computer hacking a crime. HiQ sued, asking courts to rule that its activities did not, in fact, violate the CFAA. James Grimmelmann, a professor at Cornell Law School, told Ars that the stakes here go well beyond the fate of one little-known company. "Lots of businesses are built on connecting data from a lot of sources," Grimmelmann said. He argued that scraping is a key way that companies bootstrap themselves into "having the scale to do something interesting with that data." [...] But the law may be on the side of LinkedIn -- especially in Northern California, where the case is being heard. In a 2016 ruling, the 9th Circuit Court of Appeals, which has jurisdiction over California, found that a startup called Power Ventures had violated the CFAA when it continued accessing Facebook's servers despite a cease-and-desist letter from Facebook.
Don't make it public if you don't want it read.
Using some add-on Python packages, it is ridiculously easy to scrape any web page, even those that use ASP (it's a PITA to get set up the first time, but...). The ONLY thing that stops it, aside from legal action, apparently, is a login mechanism in front. Without authenticating, it's no-go.
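For anyone wondering how little code "ridiculously easy" means, here is a minimal sketch. It is not the add-on packages mentioned above: it uses only Python's standard-library `html.parser`, and the `LinkCollector` class, the `extract_links` helper, and the sample markup are all made up for illustration. It parses a hard-coded HTML string rather than fetching a live page.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag it sees (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag being opened.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    """Return all link targets found in an HTML string, in document order."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Made-up sample page; a real scraper would download this text first.
SAMPLE = (
    '<html><body>'
    '<a href="/in/alice">Alice</a> '
    '<a href="/in/bob">Bob</a>'
    '<a name="anchor-only">no href here</a>'
    '</body></html>'
)

print(extract_links(SAMPLE))
```

A page behind a login would return the sign-in form instead of the content, which is why authentication is the one technical barrier that actually works.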
Airline websites have this same problem -- the online "cheap ticket" engines regularly scrape the publicly available data by essentially running the "book a trip" workflow millions of times to try to pull the entire set of fares for different city pairs. It's a cat-and-mouse game because the information has to be available for normal humans to book trips; no one is going to solve a CAPTCHA to look up fares. Basically these engines are looking for any irregularities like mis-filed fares or fares that happen to be a particularly good deal. (Airlines have to publish their fares in advance and make them available through the online sources that travel agents use. This is why you'll occasionally see stuff like a transatlantic business class ticket for $50 or similar...)
I'm not sure if LinkedIn can actually bar someone from scraping their public data. If that were the case, no one could run wget on a website and pull down all the static content.
Here's why it seems bonkers to me. . . When you access a website, you are merely sending that site a request for information. That's all. Assuming it responds with the requested information, one must presume that's because the operator (and, by proxy, the owner) of the website set it up for that purpose. So what we have here is effectively. . .
LinkedIn: Don't request information from us!
hiQ: Please send the following information.
LinkedIn: OK, here you go.
LinkedIn: Dammit, you requested information after we told you not to! WE'RE GONNA SUE!!
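To put the exchange above in code: the server operator decides what every request gets back, so refusing a particular client is technically trivial. This is only a sketch under assumptions. The `handle_request` function, the `BLOCKED_CLIENTS` set, and the client names are invented for illustration and say nothing about how LinkedIn's servers actually work.

```python
from http import HTTPStatus

# Hypothetical set of client identifiers the operator has chosen to refuse.
BLOCKED_CLIENTS = {"scraper.example"}


def handle_request(client, path, pages):
    """Return (status, body) for one request, exactly as the operator set it up."""
    if client in BLOCKED_CLIENTS:
        # The operator said "don't request information from us" and enforces it.
        return HTTPStatus.FORBIDDEN, ""
    if path not in pages:
        return HTTPStatus.NOT_FOUND, ""
    # Otherwise the server answers: "OK, here you go."
    return HTTPStatus.OK, pages[path]


# Made-up public page served by the hypothetical site.
pages = {"/profile/alice": "Alice, engineer"}

print(handle_request("browser.example", "/profile/alice", pages))
print(handle_request("scraper.example", "/profile/alice", pages))
```

The point of the sketch is that a 200 response is the operator's own choice, which is what makes "you requested it after we told you not to" sound so odd.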
I refuse to use any social media site, including LinkedIn. A lot of companies - such as Goodwill - recruit exclusively from LinkedIn. Fuck 'em.
I don't work for any company that uses social media for recruiting.
Yes, it has, but only in the Central District of California as far as I know. The interpretation that the CFAA covers violating TOS was found to be overbroad in U.S. v. Drew, 259 F.R.D. 449 (C.D. Cal. 2009).
The CFAA applies either immediately or once the defendant (or defendant-to-be) exceeds the permitted access. That limit can also be set by a cease-and-desist letter. See Facebook, Inc. v. Power Ventures, Inc., No. 13-17102 (9th Cir. July 12, 2016) https://cdn.ca9.uscourts.gov/d...
You are permitted to grant different people different terms or access. Look at https://qz.com/981029/a-federa...