Slashdot Mirror


LinkedIn Says It's Illegal To Scrape Its Website Without Permission (arstechnica.com)

A small company called hiQ is locked in a high-stakes battle over web scraping with LinkedIn. It's a fight that could determine whether an anti-hacking law can be used to curtail the use of scraping tools across the web. From a report: HiQ scrapes data about thousands of employees from public LinkedIn profiles, then packages the data for sale to employers worried about their employees quitting. LinkedIn, which was acquired by Microsoft last year, sent hiQ a cease-and-desist letter warning that this scraping violated the Computer Fraud and Abuse Act, the controversial 1986 law that makes computer hacking a crime. HiQ sued, asking courts to rule that its activities did not, in fact, violate the CFAA. James Grimmelmann, a professor at Cornell Law School, told Ars that the stakes here go well beyond the fate of one little-known company. "Lots of businesses are built on connecting data from a lot of sources," Grimmelmann said. He argued that scraping is a key way that companies bootstrap themselves into "having the scale to do something interesting with that data." [...] But the law may be on the side of LinkedIn -- especially in Northern California, where the case is being heard. In a 2016 ruling, the 9th Circuit Court of Appeals, which has jurisdiction over California, found that a startup called Power Ventures had violated the CFAA when it continued accessing Facebook's servers despite a cease-and-desist letter from Facebook.

4 of 167 comments

  1. then don't make it public by Anonymous Coward · Score: 5, Insightful

    don't make it public if you don't want it read

    1. Re:then don't make it public by Anonymous Coward · Score: 5, Interesting

      don't make it public if you don't want it read

      They want it read. By people. (And search engines.) They don't want it read by companies that take the information and then sell it as their business model.

      If we support hiQ, saying that scraping publicly-accessible content from another site and then using that for profit is permissible, then doesn't that mean it's also applicable to other sites? Slashdot's content is public: can I scrape everything, host it on my site, insert ads, and make money?

      Sorry hiQ, as much as software and internet legislation is behind the times and technically inappropriate, there are some things in law which follow common sense - and one of them is you can't take someone else's stuff and sell it for yourself. If you want to use their content then you need to follow the (common) practice of establishing some sort of licensing agreement.

      But anyways, what about their user agreement?

      You agree that you will not: [...] Develop, support or use software, devices, scripts, robots, or any other means or processes (including crawlers, browser plugins and add-ons, or any other technology or manual work) to scrape the Services or otherwise copy profiles and other data from the Services;

      Is that not enough for at least an injunction and civil suit?

  2. Happens in other industries too by ErichTheRed · · Score: 5, Interesting

    Airline websites have this same problem -- the online "cheap ticket" engines regularly scrape the publicly available data by essentially running the "book a trip" workflow millions of times to try to pull the entire set of fares for different city pairs. It's a cat-and-mouse game because the information has to be available for normal humans to book trips; no one is going to solve a CAPTCHA to look up fares. Basically these engines are looking for any irregularities like mis-filed fares or fares that happen to be a particularly good deal. (Airlines have to publish their fares in advance and make them available to online sources that are available to travel agents. This is why you'll occasionally see stuff like a transatlantic business class ticket for $50 or similar...)

    I'm not sure LinkedIn can actually bar someone from scraping their public data. If that were the case, no one could run wget on a website and pull down all the static content.

  3. Re:This is bonkers! by bluefoxlucid · · Score: 5, Interesting

    Actually, LinkedIn has a point.

    LinkedIn supplies service to the public at large, in the same way that a MicroCenter supplies retail service to the public at large. All members of the public are allowed to enter a MicroCenter. You walk up to the doors and they open automatically.

    You can be trespassed for no reason by a retail center or other physical location open to the public at large. The doors still open to you, but you're not allowed in. It's the same with a Web site: it's difficult in practice to establish a verifiable packet identity on the Internet. IP addresses change, and you can do goofy shit like put the data scrapes in AJAX requests to distribute their source.

    In other words: you're authorized by default to access LinkedIn's public assets. You're not allowed to access stuff requiring a logged-in session until you've gotten log-in credentials, because there are actual systems in place to stop you from doing that, implying that you're not supposed to force access there. Basically, it comes down to a civilized understanding of your host's expectations, taken at face value.

    If LinkedIn tells you to stop, you've now had your authorization revoked. You can't claim a restraining order is invalid because someone's outside and you can also be anywhere outside, and you also can't claim that LinkedIn can't de-authorize you unless they specifically identify and block you. Blocking an individual entity from a Web site is hard and has collateral damage.

    So the CFAA is actually a valid vehicle here, since "abuse" is essentially defined as "accessing a system to which you are not authorized." The reasonable person test excuses a lot of behavior, largely because it's unreasonable to expect a person to work out whether a certain behavior or function on a Web site might be something they're not allowed to touch, given the reasonable behavior of people at large. A lot of stuff happens that won't count as fraud or abuse under the CFAA, even though it's inconvenient and unintended. By the same token, when somebody has told you to stop accessing their systems in a certain way and you do it anyway, a reasonable person might assume you were, you know, told not to, and not allowed to do that, and that you know damned well you're not allowed to do that.

    That's not to say threats, lawyers, and other anti-social behavior are good business. Poor diplomacy here. Effective in the legal field, but not your best option.