Slashdot Mirror


LinkedIn Says It's Illegal To Scrape Its Website Without Permission (arstechnica.com)

A small company called hiQ is locked in a high-stakes battle over web scraping with LinkedIn. It's a fight that could determine whether an anti-hacking law can be used to curtail the use of scraping tools across the web. From a report: HiQ scrapes data about thousands of employees from public LinkedIn profiles, then packages the data for sale to employers worried about their employees quitting. LinkedIn, which was acquired by Microsoft last year, sent hiQ a cease-and-desist letter warning that this scraping violated the Computer Fraud and Abuse Act, the controversial 1986 law that makes computer hacking a crime. HiQ sued, asking courts to rule that its activities did not, in fact, violate the CFAA. James Grimmelmann, a professor at Cornell Law School, told Ars that the stakes here go well beyond the fate of one little-known company. "Lots of businesses are built on connecting data from a lot of sources," Grimmelmann said. He argued that scraping is a key way that companies bootstrap themselves into "having the scale to do something interesting with that data." [...] But the law may be on the side of LinkedIn -- especially in Northern California, where the case is being heard. In a 2016 ruling, the 9th Circuit Court of Appeals, which has jurisdiction over California, found that a startup called Power Ventures had violated the CFAA when it continued accessing Facebook's servers despite a cease-and-desist letter from Facebook.

167 comments

  1. then dont' make it public by Anonymous Coward · · Score: 5, Insightful

    don't make it public if you don't want it read

    1. Re:then dont' make it public by Anonymous Coward · · Score: 5, Interesting

      don't make it public if you don't want it read

      They want it read. By people. (And search engines.) They don't want it read by companies that take the information and then sell it as their business model.

      If we support hiQ, saying that scraping publicly-accessible content from another site and then using that for profit is permissible, then doesn't that mean it's also applicable to other sites? Slashdot's content is public: can I scrape everything, host it on my site, insert ads, and make money?

      Sorry hiQ, as much as software and internet legislation is behind the times and technically inappropriate, there are some things in law which follow common sense - and one of them is you can't take someone else's stuff and sell it for yourself. If you want to use their content then you need to follow the (common) practice of establishing some sort of licensing agreement.

      But anyways, what about their user agreement?

      You agree that you will not: [...] Develop, support or use software, devices, scripts, robots, or any other means or processes (including crawlers, browser plugins and add-ons, or any other technology or manual work) to scrape the Services or otherwise copy profiles and other data from the Services;

      Is that not enough for at least an injunction and civil suit?

    2. Re:then dont' make it public by BronsCon · · Score: 3, Insightful

      They don't want it read by companies that take the information and then sell it as their business model.

      What do search engines do, then?

      --
      APK quotes people (including myself) without context and should not be trusted. Just thought you should know.
    3. Re:then dont' make it public by Anonymous Coward · · Score: 0

      You have hit the issue on the head.

      Incidentally, Google has endured legal action against it for simply doing what search engines do.

      These distinctions are fuzzy, and each side has some legitimacy to their gripe.

    4. Re:then dont' make it public by tattood · · Score: 1

      What do search engines do, then?

      Search engines create an index that is searchable and make money by selling ads on the search page. Search engines are NOT collecting the website data, making correlations about the data on the website, and selling that data to companies.

      --
      WTB [sig], PST!!!
    5. Re:then dont' make it public by BronsCon · · Score: 0

      Oh? So they don't display page titles? And they've figured out a way to display the excerpt from the page that contains your search term(s) without collecting that data?

      Please, tell me how this magic works.

      --
      APK quotes people (including myself) without context and should not be trusted. Just thought you should know.
    6. Re:then dont' make it public by saloomy · · Score: 1

      Even though they do (display titles), and even though they haven't (figured out a way to display excerpts without collecting data), Google respects a site's robots.txt, which Linked.In has clearly asked hiQ to do, but hiQ is flouting the EULA. On one hand, I agree: if you don't want it read, keep it private.

      On the other hand: Posting something in the public space effectively gives the readers a right to consume it (akin to reading a book). It does not give the reader the license to freely copy or build upon it (akin to republishing the book under their own name, or writing a sequel to a book they have read using the same characters and titles). For that, you need a license from the original content creator (Linked.In in this case).

    7. Re:then dont' make it public by BronsCon · · Score: 1

      Right, the issue here is the "we want to let search engines use it without license, but want to require a license for anyone else" attitude. Either require everyone who uses it to license it (even if you don't charge some of them), or require nobody to license it; that's more or less how copyright law works.

      --
      APK quotes people (including myself) without context and should not be trusted. Just thought you should know.
    8. Re:then dont' make it public by BradleyUffner · · Score: 1

      Exactly, "403: Forbidden" exists for exactly this reason.

      A program asked a web server for a particular page. The web server said "here you go!", instead of returning a 403 error.
      If this is about copyright infringement, then sue under a law that applies to that.
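
The 200-versus-403 point above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `BLOCKED_AGENTS` set, the agent name in it, and the `status_for` function are hypothetical and do not describe how LinkedIn or any real server is configured.

```python
from http import HTTPStatus

# Hypothetical blocklist; the agent name is made up for illustration.
BLOCKED_AGENTS = {"example-scraper"}


def status_for(user_agent: str) -> HTTPStatus:
    """Pick the status a server could return for a request to a public page."""
    agent = user_agent.strip().lower()
    if any(blocked in agent for blocked in BLOCKED_AGENTS):
        # 403 Forbidden: the request was understood, but access is refused.
        return HTTPStatus.FORBIDDEN
    # 200 OK: the "here you go!" case described in the comment above.
    return HTTPStatus.OK
```

Real deployments usually key such rules on more than the User-Agent header, since a client can set that header to anything.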

    9. Re:then dont' make it public by JaneTheIgnorantSlut · · Score: 1

      I guess you can't print a page either.

    10. Re:then dont' make it public by sexconker · · Score: 4, Insightful

      No, only one side has legitimacy.

      If you complain about people using information you post PUBLICLY, you are an idiot.
      This doesn't even rise to copyright infringement.

    11. Re:then dont' make it public by smooth+wombat · · Score: 3, Insightful

      "we want to let search engines use it without license, but want to require a license for anyone else" attitude.

      No, that is not correct. Search engines point to a page and may give a very brief line or so from the article, but one still has to click on the link to go to the real page and read everything.

      hiQ goes to the LinkedIn site and rather than pointing to the pages in question, takes the data, packages it, and then sells it to someone else, having left LinkedIn to do all the heavy lifting.

      The two are not close.

      --
      We will bankrupt ourselves in the vain search for absolute security. -- Dwight D. Eisenhower
    12. Re: then dont' make it public by Anonymous Coward · · Score: 0

      Let LinkedIn win this one. Then, use the result to prohibit tracking for ad purposes!

      The difference is small. You can't sell details about me...

    13. Re:then dont' make it public by F.Ultra · · Score: 1

      Where exactly in the complaint by Linkedin are they telling search engines to not index linkedin.com?

    14. Re:then dont' make it public by Dog-Cow · · Score: 1

      If MS asserted a copyright claim, that would be different. There is no fraud and no hacking taking place when scraping publicly-accessible data.

    15. Re:then dont' make it public by BronsCon · · Score: 4, Interesting

      The two are not close.

      They really are, though. LinkedIn has copyright on all of their content, in whole and in part, not just as a whole. That's how copyright works, otherwise I could change a single word in a book and republish it as an original work under its own copyright. It is also important to keep in mind that (most) search engines -- and Google specifically -- don't just grab the page title, META description (or first couple lines of content) and a word/phrase count, they grab the entire content of the page, and they do so in order to display the exact part of the content that contains your search term(s) -- as I mentioned earlier -- rather than a likely irrelevant summary or intro.

      To do this, search engines must necessarily use the entire page and not just key pieces of data. That is, Google et al. get away with using more of LinkedIn pages without license than hiQ is using. Therein lies the problem.

      --
      APK quotes people (including myself) without context and should not be trusted. Just thought you should know.
    16. Re:then dont' make it public by BronsCon · · Score: 1

      Not relevant to the argument being made, which was in response to someone claiming they want search engines to be able to do what search engines do then, in the very next sentence, claiming they don't want search engines to be able to do what search engines do.

      --
      APK quotes people (including myself) without context and should not be trusted. Just thought you should know.
    17. Re:then dont' make it public by Anonymous Coward · · Score: 0

      The fact that I point my browser to linkedin.com doesn't mean I agree to shit.

    18. Re:then dont' make it public by Gavagai80 · · Score: 1

      Slashdot's content is public: can I scrape everything, host it on my site, insert ads, and make money?

      Copyright law clearly makes that illegal. This case is a little different in that it seems to be about the kind of data that can't be copyrighted.

      --
      This space intentionally left blank
    19. Re:then dont' make it public by F.Ultra · · Score: 1

      But it seems that LinkedIn wants precisely that, i.e. for search engines to continue to do what they do but not let hiQ do what hiQ does.

    20. Re:then dont' make it public by BronsCon · · Score: 1

      Right. I was replying to this, though:

      They want it read. By people. (And search engines.) They don't want it read by companies that take the information and then sell it as their business model.

      I was pointing out that search engines "take the information and then sell it as their business model."

      Sorry you missed that.

      --
      APK quotes people (including myself) without context and should not be trusted. Just thought you should know.
    21. Re:then dont' make it public by omnichad · · Score: 1

      Slashdot's content is public: can I scrape everything, host it on my site, insert ads, and make money?

      Plain old Copyright law is enough to put an end to that. However, facts are not copyrightable, and Linkedin has a lot of valuable facts in its database.

      But anyways, what about their user agreement?

      Can you put a EULA in a document folder (page 50 in a stack of 200 pages) and throw it on the ground in the park, and expect to enforce it when it tells people not to read the other pages in the envelope? That's the physical-world equivalent.

    22. Re:then dont' make it public by bws111 · · Score: 1

      It doesn't matter what search engines do. The owner of the site is perfectly within his rights to say 'these accesses are allowed, these are not'.

      Being indexed by a search engine is probably beneficial to LinkedIn. Both parties gain from being indexed, it is a symbiotic relationship.

      HiQ is probably not beneficial. By ratting out LinkedIn's users to their employers they are potentially decreasing the number of people who will use LinkedIn. That is a parasitic relationship.

    23. Re:then dont' make it public by omnichad · · Score: 1

      1) collecting the website data. Check. The spider downloads all of the text content and stores an index along with contextual relationships.

      2) make correlations about the data on the website. Check. The hyperlinks on the web site are used to evaluate the relative importance of the linked web site.

      3) selling that data to companies. Nearly. They don't charge for the search engine directly - they charge advertisers and then provide the data for free to visitors. More or less the same thing, effectively speaking.

    24. Re:then dont' make it public by Anonymous Coward · · Score: 0

      They are close only in the part where the page content is read. They are completely different in the use of the data they take from LinkedIn.

    25. Re:then dont' make it public by BronsCon · · Score: 1

      The owner of the site is perfectly within his rights to say 'these accesses are allowed, these are not'.

      Yes, and they can do that with HTTP 200 and HTTP 403 status codes, respectively.

      --
      APK quotes people (including myself) without context and should not be trusted. Just thought you should know.
    26. Re:then dont' make it public by buss_error · · Score: 2

      No, that is not correct.

      I'm of two minds about LinkedIn.
      In the first place, I'm required to have an account by my current employer.
      In the second place, LinkedIn in my opinion does a ton of scraping themselves (asking to access your mailbox contacts, for instance). But at least LinkedIn ASKs to access it. Still, it feels creepy to me. The "psycho" girlfriend kind of creepy.

      On the third hand, LinkedIn told them to stop. So they should stop.

      --
      Necessity is the plea for every infringement of human freedom. It is the argument of tyrants; it is the creed of slaves.
    27. Re:then dont' make it public by F.Ultra · · Score: 1

      I didn't miss that, it just looks like you think that extracting the title of a page constitutes "take the information and then sell it", something that is covered by fair use. It would be a whole different affair if, e.g., Google extracted and resold the amount of information that hiQ does, which of course was the point of the GP.

    28. Re:then dont' make it public by Anonymous Coward · · Score: 0

      > saying that scraping publicly-accessible content from another site and then using that for profit is permissible
      Yes, it is. Look at google, for example.

      > one of them is you can't take someone else's stuff and sell it for yourself
      Yes. Supermarkets for example, do not exist.

    29. Re:then dont' make it public by bws111 · · Score: 1

      Sure, they CAN do that, but they don't HAVE to do that. Once you have been told you don't have permission, you don't have permission.

    30. Re:then dont' make it public by BronsCon · · Score: 1

      I didn't miss that, it just looks like you think that extracting the title of a page constitutes "take the information and then sell it"

      No, it looks like you missed where they're taking the entire content of the page and not just the title, since you keep coming back to "just the title".

      It would be a whole different affair if, e.g., Google extracted and resold the amount of information that hiQ does, which of course was the point of the GP.

      So, if Google took only key pieces of information, rather than the entire page, that would be problematic? Because Google takes the whole page, while hiQ takes key pieces of data; Google is actually taking, repackaging, and profiting from more of LinkedIn's data than hiQ is.

      But, all of that is still highly irrelevant to what I was replying to.

      --
      APK quotes people (including myself) without context and should not be trusted. Just thought you should know.
    31. Re:then dont' make it public by BronsCon · · Score: 2

      Actually, anything you're able to view from a public space is fair game under current laws, with the exception of court orders stating otherwise. If hiQ's servers can view the content from the public internet (that is, if LinkedIn's servers serve it to them without them hacking around some technical measure), it's fair game unless LinkedIn gets an injunction against hiQ. That is, what you're claiming is really for the courts to decide.

      Or, you know, LinkedIn could just claim copyright on their data and issue a series of DMCA notices.

      IANAL but I've consulted several regarding a very similar issue in the past.

      --
      APK quotes people (including myself) without context and should not be trusted. Just thought you should know.
    32. Re:then dont' make it public by Anonymous Coward · · Score: 0

      So the analogy is:

      LinkedIn = Average staffer (does the heavy lifting)
      hiQ = Management (takes the work, repackages it as their own, sells it to senior management or the owners)

    33. Re:then dont' make it public by angel'o'sphere · · Score: 1

      The question is not what they grab/scrape but what they copy and redistribute.
      Google creates a catalog, pointing with every search result to the original.

      HiQ is blatantly violating copyright, privacy rights and EULAs/TOSs.

      If you don't grasp the difference I hope you are not a software developer. Ignorance on that scale can easily be the end of your career.

      --
      Cost free eBook I read (by iBook/Kobo/Amazon/ObookO/Gutenberg etc.): "The Green Odyssey" by Philip Jose Farmer.
    34. Re:then dont' make it public by Anonymous Coward · · Score: 0

      http://linkedin.com/robots.txt Search Engines are given explicit permission

    35. Re:then dont' make it public by BronsCon · · Score: 1

      Actually, both violate copyright, but the reality is that they can (and should) choose who they go after. And on those grounds, LinkedIn can and should issue a flurry of DMCA notices and sue for an injunction. They can't sue for damages without registering their copyright, including a copy of the work being registered, but they sure can get an injunction. The CFAA simply does not apply to publicly available data until and unless they get said injunction and even then it would be a hell of a stretch.

      Also, I suppose it takes near average or higher intelligence to determine when someone is making an absurd argument in an attempt to point out the absurdity and incorrectness of the argument he is countering.

      --
      APK quotes people (including myself) without context and should not be trusted. Just thought you should know.
    36. Re:then dont' make it public by KingMotley · · Score: 1

      Uh no. That's not how copyright works. They have the right to grant or deny the rights to copy their data however they see fit. Their EULA clearly states that you are not allowed to do exactly what HiQ is doing. A copyright holder, unlike a trademark holder, does not forfeit the right to enforce it just because they haven't in the past, or have chosen to enforce it only in specific instances. HiQ has been directly told they don't have permission, end of story.

    37. Re:then dont' make it public by AHuxley · · Score: 1

      Re "What do search engines do, then?"
      Connecting people who worked on secret mil/gov projects with people looking for staff to work on other secret mil/gov projects.
      So people list all the projects they worked with and can show they are trusted in plain text.
      They used the same methods in the gov/mil and just expect the same results on the net.

      --
      Domestic spying is now "Benign Information Gathering"
    38. Re:then dont' make it public by BronsCon · · Score: 1

      Read the rest of my comments, then feel silly.

      --
      APK quotes people (including myself) without context and should not be trusted. Just thought you should know.
    39. Re:then dont' make it public by American+Patent+Guy · · Score: 1

      No, that's not quite correct. LinkedIn only has a copyright in that which (1) they acquired from their employees or other sellers and (2) constitutes a "work of authorship". They do not have a copyright in the content acquired from other sources, e.g. data, phrases, images that originate from members or other sources. The arrangement of information on a page may be a work of authorship, but only if there is some creative aspect to it. Data on a web page is not a work of authorship, and no one has a copyright in it. Not even the one who produced or collected it.

    40. Re:then dont' make it public by American+Patent+Guy · · Score: 1

      Data is not copyrightable, because it isn't a "work of authorship" under the copyright statutes. That's why LinkedIn is using this hacking law in a contorted way to try to stop the use of this content.

    41. Re:then dont' make it public by kwbauer · · Score: 1

      No, they have given permission to search engines to do what search engines do which is to direct traffic to the website from which the data was collected. The search engines are operating within the license given to them by LinkedIn.

      LinkedIn has not given hiQ permission to do what it is doing. Kind of like how a songwriter or copyright holder can give permission to an organization to reprint songs/music in a songbook but restrict others from reprinting those same songs or even photocopying them out of the authorized book even if other music in that book is openly allowed to be copied.

      This really is not a new concept.

    42. Re:then dont' make it public by kwbauer · · Score: 1

      They do not violate copyright if the copyright holder says they don't violate copyright. Kind of like how the police cannot charge my neighbor for stealing my lawnmower if I don't care that he has borrowed it even if I didn't give my express permission to him prior to the police asking me about it.

    43. Re:then dont' make it public by kwbauer · · Score: 1

      but the "less" and "effectively speaking" are the keys to the whole thing. Along with that pesky little thing called copying without permission for the intended usage. For some concrete, everyday examples, wander into any Catholic or Methodist or LDS (Mormon) church and look through the hymnal. You will find something at the beginning explaining how all the music can be copied for non-commercial use except as otherwise noted. And then you will find that some of the songs are marked with phrases that fit the "otherwise noted" category. This is the exact same situation applied to the content on a web-site instead of the content in a book of music.

    44. Re:then dont' make it public by saloomy · · Score: 1

      No. The difference between a search engine and hiQ is that a search engine (Google specifically, but I think all of them) respects the robots.txt instruction set. If you don't want a search engine in your content, then there is an "opt out". Maybe one could argue that this should be an "opt in", but there is a way to say "I don't want to authorize you to scan my page for indexing".

      hiQ has not listened to Linked.In's "opt-out" in the form of a cease and desist which is a pretty strong indication the content owner doesn't grant the permission to be scraped.

      I think the court will find in Linked.In's favor here, and so they should. hiQ is operating without a license to reproduce the original works. This is what Copyright is designed specifically to stop.
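
The robots.txt opt-out described above can be checked mechanically with Python's standard `urllib.robotparser`. A minimal sketch, assuming a made-up policy: the `ROBOTS_TXT` rules, the agent names, and the `example.com` URL are hypothetical and are not LinkedIn's actual robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy: one named crawler is allowed, everyone else is not.
ROBOTS_TXT = """\
User-agent: examplesearchbot
Allow: /

User-agent: *
Disallow: /
"""


def may_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts the file's lines directly, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

With these rules, `may_fetch("ExampleSearchBot/2.1", "https://example.com/in/someone")` is allowed and any other agent is refused. Note that robots.txt is advisory: it only works for crawlers that choose to honor it, which is the distinction the comment above draws.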

    45. Re:then dont' make it public by kwbauer · · Score: 1

      But Google is doing something of which LinkedIn approves and has given Google permission to do. hiQ, on the other hand, is doing something of which LinkedIn does not approve and has not given hiQ permission to do. That is entirely the difference here. LinkedIn believes that they benefit from the way Google indexes their pages and allows them to be searched but LinkedIn believes that what hiQ does is harmful to LinkedIn as it will tend to drive people away.

      I understand that people who have never created anything of value or who believe strongly in socialism have no concept of ownership of property (aka, property rights) so they cannot distinguish between the two usages but, until such a time as the whole world has descended into the dystopian hellholes you have named Utopia, please try to follow along.

    46. Re:then dont' make it public by kwbauer · · Score: 1

      A library shelf is a public space and so is a museum wall. Are you claiming that anybody has the right to walk in, take pictures or photocopies of anything in those public spaces and resell those copies and that they are not violating current law? I would be asking those several lawyers that you consulted with for a refund.

    47. Re: then dont' make it public by Anonymous Coward · · Score: 0

      If you're not a user (i.e. you didn't press "I agree" and sign up), you can't be bound by legally questionable "Terms". If the court rules in favor of LI here, we can say goodbye to Wikipedia, search engines, archive.org, and many more sites that crawl/scrape to gather info.

      The rule today, as it's always been, is: if you don't want it copied, don't publish on the Internet.

    48. Re:then dont' make it public by Anonymous Coward · · Score: 0

      I'm pretty sure you can't copyright facts, which is what LinkedIn is a collection of. They can copyright the arrangement of them, but a scraper isn't using the arrangement, just the content. Perhaps things have changed down there since I last looked into it though...

    49. Re: then dont' make it public by Anonymous Coward · · Score: 0

      Fail.

      Supermarkets do not take other people's stuff and sell it for themselves. They either:

      1. Buy other people's stuff and then sell it with their own margin on top
      2. Sell other people's stuff on behalf of those other people and then take a cut

      Both happen. Some supermarkets even charge for shelf space to stock a given product.

      None of these things involve taking other people's stuff without due recompense.

    50. Re:then dont' make it public by alzoron · · Score: 3, Insightful

      This is not a copyright issue. This is a CFAA issue. It's been long determined that you cannot copyright facts. The CFAA deals with unauthorized access to computer systems. LinkedIn told these companies to stop doing it and they kept doing it. That's a pretty clear case of unauthorized access.

    51. Re:then dont' make it public by Anonymous Coward · · Score: 0

      This is a false analogy since /. is protected by copyright law. Generic data is NOT protected by copyright generally.

    52. Re:then dont' make it public by BronsCon · · Score: 1

      When it comes to reading (viewing in your museum example), which is what was discussed in the argument I was originally replying to, the above is absolutely true. When it comes to copying, it's a little more nuanced than that, of course; but, then, I was writing a Slashdot post, not a fucking dissertation, I certainly was not giving legal advice and, again, was arguing against someone who claimed that merely viewing something viewable from public space, which the owner readily serves up to you with no technical measures in place to protect it, is illegal. De facto, it is not. That is, I was talking about viewing, not copying, a fact that becomes abundantly clear when you read the entire post, rather than just the first and last sentences.

      And, with regard to LinkedIn's complaint, whether or not hiQ has the right to view the data is precisely what matters. They do have that right, which should render the complaint invalid.

      Copyright, on the other hand, well... I talked about that in part of my post that you clearly skipped over.

      --
      APK quotes people (including myself) without context and should not be trusted. Just thought you should know.
    53. Re:then dont' make it public by BronsCon · · Score: 1

      You can't copyright facts, but you can copyright collections of them, provided you've done more than simply compile facts you already had. That is, if you had to collect the facts from multiple sources, your representation of those facts is protected by copyright. LinkedIn's data is a collection of facts, which they collected from multiple sources.

      I'm sure you're very familiar with patents, but you've missed some nuance of copyright law in your reply.

      --
      APK quotes people (including myself) without context and should not be trusted. Just thought you should know.
    54. Re:then dont' make it public by BronsCon · · Score: 1

      You can, in fact, copyright a representation of facts you've done work to compile from multiple sources. LinkedIn has done work to compile their collection of facts and is entitled to copyright protection.

      That is unlike, for example, a local phone directory, because the local phone company is the sole source of that data. A nationwide phone directory, comprising data collected from the various ILECs and CLECs, would be a copyrightable work. The bar being so low in that case would likely mean you'd be fine putting out your own version if you can prove you collected the data yourself, but it's still protected.

      As for accessing a public website, well, HTTP provides for access control and it needs to be used here. Further, we're talking about LinkedIn which, like Facebook, participates in active tracking of users of other websites, so access to their servers is practically forced on you if you don't actively seek to avoid it. I can't tell you to stop letting me fling data at your face, then hold you liable when I don't stop, which is precisely why technical measures (on LinkedIn's part) would be necessary for this to be a CFAA issue.

      --
      APK quotes people (including myself) without context and should not be trusted. Just thought you should know.
    55. Re:then dont' make it public by American+Patent+Guy · · Score: 1

      A compilation of facts can be copyrighted, but not the underlying data. If this company wants to extract those facts, data or other bits of information and create its own compilation, it violates no one's copyright. It doesn't matter whether those facts came from multiple sources or a single one.

      Even if there were to be a copyright here, the doctrine of implied license and the statutory exclusion of fair use upon infringement would probably apply. By making the data available to anyone over the web, LinkedIn arguably granted an implied license to use that data. (They don't publish it to keep it secret.) The courts have ruled it is a fair use to record movies on your DVR for your own personal viewing, and it would arguably be the same for extracting a collection of data from an Internet source, provided that the entity didn't compete with that source.

      I missed nothing. I'm an intellectual property lawyer.

    56. Re:then dont' make it public by BronsCon · · Score: 1

      A compilation of facts can be copyrighted, but not the underlying data. If this company wants to extract those facts, data or other bits of information and create its own compilation, it violates no one's copyright. It doesn't matter whether those facts came from multiple sources or a single one.

      If, in doing so, a copy of the compiled data is made... well...

      The courts have ruled it is a fair use to record movies on your DVR for your own personal viewing

      Yes, it is.

      and it would arguably be the same for extracting a collection of data from an Internet source

      Sure, for your own personal use.

      provided that the entity didn't compete with that source

      You mean provided the use was not commercial or for profit, right? After all, you're:

      an intellectual property lawyer

      Your appeal to authority does not imply correctness or completeness of understanding; especially so given the argument you just made.

      --
      APK quotes people (including myself) without context and should not be trusted. Just thought you should know.
    57. Re:then dont' make it public by American Patent Guy · · Score: 1

      If, in doing so, a copy of the compiled data is made... well...

      ... and because that's how LinkedIn provides the data, the scraper operates under the doctrine of fair use. (There's no other way for it to collect the data.)

      Whether your copying of the data is for personal or business use is not distinguished in the law. Your impact on the market is what counts. This scraper isn't affecting LinkedIn's ability to operate or provide the service that it does. You're free to gather information over the web (or another medium) as much as you like, recompile it, and resell it if you want to. Whether you profit from it doesn't matter. Publishers of directories and phone books have been doing this for many years. Google, Yahoo, Bing and all the search engine providers do it too.

      Would you like me to correct you a further time?

    58. Re:then dont' make it public by Anonymous Coward · · Score: 0

      Isn't that how LinkedIn got their start? I recall them buying information from scraper sites in the first place?

    59. Re:then dont' make it public by BronsCon · · Score: 1

      So wait, what you're saying is if my use of a short sample from a song doesn't affect the sale of that song, it's fair use? Sorry, Robert Van Winkle wouldn't have settled out of court if it looked like he was going to win against Bowie and Queen. What's happening here is practically the same. So, no, I don't think I'd like you to correct me again, since your "corrections" don't align with reality.

      Fair use is determined based on four factors, which I'm sure you're quite familiar with (but hoping I'm not aware of for the sake of your argument):
      - the purpose and character of your use (e.g. commercial or not - which is relevant here)
      - the nature of the copyrighted work
      - the amount and substantiality of the portion taken, and
      - the effect of the use upon the potential market.

      Your argument is, effectively, that only the latter of those factors matters. Sorry, Mr. Slashdot Lawyer, but you're wrong and the law as written says so quite clearly. You've got the books, look it up for yourself.

      Regarding phone books, I've already raised that point elsewhere.

      I get it, you're an IP lawyer who specializes in patent law. It's okay, there are a lot of facets to IP law and it's important to specialize and be good at your specialty; other specialties will suffer as a result, but that's fine. You might consult one of your colleagues specializing in copyright before continuing this conversation.

      --
      APK quotes people (including myself) without context and should not be trusted. Just thought you should know.
    60. Re:then dont' make it public by BronsCon · · Score: 1

      Ugh... I had a whole response typed out, clicked Preview and, in my sleep-addled state, closed the tab without posting. I'm not writing it all again, but the long and short of it is that there are four factors used in determining fair use and you're considering only one of them. I would argue that hiQ's use of LinkedIn's data does harm their ability to collect and sell it; I sure don't want hiQ mining shit about me from LinkedIn for the specific purpose for which they are doing so, so I'm less likely to continue maintaining my LinkedIn profile; I'm not alone in that, so yes, it does affect LinkedIn's business. Other factors include the purpose and character of the use (it's commercial, so it fails that test as well), the nature of the copyrighted work (time was spent collecting and compiling the data, time was spent designing the format and layout of the data, it's a creative work), and the amount and substantiality of the portion taken (all profile data is being scraped).

      As for phonebooks, I actually raised that issue elsewhere. This isn't even a long thread; perhaps complete your research before assuming I don't know about fucking phonebooks, eh? As a patent lawyer, you should be really good at researching shit, shouldn't you? I mean, if you're good at your job, that is.

      I get it, you're a patent lawyer with barely a passing familiarity with aspects of IP law outside of your specialization. Perhaps consult one of your colleagues specializing in copyright before replying again, though.

      --
      APK quotes people (including myself) without context and should not be trusted. Just thought you should know.
    61. Re:then dont' make it public by American Patent Guy · · Score: 1

      Well, then. Post your "whole response" and perhaps I'll have something to respond to other than your sleep-addled insults. You haven't rebutted what I have said.

      As anyone can download LinkedIn's data, HiQ is doing nothing special in the market. You're less likely to use LinkedIn because you've discovered that it can be used by anyone in a way you don't like. HiQ hasn't impacted the market by scraping it. Your analysis applies to the entire compilation. HiQ is downloading information for individual postings, which is not protected by copyright. (If HiQ was breaking into LinkedIn's servers and downloading the entire compilation, then your analysis would have some merit.)

      If you posted such wisdom elsewhere, why not post it here? Is it so troubling for you, being so eager to make a point? (If you really had such a wise posting, that is.)

      I get it. You're a person who thinks he understands copyright law, and wants to prove his expertise. Perhaps you should consult with one of your colleagues specializing in law (of any kind) before replying again.

    62. Re:then dont' make it public by AK Marc · · Score: 1
      So the solution is to provide public APIs, and request scrapers use those, so the data access can be tracked and identified just like when humans and search engines use it.

      If they make it public and predictable so search engines point to them, then they have given a robots.txt that allows that use, so it's "licensed" by the lack of controls, same as search engines.

      But anyways, what about their user agreement?

      The search engines never log in or agree to the user agreement, and this use seems to be a search engine that doesn't simply direct views to their page.

    63. Re:then dont' make it public by AK Marc · · Score: 1

      I can go into the Louvre, sit in front of the Mona Lisa, and sketch an exact replica of it, down to the brush stroke (except for the fact that there are always people standing in front of it trying to take a selfie), then sell that copy. Wait, what was your point again?

    64. Re:then dont' make it public by AK Marc · · Score: 2

      Then why didn't they file a copyright complaint? Instead, they are claiming "hacking" for viewing public information. (not copyright for using it, but "hacking" for viewing). Copyright is irrelevant, and not the complaint.

    65. Re:then dont' make it public by BronsCon · · Score: 1

      You haven't rebutted what I have said.

      You haven't read what I have written.

      If you posted such wisdom elsewhere, why not post it here?

      Because I did post it here, in this very thread. It's not my fault you've chosen not to read the thread in its entirety before replying to me, nor is it my responsibility to repeat everything I've ever posted here to every dumbass who can't scroll a page to find it himself.

      Sorry, I reserve that level of service for my paying clients, not random armchair quarterbacks who claim to be lawyers yet can't do a simple review of what's already on the page they're looking at before replying.

      Personally, I hope LinkedIn goes the copyright route on this, just so I can rub your nose in it when they win. They're not going to get anywhere with CFAA.

      --
      APK quotes people (including myself) without context and should not be trusted. Just thought you should know.
    66. Re:then dont' make it public by Wootery · · Score: 2

      That something is 'in public' doesn't mean you're free to copy it.

      Walk around a city and you might see countless TVs. That doesn't mean you're allowed to record them and sell the videos - that's still copyright infringement.

    67. Re:then dont' make it public by BronsCon · · Score: 1

      But Google is doing something of which LinkedIn approves and has given Google permission to do.

      Have they, though? Or have they simply not asked them to stop?

      I understand that people who have never created anything of value or who believe strongly in socialism have no concept of ownership of property

      Lovely assumption, but incorrect. I, in fact, have created quite a bit of value in this world. Just as a small sample, my clients value me enough to keep me employed long-term and my employees value the income and stability I provide them. So, then, you must think I'm a socialist? Why is that? Wait, no, you can't possibly think I have no concept of ownership of property when I've stated that LinkedIn has ownership of the data they've collected. Speaking of trying to follow along... So, then, I'm left wondering why you wrote this, as it certainly does not apply here.

      --
      APK quotes people (including myself) without context and should not be trusted. Just thought you should know.
    68. Re:then dont' make it public by BronsCon · · Score: 2

      Look again, they're given a lot of explicit restrictions and a handful of explicit permissions. In Google's case those are limited to:
      Allow: /psettings/guest-controls*
      Allow: /psettings/guest-email-unsubscribe*
      Allow: /psettings/sms-unsubscribe*
      Allow: /psettings/guest-controls/retargeting-opt-out*
      Allow: /settings/loid-email-unsubscribe-router*
      Allow: /settings/loid-email-unsubscribe*
      Allow: /help/

      For reference, the first 6 are pages where one can unsubscribe from various forms of marketing and the last is LinkedIn's support section. Anything else Google indexes (and they have indexed a LOT of LinkedIn's content) is without explicit permission, possibly even contrary to the 45 explicit restrictions they've been given. For example, I found this in Google's index, and /profile/ is listed as a Disallow rule.

      Most of the search engines listed in that robots.txt have the same set of rules as Google. The only obvious exception is deepcrawl, which also has the following Allow rules:
      # Profinder only for deepcrawl
      Allow: /profinder*
      Allow: /profinder/*
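
      As a rough illustration of how a crawler might evaluate Allow/Disallow rules like the ones quoted above, here is a minimal sketch using Python's standard `urllib.robotparser`. The rules, the `example.com` URLs and the `is_allowed` helper are hypothetical and heavily simplified; they are not LinkedIn's actual file. The trailing `*` wildcards are dropped on the assumption that the standard-library parser only does plain prefix matching.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical, simplified rules for illustration only (not LinkedIn's real file).
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /help/
Disallow: /

User-agent: *
Disallow: /
"""


def is_allowed(agent, url):
    """Return True if the sample rules permit `agent` to fetch `url`."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    checks = [
        ("Googlebot", "https://example.com/help/faq"),
        ("Googlebot", "https://example.com/profile/view"),
        ("SomeScraper", "https://example.com/help/faq"),
    ]
    for agent, url in checks:
        print(agent, url, is_allowed(agent, url))
```

      Note that robots.txt is advisory: nothing in the sketch stops a client from ignoring the answer, which is the gap the thread is arguing about.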

      --
      APK quotes people (including myself) without context and should not be trusted. Just thought you should know.
    69. Re:then dont' make it public by Anonymous Coward · · Score: 0

      What LinkedIn allows them to.

      The bottom line of the law is, you can do whatever LinkedIn allows you to with their site. They can be as permissive or as capricious about it as they like. They can specify entirely different terms for different people on the basis of "Because I say so," if they are so inclined--that's what the law says. Whether that's a good thing is a separate issue.

    70. Re:then dont' make it public by Anonymous Coward · · Score: 0

      You're allowed to record the TVs. Just not the works they are displaying. Where you cross the line from 'Recording the environment around you' to 'duplicating the copyrighted work on display in that environment' is up to the judge.

    71. Re:then dont' make it public by pnutjam · · Score: 1

      Have you seen Google's "snippets"? They do scrape content from sites, sometimes a lot.

    72. Re: then dont' make it public by Anonymous Coward · · Score: 0

      Actually you can record video to your heart's content and it is completely legal.

    73. Re: then dont' make it public by Anonymous Coward · · Score: 0

      Maybe it's different in Canada?

    74. Re:then dont' make it public by Anonymous Coward · · Score: 0

      LinkedIn is granting permission. LinkedIn received (a bunch of) individual web requests, and then fulfills those requests. By fulfilling those requests and purposefully sending the data to the requestor they inherently grant permission. Stupid case should be dismissed.

    75. Re:then dont' make it public by Anonymous Coward · · Score: 0

      End user comment agreement:

      By reading this comment, you agree to transfer all of your worldly assets to the next person who says "supercalifragilisticexpialidocious" to you in person. If you do not agree, register your disagreement at the customer service desk in the largest mall in Seattle between the hours of 1 and 2 pm by next Saturday in triplicate written on an IRS tax form fully filled out as if for a fictitious person of your choice, but containing exactly 7 simple mathematical errors whose cumulative product is equal to the number of seconds since the start of the UNIX epoch at the time you hand the form over.

    76. Re:then dont' make it public by Anonymous Coward · · Score: 0

      I can go into the Louvre, sit in front of the Mona Lisa, and sketch an exact replica of it, down to the brush stroke (except for the fact that there are always people standing in front of it trying to take a selfie), then sell that copy. Wait, what was your point again?

      Yes, you can. And the Louvre will allow you to do that.

      But they don't have to. They can revoke your access at any time (tell you to leave and not to come back).

      That is what LinkedIn has done. They have sent an official notice stating that permission is denied for HiQ to access their servers.

      Even if the door is open, once the owner says "Do not enter", coming in is trespassing. Accessing the data once you have been asked not to is still illegal access.

    77. Re:then dont' make it public by AK Marc · · Score: 1
      Facebook is letting someone in that they know is going to make a copy, then calling it "hacking" that a copy was made.

      Even if the door is open, once the owner says "Do not enter", coming in is trespassing. Accessing the data once you have been asked not to is still illegal access.

      More like:

      Facebook has a used goods store. The sign says "open". Bob comes in, buys a chair, then re-sells it on eBay for twice as much. Facebook says that Bob is welcome to come in the shop, and buy things, but if he sells them, then he was trespassing when he bought the item.

      Facebook is arguing that getting the data is legal, but the use of the data after retroactively makes the legal act illegal hacking.

      You can't invite someone into your house, find out they took a smelly poo in the bathroom, then file charges against them because they trespassed to leave the poop, as you later claim that the invitation implied that there'd be no pooping. Facebook's door isn't just unlocked, when someone knocks (sends a get), Facebook opens the door and says "come in", then sues them for coming in.

    78. Re:then dont' make it public by kwbauer · · Score: 1

      Because the Louvre does not hold a copyright on the Mona Lisa? Many other works of art are still under copyright so let's focus on those.

    79. Re:then dont' make it public by AK Marc · · Score: 1

      The claim does not talk about copyright, so let's focus on the facts Facebook asserts.

  2. SEO? by Anonymous Coward · · Score: 1

    I guess they don't care about being found on a search engine.

    1. Re: SEO? by Anonymous Coward · · Score: 0

      LinkedIn wants a court order to stop a certain company from scraping their site. LinkedIn doesn't want law to prevent anyone from scraping any site.

  3. Scrape or Scrap? by DontBeAMoran · · Score: 1

    Because if it's not illegal to scrap their websites, black hat hackers will have a field day.

    --
    #DeleteFacebook
  4. I've done several scraping projects by GerryGilmore · · Score: 3, Interesting

    Using some add-on Python packages it is ridiculously easy to scrape any web page, even those that use ASP (It's a PITA to get set up the first time, but...). The ONLY thing that stops it - aside from legal action, apparently - is to have a login mechanism in front. Without authenticating, it's no-go.
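
    To give a sense of how little code this takes, here is a minimal sketch that pulls link targets out of a page. It swaps the add-on packages mentioned above for the standard library's `html.parser`, and it runs on an inline sample page rather than a live site; `LinkCollector`, `extract_links` and `SAMPLE_PAGE` are made-up names for illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            # Skip anchors that have no href (or an empty one).
            if name == "href" and value:
                self.links.append(value)


def extract_links(html):
    """Return the list of href values found in `html`, in document order."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


# Hypothetical page content standing in for a fetched response body.
SAMPLE_PAGE = """<html><body>
<a href="/in/alice">Alice</a>
<a href="/in/bob">Bob</a>
<a name="top">no href here</a>
</body></html>"""

if __name__ == "__main__":
    print(extract_links(SAMPLE_PAGE))
```

    A real scraper would add fetching, throttling and error handling around this, but the extraction step itself stays about this small.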

    1. Re:I've done several scraping projects by iggymanz · · Score: 4, Interesting

      hahaha, you imagine login is a cure?

      no, scripts can log in. with sites having millions of users you can make as many logins as you need, it's a whack-a-mole the site can't win

    2. Re:I've done several scraping projects by im_thatoneguy · · Score: 3, Informative

      You can have terms of service though on a login to make it easily illegal.

      "By logging in you agree to not republish data that you view."

    3. Re:I've done several scraping projects by Gr8Apes · · Score: 4, Informative

      That's not illegal, that's merely a violation of the user agreement.

      --
      The cesspool just got a check and balance.
    4. Re:I've done several scraping projects by Anonymous Coward · · Score: 1

      Just keep in mind the language of the Computer Fraud and Abuse Act.

      Permissions, login systems, and any technical specification or description of what you are and are not allowed to do do not come into play at all.

      Only the wishes and desires of the system operators, possibly taken long after the fact, are what get applied.

      You really need a written contract showing the system operator both wishes you to have access, and is authorized to have their wishes upheld.

      Simply giving you a login and a click-through screen saying you are allowed to use that login is not enough to reflect the wishes of the system operator and does not legally grant you access to anything under the CFAA.

      Anything showing the implied wishes of the system operator may get you out of any claimed damages since it shows you were acting in good faith, but will still have them taken into account.

      A contract is the only way to really defend your past access as legal.
      Although it should be obvious in such a court case the contract will only cover the past, the date it was signed to the day of the claimed illegal access that is being argued in court.
      If you continue to access their system after they tell you that you aren't supposed to, the time from that moment forward will still be considered as illegal and include damages.

    5. Re:I've done several scraping projects by Zero__Kelvin · · Score: 2, Insightful

      Which gives them standing in court. It *might* not be a crime but it creates a contract that doesn't exist without it. This is far from the first time a company has tried the old "The Internet doesn't work the same way for us as it does for the rest of the world. Callsies, no take-backs!" defense.

      --
      Guns don't kill people; Physics kills people! - John Lithgow as Dick Solomon on Third Rock From The Sun
    6. Re:I've done several scraping projects by im_thatoneguy · · Score: 2

      Not criminal but breach of contract is grounds for a civil cause of action.

    7. Re:I've done several scraping projects by Anonymous Coward · · Score: 0

      >it creates a contract

      No. I didn't sign anything.

    8. Re:I've done several scraping projects by Zero__Kelvin · · Score: 1

      You forgot the "IANAL". I know this, because if you were you would also know about EULAs and other ways to enter into a contract, including verbal agreement.

      --
      Guns don't kill people; Physics kills people! - John Lithgow as Dick Solomon on Third Rock From The Sun
    9. Re:I've done several scraping projects by Anonymous Coward · · Score: 0

      > even those that use ASP

      ASP is server-side so there is no difference scraping it versus standard web pages.

    10. Re:I've done several scraping projects by Cajun Hell · · Score: 1

      no, scripts can log in. with sites having millions of users you can make as many logins as you need, it's a whack-a-mole the site can't win

      There's no rule that says getting login credentials needs to be trivial. Can you make a throw-away account at your bank?

      LinkedIn can authenticate people if they want, assuming they don't mind having a barrier to entry that keeps people from using their site. But keeping people from using their site does seem to be the agenda item here...

      --
      "Believe me!" -- Donald Trump
    11. Re:I've done several scraping projects by Anonymous Coward · · Score: 0

      Electronic signatures are recognized in virtually all courts of law. They may not have standing for major actions, like mortgaging your home, but they certainly do have standing when it comes to copyright.

      https://en.wikipedia.org/wiki/Electronic_signature#Enforceability

    12. Re:I've done several scraping projects by Rockoon · · Score: 1

      It *might* not be a crime

      Breaking a contract is not a crime. Full stop.

      --
      "His name was James Damore."
    13. Re:I've done several scraping projects by Zero__Kelvin · · Score: 1

      ... and, start again. If the contract says that any breach of the contract means you are no longer allowed to access the system then continuing to access it after violation of the contract almost certainly *IS* a crime. A meatspace analogy would be if you are allowed to get merchandise and pay later, then violate the contract knowingly and take product from the warehouse anyway you have committed theft.

      --
      Guns don't kill people; Physics kills people! - John Lithgow as Dick Solomon on Third Rock From The Sun
    14. Re:I've done several scraping projects by Anonymous Coward · · Score: 0

      I don't think he means it'll stop it dead like it's impossible for a computer to mimic.

      He means as long as you have control over account creation you can stop it, but yes, obviously if you leave account creation open to anyone, bots can log in.

    15. Re:I've done several scraping projects by GerryGilmore · · Score: 1

      Sure, scripts can login - if you have a valid login credential. If you do, obviously you can scrape away. If not, well....

    16. Re:I've done several scraping projects by h33t l4x0r · · Score: 1

      Not if they verify with SMS, which I believe they do.

    17. Re: I've done several scraping projects by Anonymous Coward · · Score: 0

      All of which carry no weight in a court room.

    18. Re: I've done several scraping projects by Zero__Kelvin · · Score: 2

      You couldn't be more wrong.

      --
      Guns don't kill people; Physics kills people! - John Lithgow as Dick Solomon on Third Rock From The Sun
    19. Re:I've done several scraping projects by Anonymous Coward · · Score: 0

      It might not create a contract. I always make my son click Agree on EULAs, etc because he can't legally enter into a contract... And because he can't enter into a contract, he can share that account with me even if the EULA says otherwise.

    20. Re:I've done several scraping projects by Zero__Kelvin · · Score: 2

      That's not correct. You directing your son to click the button is no different than you directing him to commit a crime. The culpability and responsibility rests with you. You will be held to the contract in the former case and charged with a crime in the latter. Great parenting though!

      --
      Guns don't kill people; Physics kills people! - John Lithgow as Dick Solomon on Third Rock From The Sun
    21. Re:I've done several scraping projects by Anonymous Coward · · Score: 0

      Judges' interpretations of the CFAA would disagree with you.

    22. Re:I've done several scraping projects by iggymanz · · Score: 1

      not a deterrent, buying a block of cell phone addresses is common too (usually for telemarketing scam call origination)

    23. Re:I've done several scraping projects by iggymanz · · Score: 1

      I think the volume of telemarketing scams originating in India (where they buy blocks of cell phone addresses in the USA to use as origination point) proves there are plenty of fraudsters that don't give a crap about your words on paper ratified by a room of politicians.

    24. Re:I've done several scraping projects by iggymanz · · Score: 1

      the fraudster/scammer in a 2nd or 3rd world toilet is shaking in his boots at the threat your country's legal system represents....HAHA!

  5. Happens in other industries too by ErichTheRed · · Score: 5, Interesting

    Airline websites have this same problem -- the online "cheap ticket" engines regularly scrape the publicly available data by essentially running the "book a trip" workflow millions of times to try to pull the entire set of fares for different city pairs. It's a cat-and-mouse game because the information has to be available for normal humans to book trips; no one is going to solve a CAPTCHA to look up fares. Basically these engines are looking for any irregularities like mis-filed fares or fares that happen to be a particularly good deal. (Airlines have to publish their fares in advance and make them available to online sources that are available to travel agents. This is why you'll occasionally see stuff like a transatlantic business class ticket for $50 or similar...)

    I'm not sure if LinkedIn can actually bar someone from scraping their public data. If that was the case, no one could run wget on a website and pull down all the static content.

    1. Re:Happens in other industries too by shuz · · Score: 1

      I have direct experience with this myself.

      This is why companies like Akamai have products geared specifically for this problem. However stopping bots is nearly impossible unless you deal with them on a realtime basis. It would be interesting if LinkedIn could get the entire world to make website scrapers illegal and then actually enforce that illegality. As of now, when a bot owner is shut down, they just move the operation overnight to an ISP that will take their business in the same country, or move countries altogether. Likely the only way to stop this behavior is in the website transaction process or to make scraping not monetarily feasible.

      --
      There is or can be built a machine that can simulate any physical object. -Church-Turing principle
    2. Re:Happens in other industries too by viperidaenz · · Score: 1

      Wouldn't it just be easier to run your bots through multiple VPNs with endpoints in different countries?

    3. Re:Happens in other industries too by Anonymous Coward · · Score: 0

      However stopping bots is nearly impossible unless you deal with them on a realtime basis.

      Nobody's going to stop good scraping bot(s) unless they're okay denying legitimate users as well.

    4. Re:Happens in other industries too by h33t l4x0r · · Score: 1

      No offense but you are a complete noob if you're trying to scrape sites without connecting through proxies. LinkedIn will start sending 403s almost right away.

  6. This is bonkers! by Zobeid · · Score: 4, Interesting

    Here's why it seems bonkers to me. . . When you access a website, you are merely sending that site a request for information. That's all. Assuming it responds with the requested information, one must presume that's because the operator (and, by proxy, the owner) of the website set it up for that purpose. So what we have here is effectively. . .

    LinkedIn: Don't request information from us!

    hiQ: Please send the following information.

    LinkedIn: OK, here you go.

    LinkedIn: Dammit, you requested information after we told you not to! WE'RE GONNA SUE!!

    1. Re:This is bonkers! by bluefoxlucid · · Score: 5, Interesting

      Actually, LinkedIn has a point.

      LinkedIn supplies service to the public at-large, in the same way that a MicroCenter supplies retail service to the public at-large. All members of the public are allowed to enter a MicroCenter. You walk up to the doors and they open automatically.

      You can be trespassed for no reason by a retail center or other physical location open to the public at-large. The doors still open to you, but you're not allowed in. It's the same with a Web site: it's difficult in-practice to establish a verifiable packet identity on the Internet. IP addresses change, and you can do goofy shit like put the data scrapes in AJAX requests to distribute their source.

      In other words: you're by default authorized to access LinkedIn's public assets. You're not allowed to access stuff requiring a logged-in session until you've gotten log-in credentials, because there are actual systems in place to stop you from doing that, implying that you're not supposed to force access there. Basically, civilized understanding of the expectations of your host on the face.

      If LinkedIn tells you to stop, you've now had your authorization revoked. You can't claim a restraining order is invalid because someone's outside and you can also be anywhere outside, and you also can't claim that LinkedIn can't de-authorize you unless they specifically identify and block you. Blocking an individual entity from a Web site is hard and has collateral damage.

      So the CFAA is actually a valid vehicle here, since "abuse" is essentially defined as "accessing a system to which you are not authorized." The reasonable person test holds up a lot of behavior, largely because it's unreasonable for a person to determine if a certain behavior or function on a Web site might not be something they're allowed to touch, or whatnot, given the reasonable behavior of people at-large. A lot of stuff happens that won't pass CFAA as fraud or abuse, even though it's inconvenient and unintended. By the same token, when somebody has told you to stop accessing their systems in a certain way and you do it anyway, a reasonable person might assume you were, you know, told not to, and not allowed to do that, and that you know damned well you're not allowed to do that.

      That's not to say threats, lawyers, and other anti-social behavior are good business. Poor diplomacy here. Effective in the legal field, but not your best option.

    2. Re:This is bonkers! by Train0987 · · Score: 1

      Then blacklist IP's at the firewall(s) for endpoints that are scraping your site.

    3. Re:This is bonkers! by Zero__Kelvin · · Score: 1

      That is exactly the situation, except add "I'm not telling you who I am, and have not entered into a contractual relationship with you." LinkedIn *could* reply with "Sorry, you must tell us who you are and enter into a contractual agreement with us in order for us to send that information", but they don't. So the only way LinkedIn or anyone else wins this case is with an ignorant judge or one who has been bought. If the defense does their job only the latter is possible.
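
      For what that alternative reply could look like, here is a minimal sketch of a request handler that refuses anonymous requests instead of answering them. Everything in it is an assumption for illustration: the `respond` function, the bearer-token scheme and the `VALID_TOKENS` set are invented, and it says nothing about how LinkedIn's servers actually behave.

```python
# Hypothetical set of credentials issued after a user accepts an agreement.
VALID_TOKENS = {"secret-token-123"}


def respond(headers):
    """Return (status_code, body) for a request with the given headers.

    Anonymous or unrecognized callers get 401 instead of the data.
    """
    auth = headers.get("Authorization", "")
    # Expect a header of the form "Bearer <token>".
    scheme, _, token = auth.partition(" ")
    if scheme == "Bearer" and token in VALID_TOKENS:
        return 200, "profile data"
    return 401, "Sorry, you must tell us who you are first"


if __name__ == "__main__":
    print(respond({}))
    print(respond({"Authorization": "Bearer secret-token-123"}))
```

      The point of the sketch is only that the server, not the client, decides whether an unauthenticated request gets an answer.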

      --
      Guns don't kill people; Physics kills people! - John Lithgow as Dick Solomon on Third Rock From The Sun
    4. Re:This is bonkers! by tattood · · Score: 3, Insightful

      Then blacklist IP's at the firewall(s) for endpoints that are scraping your site.

      IP addresses are fairly easy to change. You can use something like TOR, so your public IP always changes.

      --
      WTB [sig], PST!!!
    5. Re:This is bonkers! by bluefoxlucid · · Score: 2

      Let's try this again.

      it's difficult in-practice to establish a verifiable packet identity on the Internet. IP addresses change, and you can do goofy shit like put the data scrapes in AJAX requests to distribute their source.

      Blocking an individual entity from a Web site is hard and has collateral damage.

      Wikipedia has tried this, with collateral damage and limited success. I've seen people get sent to jail for harassment and legally barred from accessing certain sites and systems under restraining order, and then continue to access them with no reasonable way to prove their identity (i.e. could be someone else pretending to be said person).

      These days, it's different. Those IP addresses are probably automatically-assigned or internal to cloud infrastructure. IAAS may share addresses across clients. The IPs may appear from a range of hundreds of subnets coming from auto-scaling AWS infrastructure, constantly provisioning and releasing addresses.

      In other words: "Block it at the firewall" can easily mean "Block everything coming from AWS, Azure, DigitalOcean, and all other data centers all over the world." Difficult (nigh-impossible) and prone to huge amounts of collateral damage.
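      The collateral-damage point can be made concrete with a minimal sketch using Python's standard ipaddress module. The blocklist below is hypothetical and uses reserved documentation ranges rather than any real provider's ranges; the point is only that a CIDR rule matches every tenant behind the range, not one actor.

```python
import ipaddress

# Hypothetical blocklist. These are reserved documentation ranges (RFC 5737),
# standing in for whatever cloud ranges a site might decide to block.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("198.51.100.0/24"),
    ipaddress.ip_network("203.0.113.0/24"),
]


def is_blocked(address: str) -> bool:
    """Return True if the address falls inside any blocked range."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in BLOCKED_NETWORKS)


def addresses_covered() -> int:
    """Count how many addresses the blocklist covers in total."""
    return sum(network.num_addresses for network in BLOCKED_NETWORKS)


if __name__ == "__main__":
    print(is_blocked("198.51.100.7"))  # inside a blocked range
    print(is_blocked("192.0.2.1"))     # outside every blocked range
    print(addresses_covered())         # every one of these is cut off, scraper or not
```

      Two /24 rules already cover 512 addresses; blocking whole data-center ranges scales the same arithmetic up by several orders of magnitude.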

      Then: the courts have already told you this is a matter of you having a public Web site, and you can deal with them "accessing" it yourself because you apparently have no right to tell people they're not allowed in due to their use of your published information. Now you have people jumping from address to address, and you're forced to play firewall whack-a-mole.

    6. Re:This is bonkers! by Anonymous Coward · · Score: 0

      When you access a website, you are merely sending that site a request for information.

      Presumably you've agreed to the site's Terms of Service and your request(s) comply with them, no? If you've taken any measures at all to evade or subvert the Terms of Service, then you're accessing the computer without authorization or in excess of what is authorized and you run afoul of the CFAA.

    7. Re:This is bonkers! by Ichijo · · Score: 1

      Except when you enter a MicroCenter, you are stepping foot on their property. When you anonymously request a public web page from a web server, you're standing on the public sidewalk at the walk-up window. Since you as a taxpayer own that sidewalk, can the store owner restrain you from your own property as a way to make you stop placing orders at the window?

      From TFA:

      [Orin Kerr, a legal scholar at George Washington University] argues sites wanting to limit access to their site should be required to use a technical mechanism like a password to signal that the website is not, in fact, available to the public.

      By making data publicly available to individuals to help LinkedIn gain exposure but trying to restrict it from other web sites who wish to use that same data for financial gain, it seems LinkedIn is trying to have their cake and eat it too.

      --
      Any sufficiently unpopular but cohesive argument is indistinguishable from trolling.
    8. Re:This is bonkers! by viperidaenz · · Score: 1

      A trespass notice can't stop you looking at MicroCenter from a public space.

      If you want to restrict someone in a public space, you need a restraining order from a judge.

      Transferring that to the internet, a cease and desist letter is like a trespass notice. Probably appropriate for telling someone to stop creating new logins to access restricted content after you disable their old ones.

      Asking a judge for an injunction would be appropriate to stop someone accessing publicly available content. Of course, this is never going to stop someone from another jurisdiction.

    9. Re:This is bonkers! by bws111 · · Score: 1

      Many stores have doors that you can open by pushing a button. Assuming the door opens, one must presume that is because the management (and, by proxy, the owner) of the store has set it up for that purpose. So what we have here is effectively..

      Store: You have been banned from this store. Do not come back
      You: Push the button
      Store: Door opens, you go in
      Store: We told you to stay out, we're having you arrested for trespassing

      This, of course, happens all the time (except for the idiotic assertion that the door opener somehow granted you permission to enter).

      In both cases, the DEFAULT position is that you have access. You may request information, you may enter the store. However, once you have been TOLD you do not have permission, then you DO NOT have permission. At that point it is YOUR responsibility to not access what you have been explicitly told you may not access. If you ignore that, you may find the applicable laws (trespassing, CFAA) used against you. And you will lose, because you have no defense. 'But they let me...' is not a winning defense, ever.

    10. Re:This is bonkers! by Gavagai80 · · Score: 1

      Trying to make it illegal to scrape the data is beside the point -- what LinkedIn really wants to do is prevent others from publishing the data. Just because you can find a book in the library, and the book doesn't fire lasers at your eyes to stop you reading it, doesn't mean you have permission to sell your own book consisting of photocopies of that book with a few small changes.

      --
      This space intentionally left blank
    11. Re:This is bonkers! by Frosty+Piss · · Score: 1

      LinkedIn supplies service to the public at-large

      OK, there's where you're wrong.

      --
      If you want news from today, you have to come back tomorrow.
    12. Re: This is bonkers! by Anonymous Coward · · Score: 0

      I don't think pushing the button is a correct analogy.
      Rather, you stand at the gate and ask the guard for permission to enter, without any 'action' such as pressing a button. An action like pressing a button would assume something more complicated, such as logging into the system, agreeing to some terms of service and the like.
      Instead, hiQ is just asking for information from LinkedIn and it receives it.
      LinkedIn is too lazy to implement a proper system (or it's too expensive for their pockets), so they try to enforce the law in some weird way.

    13. Re:This is bonkers! by Anonymous Coward · · Score: 0

      This is just crazy.
      So if it is in the public space can it all be controlled by the corporations?
      Okay, two way street.
      I have a personal site which has a set of rules and conditions of which you will agree to every time I visit. This is done in every request via additional post values to the webservers logs indicating my website and that you agree for me to continue unabated if you have not chosen to opt-out by visiting my site and adding your site to the list.

      This is why people want the right to forget but the corps don't want that.
      Once it is in the public, your intentions ARE for the public, no one left behind. Whether you like it or not.
      This goes for the corps too, in my mind.

    14. Re:This is bonkers! by American+Patent+Guy · · Score: 1

      You're not allowed to access stuff requiring a logged-in session until you've gotten log-in credentials, because there are actual systems in place to stop you from doing that, implying that you're not supposed to force access there.

      Actually, if the scraper used a valid username and password (or other valid credentials) to gain access, access was authorized. It might have violated a user agreement perhaps, but that's a separate civil matter. The Computer Fraud and Abuse Act specifies criminal acts that a private entity (like LinkedIn) can't use as a basis for its suit.

    15. Re: This is bonkers! by Anonymous Coward · · Score: 0

      If that's the case then that's what copyright is for. I'd like to see them argue that automatically collected metadata is copyrightable.

    16. Re:This is bonkers! by Anonymous Coward · · Score: 0

      uh huh. when you start locking out black people from microcenter, because you don't like what they're doing; let me know.

      you can't have public content, and private access. particularly when you grant that private access.

    17. Re:This is bonkers! by bluefoxlucid · · Score: 1

      When you request a public Web page, you're accessing and using their machinery.

      My entire argument was "available to the public" versus "except you; you get the hell out right now." A technical mechanism is infeasible: if they want the data to be publicly-viewable and don't want people to do certain things, then a password doesn't work; and firewalls and the like will have to contend with modern global, auto-scaling, IP-changing data centers where you can't just single out a particular actor by IP address or any other identifier.

      Legal scholars can argue whatever they want, while other legal scholars come behind them and argue that they're wrong. That's kind of what lawyers do. Your argument essentially claims that anything they put up is public property and that they can't have any terms of use for anonymously-accessible assets. Considering I can ping your server anonymously, a DDoS would be legal under your reasoning--without even going to reductio ad absurdum.

    18. Re:This is bonkers! by bluefoxlucid · · Score: 1

      Actually, transferring that to the Internet, you have to walk into the MicroCenter and turn the display around, then go back outside the window and look at it again to get a view of what's there. Every time you want to see it, you have to walk inside, fiddle with things, then walk back out.

      You do know that nothing is actually "on the Internet", right? Do we need to explain to you how the Internet works?

    19. Re:This is bonkers! by bluefoxlucid · · Score: 1

      The point wasn't that they used a password; there was a further point down that LinkedIn had de-authorized them from non-password-protected mechanisms: they told them they're now specifically not allowed to do that, which means they're not.

      Imagine if you ssh'd to a bank's accounting system across the 'net and found that it just lets you log in as root, no password. Is that also legal?

    20. Re:This is bonkers! by Anonymous Coward · · Score: 0

      Fuck LinkedIn. I deleted my account years ago, but when their pathetic admins let hackers extract their database well after, I find myself in the hacked list.

    21. Re:This is bonkers! by Ichijo · · Score: 1

      Considering I can ping your server anonymously, a DDoS would be legal under your reasoning...

      Starting from step 1 (assembling a botnet), how do you organize an effective DDoS using purely legal means?

      --
      Any sufficiently unpopular but cohesive argument is indistinguishable from trolling.
    22. Re:This is bonkers! by bluefoxlucid · · Score: 1

      Arranging the infrastructure wouldn't be necessarily-legal. The ownership and control of the source of the attack may be illegal. The attack itself, however, would not constitute an additional illegal act beyond just having a botnet and sitting on it.

      Likewise, if I just spun up several thousand micro-servers in AWS spread across all data centers and smashed the shit out of your site with requests declaring their source to a dead IP in the same subnet, I could have your servers pump out tons of HTTP responses with large images and other assets and kill itself. Totally legal, right?

    23. Re:This is bonkers! by Ichijo · · Score: 1

      That sounds like an expensive way to run a DDoS. Has it ever happened in real life?

      --
      Any sufficiently unpopular but cohesive argument is indistinguishable from trolling.
    24. Re:This is bonkers! by viperidaenz · · Score: 1

      No, you'd need to yell from the street "Hey, what's in your store, MicroCenter?" and they would yell right back whatever the hell they want to.

      You send a request for information and receive a response. You don't move anything around or change anything.

    25. Re:This is bonkers! by bluefoxlucid · · Score: 1

      So, if I hack into your bank account, I'm just standing outside your bank yelling, "Hey, can you give me some money? Maybe viperidaenz's? He's got some!" and they're handing me cash, and I haven't done anything wrong?

    26. Re:This is bonkers! by bluefoxlucid · · Score: 1

      No because it would be illegal. People do pay thousands of dollars to borrow an illegal botnet for DDoS, though.

    27. Re:This is bonkers! by viperidaenz · · Score: 1

      Depends if you did it by bypassing a security measure, or by falsifying your identity.

      Crawling a website doesn't require any of that. Google and Bing don't get in trouble for it. They both use the data they collect to make money. Not really any different to what this company is doing.

      However, sending usernames and passwords in headers or in post data that are not yours is a little bit illegal. Lying about who you are to get something in return is "obtaining by deception"
      Additionally, sending your own username and password means you've explicitly agreed to a contract to obtain them in the first place. Using that service against the terms of your contract opens up your liability.

  7. CFAA does not apply by Anonymous Coward · · Score: 1

    If you read the act, it is clear that it applies to financial and government systems. It has not been tested in court that the CFAA covers violating terms of use. You really only need to go to some basic contract law to deal with people accessing your site in a way you do not like, and copyright law for distributing your material without permission.

    1. Re:CFAA does not apply by Anonymous Coward · · Score: 0

      It applies to computer systems involved in interstate commerce. That would seem to include LinkedIn servers.

    2. Re:CFAA does not apply by russotto · · Score: 2

      It has not been tested in court that the CFAA covers violating terms of use.

      Yes, it has, but only in the Central District of California as far as I know. The interpretation that the CFAA covers violating TOS was found to be overbroad in U.S. v. Drew, 259 F.R.D. 449 (C. D. Cal. 2009).

  8. Exactly! by Anonymous Coward · · Score: 2, Informative

    I refuse to use any social media site including LinkedIN. A lot of companies - such as Goodwill - recruit exclusively from LinkedIN. Fuck'em.

    I don't work for any company that uses social media for recruiting.

  9. You know it is not that simple. by Anonymous Coward · · Score: 1

    They DO want it read, by the end-users who consume it and don't resell it. They DON'T want it read by aggregators who profit from it (and especially in a way that drives users to restrict their use of the service).

    There is no solid technical solution for distinguishing between these two classes of user. So businesses are using the law to draw that distinction instead.

    I personally think the "don't scrape" approach is totally backwards. They should take a "don't redistribute" approach instead. But logic commonly loses to wealth in the real world.

  10. okay? by Anonymous Coward · · Score: 0

    The Law of the Land is HTTP.

    403 my requests. Until then, your front-facing data is fair game.
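    For what "403 my requests" could look like on the server side, here is a minimal sketch written as a plain WSGI application. The blocked agent name and the response body are invented for illustration, and a User-Agent header is trivially spoofed, so this shows the mechanism rather than a robust defense.

```python
# Hypothetical list of User-Agent substrings the site has decided to refuse.
BLOCKED_AGENTS = ("examplescraper",)


def app(environ, start_response):
    """WSGI app: answer 403 to blocked agents, 200 to everyone else."""
    agent = environ.get("HTTP_USER_AGENT", "").lower()
    if any(blocked in agent for blocked in BLOCKED_AGENTS):
        start_response("403 Forbidden", [("Content-Type", "text/plain")])
        return [b"Forbidden\n"]
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"public profile data\n"]


if __name__ == "__main__":
    # Serve locally with the standard-library reference server.
    from wsgiref.simple_server import make_server

    with make_server("127.0.0.1", 8000, app) as server:
        server.serve_forever()
```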

  11. what about wifi scanning just looking for ssid's by Joe_Dragon · · Score: 1

    What about Wi-Fi scanning? Just looking for SSIDs is on by default on many OSes.

  12. Give all a bit of trust and get ripped off by Trax3001BBS · · Score: 1

    I refer to the robots.txt file used to tell search engines what's out of bounds. http://www.searchtools.com/rob...

    1. Re:Give all a bit of trust and get ripped off by HornWumpus · · Score: 1

      But they want to be indexed by Google, just not by the company that tells employers their staff is looking.

      The solution is just to never, ever, stop looking. Even if you love your job, having a current resume on Linkedin will get you better raises.

      --
      John McAfee 'It was like that time I hired that Bangkok prostitute; to do my taxes, while I fucked my accountant'
    2. Re:Give all a bit of trust and get ripped off by Mandrel · · Score: 1

      But they want to be indexed by Google, just not by the company that tells employers their staff is looking.

      A robots.txt file can state which HTTP User Agent strings are allowed. For example, Slashdot only allows access by certain search engines. If you're starting a new one, you have to misrepresent yourself, or you're buggered. The question is when such misrepresentation is legal and moral, and whether it is instead up to sites to more accurately detect who they want to serve, and serve errors to those they don't.
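      As a rough illustration of per-crawler rules, here is a minimal sketch using Python's standard urllib.robotparser. The robots.txt contents, the bot name NewSearchBot, and the example.com URL are all made up for the example; they are not Slashdot's or LinkedIn's actual rules. Note that robots.txt is advisory: it tells a well-behaved crawler what the site wants, and nothing in the protocol enforces it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named crawler may fetch everything,
# every other user agent is asked to stay out entirely.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
"""


def allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/in/someone"
    print(allowed("Googlebot", url))     # the named crawler is permitted
    print(allowed("NewSearchBot", url))  # a newcomer falls under the catch-all rule
```

      A new crawler that announces itself honestly gets the catch-all answer; one that claims to be Googlebot gets the permissive one, which is exactly the misrepresentation question raised above.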

      The solution is just to never, ever, stop looking. Even if you love your job, having a current resume on Linkedin will get you better raises.

      Again it pays to be the selfish squeaky wheel. The basis of advertising.

  13. A little concerned... by Anonymous Coward · · Score: 0

    I do a LOT of data scraping of the government websites here in my town and state. They don't make their data available for bulk download so the only option is to scrape it to turn it into something useful for data analysis. Many of the sites, however, are seriously underpowered for the software they run (mostly ASP and ASP.NET sites) and, even though I throw in generous delays between each request, the various entities still take notice because it takes anywhere from 1 to 5 seconds for a response to come back. That means that something on their end is consuming 1 to 5 seconds of at least 1 CPU core for each request. If the court sides with LinkedIn, it sets a very bad precedent that government, especially local city and state governments who prefer to hide any of their illicit/corrupt activities, will most certainly use against those who want to hold them accountable. Several of the entities have claimed that I or one of my colleagues have "hacked them" over the years but the lawyers always go to bat for me and those government entities shut up real fast and let me continue to scrape their data unhindered. As a taxpaying citizen, holding government accountable is what my tax dollars are for, so I'm totally okay with anyone who scrapes government websites.
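    The "generous delays between each request" can be sketched as a small throttle. This is a minimal illustration, not the poster's actual tooling: the Throttle class, its min_interval parameter, and the commented-out fetch call are assumptions made for the example.

```python
import time


class Throttle:
    """Enforce a minimum gap, in seconds, between consecutive requests."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock    # injectable so the logic can be checked without waiting
        self._sleep = sleep
        self._last = None      # time of the previous request, if any

    def wait(self):
        """Block until at least min_interval has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


if __name__ == "__main__":
    throttle = Throttle(min_interval=5.0)
    for page in range(3):
        throttle.wait()
        # fetch(page) would go here; fetch is a hypothetical function.
        print("requesting page", page)
```

    Measuring the gap from the end of the previous wait, rather than sleeping a fixed amount, means a slow response from an underpowered server already counts toward the delay.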

    I realize private entities are involved here, but precedent, once established, has a funny way of being abused by other entities, especially government. Frankly, I'd like to not see the inside of a prison cell for doing an important and necessary public service. The real solution is that LinkedIn should have separate infrastructure that can be scraped/interacted with all day long and not adversely affect their main infrastructure. They should be implementing public WebSockets so that scraping tools don't even need to guess and they can perform direct pushes of information. Maybe have bulk downloads of data available as well that LinkedIn perceives as "okay" for other entities to have. These are solvable problems with technology already at our disposal.

    My view: If you are on the Internet, then you provide your data at your own expense. I run my own websites this way and it is working out just fine for me. If LinkedIn can't survive, then its business model is wrong and the business should start innovating or die, not lash out and sue everything and everyone it doesn't like.

    1. Re:A little concerned... by Alok · · Score: 1

      My view: If you are on the Internet, then you provide your data at your own expense. I run my own websites this way and it is working out just fine for me.

      How much data do your websites have, and approximate number of people and other entities who might be interested in downloading it? Setting aside the point that you don't distinguish between public vs private data, or the method of consumption (web surfing vs scraping to sell to others); just try to calculate the bandwidth requirements if LinkedIn or any highly popular site were to actually do this.

  14. Most websites don't have any terms of service by Anonymous Coward · · Score: 0

    Presumably you've agreed to the site's Terms of Service and your request(s) comply with them, no?

    I think that's a very unrealistic and uncommon presumption. 99% of the websites that I access, I have not agreed to any terms of service and I don't even know if they offer any special terms.

    By default, websites don't have any documented terms of service. The terms are just: send a request and I'll probably send you back a reply. (It's not like, by default, an Apache install comes with a generic ToC boilerplate legalese thing.)

    And most websites use this default (essentially: no terms). You're normally only limited by whatever the law happens to be, combined with various technical limitations. Usually there's no deviation, explained in some contract, which overlays the default situation.

    Of course, there are exceptions. A website having terms isn't so uncommon that we're all shocked by the thought of it. But whenever that happens, there is always a mechanism whereby the website refuses service, until you have agreed to the terms.

    And if the website doesn't refuse service without agreement to the "terms," then the "terms" weren't really terms.

    Imagine I said this: "I'll sell you my old car for $10k. Those are the terms: pay me $10k and I'll sign over the title on my car to you."

    You refuse my terms (because the car totally isn't worth it). Then you say, "Hey, can I have your car for free?" and I reply, "Sure, it doesn't run anyway. Just get that thing out of here. No? Ok, I'll pay you $50 to tow it out of here!"

    Did you subvert or evade my terms? No, I say you discovered my true terms. Or maybe you persuaded me of your new terms, you damn Jedi.

    It's always worth asking again, and everything is negotiable.

  15. They just went about this the wrong way by 93+Escort+Wagon · · Score: 1

    Now if LinkedIn had instead posted "ecto gammat", all the nerds would be in their corner.

    --
    #DeleteChrome
  16. Pot, Kettle. by AnotherBlackHat · · Score: 1

    LinkedIn's whole business model is "scraping" information from people. It's not like they pay people to enter that information.

    When CDDB tried this sort of B.S. it led to FreeDB. Maybe LinkedIn being assholes will lead to something similar.

  17. here's a thought by Anonymous Coward · · Score: 0

    If companies just chained employees to their desks, there'd be no need to worry about them quitting.

  18. HiQ by tylersoze · · Score: 1

    Can we talk about what HiQ is doing with the data for a sec? "HiQ scrapes data about thousands of employees from public LinkedIn profiles, then packages the data for sale to employers worried about their employees quitting" I mean WTF?

    1. Re:HiQ by American+Patent+Guy · · Score: 1

      Like it or not, if you (or an employee in your example) choose to publish information about yourself in a publicly-accessible place, then you've voluntarily relinquished whatever privacy rights you had in that information. Whatever you believe about HiQ, they are only organizing and re-releasing public information. LinkedIn has no copyright in it (as they didn't create the data, nor is it a work of authorship), and they were complicit in the act by delivering it up upon request.

  19. It's MY data, not LinkedIns by Anonymous Coward · · Score: 1

    Curious how GDPR will or could play here. If the Europeans have it right, a profile isn't LinkedIn's. It belongs to the people who put their info there.

    I'd suggest a flag that the user can set, which says 'make x of my profile indexable'.

  20. No standing by American+Patent+Guy · · Score: 1

    The Computer Fraud and Abuse Act is part of the Federal Criminal Code, and no private entity can use it to bring a suit. A prosecuting attorney for the government could make a criminal charge, but LinkedIn would have to persuade him/them to take that act. This is much ado about nothing.

  21. Wrong! by www.sorehands.com · · Score: 3, Informative

    The CFAA applies immediately or when the defendant (or defendant to be) exceeds the permitted access. This could be also through a cease and desist letter. See Facebook, Inc. v. Power Ventures, Inc., No. 13-17102 (9th Cir. July 12, 2016) https://cdn.ca9.uscourts.gov/d...

    You are permitted to grant different people different terms or access. Look at https://qz.com/981029/a-federa...

    1. Re:Wrong! by russotto · · Score: 1

      Didn't read it, did you? It does say that a cease&desist can trigger the CFAA. It does not say "The CFAA applies immediately or when the defendant (or defendant to be) exceeds the permitted access". In fact, it specifically says violation of terms of use cannot trigger liability under the CFAA.

    2. Re:Wrong! by BronsCon · · Score: 1

      Precisely this. Especially important in the case of Facebook or LinkedIn, which you have to actively avoid if you wish to not access them.

      --
      APK quotes people (including myself) without context and should not be trusted. Just thought you should know.
  22. That would also make search engines illegal by Kreigh · · Score: 1

    If you don't want your content to be scraped, don't make it publicly available. If I can view it without a logon you have no claim to protect the data.

  23. Isn't this a simple copyright infrigement case? by Anonymous Coward · · Score: 0

    LinkedIn published content under copyright. Another entity took that copyrighted material and re-published it without consent of the copyright holder. It seems like a pretty straight-forward case.

    What am I missing here?

    The only questions should be the size of the award LinkedIn should receive and whether anyone associated with the other entity should be criminally prosecuted.