Slashdot Mirror


Judge Says LinkedIn Cannot Block Startup From Public Profile Data (reuters.com)

A U.S. federal judge on Monday ruled that LinkedIn cannot prevent a startup from accessing public profile data, in a test of how much control a social media site can wield over information its users have deemed to be public. Reuters reports: U.S. District Judge Edward Chen in San Francisco granted a preliminary injunction request brought by hiQ Labs, and ordered LinkedIn to remove within 24 hours any technology preventing hiQ from accessing public profiles. The dispute between the two tech companies has been going on since May, when LinkedIn issued a letter to hiQ Labs instructing the startup to stop scraping data from its service. HiQ Labs responded by filing a suit against LinkedIn in June, alleging that the Microsoft-owned social network was in violation of antitrust laws. HiQ Labs uses the LinkedIn data to build algorithms capable of predicting employee behaviors, such as when they might quit. "To the extent LinkedIn has already put in place technology to prevent hiQ from accessing these public profiles, it is ordered to remove any such barriers," Chen's order reads. Meanwhile, LinkedIn said in a statement: "We're disappointed in the court's ruling. This case is not over. We will continue to fight to protect our members' ability to control the information they make available on LinkedIn."

81 of 166 comments

  1. Huh? by Sebby · · Score: 5, Insightful

    We will continue to fight to protect our members' ability to control the information they make available on LinkedIn

    If users added their info, and made it public, it's not up to LinkedIn to decide what users want to protect.

    Besides, given LinkedIn's past behavior with scraping people's contacts/address books on their PCs and email accounts, it has no lessons to give anyone else.

    --

    AC comments get piped to /dev/null
    1. Re:Huh? by beelsebob · · Score: 1

      If the data they're hosting is uncopyrightable, and it's freely available to the public, then yes.

    2. Re:Huh? by jenningsthecat · · Score: 4, Interesting

      If users added their info, and made it public, it's not up to LinkedIn to decide what users want to protect.

      Besides, given LinkedIn's past behavior with scraping people's contacts/address books on their PCs and email accounts, it has no lessons to give anyone else.

      LinkedIn doesn't give a good goddamn about "what users want to protect", and their "past behavior" is the proof. LinkedIn cares only about having exclusive use of that mine full of data, (except for the bits and pieces that users gather about each other), because it doesn't want potential competitors to eat a slice of the pie they've come to think of as belonging entirely to them.

      --
      'The Economy' is a giant Ponzi scheme whose most pitiable suckers are the youngest among us and the yet-unborn.
    3. Re:Huh? by redmid17 · · Score: 1

      Is this a serious question?

    4. Re:Huh? by Anonymous Coward · · Score: 1

      From Slashdot's TOS:

      By sending or transmitting to us Content, or by posting such Content to any area of the Sites, you grant us and our designees a worldwide, non-exclusive, sub-licensable (through multiple tiers), assignable, royalty-free, perpetual, irrevocable right to link to, reproduce, distribute (through multiple tiers), adapt, create derivative works of, publicly perform, publicly display, digitally perform or otherwise use such Content in any media now known or hereafter developed. You hereby grant the Company permission to display your logo, trademarks and company name on the Sites and in press and other public releases or filings. Further, by submitting Content to the Company, you acknowledge that you have the authority to grant such rights to the Company. PLEASE NOTE THAT YOU RETAIN OWNERSHIP OF ANY COPYRIGHTS, TRADEMARKS AND SERVICE MARKS IN ANY CONTENT YOU SUBMIT.

      I'm guessing LinkedIn has something similar. By using their service, you give them permission to use your content and display it (or not display it) any way they want. And your content is copyrighted.

    5. Re: Huh? by Anonymous Coward · · Score: 2, Interesting

      And your content is copyrighted.

      Umm, no. You cannot copyright such data, so any such provisions are meaningless. As the federal court has just reaffirmed for the umpteenth time.

    6. Re:Huh? by shentino · · Score: 1

      Just because information can or cannot be copyrighted doesn't give me the privilege of hijacking your printing press to do the actual copying.

      The judge here screwed up. The courts have NO BUSINESS dictating to a website what information it can or cannot publish, and it has even less business attempting to turn the website into a mouthpiece.

      LinkedIn should have the right to post what they please and block who they like from accessing it. Barring privacy issues.

    7. Re: Huh? by backslashdot · · Score: 1

      Either way, they shouldn't pretend they are doing their users a service.

    8. Re:Huh? by Richard_at_work · · Score: 1, Insightful

      Whether or not the user made the info public, does this ruling affect how a website or service can regulate third parties and the extra load they create?

      Grabbing one user's public info is a world away from grabbing a million users' public info - LinkedIn may have a legitimate argument about undue additional load on their service as a result of scraping public info from them.

    9. Re:Huh? by sg_oneill · · Score: 4, Insightful

      If users added their info, and made it public, it's not up to LinkedIn to decide what users want to protect.

      Wrongo! It's their server. This ruling is *very* erroneous, and since I'm not in the job market, I'm going to be deleting my account now. Which is actually a shame, because I was using it to keep up with former workmates from previous jobs, but I'll be damned if I'm going to be handing my work history over to asshole companies that specialize in mining through other people's bins looking for evidence to hang me with.

      --
      Excuse the Unicode crap in my posts. That's an apostrophe, and slashdot is busted.
    10. Re:Huh? by Anonymous Coward · · Score: 1

      They haven't. The judge simply said that HiQ cannot be blocked from otherwise public information. In other words, if the judge, at his desk, can access the info without logging in, then HiQ should be able to access the same information.

    11. Re:Huh? by Dutch+Gun · · Score: 1

      As a LinkedIn user, I'm actually fine with anyone scraping my data and using it. Whatever information I put on LinkedIn, I did so with the full intention of being available to the public at large. That's the whole point of LinkedIn, at least for me. It's a place to post your public resume + a way of maintaining professional contacts with colleagues. If it were not publicly view-able, I wouldn't have bothered, as I want potential employers to be able to find me.

      Obviously, this is very valuable data, but in aggregate, mostly only to salespeople and recruiters. So, when LinkedIn talks about "protecting its users", it's pure nonsense. I certainly understand why they want to retain exclusive access, but it's not for my benefit, naturally.

      You'll forgive me if I don't shed any crocodile tears when your company's one and only product - that of managing other people's publicly available data - doesn't remain exclusive to you once you publish it.

      P.S. What kind of simpleton would give LinkedIn full access to their email account? They requested my e-mail login credentials at one point. My response? Bwahahahahaha! Yeah, right!

      --
      Irony: Agile development has too much intertia to be abandoned now.
    12. Re:Huh? by starlesstheshellcat · · Score: 1

      In one fell swoop I shall render HiQ Labs' ploys against LinkedIn worthless... (Please forgive the N00bness of my intrusion; *this is my first slashdot post ever.*) I have, as of 2 days ago, posted my Canadian SIN number on my blog as well as my enemies' YouTube videos. I did this not only to prove that I have been spied on but also because I am done working for a corrupt system. You see, my SIN number has been "hacked" by certain religious group(s) (111/666/644)... I won't bother posting any details here because I respect my fellow 'nerds' and won't even try to waste anyone's time with spam. If you're curious you can look me up and read all about it on my shitty blog. You see, Canadian SIN numbers are like any numerical shape. The first three numbers are the province... the second are most likely a rendering of your birth date/name, etc., etc., etc... It's not hard if you own a calculator and have read the #Bible. I am actually not even claiming anything that groundbreaking. What I am talking about is just grade school math...

    13. Re:Huh? by DrXym · · Score: 1
      I have no love for LinkedIn and believe it is a skeezy meat market and data vacuum. However it is their website and I don't see why they shouldn't put any measures they like into it to prevent competitors from scraping it.

      Even if the ruling goes against them I'm sure they can think of imaginative ways to fuck around with people scraping their site.

    14. Re:Huh? by ls671 · · Score: 1

      Actually, the big ones do not pay any network charges when you access their web sites, they get money for it! Search on peering agreements and you will see this is how it works.

      Maybe it was inspired by telcos where the one that terminates the call bills the caller, it works the same way anyway.

      --
      Everything I write is lies, read between the lines.
    15. Re: Huh? by DrXym · · Score: 1
      And therein lies what LinkedIn would probably do to stop scraping. Many websites truncate the information and place a "click here to read on" button or a "More..." link. Humans click the button and the rest of the info appears.

      Under the covers, however, the link, the JavaScript that controls it, the elements containing visible text, and the layout in general could be engineered in a way to be a pain in the ass to read automatically and scrape into a coherent form. At the very least it would slow down scraping and introduce more errors into their results, and it could even be used to inject garbage that a human doesn't see if some text is hidden with JS or CSS at runtime.

      Requests could also be disrupted, e.g. slowed down, or redirected to interstitials which only occur occasionally and that a human would bypass easily but a bot wouldn't.

    16. Re:Huh? by angel'o'sphere · · Score: 1

      What kind of simpleton would give LinkedIn full access to their email account?
      Far too many, considering that I get contact suggestions that can only come from the fact that the other person somehow imported my email address into their LinkedIn account.
      Or how else would LinkedIn suggest someone I only know because I was sailing with him a year ago?

      --
      Cost free eBook I read (by iBook/Kobo/Amazon/ObookO/Gutenberg etc.): "The Green Odyssey" by Philip Jose Farmer.
    17. Re:Huh? by Dutch+Gun · · Score: 1

      I guess I've never seen that the benefits outweighed the potential risks. E-mail security is absolutely vital to securing your complete online identity. Why someone would entrust that to a third-party is beyond me. If there's someone I want to get in contact with, I can generally do so without potentially compromising my email security.

      No offense, as the "simpleton" crack was probably not appropriate. Different people have different priorities, I guess.

      --
      Irony: Agile development has too much intertia to be abandoned now.
    18. Re: Huh? by truedfx · · Score: 1
      No, they're not. From the damn summary:

      "To the extent LinkedIn has already put in place technology to prevent hiQ from accessing these public profiles, it is ordered to remove any such barriers," Chen's order reads.

      According to this, the judge is specifically saying that LinkedIn isn't free to use technical measures to block them.

    19. Re:Huh? by angel'o'sphere · · Score: 1

      No, they are simpletons.

      --
      Cost free eBook I read (by iBook/Kobo/Amazon/ObookO/Gutenberg etc.): "The Green Odyssey" by Philip Jose Farmer.
    20. Re:Huh? by AmiMoJo · · Score: 1

      Back when I was on LinkedIn, so several years ago now, you used to be able to see who was viewing your profile. It was quite interesting to see who was looking at you. Mostly recruiters of course.

      If that's still the case then copying the data to another web site means that users of LinkedIn can no longer see who is viewing their profile, or get an accurate "hit count" on the stuff that is public and available to non-logged-in viewers.

      I don't know what controls LinkedIn has for privacy. Is public visibility opt-in? If so then it seems that there is a good case to be made that the user shared the data willingly.

      --
      const int one = 65536; (Silvermoon, Texture.cs)
      SJW, n: "Someone I don't like, and by the way I'm a fuckwit" - AC
    21. Re:Huh? by Desler · · Score: 1

      It's not "technically" public. Any person in the world can view and download the same data as it IS public.

    22. Re:Huh? by BradleyUffner · · Score: 1

      And your content is copyrighted.

      Yeah, by me, not Slashdot.

    23. Re:Huh? by mysidia · · Score: 1

      If users added their info, and made it public, it's not up to LinkedIn to decide what users want to protect.

      It is not absolutely public. Users shared their information with LinkedIn, and possibly chose not to restrict it through privacy controls LinkedIn offers users, BUT LinkedIn itself gets to decide how "Public" their website actually is. That "Public" could very well mean only visible to users that registered and/or accepted some terms prior to viewing.

    24. Re:Huh? by mysidia · · Score: 1

      If the data they're hosting is uncopyrightable, and it's freely available to the public, then yes.

      The issue with scraping non-copyrightable data is that it is Theft of service: violations of terms users agree to in order to access the resource.

      The owner of the server/website pays for processing time and bandwidth, AND the owner of the server DOES NOT HAVE TO provide a free-for-all --- everyone who owns a computer/network has a right to direct who can use the resources provided by their server/equipment and monthly ISP services, how, and in what manner.

      Rate limits on page loads, bot detection and captchas on resources intended to be used by humans are commonplace for preventing bots from coming in and hogging resources, spamming, or mass-downloading things at the server owner's expense for purposes not useful to the owner of the server.

    25. Re:Huh? by mysidia · · Score: 1

      That makes sense.... LinkedIn can allow HiQ to access the information, BUT employ rate-limits to make sure they can't generate more usage than a normal network with humans would, AND employ Captchas/Bot-prevention countermeasures on IP addresses suspected to be something different than the rest of the public.

      If HiQ wants to hire an army of humans to manually transcribe data (without an unusually large number of requests from one network), then all the power to them.

    26. Re: Huh? by mysidia · · Score: 1

      They can just put in place technical measures to control all abuse of their services and all bots and then say they are not using any technical measures to specifically block hiQ.

    27. Re: Huh? by mysidia · · Score: 1

      LinkedIn are.... HiQ's use of the data is scary and ADVERSE to the users of the website.
      They're essentially a surveillance service to help employers spy on workers to suggest when certain people might be a risk.

      This is very big-brotherish, and should not be allowed in a more civilized society......

    28. Re: Huh? by DrXym · · Score: 1

      Which means LinkedIn are free to use technical measures to make it a pain in the ass to make meaningful sense of the data - JS, CSS, random ids on elements, dynamically injected code etc.

    29. Re:Huh? by tlhIngan · · Score: 1

      The issue with scraping non-copyrightable data is that it is Theft of service: violations of terms users agree to in order to access the resource.

      The owner of the server/website pays for processing time and bandwidth, AND the owner of the server DOES NOT HAVE TO provide a free-for-all --- everyone who owns a computer/network has a right to direct who can use the resources provided by their server/equipment and monthly ISP services, how, and in what manner.

      Rate limits on page loads, bot detection and captchas on resources intended to be used by humans are commonplace for preventing bots from coming in and hogging resources, spamming, or mass-downloading things at the server owner's expense for purposes not useful to the owner of the server.

      Yes, all correct. However, he cannot bar anyone from viewing that data - LinkedIn basically banned the company from scraping the data. Which is fine if they were being abusive and hammering the site, after which if they stop, they are allowed to access the data again.

      What LinkedIn cannot do is simply say that data is public to anyone who browses it, EXCEPT YOU, because you want to do something with the data we didn't think to monetize.

      Basically, because the data is available to all, it's available to all (within limits). You can't say anyone in the world can see your LinkedIn profile, except that guy over there just because he wanted to take that data and do something with it. (And "do something" is intentionally vague. Perhaps they want to contact you about a job offer, or other thing).

    30. Re:Huh? by Sebby · · Score: 1

    But that line has nothing to do with the ruling - it's specifically about how LinkedIn is pretending to 'protect' their users in that statement, as if it was trying to do them a favor "fighting" this.

      --

      AC comments get piped to /dev/null
    31. Re: Huh? by truedfx · · Score: 1

      What exactly do you think the judge is going to say to that? The judge is almost certainly not a complete idiot, and seeing the access blocked after a court order to lift the restrictions is not going to make a good impression.

    32. Re:Huh? by nanoflower · · Score: 1

      I'm sure the original poster was trying to distinguish between privately owned data that may be publicly available versus publicly owned data that is publicly available. LinkedIn is running a private server, so the data is privately owned even if it is available to the public. It does seem odd that the judge didn't seem to put any limits in place, so that if the servers allow it each of us could scrape all of their data as fast as our links can support.

  2. robots.txt by psergiu · · Score: 1

    Read https://linkedin.com/robots.tx...

    Especially at the end

    User-agent: *
    Disallow: /
     
    # Notice: If you would like to crawl LinkedIn,
    # please email whitelistcrawl@linkedin.com to apply
    # for white listing.
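
For anyone who wants to see how a well-behaved crawler would read those two rules, here is a minimal sketch using Python's standard `urllib.robotparser`. It feeds in only the lines quoted above rather than fetching the live file, and the bot name and profile URL are made-up placeholders, not anything from the case.

```python
from urllib.robotparser import RobotFileParser

# Only the two rules quoted above; the real file is longer and is not
# fetched here, so this says nothing about any whitelisted crawlers.
ROBOTS_LINES = [
    "User-agent: *",
    "Disallow: /",
]


def is_allowed(user_agent: str, url: str) -> bool:
    """Return True if the quoted rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_LINES)
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # "ExampleBot" and the profile path are hypothetical placeholders.
    print(is_allowed("ExampleBot", "https://www.linkedin.com/in/example"))
```

The check runs entirely on the crawler's side, so it only has an effect if the crawler chooses to honor the answer.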

    --
    1% APY, No fees, Online Bank https://captl1.co/2uIErYq Don't let your $$$ sit in a no-interest acct.
    1. Re:robots.txt by Zero__Kelvin · · Score: 1

      We don't need to look at robots.txt to know that LinkedIn would prefer they not do this, as there is a lawsuit about it. I'm guessing you don't understand that robots.txt is a request, not a mandate.

      --
      Guns don't kill people; Physics kills people! - John Lithgow as Dick Solomon on Third Rock From The Sun
  3. Translation by sit1963nz · · Score: 5, Informative

    "We will continue to fight to protect our members' ability to control the information they make available on LinkedIn."

    Translates to

    "We will continue to fight to protect our profits and our ability to control and sell the information they make available on LinkedIn "

    1. Re:Translation by Bearhouse · · Score: 1

      "We will continue to fight to protect our profits and our ability to control and sell the information they make available on LinkedIn "

      further translates to:

      "sell the information they make available for free on LinkedIn"

    2. Re:Translation by Anonymous Coward · · Score: 1

      What's wrong with that?

      LinkedIn paid little or nothing to get the data in the first place.

  4. Re:Public Post Should be Open to Everyone by jenningsthecat · · Score: 1

    There should be no difference between a human reading the site and a machine. If it is able to be accessed by a person then it should be ok to scrape and aggregate it.

    Do you also believe that there should be no difference between a person buying tickets to an event, and a bot doing so? That if it is able to be purchased by a person then it should be ok to use bots to buy up a few thousand tickets in a few seconds and artificially increase the price?

    BTW, I agree with what you said; but while I was thinking about your comment that analogy crossed my mind. I'd like the people who use bots to buy up tickets to DIAF, yet I'm happy to let hiQ scrape LinkedIn data. Strange...

    --
    'The Economy' is a giant Ponzi scheme whose most pitiable suckers are the youngest among us and the yet-unborn.
  5. My server, My rules by Anonymous Coward · · Score: 2, Interesting

    LinkedIn's servers are their private property, and they should have the right to decide who can access them.

    In the physical world, there are many places that are generally "open to the public", but they are private property, and the property owner can order you to leave and never come back. If you come back again it's called trespassing, and it's a criminal offense. You can and will be arrested, and if you go to trial, you will be convicted. It's well settled law.

    I don't see why the LinkedIn situation is any different. The fact that LinkedIn are hypocritical corporate assholes doesn't change the legal analysis.

    1. Re:My server, My rules by gravewax · · Score: 1

      Stopping a physical trespasser is fairly straight forward. How do you stop a virtual trespasser?

      Firewalls, geo-blocking, security, rate and connection limiting, search limits, CAPTCHAs, client processor intensive scripts, interactive components, etc. We regularly use a variety of those, depending on the behaviour we are trying to block, against bots that are trawling some of the sites I look after. It actually is quite easy to block virtual trespassers or at least make it very difficult for them to automate that.

    2. Re:My server, My rules by Kkloe · · Score: 1

      A public profile is more like an item in a display window. If you display things in the windows of the store for people walking outside to see, then it should be available to everyone; someone might stand outside taking notes or images of what you have displayed to the outside.

    3. Re:My server, My rules by Zero__Kelvin · · Score: 2

      They aren't any different and that's the point. They can't make it public and not public at the same time.

      --
      Guns don't kill people; Physics kills people! - John Lithgow as Dick Solomon on Third Rock From The Sun
    4. Re:My server, My rules by BradleyUffner · · Score: 1

      Not showing information to users without logging in would be a good start.

      What you can see without logging in is public information.

    5. Re:My server, My rules by pauljlucas · · Score: 1

      And if some garment manufacturer chooses to allow Macy's to sell and display their garments in their window, but not Bloomingdales, why shouldn't they be allowed to do that?

      --
      If you reply, do so only to what I explicitly wrote. If I didn't write it, don't assume or infer it.
    6. Re:My server, My rules by PoopJuggler · · Score: 1

      This is more like a garment manufacturer displaying their garments in a publicly viewable Macy's window, but having a guard outside who pushes certain people off the sidewalk.

    7. Re:My server, My rules by bhiestand · · Score: 1

      A public profile is more like an item in a display window. If you display things in the windows of the store for people walking outside to see, then it should be available to everyone; someone might stand outside taking notes or images of what you have displayed to the outside.

      No, a display window has no marginal cost per viewer, whereas a service like LinkedIn does. Crowds in front of display windows likely cause more people to come want to view the displays. Crawlers cause much higher loads on all sorts of backend systems compared to normal users. Each crawler has a real monetary cost to LinkedIn, and their usage may have a chilling effect on LinkedIn members.

      Further, the aggregate of openly available data is often much more valuable than what is simply visible on a profile. In this particular example, the value lies in changes to a LinkedIn member's profile, new connections, and other indicators over time, which can only really be tracked by scraping repeatedly and comparing changes (profile history is not publicly available). At best they will tip off an employer that their employee is on the way out (and should therefore not be promoted, given a raise, given major new projects, etc.) before the employee is able to put in notice. At worst, the algorithms are wrong and the employer may find a replacement.

      I'll grant that there are some possible positive outcomes as well as other possible negatives. But this makes it more risky for LinkedIn members to use LinkedIn to find their next job.

      --
      SWM seeks new sig for a brief fling
    8. Re:My server, My rules by Kkloe · · Score: 1

      Actually, there are still costs to a display window, such as keeping it clean, and depending on the land or road outside, the store owner might have to keep that clean as well. So yes, there are still costs, like local taxes and rent for the building; if the land around the store gets more popular as more people come there, the tax can get higher or the building owner might charge a higher rent.

  6. Microsoft using Linkedin users as a human shield by Arzaboa · · Score: 1

    Microsoft bought LinkedIn to profit off of users' data. Users on LinkedIn specifically post info so it is shared. Most users were members long before MS bought the social network. I certainly didn't have any say in this purchase, or my data. I don't appreciate that they can buy my public data, 3rd party website or not, and then act holier than thou about it.

    I'm not sure MS could create a social network that worked based on their past history. They've already changed the behavior of the site to promote more clicks and revenue, which would have seriously turned me off if they were in place when it started. Unfortunately, I put up, for now.

    For MS to go to court and now say they are protecting their users is shameful. Throwing the users in front of the judge for its own purposes is using them as human shields. We all know this is about profits, and not being saved from another evil corp.

    Apply this to Microsoft's practices across their platform, and it's the users that need further protections from them. For them to throw us in front of a judge to claim this is for anything even semi-related to privacy is a joke.

    At best, this is the pot calling the kettle black.

  7. They should revise their statement then by Sebby · · Score: 1

    By your logic, they should revise their statement then:

    "We will continue to fight to protect our data we extracted from our members and the ability to control the information they make available to us, here on LinkedIn"

    Quick, there's still time for you to call them and tell them to revise it!

    --

    AC comments get piped to /dev/null
  8. Re:People about to quit update their LinkedIn page by __aaclcg7560 · · Score: 1

    B: Creimer's still waiting for the coffee money to roll in. He's focusing on making that Little Debbie money, first. At 25 cents per delicious, chewy Oatmeal Cream Pie, he should start making enough to buy 2 or 3 a month, soon!

    When I get my June earnings at the end of the month, I can buy three cases and still have enough change for a skinny vanilla latte.

  9. Re:People about to quit update their LinkedIn page by __aaclcg7560 · · Score: 1

    5. Void Where Prohibited; Indemnification

    Doesn't apply to what I'm doing. This is standard legal boilerplate to cover Slashdot's collective ass from legal liability.

    Also look slashdot.org/robots.txt

    Doesn't apply to what I'm doing. My Python script isn't a web crawler and I'm scraping my own comments. If you look at the bottom of each Slashdot page: "Comments owned by the poster." I'm just recovering my own intellectual property that I freely shared with the Slashdot community.

    If you seriously believe that I'm violating the Slashdot TOS, file a complaint with management. However, considering the shit that Anonymous Cowards get away with, I wouldn't hold my breath.

  10. They need a Public Profile API by peterofoz · · Score: 1

    So accessing the public profiles is to be allowed unless it's done in such a way as to create unnatural load on their servers, something akin to a DDoS attack. They can set a throttle on hits per minute for programmed access. Or provide an API so HiQ and others can access the public profile info without impacting user-facing servers, except the users get an additional profile security option to allow API access, defaulted to Off for everyone initially so they can opt in.
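
A throttle on hits per minute could look something like the token bucket below. This is only a rough sketch of one common way to do it, not anything LinkedIn is known to run; the class name, the `rate_per_minute` and `burst` parameters, and the injectable `clock` are all made up for illustration.

```python
import time
from typing import Callable


class TokenBucket:
    """Per-client throttle: allows short bursts, caps the sustained rate."""

    def __init__(self, rate_per_minute: float, burst: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate_per_minute / 60.0   # tokens added per second
        self.capacity = float(burst)         # maximum tokens held
        self.tokens = float(burst)           # start with a full bucket
        self.clock = clock
        self.last = clock()

    def allow(self) -> bool:
        """Spend one token if available; otherwise refuse the request."""
        now = self.clock()
        elapsed = now - self.last
        self.last = now
        # Refill for the time that has passed, never above capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


if __name__ == "__main__":
    bucket = TokenBucket(rate_per_minute=60, burst=5)
    print([bucket.allow() for _ in range(7)])
```

A server would keep one bucket per client (per IP or per API key, say) and answer refused requests with an error instead of serving the page.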

    1. Re:They need a Public Profile API by Joe+U · · Score: 1

      So accessing the public profiles is to be allowed unless it's done in such a way as to create unnatural load on their servers, something akin to a DDoS attack. They can set a throttle on hits per minute for programmed access. Or provide an API so HiQ and others can access the public profile info without impacting user-facing servers, except the users get an additional profile security option to allow API access, defaulted to Off for everyone initially so they can opt in.

      So, public data, except not accessible to the entire public and not on by default.

      Sounds like a great way to give the host company a huge advantage on mining while pretending to give access to others. That API is worthless unless you restrict the host to the same requirements.

  11. Re: Public Post Should be Open to Everyone by __aaclcg7560 · · Score: 1

    A computer scraping will read all, causing a heavy load making the website performance poor.

    Depends on how the web server is set up. When I run my Python script to scrape my Slashdot comment history, 16 pages can be requested at the same time. More than 16 pages, the server shuts down the connection.

  12. I took it half seriously by backslashdot · · Score: 1

    until the whopper at the end.

  13. Re: Public Post Should be Open to Everyone by ls671 · · Score: 1

    Your Python script should not know about that. Connection KeepAlive server settings like:
    KeepAlive On
    MaxKeepAliveRequests 50
    KeepAliveTimeout 5

    should be completely transparent to you. Your client library should transparently reconnect when it gets a Connection: close from the server. Heck, some sites don't even use keep alives (KeepAlive Off).

    I have written such client software and I never bothered about MaxKeepAliveRequests setting on the servers and if KeepAlive was on, the libraries I used were doing the re-connection for me so I did not have to know the MaxKeepAliveRequests for every site I was connecting to. Heck, any browser does just the same!

    Also, if you write a scraper, it is a smart move to sleep between requests; any scraper, like Google's, does sleep between requests. 1 or 2 seconds is a nice value because your sleep time has to be less than KeepAliveTimeout for the connection to be re-used for the next request.
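
To make the sleep advice concrete, here is a small sketch. The function names, the 1.5 second default, and the "half the timeout" safety margin are my own assumptions; `fetch` stands in for whatever HTTP client you use, ideally one that keeps the connection open between calls.

```python
import time
from typing import Callable, Iterable, List


def pick_delay(keepalive_timeout: float, preferred: float = 1.5) -> float:
    """Choose a pause that stays safely below the server's KeepAliveTimeout."""
    return min(preferred, keepalive_timeout * 0.5)


def fetch_politely(urls: Iterable[str],
                   fetch: Callable[[str], str],
                   delay: float,
                   sleep: Callable[[float], None] = time.sleep) -> List[str]:
    """Fetch each URL in turn, pausing between requests but not before the first."""
    results = []
    for index, url in enumerate(urls):
        if index > 0:
            sleep(delay)
        results.append(fetch(url))
    return results


if __name__ == "__main__":
    # With KeepAliveTimeout 5 as in the settings above, the pause is 1.5 s.
    delay = pick_delay(keepalive_timeout=5)
    pages = fetch_politely(["/a", "/b"], fetch=lambda u: "body of " + u, delay=delay)
    print(delay, pages)
```

Passing `sleep` in as a parameter is just so the pacing can be checked without actually waiting.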

    https://httpd.apache.org/docs/...
    https://httpd.apache.org/docs/...
    https://httpd.apache.org/docs/...
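    A minimal sketch of the idea above, using only Python's standard http.client. The function name fetch_all and its delay, sleep and connect parameters are made up for illustration; they are not from any script discussed in this thread. It sleeps between requests, keeps one connection open, and reconnects once if the server has dropped the kept-alive connection, so the caller never needs to know the server's MaxKeepAliveRequests or KeepAliveTimeout.

```python
import http.client
import time


def fetch_all(host, paths, delay=1.0, sleep=time.sleep,
              connect=http.client.HTTPSConnection):
    """Fetch each path from host over one reused connection.

    delay, sleep and connect are hypothetical knobs for this sketch;
    sleep and connect are injectable so the loop can be tried offline.
    """
    conn = connect(host)
    bodies = []
    for index, path in enumerate(paths):
        if index:
            # Pause between requests; keep delay below the server's
            # keep-alive timeout if the connection should be reused.
            sleep(delay)
        try:
            conn.request("GET", path)
            response = conn.getresponse()
        except (http.client.HTTPException, ConnectionError):
            # The server dropped the kept-alive connection: reconnect once.
            conn.close()
            conn = connect(host)
            conn.request("GET", path)
            response = conn.getresponse()
        # Read the whole body so the connection can carry the next request.
        bodies.append(response.read())
    conn.close()
    return bodies
```

    Most higher-level client libraries do this reconnection for you, which is the point: the scraper only decides how long to sleep.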

    --
    Everything I write is lies, read between the lines.
  14. Re: Public Post Should be Open to Everyone by ls671 · · Score: 1

    An additional note; the same applies if you build an auto-refresh web page in ajax etc. Arrange so that you refresh the page more often than KeepAliveTimeout if you want connections to be re-used by your customer browsers.

    --
    Everything I write is lies, read between the lines.
  15. Re: Public Post Should be Open to Everyone by __aaclcg7560 · · Score: 1, Informative

    Your Python script should not know about that.

    Someone on Slashdot complained that my script was taking too long to fetch, parse and save each page. So I rewrote the script to use a concurrent queue for each phase that launches 16 threads. Since 16 was the maximum number of threads that could launch without the web server shutting down the connection, I used that number for all the queues in the pipeline. It takes 30 minutes to process 733+ pages (11,000+ comments).
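    For readers wondering what a 16-thread fetch/parse/save pipeline can look like, here is a rough sketch. It is not the actual script: it swaps the per-phase queues described above for a single concurrent.futures.ThreadPoolExecutor, and run_pipeline, fetch, parse and save are placeholder names the caller would supply.

```python
from concurrent.futures import ThreadPoolExecutor

# 16 is the limit reported in the comment above; other servers will differ.
MAX_WORKERS = 16


def run_pipeline(pages, fetch, parse, save, max_workers=MAX_WORKERS):
    """Run fetch -> parse -> save for every page on a bounded thread pool.

    fetch, parse and save are caller-supplied callables (hypothetical here).
    Results come back in the same order as pages.
    """
    def process(page):
        return save(parse(fetch(page)))

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # At most max_workers pages are in flight at any moment.
        return list(pool.map(process, pages))
```

    Capping max_workers is what keeps the number of simultaneous requests at or below what the server tolerates.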

  16. Am I the only one here... by BerkeleyDude · · Score: 1

    Am I the only one here who actually tried to read the article? The summary points to the wrong article: "Tech companies in the crosshairs on white supremacy and free speech".

    The LinkedIn article is here.

  17. Re:People about to quit update their LinkedIn page by ls671 · · Score: 1

    Doesn't apply to what I'm doing. My Python script isn't a web crawler and I'm scraping my own comments. If you look at the bottom of each Slashdot page:
      "Comments owned by the poster." I'm just recovering my own intellectual property that I freely shared with the Slashdot community.

    If you seriously believe that I'm violating the Slashdot TOS,
      file a complaint with management. However, considering the shit that Anonymous Cowards get away with, I wouldn't hold my breath.

    Your script is sure enough a robot! Whether /. tolerates it or not is irrelevant, you are still not being a nice Christian by not following their robots.txt guidelines.

    https://slashdot.org/robots.tx...

    Your user-agent is *, so your robot should not access the following pages:
    User-agent: *
    Disallow: /authors.pl
    Disallow: /index.pl
    Disallow: /comments.pl
    Disallow: /firehose.pl
    Disallow: /journal.pl
    Disallow: /messages.pl
    Disallow: /metamod.pl
    Disallow: /users.pl
    Disallow: /search.pl
    Disallow: /submit.pl
    Disallow: /pollBooth.pl
    Disallow: /pubkey.pl
    Disallow: /topics.pl
    Disallow: /zoo.pl
    Disallow: /palm
    Disallow: /slashdot-it.pl
    Disallow: slashdot-it.pl
    Disallow: authors.pl
    Disallow: index.pl
    Disallow: comments.pl
    Disallow: firehose.pl
    Disallow: journal.pl
    Disallow: messages.pl
    Disallow: metamod.pl
    Disallow: users.pl
    Disallow: search.pl
    Disallow: submit.pl
    Disallow: pollBooth.pl
    Disallow: pubkey.pl
    Disallow: topics.pl
    Disallow: zoo.pl
    Disallow: /~
    Disallow: ~
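    A scraper can check rules like these before fetching, using the standard library's urllib.robotparser. This is only a sketch: the rules are a short excerpt of the list above, the helper name allowed is made up, and the /story/example path in the usage note is a hypothetical URL.

```python
from urllib.robotparser import RobotFileParser

# Short excerpt of the rules quoted above, inlined so nothing is downloaded.
ROBOTS_TXT = """\
User-agent: *
Disallow: /comments.pl
Disallow: /users.pl
Disallow: /search.pl
"""


def allowed(url, user_agent="*", robots_txt=ROBOTS_TXT):
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

    With this excerpt, allowed("https://slashdot.org/comments.pl?sid=1") is False, while allowed("https://slashdot.org/story/example") is True because no rule matches that path.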

    --
    Everything I write is lies, read between the lines.
  18. Nothing in the ruling prevents... by bazmail · · Score: 1

    ...Linkedin from rate-limiting the scraping. For example, limit scraping to 1 page every 10 seconds after the 100th page request within 100 seconds. That would solve their problem.
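    One way to read that rule, sketched in Python. The class name ScrapeThrottle and its parameters are invented for illustration, and this says nothing about how LinkedIn actually throttles. A client gets a burst of requests per window; once the burst is used up, further requests are allowed only one per slow interval until old requests age out of the window.

```python
from collections import deque


class ScrapeThrottle:
    """Per-client limiter: a burst per window, then one request per interval."""

    def __init__(self, burst=100, window=100.0, slow_interval=10.0):
        self.burst = burst
        self.window = window
        self.slow_interval = slow_interval
        self.recent = deque()  # timestamps of requests that were allowed

    def allow(self, now):
        """Return True if a request arriving at time now (seconds) may proceed."""
        # Forget allowed requests that have aged out of the window.
        while self.recent and now - self.recent[0] >= self.window:
            self.recent.popleft()
        # Burst used up: require slow_interval since the last allowed request.
        if (len(self.recent) >= self.burst
                and now - self.recent[-1] < self.slow_interval):
            return False
        self.recent.append(now)
        return True
```

    A server would keep one such object per client (by IP or account, say) and answer with an error or a delay whenever allow() returns False.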

    1. Re:Nothing in the ruling prevents... by swilver · · Score: 1

      Sounds like a barrier to me...

    2. Re:Nothing in the ruling prevents... by Zero__Kelvin · · Score: 1

      Yeah, there WAS nothing stopping them, but now there is a court order.

      --
      Guns don't kill people; Physics kills people! - John Lithgow as Dick Solomon on Third Rock From The Sun
  19. Re:value of linkedin... by bazmail · · Score: 1

    MS bought Skype for 8.5B, Minecraft for 2.5B, they make terrible purchases. It's like they like to find ways to avoid paying shareholders a proper dividend.

  20. Re:People about to quit update their LinkedIn page by bazmail · · Score: 1

    lol you do know that the whole robots.txt thing is an honor system right? not a replacement for a .conf file.

  21. Re:People about to quit update their LinkedIn page by ls671 · · Score: 1

    lol didn't you notice the word "guidelines" in my OP?

    --
    Everything I write is lies, read between the lines.
  22. Good to probe the greedy hypocritical ToS by Katatsumuri · · Score: 1

    Linkedin wants to have their cake and eat it, too. The users post their data for all interested parties to see, unless they put some explicit restrictions (e.g. friends only). Linkedin then add all sorts of artificial limits on visibility, search, and god forbid you try to fetch that data with a script. Suddenly it is no longer the person's data shared as they want, but Linkedin's data intended for monetization.

    I understand they have expenses incurred by careless bots. It is possible to traffic shape the active connections, or provide a reasonable API, without being greedy and hypocritical, obfuscating the data that is not yours, and pretending it is about the user protection.

  23. Their servers by Anonymous Coward · · Score: 1

    LinkedIn should have a right to keep anyone from using their property - their servers.

  24. Linkedin needs a better argument by Anonymous Coward · · Score: 1

    The ruling is certainly a tradeoff for the Internet.
    (Lowers content creation funding, but raises content access freedoms.)
    I think on balance it's a good thing.

    Here's the kernel of hiq's argument.

    28. LinkedIn is thus improperly using the Computer Fraud and Abuse Act, the Digital
    Millennium Copyright Act and related state penal code and trespass law, not as a shield – as
    intended by those laws – to prevent harmful hacking and unauthorized computer access, but as a
    sword to stifle competition and assert propriety control over data in which it has no exclusive
    interest. In other words, LinkedIn recognizes it has no valid propriety or copyright interest, so it
    claims only that it has a propriety interest to control access to its website, treating that digital
    realm as though it were physical real property. Not only is the analogy inapposite, but LinkedIn
    ignores that the public profile data of members would not reside on its website in the first place
    but for its express promise that the date would be public for all to see and use. Thus, while
    LinkedIn can certainly prevent abusive access to its website, it should not be allowed to pervert
    the purpose of the laws at issue by using them to destroy putative competitors, engage in unlawful
    and unfair business practices and suppress the free speech rights of California citizens and
    businesses as alleged more fully herein.

    http://digitalcommons.law.scu.edu/historical/1491/

    Not sure where LinkedIn's response or the ruling are?

  25. Re:People about to quit update their LinkedIn page by __aaclcg7560 · · Score: 1

    Your script is sure enough a robot!

    Yet no tutorial on Python web scraping ever mentioned the robots.txt.

    Whether /. tolerates it or not is irrelevant, you are still not being a nice Christian by not following their robots.txt guidelines.

    I'll let God sort it out since He has a better algorithm.

  26. Re:People about to quit update their LinkedIn page by __aaclcg7560 · · Score: 1

    What shit do AC's get away with?

    Dick pics.

    [...] Amazon links that nobody clicks on [...]

    Let me check... $1,000+ in merchandise this past weekend... not bad for links that nobody clicks on.

    [...] while claiming you're going to buy a yacht [...]

    Citation, please?

    I know which way I'd go if I were you.

    I'm here to stay. Especially since you ACs have convinced me that I could easily make coffee money while reading and posting as I normally do. You have no one to blame but yourselves.

  27. Re: Public Post Should be Open to Everyone by __aaclcg7560 · · Score: 1

    The real question is, why does any of this matter?

    I've gotten quite a few requests for this script. It's a shame that Slashdot doesn't offer the functionality for users to download their own comment history.

  28. Re: Public Post Should be Open to Everyone by __aaclcg7560 · · Score: 1

    So you spent 3.5 months refactoring your code [...]

    I haven't touched my script in two months. After those five user accounts got deleted, I no longer needed to use the script that often.

    https://www.kickingthebitbucket.com/2017/06/20/the-confessions-of-slashdot-asshats/

  29. we block people from scraping our clients' sites by cascadingstylesheet · · Score: 1

    We block people from scraping our clients' sites all the time, because it places excess load on the server.

    We played cat and mouse with one for a while ... eventually, they emailed a generic address with our client and said they weren't going to give up, so we should just make an easy-to-consume feed available to them. I laid it out to the client and said they might want to consider it, but they didn't go for it.

    I can't imagine a court order mandating us to allow scrapers.

  30. Re:we block people from scraping our clients' site by mysidia · · Score: 1

    We played cat and mouse with one for a while ... eventually, they emailed a generic address with our client and said they weren't going to give up

    This is when you get your attorney to write up a Cease and Desist letter and reply back to the scraper's e-mail. Now they have been warned and ordered by the owner of the property to stop, and further actions can result in a lawsuit or criminal charges regarding Unauthorized Access/Access In Excess of Authorization.

  31. Re:M$ loses and suddenly /. loves and respects the by bazmail · · Score: 1

    Shut up Nadella you sweaty insect bell-end.

  32. Re:People about to quit update their LinkedIn page by __aaclcg7560 · · Score: 1

    Sure, I believe you. Maybe you should post a pic with proof of that on your blog, creimer. Then maybe we'd believe the utter bullshit you spout here!

    https://twitter.com/cdreimer/status/897516205216604160

  33. Clearly no one read the FA by apraetor · · Score: 1

    If you'd bothered to RTFA before commenting you'd have noticed the link doesn't go to the story mentioned, it links to an article about Charlottesville.

  34. Re:People about to quit update their LinkedIn page by __aaclcg7560 · · Score: 1

    He's actually quite clever if his claims are true. It would never occur to me to monetize posting and interacting on here.

    ;)

  35. Re:People about to quit update their LinkedIn page by ls671 · · Score: 1

    Your script is sure enough a robot!

    Yet no tutorial on Python web scraping ever mentioned the robots.txt.

    Says the Unabomber: "Your honor, no tutorial mentioned that what I was doing was illegal..."

    Whether /. tolerates it or not is irrelevant, you are still not being a nice Christian by not following their robots.txt guidelines.

    I'll let God sort it out since He has a better algorithm.

    I am god you insensitive clod! A nice Christian at your church asked me to look over you in a prayer she made...

    --
    Everything I write is lies, read between the lines.