Judge Says LinkedIn Cannot Block Startup From Public Profile Data (reuters.com)
A U.S. federal judge on Monday ruled that LinkedIn cannot prevent a startup from accessing public profile data, in a test of how much control a social media site can wield over information its users have deemed to be public. Reuters reports: U.S. District Judge Edward Chen in San Francisco granted a preliminary injunction request brought by hiQ Labs, and ordered LinkedIn to remove within 24 hours any technology preventing hiQ from accessing public profiles. The dispute between the two tech companies has been going on since May, when LinkedIn issued a letter to hiQ Labs instructing the startup to stop scraping data from its service. HiQ Labs responded by filing a suit against LinkedIn in June, alleging that the Microsoft-owned social network was in violation of antitrust laws. HiQ Labs uses the LinkedIn data to build algorithms capable of predicting employee behaviors, such as when they might quit. "To the extent LinkedIn has already put in place technology to prevent hiQ from accessing these public profiles, it is ordered to remove any such barriers," Chen's order reads. Meanwhile, LinkedIn said in a statement: "We're disappointed in the court's ruling. This case is not over. We will continue to fight to protect our members' ability to control the information they make available on LinkedIn."
We will continue to fight to protect our members' ability to control the information they make available on LinkedIn
If users added their info, and made it public, it's not up to LinkedIn to decide what users want to protect.
Besides, given LinkedIn's past behavior with scraping people's contacts/address books on their PCs and email accounts, it has no lessons to give anyone else.
AC comments get piped to
Read https://linkedin.com/robots.tx...
Especially at the end
"We will continue to fight to protect our members' ability to control the information they make available on LinkedIn."
Translates to
"We will continue to fight to protect our profits and our ability to control and sell the information they make available on LinkedIn "
There should be no difference between a human reading the site and a machine. If it is able to be accessed by a person then it should be ok to scrape and aggregate it.
Do you also believe that there should be no difference between a person buying tickets to an event, and a bot doing so? That if it is able to be purchased by a person then it should be ok to use bots to buy up a few thousand tickets in a few seconds and artificially increase the price?
BTW, I agree with what you said; but while I was thinking about your comment that analogy crossed my mind. I'd like the people who use bots to buy up tickets to DIAF, yet I'm happy to let hiQ scrape LinkedIn data. Strange...
'The Economy' is a giant Ponzi scheme whose most pitiable suckers are the youngest among us and the yet-unborn.
LinkedIn's servers are their private property, and they should have the right to decide who can access them.
In the physical world, there are many places that are generally "open to the public", but they are private property, and the property owner can order you to leave and never come back. If you come back again it's called trespassing, and it's a criminal offense. You can and will be arrested, and if you go to trial, you will be convicted. It's well settled law.
I don't see why the LinkedIn situation is any different. The fact that LinkedIn are hypocritical corporate assholes doesn't change the legal analysis.
Microsoft bought LinkedIn to profit off of users' data. Users on LinkedIn specifically post info so it is shared. Most users were members long before MS bought the social network. I certainly didn't have any say in this purchase, or over my data. I don't appreciate that they can buy my public data, 3rd party website or not, and then act holier than thou about it.
I'm not sure MS could create a social network that worked, based on their past history. They've already changed the behavior of the site to promote more clicks and revenue, changes which would have seriously turned me off if they had been in place when it started. Unfortunately, I put up with it, for now.
For MS to go to court and now say they are protecting their users is shameful. Throwing the users in front of the judge for their own purposes is using the users as human shields. We all know this is about profits, and not about being saved from another evil corp.
Apply this to Microsoft's practices across their platform, and it's the users that need further protections from them. For them to throw us in front of a judge to claim this is for anything even semi-related to privacy is a joke.
At best, this is the pot calling the kettle black.
By your logic, they should revise their statement then:
"We will continue to fight to protect our data we extracted from our members and the ability to control the information they make available to us, here on LinkedIn"
Quick, there's still time for you to call them and tell them to revise it!
AC comments get piped to
B: Creimer's still waiting for the coffee money to roll in. He's focusing on making that Little Debbie money, first. At 25 cents per delicious, chewy Oatmeal Cream Pie, he should start making enough to buy 2 or 3 a month, soon!
When I get my June earnings at the end of the month, I can buy three cases and still have enough change for a skinny vanilla latte.
5. Void Where Prohibited; Indemnification
Doesn't apply to what I'm doing. This is standard legal boilerplate to cover Slashdot's collective ass from legal liability.
Also look at slashdot.org/robots.txt
Doesn't apply to what I'm doing. My Python script isn't a web crawler and I'm scraping my own comments. If you look at the bottom of each Slashdot page: "Comments owned by the poster." I'm just recovering my own intellectual property that I freely shared with the Slashdot community.
If you seriously believe that I'm violating the Slashdot TOS, file a complaint with management. However, considering the shit that Anonymous Cowards get away with, I wouldn't hold my breath.
So accessing the public profiles is to be allowed unless it's done in such a way as to create unnatural load on their servers, something akin to a DDoS attack. They can set a throttle on hits per minute for programmed access. Or they can provide an API so HiQ and others can access the public profile info without impacting user-facing servers, with the users getting an additional profile security option to allow API access, defaulted to Off for everyone initially so they can opt in.
A computer scraping will read everything, causing a heavy load and making the website perform poorly.
Depends on how the web server is set up. When I run my Python script to scrape my Slashdot comment history, 16 pages can be requested at the same time. More than 16 pages, the server shuts down the connection.
until the whopper at the end.
Your Python script should not know about that. Connection KeepAlive server settings like:
KeepAlive On
MaxKeepAliveRequests 50
KeepAliveTimeout 5
should be completely transparent to you. Your client library should transparently reconnect when it gets a Connection: close from the server. Heck, some sites don't even use keep alives (KeepAlive Off).
I have written such client software and I never bothered about the MaxKeepAliveRequests setting on the servers: if KeepAlive was on, the libraries I used did the reconnection for me, so I did not have to know the MaxKeepAliveRequests value for every site I was connecting to. Heck, any browser does just the same!
Also, if you write a scraper, it is a smart move to sleep between requests; any scraper like Google's sleeps between requests. 1 or 2 seconds is a nice value, because your sleep time has to be less than KeepAliveTimeout for the connection to be re-used for the next request.
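To make the sleep-between-requests point concrete, here is a minimal sketch. The names polite_fetch, fetch and keepalive_timeout are hypothetical and not from anyone's actual script, and the 5-second timeout is only the example value from the config lines above; a client cannot read the server's real setting. In real use you would pass in a fetch function backed by a client that keeps one connection open.

```python
import time
from typing import Callable, Iterable, List


def polite_fetch(urls: Iterable[str],
                 fetch: Callable[[str], str],
                 delay: float = 1.0,
                 keepalive_timeout: float = 5.0,
                 sleep: Callable[[float], None] = time.sleep) -> List[str]:
    """Fetch URLs one at a time, pausing `delay` seconds between requests.

    `keepalive_timeout` is an assumed value for the server's KeepAliveTimeout;
    the pause must stay below it or the server closes the idle connection.
    """
    if delay >= keepalive_timeout:
        raise ValueError("delay must be shorter than the assumed KeepAliveTimeout")
    pages = []
    for i, url in enumerate(urls):
        if i > 0:
            sleep(delay)  # pause between requests, not before the first one
        pages.append(fetch(url))
    return pages
```

Passing sleep and fetch as parameters is just a design choice that makes the loop easy to try out without touching a real server.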
https://httpd.apache.org/docs/...
https://httpd.apache.org/docs/...
https://httpd.apache.org/docs/...
Everything I write is lies, read between the lines.
An additional note; the same applies if you build an auto-refresh web page in ajax etc. Arrange so that you refresh the page more often than KeepAliveTimeout if you want connections to be re-used by your customer browsers.
Everything I write is lies, read between the lines.
Your Python script should not know about that.
Someone on Slashdot complained that my script was taking too long to fetch, parse and save each page. So I rewrote the script to use a concurrent queue for each phase that launches 16 threads. Since 16 was the maximum number of threads that could launch without the web server shutting down the connection, I used that number for all the queues in the pipeline. It takes 30 minutes to process 733+ pages (11,000+ comments).
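For readers wondering what a three-phase pipeline capped at 16 threads might look like, here is a rough sketch. It is not the actual script: it swaps the per-phase queues for a single ThreadPoolExecutor, and run_pipeline, fetch, parse and save are hypothetical names supplied by the caller.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List

MAX_WORKERS = 16  # the limit reported in this thread; other servers will differ


def run_pipeline(page_numbers: Iterable[int],
                 fetch: Callable[[int], str],
                 parse: Callable[[str], object],
                 save: Callable[[object], object],
                 workers: int = MAX_WORKERS) -> List[object]:
    """Run fetch, parse and save as three phases, each spread over a thread pool."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map keeps results in input order, so page N stays page N.
        html_pages = list(pool.map(fetch, page_numbers))
        parsed = list(pool.map(parse, html_pages))
        return list(pool.map(save, parsed))
```

Each phase finishes before the next starts here, which is simpler than chained queues but gives up the overlap between phases.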
Am I the only one here who actually tried to read the article? The summary points to the wrong article: "Tech companies in the crosshairs on white supremacy and free speech".
The LinkedIn article is here.
Doesn't apply to what I'm doing. My Python script isn't a web crawler and I'm scraping my own comments. If you look at the bottom of each Slashdot page:
"Comments owned by the poster." I'm just recovering my own intellectual property that I freely shared with the Slashdot community.
If you seriously believe that I'm violating the Slashdot TOS,
file a complaint with management. However, considering the shit that Anonymous Cowards get away with, I wouldn't hold my breath.
Your script is sure enough a robot! Whether /. tolerates it or not is irrelevant, you are still not being a nice Christian by not following their robots.txt guidelines.
https://slashdot.org/robots.tx...
Your user-agent is *, so your robot should not access the following pages: /authors.pl /index.pl /comments.pl /firehose.pl /journal.pl /messages.pl /metamod.pl /users.pl /search.pl /submit.pl /pollBooth.pl /pubkey.pl /topics.pl /zoo.pl /palm /slashdot-it.pl /~
User-agent: *
Disallow: /authors.pl
Disallow: /index.pl
Disallow: /comments.pl
Disallow: /firehose.pl
Disallow: /journal.pl
Disallow: /messages.pl
Disallow: /metamod.pl
Disallow: /users.pl
Disallow: /search.pl
Disallow: /submit.pl
Disallow: /pollBooth.pl
Disallow: /pubkey.pl
Disallow: /topics.pl
Disallow: /zoo.pl
Disallow: /palm
Disallow: /slashdot-it.pl
Disallow: /~
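If you want your script to check rules like these before it fetches anything, Python's standard library can do it. A small sketch, assuming a hypothetical allowed helper and a shortened copy of the rules pasted in as a string (a real script would download the site's current robots.txt instead):

```python
from urllib.robotparser import RobotFileParser

# Shortened, hard-coded copy of the rules for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /comments.pl
Disallow: /users.pl
Disallow: /search.pl
"""


def allowed(url: str, robots_txt: str = ROBOTS_TXT, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(allowed("https://slashdot.org/comments.pl?sid=1"))
    print(allowed("https://slashdot.org/story/example"))
```

The story URL in the usage lines is made up; the point is only that a path under a Disallow rule is refused and anything else is permitted.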
Everything I write is lies, read between the lines.
...LinkedIn from rate-limiting the scraping. For example, limit scraping to 1 page every 10 seconds after the 100th page request within 100 seconds. That would solve their problem.
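A rough sketch of that kind of rule, in case it helps: the class name ScrapeThrottle and its parameters are hypothetical, the numbers are just the ones from the example above, and a real server would keep one instance per client (per IP or account, say) rather than a single global one.

```python
import time
from collections import deque
from typing import Callable


class ScrapeThrottle:
    """Allow `burst` requests per `window` seconds, then one per `slow_interval`."""

    def __init__(self, burst: int = 100, window: float = 100.0,
                 slow_interval: float = 10.0,
                 clock: Callable[[], float] = time.monotonic):
        self.burst = burst
        self.window = window
        self.slow_interval = slow_interval
        self.clock = clock
        self.recent = deque()  # timestamps of allowed requests still inside the window

    def allow(self) -> bool:
        now = self.clock()
        # Forget requests that have aged out of the sliding window.
        while self.recent and now - self.recent[0] >= self.window:
            self.recent.popleft()
        # Past the burst limit: only let a request through every slow_interval seconds.
        if len(self.recent) >= self.burst and now - self.recent[-1] < self.slow_interval:
            return False
        self.recent.append(now)
        return True
```

Refused requests are not counted against the client here; whether they should be is a policy choice the example above does not settle.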
MS bought Skype for 8.5B, Minecraft for 2.5B, they make terrible purchases. It's like they like to find ways to avoid paying shareholders a proper dividend.
lol you do know that the whole robots.txt thing is an honor system right? not a replacement for a .conf file.
lol didn't you notice the word "guidelines" in my OP?
Everything I write is lies, read between the lines.
LinkedIn wants to have their cake and eat it, too. The users post their data for all interested parties to see, unless they put some explicit restrictions (e.g. friends only). LinkedIn then adds all sorts of artificial limits on visibility, search, and god forbid you try to fetch that data with a script. Suddenly it is no longer the person's data shared as they want, but LinkedIn's data intended for monetization.
I understand they have expenses incurred by careless bots. It is possible to traffic shape the active connections, or provide a reasonable API, without being greedy and hypocritical, obfuscating the data that is not yours, and pretending it is about the user protection.
LinkedIn should have a right to keep anyone from using their property - their servers.
The ruling is certainly a tradeoff for the Internet.
(Lowers content creation funding, but raises content access freedoms.)
I think on balance it's a good thing.
Here's the kernel of hiq's argument.
28. LinkedIn is thus improperly using the Computer Fraud and Abuse Act, the Digital Millennium Copyright Act and related state penal code and trespass law, not as a shield – as intended by those laws – to prevent harmful hacking and unauthorized computer access, but as a sword to stifle competition and assert propriety control over data in which it has no exclusive interest. In other words, LinkedIn recognizes it has no valid propriety or copyright interest, so it claims only that it has a propriety interest to control access to its website, treating that digital realm as though it were physical real property. Not only is the analogy inapposite, but LinkedIn ignores that the public profile data of members would not reside on its website in the first place but for its express promise that the date would be public for all to see and use. Thus, while LinkedIn can certainly prevent abusive access to its website, it should not be allowed to pervert the purpose of the laws at issue by using them to destroy putative competitors, engage in unlawful and unfair business practices and suppress the free speech rights of California citizens and businesses as alleged more fully herein.
http://digitalcommons.law.scu.edu/historical/1491/
Not sure where LinkedIn's response or the ruling are?
Your script is sure enough a robot!
Yet no tutorial on Python web scraping ever mentioned the robots.txt.
Whether /. tolerates it or not is irrelevant, you are still not being a nice Christian by not following their robots.txt guidelines.
I'll let God sort it out since He has a better algorithm.
What shit do ACs get away with?
Dick pics.
[...] Amazon links that nobody clicks on [...]
Let me check... $1,000+ in merchandise this past weekend... not bad for links that nobody clicks on.
[...] while claiming you're going to buy a yacht [...]
Citation, please?
I know which way I'd go if I were you.
I'm here to stay. Especially since you ACs have convinced me that I could easily make coffee money while reading and posting as I normally do. You have no one to blame but yourselves.
The real question is, why does any of this matter?
I've gotten quite a few requests for this script. It's a shame that Slashdot doesn't offer the functionality for users to download their own comment history.
So you spent 3.5 months refactoring your code [...]
I haven't touched my script in two months. After those five user accounts got deleted, I no longer needed to use the script that often.
https://www.kickingthebitbucket.com/2017/06/20/the-confessions-of-slashdot-asshats/
We block people from scraping our clients' sites all the time, because it places excess load on the server.
We played cat and mouse with one for a while ... eventually, they emailed a generic address at our client and said they weren't going to give up, so we should just make an easy-to-consume feed available to them. I laid it out to the client and said they might want to consider it, but they didn't go for it.
I can't imagine a court order mandating us to allow scrapers.
We played cat and mouse with one for a while ... eventually, they emailed a generic address at our client and said they weren't going to give up
This is when you get your attorney to write up a Cease and Desist letter and reply to the scraper's e-mail. Now they have been warned and ordered by the owner of the property to stop, and further actions can result in a lawsuit or criminal charges regarding Unauthorized Access/Access In Excess of Authorization.
Shut up Nadella you sweaty insect bell-end.
Sure, I believe you. Maybe you should post a pic with proof of that on your blog, creimer. Then maybe we'd believe the utter bullshit you spout here!
https://twitter.com/cdreimer/status/897516205216604160
If you'd bothered to RTFA before commenting you'd have noticed the link doesn't go to the story mentioned, it links to an article about Charlottesville.
He's actually quite clever if his claims are true. It would never occur to me to monetize posting and interacting on here.
;)
Your script is sure enough a robot!
Yet no tutorial on Python web scraping ever mentioned the robots.txt.
Says the Unabomber: "Your honor, no tutorial mentioned that what I was doing was illegal..."
Whether /. tolerates it or not is irrelevant, you are still not being a nice Christian by not following their robots.txt guidelines.
I'll let God sort it out since He has a better algorithm.
I am god you insensitive clod! A nice Christian at your church asked me to look over you in a prayer she made...
Everything I write is lies, read between the lines.