Judge Says LinkedIn Cannot Block Startup From Public Profile Data (reuters.com)
A U.S. federal judge on Monday ruled that LinkedIn cannot prevent a startup from accessing public profile data, in a test of how much control a social media site can wield over information its users have deemed to be public. Reuters reports: U.S. District Judge Edward Chen in San Francisco granted a preliminary injunction request brought by hiQ Labs, and ordered LinkedIn to remove within 24 hours any technology preventing hiQ from accessing public profiles. The dispute between the two tech companies has been going on since May, when LinkedIn issued a letter to hiQ Labs instructing the startup to stop scraping data from its service. HiQ Labs responded by filing a suit against LinkedIn in June, alleging that the Microsoft-owned social network was in violation of antitrust laws. HiQ Labs uses the LinkedIn data to build algorithms capable of predicting employee behaviors, such as when they might quit. "To the extent LinkedIn has already put in place technology to prevent hiQ from accessing these public profiles, it is ordered to remove any such barriers," Chen's order reads. Meanwhile, LinkedIn said in a statement: "We're disappointed in the court's ruling. This case is not over. We will continue to fight to protect our members' ability to control the information they make available on LinkedIn."
"We will continue to fight to protect our members' ability to control the information they make available on LinkedIn."
Translates to:
"We will continue to fight to protect our profits and our ability to control and sell the information they make available on LinkedIn."
Your Python script should not know about that.
Someone on Slashdot complained that my script was taking too long to fetch, parse, and save each page. So I rewrote the script to use a concurrent queue for each phase, with each phase launching 16 threads. Since 16 was the maximum number of threads I could launch without the web server shutting down the connection, I used that number for all the queues in the pipeline. It takes 30 minutes to process 733+ pages (11,000+ comments).
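For anyone curious what a pipeline like that might look like, here is a rough sketch, not the actual script (which wasn't posted). It assumes Python's standard `queue` and `threading` modules; `run_stage`, `run_pipeline`, and the `fetch`/`parse`/`save` stand-ins are made-up names, and the stand-ins do no real network or disk work.

```python
import queue
import threading

WORKERS_PER_STAGE = 16  # the thread count mentioned in the comment above
_STOP = object()        # sentinel that tells a worker thread to exit


def run_stage(func, inbox, outbox, workers=WORKERS_PER_STAGE):
    """Start `workers` threads that apply `func` to items taken from `inbox`."""
    def worker():
        while True:
            item = inbox.get()
            if item is _STOP:
                break
            outbox.put(func(item))

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    return threads


def run_pipeline(items, stages, workers=WORKERS_PER_STAGE):
    """Push items through each stage in turn; results come back unordered."""
    queues = [queue.Queue() for _ in range(len(stages) + 1)]
    pools = [run_stage(func, queues[i], queues[i + 1], workers)
             for i, func in enumerate(stages)]
    for item in items:
        queues[0].put(item)
    # Shut down one stage at a time, so a stage only sees its sentinels
    # after every upstream thread has finished producing.
    for i, pool in enumerate(pools):
        for _ in pool:
            queues[i].put(_STOP)
        for t in pool:
            t.join()
    results = []
    while not queues[-1].empty():
        results.append(queues[-1].get())
    return results


# Hypothetical stand-ins for the three phases.
def fetch(page_number):
    # A real script would make an HTTP request here.
    return page_number, "comment\n" * page_number


def parse(page):
    page_number, text = page
    return page_number, text.count("comment")


def save(record):
    # A real script would write to disk or a database here.
    return record


if __name__ == "__main__":
    done = run_pipeline(range(1, 734), [fetch, parse, save])
    print(len(done), "pages processed")
```

Threads suit this kind of job because fetching is mostly waiting on the network. The cap of 16 per stage simply mirrors the limit described above; the right number depends on what the server tolerates.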