Is Data Mining for Product Pricing, Illegal?
wessman asks: "I started to read Orin S. Kerr's 80-page paper looking for how his proposal would pertain to: ripping music/movies, P2P, corporate espionage, and lastly, the use of web scraper robots. Little did I know just how relevant his paper would be in regards to that last item! Kerr makes note of EF Cultural Travel v. Explorica in which Explorica is caught hiring a consultant to program a scraping robot to gather pricing information from a competitor, EF Cultural Travel. Well, I do consulting on the side from home and am currently working a project whereby I gather pricing information from all the major travel conglomerates (Orbitz, Expedia, Lodging.com, WorldRes, Sabre, etc.) so that the travel booking business that hired me can meet or beat all their prices. Granted, the circumstances of the Explorica case are different and the case was an example of an extreme ruling, but my questions to the Slashdot community are: Do I notify the company that hired me of the Explorica case? Why is using a scraper robot so different from, say, walking into Best Buy with a handheld and recording product pricing manually? Should I continue with this project and the similar projects I do in this area of programming?" Now, add in the text in the "deliverables" section of this press release and it seems we may have some contradictory information. Who is right, and under what circumstances is price harvesting off of the internet not allowed?
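The scraping half of a project like wessman's usually comes down to fetching a results page and pulling the prices out of the markup. Here is a minimal sketch that uses only Python's standard library. It assumes, hypothetically, that each price sits in an element whose class list includes `price`. Real travel sites use their own markup, which changes often. The `undercut` helper is just one illustrative "meet or beat" rule, not anything from the original project:

```python
from decimal import Decimal
from html.parser import HTMLParser


class PriceParser(HTMLParser):
    """Collects the text of elements whose class list contains `price_class`.

    Assumes (hypothetically) that each price element holds only text,
    e.g. <span class="price">$89.50</span>, with no nested tags.
    """

    def __init__(self, price_class="price"):
        super().__init__()
        self.price_class = price_class
        self.raw_prices = []   # price strings as they appear on the page
        self._in_price = False
        self._buf = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self.price_class in classes:
            self._in_price = True
            self._buf = []

    def handle_data(self, data):
        if self._in_price:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if self._in_price:
            self.raw_prices.append("".join(self._buf).strip())
            self._in_price = False


def extract_prices(html: str) -> list[Decimal]:
    """Parse the page and turn strings like "$1,299.00" into Decimals."""
    parser = PriceParser()
    parser.feed(html)
    parser.close()
    return [Decimal(p.replace("$", "").replace(",", "")) for p in parser.raw_prices]


def undercut(prices: list[Decimal], step: Decimal = Decimal("0.01")) -> Decimal:
    """Hypothetical 'beat the lowest competitor' rule: lowest price minus `step`."""
    return min(prices) - step


page = ('<div><span class="price">$1,299.00</span>'
        '<span class="name">Hotel</span>'
        '<span class="price sale">$89.50</span></div>')
prices = extract_prices(page)
print(prices)            # [Decimal('1299.00'), Decimal('89.50')]
print(undercut(prices))  # 89.49
```

Decimal is used instead of float so that currency arithmetic like 89.50 - 0.01 comes out exact.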
In what sane land would PRICES be protected under law? You can't really keep them secret, so "trade secret" is right out. It's not an identifying mark (unless you're a dollar store), so much for trademark. There's nothing useful that hasn't been done before, so fuck patenting them. Copyright? It's a simple derivation of what the supplier charges you.
There is nothing creative about pricing stuff. Good lord.
-- Bill "Houdini" Weiss
Anybody remember the Popeye cartoon where Popeye opens a car wash, and then Bluto opens a car wash right across the street?
When Popeye posts his price, Bluto beats it by five cents. Then Popeye beats Bluto's by five cents.
It goes back and forth until Popeye is washing cars for free.
I was tempted to call this an infinite loop, but I doubt a retailer would pay you to take their products.
Of course it is. Let's dissect this sentence:
Not having the comma would completely distort the meaning of the sentence.
Thank you.
My grandmother calls this "shopping around." The only difference is that someone else is doing all the work.
Smeghead every day of the week.
Actually, price scraping was done in a very low-tech way a reasonably long time ago by a pretty well-known businessman: Sam Walton of Wal-Mart. Early in his career, he would dumpster-dive his competitors to find out the prices they paid for their goods, the contracts they had with their suppliers, etc. This gave him "insider information," so he knew how to set his prices more strategically and, obviously, out-compete them where it counts: financially.
I think any large business where the pricing structure isn't directly related to costs is probably deeply afraid of agents that aggregate their data with competitors. You end up with a more ideal market, a more frictionless market, if you will, and they'll be forced to compete on narrower and narrower margins of profit. Of course they'll want to throw up barriers to that.
But I'll bet this issue comes down to Terms of Service and what a company can reasonably expect to be able to legally require/forbid about the use of data provided via an automated means...
Tweet, tweet.
As long as you get paid, let them worry about the lawsuit. They're the ones who are going to actually use it. Keep your mouth shut.
If powerful people get screwed, it's illegal.
If it forces large corporations to work harder to earn a profit, it's illegal.
If it gives the little guy a leg up or levels the playing field in any way, it's illegal.
If it's illegal and you're big and powerful, don't worry about it: you can probably get away with it with little damage to your business or career, and keep almost all of your cash minus legal fees.
Who is this "Illegal" person and why are we asking him questions about Star Trek characters?
I'm not "Illegal," but I'll answer. Seeing as how he was killed off in the last movie, I think it's safe to say that no, Data is not mining for product pricing.
(In other words, you illiterate clods need to be more careful with your commas.)
Look, if I can visit a web site where the business (let's say Amazon) publicly posts their prices for anybody to see, then you sure as hell can use them! If suddenly using bots to do work is illegal, then I'd wager that every shell script I write is an affront to US law. Rotating log files and all sorts of other "make my job easier so that I can play Quake" scripts are perfectly legal, so how the hell can it be questionable just to go to a site and record prices???
Jebus, please help the United States Gub'ment!
Two specific cases in point.
1) At many of the deal sites (e.g. slickdeals.net), an offer occasionally appears where, after you get back your rebate, you have more money than you spent on the product.
2) Grocery coupons: in some cases, a store will run one of those "triple coupon Thursdays" promotions, and if you have the right coupon, the money-off total will exceed the price of the product. Depending on the store, you get back either cash or a credit.
How does one receive authorization to access a web server? Hmm, maybe with a simple HTTP GET? The basic fact here is judicial cluelessness. If I put information on a public web server, pretend to "protect" it with a disclaimer (of everything) at the bottom of the page, and then get pissed off because somebody browsed that information, I'm an idiot. In addition, I have no leg to stand on in court. Web servers make information available to the world. If I had wanted to make information available only to certain parties that I trust not to compete with me, I should have set up a secure server with some provision for authentication and authorization.
It really is that simple.
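Jess's point is easy to see from the client side. A browser and a bot issue the same kind of HTTP GET for the same URL. The only difference the server sees is whatever `User-Agent` string the client chooses to send. The sketch below uses Python's `urllib.request` to show this. The URL and both agent strings are made-up placeholders, and nothing is actually sent over the network:

```python
from urllib.request import Request

# Hypothetical fare page; no request is sent over the network here.
URL = "http://www.example.com/fares?from=BOS&to=SFO"


def build_request(user_agent: str) -> Request:
    """Build (but do not send) a plain HTTP GET with a self-reported User-Agent."""
    return Request(URL, headers={"User-Agent": user_agent})


browser = build_request("Mozilla/5.0 (a hypothetical browser)")
bot = build_request("PriceBot/0.1 (a hypothetical scraper)")

for req in (browser, bot):
    # urllib stores header names via str.capitalize(), hence "User-agent".
    print(req.get_method(), req.full_url, req.get_header("User-agent"))
```

The header is self-reported, so the request by itself tells the server nothing reliable about who, or what, is asking.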
later,
Jess
I am programmed for etiquette, not destruction!
The problem is that when you sell a commodity like a TV or a VCR, where the only difference between products is price, you can't exactly maintain a high profit margin. What they need to do is obfuscate the prices so that it's next to impossible to compare products. That's how cell phones work.
This phone has 500 anytime minutes at 3 cents a minute in your calling area, roaming at 10 cents a minute, unlimited text messaging, and 800 night and weekend minutes that are free for the first 6 months, plus rollover.
This other phone has 1000 minutes at 2 cents a minute but without rollover, text messaging costs 1 cent per message, and nights and weekends are free but don't start till 9pm.
See? It's not exactly easy to just look at the plans and tell which one is better.
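One way around that kind of obfuscation is to price every plan against the same usage profile. The sketch below works in integer cents and is loosely modeled on the two plans above, with several simplifying assumptions:

- Included minutes, rollover, and the promotional free period are ignored.
- Night and weekend calls are treated as free on both plans.
- Plan B's roaming rate, which the post doesn't give, is assumed equal to Plan A's.
- The usage numbers are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Plan:
    name: str
    cents_per_minute: int       # daytime rate in the home calling area
    cents_per_roaming_min: int  # roaming rate
    cents_per_text: int         # 0 models "unlimited text messaging"


@dataclass
class Usage:
    daytime_minutes: int
    roaming_minutes: int
    texts: int


def monthly_cost_cents(plan: Plan, usage: Usage) -> int:
    """Cost of one month's usage; nights/weekends assumed free on every plan."""
    return (usage.daytime_minutes * plan.cents_per_minute
            + usage.roaming_minutes * plan.cents_per_roaming_min
            + usage.texts * plan.cents_per_text)


plan_a = Plan("Plan A", cents_per_minute=3, cents_per_roaming_min=10, cents_per_text=0)
# Plan B's roaming rate is not stated in the post; assumed equal to Plan A's.
plan_b = Plan("Plan B", cents_per_minute=2, cents_per_roaming_min=10, cents_per_text=1)

light_texter = Usage(daytime_minutes=300, roaming_minutes=0, texts=200)
heavy_texter = Usage(daytime_minutes=300, roaming_minutes=0, texts=600)

for who in (light_texter, heavy_texter):
    costs = {p.name: monthly_cost_cents(p, who) for p in (plan_a, plan_b)}
    print(who.texts, "texts:", costs)
# 200 texts: {'Plan A': 900, 'Plan B': 800}
# 600 texts: {'Plan A': 900, 'Plan B': 1200}
```

Which plan wins flips with the number of texts. That is exactly the comparison the plan descriptions make hard to do by eye.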
He who knows not and knows he knows not is a wise man. He who knows not and knows not he knows not is a fool.
Once their prices hit the Internet, they're in the public domain. It would be like posting your prices in the window, and complaining that a car driving past could photograph them.
We all know that bots crawl the web: Google, AltaVista, spam-bots... they're all common knowledge. You put information on a website, and it's going to be viewed by an automated process. Surely, with that knowledge, it's ridiculous to think you can ban people from using the information you've posted publicly in whatever way they desire.
Perhaps these companies (airlines, computer stores, whatever) need to start offering their services at the price they really mean to sell it for, rather than this stupid haggling they expect from us. Or maybe it's time they focused on quality of service, value-add, etc rather than price wars (which never help anybody in the long term).
Bottom line? If you don't want your competitors seeing your prices, don't make them available to them - this means no junkmail, no spam, no website, no prices in the store window, no prices inside the store, nothing.
Also, Pricewatch, Pricegrabber and Froogle scour the web for prices and create search engines out of them so consumers can find the best price.
I'm not saying that just because everyone else is doing it, you can too (and your objective might be slightly different, which would make these examples carry less weight), but it's being done all over the place.
Hope that helps.
As for the ethical part of telling your employer about this... well, first, remember, this is just a decision of the First Circuit. If you live in a different Circuit, then it may or may not be binding on you. I know this jurisdictional stuff can be a little confusing, but a decision by a Circuit only affects the jurisdictions within it. Only the US Supreme Court (generally, I know there are federal tax, patent, admiralty, etc. courts, too) can make decisions that are binding on the entire country. If you're not sure, check with your corporate counsel. And it might be a good idea to forward the case to him anyway, you might be able to pick up some "bonus points" from your boss for being an especially conscientious employee.
IAAL
seems like it's the use of confidential information that got the scraper capped.
I don't see why accessing *public* information would be problematic.
the only thing that may be trouble is the website EULA, but then the EULA would be saying the same thing as "don't visit my store unless you intend to buy," which would be ridiculous in the brick-and-mortar world (and should be just as ridiculous in cyberspace).
last question, though: why the heck would you ask this kind of stuff HERE? wouldn't a law forum be a better choice?
The amount of beauty required to launch 1 ship: 1 Millihelen
I am not a lawyer.
Slashdot is not a lawyer.
Slashdot is not a replacement for a lawyer.
Individual posters on slashdot may be lawyers, but are you really willing to trust your future to what some random person online says, when they could be a lawyer, but could also be some 14 year old kid who thinks it's amusing to screw with people?
Repeat after me:
I will seek proper legal advice.
Seriously, this comes up time and time again. If you're in a situation where you need actual concrete legal advice, SLASHDOT IS NOT THE PLACE TO GO. Sending in an Ask Slashdot is fine for theoretical questions, but when your ass is at stake if a lawsuit comes around, do you really want to trust your future to the legal advice given to you by Anonymous Cowards and karma whores?
Filing lawsuits to protect your price information is just dumb, not to mention waste (if not abuse) of the legal system.
Personal feelings about freedom of information aside, and just from a coder's POV, here's my solution.
If they really want to avoid getting scraped, they should just get their existing, underpaid web developers to create a backend setup that generates the prices as GIFs that give OCR hell (like the images used to prevent automated registration of, say, Yahoo! email accounts).
Coders are cheaper than lawyers (at least the coders needed to write this kind of code).
Sure, the competition could pay more money to get somebody to develop better OCR to read each and every dynamically generated GIF, but OCR output usually needs human proofreading, which leads to even more cost.
Something I learned from my uncle who works with the DoD is this: any lock can be picked; any encryption can be broken. It's just a matter of whether it's worth the time and money to get what's inside.
In short, with a little one-time cost, the company that doesn't want its prices scraped can make them so hard to scrape that it's not worth it. Scraping graphically displayed price tags would also carry an ongoing cost in software and proofreaders that would eat into profit margins, which management at the company doing the scraping won't like.
It's not perfect, but it's better (and more bankable) than going whining to the legal system. (Especially since coders are generally cheaper than lawyers).
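The core of that idea is to ship the price as pixels plus noise instead of as text. Below is a toy sketch that uses only the standard library. It draws a price with a made-up 3x5 bitmap font that supports digits and "." only. It then flips a random fraction of the pixels. A real site would render an actual GIF with an imaging library. Noise this crude only illustrates the principle and would not stop a determined OCR effort:

```python
import random
from typing import Optional

# Hypothetical 3x5 bitmap font: '#' = ink, '.' = background.
FONT = {
    "0": ["###", "#.#", "#.#", "#.#", "###"],
    "1": [".#.", "##.", ".#.", ".#.", "###"],
    "2": ["###", "..#", "###", "#..", "###"],
    "3": ["###", "..#", "###", "..#", "###"],
    "4": ["#.#", "#.#", "###", "..#", "..#"],
    "5": ["###", "#..", "###", "..#", "###"],
    "6": ["###", "#..", "###", "#.#", "###"],
    "7": ["###", "..#", "..#", "..#", "..#"],
    "8": ["###", "#.#", "###", "#.#", "###"],
    "9": ["###", "#.#", "###", "..#", "###"],
    ".": ["...", "...", "...", "...", ".#."],
}


def render_price(price: str, noise: float = 0.0, seed: Optional[int] = None) -> list[str]:
    """Draw `price` as 5 rows of pixels, flipping each pixel with probability `noise`."""
    rng = random.Random(seed)
    # Glyphs are separated by one background column.
    rows = [".".join(FONT[ch][r] for ch in price) for r in range(5)]
    flip = {"#": ".", ".": "#"}
    return ["".join(flip[px] if rng.random() < noise else px for px in row)
            for row in rows]


for row in render_price("19.99", noise=0.08, seed=42):
    print(row)
```

With `noise=0.0` the output is a clean rendering. As the noise goes up, the human reader and the OCR engine both start to struggle. Keeping humans ahead in that trade-off is the bet the poster is making.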
DONT PANIC
I think what you have to look at is the media context in which the prices are displayed.
It's quite true that many stores will try to prevent you from making recordings of any kind on their physical premises. I've been reprimanded by store managers many times for taking photos in the store. But their right to prevent me from creating media on their premises is based on their property rights, not on some legally backed authority to censor the media.
The web is a totally different story. I use web scrapers all the time and a site that doesn't like it can kindly take its ass off the web. Once you place material on the web, it is published. If you don't want to publish your prices, you don't have to. That's like publishing a book and complaining the readers read it too fast.
The people who complain about such things are the idiots who create unworkable business plans based on their own assumptions about how people are going to use the resource. This is an interesting issue with news media that want to sell access to their archives. There's no way they can both publish to the web and prevent me from caching old copies. If that's the business plan, then web publishing is an inappropriate business decision, and guess who should pay for bad business decisions: the consumer, or the fool who pursued an ignorant business plan?
Anybody who publishes information about their business runs the risk that a competitor will get hold of the information and use it in some way. This has always been a fact of life in the physical world. As computers came online in recent decades many companies have maintained databases of information about competitors' products. The Internet doesn't change any of this.
Aside from the well-known problems with any click-through agreement (contract between unknown parties, software circumvention, lack of notarization, etc.), the additional flaw in this case is provided by web archives. If you don't want to have to look at a click-through page before reading your competitor's deep dark secrets, just download what you want from a public web cache. Are these jokers going to turn around and sue Google, as well?
Actually, that brings up an interesting point. When Google gets sued for forwarding information to competitors without making them click through, it will probably argue that this was not its "intent" in providing the web archive. Of course, the competitors do have an "intent" that the original site doesn't condone. But there is no technical means of determining intent over the current version of HTTP. If the original site wants to do this, it is using the wrong technology. Someday, if the ebXML folks get off their collective butts, we might have some sort of contract-negotiation protocol. I doubt a consumer e-commerce site would be interested in erecting such barriers to entry, but this would probably be useful in certain B2B contexts. Until then, honoring click-through pages in the breach will only harm the internet. Any court case that declares that particular intents make a party ineligible to download particular material served over the web (that's my understanding of the agreement that we're clicking through here) will only harm the web and all open systems.
later,
Jess
When somebody asked him how he can make a living like that, he replied:
"Volume!"
They say the first thing to go is your penis. Well, it's either that or your brain. I forget which...
I'd say yes. Second only to apostrophe's.
Everything that was once directly lived has receded into a representation. -debord
Read the case... EF Cultural Travel BV v. Explorica hinges on the fact that the defendant company hired an ex-programmer from the plaintiff company. The programmer had special knowledge of codes used in the pricing (which he had signed a confidentiality agreement not to disclose). When he made the scraper program, he violated the confidentiality agreement.
:) Depending on how the contract is written, you could be jointly liable.
It was the violation of the confidentiality agreement that the court held was illegal.
As for whether you should tell your employer, it depends on your employment agreement!
While this is a 1st Circuit case, it has been followed by the 5th Circuit (Ingenix, Inc. v. Lagalante) and cited in cases in the 7th and 9th Circuits.
Hope this helps.
--me
of a man named Ronald Kahlow and his troubles with Best Buy back in 1997.
Karma: Can only be portioned out by the Cosmos.
What did blind people ever do to you?
All's true that is mistrusted
What he's doing isn't really data mining. Data mining is the process of discovering patterns in data which are not known ahead of time, such as the infamous "beer and diapers" correlation.
That said, I don't understand why the author is worried. I can't see how looking at publicly posted prices could be considered illegal.
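For contrast, here is the "beer and diapers" kind of data mining in miniature. The sketch computes the support and confidence of a single association rule over a set of shopping baskets. The baskets are invented for illustration. Real association-rule mining (the Apriori algorithm, for example) searches for such rules automatically instead of checking one rule by hand:

```python
from fractions import Fraction


def support(baskets: list[set[str]], items: set[str]) -> Fraction:
    """Fraction of baskets that contain every item in `items`."""
    return Fraction(sum(items <= b for b in baskets), len(baskets))


def confidence(baskets: list[set[str]], antecedent: set[str],
               consequent: set[str]) -> Fraction:
    """Estimated P(consequent | antecedent) = support(A | C) / support(A)."""
    return support(baskets, antecedent | consequent) / support(baskets, antecedent)


# Invented transactions, for illustration only.
baskets = [
    {"diapers", "beer", "chips"},
    {"diapers", "beer"},
    {"diapers", "wipes"},
    {"beer", "chips"},
    {"milk", "bread"},
]

print(support(baskets, {"diapers", "beer"}))       # 2/5
print(confidence(baskets, {"diapers"}, {"beer"}))  # 2/3
```

A price scraper only collects rows like these. Any mining would come afterwards, if at all.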
People who study economics face some unrealistic hypotheses in textbooks. The first topic most students deal with in a microeconomics course is a large group of firms selling the very same product. If the buyer has perfect information about prices, he will choose the lowest one. The buyer's choice influences the behavior of all the other firms, which will tend to lower their prices to beat the one the buyer chose. The resulting dynamics push the price down until the item costs the buyer exactly what it costs to produce. In the real world it is very unrealistic to believe anyone could have information about all the sellers' prices. But with data mining we can have MORE information about sellers than we traditionally could, and we can access it at a lower cost. We should then be closer to the perfect competition the textbooks theorize about than the real world usually is. There are, however, some problems to solve before jumping to this conclusion: there is not that large a number of competing firms, and delivery fees, warranties, and delivery times can differ greatly from seller to seller. Could a "perfect bot" handle all this information? If the answer is yes, firms can follow two paths: cartelization or dumping. The first happens when firms agree on prices together and force buyers to pay more, because competition is "frozen". The second happens when a firm artificially lowers its price below its costs to force the competition into bankruptcy. Both behaviors are dangerous to consumers and are forbidden in most countries. IMHO a site's EULA can't go against market law. I presume that, at least within the same democratic country, it is legal to data mine in that way. And I can't see why a competitor can't use it as a tool to build its pricing strategy. It's the invisible hand Adam Smith intuited.
The WWW is evolving, maybe in a way some people dislike, and it follows the same rules we use in the real world to make money. And I'm sure competitors will soon find ways to prevent data mining of their sites, at least of the information they don't want to share. IT solutions, which require not lawyers but intelligence and insight.
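The textbook dynamic described above can be simulated in a few lines. Popeye's car wash is the same story with a cost of zero. The sketch below makes these assumptions:

- Products are identical and price information is perfect.
- Prices are integers in cents.
- Firms take turns undercutting the cheapest rival by a fixed step.
- Nobody ever prices below cost, so the dumping case is deliberately left out.

```python
def price_war(start_prices: list[int], cost: int, step: int) -> list[int]:
    """Firms take turns undercutting the cheapest rival until nobody can.

    A firm that is not strictly cheapest moves to max(cost, cheapest_rival - step).
    Prices only fall and never go below `cost`, so the loop terminates.
    Assumes at least two firms and every starting price >= cost.
    """
    prices = list(start_prices)
    changed = True
    while changed:
        changed = False
        for i, own in enumerate(prices):
            rival = min(p for j, p in enumerate(prices) if j != i)
            if own >= rival:
                new = max(cost, rival - step)
                if new != own:
                    prices[i] = new
                    changed = True
    return prices


print(price_war([100, 100], cost=0, step=5))        # [0, 0]  (Popeye's free car wash)
print(price_war([100, 120, 110], cost=40, step=5))  # [40, 40, 40]
```

The war never settles at a "fair" margin. The only resting point is price equal to cost, which is why the post expects firms to look for a way out, through cartels or dumping, rather than stay there.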
You see signs everywhere you go:
-Shirt & shoes required.
-No loitering.
-No soliciting.
-Check all bags at counter.
-No more than two students allowed in store at one time.
-Parking lot, bathroom, etc. for customer use only.
Just because a building (or a web site) is in a public place doesn't mean that everyone is free to do whatever they want. Business owners are free to create house rules that everyone needs to follow.
Similarly, web sites can legally restrict what you are allowed to do when you visit them without having to build security measures to force compliance. If web retailers don't mind robots harvesting their inventory and prices, great. If, however, they want to place restrictions on who can access their site and how, that's entirely their prerogative.
Think about it. Leaving the door to your home unlocked would make it easier for people to steal your stuff, but it still wouldn't make it legal for them to do so unless you put up a sign saying something like, "Free for the taking."
Web scrapers are legal to develop and they're legal to use on sites with acceptable use policies that allow them. However, your customers should be prepared for the possibility that some or all of their competitors could make them stop using it at some point. And, in the interests of maintaining your own professional ethics, you should probably call their attention to the issues surrounding the job they're asking you to do.
Discussing legal issues is not just a business for lawyers. Non-lawyers can give each other useful pointers. And non-lawyers actually have an obligation to determine whether their legislators are doing a good job with the laws they enact and judges they appoint, and a healthy discussion is a good start.
Sorry, I call bullshit.
Over the past 6 or 7 years I've used a Palm (a Handspring Visor, to be more precise) hundreds of times, in every Best Buy (and Circuit City, MicroCenter, etc.) in the Boston area to record prices. I've never had anyone even look at me funny.
Maybe it's related to how guilty (or difficult to remove) you look, but I really doubt that happened to anyone ever (note the once-removed story -- it's always a 'friend of mine' in these types of stories.)
In any case, what kind of wuss would leave without making a fuss and forcing them to call the police over something so ridiculous? I could be using my palm to look up my friend's number to call and ask which video card to buy. Fsck them if they don't like it.
Or, maybe this particular Best Buy was located in an airplane and the event happened during takeoff or landing. Or your friend lied to you. One or the other.
everything in moderation
"Instead, the Court concluded that the mere fact that Register.com had decided to sue Verio meant that Verio's use of the search robot was without authorization: 'because Register.com objects to Verio's use of search robots,' the Court held, 'they represent an unauthorized access to the [Register.com] WHOIS database.'"
[Ironically, the pdf for the paper apparently uses some feature of Acrobat which disallows copying text from it. I guess they don't want robots scraping text out of it or something. First time in quite a while I've had to type a quote from the net by hand!]
There are 0x40000000 types of people: those who understand 32-bit IEEE 754 floating point, and those who don't.
Slashdot is not a replacement for a lawyer.
Slashdot is useful to get a sense of what the legal landscape is like. Some comments are to the effect: "I am not a lawyer, but my lawyer told me this." Or "I am not a lawyer, but here is the statute [cornell.edu], and here is how a court has interpreted it [eff.org]." When you do see an attorney after reading the comments, you don't have to wait for the attorney to explain the basics. This saves time, and time is money, especially at the typical copyright and trade secret specialist's rate.
That said, you're right about one thing: anything you read on Slashdot is not legal advice.
Will I retire or break 10K?
Since you mention that you may be building a screen scraper that gathers airline fares, you may be interested to know that American Airlines has already sued (and won a preliminary injunction against) a software company that built a tool that does much the same thing. The case is American Airlines v. FareChase, and was discussed on LawMeme:
The injunction order is posted on EFF's site, and the briefs are posted on Bag & Baggage.
If you fear that it might be the wrong thing to do, it probably is.
You were 80% angel, 10% demon. The rest was hard to explain. - Over The Rhine
"Math in a song is good."-Linford
Screen scraping is data gathering. Data mining is looking for trends or patterns in data you already have; to continue with the mining analogy, it's getting the nuggets out of your data. This presentation, titled "An Introduction to Data Mining Technology", defines data mining as "The automated extraction of hidden predictive information from (large) databases".
The bottom line is this: when you put this work experience down on your resume don't say you were data mining. Companies looking for that experience will ask you hard questions you don't know the answers to and you will be embarrassed.
There may be precedent for this. eBay was able to convince a judge to bar spidering of their site.
There is another legal concept called "Unfair Competition" which links copyright and facts.
Normally, facts cannot be copyrighted. However, this law seems to kick in when one company compiles and publishes time-sensitive information that it has taken from a direct competitor in a way which "free-rides" on the efforts of the competitor. It is usually applied to news organizations, when one newspaper sends a reporter to Iraq and a second newspaper (perhaps an evening edition) uses the "facts" in the first newspaper's article to publish the very same news.
I could see the instantaneous publishing of all competitors' prices as a violation of this legal theory.
The corporation will continue to hump the internet like a dog in heat until it becomes as regulated, watered down, and crapped out like the NBC news.
Two things that come to mind (depends on the site and what they currently support technically browser- and serverside):
-- use front-end technologies that prohibit or at least inhibit the workings of server-side scraperbots. examples: Flash, JavaScript.
-- use sessions to control how often a given client can access prices, e.g. a '10 and you're out' rule: most 'ordinary' users have no need to view a certain page of prices more than X times in a browser session. here, cookies provide even more protection, since some scrapers won't be set up to handle them.
both systems have their drawbacks (not every visitor can run Flash) and weaknesses (against sessions i can simply make multiple logins), but i've incorporated similar systems on sites for clients with prices that need a sensible level of protection (e.g. one shouldn't be able to grab the whole damn price list with a single GET). guarding against SQL injection is also something that is often forgotten.
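The '10 and you're out' rule amounts to a per-session counter. Below is a minimal in-memory sketch. The class name, the limit, and keying on a session ID are illustrative assumptions. A real site would keep the counts in its session store and reset them when a session expires:

```python
from collections import defaultdict


class PriceViewLimiter:
    """Hypothetical per-session cap on views of a price page."""

    def __init__(self, max_views: int = 10) -> None:
        self.max_views = max_views
        self._views = defaultdict(int)  # session_id -> views so far

    def allow(self, session_id: str) -> bool:
        """Record a view and return True, or return False once the cap is hit."""
        if self._views[session_id] >= self.max_views:
            return False
        self._views[session_id] += 1
        return True


limiter = PriceViewLimiter(max_views=10)
results = [limiter.allow("session-abc") for _ in range(11)]
print(results.count(True), results[-1])  # 10 False
print(limiter.allow("session-xyz"))      # True: a fresh session starts over
```

As Nalfy notes, this only raises the bar. A scraper that opens a fresh session (or login) every ten requests walks right past it.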
Cheers,
Nalfy
-- Despair is an operating system that ANY human being can run, sort of a psychological JAVA --
I'm sure there are a few products that help disabled people "surf" websites by dissecting the web page (through what are essentially screen-scraping techniques) and performing one or more of the following:
1. Adjusting text size.
2. Dictation of content.
3. Numbering of links.
4. Numerous other alternate presentations of the same data (changing colors for the color blind, for example).
An outright ban on automated scraping techniques would eliminate these uses. (While I'm at it: what is a web browser but a form of screen scraper?)
If the basic technique is allowed, all that can be debated is the use of such data and I think that is a much more dubious area. Facts are public domain.
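Item 3 above, numbering links, shows how little separates an accessibility tool from a scraper. Both parse the page and re-present what they find. Below is a rough standard-library sketch. The sample markup is made up, and real assistive software does far more than this:

```python
from html.parser import HTMLParser


class LinkNumberer(HTMLParser):
    """Collects (link text, href) pairs so they can be read out by number."""

    def __init__(self) -> None:
        super().__init__()
        self.links = []     # list of (text, href) tuples, in page order
        self._href = None   # href of the <a> we are inside, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href") or ""
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = " ".join("".join(self._text).split())  # collapse whitespace
            self.links.append((text, self._href))
            self._href = None


def numbered_links(html: str) -> list[str]:
    parser = LinkNumberer()
    parser.feed(html)
    parser.close()
    return [f"[{n}] {text} -> {href}" for n, (text, href) in enumerate(parser.links, 1)]


page = ('<p>Compare <a href="/fares">today\'s fares</a> or '
        '<a href="/hotels"><b>hotel</b> deals</a>.</p>')
for line in numbered_links(page):
    print(line)
# [1] today's fares -> /fares
# [2] hotel deals -> /hotels
```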
Maybe they can use the "my bot is blind defense".
Here is a related incident:
http://news.com.com/2110-1017-944258.html
Bargain Network spidered real estate prices on homestore.com/realtor.com and posted them on the bargain.com website. Homestore sued and the case was settled out of court. I wish it had not been settled out of court, because a ruling would have set a precedent.
In my opinion you are asking for problems. Taking a case like this to court and winning would be difficult. At the very least it would be a serious legal expense.
The last time I checked the rules for Froogle, you had to be the actual merchant that ships the product in order to show up in their index. If you are spidering a merchant, then you are an affiliate: the products do not originate with you, so you would be excluded from Froogle. Froogle does not allow you to sort products by price, so obviously what you plan on doing is different. Froogle also gives merchants the option to be excluded from their index.
My advice is this - get a lawyer because one will surely be contacting you. Familiarize yourself with these phrases: false advertising, breach of contract, and unfair competition.
AARRRGGH. I hate ./ sometimes.
From the court summary of the decision:
The court affirmed a preliminary injunction enjoining defendant Zefer Corporation ("Zefer") from utilizing a "scrapper" tool it designed to obtain pricing information from plaintiff's website on the ground that Zefer was doing so to assist defendant Explorica, Inc. ("Explorica"), which was itself enjoined from such activity by virtue of its improper use of confidential information obtained from plaintiff to aid it in gathering this information.
This means that Zefer was prohibited from mining the data because their client (Explorica) was prohibited, because Explorica and Zefer had gained access to the data by exploiting confidential information. Which is another issue entirely from data mining...