Slashdot Mirror


Is Data Mining for Product Pricing, Illegal?

wessman asks: "I started to read Orin S. Kerr's 80-page paper looking for how his proposal would pertain to: ripping music/movies, P2P, corporate espionage, and lastly, the use of web scraper robots. Little did I know just how relevant his paper would be with regard to that last item! Kerr makes note of EF Cultural Travel v. Explorica, in which Explorica is caught hiring a consultant to program a scraping robot to gather pricing information from a competitor, EF Cultural Travel. Well, I do consulting on the side from home and am currently working on a project whereby I gather pricing information from all the major travel conglomerates (Orbitz, Expedia, Lodging.com, WorldRes, Sabre, etc.) so that the travel-booking business that hired me can meet or beat all their prices. Granted, the circumstances of the Explorica case are different and the case was an example of an extreme ruling, but my questions to the Slashdot community are: Do I notify the company that hired me of the Explorica case? Why is using a scraper robot so different from, say, walking into Best Buy with a handheld and recording product pricing manually? Should I continue with this project and the similar projects I do in this area of programming?" Now, add in the text in the "deliverables" section of this press release and it seems we may have some contradictory information. Who is right, and under what circumstances is price harvesting off of the internet not allowed?

20 of 350 comments

  1. TITLE HAS POOR GRAMMAR by Anonymous Coward · · Score: 3, Informative
    EDITORS: A better way to phrase that headline is as follows:
    Is It Illegal to Data-Mine for Product Pricing?


    Thank you.
  2. Price Scraping? by Anonymous Coward · · Score: 3, Informative

    Actually, price scraping was done in a very low-tech way a reasonably long time ago by a pretty well-known businessman: Mr. Sam Wal-Mart. Early in his career, he would dumpster-dive his competitors to find out the prices his competitors paid for their goods, the contracts they had with their suppliers, etc. This provided him with "insider information" so he'd know how to prepare his pricing in a more strategic fashion and obviously out-compete them where it counts: financially.

    1. Re:Price Scraping? by Anonymous Coward · · Score: 5, Informative

      Wrong. Read Sam Walton: Made in America: My Story. Sam Walton says that was a story put out by his competitors to disparage his name and he never did anything of the sort.

  3. Re:I swear by Anonymous Coward · · Score: 1, Informative

    uhh, the vendors add themselves to pricewatch bub.

  4. The US Code... by Pettifogger · · Score: 3, Informative
    It seems that the US Code and the First Circuit make this one pretty clear. If you have to agree to some sort of "terms of use" to get onto a website, you are bound by what those "terms of use" say. By clicking through, you have agreed to a contract and you have to abide by it. If your competitor's site requires you to agree to such terms, and those terms prohibit data mining, then you can't data mine there. Simple.

    As for the ethical part of telling your employer about this... well, first, remember, this is just a decision of the First Circuit. If you live in a different Circuit, then it may or may not be binding on you. I know this jurisdictional stuff can be a little confusing, but a decision by a Circuit only affects the jurisdictions within it. Only the US Supreme Court (generally; I know there are federal tax, patent, admiralty, etc. courts, too) can make decisions that are binding on the entire country. If you're not sure, check with your corporate counsel. And it might be a good idea to forward the case to him anyway; you might be able to pick up some "bonus points" from your boss for being an especially conscientious employee.

    --

    IAAL

  5. Re:I swear by Sparr0 · · Score: 5, Informative

    Pricewatch doesn't mine. Companies PAY for the privilege of listing their prices on Pricewatch.

  6. Why waste time in the legal system? by x00101010x · · Score: 4, Informative

    Filing lawsuits to protect your price information is just dumb, not to mention a waste (if not an abuse) of the legal system.

    Personal feelings about freedom of information aside, and just from a coder's POV, here's my solution.

    If they really want to avoid getting scraped, they should just get their existing, underpaid web developers to create a backend setup that renders the prices as GIFs that give OCR hell (like the ones used to prevent automated registration of, say, Yahoo! email accounts).

    Coders are cheaper than lawyers (at least those needed to write such code as this).

    Sure, the competition could pay more money to get somebody to develop better OCR to read each and every dynamically generated GIF, but most people require proofreading of OCR data, which leads to even more cost.

    Something I learned from my uncle, who works with the DoD, is this: Any lock can be picked; any encryption can be broken. It's just a matter of whether it's worth the time and money to get what's inside.

    In short, with a small one-time cost, the company that doesn't want its prices scraped can just make it so hard to scrape their prices that it's not worth it. Scraping graphically displayed price tags would also carry an ongoing cost in software and proofreaders that would dip into profit margins, which management at the company that wants the scraping won't like.

    It's not perfect, but it's better (and more bankable) than going whining to the legal system. (Especially since coders are generally cheaper than lawyers).

    --
    DONT PANIC
  7. Read the case... by anubis · · Score: 5, Informative

    Read the case... EF Cultural Travel BV v. Explorica hinges on the fact that the defendant company hired an ex-programmer from the plaintiff company. The programmer had special knowledge of codes used in the pricing (which he had signed a confidentiality agreement not to disclose). When he made the scraper program, he violated the confidentiality agreement.

    It was the violation of the confidentiality agreement that the court held was illegal.

    As for whether you should tell your employer, it depends on your employment agreement! :) Depending on how the contract is written you could be jointly liable.

    While this is a 1st Circuit case, it has been followed by the 5th Circuit (Ingenix, Inc. v. Lagalante) and cited in cases in the 7th and 9th Circuits.

    Hope this helps.

    --me

  8. I'm reminded by Nexzus · · Score: 2, Informative

    of a man named Ronald Kahlow and his troubles with Best Buy back in 1997.

    --
    Karma: Can only be portioned out by the Cosmos.
  9. Re:well....duh by Jeremy+Erwin · · Score: 4, Informative
    Amazon says you can't.


    Amazon.com grants you a limited license to access and make personal use of this site and not to download (other than page caching) or modify it, or any portion of it, except with express written consent of Amazon.com. This license does not include any resale or commercial use of this site or its contents; any collection and use of any product listings, descriptions, or prices; any derivative use of this site or its contents; any downloading or copying of account information for the benefit of another merchant; or any use of data mining, robots, or similar data gathering and extraction tools. This site or any portion of this site may not be reproduced, duplicated, copied, sold, resold, visited, or otherwise exploited for any commercial purpose without express written consent of Amazon.com.


    I am guessing that the prohibition on "visit[ing] for any commercial purpose" precludes me from actually purchasing their wares.
  10. Re:hmm, anybody rfta? by Anonymous Coward · · Score: 2, Informative

    I work for one of the major online travel sites.

    If you screen scrape us, and we notice it (and we very often do), your IP address will be blocked by our firewalls.

    What most people don't realize is that very often any search performed on these sites costs the company money. In many cases, if you search for, say, a hotel on Expedia or Orbitz or Travelocity, those companies are paying one of the major Hotel reservation systems for their results from that search.

    So, if someone is screenscraping our site, each search they perform to grab prices for a hotel for a day requires us to send some money to Pegasus or Travelweb or one of the other biggies. In hotels, for example, Pegasus is the big CRS. In order to get enough bookable hotels to make their site useful, Expedia and Orbitz and Travelocity all need to buy search results from Pegasus.

    So it's not just a matter of taking their bandwidth and CPU time.

    We do allow some companies to screen scrape our site for some various reasons, but they all pay us for the privilege. We don't tend to take legal action against unallowed screen scrapers, but we will do what we can to make their life harder.
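    The post doesn't say how scraping gets noticed. One common signal is plain request volume per client address. A minimal sketch, assuming a web server access log whose first field is the client IP (as in Common Log Format); the function name `heavy_hitters` and the threshold value are hypothetical, and this is not a description of any particular travel site's system:

```bash
#!/usr/bin/env bash
# Sketch: list client IPs whose request count in an access log exceeds
# a threshold, busiest first. Assumes the client IP is the first field
# of each line (as in Common Log Format). heavy_hitters and the
# threshold value are hypothetical, for illustration only.
heavy_hitters() {
    local log="$1" threshold="$2"
    awk -v max="$threshold" '
        { hits[$1]++ }    # count requests per client IP
        END {
            for (ip in hits)
                if (hits[ip] > max) print ip, hits[ip]
        }
    ' "$log" | sort -k2,2nr
}
```

    For example, `heavy_hitters access.log 1000` would list every address with more than 1,000 requests in that log. Whether to block any of them is a separate judgment call, since a shared proxy can look like one heavy client.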

  11. Re:Easy fix. by canajin56 · · Score: 2, Informative

    Scott Adams refers to this as a "confusopoly." Telephone companies and airlines use them. Since you can never tell who has the best price, they can all remain in business.

    --
    ASCII stupid question, get a stupid ANSI
  12. AA.com v FareChase by FredEFF · · Score: 2, Informative

    Since you mention that you may be building a screen scraper that gathers airline fares, you may be interested to know that American Airlines has already sued (and won a preliminary injunction against) a software company that built a tool that does much the same thing. The case is American Airlines v. FareChase, and was discussed on LawMeme:

    American ... sued FareChase in a Texas court (American is based in Dallas, so that's its home turf) and got a preliminary injunction against FareChase's screen-scraping practices. The court decided that the screen-scraping constituted an "interfer[ence] with American's personal property," also known these days as a trespass to chattels. The court also noted that FareChase's actions might be a criminal violation of Texas Law, which states, "A person commits an offense if the person knowingly accesses a computer, computer network, or computer system without the effective consent of the owner." Tex. Penal Code 33.02(a).
    The injunction order is posted on EFF's site, and the briefs are posted on Bag & Baggage.
  13. Screen scraping is not data mining by Call+Me+Black+Cloud · · Score: 3, Informative

    Screen scraping is data gathering. Data mining is looking for trends or patterns in data you already have: getting the nuggets out of your data, to continue the mining analogy. From this presentation titled "An Introduction to Data Mining Technology", data mining is defined as "The automated extraction of hidden predictive information from (large) databases".

    The bottom line is this: when you put this work experience down on your resume don't say you were data mining. Companies looking for that experience will ask you hard questions you don't know the answers to and you will be embarrassed.

  14. Not the pricing -- the timing by RalphSlate · · Score: 3, Informative

    There may be precedent for this. eBay was able to convince a judge to bar spidering of their site.

    There is another legal concept called "Unfair Competition" which links copyright and facts.

    Normally, facts cannot be copyrighted. However, this law seems to kick in when one company compiles and publishes time-sensitive information that it has taken from a direct competitor in a way which "free-rides" on the efforts of the competitor. It is usually applied to news organizations, when one newspaper sends a reporter to Iraq and a second newspaper (perhaps an evening edition) uses the "facts" in the first newspaper's article to publish the very same news.

    I could see the instantaneous publishing of all competitors' prices as a violation of this legal theory.

    1. Re:Not the pricing -- the timing by jkabbe · · Score: 2, Informative

      Keep in mind that eBay lost their first case against Bidder's Edge because it was solely a copyright case. Then they brought a conditions-of-use case that they won.

  15. Re:you are stating the obvious by Jerf · · Score: 2, Informative

    You don't say. Do you really think everybody else is stupid?

    Stupid? No. But the number of people who seem to think they are lawyers is very large, and not just on Slashdot either. I can't count the number of times in my real life I've discussed intellectual property issues and not only has the other person been very, very wrong, but I was not even able to get them to listen to me.

    I'm not a lawyer, but I've taken a close interest in that sort of thing and I know the basics very well.

    As a Slashdot example, basically, if someone is insisting that they have a "fair use" right to something, unless they are justifying it with reference to the four criteria used to determine if something is fair use, they're wrong.

    People seem to need periodic reminding that they aren't lawyers, and that other non-lawyers aren't lawyers either.

    For computer examples, how many times have you heard someone around you give an incredibly wrong reason for a crash... and stick by it, even after you fix it, because of course they're right?

    Discussion of issues is one thing. Talking about something that could make or break a career, that's a time for a real lawyer, not hundreds of people who think they are lawyers.

    Please read Unskilled and Unaware of It: How Difficulties in Recognizing One's Own Incompetence Lead to Inflated Self-Assessments. I don't know if that's risen to "classic paper" status but as far as I'm concerned it has.

  16. Protecting against scrapers by nalfeshnee · · Score: 2, Informative

    Two things come to mind (depending on the site and what it currently supports technically, browser-side and server-side):

    -- Use front-end technologies that prohibit, or at least inhibit, the workings of server-side scraperbots. Examples: Flash, JavaScript.

    -- Use sessions to control how often a given client can access prices, e.g. a "10 and you're out" rule: most "ordinary" users have no need to view a given page of prices more than X times in a browser session. Here, cookies provide even more protection, since some scrapers won't be set up to handle them.

    Both systems may have their drawbacks (no Flash allowed) and weaknesses (against sessions I can simply make multiple logins), but I've incorporated similar systems on sites for clients with prices that need a sensible level of protection (e.g. one shouldn't be able to grab the whole damn price list with a single GET). Guarding against SQL injection is also something that is often forgotten.

    Cheers,

    Nalfy

    --

    -- Despair is an operating system that ANY human being can run, sort of a psychological JAVA --
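    The "10 and you're out" rule above boils down to a per-session counter. A minimal sketch in bash (4.0 or later, for associative arrays), assuming the caller already has a session ID from a cookie; `allow_price_view`, `price_views` and `MAX_PRICE_VIEWS` are hypothetical names, and a real site would keep the counters in its session store rather than in a shell variable:

```bash
#!/usr/bin/env bash
# Sketch of the "10 and you're out" rule: count price-page views per
# session and refuse once a session passes the limit. Needs bash 4+
# for associative arrays. All names here are hypothetical; a real site
# would keep these counters in its session store, not a shell variable.
declare -A price_views
MAX_PRICE_VIEWS=10

# Returns 0 (serve the page) while the session is within the limit,
# non-zero once it has gone over.
allow_price_view() {
    local session_id="$1"
    local count=$(( ${price_views[$session_id]:-0} + 1 ))
    price_views[$session_id]=$count
    (( count <= MAX_PRICE_VIEWS ))
}
```

    Calling `allow_price_view "$SESSION_ID"` before serving the price page lets the first ten views through and refuses the rest. As the post notes, a scraper can sidestep this by opening new sessions.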

  17. Re:I swear by CTho9305 · · Score: 3, Informative

    WARNING: You can see I write crappy perl / shell script. I make no guarantees about this code.

    parse.sh:
    #!/bin/bash
    #Copyright CTho9305 2003. You are given permission to redistribute this file provided this copyright notice is left intact. You may modify it as you want. Please share any modifications (you are not required to).
    # Fetch the price page once and reuse it for every processor family.
    lynx -dump http://www.pricewatch.com/menus/m3.htm > m3.txt
    # Cleanup steps shared by every pipeline below.
    clean() {
        grep -E 'up|dn| - ' m3.txt | cut -b5- | perl -pe 's/\s+/ /' | cut -f1,3- -d ' ' | perl -pe 's/\[.*\]//;s/\s+/ /'
    }
    #barton
    clean | grep -i xp | grep 333 | cut -f1,4 -d ' ' > XP333.dat
    #XP
    clean | grep -i xp | grep -v 333 | cut -f1,4 -d ' ' > XP.dat
    #Apparently I have too many junk characters
    #MP
    clean | grep -i mp | perl -pe 's/\.//;s/GHz/00/' | cut -f1,4 -d ' ' > MP.dat
    # Celeron
    clean | grep -i celeron | cut -f 1,3 -d ' ' | perl -pe 's/ 1GHz/ 1.0GHz/;s/\.//;s/GHz/00/' > celeron.dat
    #P4
    clean | grep -i 'pentium 4' | grep -E -i "sock|2\.|3\." | grep -v -i 400MHz | perl -pe 's/ 533MHz//;s/GHz/00/;s/ Sock 478//;s/\.//' | cut -f1,4 -d ' ' | cut -b -8 > pentium4.dat
    gnuplot gnuplot.script > ~/www/out.png

    Anyway, I wrote this because I was bored and wanted to see what a good price point was for current Athlons. If you examine the graphs carefully you might note that the XPs are not properly differentiated. Some are 333s and marked as such, others aren't marked properly, etc. With the new 400s, it gets worse. For the P4s, I got a little luckier because the speed ranges don't overlap as much. I think I'm going to stop differentiating between the various FSBs of Athlon XPs, because the prices are close enough anyway.

    Anyway, it has served its purpose by helping me find a point where the processors are reasonably fast, and the bang for the buck is decent.

    gnuplot.script

    #Copyright CTho9305 2003. You are given permission to redistribute this file provided this copyright notice is left intact. You may modify it as you want. Please share any modifications (you are not required to).
    set terminal png color
    set xlabel "Speed (MHz or rating)"
    set ylabel "Cost ($USD)"
    set title "Speed vs. Cost"
    set grid
    set time
    set linestyle 1 lw 3
    plot "XP.dat" using 2:1 title "XP" with linespoints, \
    "XP333.dat" using 2:1 title "XP333" with linespoints, \
    "celeron.dat" using 2:1 title "celeron" with linespoints, \
    "MP.dat" using 2:1 title "MP" with linespoints, \
    "pentium4.dat" using 2:1 title "P4" with linespoints

    Anyone know how to change the text font, or the thickness of the lines?

    Sample output
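    The bang-for-the-buck search described above can also be done numerically from the same .dat files. A sketch, assuming each line holds "price speed" in that order (which is what the `cut -f1,4` steps and the gnuplot `using 2:1` suggest) and that the price is a plain number; `best_value` is a hypothetical helper, not part of the original scripts:

```bash
#!/usr/bin/env bash
# Sketch: rank entries in one of the .dat files by speed per dollar.
# Assumes each line is "price speed" with a numeric price; awk's
# numeric conversion reads a rating such as "2500+" as 2500.
# best_value is a hypothetical helper, not part of the scripts above.
best_value() {
    awk '$1 > 0 { printf "%.2f %s %s\n", $2 / $1, $1, $2 }' "$1" |
        sort -k1,1nr    # most speed per dollar first
}
```

    For example, `best_value XP.dat | head -n 5` would print the five entries with the most MHz (or rating points) per dollar.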

  18. here is a similar case by mcguyver · · Score: 4, Informative

    Here is a related incident:

    http://news.com.com/2110-1017-944258.html

    Bargain Network spidered real estate prices on homestore.com/realtor.com and posted them on the bargain.com website. Homestore sued and the case was settled out of court. I wish it had not been settled out of court, because a ruling would have set a precedent.

    In my opinion you are asking for problems. Taking a case like this to court and winning would be difficult. At the very least it would be a serious legal expense.

    The last time I checked the rules for Froogle, you had to be the actual merchant that ships the product in order to show up in their index. If you are spidering a merchant then you are an affiliate; the products do not originate from you, so you would be excluded from Froogle. Froogle does not allow you to sort products by price, so obviously what you plan on doing is different. Froogle also gives merchants the option to be excluded from their index.

    My advice is this - get a lawyer because one will surely be contacting you. Familiarize yourself with these phrases: false advertising, breach of contract, and unfair competition.