Is Data Mining for Product Pricing, Illegal?
wessman asks: "I started to read Orin S. Kerr's 80-page paper looking for how his proposal would pertain to: ripping music/movies, P2P, corporate espionage, and lastly, the use of web scraper robots. Little did I know just how relevant his paper would be in regards to that last item! Kerr makes note of EF Cultural Travel v. Explorica in which Explorica is caught hiring a consultant to program a scraping robot to gather pricing information from a competitor, EF Cultural Travel. Well, I do consulting on the side from home and am currently working a project whereby I gather pricing information from all the major travel conglomerates (Orbitz, Expedia, Lodging.com, WorldRes, Sabre, etc.) so that the travel booking business that hired me can meet or beat all their prices. Granted, the circumstances of the Explorica case are different and the case was an example of an extreme ruling, but my questions to the Slashdot community are: Do I notify the company that hired me of the Explorica case? Why is using a scraper robot so different from, say, walking into Best Buy with a handheld and recording product pricing manually? Should I continue with this project and the similar projects I do in this area of programming?" Now, add in the text in the "deliverables" section of this press release and it seems we may have some contradictory information. Who is right, and under what circumstances is price harvesting off of the internet not allowed?
In what sane land would PRICES be protected under law? You can't really keep them secret, so "trade secret" is right out. It's not an identifying mark (unless you're a dollar store), so much for trademark. There's nothing useful that hasn't been done before, fuck patenting them. Copyright? It's a simple derivation of what the supplier charges you.
There is nothing creative about pricing stuff. Good lord.
-- Bill "Houdini" Weiss
Anybody remember the Popeye cartoon where Popeye opens a car wash, and then Bluto opens a car wash right across the street?
When Popeye posts his price, Bluto beats it by five cents. Then Popeye beats Bluto's by five cents.
It goes back and forth until Popeye is washing cars for free.
I was tempted to call this an infinite loop, but I doubt a retailer would pay you to take their products.
Of course it is. Let's dissect this sentence:
Not having the comma would completely distort the meaning of the sentence.
Thank you.
My grandmother calls this "shopping around." The only difference is that someone else is doing all the work.
Smeghead every day of the week.
I agree. It makes the delivery somewhat, stilted.
Actually, price scraping was done in a very low-tech way a reasonably long time ago by a pretty well-known businessman: Mr. Sam Wal-Mart. Early in his career, he would dumpster-dive his competitors to find out the prices his competitors paid for their goods, the contracts they had with their suppliers, etc. This provided him with "insider information" so he'd know how to prepare his pricing in a more strategic fashion and obviously out-compete them where it counts: financially.
I think any large business where the pricing structure isn't directly related to costs is probably deeply afraid of agents that aggregate their data with competitors. You end up with a more ideal market, a more frictionless market, if you will, and they'll be forced to compete on narrower and narrower margins of profit. Of course they'll want to throw up barriers to that.
But I'll bet this issue comes down to Terms of Service and what a company can reasonably expect to be able to legally require/forbid about the use of data provided via an automated means...
Tweet, tweet.
As long as you get paid, let them worry about the lawsuit. They're the ones who are going to actually use it. Keep your mouth shut.
If powerful people get screwed, it's illegal.
If it forces large corporations to work harder to earn a profit, it's illegal.
If it gives the little guy a leg up or levels the playing field in any way, it's illegal.
If it's illegal and you're big and powerful, don't worry about it, you can probably get away with it with little damage to your business or career and keep almost all of your cash minus legal fees.
Work is dumb and so is Jobless Jimmy. http://www.joblessjimmy.com
Who is this "Illegal" person and why are we asking him questions about Star Trek characters?
I'm not "Illegal," but I'll answer. Seeing as how he was killed off in the last movie, I think it's safe to say that no, Data is not mining for product pricing.
(In other words, you illiterate clods need to be more careful with your commas.)
The Albertson's I walked into today said No video taping or cameras on "premises" without prior permission.
This is obviously their prerogative.
Similarly, wouldn't it be up to the site in question to at least post up-front rules for conduct on their site? Such 'adult-material' warnings seem a comparable attribute of a site.
And in that case, would they be enforceable or even provable?
Look, if I can visit a web site and the business (let's say Amazon) publicly posts their prices for anybody to see, then you sure as hell can use them! If suddenly using bots to do work is illegal then I'd wager that every shell script that I write is an affront to US laws. Rotating log files and all sorts of other "make my job easier so that I can play Quake" scripts are perfectly legal, so how the hell can it be questionable just to go to a site and record prices???
Jebus, please help the United States Gub'ment!
First, the obvious: IF you are concerned about YOUR obligations towards your clients, talk to a lawyer competent in the areas of law this involves.
Second.. a cursory reading of the case you linked says that case was not about scraping in general, but about a consultant doing scraping on behalf of a client who was not permitted lawful access to the site/service being scraped by way of a terms of use.
In other words: one party was indirectly accessing the site by way of a second party... it was determined that in the particular instance of this case, their action was not permitted.
I think they'll get a clue RSN.
Two specific cases in point.
1) At many of the deal sites (e.g. slickdeals.net), once in a while an offer appears where, after getting back your rebate, you have more money than you spent on the product.
2) Grocery coupons - in some cases, a store will run one of those "triple coupon Thursdays" promotions, and if you have the right coupon, the money-off total will exceed the price of the product. Depending on the store, money is returned, or a credit is.
How does one receive authorization to access a web server? Hmm, maybe with a simple HTTP GET? The basic fact here is that of judicial cluelessness. If I put information on a public web server, pretend to "protect" it with a disclaimer (of everything) at the bottom of the page, and then get pissed off because somebody browsed that information, I'm an idiot. In addition, I am legless in court. Web servers make information available to the world. If I had wanted to make information available to certain parties that I trust not to compete with me, I should have set up a secure server with some provision for authentication and authorization.
It really is that simple
later,
Jess
I am programmed for etiquette, not destruction!
The problem is that when you sell a commodity like a TV or a VCR and the only difference between them is price, you can't exactly maintain a high profit margin. What they need to do is obfuscate the prices so that it's next to impossible to compare products. That's how cell phones work.
This phone has 500 anytime minutes for 3 cents a minute, from your calling area roaming is 10 cents a minute, unlimited text messaging, 800 night and weekend minutes are free for the first 6 months, and it has rollover.
This other phone has 1000 minutes for 2 cents a minute but without rollover, and text messaging costs 1 cent per message; nights and weekends are free but don't start till 9pm.
See, it's not exactly easy to just look at the plans and see which one is the better plan.
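To put rough numbers on that, here is a minimal Python sketch of the two plans. The post leaves most of the billing rules unstated, so everything structural here is an assumption: the monthly charge is taken to be the bundled minutes times the quoted per-minute rate, minutes past the bundle are billed at that same rate, and roaming, nights/weekends and rollover are ignored entirely. The usage figures are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    name: str
    included_minutes: int   # minutes in the monthly bundle
    cents_per_minute: int   # quoted per-minute rate
    cents_per_text: int     # 0 means unlimited free texting


def monthly_cost_cents(plan: Plan, minutes_used: int, texts_sent: int) -> int:
    """Assumed billing model: you pay for the whole bundle, and any
    minutes past the bundle cost the same per-minute rate."""
    billed_minutes = max(minutes_used, plan.included_minutes)
    return billed_minutes * plan.cents_per_minute + texts_sent * plan.cents_per_text


# The two plans from the post, reduced to the fields modelled above.
PLAN_A = Plan("500 min, free texts", 500, 3, 0)
PLAN_B = Plan("1000 min, 1c texts", 1000, 2, 1)


def cheaper_plan(minutes_used: int, texts_sent: int) -> Plan:
    """Return the plan with the lower monthly cost for this usage."""
    return min((PLAN_A, PLAN_B),
               key=lambda p: monthly_cost_cents(p, minutes_used, texts_sent))


if __name__ == "__main__":
    # Hypothetical usage profiles: a light talker and a heavy talker.
    for minutes, texts in [(400, 100), (1000, 100)]:
        best = cheaper_plan(minutes, texts)
        cost = monthly_cost_cents(best, minutes, texts)
        print(f"{minutes} min, {texts} texts -> {best.name} (${cost / 100:.2f})")
```

Even under this simplified model the better plan flips with usage, and every ignored term (roaming, rollover, the 9pm cutoff) is one more variable a shopper would have to price in.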
He who knows not and knows he knows not is a wise man. He who knows not and knows not he knows not is a fool.
why, you little! gakk, gakk, gakk (bart choking)
I had a friend pricing out some equipment in Best Buy as well as all the other chains. When he pulled out his PDA and started writing down prices for the things he was going to buy, store security rushed him and escorted him to the door.
Once their prices hit the Internet, they're in the public domain. It would be like posting your prices in the window, and complaining that a car driving past could photograph them.
We all know that bots crawl the web - Google, Altavista, spam-bots... they're all common knowledge. You put information on a website, and it's going to be viewed by an automated process. Surely with that knowledge, it's ridiculous to think you can ban people for using the information you've posted publicly in whatever way they desire.
Perhaps these companies (airlines, computer stores, whatever) need to start offering their services at the price they really mean to sell it for, rather than this stupid haggling they expect from us. Or maybe it's time they focused on quality of service, value-add, etc rather than price wars (which never help anybody in the long term).
Bottom line? If you don't want your competitors seeing your prices, don't make them available to them - this means no junkmail, no spam, no website, no prices in the store window, no prices inside the store, nothing.
If information is placed on a web site with some method to secure that information against being used for other than its intended purpose, then it should not be data mined... But if that information is in the open with no steps taken to secure it, then it should be ok to data mine.
Who needs WiFi when we can have Packet Over Sheep! http://datacomm.org/PoS-InternetDraft.txt
All variations on a theme. They claim pricing info is copyrighted info.
http://twincities.bizjournals.com/twincities/stories/2002/12/16/story5.html
Also, Pricewatch, Pricegrabber and Froogle scour the web for prices and create search engines out of them so consumers can find the best price.
I'm not saying just because everyone else is doing it means you can too (and you might have a slightly different objective causing these examples to be weightless) but it's being done all over the place.
Hope that helps.
As for the ethical part of telling your employer about this... well, first, remember, this is just a decision of the First Circuit. If you live in a different Circuit, then it may or may not be binding on you. I know this jurisdictional stuff can be a little confusing, but a decision by a Circuit only affects the jurisdictions within it. Only the US Supreme Court (generally, I know there are federal tax, patent, admiralty, etc. courts, too) can make decisions that are binding on the entire country. If you're not sure, check with your corporate counsel. And it might be a good idea to forward the case to him anyway, you might be able to pick up some "bonus points" from your boss for being an especially conscientious employee.
IAAL
A client recently hired me to scrape the real estate listings off of a multiple listings service, and substitute his name and photo for the name and photo of the actual agent who had originally listed the properties. All of this gets shoehorned into his site's look and feel.
As hesitant as I was to initially do this (I did it for a friend who had promised this without understanding what it involved, and was under the gun), apparently several other agents are now dying to have the same treatment done to their websites. I wrote off this entire project the first time, since it is likely to choke the first time the original content site changes their layout or page structure. But now, others want the same thing, and I'm stuck on whether to provide it with the same caveats, or to decline these crappy wrapper jobs.
later,
Jess
I am programmed for etiquette, not destruction!
Just the act of clicking on a link sets in motion a series of automated software tasks that will deliver me a price. Millions of shoppers do this every day. Are they in violation as well?
seems like it's the using-confidential-information part that got the scraper capped.
I don't see why accessing *public* information would be problematic.
the only thing that may be of trouble is the website EULA, but then the EULA would be saying the same thing as "don't visit my store unless you intend to buy," which would be ridiculous in the brick-and-mortar world (and should be similarly in cyberspace).
last question, though - why the heck would you ask this kind of stuff HERE? wouldn't a law-forum be a better choice?
My life in the land of the rising sun.
The amount of beauty required to launch 1 ship: 1 Millihelen
With respect to the question of whether prices are "in the public domain":
Airline prices are not like other prices. They involve complex calculations, calculations sufficiently complex that only a very small number of computer programs are capable of doing them, programs maintained by hundreds of developers that depend on data that costs companies like Expedia and Orbitz literally millions of dollars per year.
There have been court cases that have disallowed copyrights on numbers created by un-creative processes; the standard examples are telephone books and baseball scores. But it's not clear that airline prices would fall under the same categorization: there's a huge amount of "creativity" involved in their calculation, as evidenced by the different prices that appear for the same flights on different web sites.
So even if you can legally screen-scrape Best Buy, I wouldn't assume you can legally screen-scrape Orbitz or Travelocity or Expedia. And note that those sites quite explicitly prohibit robots and screen-scrapers.
IIRC, if the web sites include a statement forbidding the collection of prices from their website by bots in their terms of use, they might have a civil case against the company that hired you, but I wouldn't worry about it as the developer of the bot.
I read an article a year or so ago...can't remember where, maybe here, that Best Buy and Circuit City do discourage this activity. The problem the article brought out is how do they tell the difference between people comparing prices and people data mining?
Abiit, excessit, evasit, erupit.
Prices cannot be copyrighted, since they are facts. Prices are not trade secrets if they are advertised in public (even on a shelf). Prices cannot be patented.
So, why can't they be researched again? That's right. It's a non-issue.
Next.
I am not a lawyer.
Slashdot is not a lawyer.
Slashdot is not a replacement for a lawyer.
Individual posters on slashdot may be lawyers, but are you really willing to trust your future to what some random person online says, when they could be a lawyer, but could also be some 14 year old kid who thinks it's amusing to screw with people?
Repeat after me:
I will seek proper legal advice.
Seriously, this comes up time and time again. If you're in a situation where you need actual concrete legal advice, SLASHDOT IS NOT THE PLACE TO GO. Sending in an Ask Slashdot is fine for theoretical questions, but when your ass is at stake if a lawsuit comes around, do you really want to trust your future to the legal advice given to you by Anonymous Cowards and karma whores?
Be the Ultimate Ninja! Play Billy Vs. SNAKEMAN today!
They simply take info given voluntarily by vendors and list it. They do not actively go out looking for this info, or actively ask for it, nor do they verify it. Inclusion on their pricing list is strictly voluntary.
STFU and do your job.
Vote Quimby!
Filing lawsuits to protect your price information is just dumb, not to mention a waste (if not abuse) of the legal system.
Personal feelings about freedom of information aside, and just from a coder's POV, here's my solution.
If they really want to avoid getting scraped, they should just get their existing, underpaid web developers to create a backend setup that generates the prices as GIFs that give OCR hell (such as those used to prevent automated registration of, say, Yahoo! email accounts).
Coders are cheaper than lawyers (at least those needed to write such code as this).
Sure, the competition could pay more money to get somebody to develop better OCR to read each and every dynamically generated GIF, but most people require proofreading of OCR data, which leads to even more cost.
Something I learned from my Uncle who works with the DOD is this: Any lock can be picked; Any encryption can be broken. It's just a matter of if it's worth the time and money to get what's inside.
In short, with a little one-time cost, the company that doesn't want its prices scraped can just make it so hard to scrape their prices that it's not worth it. The price of scraping the graphically displayed price tags would also be an ongoing cost of software and proofreaders that would dip into profit margins, which management at the company that desires the scraping won't like.
It's not perfect, but it's better (and more bankable) than going whining to the legal system. (Especially since coders are generally cheaper than lawyers).
DONT PANIC
I think what you have to look at is the media context in which the prices are displayed.
It's quite true that many stores will try to prevent you from making recordings of any kind on their physical premises. I've been reprimanded by store managers many times for taking photos in the store. But their right to prevent me from creating media on their premises is based on their property rights, not on any legally backed authority to censor the media.
The web is a totally different story. I use web scrapers all the time and a site that doesn't like it can kindly take its ass off the web. Once you place material on the web, it is published. If you don't want to publish your prices, you don't have to. That's like publishing a book and complaining the readers read it too fast.
The people who complain about such things are the idiots who create unworkable business plans based on their own assumptions about how people are going to use the resource. This is an interesting issue with news media that want to sell access to their archives. There's no way they can both publish to the web and prevent me from caching old copies. If that's the business plan then web publishing is an inappropriate business decision and guess who should pay for bad business decisions: the consumer, or the fool who pursued an ignorant business plan?
If you have a genuine concern you should raise it. (As always, if it isn't in writing, it never happened.)
Then let it go. If you want to be extra-double sure, get your own lawyer.
You, however, are not their lawyer. It is not your job to advise them on legalities.
-Peter
Anybody who publishes information about their business runs the risk that a competitor will get hold of the information and use it in some way. This has always been a fact of life in the physical world. As computers came online in recent decades many companies have maintained databases of information about competitors' products. The Internet doesn't change any of this.
These people all think they know more about any topic than the actual practitioners and experts in any given subject. Jehovah himself could appear here, and the /.ers would lecture Him on how Heaven is made.
Output the price in an un-OCRable JPEG image.
So to me (IANAL) it does not look like any precedent against data mining for pricing information has been set. The closest this case comes to doing that is the First Circuit's opinion on what constitutes authorised use of the site under the Computer Fraud and Abuse Act. They say that the terms of use of the site can restrict the use of scrapers. This would be a weakening of the district court's opinion, which was that the authorisation question should be looked at in light of the parties' "reasonable expectations" (i.e. the website owners could "reasonably expect" the users of the site to be human and not scraper software).
Aside from the well-known problems with any click-through agreement (contract between unknown parties, software circumvention, lack of notarization, etc.), the additional flaw in this case is provided by web archives. If you don't want to have to look at a click-through page before reading your competitor's deep dark secrets, just download what you want from a public web cache. Are these jokers going to turn around and sue Google, as well?
Actually, that brings up an interesting point. When Google gets sued for forwarding information to competitors without click-throughing them, they will probably deny that such was their "intent" in providing the web archive. Of course, the competitors do have an "intent" that the original site doesn't condone. But there is not a technical means of determining intent over the current version of HTTP. If the original site wants to do this, it is using the wrong technology. Of course someday if the ebXML folks get off their collective butts, we might have some sort of contract-negotiation protocol. I doubt a consumer e-commerce site would be interested in erecting such barriers to entry, but this would probably be useful in certain B2B contexts. Until then, honoring click-through pages in the breach will only harm the internet. Any court case that declares that particular intents make a party ineligible to download particular material served over the web (that's my understanding of the agreement that we're clicking through here) will only harm the web and all open systems.
later,
Jess
I am programmed for etiquette, not destruction!
When somebody asked him how he can make a living like that, he replied:
"Volume!"
They say the first thing to go is your penis. Well, it's either that or your brain. I forget which...
I worked on a similar project about three years ago actually looking through pretty much the very same websites. :) The project was finally shut down when one of these websites used an old trespassing law to stop us i.e. saying that we were using the site in a way that broke the user agreement and therefore we were trespassing.
.. If the guy who hired you reads /., then you don't have to ;-)
I'd say yes. Second only to apostrophe's.
Everything that was once directly lived has receded into a representation. -debord
Read the case... EF Cultural Travel BV v. Explorica hinges on the fact that the defendant company hired an ex-programmer from the plaintiff company. The programmer had special knowledge of codes used in the pricing (which he had signed a confidentiality agreement not to disclose). When he made the scraper program he violated the confidentiality agreement.
:) Depending on how the contract is written you could be jointly liable.
It was the violation of the confidentiality agreement that the court held was illegal.
As for whether you should tell your employer, it depends on your employment agreement!
While this is a 1st Circuit case, it has been followed by the 5th Circuit (Ingenix, Inc. v. Lagalante) and cited in cases in the 7th and 9th Circuit.
Hope this helps.
--me
"Why is using a scraper robot so different from, say, walking into Best Buy with a handheld and recording product pricing manually?"
I guess it's as much about the perceived ability to stop a human and the lag in that information being used.
A number of years ago I was using my Psion to check the E-numbers on the food labels, I'm both vegetarian and have an intolerance to certain food additives. The store's duty manager came up to me to ask me to stop or I'd be ejected from the store for unacceptable behaviour, it took quite a while to persuade him that I was not an undercover operative from a competitor or investigative reporter - I was just doing my shopping carefully!
People don't like the idea that they can't stop the robot from checking their prices and providing that information back in seconds so that their competition can undercut them almost as soon as they post their prices.
With modern technology, retailers like to believe they could stop a person from gleaning too much information for a competitor in person but it's all just perception, the fact is that we're well past all that.
Go permanent? In your dreams and my worst nightmares.
I remember reading something about a year ago about someone who got arrested at (I think) Circuit City for writing down the prices; on a laptop he brought with him, as I recall. A quick Google search didn't turn up anything; maybe someone will remember more information than I can recall.
Also, I've seen some online sites seem to have "this information is for your personal use only" type 'licenses' for their prices; the notices I've seen are usually in tiny print in some legal page link no one ever clicks; and the legality of such a clause is something I'm not sure about either.
and I think they have taken people to court over it.
of a man named Ronald Kahlow and his troubles with Best Buy back in 1997.
Karma: Can only be portioned out by the Cosmos.
Orin S. Kerr is a lawyer.
In fact, he's a law professor, too (after being a member of the DOJ's cybercrime division).
He taught my computer crime law class.
--Dan
To me, this is a loss for open systems. Nothing in any of the relevant RFCs mentions a method of specifying or obeying provisions against automatic downloading for a particular purpose. I'm sure one of them mentions robots.txt, but that would prohibit all automatic downloads, and I'm sure most e-commerce sites don't want to chase off Froogle and its ilk.
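For what it's worth, robots.txt rules are grouped per User-agent, so a site can turn away one named crawler while leaving the rest alone. It remains a voluntary convention that a robot can simply ignore, and it still says nothing about the purpose of a download. A minimal sketch using Python's standard `urllib.robotparser`; the rules, the bot names `PriceScraper` and `FroogleBot`, and the example.com URL are all made up for illustration:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a shop that wants to keep one scraper out
# while still welcoming every other crawler.
ROBOTS_TXT = """\
User-agent: PriceScraper
Disallow: /

User-agent: *
Disallow:
"""


def is_allowed(user_agent: str, url: str) -> bool:
    """Check a URL against the hypothetical rules above."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so no network access is needed.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "http://www.example.com/prices/tv.html"
    for bot in ("PriceScraper", "FroogleBot"):
        print(bot, "allowed" if is_allowed(bot, url) else "disallowed")
```

A scraper that sends a different User-agent string, or never reads the file at all, is unaffected, which is why this is a courtesy mechanism rather than access control.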
I'm against any judicial action that changes by fiat the conventions under which the internet operates, especially in the jurisdiction in which I happen to reside. It is fundamental to the web that one only serves information one wants public. Someday there will be protocols that deal with issues of trust, privileges, and negotiation programmatically. They will be used in certain circumstances, but HTTP or its descendants will be far more common. When the courts trade the future of open systems for a temporary convenience to businesses that are careless with their proprietary information, we have lost.
later,
Jess
I am programmed for etiquette, not destruction!
Competition is the whole idea in capitalism. In order to make capitalism work, your competitor must compete with your product/prices, in a way that is transparent to the consumer.
:)
:)
Trying to avoid competition should be punishable, since it is pure terrorism
In my country they are even trying to prohibit certain non-transparent pricing methods (banks, telephone companies), since they disable the competition by obscuring the price to the consumer.
go ahead ! those guys are communists !
I wrote a simple data mining/aggregation app last year that scans Hot Deals forums (e.g. Fatwallet.com, Anandtech.com, Gotapex.com) and emails the new posts out each hour to my subscribers. The service has always been free. I maintain the original links and titles, so I don't feel as though I am stealing anyone's links. A few months ago, I decided to put some ads at the top of my newsletter to make a few extra dollars to pay for my time in writing the app. I immediately received a response from one of the main forums I aggregate, stating that I needed to stop running my newsletter for profit, or remove their copyrighted and trademarked name from my list of links, as they didn't allow others to benefit from their name.
Well, not knowing what I might be getting myself into, I removed my advertising.
So my question is this: Are forum threads copyrightable, even though they're written by users of a forum and don't necessarily represent fact? Can I advertise without being sued for using someone else's links in my newsletter, even though I am not representing them as my own?
-- dan
http://groups.yahoo.com/group/dansdeals/
You would then need an additional comma after "Data".
What he's doing isn't really data mining. Data mining is the process of discovering patterns in data which are not known ahead of time, such as the infamous "beer and diapers" correlation.
That said, I don't understand why the author is worried. I can't see how looking at publicly posted prices could be considered illegal.
People who study economics face some unrealistic hypotheses in textbooks. The first topic most students deal with in a microeconomics course is the case of a large group of firms selling the very same product. If the buyer has perfect information about prices, he will choose the lowest price. The buyer's choice will influence the behavior of all the other firms, which will tend to lower their prices to beat the one the buyer chose. The resulting dynamic drives the price down until the item costs the buyer the same as it costs to produce.
In the real world it is very unrealistic to believe someone could have information about all the sellers' prices. But with data mining we can have MORE information about sellers than we otherwise would, and we can get it at a lower cost. We should then be nearer to the perfect competition the textbooks theorize. There are, though, some problems to solve before jumping to this conclusion: there is not that large a number of firms competing, and delivery fees, warranty, and time of arrival of the product can differ a lot from seller to seller. Could a "perfect bot" handle all this information?
If the answer is yes, firms can follow two paths: cartelization or dumping. The first happens when firms agree on prices together and force buyers to pay more, because competition is "frozen". The second happens when a firm artificially lowers its price below cost to force the competition into bankruptcy. Both behaviors are dangerous to consumers and are forbidden in most countries.
IMHO a site's EULA can't go against market law. I presume that, at least inside the same democratic country, it is legal to data mine in that way. And I can't see why a competitor can't use it as a tool to build its price strategy. It's the invisible hand Adam Smith intuited.
The WWW is evolving, maybe in a way some people may dislike, and is using the same rules we use in the real world to make money. And I'm sure competitors will soon find solutions to prevent data mining of their sites, at least of the information they don't want to share. IT solutions. Ones that do not require lawyers but intelligence and insight.
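The textbook undercutting dynamic described above can be sketched as a toy loop in Python. This is only an illustration under stated assumptions, not a real market model: two sellers with the same unit cost, a fixed undercut step, instant knowledge of the other's price (the "perfect bot"), and made-up numbers.

```python
def undercut_until_cost(start_cents, unit_cost_cents, step_cents=5):
    """Two sellers take turns undercutting each other by step_cents.

    A seller only cuts if the new price still covers unit cost, so the
    loop stops at the lowest step-aligned price not below unit cost.
    Returns a list of (seller, price_cents) tuples, one per posted price.
    """
    if step_cents <= 0:
        raise ValueError("step_cents must be positive")
    sellers = ("A", "B")
    price = start_cents
    history = [(sellers[0], price)]
    turn = 1
    while price - step_cents >= unit_cost_cents:
        price -= step_cents
        history.append((sellers[turn % 2], price))
        turn += 1
    return history


if __name__ == "__main__":
    # Hypothetical item: listed at $1.00, costs $0.80 to supply.
    for seller, price in undercut_until_cost(100, 80):
        print(f"seller {seller} posts ${price / 100:.2f}")
```

With a unit cost of zero the same loop runs all the way down to free, which is the Popeye car wash. Delivery fees, warranty and the small number of real competitors are exactly the things this sketch leaves out.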
WTF? Mods on crack. This post was fucking hilarious. I had to read it twice before I got it, but once I did it cracked me up. You guys need to get the stick out of your butts.
You see signs everywhere you go:
-Shirt & shoes required.
-No loitering.
-No soliciting.
-Check all bags at counter.
-No more than two students allowed in store at one time.
-Parking lot, bathroom, etc. for customer use only.
Just because a building (or a web site) is in a public place doesn't mean that everyone is free to do whatever they want. Business owners are free to create house rules that everyone needs to follow.
Similarly, web sites can legally restrict what you are allowed to do when you visit them without having to build security measures to force compliance. If web retailers don't mind robots harvesting their inventory and prices, great. If, however, they want to place restrictions on who can access their site and how, that's entirely their prerogative.
Think about it. Leaving the door to your home unlocked would make it easier for people to steal your stuff, but it still wouldn't make it legal for them to do so unless you put up a sign saying something like, "Free for the taking."
Web scrapers are legal to develop and they're legal to use on sites with acceptable use policies that allow them. However, your customers should be prepared for the possibility that some or all of their competitors could make them stop using it at some point. And, in the interests of maintaining your own professional ethics, you should probably call their attention to the issues surrounding the job they're asking you to do.
Discussing legal issues is not just a business for lawyers. Non-lawyers can give each other useful pointers. And non-lawyers actually have an obligation to determine whether their legislators are doing a good job with the laws they enact and judges they appoint, and a healthy discussion is a good start.
Sorry, I call bullshit.
Over the past 6 or 7 years I've used a palm (handspring visor, to be more precise) hundreds of times, in every Best Buy (and Circuit City, MicroCenter, etc.) in the Boston area to record prices. I've never had anyone even look at me funny.
Maybe it's related to how guilty (or difficult to remove) you look, but I really doubt that happened to anyone ever (note the once-removed story -- it's always a 'friend of mine' in these types of stories.)
In any case, what kind of wuss would leave without making a fuss and forcing them to call the police over something so ridiculous? I could be using my palm to look up my friend's number to call and ask which video card to buy. Fsck them if they don't like it.
Or, maybe this particular Best Buy was located in an airplane and the event happened during takeoff or landing. Or your friend lied to you. One or the other.
everything in moderation
While it's not illegal, collating price information by hand will get you booted from a store. It being private property, they don't have to have a reason to ask you to leave, and if you refuse it's trespassing (in the U.S. anyway). I remember one of the TV newsmagazines detailed a case where a guy was entering price comparison info into a PDA at one of the big chains (Circuit City?) and they kicked him out. He sued and lost. Since they can't trespass you on the Net, I guess Big Corporate will resort to whatever cockamamie methods are at their disposal.
"Instead, the Court concluded that the mere fact that Register.com had decided to sue Verio meant that Verio's use of the search robot was without authorization: 'because Register.com objects to Verio's use of search robots,' the Court held, 'they represent an unauthorized access to the [Register.com] WHOIS database.'"
[Ironically, the pdf for the paper apparently uses some feature of Acrobat which disallows copying text from it. I guess they don't want robots scraping text out of it or something. First time in quite a while I've had to type a quote from the net by hand!]
There are 0x40000000 types of people: those who understand 32-bit IEEE 754 floating point, and those who don't.
In writing a bot, you are using an automated system to mine a company's website while the benefit to them is questionable - since, of course, you're just mining prices *for a competitor*, there's no chance that they'll see an increase in sales - and if the data is being used by the competitor effectively, that company will lose sales.
In addition, it is important to remember that while the "freeness" of data itself can be debated, accessing data is *not* free. To provide the pricing data over the web - intended as a service to their customers, no less - costs the company money in terms of bandwidth, hardware, software, and human resources. They are clearly spending money offering this service for the purpose of profit. Since the company doesn't know anything about the bot, the consequences for their response times, system performance, and stability cannot be predicted. When the system is negatively affected by the competitor, that is tantamount to a DoS, which has definite legal repercussions.
Obviously, a technical solution would be to block the bot, but when direct cost of even accessing the data is hurting their bottom line, asking for compensation appears to be justified in the light of the future indirect cost. Having to spend resources trying to block the bot would also increase cost.
Many years ago I was working for a book store. We didn't have pictures for all the books, so I went to amazon.com to download the cover pictures of the books, and did so successfully for about 4 hours. After that I noticed our IP address being blocked from accessing amazon.com.
EXACTLY what I have been saying for a time...
there should be this exact type of service. actually what i was thinking of was a service that you would "bid" on an object - say only for items valued over 500 bux.
Then that service would scour the NET, ebay and every other normal avenue looking for the lowest possible price for the purchase you want to make. then buy it... and you would pay a service fee % based on the value of the item and the obscurity etc...
but all in all - attempting to make it as automated as possible so that you still really get a good deal on the item.
hello USPTO.
If a bot activates a click-through agreement, does anyone hear it fall?
Can anyone please tell me how to set up my machine on mining prices for any selected item on the web ?
Thank you !
Muchas Gracias, Señor Edward Snowden !
Quoth the poster:
"Why is using a scraper robot so different from, say, walking into Best Buy with a handheld and recording product pricing manually?"
Ask yourself why you're being paid to write a scraper instead of being paid to walk into stores and record pricing manually.
The answer is this: because using a scraper robot is faster, cheaper and easier to repeat systematically. Using a scraper makes it possible to check competitor pricing with a scope and speed that is so impracticable in the brick and mortar world that it hasn't been necessary to prohibit it in the brick and mortar world.
"A modicum of creativity" is part of the standard test of copyrightability. Telephone books are not copyrightable on this basis (see Feist Publications v. Rural Telephone Service, 499 US 340 (1991)).
But airline prices probably are.
Calculating the price of an airline ticket is a difficult process, part of the reason travel agents exist and make use of sophisticated computer programs to calculate prices. Compare 5 different web sites' prices for a complex ticket and you'll get 5 different numbers, often differing by thousands of dollars -- for exactly the same flights. So it would seem that creativity is involved. Certainly one could argue that the selection of what tickets to display on a web site like Orbitz is a creative process.
On this basis, I'd say airline prices are copyrightable. And all the major travel web sites specifically prohibit robots in their terms-of-use legalese.
So I think that screen-scraping travel web sites is a very iffy proposition, legally. Courts seem to agree: American Airlines was granted an injunction prohibiting FareChase from scraping them and selling their prices to competitors.
It's not even an old Joke - I believe it was a quotable thing said by, probably, Micron CEO a while back.
This, if nothing else, shows how thin the DRAM margin is, and why RAMBUS trying to skim 2% *gross* will never fly.
anyway, the point is that by pushing out a large volume, they keep their lines running, which means that they can actually make SOMETHING off those lines and get some kind of profit because the cost of running them things through isn't so damn high. This can be demonstrated by how 512M SODIMMs are 135 at crucial while the 1G SODIMMs are 1000... How much you wanna bet the 1G ones are the real money makers?
That, and you are grabbing market share.
to be honest I am not 100% sure how this works out, but by keeping the lines oiled and running, if it's nothing else, it's preventing the company from losing the *real* big bux.
My life in the land of the rising sun.
Can anyone tell me what is a "scraping robot"? Where to find them? How to use them?
Thank you !
Think, write, think, edit, think...then post.
Advertising costs what it costs, and the Net behaves the way it behaves. I'll laugh if I ever get sued by a grocery store who sent me an advertising flier in the mail, because it cost them money and I didn't buy anything. The Net is a public forum. That's why it's attractive to commercial entities.
Think, write, think, edit, think...then post.
Please read the parent to understand this post, I am in fact talking to someone from the future, isn't that cool
Let's go over your post so I can show you just how wrong you are.
You've used your visor in Best Buy...etc.. hundreds of times for 6 or 7 years, damn dude, did you set your clock wrong, or are you writing your post from the future, or were you in the future when you acquired your visor?
Being that Handspring was formed in 1998 and the products were not rolling until 1999, oh so you have in fact misstated a fact, future traveler.
Does it work like the classic HG Wells Time Machine, or is it more like Back to the Future, well of course the original one, everyone knows time machines can't fly!
I take it the hundreds of times you've used your palm (ahem, handspring visor, to be more precise) is correct. Wow! Security never harassed you, I mean you must have picked some exceptionally busy stores to case, oh um, ya see when I was in retail that's what we called it when someone would wander around the store taking notes as they go, SEVERAL TIMES A WEEK. I guess Boston is the place for shoplifters and retail shrink.
NEXT!
Well to be honest with you, calling me or my friends liars would most definitely result in an altercation(hypothetically speaking), this is one of the nice things about slashdot, we can have an argument and you can hide.
Hmm, the next one that would result in an altercation (hypothetically, of course) is being a smartass, but you're not here and I haven't a clue where you are, eh. But in any case when you are asked to leave a private establishment, you do. Failure to leave when asked constitutes trespassing and you will be arrested. Ever been in jail, I highly doubt it as if you had ever seen the inside of a squad car you wouldn't make stupid comments about doing something that would constitute arrest.
Raising hell makes you look like an idiot, I did it when I was younger, but it didn't produce a solution, so I stopped, but I see you're all for the flail your head against the wall approach to life. At least it gives your neck muscles something else to do when you're away from your day job.
KARMA to burn, karma to burn, yes it's worth it!
Slashdot is not a replacement for a lawyer.
Slashdot is useful to get a sense of what the legal landscape is like. Some comments are to the effect: "I am not a lawyer, but my lawyer told me this." Or "I am not a lawyer, but here is the statute [cornell.edu], and here is how a court has interpreted it [eff.org]." When you do see an attorney after reading the comments, you don't have to wait for the attorney to explain the basics. This saves time, and time is money, especially at the typical copyright and trade secret specialist's rate.
That said, you're right about one thing: anything you read on Slashdot is not legal advice.
Will I retire or break 10K?
Wal-mart bought out one of the UK's biggest supermarket chains - Associated Dairies (ASDA).
I read at the time of the merger that this made Wal-mart the world's largest company by square footage of shop floor.
m
There are places where the networks are not touching,and there are places where they are-Boeing's Lori Gunter
P2P networks mine my data all the time.
Pipe lynx -source http://somewebsite.co.uk through appropriate grep, sed and awk filters, or a Perl script, or Python if you think Perl isn't trendy enough. Then edit your crontab so all this happens by itself. Easy enough :-)
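For anyone who wants the Python flavour of that pipeline, here is a minimal sketch. It assumes prices appear in the page source as plain text with a currency symbol in front; the `fetch_source` and `extract_prices` names and the regular expression are made up for illustration and would need adjusting for any real page layout.

```python
import re
from urllib.request import urlopen

# Assumed price format: currency symbol, digits, optional thousands
# separators, optional two decimal places (e.g. $1,299.00 or £135).
PRICE_RE = re.compile(r"[$£€]\s?(\d+(?:,\d{3})*(?:\.\d{2})?)")


def fetch_source(url):
    """Rough stand-in for `lynx -source URL`: return the raw page text."""
    with urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def extract_prices(html):
    """Rough stand-in for the grep/sed/awk stage: pull out price-like numbers."""
    return [float(match.replace(",", "")) for match in PRICE_RE.findall(html)]


sample = "<tr><td>Widget</td><td>$1,299.00</td><td>£135</td></tr>"
print(extract_prices(sample))
```

Run against a live page it would be `extract_prices(fetch_source(url))`, and the crontab entry would simply call the script on whatever schedule you like.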
Je fume. Tu fumes. Nous fûmes!
Why is using a scraper robot so different from, say, walking into Best Buy with a handheld and recording product pricing manually?
well for one thing, you can't be sold service plans, monster cables (you need them), msn internet access... and hell why not get that surround sound system and plasma with those prices you are obtaining?
all joking aside that is the difference -- by having a bot that scours the internet, you are not being sold on that service -- which is why the page is there. it can conflict with usage agreements
/me shrugs. i think it's lame and i would do the same thing
WTPOUAWYHTTOTWPA
What's the point of using acronyms when you have to type out the whole phrase anyways?
...for those at work who open it, think "can't be bothered to try that one", then go off and get a coffee.
:-(
Guess who else that just happened to
The thing I find disgusting about companies that try to prevent price comparisons is that this is in violation of the very principles of the market economy that have made them so wealthy in the first place. Market economy theory dictates there should be competition, and with that competition comes lower prices because people choose the lower-priced goods over the higher-priced ones. Lower prices, however, cannot be found if we do not allow people to compare prices. In sum, companies simply can't have their cake and eat it, too. Allowing people to compare prices is not just a privilege companies give, but a right these people have.
Moreover, when you put your store on the Internet, you take advantage of the possibility of millions of people looking at your wares, and all you have to do is put a webpage and database out there. You MUST be willing to accept the fact that with this convenience, others should be able to find ways to "conveniently" look at your data. I will not accept someone dictating to me how I can look at their data - you're the one who made it public, right? Again, these companies can't have their cake and eat it, too...
rant rant rant
-tk
Since you mention that you may be building a screen scraper that gathers airline fares, you may be interested to know that American Airlines has already sued (and won a preliminary injunction against) a software company that built a tool that does much the same thing. The case is American Airlines v. FareChase, and was discussed on LawMeme:
The injunction order is posted on EFF's site, and the briefs are posted on Bag & Baggage.
Getting product pricing right is very important in insurance because unlike most businesses you incur most of your costs (in the form of claims) after the customer has paid you their money, not before, which is more typically the case.
These prices were not published, partly because the pricing ended up so complex to describe and so personalised to the individual customer that it would have been difficult to come up with a way of presenting it simply, and partly because the pricing structure embedded one of the company's competitive advantages, ie its underwriting function.
What we found was that at least one of our competitors organised a campaign to "reverse engineer" our company's rating book by paying a bunch of people to ring us up and get quotes. Ie, in the case of car insurance, they'd ring up and say they were a 25 year old male living in suburb X driving model Y. Then they'd ring up saying they're a 25 year old male living in suburb X driving model Z . Etc.
This way they'd be able to consistently match or beat our pricing merely by investing in 500 phone calls, rather than by investing in millions of dollars in brains and gear.
So this would be at least one instance where data mining for product pricing would be, if not necessarily illegal, then certainly mercenary.
(Note that merely matching another insurance company's prices won't necessarily always help you because there are other factors that affect your cost:income ratio, but certainly this tactic has given me food for thought over the years.)
a world in progress...
I would doubt very much that price harvesting is illegal. For a comparison I would refer to the case a few months back where a large company attempted to use the DMCA to protect its price information. From what I recall it was deemed that information published on the internet was in the public domain and therefore freely available. It is why the best buy bargain sites are still around. TQ1.
But before that I'd say it's uncanny how so many people think it's just fine to throw around terms like "redneck" when they'd never think of saying other things - things like "OH, only niggers watch BET."
I realize it's expecting a lot, but I really thought the people who frequent /. were more evolved than that. For those of you who have apparently slept through the last decade, Mississippi ("redneckville") begat one of the largest telecom companies (and wall street scandals) in history.
Yeah, there's still problems here - but none I've not seen in just as grand a fashion in NY or Michigan or California (What do you call a redneck from california? "Officer.") Meanwhile, Worldcom owned (at least as far as wall street was concerned) "the internet" - and Wal-Mart owns retailing on a global scale. Both these companies came from the deep south. Pretty damn good for "a buncha rednecks."
-LJ
[crawling the web since '97 to gather pricing information - check out geizhals.net]
"I love my job, but I hate talking to people like you" (Freddie Mercury)
The same question occurred to me...
:)
I was writing a bot to collect all ebay auctions of a given category [ie, DVDs] and build a database to get the current market price of any DVD and archive the auction texts for more than the usual 90 days [along with the option to detect false bidders using multi-accounts].
Of course, there would be some filtering for the top and lowest prices [to keep the $1000 DVD signed by Agent Smith from messing up the Matrix prices].
Then I saw the Ebay terms of service, which seem to forbid this type of spidering. Ebay would not answer my mails, so in the end I stopped the project *sigh*. Would have been nice to know what kind of price a DVD/CD/Game will fetch on Ebay.
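The top-and-bottom filtering mentioned above could be done with a trimmed mean. This is only a sketch of one possible approach, not what the bot actually did: the `trimmed_mean` name, the 10% trim fraction and the sample prices are all invented for illustration.

```python
from statistics import mean


def trimmed_mean(prices, trim_fraction=0.1):
    """Average the prices after dropping the cheapest and priciest tails.

    trim_fraction is the share of auctions cut from EACH end, so 0.1
    discards the lowest 10% and the highest 10% before averaging.
    """
    if not 0 <= trim_fraction < 0.5:
        raise ValueError("trim_fraction must be in [0, 0.5)")
    ordered = sorted(prices)
    cut = int(len(ordered) * trim_fraction)
    kept = ordered[cut:len(ordered) - cut] if cut else ordered
    return mean(kept)


# Ten made-up closing prices, including a $1000 signed copy and a $1 fluke.
closing_prices = [9, 10, 11, 10, 12, 9, 11, 10, 1000, 1]
print(trimmed_mean(closing_prices))
```

A plain average of those ten prices would be dragged far above what the DVD normally sells for; trimming one auction from each end removes both outliers before averaging.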
Remember: Never build your business on Ebay or Google without obtaining their permission first
"Is Data Mining for Product Pricing, Illegal?"
:-)
This sentence no verb, has.
well. kinda.
If you fear that it might be the wrong thing to do, it probably is.
You were 80% angel, 10% demon. The rest was hard to explain. - Over The Rhine
"Math in a song is good."-Linford
So they don't have ANY kind of copyright protection if they are gathered from public channels of communication or from the shelves. Of course, if you get the prices from a not yet released price catalogue... then, you aren't in violation of copyright of those prices, but you are violating trade secrets laws and the like... In any case, just SUE!
^It has a tendency to make a great many things legal or illegal that would not be otherwise.
Any sufficiently advanced influence is indistinguishable from control.
The question is, where do you draw the line? The data already is being accessed in an "automated" fashion (by the browser).
Plus, who will speak up for the rights of robots? Who are we to deny access to all this goody-goody data to robots? What if a robot really wanted to travel ?
Also, I don't see how the OP's employer hopes to beat Sabre's price if they discover that Sabre is cheaper. The airline industry requires that money collected for a ticket be divvied up a certain way: x goes to taxes, y goes to commission, and the rest is the actual fare (which is set by the airline). The OP's employer would have to find fares such that they could actually beat the price. If not, the airline is within its rights to ask for the difference.
Set your homepage to pricewatch.com, then train a monkey to type in your chosen product names and write down the prices. Easy!
$8.95/mo web hosting
not html. Did you just guess a little to try and earn some karma?
Do your homework next time, wannabe.
This is clearly illegal if you are some how, some way, making money from the price gathering or providing it to the public. It is clearly protected in law. In the exact case (airline pricing), they are backed up by previous court rulings. It is a no-brainer. Don't compare this to gathering prices on apples at the local grocery store chains.
Don't go there. I worked on similar projects in the travel industry and what you are doing will get a cease-and-desist as soon as they find out about it.
If you see others doing it (priceline, etc.), they are paying for it in some manner. All the airlines have legal means for you to pull pricing info from them (it may be from a 3rd party service). That's where you want to start.
"If you want to improve, be content to be thought foolish and stupid." - Epictetus
Most web sites have end user agreements, especially the bigger ones. For example, here's a snippet from Expedia's:
This Web site is for your personal and noncommercial use. You may not modify, copy, distribute, transmit, display, perform, reproduce, publish, license, create derivative works from, transfer, or sell any information, software, products, or services obtained from this Web site.
"Personal, non-commercial use..." so I guess if you're looking for a business flight, you better not use it. Point being that even public info, or info in the public domain can be protected when it is provided in a special format. While you may be able to publish a companies prices, sucking that info from their web site will most likely violate their terms of use. It's like the old shareware and freeware floppies - the programs can't be sold, and the publisher can't stop you from distributing them - but you can't make duplicates of the disk as a whole, because the compilation and format is what's protected. Look at it another way. Say you manually type 400 pages of parts pricing information into a database or web page for a local appliance dealer. You don't charge them much, because you can sell copies of the database or web pages to 100's of dealers across the country. But when you start making sales calls, all your leads tell you "No thanks, I just downloaded it of their web site". Companies should be allowed to publish information to be used by their customers only without fear of their competition using it against them. Otherwise it's the consumers who suffer from a lack of resources. It's already happening- how many times have you been to a web site that makes you call them to find out who the dealer is in your area? It's because they don't want their competition going after their dealers. Also remember this - forget about the legality for a moment - anything you write will most likely be easily broken by small changes in the format and/or code of the source web sites. If they don't want to (or can't) bother with going after you legally, they could make your life a never-ending nightmare of coding updates. And if your site is successful, you'll be a big red flag in their web logs.
666-607: 6th floor apartment of the beast
Even though the other replier says Sam's book says he never did it, etc., the TV documentary I saw on Sam showed that he tried to grab competitor's prices but that once anyone in the store saw him doing it he was immediately (politely) asked to leave the store.
Apparently, to those TV producers at least, price scraping is protected under law. That's why no one can bring a camera into a store and start snapping pictures. Those surveillance cameras aren't there only to discourage shoplifting.
Wish I could find the reference, though. Anyone?
8-PP
Try walking into Best Buy with a pad of paper and a pen.. Walk through the aisles..
I've done this before - and have been told to remove those items from the store..
Screen scraping is data gathering. Data mining is looking for trends or patterns in data you already have. Getting the nuggets out of your data to continue with the mining analogy. From this presentation titled "An Introduction to Data Mining Technology" data mining is defined as "The automated extraction of hidden predictive information from (large) databases".
The bottom line is this: when you put this work experience down on your resume don't say you were data mining. Companies looking for that experience will ask you hard questions you don't know the answers to and you will be embarrassed.
There may be precedent for this. eBay was able to convince a judge to bar spidering of their site.
There is another legal concept called "Unfair Competition" which links copyright and facts.
Normally, facts cannot be copyrighted. However, this law seems to kick in when one company compiles and publishes time-sensitive information that it has taken from a direct competitor in a way which "free-rides" on the efforts of the competitor. It is usually applied to news organizations, when one newspaper sends a reporter to Iraq and a second newspaper (perhaps an evening edition) uses the "facts" in the first newspaper's article to publish the very same news.
I could see the instantaneous publishing of all competitors' prices as a violation of this legal theory.
"Why is using a scraper robot so different from, say, walking into Best Buy with a handheld and recording product pricing manually?"
Well duh-to-the-nth! It might be a *teensy* bit faster using a robot as compared to, uh, manual input. Hence the word, uh, 'robot'.
<heavy sarcasm>
In other news: "Consumer surprise as Consultant proves buying on the web *can* be faster than going out of your house, taking money from the bank, catching the bus, entering a store, choosing a product, walking to the counter and paying for it, waiting for the bus again..."
ok, ok, you get the point.
</heavy sarcasm>
Nalfy.
-- Despair is an operating system that ANY human being can run, sort of a psychological JAVA --
Did you know "The Receiver" actually has a son?
As long as the information is available to the public in some fashion, there is nothing wrong with scraping/data mining as described. Both ethically and legally, this methodology would be completely acceptable to CI pros. Check out www.scip.org
Obviously, you're right. The four biggies (patent, trademark, copyright, and trade secret) all eat it. So why does shit like this happen? Because the judges in these cases half of the time have never used email. Their aides do it for them. Computers are nebulous devices that other people use. When you get right down to it, they don't have the experience to see the obvious analogies that you made and we all understand.
In 20 years when we get rid of these dipshits, maybe crap like this won't happen. Might be too late by then, but oh well.
-Looking for a job as a materials chemist or multivariat
Two things that come to mind (depends on the site and what they currently support technically browser- and serverside):
-- use front-end technologies that prohibit or at least inhibit the workings of server-side scraperbots. examples: flash, javascript.
-- use sessions to control how often a given client can access prices, e.g. 'a 10 and you're out' rule: most 'ordinary' users have no need to view a certain page of prices more than X times in a browser session. here, cookies provide even more protection since some scrapers won't be set up to handle them.
both systems may have their drawbacks (no flash allowed), weaknesses (against sessions i can simply make multiple logins), but i've incorporated similar systems on sites for clients with prices that need a sensible level of protection (i.e. one shouldn't be able to grab the whole damn price list with a one-page GET e.g.). guarding against SQL injection is also something which is often forgotten.
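The 'a 10 and you're out' rule described above can be sketched in a few lines. This is a simplified in-memory version with a hypothetical `PriceViewLimiter` class; a real site would keep the counters in whatever session store it already uses, and how session IDs are issued is assumed to be handled elsewhere.

```python
from collections import defaultdict


class PriceViewLimiter:
    """Cap how many times one session may view the same price page."""

    def __init__(self, max_views=10):
        self.max_views = max_views
        # (session_id, page) -> number of views served so far
        self._views = defaultdict(int)

    def allow(self, session_id, page):
        """Return True and count the view, or False once the cap is reached."""
        key = (session_id, page)
        if self._views[key] >= self.max_views:
            return False
        self._views[key] += 1
        return True


limiter = PriceViewLimiter(max_views=3)
print([limiter.allow("session-a", "/prices/widgets") for _ in range(4)])
print(limiter.allow("session-b", "/prices/widgets"))
```

The second print shows the weakness already noted: a fresh session starts with a fresh counter, so this only slows down a scraper that is willing to log in again.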
Cheers,
Nalfy
-- Despair is an operating system that ANY human being can run, sort of a psychological JAVA --
Follow your own advice. If you put a comma after the subject of your sentence it'd read:
You put a comma after the name of the person whom you're addressing (listener? addressee? target?) if it's at the beginning of the sentence or before if it's at the end.
The subject gets no comma love. If he's also the one being addressed, he can get this:
But "you" is the subject.
Hypothetical scenario:
a client hires you to write a scraper to grab prices and other visible data from a competitor's website. BUT: in a password-protected section (for that competitor's registered customers). the client you are working for happens to share many of the same customers, and thus obtaining access is as easy as receiving login information from a friend of a friend of a friend...
as the consultant, you have no actual knowledge about where the login comes from, other than some reasonably educated guesses that they are probably from a chain of employees that leads to one of these customers, and you write a scraper, and then execute it to build a db to mine.
if you get caught, is anything about this illegal, who would be at fault in this scenario?
let's assume that perhaps the terms of the site say something about the data being private.
EF Cultural Travel does not stand for "you can't scrape". It depended on one key thing - Explorica was started by ex-EFCT employees who were bound by a non-compete agreement. The circuit court rejected the argument that there was any generally implied term that forbade a company from scraping. Note that the circuit court did say that a TOS that restricted scraping may be enforceable. In short, check the TOS of the sites you are proposing to scrape. Incidentally, in a separate appeal, the injunction against the programmers was upheld based upon their knowledge of a previous injunction against the employees (not based upon a general "thou shall not scrape" rule).
Set the prices as GIFs, then:
1) make it take too long to download webpages
2) Make it impossible for physically disabled individuals to use your webpage, thus possibly losing business.
3) Make it take forever to actually come up with the GIFs and link them to the appropriate flight/trip/fare. Imagine the prices.
4) make it only marginally harder for the scrapers who then find out that either you have a pattern to which GIFs are used for which digits in your numbers for prices... OR
5) Show everyone how much money you are willing to waste by encoding every single fare for each flight into its own GIF with a unique identifier and thus blowing your investment.
"All great wisdom is contained in .signature files"
The answer to that is, "when influential businesses can dictate the law to their own ends". I am sure that a great many (non-internet) businesses would love to ban people from walking around their store with a notepad jotting down prices.
The promise of the Internet was that it would make everyone equal: the vastly increased flexibility in how online salespeople can rip you off was supposed to be counterbalanced by the consumer's (or groups of consumers) ability to counteract this by actively extending the functionality of the internet by policing these actions, an important component of which is keeping track of prices/goods/services offered by various merchants.
This 'ideal' absolutely must be enshrined in law (probably international law is the correct forum), otherwise it will be whittled away in some jurisdictions where business has a controlling influence on the legal process (read: USA), followed by pressure on the rest of the world to conform.
It is a failure of humanity that we always choose the most optimistic outcome for an upcoming technology. The 'promise' of computers and automation was the paperless office, no more menial tasks resulting in increased leisure time for everyone. The reality is more wasted paper than ever before (printing another copy of the whole document, just to eradicate a typo, is but a mouse click away), and a smaller fraction of the population working much longer hours, while the rest suffer unemployment.
In hindsight, both of these effects can be seen to be at least as likely as the 'promise'. A proper analysis back in the days when it might have been possible to make a difference, might well have shown that it was in fact the far more likely outcome.
The same mistake has been made with the Internet. While the 'promise' of equality and empowering individuals is a possible outcome, the underlying technology also allows unprecedented restrictions on freedom. Given the track record, which do you think is more likely?
What will be required to reverse the course, if the 'promise' doesn't come to fruition? Are there any comparable examples from history?
By putting it up on the web without a password, they are PUBLISHING the information. This means that everything you can do with a store circular's data, you can do with information published on a web site.
Want to see every step I took to start my company? http://www.rowdylabs.com/blogs/pitchtothegods
There was an incident of a man being arrested in a Best Buy (for trespassing, I believe) after being asked to stop writing down prices for large screen TVs. The irony of it, and I explained this in a post a long time ago, is that I used to work at Best Buy, and on weekends, we were asked to bring a non-blue shirt with us to work so that we could go incognito to the local electronics store (H.H. Greggs, before we had a Circuit City locally) and use a micro-cassette recorder to 'steal' all their prices so we could mark down the items in store to compete. Now they're telling people that they can't do what they themselves do (or did). Reminds me of a local story about a guy who was wearing one of those fancy NASCAR leather jackets with either Home Depot or Menards as the sponsor of the team going into the store that wasn't the sponsor and who was celebrating their grand opening (i.e., wore a Menards jacket into a Home Depot, or maybe it was the other way around) and was asked by management to leave because they thought he was a spy from the other company. Made the local paper when it happened...
A computer once beat me at chess, but it was no match for me at kick boxing -- Emo Phillips
But I used to be the network admin for an e-tailer, and we discouraged people from webcrawling our site. Not because our pricing was proprietary, or anything, but for the simple fact that I had to pay for all that damned bandwidth. Sure, one crawler doesn't add up to much, but we had as many as 20 crawlers at any one time, some of them obviously on T1+ links, and using every ounce of speed they could.
We did provide pre-formatted price lists for those people who asked, prepared daily, and available via FTP. That way, instead of having to wade through our HTML code and try and locate the pricing, they could get an SQL data file, or maybe a CSV file, or any one of a dozen formats...
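For anyone wondering why a prepared price file beats crawling HTML, here is a minimal Python sketch of reading one. The file contents, column names (`sku`, `description`, `price_usd`), and products are all made up for illustration; the actual formats this e-tailer offered are not described beyond "SQL data file" or "CSV file".

```python
import csv
import io
from decimal import Decimal

# Hypothetical sample of a daily price list; the column names are assumptions,
# not the real layout of any e-tailer's file.
SAMPLE_PRICE_LIST = """sku,description,price_usd
A100,CD burner,89.99
B200,Joystick,24.50
"""


def load_prices(csv_text):
    """Return a dict mapping each SKU to its price, parsed from CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Decimal avoids binary floating-point rounding on currency values.
    return {row["sku"]: Decimal(row["price_usd"]) for row in reader}


prices = load_prices(SAMPLE_PRICE_LIST)
for sku, price in sorted(prices.items()):
    print(sku, price)
```

One small download per day replaces thousands of page requests, which is exactly the bandwidth saving described above.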
-merlyn
he he...
I think much of this discussion may have missed this point:
According to (or, more likely, as legally inferred from) Section 1 of the Sherman Antitrust Act, a company isn't permitted to get prices directly from its competitor.
--
Exchanging Information with Competitors
It is important to avoid the exchange of sensitive business information with competitors without guidance from legal counsel. The exchange of price lists or prices charged to customers may violate Section 1 of the Sherman Act even though there is no agreement to fix prices, due to the natural tendency that such conduct will produce uniform or stabilized prices in the industry. Of course, you must obtain this information from some source in order to compete. But you should be able to show that you did not obtain it directly from your competitor and that you did not make your lists available to competitors.
http://www.hhrmlaw.com/antitrust.htm
--
I don't know how extensive the links between the company and the competitor need to be, but if the company is using the program (that you made), I would think that is more direct than indirect. If you do it and give them the prices, who knows.
Froogle detects a certain camera as USD 499, although it really costs 30 USD less, which you can only see if you put that camera in the shopping cart. Here's the explanation - what's wrong with the price being an advertisement? Maybe someone can shed some light on this for me.
21:42 15/5/2546
... ... ... i am buying a ticket from an airline, which sells it to another company first. ... scratch scratch ... ...
TOPIC: DATA-MINING
i don't think it's illegal. free info forever. but instead of beating their price, they should maybe
extend their services. i assume, it's something with travel, so who cares if it's cheaper, if
the other guy has the way cooler tours?
i do data-mining all the time.
how about this: i check what tour they offer and do it on my own
maeh: i'm not allowed to LOOK! what?
so why are they posting their prices on the web? hire a secretary. call 1-800-travelling, or something.
-
it might be, that this fight between the two companies is a "personal" one. maybe they used
to be one company until the TWO bosses got in a fight and they separated and started their own
businesses. maybe it's personal
-
i would like to know how they found out they were data-mining?
-
so in the future all the computers are doing the work,
but who is actually earning money to be able to spend it? human nil, computer infinity?
-
i don't get it
really why is it cheaper? if i go to the airline direct it costs more
oh and why aren't the airlines buying ORBITZ and the other online ticket reservation services?
why haven't they got their own?
"something-bulk" guarantees a full airplane
maybe in future we just need ONE powerplant, ONE airplane, ONE car, ONE house, ONE shower for whole humankind.
I remember that story! Ironically enough, it was a Best Buy, not a circuit city. And they had the guy ARRESTED for trespassing because he refused to leave when the manager told him he had to leave.
it's not a contract if you don't agree. you read the license, you say 'no'. you then proceed to browse the site. it doesn't even have the power of a click through.
If I'm not allowed to use a computer program to automatically make web requests, then I don't see why they should be allowed to use a computer program to automatically respond to them. If they want the convenience of having a computer sit there answering anybody's query of 'how much does this cost?', then they should not be surprised when somebody writes a computer program to simplify the process of asking a number of different retailers what their prices are.
So, if they don't have a person sitting at a terminal in the server room personally typing in all the HTML for each web page, I'm not going to type in URLs personally. Those terms and conditions are, of course, available to anybody who cares to look at my web site, so I don't see any reason why websites won't comply with them.
More seriously, web site terms and conditions are always written in appalling pseudo-terminology that talks about allowing people to 'access' the web site, but prohibiting them from 'downloading' content from it, or 'storing' it. Quite how one accesses content on a website without downloading and subsequently storing it (if only in my local computer RAM) is beyond me.
There's an implicit assumption here that using a web browser to generate and send your HTTP requests is okay, but any other program is not; quite how the border between browsers and non-browser user agents is drawn is completely ignored. The terminology used in the legal documentation should at least, surely, bear some relation to the terminology used in the HTTP RFCs. For example, I'd respect a web site whose T's and C's (which were perhaps available from a URL identified in a header in every HTTP response issued) said something like this:
'You may submit HTTP GET and POST requests to port 80 of the server provided they are correctly formed according to current IETF RFCs; HTTP responses transmitted by this web server must be interpreted in strict accordance with the prevailing IETF RFCs. The content of any response issued by the server is copyright this website.'
Frankly, any attempt to require any more than that on the part of your users is a futile effort on the web.
Ah well. I can dream.
If not clicking the click through actually prevented you from accessing the site, then the googlebot wouldn't be able to get in.
Amazon also marks certain items so that you cannot see the price unless you add it to the shopping basket then view your shopping basket. This makes it more difficult to automate mining for prices for some of the products. If companies see this type of price mining as an issue, I would not be surprised to see companies start putting their prices in bitmaps -- similar to some sites that ask you to retype a value displayed in a bitmap in order for the submit to work.
If the prices are made available by one means then that same means should be allowed to view them.
So if the prices are available to the average customer by allowing them to walk into the store, browse the aisles, then leave, then I should be allowed to do the same.
If the prices are available electronically then I should be able to read them electronically however I choose.
If I have to get the prices by speaking to a salesperson then that is the means that I need to use.
This is fair because the cost (time, money, inconvenience) is equitable across the board. If you want to go cheap on me by not putting a salesperson with every piece of merchandise then I'm allowed to go cheap on you and just look at the sign.
42 - So long and thanks for all the fish.
I think this has more to do with Price Fixing laws than anything. Especially if the gathering of competitive information is used to raise prices in some cases, rather than lower them.
Personally I think it's a dangerous business practice. Sooner or later your competitors will begin to do the same and your whole industry will suffer severe margin erosion.
Hey, that's what downsizing is for right?
I thought a court case had already been won which said basically "If you put it on the net, it isn't private/secret information anymore." Wasn't it about a company's report that was "leaked" when they put it on their web server but didn't intend people to read it yet?
I've received many rebates over the years. And yes, I paid sales tax, but that's not the purchase price. It's just the local gangsters taking their rake.
Mail? Put "slashdot" in the subject to pass the spam filters.
What if you change the client id of your data mining spider to something like "By granting me access you hereby waive all rights to privacy. Any data sent to me is mine to do with as I wish."
Then, if their web server still serves up the page to you it would seem to me that you would win a court case if it ever came to that.
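For reference, the "client id" here is just the User-Agent request header, and setting it is trivial. A minimal Python sketch follows; the bot name and contact URL are hypothetical placeholders, and nothing in it settles whether any text in that header has legal effect. It only shows where the string lives, and it builds the request without sending it.

```python
import urllib.request

# Hypothetical bot name and contact URL, for illustration only.
USER_AGENT = "ExamplePriceBot/0.1 (+http://example.com/bot-info)"


def build_request(url):
    """Build a GET request that identifies the client via the User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


# Nothing is sent over the network here; the request object is only inspected.
req = build_request("http://example.com/prices")
print(req.get_method(), req.full_url)
# urllib stores header names capitalized as "User-agent".
print(req.get_header("User-agent"))
```

A server is free to ignore the header entirely, so serving the page does not by itself show that anyone read it.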
"When the president does it, that means it's not illegal." - Richard M. Nixon
Another example of another nimrod seeking legal advice from /.ers instead of doing the smart thing and asking a lawyer.
It drives me crazy how everyone thinks that you can put anything in a license and if someone accepts/clicks-through/whatever then it is binding. If I put in my license that you are not allowed to breathe while viewing my site, and I have proof that you did so, I will not win a case against you. Even if you accepted that license. Even if you wrote me a letter telling me how you agree to every clause. Even if you specifically mention the breathing clause and how great you think it is, and how you hate people who breathe while viewing web sites. Just because I (or my lawyer) puts a restriction in a license, does not mean that I have the right to make that restriction. P
-P
Why have ONE conviction when you can have TWO?
Access to websites costs money because the bandwidth and systems aren't free.
On that basis alone, I believe they can restrict your access. Minimally, they can tell you to cease and desist.
When I worked at a major E-tailer, over 10% of our traffic was due to robot activity. That 10% utilization of the systems represents more 'hits' than the vast majority of websites get.
The bandwidth and systems cost to support that 10% were *substantial*.
I currently monitor the performance of 20 websites from my home broadband connection. This amounts to around 100 Gigabytes of traffic per month. At some point, I expect to be told by a site to *STOP*.
As for your service, for some time I have wanted to do an 'Orbitz watch' that would monitor specific route prices on a regular basis. Essentially, telling subscribers what they can reasonably expect to pay if they time it right and then telling them when to buy.
IANAL, but I'm sure there is a good reason why all search engines I've run across allow your site not to be spidered with robots.txt present.
Really, I'm not trying to be clever with my signature.
Currently, I am conducting some MIS research at the University of Arizona that does just this: mines information from a website and stores it to a database. I am mostly concerned with the pricing information; however, I do not believe it to be illegal to mine such info. I am thinking of some commercial applications I can use the information for.
If the information is posted on a webpage then it is public knowledge. Hence mining of this information is perfectly ok. Look at how a search engine crawler works.
austintsmith.com
There are some stores that actively discourage people from bringing in notebooks and PDAs to record their pricing information. While I wasn't working for anyone at the time, this happened to me about 2 years ago at a local Future Shop (division of Best Buy Canada). The experience suggests that some stores are very protective of their pricing info. I don't know to what extent this is legal.
I was shopping for a CD burner and a joystick at the time for Christmas (and maybe a little something for my own rig). I don't like shopping online and wanted to know what was available and at what prices locally, so I went around to stores gathering this info. Most didn't have a problem with this. One of the stores I stopped at was Future Shop. I was in the middle of writing down the information when I was suddenly surrounded by 3 salesmen in business suits asking me what I was doing. I told them, and one of them said I couldn't do that. Another fellow in a blue suit IDing himself as a manager came over and told me the same thing. He then said he wanted to know who I worked for, and wanted to see what I had written in my notebook. I told him I was shopping for myself and refused to hand over my notebook. I left without being challenged. Weirdest Christmas shopping experience I've ever had.
Since the manager joined in, I'm guessing the gestapo stuff is (or was, haven't been back in a while) store policy. It certainly didn't make me want to shop there again. The only thing I can think of is that their competitors often sent "operatives" into the store to collect pricing info and they tired of it. The only thing is, if that store didn't allow price comparisons, how the heck else can you find out what is available locally and at what prices?
=======================MCheu
-Interna
I guess we'll soon have really fucked-up e-commerce websites if more people start to use this technique and robots from multiple sites start to underbid each other.
Will they start to sell items for $-1??
I think the point here is being missed. Read the brief. There were confidentiality agreements involved which ultimately decided the direction of the original injunction. As for the second injunction, a third party cannot aid in the circumvention of an injunction. Period.
You or I could gather price data from sites and use it as a guideline without legal worries. Compiling and selling that data outright would probably get you in trouble because of its time-sensitive nature.
"Ask Slashdot: Is Data Mining for Product Pricing, Illegal?"
Who, wrote this, subject, William, Shatner?
Is Data's mining for product pricing illegal?
It is a far clearer sentence now.
This is just "web scraping".
You already know you are looking for very particular information (prices) which can reliably be found in a particular location. Data mining involves slicing and dicing data to discover new information which may or may not be there.
BTW, [and yes, IAAL], it doesn't make a whole lot of sense for non-lawyers to fret over dubious caselaw from what is probably some other jurisdiction with a very different set of facts.
Just because one judge was dumb enough to go along with a bad decision doesn't mean you have to worry what the next one would do. Don't worry, they'll send you a letter if they really don't like it, and then your legal department can worry about it. With very few exceptions, just collect your paycheck, live your life, and move on.
I was with a friend at Fry's where we were examining a number of DVDs to see if they had a few items in some out-of-print sets. Since I can never remember which of these items I am looking for I have a list on a PDA. After I pulled out the PDA and we started looking through the poorly-organized shelf of DVDs one of the obvious-looking Fry's "loss prevention" or whatever they are Fry's employees appeared. We ignored him and he didn't say anything either, just stood there looking. After a while another one showed up and they had a conversation together in some non-English language which ended with the 2nd one saying 'collectors' in heavily-accented English, after which time they both left. Don't know what would have happened if they had decided otherwise, or if they had actually spoken English well enough to communicate with customers.
I'm sure there are a few products that assist disabled persons to "surf" websites by dissecting the web page (through essentially screen scraping techniques) and performing one or more of the following:
1. Adjusting text size.
2. Dictation of content.
3. Numbering of links.
4. Numerous other alternate presentations of the same data (changing colors for the color blind, for example).
An outright ban of automated scraping techniques would eliminate these uses. (While I am at it: What is a web browser but a form of screen scraper?)
If the basic technique is allowed, all that can be debated is the use of such data and I think that is a much more dubious area. Facts are public domain.
Maybe they can use the "my bot is blind defense".
Check out Largest recipe database on the web.
Forget price protection for a moment -- the thing the stores don't like is specifically price scraping. Why do we have any justification to put up legal barriers to block price scraping? It benefits consumers and drives prices down. I'd call it a *positive* factor that is necessary for a free market.
Hell, I could see *requiring* retail outlets to make their prices publicly available.
May we never see th
It is VERY common practice to explicitly forbid the use of scrapers in the Ts&Cs you quickly pass over on such sites -- and perfectly reasonable. You are using that company's hardware and software under conditional use. Both Expedia and Travelocity prohibit ALL commercial use.
I'm not a lawyer, but I would cover my legal ass before I released a single line of code and I would be hesitant to proceed on this project until I talked to a lawyer. If it's a work-for-hire project, you probably have less of a liability problem.
And what would you put there - a semicolon?
Sheesh.
Here is a related incident:
http://news.com.com/2110-1017-944258.html
Bargain Network spidered real estate prices on homestore.com/realtor.com and posted them on the bargain.com website. Homestore sued and the case was settled out of court. I wish it had not been settled out of court, because a ruling would have set a precedent.
In my opinion you are asking for problems. Taking a case like this to court and winning would be difficult. At the very least it would be a serious legal expense.
The last time I checked the rules for Froogle you had to be the actual merchant that ships the product in order to show up in their index. If you are spidering a merchant then you are an affiliate, the products do not originate from you so you would be excluded from Froogle. Froogle does not allow you to sort products by price - so obviously what you plan on doing is different. Froogle also gives merchants the option to be excluded from their index.
My advice is this - get a lawyer because one will surely be contacting you. Familiarize yourself with these phrases: false advertising, breach of contract, and unfair competition.
Don't use gnuplot. Use Grace. It can be scripted and is much more powerful and flexible.
That is because part of the price calculation involves the second and third letters of the name or IP address. The reason for this is a trade secret, so nobody will tell you about it.
1) Use $1100 - $1000 rebates
2) Keep their prices and stock a secret.
3) Use deceptive advertising to lure me into the store.
The question in the article is bogus, it's like the whole "deep-linking" scare. What's next? Registration and signing an NDA to view some amateur site where the search feature doesn't work? BS. I'll shop somewhere else. Whatever, if it's publicly accessible and doesn't copy (or IFRAME) any content, then you can point to it. It's like the whole "you can't take pictures" or "write down prices" in our store BS.
I've worked retail... the best customer come-back ratio and most profitable stores are *always* UPFRONT and HONEST about their pricing, products, service and policies. To me, anything less is a store/business shooting itself in the foot. You might hood-wink a few customers some of the time, but word-of-mouth, the internet and economic forces will surely take you out.
That's my two pennies
The biggest trick the devil pulled was letting lawyers become politicians so they can write the laws.
... Digikey would let me sort by price.
;)
Therefore, I will spiderbot them.
You could just make all your prices on your site into OCR-hardened, dynamic images. But it's lame to force your customers to check every site. It's anti-competitive.
See, the problem is that Data's positronic brain and direct computer interface gives him an unfair advantage over humans who are mining for product pricing. So, the obvious solution was to make Data mining for product pricing, illegal.
A simple solution is to check a robots.txt file or similar. If there is none, it should be legal to grab prices online.
However, if through such a file, someone disallows price leeching, then people should respect that and be liable if they ignore those instructions.
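A rough Python sketch of that check, using the standard library's `urllib.robotparser`. The robots.txt contents, the bot name, and the URLs are hypothetical, and the file is parsed from a string so nothing touches the network; a real spider would fetch it from the site first. Note that robots.txt only expresses the site's wishes by path and user agent, it has no notion of "prices" as such.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real crawler would download this
# from the root of the site it intends to visit.
ROBOTS_TXT = """User-agent: *
Disallow: /prices/
"""


def may_fetch(robots_txt, user_agent, url):
    """Return True if the given robots.txt text allows user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


blocked = may_fetch(ROBOTS_TXT, "ExamplePriceBot", "http://example.com/prices/tv.html")
allowed = may_fetch(ROBOTS_TXT, "ExamplePriceBot", "http://example.com/about.html")
# An empty robots.txt places no restrictions on any path.
no_rules = may_fetch("", "ExamplePriceBot", "http://example.com/prices/tv.html")
print(blocked, allowed, no_rules)
```

Whether ignoring the file should make you liable is the legal question; the code only shows that honoring it takes a few lines.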
Unity in Diversity
No matter whether you tell your client or not, you've just announced to lawyers in the future that you knew about the legal situation. Lawyers looking for violations similar to what you described will be trying to figure out who you are. Let's hope that you twisted the story and are actually working in a different field (such as competitive Peeps prices).
You probably don't have to worry about it, but some consulting contracts contain clauses that might lead to trouble if your client is nailed for using this system because you did something illegal in creating it. Usually this is limited to IP violations (stealing code, violating patents) but the language can be pretty broad sometimes.
Lawyers are often 100% adversarial with people whom they detect cannot supervise them well. You MUST know the law yourself. You MUST think about what should and should not be legal. You cannot sensibly leave important questions about legality to people who make $350 per hour for dealing with confusing legal situations. There is a HUGE conflict of interest. Lawyers make more money if non-lawyers cannot understand the law.
Slashdot is a conversation web site, so, I would expect to see a lot of conversational grammar in use.
In the case of the headline, I believe that the comma represents 'a DRAMATIC pause'.
Much like:
The inhabitants of Mars, do they really care?
thank God the internet isn't a human right.
AARRRGGH. I hate ./ sometimes.
From the court summary of the decision:
The court affirmed a preliminary injunction enjoining defendant Zefer Corporation ("Zefer") from utilizing a "scrapper" tool it designed to obtain pricing information from plaintiff's website on the ground that Zefer was doing so to assist defendant Explorica, Inc. ("Explorica"), which was itself enjoined from such activity by virtue of its improper use of confidential information obtained from plaintiff to aid it in gathering this information.
This means that Zefer was prohibited from mining the data because their client (Explorica) was prohibited, because Explorica and Zefer had gained access to the data by exploiting confidential information. Which is another issue entirely from data mining...
me too!
from link:
In a blatant misuse of the Digital Millennium Copyright Act, over the past two weeks a group of national retailers forced FatWallet.com (www.fatwallet.com) to remove Day After Thanksgiving sales information from its site. In letters sent to FatWallet, each retailer claimed that the Copyright Act gives it a monopoly over this price data. Today, the Samuelson Clinic and Gray Matters, on behalf of FatWallet.com, challenges those letters as abuses of federal law, insists on damages, and refuses to disclose identifying information on the individuals who posted the sales information. For more about this issue, please read the press release [PDF], FatWallet's online story or the Chilling Effects story.