This issue distills down to one thing: bots vs. people manually using browsers, and the differing load each puts on servers, bandwidth, etc. More importantly, their potential load.
I don't believe eBay's objection was to having their content show up somewhere else, because people will still come back to eBay to close an actual transaction; rather it was that resources were being measurably soaked up by bot traffic, something in the 5 to 10% range at times, as I recall.
As that proportion grows further, and it certainly will, the potential for denial of service enters in, leaving the people the site was designed for, People Manually Using Browsers, increasingly underserved.
A billion simultaneous hits from people might melt them down, but eBay would accept the challenge and scramble to meet the needs of their customers, a.k.a. People. They shouldn't have to double the size of the farm to accommodate bots.
eBay and every other site have designed an experience for People and should be allowed to restrict it to People if they choose. Sites that do business based on pageviews and clickstreams will lose their livelihood if they can no longer document an audience of eyeballs vs. perl scripts. Darwin would even approve, in that, for example, shopping sites that choose not to be part of The Harvested would fall off the radar of the MySimon.coms of the world, and subsequently might fall off altogether.
I work with some other large sites on both sides of the issue, where we develop bots to mine specific content, and where we watch others come in and do the same. The rules that work fine for us and should for everyone else:
- Use robots.txt; implement your own, and respect others'.
- Throttle yourself; don't open 30 simultaneous connections just because you can and it'll get done faster; behave just like a Person Manually Using A Browser instead of a GulpBot and you won't become a problem that has to be dealt with (or eventually legislated against).
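For the bot-writing side, the two rules above might be sketched roughly like this in Python. This is only an illustration, not what we or anyone else actually run: `PoliteFetcher`, the `ExampleBot` user-agent, the two-second default delay and the sample robots.txt lines are all made up for the example. The only real API used is the standard library's `urllib.robotparser`.

```python
import time
import urllib.robotparser


class PoliteFetcher:
    """Hypothetical helper: checks robots.txt rules and spaces out requests."""

    def __init__(self, robots_lines, user_agent="ExampleBot", min_delay=2.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.user_agent = user_agent
        # Parse robots.txt content that has already been downloaded.
        self.parser = urllib.robotparser.RobotFileParser()
        self.parser.parse(robots_lines)
        # If the site declares a Crawl-delay longer than ours, honor it.
        declared = self.parser.crawl_delay(user_agent)
        self.delay = max(min_delay, declared or 0)
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def allowed(self, url):
        """Rule 1: respect the other site's robots.txt."""
        return self.parser.can_fetch(self.user_agent, url)

    def wait_turn(self):
        """Rule 2: one request at a time, with a pause between them."""
        if self._last is not None:
            remaining = self.delay - (self._clock() - self._last)
            if remaining > 0:
                self._sleep(remaining)
        self._last = self._clock()


# Made-up robots.txt content, for illustration only.
SAMPLE_ROBOTS = [
    "User-agent: *",
    "Disallow: /private/",
    "Crawl-delay: 5",
]
```

A bot built on this would call `allowed(url)` before every request, skip anything disallowed, and call `wait_turn()` right before each fetch, keeping a single connection open rather than 30.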
It's a scheme based on courtesy, followed next by civil suits between disputing parties, and hopefully nothing further. A blanket treatment by the courts will likely be an all-or-nothing-ish ruling, giving bots access to everything or nothing; either case would be a disaster.
Again, the robots.txt scheme in use today works.
-- deej