Wal-Mart's Data Obsession
g8oz writes "The New York Times covers Wal-Mart's obsession with collecting sales data.
Fun fact: 'Wal-Mart has 460 terabytes of data stored on Teradata mainframes, at its Bentonville headquarters.
To put that in perspective, the Internet has less than half as much data, according to experts.'
That much information results in some interesting data-mining. Did you know hurricanes increase strawberry Pop Tarts sales 7-fold?"
When you have 460TB of data, how the hell do you even begin to search it?
Seems like they'd need to license MapReduce from Google or something. (That's a distributed data correlation engine. With extremely high fault tolerance, to boot.)
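For anyone who hasn't seen the idea, here's a toy, single-process sketch of the map/reduce pattern in Python. It is not Google's implementation, and it says nothing about what Wal-Mart actually runs. The sales records and the names map_sale, reduce_counts and map_reduce are made up for illustration. The real thing spreads the map and reduce steps over many machines and reruns the pieces that fail.

```python
from collections import defaultdict

# Hypothetical sales records: (store_id, product, units_sold).
SALES = [
    ("store_1", "strawberry pop-tarts", 7),
    ("store_1", "bottled water", 12),
    ("store_2", "strawberry pop-tarts", 3),
    ("store_2", "bottled water", 5),
    ("store_3", "strawberry pop-tarts", 4),
]

def map_sale(record):
    """Map step: emit (key, value) pairs for one input record."""
    _store, product, units = record
    yield product, units

def reduce_counts(key, values):
    """Reduce step: combine all values that share a key."""
    return key, sum(values)

def map_reduce(records, mapper, reducer):
    """Run map, group by key (the 'shuffle'), then reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return dict(reducer(key, values) for key, values in groups.items())

totals = map_reduce(SALES, map_sale, reduce_counts)
print(totals)
```

Because each record is mapped independently and each key is reduced independently, both steps can in principle be split across as many machines as you have, which is the whole appeal at 460TB.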
...Microsoft has an astonishing amount of information collected from Windows Update users (none of it personally identifiable, of course).
I strongly suspect Wal-Mart didn't become the largest retailer by being stupid, at least business-wise. This is the sort of project that allows them to stock a 120,000 square-foot big box store from JIT shipments every night, and why every Wal-Mart in a region looks the same. Though I would be interested to read more on the Pop-Tart-to-hurricane correlation...
Correlation doesn't imply causation!!!!!
I mean what if a third factor caused both the hurricanes and strawberry Pop Tart sales to increase 7-fold????
Somebody was going to blurt that bromide out at that statement, so it may as well be me.
Seastead this.
My brother sells mangoes to the Wal Mart Beast. He says it's all computerized, beginning with an order for the fruit, following the trucks, even the rotation of the ripening process in the warehouses is computer related. It's as close to virtual management as any company comes.
Anyone seen my jagged little pill?
The Law of truly large numbers.
Basically, the more data you have, the more likely you'll find weird coincidental correlations.
I guess these kinds of 'statistical findings' will become more and more prevalent in the future, given that we're living in an age where we're collecting ever-larger amounts of data, and have the resources to process all this data automatically.
It would be a good thing if people were a bit more sceptical of this kind of stuff. Correlation isn't causation.
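A quick way to see the effect is to correlate pure noise against pure noise. The sketch below is only an illustration with made-up sizes (10 data points per series, 2000 series) and has nothing to do with Wal-Mart's real data. It compares the best match found among 5 random series with the best match found among 2000.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def strongest_correlation(target, candidates):
    """Largest absolute correlation between target and any candidate."""
    return max(abs(pearson(target, c)) for c in candidates)

rng = random.Random(0)  # fixed seed so the run is repeatable
N_POINTS = 10           # assumed: e.g. ten observations per series
N_SERIES = 2000         # assumed: e.g. two thousand unrelated products

# Everything here is pure noise: no series is related to any other.
target = [rng.gauss(0, 1) for _ in range(N_POINTS)]
noise = [[rng.gauss(0, 1) for _ in range(N_POINTS)] for _ in range(N_SERIES)]

few = strongest_correlation(target, noise[:5])
many = strongest_correlation(target, noise)
print(f"best match among 5 series:    {few:.2f}")
print(f"best match among 2000 series: {many:.2f}")
```

The more unrelated series you search, the stronger the best "finding" looks, even though nothing here is connected to anything else.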
Firstly, there is no way they can be talking about all the data available on the internet. Filesharing networks alone have WAY more data than this, and when you add all the FTP servers and mirrors, the webmail archives, the home Windows users with insecure shares...
There is no way this can be true. Even if you ONLY take publicly available WWW pages, it would far exceed their measly estimate.
If it's in your sig, it's in your post.
The gentleman who gave me the tour indicated they have something like 72 weeks (1 year plus 2 weeks)
According to Google:
1 year = 52.177457 weeks
So 72 weeks is 1 year plus 19.822543 weeks.
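The subtraction is easy to check. This is just the arithmetic from the lines above, using the weeks-per-year figure quoted from Google; the variable names are mine.

```python
WEEKS_PER_YEAR = 52.177457  # figure quoted from Google above
quoted_weeks = 72           # the number given on the tour

extra_weeks = quoted_weeks - WEEKS_PER_YEAR
print(f"{quoted_weeks} weeks = 1 year plus {extra_weeks:.6f} weeks")
```

For comparison, one year plus two weeks would be about 54.18 weeks, not 72.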
That means that the internet has well over a petabyte of information on it. Much of the information is probably the same, but it is on the internet.
No, working there means your income has dropped 7-fold.
The government which is strong enough to protect you from everything is strong enough to take everything from you.
WalMart's 460 TB of data, shared among about 300M Internet users, would spread about 1.5MB to each person. That is, of course, a tiny amount of data - probably just the indices on each person's inbox, let alone their email data itself. Each of those people has an average storage capacity of over 20GB on new computers, excluding upgrades, which are probably usually about 80GB. So typical end-user computers alone account for at least 10,000 - 40,000 times WalMart's big data dump. And then of course there are all the other servers on the Internet, like the SABRE airline reservation system, the US Federal databases of publications, Google's image cache, all the albums and other MP3/SHN/FLACs in P2P, and of course the endless stream of porn.
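Here is that back-of-the-envelope math as a few lines of Python, for anyone who wants to plug in their own guesses. The 460 TB figure is from the article; the 300M users and the 20GB and 80GB per-user disk sizes are the estimates from this post, and decimal units (1 TB = 10^12 bytes) are an assumption.

```python
TB = 10**12
GB = 10**9
MB = 10**6

walmart_bytes = 460 * TB        # figure from the article
internet_users = 300 * 10**6    # estimate from the post, not a measured value

# Wal-Mart's data divided evenly among all Internet users.
per_user_mb = walmart_bytes / internet_users / MB

def ratio_to_walmart(gb_per_user):
    """Total end-user disk space as a multiple of Wal-Mart's 460 TB."""
    return gb_per_user * GB * internet_users / walmart_bytes

print(f"Wal-Mart data per user: {per_user_mb:.2f} MB")
for gb in (20, 80):
    print(f"{gb} GB per user -> {ratio_to_walmart(gb):,.0f}x Wal-Mart")
```

With these inputs it comes out to roughly 1.5 MB per user and multiples of about 13,000 and 52,000, so the low end of the 10,000 - 40,000 range holds and the high end is, if anything, conservative.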
WalMart is trying to make itself look like it is turning its customer data into success, and benefits for its customers. That serves to downplay its reliance on labor exploitation, monopolistic competition when it enters local markets, and political favors that structure labor and market laws to give it a competitive edge. And WalMart might just be believing the IT sales hype that it spends millions of dollars on. But that's no reason we should buy their IT BS as much as we seem to buy their wares.
--
make install -not war
Why should we be afraid of Wal-Mart? They're using their data to be more responsive to their customer. They want to make sure that if you want something, it's in-stock and ready to go.
What could they do with their data, really, that would hurt anyone? It wouldn't be like "Bob Smith is buying condoms again." It would be more like "there's a condom spike in ZIP code 78750 every Thursday, let's ship more out."
People who are afraid of data aggregation are jumping at shadows. Nobody cares what you in particular are buying. An individual as a data point is useless, unless you're an exemplar or something like that (which would be unusual).
Let's face it, individuals just aren't that interesting. More importantly from Wal-Mart's point of view, there's no return on looking at individuals.