Wal-Mart's Data Obsession
g8oz writes "The New York Times covers Wal-Mart's obsession with collecting sales data.
Fun fact: 'Wal-Mart has 460 terabytes of data stored on Teradata mainframes, at its Bentonville headquarters.
To put that in perspective, the Internet has less than half as much data, according to experts.'
That much information results in some interesting data-mining. Did you know hurricanes increase strawberry Pop Tarts sales 7-fold?"
I agree,
Wouldn't Walmart's records constitute some part of the internet also? It has to be connected to the internet at some point, and given some clever haXing skills... one could access it.
It really depends on your definition of the bounds of the internet, but I think someone is being hyperbolic.
When you have 460TB of data, how the hell do you even begin to search it?
Seems like they'd need to license MapReduce from Google or something. (That's a distributed data-processing engine. With extremely high fault tolerance, to boot.)
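For anyone wondering what MapReduce actually does, here is a minimal single-process sketch of the programming model: a map step emits key/value pairs, a shuffle groups them by key, and a reduce step combines each group. The sales records and function names are hypothetical illustrations, not Google's implementation or Wal-Mart's data.

```python
from collections import defaultdict
from typing import Iterable, Iterator

# Hypothetical sales records: (store_id, item, quantity). Not real Wal-Mart data.
SALES = [
    ("store-1", "strawberry pop-tarts", 3),
    ("store-2", "flashlights", 5),
    ("store-1", "strawberry pop-tarts", 4),
    ("store-3", "flashlights", 1),
]


def map_phase(record: tuple[str, str, int]) -> Iterator[tuple[str, int]]:
    """Map: emit one (item, quantity) pair per sales record."""
    _store, item, qty = record
    yield item, qty


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group every emitted value under its key."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine all values for one key into a single total."""
    return key, sum(values)


def map_reduce(records: Iterable[tuple[str, str, int]]) -> dict[str, int]:
    intermediate = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(k, vs) for k, vs in shuffle(intermediate).items())


totals = map_reduce(SALES)
print(totals)  # {'strawberry pop-tarts': 7, 'flashlights': 6}
```

On a real cluster the map and reduce calls run on many machines at once, and the framework simply re-runs any task whose machine fails; that re-execution is where the fault tolerance comes from.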
...Microsoft has an astonishing amount of information collected from Windows Update users (none of it personally identifiable, of course).
I highly suspect Wal-Mart didn't get to be the largest retailer by being stupid, at least business-wise. This is the sort of project that lets them stock a 120,000-square-foot big-box store from JIT shipments every night, and it's why every Wal-Mart in a region looks the same. Though I would be interested to read more on the Pop-Tart to hurricane correlation...
Correlation doesn't imply causation!!!!!
I mean what if a third factor caused both the hurricanes and strawberry Pop Tart sales to increase 7-fold????
Somebody was going to blurt that bromide out at that statement, so it may as well be me.
Seastead this.
That 460 terabyte mark sounds fishy.
There are 300 million Americans. Let's include Canada and a couple of other countries Wal-Mart is in to make it a round number.
460,000,000,000,000 / 300,000,000 ~= 1,500,000
1.5 megabytes per person?? I don't believe the average person has generated 1.5 megabytes of data at Wal-Mart! If you listed every single item I ever bought at ANY store, even including timestamps, it would not reach 1.5 megs! These figures must be exaggerated and include a lot of redundancy.
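The per-person division above can be checked directly. This sketch uses the commenter's figures (460 TB, 300 million people); since the article doesn't say whether "terabyte" means 10^12 or 2^40 bytes, both readings are shown as an assumption.

```python
# Figures from the comment above; the population is the commenter's round number.
PEOPLE = 300 * 10**6

decimal_total = 460 * 10**12  # 460 TB with 1 TB = 10^12 bytes
binary_total = 460 * 2**40    # 460 TB read as tebibytes (2^40 bytes), an assumption

bytes_per_person = decimal_total / PEOPLE
binary_bytes_per_person = binary_total / PEOPLE

print(f"decimal: {bytes_per_person:,.0f} bytes ~= {bytes_per_person / 10**6:.2f} MB per person")
print(f"binary:  {binary_bytes_per_person:,.0f} bytes ~= {binary_bytes_per_person / 10**6:.2f} MB per person")
```

Either way it lands at roughly 1.5 to 1.7 MB per person, so the "~= 1,500,000" figure holds.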
My brother sells mangoes to the Wal-Mart beast. He says it's all computerized, from the order for the fruit to tracking the trucks; even the rotation of ripening fruit in the warehouses is computer-managed. It's as close to virtual management as any company comes.
Anyone seen my jagged little pill?
Imagine what evil could be done with this data: how about a service where you can track your spouse's/SO's buying habits? See if they buy condoms and flowers every night they work late for example. Imagine what would happen if they started keeping track of fingerprint data off of cash/checks that people use in stores too. Well I am off to go buy some tin foil now (with cash, wearing gloves) :-)
I Am My Own Worst Enemy
The Law of truly large numbers.
Basically, the more data you have, the more likely you'll find weird coincidental correlations.
I guess these kinds of 'statistical findings' will become more and more prevalent in the future, given that we're living in an age where we're collecting ever-larger amounts of data, and have the resources to process all this data automatically.
It would be a good thing if people were a bit more sceptical of this kind of stuff. Correlation isn't causation.
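A small simulation makes the point concrete: generate many series of pure random noise, which by construction are unrelated, and count how many pairs still look strongly correlated. The series count, length, threshold, and seed are arbitrary choices for illustration.

```python
import math
import random
from itertools import combinations


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def spurious_correlations(n_series: int = 200, n_obs: int = 10,
                          threshold: float = 0.6, seed: int = 42) -> tuple[int, int]:
    """Return (pairs tested, pairs with |r| > threshold) among independent noise series."""
    rng = random.Random(seed)
    series = [[rng.gauss(0, 1) for _ in range(n_obs)] for _ in range(n_series)]
    pairs = strong = 0
    for a, b in combinations(series, 2):
        pairs += 1
        if abs(pearson(a, b)) > threshold:
            strong += 1
    return pairs, strong


pairs, strong = spurious_correlations()
print(f"{strong} of {pairs} pairs of unrelated series have |r| > 0.6")
```

Every "strong" pair here is coincidence, and the count grows with the number of series compared, which is exactly the worry with mining a huge sales database for correlations.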
I would assume this data is more than just shopping trends. I'd guess it includes surveillance photos, employee data, backups of it all, etc. If it is all shopping trends, they are either very observant or stalkers.
Firstly, there is no way they can be talking about all the data available on the internet. File-sharing networks alone have WAY more data than this, and when you add all the FTP servers and mirrors, the webmail archives, the home Windows users with insecure shares...
There is no way this can be true. Even if you ONLY take publicly available WWW pages, it would far exceed their measly estimate.
If it's in your sig, it's in your post.
The gentleman who gave me the tour indicated they have something like 72 weeks (1 year plus 2 weeks)
According to Google:
1 year = 52.177457 weeks
So 72 weeks is 1 year plus 19.822543 weeks.
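The Google figure can be reproduced by dividing the length of a year by 7. The sketch assumes Google used the mean tropical year of 365.242199 days, which gives exactly the 52.177457 quoted above.

```python
DAYS_PER_YEAR = 365.242199  # mean tropical year; assumed basis of Google's figure

weeks_per_year = DAYS_PER_YEAR / 7
remainder = 72 - weeks_per_year
year_plus_two = weeks_per_year + 2

print(f"1 year = {weeks_per_year:.6f} weeks")
print(f"72 weeks = 1 year + {remainder:.6f} weeks")
print(f"1 year + 2 weeks = {year_plus_two:.2f} weeks")
```

Going the other way, "1 year plus 2 weeks" works out to about 54 weeks, so the 72 and the parenthetical don't agree.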
"Did you know hurricanes increase strawberry Pop Tarts sales 7-fold?"
Did you know that correlation does not imply causation?
what?
Pointless comparison. There's hardly that much non-redundant, noiseless, or meaningful data in the Wal-Mart database either.
That means that the internet has well over a petabyte of information on it. Much of the information is probably the same, but it is on the internet.
Well, full of some kind of prepared meats anyway. It's really impossible to quantify the amount of information that is available via the Internet. Even public databases don't necessarily publish how large they actually are. And besides, 460 TB sounds like an awful lot, but it really isn't when you think about it. Banks have data stores of that magnitude, and so do research institutions of various sorts (weather and geophysics alone account for a huge quantity of data). Governments are famous for squirreling things away (they also have other things in common with squirrels, but we won't get into that right now). Hell, even law firms have immense data storage needs. NASA could probably teach Wal-Mart a thing or two about really big data stores. This whole business of the British Health Ministry (is that the one?) wanting to computerize all of their medical records will dwarf Wal-Mart if it ever gets off the ground. Don't really see why this is newsworthy, in and of itself.
The higher the technology, the sharper that two-edged sword.
No, working there means your income has dropped 7-fold.
The government which is strong enough to protect you from everything is strong enough to take everything from you.
WalMart's 460 TB of data, shared among about 300M Internet users, would spread about 1.5MB to each person. That is, of course, a tiny amount of data, probably just the indices on each person's inbox, let alone their email data itself. A new computer averages over 20GB of storage, and upgraded machines probably average about 80GB. So typical end-user computers alone account for at least 10,000 to 40,000 times WalMart's big data dump. And then of course there are all the other servers on the Internet, like the SABRE airline reservation system, the US Federal databases of publications, Google's image cache, all the albums and other MP3/SHN/FLACs in P2P, and of course the endless stream of porn.
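The end-user comparison above is easy to reproduce. This sketch plugs in the commenter's own numbers (300M users, 20GB or 80GB per machine, 460 TB) using decimal units; note it counts disk capacity, not the data actually stored on those disks.

```python
USERS = 300 * 10**6            # the commenter's Internet-user estimate
WALMART_BYTES = 460 * 10**12   # 460 TB in decimal units
GB = 10**9

# Ratio of total end-user disk capacity to Wal-Mart's store, per assumed disk size.
ratios = {disk_gb: USERS * disk_gb * GB / WALMART_BYTES for disk_gb in (20, 80)}

for disk_gb, ratio in ratios.items():
    print(f"{disk_gb} GB per user: {ratio:,.0f}x Wal-Mart's 460 TB")
```

That gives roughly 13,000x and 52,000x, both above the "at least 10,000 to 40,000" lower bounds, provided you count disks as full.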
WalMart is trying to make itself look like it is turning its customer data into success, and benefits for its customers. That serves to downplay its reliance on labor exploitation, monopolistic competition when it enters local markets, and political favors that structure labor and market laws to give it a competitive edge. And WalMart might just be believing the IT sales hype that it spends millions of dollars on. But that's no reason we should buy their IT BS as much as we seem to buy their wares.
--
make install -not war
But since Archive.org is archiving Internet stuff, that's just duplicates. What I'm interested in is the unique data on the Internet compared to Walmart's own DB.
Why should we be afraid of Wal-Mart? They're using their data to be more responsive to their customer. They want to make sure that if you want something, it's in-stock and ready to go.
What could they do with their data, really, that would hurt anyone? It wouldn't be like "Bob Smith is buying condoms again." It would be more like "there's a condom spike in ZIP code 78750 every Thursday, let's ship more out."
People who are afraid of data aggregation are jumping at shadows. Nobody cares what you in particular are buying. An individual as a data point is useless, unless you're an exemplar or something like that (which would be unusual).
Let's face it, individuals just aren't that interesting. More importantly from Wal-Mart's point of view, there's no return on looking at individuals.
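The kind of aggregate query described above never needs a customer identity at all. Here is a minimal sketch with hypothetical, anonymous line items (date, ZIP code, item) that counts units per ZIP, weekday, and item and picks out the biggest bucket.

```python
from collections import Counter
from datetime import date

WEEKDAYS = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")  # date.weekday(): Monday == 0

# Hypothetical anonymous line items: (sale date, ZIP code, item). No customer IDs.
SALES = [
    (date(2004, 11, 4), "78750", "condoms"),   # Thursday
    (date(2004, 11, 4), "78750", "condoms"),
    (date(2004, 11, 11), "78750", "condoms"),  # Thursday
    (date(2004, 11, 11), "78750", "condoms"),
    (date(2004, 11, 8), "78750", "condoms"),   # Monday
]

# Aggregate units by (ZIP, weekday, item); individual buyers never appear.
units = Counter((zip_code, WEEKDAYS[d.weekday()], item) for d, zip_code, item in SALES)
busiest = units.most_common(1)[0]
print(busiest)  # (('78750', 'Thu', 'condoms'), 4)
```

The output is a stocking signal ("more on Thursdays in 78750"), which is the commenter's point: the useful result is a bucket, not a person.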
The internet has well over a petabyte of data in it.
It has far less actual information...
Look at suprnova.org. The number of unique data sets is the number of torrents. They don't publish the total size of all torrents, but suppose the average is 300 MB. Multiply by the number of torrents (bottom of page), and you get about 100 TB right there.
If you instead count a copy for every seeder, it's more like 2 PB, just not all unique.
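The estimate above boils down to two sums: unique bytes are the sum of torrent sizes, and replicated bytes weight each size by its seeder count. The actual torrent count isn't given here, so this sketch only works backwards from the commenter's 300 MB, 100 TB, and 2 PB figures; the function name and the two-torrent sample are hypothetical.

```python
TB = 10**12
PB = 10**15
AVG_TORRENT_BYTES = 300 * 10**6  # the commenter's assumed average torrent size


def storage_estimate(torrents: list[tuple[int, int]]) -> tuple[int, int]:
    """torrents: (size_bytes, seeders) pairs. Returns (unique bytes, bytes across all seeders)."""
    unique = sum(size for size, _ in torrents)
    replicated = sum(size * seeders for size, seeders in torrents)
    return unique, replicated


# Working backwards from the comment's totals.
implied_torrents = 100 * TB / AVG_TORRENT_BYTES
implied_copies = 2 * PB / (100 * TB)
print(f"~{implied_torrents:,.0f} torrents, ~{implied_copies:.0f} seeded copies of each on average")

# Toy example: two 300 MB torrents with 10 and 30 seeders.
sample = storage_estimate([(AVG_TORRENT_BYTES, 10), (AVG_TORRENT_BYTES, 30)])
```

So the two figures imply roughly 333,000 torrents with around 20 copies each, which is the gap between "unique" and "just not all unique".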
Umm - Vlasic was under no obligation to do business with Walmart in any capacity, so if they did not think the deal was in their best interest, they were free not to enter into it.
Wal-Mart has never put a gun to a company's head and forced them to sell there. Vlasic management went into bankruptcy because they were willing to trade off profitable pickle lines to grow their volumes at Wal-Mart. All that data is what Wal-Mart does best: identify what consumers want and deliver it to them. Don't blame the messenger; blame the consumer.
Degaussing scares the bad magnetism out of the monitor and fills it with good karma.
Yes, but it's all the same data, still. If you only count UNIQUE data, the number is MUCH, MUCH lower.
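One way to count only unique data is content hashing: identical files produce identical digests, so each digest is counted once. This is a minimal sketch using SHA-256 from the standard library; the sample "files" are made up.

```python
import hashlib


def unique_bytes(blobs: list[bytes]) -> tuple[int, int]:
    """Return (total bytes, bytes left after dropping exact duplicates by SHA-256)."""
    seen: dict[str, int] = {}
    total = 0
    for blob in blobs:
        total += len(blob)
        # Keep the size of the first copy seen for each distinct digest.
        seen.setdefault(hashlib.sha256(blob).hexdigest(), len(blob))
    return total, sum(seen.values())


files = [b"same album rip" * 1000, b"same album rip" * 1000, b"a unique home page"]
total, unique = unique_bytes(files)
print(total, unique)  # 28018 14018
```

This only catches byte-identical copies; the same album re-encoded at a different bitrate still counts as unique, so real "unique information" is lower still.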