Slashdot Mirror


Wal-Mart's Data Obsession

g8oz writes "The New York Times covers Wal-Mart's obsession with collecting sales data. Fun fact: 'Wal-Mart has 460 terabytes of data stored on Teradata mainframes, at its Bentonville headquarters. To put that in perspective, the Internet has less than half as much data, according to experts.' That much information results in some interesting data-mining. Did you know hurricanes increase strawberry Pop Tarts sales 7-fold?"

32 of 581 comments

  1. Re:I would have thought that the Internet had more by nerd256 · · Score: 1, Insightful

    I agree,
    Wouldn't Walmart's records constitute some part of the internet also? It has to be connected at some point to the internet, and given some clever haXing skills... one could access it.

    It really depends on your definition of the bounds of the internet, but I think someone is being hyperbolic.

  2. economies of scale by man_ls · · Score: 4, Insightful

    When you have 460TB of data, how the hell do you even begin to search it?

    Seems like they'd need to license map-reduce from Google or something. (That's a distributed data correlation engine. With extremely high fault tolerance, to boot.)

    1. Re:economies of scale by Sexy+Bern · · Score: 5, Insightful

      More to the point - how do they back it up?

  3. And in other news... by wesmills · · Score: 4, Insightful

    ...Microsoft has an astonishing amount of information collected from Windows Update users (none of it personally identifiable, of course).

    I highly suspect Wal-Mart didn't get into the position it's in of being the largest retailer by being stupid, at least business-wise. This is the sort of project that allows them to stock a 120,000 square-foot big box store from JIT shipments every night, and why every Wal-Mart in a region looks the same. Though I would be interested to read more on the pop-tart to hurricane correlation...

  4. Correlation doesn't imply causation!!!!! by Baldrson · · Score: 5, Insightful
    Did you know hurricanes increase strawberry Pop Tarts sales 7-fold

    Correlation doesn't imply causation!!!!!

    I mean what if a third factor caused both the hurricanes and strawberry Pop Tart sales to increase 7-fold????

    Somebody was going to blurt that bromide out at that statement, so it may as well be me.

    1. Re:Correlation doesn't imply causation!!!!! by krymsin01 · · Score: 4, Insightful

      It makes sense though. If you are going to ride out a storm, you are going to need lots of food that requires neither refrigeration nor cooking.

      Beer makes sense also. There are always a hell of a lot of hurricane parties in Florida whenever a hurricane comes 'round.

      --
      stuff
    2. Re:Correlation doesn't imply causation!!!!! by zbyte64 · · Score: 3, Insightful

      Yes there could be a third reason, but let's think about this. When a hurricane comes, you want non-perishable foods. Computer geeks like myself like poptarts cuz you just open them up and eat em, and those things don't go bad for a while. No need for a microwave or stove, something you would want for soup and such. SO if a hurricane comes by and wipes out gas & electric and everything is friggen wet, you need something that requires no cooking or heating -> poptarts. Of course, 7-fold does seem a bit high.

  5. Re:Huh? by Anonymous Coward · · Score: 1, Insightful

    That 460 terabyte mark sounds fishy.

    There are 300 million americans. Let's include Canada and a couple other countries walmart is in to make it a round number.

    460,000,000,000,000 /300,000,000 ~= 1,500,000

    1.5 Megabytes per person?? I don't believe the average person has generated 1.5 megabytes of data at walmart! If you listed every single item I ever bought at ANY store and even included timestamps, this would not reach 1.5 megs! These figures must be exaggerated and include a lot of redundancy.
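
    The division above can be checked with a short Python sketch. The 300 million head count is the round number used in this comment, and reading a terabyte as 10^12 bytes is an assumption; neither figure comes from the article.

```python
# Rough per-person share of the reported 460 TB data store.
# Assumptions (from the comment, not the article): 300 million people,
# and 1 TB taken as 10**12 bytes.
TOTAL_BYTES = 460 * 10**12
PEOPLE = 300_000_000

bytes_per_person = TOTAL_BYTES / PEOPLE
megabytes_per_person = bytes_per_person / 10**6

print(f"{bytes_per_person:,.0f} bytes per person")
print(f"about {megabytes_per_person:.2f} MB per person")
```

    Under those assumptions the share comes out slightly above the 1.5 MB quoted, so the rounding in the comment holds.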

  6. even the mango is tracked by loid_void · · Score: 4, Insightful

    My brother sells mangoes to the Wal Mart Beast. He says it's all computerized, beginning with an order for the fruit, following the trucks, even the rotation of the ripening process in the warehouses is computer related. It's as close to virtual management as any company comes.

    --
    Anyone seen my jagged little pill?
  7. Just imagine by nizo · · Score: 2, Insightful

    Imagine what evil could be done with this data: how about a service where you can track your spouse's/SO's buying habits? See if they buy condoms and flowers every night they work late for example. Imagine what would happen if they started keeping track of fingerprint data off of cash/checks that people use in stores too. Well I am off to go buy some tin foil now (with cash, wearing gloves) :-)

  8. There's a name for this.. by k98sven · · Score: 4, Insightful

    The Law of truly large numbers.

    Basically, the more data you have, the more likely you'll find weird coincidental correlations.

    I guess these kinds of 'statistical findings' will become more and more prevalent in the future, given that we're living in an age where we're collecting ever-larger amounts of data, and have the resources to process all this data automatically.

    It would be a good thing if people were a bit more sceptical of this kind of stuff. Correlation isn't causation.
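
    The effect can be illustrated with a small simulation. This is only a sketch on made-up random data, not anything Wal-Mart does: one random "event" series is compared against many unrelated random "product" series, and the strongest correlation found by pure chance is reported. The function names, series length and seed are all hypothetical choices.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def strongest_chance_correlation(n_series, n_points, seed=0):
    """Largest |r| between one random series and n_series unrelated ones."""
    rng = random.Random(seed)
    event = [rng.gauss(0, 1) for _ in range(n_points)]
    best = 0.0
    for _ in range(n_series):
        product = [rng.gauss(0, 1) for _ in range(n_points)]
        best = max(best, abs(pearson(event, product)))
    return best


# Same seed, so the larger search contains the smaller one as its first 10 series.
few = strongest_chance_correlation(n_series=10, n_points=20)
many = strongest_chance_correlation(n_series=10_000, n_points=20)
print(f"best |r| among 10 random series:     {few:.2f}")
print(f"best |r| among 10,000 random series: {many:.2f}")
```

    Because both runs share a seed, the larger search includes every series from the smaller one, so its best chance correlation can only be equal or stronger. None of the series are related to the event at all.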

    1. Re:There's a name for this.. by sql*kitten · · Score: 4, Insightful
      It would be a good thing if people were a bit more sceptical of this kind of stuff.

      Ermm, RTFA.
      1. They predicted that pop tart sales would increase
      2. They shipped additional pop tarts in anticipation
      3. The pop tarts sold like, umm, hot pop tarts

      You can be skeptical all you want. Someone at Walmart made the call, and they were right.
    2. Re:There's a name for this.. by k98sven · · Score: 4, Insightful

      I did RTFA.

      And, firstly: that's not exactly a proper test.
      (Supply does create demand. Why do you think stores like building big pyramids of merchandise, and so on.. Hint: It's not just because it looks pretty.)

      Perhaps you should read my comment again and try to get the point. I wasn't necessarily being sceptical about pop-tarts. I was being sceptical about the method in general.

      Obviously some of the correlations they'll find are real too. That's not what I was referring to.

      What I was referring to, was that it's very easy to become blind to the statistics. To fall into the trap of seeing correlations where there are none. The human brain has a remarkable pattern-finding ability. Unfortunately that ability does lead us astray sometimes.
      (For instance reading human faces into natural formations, and so on)

      Besides this, the Wal-mart people probably aren't very interested in talking about the times their fancy new method failed, are they?

    3. Re:There's a name for this.. by Anonymous Coward · · Score: 1, Insightful
      Correlation isn't causation.

      Walmart doesn't care about causation. That is the realm of philosophers.

      Walmart cares only whether Item A is positively or negatively correlated with Event B. If Event B is likely to occur and there is a positive correlation between the sale of Item A and occurrence of Event B, they increase the stock of Item A. Find a negative correlation between the sale of Item A and the occurrence of Event B? Let the stock of Item A deplete and use the shelf space for an Item B whose sale is positively correlated with Event B.
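
      The rule described above can be sketched in a few lines of Python. Everything here is hypothetical: the function names, the 0.3 threshold and the sales figures are invented for illustration, and nothing is known about how Wal-Mart actually implements this.

```python
def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs) ** 0.5
    spread_y = sum((y - mean_y) ** 2 for y in ys) ** 0.5
    return cov / (spread_x * spread_y)


def stocking_decision(event_flags, unit_sales, threshold=0.3):
    """Decide what to do with an item when the event is forecast.

    event_flags: 1 for periods when the event occurred, 0 otherwise.
    unit_sales:  units of the item sold in the same periods.
    threshold:   hypothetical cut-off for treating a correlation as usable.
    """
    r = pearson(event_flags, unit_sales)
    if r > threshold:
        return "stock up"
    if r < -threshold:
        return "free the shelf space"
    return "leave as is"


# Made-up history: 1 marks a week with a hurricane warning.
hurricane_weeks = [0, 0, 1, 0, 1, 0, 0, 1]
pop_tart_sales = [10, 12, 70, 11, 75, 9, 10, 68]
ice_cream_sales = [50, 48, 5, 52, 6, 49, 51, 4]

pop_tart_call = stocking_decision(hurricane_weeks, pop_tart_sales)
ice_cream_call = stocking_decision(hurricane_weeks, ice_cream_sales)
print("pop tarts:", pop_tart_call)
print("ice cream:", ice_cream_call)
```

      Note that the rule never asks why the correlation exists, which is exactly the point being made here.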

    4. Re:There's a name for this.. by Gooba42 · · Score: 2, Insightful

      Just nitpicky...

      The previous post about the "flaw" of the correlation said, accurately, that correlation is not causation. Then you said this isn't a "real" correlation.

      This is a *real* correlation but whether it's causative is the only part that is suspect. Correlation is easy, *meaningful* correlation is not.

      --
      I just found out there's no such thing as the real world. It's just a lie you've got to rise above. - John Mayer
  9. Re:"Nothing for you to see here. Please move along by PKPerson · · Score: 2, Insightful

    I would assume this data is more than just shopping trends. I guess it includes surveillance photos, employee data, backups of it all, etc. If it is all shopping trends, they are either very observant or stalkers.

  10. Re:So, if Walmart put up a web interface... by Frnknstn · · Score: 5, Insightful

    Firstly, there is no way they can be talking about all the data available on the internet. Filesharing networks alone have WAY more data than this, and when you add all the FTP servers and mirrors, the webmail archives, the home Windows users with insecure shares...

    There is no way this can be true. Even if you ONLY take publicly available WWW pages, it would far exceed their measly estimate.

    --
    If it's in you sig, it's in your post.
  11. Re:Seen it! by SamMichaels · · Score: 4, Insightful

    The gentleman who gave me the tour indicated they have something like 72 weeks (1 year plus 2 weeks)

    According to Google:

    1 year = 52.177457 weeks

    So 72 weeks is 1 year plus 19.822543 weeks.

  12. correlation by gnuLNX · · Score: 1, Insightful

    "Did you know hurricanes increase strawberry Pop Tarts sales 7-fold?"

    Did you know that correlation does not imply causation?

    --
    what?
  13. Re:I would have thought that the Internet had more by Anonymous Coward · · Score: 1, Insightful

    Pointless comparison. There's hardly that much non-redundant, noiseless or meaningful data in the walmart database either.

  14. Re:I would have thought that the Internet had more by Brynath · · Score: 5, Insightful
    But the Internet Archive is on the internet right?

    That means that the internet has well over a petabyte of information on it. Much of the information is probably the same, but it is on the internet.

  15. Re:So, if Walmart put up a web interface... by ScrewMaster · · Score: 2, Insightful

    Well, full of some kind of prepared meats anyway. It's really impossible to quantify the amount of information that is available via the Internet. Even public databases don't necessarily publish how large they actually are. And besides, 460 TB sounds like an awful lot but it really isn't when you think about it. Banks have data stores of that magnitude, so do research institutions of various sorts (weather and geophysics alone account for a huge quantity of data), governments are famous for squirreling things away (they also have other things in common with squirrels but we won't get into that right now). Hell, even law firms have immense data storage needs. NASA could probably teach Wal-Mart a thing or two about really big data stores. This whole business of the British Health Ministry (is that the one?) that wants to computerize all of their medical records will dwarf Wal-Mart if it ever gets off the ground. Don't really see why this is newsworthy, in and of itself.

    --
    The higher the technology, the sharper that two-edged sword.
  16. Re:Yeah by vettemph · · Score: 3, Insightful

    No, Working there means your income has dropped 7-fold.

    --
    The government which is strong enough to protect you from everything is strong enough to take everything from you.
  17. WalMart BS by Doc+Ruby · · Score: 4, Insightful

    WalMart's 460 TB of data, shared among about 300M Internet users, would spread about 1.5MB to each person. That is, of course, a tiny amount of data - probably just the indices on each person's inbox, let alone their email data itself. Each of those people's average storage capacity is over 20GB on new computers, excluding upgrades, which are probably usually about 80GB. So just typical end user computers alone account for at least 10,000 - 40,000 times WalMart's big data dump. And then of course there are all the other servers on the Internet, like the SABRE airline reservation system, the US Federal databases of publications, Google's image cache, all the albums and other MP3/SHN/FLACs in P2P, and of course the endless stream of porn.
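
    The multiplier can be checked with a rough sketch, assuming the comment's own figures (300 million users, 20GB or 80GB of disk each) and decimal units, which the comment does not specify.

```python
# Total end-user disk capacity compared with the reported 460 TB.
# Assumptions: 300 million users, decimal units (1 GB = 10**9 bytes,
# 1 TB = 10**12 bytes); the per-user sizes are the comment's estimates.
USERS = 300_000_000
WALMART_BYTES = 460 * 10**12

ratios = {}
for disk_gb in (20, 80):
    total_user_bytes = USERS * disk_gb * 10**9
    ratios[disk_gb] = total_user_bytes / WALMART_BYTES
    print(f"{disk_gb} GB per user: about {ratios[disk_gb]:,.0f} times 460 TB")
```

    With these inputs the ratio works out to roughly 13,000 at 20GB per user and roughly 52,000 at 80GB, so the quoted range is the right order of magnitude and slightly conservative.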

    WalMart is trying to make itself look like it is turning its customer data into success, and benefits for its customers. That serves to downplay its reliance on labor exploitation, monopolistic competition when it enters local markets, and political favors that structure labor and market laws to give it a competitive edge. And WalMart might just be believing the IT sales hype that it spends millions of dollars on. But that's no reason we should buy their IT BS as much as we seem to buy their wares.

    --
    make install -not war

  18. Re:I would have thought that the Internet had more by Anonymous Coward · · Score: 1, Insightful

    But since Archive.org is archiving Internet stuff, that's just duplicates. What I'm interested is the unique data on the Internet compared to Walmart's own DB.

  19. Be Afraid? Why? by mveloso · · Score: 4, Insightful

    Why should we be afraid of Wal-Mart? They're using their data to be more responsive to their customer. They want to make sure that if you want something, it's in-stock and ready to go.

    What could they do with their data, really, that would hurt anyone? It wouldn't be like "Bob Smith is buying condoms again." It would be more like "there's a condom spike in area code 78750 every Thursday, let's ship more out."

    People who are afraid of data aggregation are jumping at shadows. Nobody cares what you in particular are buying. An individual as a data point is useless, unless you're an exemplar or something like that (which would be unusual).

    Let's face it, individuals just aren't that interesting. More importantly from Wal-Mart's point of view, there's no return on looking at individuals.

  20. Re:I would have thought that the Internet had more by Tony+Hoyle · · Score: 2, Insightful

    The internet has well over a petabyte of data in it.

    It has far less actual information...

  21. Can't be just 250 TB on the net by pyth · · Score: 2, Insightful

    Look at suprnova.org. The number of unique data sets is the number of torrents. They don't publish the total size of all torrents, but suppose you have an average 300 MB. Multiply by the number of torrents (bottom of page), and you get about 100 TB right there.

    If instead you look at the number of seeders, it is like 2 PB, just not all unique.
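
    The comment does not give the torrent or seeder counts it read off the site, so the sketch below only back-solves what the quoted totals would imply at the assumed 300 MB average. Decimal units are an assumption as well.

```python
# Back-solve the counts implied by the totals quoted in the comment.
# Assumptions: average torrent size of 300 MB (the comment's guess),
# decimal units (1 MB = 10**6, 1 TB = 10**12, 1 PB = 10**15 bytes).
AVERAGE_TORRENT_BYTES = 300 * 10**6
UNIQUE_BYTES = 100 * 10**12   # "about 100 TB" of unique data
SEEDED_BYTES = 2 * 10**15     # "like 2 PB" counting every seeder's copy

implied_torrents = UNIQUE_BYTES / AVERAGE_TORRENT_BYTES
implied_seeders = SEEDED_BYTES / AVERAGE_TORRENT_BYTES

print(f"implied torrents: about {implied_torrents:,.0f}")
print(f"implied seeders:  about {implied_seeders:,.0f}")
print(f"implied seeders per torrent: about {implied_seeders / implied_torrents:.0f}")
```

    Whether those implied counts match what the site actually listed cannot be confirmed from the comment alone.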

  22. Re:Walmart does drop your income by Anonymous Coward · · Score: 2, Insightful

    Umm - Vlasic is under no obligation to do business with Walmart in any capacity, so if they did not think the deal was in their best interest, they were free not to enter into it.

  23. Re:So, if Walmart put up a web interface... by Anonymous Coward · · Score: 1, Insightful
    Don't really see why this is newsworthy, in and of itself.
    The data collection isn't. The data analysis, and consequent application, is.
  24. Re:Walmart does drop your income by nelsonal · · Score: 2, Insightful

    Wal-Mart has never put a gun to a company's head and forced them to sell there. Vlasic management went into bankruptcy because they were willing to trade off profitable pickle lines to grow their volumes at Wal-Mart. All that data is what Wal-Mart does best: identify what consumers want and deliver it to them. Don't blame the messenger, blame the consumer.

    --
    Degaussing scares the bad magnetism out of the monitor and fills it with good karma.
  25. Re:So, if Walmart put up a web interface... by DrEldarion · · Score: 2, Insightful

    Yes, but it's all the same data, still. If you only count UNIQUE data, the number is MUCH, MUCH lower.