Wal-Mart's Data Obsession
g8oz writes "The New York Times covers Wal-Mart's obsession with collecting sales data.
Fun fact: 'Wal-Mart has 460 terabytes of data stored on Teradata mainframes, at its Bentonville headquarters.
To put that in perspective, the Internet has less than half as much data, according to experts.'
That much information results in some interesting data-mining. Did you know hurricanes increase strawberry Pop Tarts sales 7-fold?"
and shopping there means your income has dropped 7-fold
Who says how much data the Internet has available?
Get your own free personal location tracker
would like to welcome our new (evil) data collecting overlords.
"Sanity is not statistical", George Orwell, "1984"
My company alone has over 50 terabytes of data available for download on the internet. Whoever thinks there's that little data on the internet is very poorly-informed.
I'd be highly surprised if the internet combined didn't reach the exabyte mark ...
Sunny Dubey
you fools have no idea that I would never let you hurt the Wall-Mart
Someone at Walmart has A LOT of pr0n!
Walmart probably doesn't even know what all that data means. Think of the processing power needed to make sense of it all. I'm sure there are countless interesting trends that are lost in that data ocean.
-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-
MaxPower (2263)
"I got it from a hair dryer."
When you have 460TB of data, how the hell do you even begin to search it?
Seems like they'd need to license MapReduce from Google or something. (That's a distributed data correlation engine. With extremely high fault tolerance, to boot.)
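For anyone who hasn't seen the idea, here is a toy, single-process sketch of the map/shuffle/reduce pattern in Python. It is not Google's implementation and has no distribution or fault tolerance; the sales records and the function names are made up purely for illustration.

```python
from collections import defaultdict

# Hypothetical sales records: (store_id, item, units_sold).
SALES = [
    ("store-1", "pop-tarts", 3),
    ("store-2", "pop-tarts", 4),
    ("store-1", "beer", 2),
    ("store-2", "beer", 5),
    ("store-3", "pop-tarts", 1),
]

def map_phase(record):
    """Emit (key, value) pairs for one input record."""
    _store, item, units = record
    yield item, units

def shuffle(pairs):
    """Group values by key, the step a real framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Combine all values seen for one key into a single result."""
    return key, sum(values)

def map_reduce(records):
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

totals = map_reduce(SALES)
print(totals)
```

In a real cluster the map and reduce calls would run on many machines over separate chunks of the data; the sketch only shows the shape of the computation.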
the Internet has less than half as much data, according to experts
What's the word I'm looking for? Oh yeah - it's bullshit
...Microsoft has an astonishing amount of information collected from Windows Update users (none of it personally identifiable, of course).
I highly suspect Wal-Mart didn't get into the position it's in of being the largest retailer by being stupid, at least business-wise. This is the sort of project that allows them to stock a 120,000 square-foot big box store from JIT shipments every night, and why every Wal-Mart in a region looks the same. Though I would be interested to read more on the pop-tart to hurricane correlation...
they're storing them on a huge cluster of their $200 Lindows systems. ;)
Marge, get me your address book, 4 beers, and my conversation hat.
Correlation doesn't imply causation!!!!!
I mean what if a third factor caused both the hurricanes and strawberry Pop Tart sales to increase 7-fold????
Somebody was going to blurt that bromide out at that statement, so it may as well be me.
Seastead this.
The moderation on this guy amuses the hell out of me. Instead of saying "Why can't you be nice? -1 Troll" you say "Yeah, I know. -1 Redundant."
"Never attribute to malice that which can be adequately explained by stupidity." -- Hanlon's Razor
As a guest of WalMart I was able to enter their data center and see this Terraplex first hand. It's massive. It's thousands upon thousands of disks in ~8' frames, rows upon rows of racks. I walked down it and across it and up it and was simply awestruck by the idea of that many disks in one spot.
The gentleman who gave me the tour indicated they have something like 72 weeks (1 year plus 2 weeks) of purchase data on LIVE disk arrays, plus huge archives of the same data on tape. If you buy anything and use your credit, debit, or whatever card they can figure out your sales history obscenely quickly. Be afraid. Be very afraid.
I also got to see Walmart.com (Sun E15k) and Samsclub.com (A bunch of HP boxes in a smallish frame), they were creepy, in a sense... all those sales going on at once, converging on a spot not a few feet from me.
If Walmart created a web interface for their data, would the amount of data on the Internet suddenly triple?
I think the expert they got their information from was full of baloney.
--
RumorsDaily
I've been reading the comments
I forgot, are we supposed to hate Wal-Mart?
On one hand they are a large corporate empire and on the other, they promote cheap Linux computers.
Arg, I'm so confused
Yes I did. God help me!
Slashdot gets worse every day... Pipedot: News for nerds, without the corporate slant
A few years ago when I worked in retail, everything was going smoothly. Every night the managers would go around with electronic guns and see what needed ordering the next day. Except for the busiest times of the year the backroom was pretty much empty of stock, and on top of the aisles the extra stock was minimal.
Then one day, the managers were really excited, as we were going to have a computer order everything for us, from records of sales from before and it would "predict" what we would need. They said the extra stock on top of the aisles would be eliminated. We would be able to concentrate on customer service.
Well, the day came, and for a few months you could tell the computer was fighting with limited data. Some weeks we would be ridiculously overstocked on a few items; other weeks, the leading sellers in the store would have empty shelves. When it finally settled down after a year, it was worse than before the computer.
The top of aisles were jammed to the ceiling with stock, there was never any room to put anything up there, and getting to the bottom for something you needed cost a lot of time. Plus, the backroom was packed with stock. You could hardly move around, and trying to find the last box of something buried underneath these huge piles was a task that killed your morale. During the slow months, one stocker for the whole store was enough for a night, now 3 were common to deal with all the stock.
Google has 8E9 web pages and documents indexed. If the average document is 20 kB in length, then we have 160 TB of publicly available data on the internet, not including pictures and filesharing. The latter probably has a great deal of duplicate data anyway.
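The estimate above is easy to reproduce. A quick sketch in Python, assuming decimal units (1 TB = 10**12 bytes) and taking the 20 kB average page size as the guess it is:

```python
# Back-of-the-envelope estimate in decimal units (1 TB = 10**12 bytes).
PAGES_INDEXED = 8e9      # pages in Google's index, per the comment above
AVG_PAGE_BYTES = 20e3    # assumed average document size: 20 kB
WALMART_TB = 460         # figure quoted in the article summary

def estimate_terabytes(pages, avg_bytes):
    """Total size in TB for a given page count and average page size."""
    return pages * avg_bytes / 1e12

web_tb = estimate_terabytes(PAGES_INDEXED, AVG_PAGE_BYTES)
print(f"Indexed web: about {web_tb:.0f} TB")
print(f"Less than half of Wal-Mart's data? {web_tb < WALMART_TB / 2}")
```

With those inputs the indexed text does come in under half of 460 TB, which may be where the "experts" got their number. Change the average page size and the answer moves with it.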
Avantslash: low-bandwidth mobile slashdot.
Yeah, all that evil marketing data is really oppressing the masses and restricting the free flow of ideas.
Mathematics is made of 50 percent formulas, 50 percent proofs, and 50 percent imagination.
My brother sells mangoes to the Wal Mart Beast. He says it's all computerized, beginning with an order for the fruit, following the trucks, even the rotation of the ripening process in the warehouses is computer related. It's as close to virtual management as any company comes.
Anyone seen my jagged little pill?
Imagine what evil could be done with this data: how about a service where you can track your spouse's/SO's buying habits? See if they buy condoms and flowers every night they work late for example. Imagine what would happen if they started keeping track of fingerprint data off of cash/checks that people use in stores too. Well I am off to go buy some tin foil now (with cash, wearing gloves) :-)
I Am My Own Worst Enemy
The Law of truly large numbers.
Basically, the more data you have, the more likely you'll find weird coincidental correlations.
I guess these kinds of 'statistical finding' will become more and more prevalent in the future, given that we're living in an age where we're collecting ever-larger amounts of data, and have the resources to process all this data automatically.
It would be a good thing if people were a bit more sceptical of this kind of stuff. Correlation isn't causation.
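A small simulation makes the point. This sketch generates one random "target" series and then looks for the best-matching series among a pile of equally random candidates. Everything here is pure noise; the function names, series length, and seed are arbitrary choices for illustration.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def strongest_chance_correlation(n_series, length=10, seed=0):
    """Largest |r| between a random target and n_series random candidates."""
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(length)]
    best = 0.0
    for _ in range(n_series):
        candidate = [rng.gauss(0, 1) for _ in range(length)]
        best = max(best, abs(pearson(target, candidate)))
    return best

few = strongest_chance_correlation(10)
many = strongest_chance_correlation(1000)
print(f"best |r| among 10 random series:   {few:.2f}")
print(f"best |r| among 1000 random series: {many:.2f}")
```

Because the same seed is used, the first 10 candidates are identical in both runs, so the second number can never be smaller than the first. The more series you search, the better the best coincidence looks.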
Did you know?
EVERY TIME A LOAF OF BREAD IS BAKED,
APPROXIMATELY
150,000,000 YEASTS ARE
KILLED
Come to the award-winning 1987 film,
"The Very Small and Quiet Screams"
-- a cinematic electromicrograph of yeasts being baked.
A must for those who care about yeast, and especially for those who don't.
SPONSORED BY
Brown Anaerobe Rights Coalition (BARC)
Student Bakers for Social Responsibility
Coalition for the Elevation of Life (CELL)
Defend all life: "From greatest to least, from human to yeast!"
Help Fight SPAM today!
People who call themselves "experts" but are really just talking out of their asses do. Consider that The Internet Archive alone contains more than a petabyte (1024 terabytes) of data, all of it accessible, and that they are adding on the order of 20 terabytes a day, and you start realizing how much bigger the Web is.
Perhaps non redundant DATA?
I would assume this data is more than just shopping trends. I guess it includes surveillance photos, employee data, backups of it all, etc. If it is all shopping trends, they are either very observant or stalkers.
I hate to sound like some pro-totalitarian next generation Big Brother, but it's not as if they are collecting personal information on customers without the customer's consent. Wal-Mart are just doing some major (I agree with obsessive though) market research so that they can optimise their stores to maximise profits, exactly the same as every other business in the world.
Fat people are hard to kidnap
Coworkers who have worked with Wal-mart IT tell me that Wal-mart does indeed have mountains of data. However, they have so much data that they do not know what to do about it. They can't interpret it all because there is just too much of it.
This makes me wonder... there must be some ideal point where a certain amount of data collected is worth the most money because you can act on that data. After that point, collecting additional data is increasingly more costly and counterproductive unless you invest in an infrastructure that lets you process more data. How does one figure out that ideal point? Just a thought.
Wal-Mart employees who use their employee discount cards have every purchase tracked and monitored.
Activity of the cards is ACTUALLY monitored for discrepancies in buying habits to find abusive employees who buy things for their friends?
Did you also know Wal-Mart's employee name badges have RFID tags (and have had for many years) that allow Wal-Mart to track where an employee is at any given time?
Another interesting tidbit, did you know at Wal-Mart's jewelry warehouses they actually WEIGH the amount of metal in your body when you enter and leave? (And I don't mean they ask you to put things in a dish and weigh the dish - they scan YOU)
Another interesting thing, Wal-Mart has a fallout facility in Oklahoma that has a near-real-time backup of each BIT of that 460 terabytes of data?
Wal-Mart could survive a direct nuclear blast and still keep on a truckin'.
And, of course, if you're in a Wal-Mart home office - ISD building - distribution center - et al... and dial 911 - BOOM - you get Wal-Mart's private security? Niiice, hope it's not a real emergency, you first have to explain it to them - then if they deem it necessary THEY will call the REAL 911!
How the hell can they estimate that? Assuming "less than half" means about 45%, that gives us about 207 TB. Let's just round that up to 240.148445 TB to make it a nice, even number.
Google is searching 8,058,044,651 "webpages"* -- who knows what that means. Now, Google isn't searching every single page on the internet, certainly. But also, they can't be searching pages that don't exist. So the 8bn Google pages certainly aren't all of the internet. But Google isn't double or triple counting pages. Still, at 240.148445 TB (my rough estimate), we come up with a page size of exactly 32KB per page.**
Is this just counting the text? The code for this page right here (comments.pl) weighs in at about 14KB. Wal-Mart, in no way, has twice as much info as the internet. I would say the "internet" should be measured in at least petabytes. Archive.org itself already has 1PB, and I consider any of that content available to me "on the internet".
* I'm not even counting the Google cache.
** Which means Mr. Gates over-estimated by a factor of 20 when considering how much memory we all needed!
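The "nice, even number" above only works out if you count in binary units. A quick check in Python, assuming 1 TB here means 2**40 bytes and 1 KB means 2**10 bytes:

```python
# Binary units: the "exactly 32KB" result depends on these.
TIB = 2 ** 40   # bytes per (binary) terabyte
KIB = 2 ** 10   # bytes per (binary) kilobyte

pages = 8_058_044_651        # Google's page count quoted above
internet_tib = 240.148445    # the commenter's tongue-in-cheek estimate

bytes_per_page = internet_tib * TIB / pages
kib_per_page = bytes_per_page / KIB
print(f"{kib_per_page:.3f} KB per page")
```

With decimal units instead (10**12 bytes per TB), the same division gives roughly 29.8 kB per page, so the joke relies on the powers of two.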
Small potatoes make the steak look bigger.
Did you know hurricanes increase strawberry Pop Tarts sales 7-fold? ...and if you needed a 460 TB data array to tell you that then you're too stupid to live.
You're using her as bait, Master!
We learned a lot about Walmart and Data mining in my database 101 class. And the professor asks "Why do you think Walmart is so successful?"
And everyone says something about leveraging technology and JIT delivery, etc.
Professor Liu says "Nope. Location."
Walmart chose most of their initial locations in cities/regions where there was no other competition. Places where there was no Kmart, no department stores, no malls. And they flourished.
In the future, I would want to not be isolated from my friends in the Space Station.
Listen, if you really are that paranoid, pay in cash. Then there is no way for the evil Wal-mart overlords to find you and force you to buy more pop tarts.
I think the experts as in ex=former and spurt=drip under pressure.
Oolite: Elite-like game. For Mac, Linux and Windows
The Internet definitely has more data than Wal-Mart. Consider this old 2002 study. The "deep web" alone, made up mostly of databases, comprises 91,850 TB of data. And this was a couple years ago. It doesn't include email or P2P either.
The definition they used for "Internet" was probably "web pages indexed with a search engine" which is definitely not the entire Internet.
My company has 300,000 employees each of whom has about 40GB on their desktops. That's 12,000,000 GB which is 12,000 TB most of which is junk.
For which it stands, one store under God, indivisible, with sales and product for all.
From the article;
"You can see the pattern of Wal-Mart's mandates, and as Wal-Mart grows in power, it is getting more dictatorial.....Wal-Mart lives in a world of supply and command, instead of a world of supply and demand."
Take the cheese to sickbay, the doctor should see it as soon as possible - B'Elanna Torres, "Learning Curve"
Political parties are using consumer shopping patterns to figure out who to reach with 1-to-1 political messages.
Stuff like: women who buy from catalogs, eat "crunchy" peanut butter, own a cat and drive a minivan are 87% more likely to react positively to prayer in schools as a "motivating issue."
I just made that up, but it's the sort of thing they find out. No tin-foil hats here - corporations and pollsters are shelling out millions of dollars for this stuff.
Here are a few Google search links to get you started:
Acxiom
Seisint
Uh, except that Google hasn't indexed all of the publicly available WWW. It's only indexed a small fraction of it. And the WWW isn't the Internet. They're different. Secondly, the Internet Archive alone has archived 1 petabyte of data so the figure of 230 terabytes of data on the Internet is obviously wrong.
Support the First Amendment. Read at -1
... do they have a freezer big enough for 460TB worth of drives?
Perhaps you should switch to Wal-Mart. I hear Wal-Mart Pharmacy has the cheapest anti-psychotic medications in the US.
Hugh Hefner?!? Dude, didn't think you'd be posting anonymously! Share the wealth, man :)
Condemnant quod non intellegunt.
To put that in perspective, the Internet has less than half as much data, according to experts.
someone realized that the DB servers are actually accessible from the internet and then bam, instant 2x increase in the amount of data on the internet.
"Not knowing when the dawn will come, I open every door." - Emily Dickinson
That means that the internet has well over a petabyte of information on it. Much of the information is probably the same, but it is on the internet.
Also, don't forget that the internet includes Usenet and other services under the protocol, which has TONS of additional data. Chances are, the internet is not 230 terabytes large and the idiot who made that claim...is an idiot.
A blog like any other.
That much information results in some interesting data-mining. Did you know hurricanes increase [non-perishable food item] sales 7-fold?
It took them 460 terabytes of data to figure out that hurricanes make people buy more non-perishable food than usual?
Wow, data mining is "useful"...
You can't take the sky from me...
Ok, people are going to be without power for a while, possibly a long while, and Walmart predicted the sale of nearly unperishable dry goods would rise? My God, the sheer genius of it baffles me!
Call me when they can mathematically prove which flavors are most popular in a hurricane.
Hi! I make Firefox Plug-ins. Check 'em out @ https://addons.mozilla.org/en-US/firefox/addon/youtube-mp3-podcaster/
Your number is wrong, from their faq:
The Internet Archive Wayback Machine contains approximately 1 petabyte of data and is currently growing at a rate of 20 terabytes per month.
That's 20 terabytes per month, not per day.
They got their Internet statistics from the Chinese government.
First of all, most Walmarts don't primarily sell food, they primarily sell loads of other stuff. In fact, what they sell is a lot of stuff that people might need to survive a hurricane, including various kinds of hardware, containers, lights, reading material. So a hurricane would naturally drive lots of people into Walmart. Naturally those people will buy food products while they're in there, and the standard Walmart sells mostly junk food. So it's not as if people are seeking out pop-tarts in hurricane season, but the massive influx of people buying all kinds of things will also increase the number of people buying non-perishable junk food.
Consider also that people will not be worrying about their diets when they're primarily worried about not being killed by their own rooftops...
Combine a bunch of these factors together, and yes, I can easily believe 7x.
It's rare that you're presented with a knob whose only two positions are Make History and Flee Your Glorious Destiny.
That's 20 terabytes per month, not per day.
Even with that number, I wouldn't want to be the Hard Drive specialist...
Interviewer: Would you care to describe your previous job?
-Installing HDs 24/7.
I live in Soviet Canuckistan you insensitive clod!
If you are concerned about all this consumer information being used as 'big brother', maybe you ought to start doing something about it. Lying on the census or your income taxes is illegal, but marketers are fair game. The easiest way to mess with them is to tell them the opposite of the truth. Or, camouflage your true interests by entering a lot of junk. E.g., if you are pissed off that you didn't get a refund you were due from MicroCenter (notorious refund scammers) just fill out several hundred bogus refund forms. Jam the system.
If you're willing to break the law, you can even do worse harm. But I don't condone that.
Using legal methods to increase the entropy is the best way to fight the marketing databases.
300k employees all with desk jobs?
The internet has substantially more data than that. Heck, my ultra small hobby-company alone has around 1 TB on the internet. And privately I have around 0.5 TB shared over the internet from home. Then add all other small hobby companies, billions of webpages, colocation-servers, communities, p2p-"seeders" etc etc, and it will quickly pass 230 TB of data, many thousand times over.
No problem, drop on in!
As mentioned by a friend when referring to his video clip collection (but it doesn't help the videos/films he makes):
"Oh, I have a few frigabytes of data."
"Frigabyte? What's that?"
"Oh, that's a friggin lot of data."
WalMart's 460 TB of data, shared among about 300M Internet users, would spread about 1.5MB to each person. That is, of course, a tiny amount of data - probably just the indices on each person's inbox, let alone their email data itself. Each of those people's average storage capacity is over 20GB on new computers, excluding upgrades, which are probably usually about 80GB. So just typical end user computers alone account for at least 10,000 - 40,000 times WalMart's big data dump. And then of course there are all the other servers on the Internet, like the SABRE airline reservation system, the US Federal databases of publications, Google's image cache, all the albums and other MP3/SHN/FLACs in P2P, and of course the endless stream of porn.
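For anyone who wants to check the arithmetic, here is a sketch in Python using decimal units. The 300M user count and the 20GB and 80GB disk sizes are the rough guesses from the paragraph above, not measured figures.

```python
# Decimal units throughout.
MB, GB, TB = 10 ** 6, 10 ** 9, 10 ** 12

walmart_bytes = 460 * TB
users = 300_000_000          # rough count of Internet users (assumed above)

# WalMart's data spread evenly across every user.
share_mb = walmart_bytes / users / MB

def desktop_multiple(disk_gb):
    """How many times WalMart's data fits on all users' disks combined."""
    return users * disk_gb * GB / walmart_bytes

low = desktop_multiple(20)   # stock 20GB drives
high = desktop_multiple(80)  # upgraded 80GB drives

print(f"{share_mb:.2f} MB per user")
print(f"{low:,.0f}x to {high:,.0f}x WalMart's data on end-user disks")
```

The 20GB case lands around 13,000 times, in line with the range quoted above; the 80GB case comes out somewhat higher than 40,000.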
WalMart is trying to make itself look like it is turning its customer data into success, and benefits for its customers. That serves to downplay its reliance on labor exploitation, monopolistic competition when it enters local markets, and political favors that structure labor and market laws to give it a competitive edge. And WalMart might just be believing the IT sales hype that it spends millions of dollars on. But that's no reason we should buy their IT BS as much as we seem to buy their wares.
--
make install -not war
Walmart's storage breakdown (where 460TB goes)...
Illicit Pornography 200TB
Hidden Toilet Camera archive footage 100TB
Sys admin's private warez collection 80TB
Previous employees' records 60TB
CIO's mp3s 15TB
Sales Records 3TB
Records of Returned / Faulty Products 2TB
... More than 640 Terabytes anyway, right?
(did I just say that out loud?)...
Do you realise the volume of items Wal-Mart stores WORLD WIDE sell?
If anything, 460 TB seems like an understatement. Not to mention the claim that the Internet contains less than half of that. I alone have over a terabyte of shit downloaded from the Internet. I seriously doubt there are only 229 more terabytes to download.
Why should we be afraid of Wal-Mart? They're using their data to be more responsive to their customer. They want to make sure that if you want something, it's in-stock and ready to go.
What could they do with their data, really, that would hurt anyone? It wouldn't be like "Bob Smith is buying condoms again." It would be more like "there's a condom spike in area code 78750 every Thursday, let's ship more out."
People who are afraid of data aggregation are jumping at shadows. Nobody cares what you in particular are buying. An individual as a data point is useless, unless you're an exemplar or something like that (which would be unusual).
Let's face it, individuals just aren't that interesting. More importantly from Wal-Mart's point of view, there's no return on looking at individuals.
"With sufficient thrust, pigs fly just fine. However, this is not necessarily a good idea...."
RFC 1925
Yes. However, this data will be surrendered to authorities conducting a criminal investigation. Case in point: There was a case earlier this year that involved a criminal doing business using a payphone with an AT&T calling card. AT&T was able to track the point-of-sale of the calling card to a particular Wal-Mart (months after the sale). Wal-Mart used the barcode provided by AT&T to get a time and date (and register) of purchase. Wal-Mart then hit its massive security camera archive to see our suspected felon purchasing the card. He was ID'd and apprehended within a week.
Xenon, where's my money? -Borno
The internet has well over a petabyte of data in it.
It has far less actual information...
A friend who worked briefly @ a local Walmart during the downturn in tech employment told me about the huge datacenters. Evidently he was told this in training, or a manager filled him in. Basically they are an IBM shop from what he said.
The systems have the layout of every Walmart store in them, and the stores respond to orders from the main office to move products around on the shelves. The systems will tell various stores to move products into different places, and analyze the results. If a store is making more money with XYZ sitting near the entrance, then the WOPR tells more stores to move that product into place, but still plays games against shoppers with a few more. It's basically an insanely well oiled statistical war against the shoppers to squeeze every last penny out of them. I hate to say it, but it doesn't work on me when I go there. But overall, it's creepy, and impressive at the same time.
PS- I had this evil idea. If anyone is into the hacktivism role, embed a voice recorder IC into a telephone set that matches your local WalMart's phones. Get the code to get on the PA system, and set up your "rogue" telephone to bump onto the PA every 5 hours or so. Be sure to include sounds to make it sound like someone is picking up the phone, and hanging it up. It will drive them nuts. Some stores seem to use Lucent sets on the wall (MLX-xxx) which are most likely ISDN on the back. Other stores seem to have analog ports on a Lucent system. Just remember to give me props. Feel free to announce all shoppers a winner of a contest where they get everything they can stuff into a cart for free. Or remind them about the $700,000 in taxes the minimum wage making people cost the community at every WalMart.
Southeastern Virginia REPRESENT!
Walmart has been doing this for a long long time. One of the things they discovered is that people who buy diapers usually also buy beer (in states where walmart can sell beer), and vice versa. So, they moved the beer and diapers to the same aisle, and ended up increasing their sales by like 7 times on both of these items.
Virtually everyone who keeps track of this sort of thing is looking for their own beer and diapers revelation. I used to run a data warehouse which tracked the paths users took through websites in order to lay them out better to increase revenue on ads or purchases. Mine only had 6TB of data though.
Target has been getting quite good at this, since it seems every time I walk into their store to buy one little thing, I walk out of there with a cart full of crap I didn't really need but thought would be nice to have.
Need Free Juniper/NetScreen Support? JuniperForum
'Wal-Mart has 460 terabytes of data stored on Teradata mainframes, at its Bentonville headquarters. To put that in perspective, the Internet has less than half as much data, according to experts.'
Apparently the "experts" overlooked alt.binaries.*
https://www.eff.org/https-everywhere
Look at suprnova.org. The number of unique data sets is the number of torrents. They don't publish the total size of all torrents, but suppose you have an average 300 MB. Multiply by the number of torrents (bottom of page), and you get about 100 TB right there.
If instead you look at the number of seeders, it is like 2 PB, just not all unique.
Actually the grandparent is correct. Walmart puts so much pressure on their suppliers to actually drop prices every year (inflation is for sissies) that they drive small manufacturers out of business. Not to mention the small businesses that it suffocates. There are towns that literally shop themselves out of a job. Heck, Walmart single-handedly put Vlasic into bankruptcy by forcing them to sell a gallon of pickles for $2.97. This is a fascinating article about why we should all boycott the place.
"I can not bring myself to believe that if knowledge presents danger, the solution is ignorance" - Isaac Asimov
I graduated from the Sam M. Walton College of Business at the University of Arkansas with a B.S.B.A in Information Systems. Wal-Mart was nice enough to donate a big chunk (~1 Terabyte) of information for us to datamine. It's pretty interesting stuff and very CPU intensive, as you can probably imagine; we tried not to do any CD burning while waiting on our results ;)
IIRC, It seems like one of the strange correlations we found is that the two items most commonly purchased together were beer and baby diapers. Go figure...
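Finding "items most commonly purchased together" boils down to counting pairs across baskets. A minimal sketch in Python, with a handful of made-up baskets standing in for the real data (which I obviously can't reproduce here); the names are hypothetical:

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets, one set of items per checkout.
BASKETS = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"milk", "bread"},
    {"beer", "diapers", "milk"},
    {"bread", "chips"},
]

def pair_counts(baskets):
    """Count how many baskets contain each unordered pair of items."""
    counts = Counter()
    for basket in baskets:
        # Sort so that (a, b) and (b, a) are counted as the same pair.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts

counts = pair_counts(BASKETS)
top_pair, top_count = counts.most_common(1)[0]
support = top_count / len(BASKETS)
print(top_pair, top_count, f"support={support:.0%}")
```

On real data the number of pairs explodes with basket size and catalog size, which is a big part of why this kind of thing was so CPU intensive.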
First of all, most Walmarts don't primarily sell food
Super Wal-Marts sell groceries. You see those in places like Florida. I was in Orlando, and the frustrating thing was that there was nowhere else to buy groceries near where I was staying. OK, there was a Winn-Dixie just across the parking lot, but its prices were insane and the quality of the produce was not so good. There were other grocery stores and a Costco, but all were about 15 miles away. Trust me, I did my best to stock up with Costco goods, but for staples like milk, bread, and eggs Wal-Mart was the only practical solution.
Regular Wal-Marts I don't believe sell groceries. I don't honestly know because I don't shop there. Super Wal-Marts have a very respectable grocery.
There is no sanctuary. There is no sanctuary. SHUT UP! There is no shut up. There is no shut up.
Brandybuck's Law states "the collective intelligence of an organization is inversely proportional to its size." There are a lot of reasons for this, but it's a genuine observable phenomena. Just ask anyone who's been in the military.
If it's "a genuine observable phenomena [sic]" then surely there are scientific studies documenting those observations. Please point me to one because I'm currently under the impression that "Brandybuck's Law" is complete nonsense or just a funny observation from a frustrated corporate "human resource." (I can relate.)
If a law needs only one contradictory observation to prove it wrong, I offer the following:
I've always viewed Novell's products as technically superior to Microsoft's products. Novell, Inc. is also smaller than Microsoft, Inc. But Microsoft is a much smarter corporate player/criminal than Novell so they dominate their market. Novell tends to make stupid marketing and strategy decisions, as well, therefore the smaller-equals-smarter theory is disproven.
- Hail to our fearless misleader! Fool speed ahead!