Slashdot Mirror


Data Mining Briefly Explained

handy_vandal writes "Time.com has published an interesting article on data mining." Note the prominent sticker ;)

50 of 119 comments

  1. Uhhh... by Grip3n · · Score: 2

    Note the prominent sticker ;)

    Doesn't he mean "snicker"? ;)

    --
    To make a pun demonstrates the highest understanding of a language
  2. The Real Key is People.... by airrage · · Score: 4, Insightful

    I think every major corporation has some sort of data mining, and I find that there is a gap between the data (even scrubbed) and the person who needs to make the decisions. Also, the article suggests that CRM is a subset of data mining. In reality, it's the other way around, or completely unrelated, or both, unless I read that sentence wrong.

    Ciao

    --
    "This isn't a study in computer science, it's a study in human behavior"
  3. you'd be amazed... by inode_buddha · · Score: 4, Funny

    at how powerful data mining techniques can be. Why, just today I have received 3 more "Nigerian" mails, an offer to increase my bust size (I'm a guy), and an excellent credit report from 5 different, unheard-of companies...

    Of course, the local supermarket cannot accept my personal check for groceries without their "discount card", never mind that it was *their* database admins who lost my account after a few weeks...

    (er, yeah right, and my driver's licence and birth certificate aren't worth as much as their card ??)

    Ggrrrrrrr......

    --
    C|N>K
    1. Re:you'd be amazed... by geekoid · · Score: 2

      " an offer to increase my bust size (I'm a guy)"
      Then I take it your wife is getting the emails to increase the size of her penis!

      thank you, I'm here all week!

      --
      The Kruger Dunning explains most posts on /. http://en.wikipedia.org/wiki/Dunning%E2%80%93Kruger_effect
  4. Data Mining Briefly Explained by hdparm · · Score: 4, Funny
    Briefly? This would be briefly:

    1. Collect data

    2. Do some mining

    3. ???

    4. Profit!

    1. Re:Data Mining Briefly Explained by Lucas+Membrane · · Score: 2
      You have hit the nail on the head. The ??? is the problem. The link or leap between knowledge and action is the hard part. Data mining can 'identify' 'profitable' and 'unprofitable' customers, but it can't tell you if your expense and profit allocations are right or if you should want to 'get rid' of 'unprofitable' customers or should want to try to turn them into profitable customers.

      The classic data mining result is diapers and beer. People who buy beer at convenience stores are also likely to buy diapers. Great. Given that bit of intelligence, do we:

      1. Put diapers and beer in close proximity so that people who buy diapers can easily pick up beer and vice versa, or
      2. Put diapers and beer at opposite ends of the store so that people who buy both diapers and beer must travel through the store and have a chance to buy everything else?
      The data seldom tell you what to do. Taking the data too seriously leads to treating customers like numbers, predictable statistical entities to be manipulated for profit's sake. This is not healthy for most businesses. Most of the important things that the data tell you, you could learn better by simply listening to customers respectfully.
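The diapers-and-beer finding is the textbook example of association-rule mining (market-basket analysis). A minimal sketch, assuming a handful of toy baskets invented purely for illustration, of the support, confidence and lift numbers usually reported for such a rule. The function names are hypothetical, and the numbers only say that items co-occur; they do not say which shelf layout to choose.

```python
from itertools import combinations

# Toy baskets invented purely for illustration; these are not real sales data.
BASKETS = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"beer", "chips"},
    {"diapers", "wipes"},
    {"beer", "diapers", "wipes"},
    {"milk", "bread"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def rule_stats(antecedent, consequent, baskets):
    """Support, confidence and lift for the rule antecedent -> consequent."""
    both = support(antecedent | consequent, baskets)
    confidence = both / support(antecedent, baskets)
    # lift > 1 means the items co-occur more often than if they were independent
    lift = confidence / support(consequent, baskets)
    return both, confidence, lift


def pair_rules(baskets, min_support=0.3):
    """Every item pair a -> b meeting `min_support`, sorted by lift (highest first)."""
    items = sorted(set().union(*baskets))
    rules = []
    for a, b in combinations(items, 2):
        if support({a, b}, baskets) >= min_support:
            s, conf, lift = rule_stats({a}, {b}, baskets)
            rules.append((a, b, s, conf, lift))
    return sorted(rules, key=lambda rule: rule[4], reverse=True)


for a, b, s, conf, lift in pair_rules(BASKETS):
    print(f"{a} -> {b}: support={s:.2f} confidence={conf:.2f} lift={lift:.2f}")
```

On this toy data, beer -> diapers has support 0.5, confidence 0.75 and lift 1.125: a real but modest association, which is exactly the kind of result that still leaves the "now what?" question to a human.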
    2. Re:Data Mining Briefly Explained by Exantrius · · Score: 2

      Actually, step three could be explicated as:
      3. Sell derivative information to people who want it, i.e. the people you *DON'T* want to have it.

      This includes, as others said, life insurance companies teaming up with grocery stores to find out what you eat, thus raising rates for people who eat "bad" stuff.

      Or telemarketers buying info from phone companies: consumer A calls consumer B, A bought our stuff, therefore you should call B.

      Or, perhaps radio stations selling the numbers of people who request songs to the Wherehouse, so the Wherehouse can call you and say that you can buy the cd.

      Or, maybe the police decide to track where you go by reading license plates off of each of the cameras that they have up to detect speeders or light runners.

      Just some thoughts. This isn't a joke: they know exactly how to get money from mining. It depends on what data you have and who you can sell it to. No one buys data for no reason, and the only two reasons to buy data are to target people for selling other stuff, or to "find people who don't want to be found", whether that means terrorists, criminals, or, theoretically, people who make x hundred thousand/million a year, so that they can rob you.

      Of course, most of this stuff happens every day, and no one realizes. /ex.

    3. Re:Data Mining Briefly Explained by scubacuda · · Score: 2
      Based on my experience, people who are likely to buy beer and liquor are much more likely to buy toilet paper than diapers... :b

  5. Well.. so? by metlin · · Score: 5, Interesting

    Interesting article, but this is something that has been happening and will continue to.

    Technology being put to use to seek out enemies of the state for the world governments is nothing new.

    At least it is a good thing that companies are making good money in the process. Your privacy? That was lost long ago.

    It was only a matter of time before this happened. At least be glad that we've not yet reached the stage where they'd bother having your entire genome sequence to create solutions and replacements for you :-)

    Perhaps the author of the article has just read Cryptonomicon or something.

    Get over it, companies will track you, governments will monitor it. And there will be people who will beat both, and people who will be susceptible to both. Unfortunate, but hey, paranoia does not help either.

    And oh, first post?

    1. Re:Well.. so? by symbolic · · Score: 3

      At least it is a good thing that companies are making good money in the process. Your privacy? That was lost long ago.

      Oh, the irony.

      They call themselves patriotic, and yet they're supplying the very means that are slowly turning the U.S. into a police state. Sorry, but I seriously doubt that this is what the U.S. founders had in mind, and it's certainly not the reason that U.S. war veterans both risked and sacrificed their lives. Patriots aren't sheep that blindly follow the government, they are the ones who fight to maintain the fundamental (constitutional) precepts upon which the United States were built.

  6. Reminds me of... by gpinzone · · Score: 5, Interesting

    ...how the Bayesian spam filters operate (on a much smaller scale). They find predictors of "spam" like these guys find predictors of "terrorists."

    If the false positives of this system finding terrorists are as low as the ones that identify spam, is it really unreasonable to consider that probable cause for an investigation? At least, until the 0.000001% slips by and causes a lawsuit for wrongful arrest.
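At the level of technique the comparison holds: a Bayesian filter learns which features predict a label. A minimal naive Bayes sketch, assuming whitespace tokenization, add-one (Laplace) smoothing and a four-message training set invented for illustration; the function names are hypothetical, and real spam filters use far richer features and much more data.

```python
import math
from collections import Counter

# Tiny invented training set; a real filter learns from thousands of messages.
TRAINING = [
    ("cheap pills buy now", "spam"),
    ("win money now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday", "ham"),
]


def train(messages):
    """Count words per class and messages per class."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    message_counts = Counter()
    for text, label in messages:
        word_counts[label].update(text.lower().split())
        message_counts[label] += 1
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    return word_counts, message_counts, vocab


def spam_probability(text, model):
    """P(spam | text) under naive Bayes with add-one (Laplace) smoothing."""
    word_counts, message_counts, vocab = model
    total = sum(message_counts.values())
    log_scores = {}
    for label in ("spam", "ham"):
        score = math.log(message_counts[label] / total)  # log prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in text.lower().split():
            # each word is treated as independent evidence given the class
            score += math.log((word_counts[label][word] + 1) / denom)
        log_scores[label] = score
    # normalise in log space so long messages don't underflow to 0
    top = max(log_scores.values())
    spam = math.exp(log_scores["spam"] - top)
    ham = math.exp(log_scores["ham"] - top)
    return spam / (spam + ham)


model = train(TRAINING)
print(spam_probability("buy cheap pills", model))
print(spam_probability("monday meeting", model))
```

The output is a probability, not a verdict; where to put the "flag it" threshold, and what a false positive costs, is a separate decision.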

    1. Re:Reminds me of... by Anonymous Coward · · Score: 2, Interesting

      With a spam filter, the penalty for false positive is perhaps a lost sale or an annoyed friend/coworker.

      With a terrorist classification filter, a false positive could cost some innocent person days or weeks in prison and thousands of dollars in lost wages and legal fees. And that's assuming they are a US citizen. A non-citizen could be held indefinitely, completely destroying any career they might have.

    2. Re:Reminds me of... by gpinzone · · Score: 3, Interesting

      Yes, but remember that the current methods aren't much better. Right now there are lots of complaints about how the USA is racially profiling Middle Eastern men. Whether or not this profiling is justified could be judged from the report of such a filter.

      The issue isn't whether or not we should use data mining to profile individuals or groups. Profiling will occur no matter what. What these methods do is help find parameters that identify candidates more accurately, rather than just assuming all Middle Easterners are automatically guilty until proven otherwise.

  7. to what end? by loveandpeace · · Score: 2, Interesting

    the more i read about data mining, the more it seems to provide a connectivity and interaction leap, a step we are really due, in a technological sense. when the internet was new and all (shortly after Al Gore invented it), there was much talk of how Big Brother would swoop in and turn us into ones and zeros, monitor our every move, and control us through the new portal. that hasn't happened yet (though Ashcroft is trying). does it seem that data mining is more harmful (making us all into terrorists for buying fireworks and seeing Born on the Fourth of July on the same day) than good (allowing better prediction of supply and demand to lower costs and raise productivity)?

  8. profiteering? by SHEENmaster · · Score: 5, Interesting

    Today, however, companies that excel in connecting the data dots are finding a lifeline in a customer whose IT ineptitude is matched only by its means: the U.S. government, which will spend $53 billion on information technology this year. The Federal Government's inability to share and analyze information became clear in the months after the 9/11 attacks.

    While I won't argue against the government's inability to do anything but waste money, I do think that these "anti-terrorism" dealies are going too far. We know that they are spending $53 billion on information technology. When they spend it on a hammer or a toilet seat I know that something is getting done, but "information technology" makes me suspicious.

    Granted, my opinion is largely a result of window flags selling in excess of twenty dollars and not hearing the results of such spending. In fact, I haven't heard of a single terrorist act averted since 9/11. It couldn't hurt to inform us when the spending pays off, could it?

    Is this information actually getting results, or is it just profiteering of the corporations that we so love to slander and libel?

    --
    You can't judge a book by the way it wears its hair.
    1. Re:profiteering? by acidfast7 · · Score: 2, Informative
      In fact, I haven't heard of a single terrorist act averted since 9/11.

      With the current sensationalized state of mass media, would one hear of a terrorist act if it was avoided?

    2. Re:profiteering? by RDPIII · · Score: 2, Insightful

      It couldn't hurt to inform us when the spending pays off, could it?

      But would you believe it if your government told you "23 terrorist plots foiled this month"? They probably couldn't be more specific than that, and without any details or corroboration, who's to say? I'm all for openness and accountability, but if real openness is unlikely in this area (there are better areas for it, like public health care), then I can do without monthly statistics that one would have to take on faith.

      In Soviet Russia official statistics were made up all the time, and dismissed just as often or more.

      --
      Marklar: marklar
  9. Not that it helps by Alien54 · · Score: 2
    Noting all of the ways certain monopolies have acted illegally has not helped in getting appropriate penalties for them in court.

    Data is useless by itself unless it can be used appropriately.

    Sort of like the list on the conservative site NewsMax that finds that the vast majority of truly corrupt politicians in the past year were Democrats. What a coincidence!

    What are the odds of finding out more things like this, like at the office of Total Information Awareness? Or the Transportation Security Administration's list of people who cannot fly?

    --
    "It is a greater offense to steal men's labor, than their clothes"
  10. The data gnomes are stealing my data! by SHEENmaster · · Score: 2

    Why doesn't anyone else see them!?

    --
    You can't judge a book by the way it wears its hair.
  11. Print Link by VargrX · · Score: 4, Informative

    dunno 'bout anyone else, but I don't care for all the ads...
    Print Link

    --
    Sometimes people just have to learn and adapt to change, it is one of the requirements of being a living thing.
    1. Re:Print Link by echucker · · Score: 2

      But at least with the revised print link, cnn.com doesn't get credit for the referral. If you still want the pretty picture, use http://www.time.com/time/globalbusiness/article/0,9171,1101021223-400017,00.html, leaving off the original ?cnn=yes portion.

  12. Already used in mineral exploration by core+plexus · · Score: 4, Informative
    We've been using data mining in mineral exploration for quite some time now, and it really helps given the tremendous volumes of data generated by modern geophysical, geochemical, and geological exploration.

    In related news: Seeking Sperm, Not Sex, Online

  13. Before You Jeer... by robbyjo · · Score: 3, Informative

    You may want to read this book and see for yourself whether data mining will make a breakthrough in the future.

    --
    Error 500: Internal sig error
    1. Re:Before You Jeer... by arasinen · · Score: 2, Interesting

      Another good book that explains the basics of data mining is Principles of Data Mining by Hand et al.

      It is perhaps not the simplest book around, but it covers a lot of important issues. Furthermore, it doesn't ignore the role of computer science, as two of the authors have a CS background.

      You won't find explicit instructions about how to build your own Google, but it surely does wonders for your insight.

      --
      [ Antti Rasinen ]
  14. *sigh* by Chester+K · · Score: 3, Funny

    Ok let's get this out of our system now:


    Imagine a beowulf cluster of these things!....mining...data... yeah.

    In Soviet Russia, data mines YOU!

    It's official, Data Mining is DEAD. You don't have to be Kreskin to figure it out.

    Hey! I just found this site all about data mining here!!!!!

    Come on, really, is this News for Nerds or Stuff That Matters?

    You could probably use data mining to determine how many hot grits Natalie Portman actually eats.


    Alright. That should do it. Carry on with the discussion.

    --

    NO CARRIER
  15. Makes me think of Bowling For Columbine by flopsy+mopsalon · · Score: 2, Interesting
    I couldn't help noticing the Time.com article made reference to crime and terrorism, particularly the September 11 WTC/Pentagon attacks (which happened over a year ago), and to the recent Washington Sniper killings (which ended months ago), in spite of the fact that this article would have been just as fascinating if they had simply used the business examples as illustration.

    In the movie 'Bowling For Columbine' Michael Moore speculates that one of the root causes of gun violence in the US is the type of fearmongering the US media engages in in an effort to keep their sales/ratings up.

    It looks like Time.com's gratuitous exploitation of US fears of crime and terrorism might be an example of this.

    1. Re:Makes me think of Bowling For Columbine by BWJones · · Score: 2

      I couldn't help noticing the Time.com article made reference to crime and terrorism, ....in spite of the fact that this article would have been just as fascinating if they had simply used the business examples as illustration.

      Sure, fear sells lots of stuff: MREs, guns, ammo, radiation pills (iodine), bomb shelters, etc. The thing that people should realize about data mining software, though, is that its application to terrorism and consumer tracking is new but the technology is not. In fact, people have been using it in remote sensing to prospect for gold and oil, among other things, from space; it has been used since the late '70s to interpret satellite images for the CIA and NRO; it has been used for psychological research, etc., etc.; and I use a form of it for retinal research. What should not happen with the fear mongering is that the technology gets a bad name from those who want to abuse it. Like many technologies, data mining is a tool that can be misused, but its application can also do tremendous good.

      --
      Visit Jonesblog and say hello.
  16. Open Source Data Mining! by cosmosis · · Score: 4, Interesting

    Ok, I've been annoyed for years at the disparity between corporations and customers in who knows what about whom. I think it's time someone came up with a P2P, open source reputation system in which we can turn the lens of data mining back on them. Technologies like Cuejack, combined with the efforts of groups like Transparency International, can help bring about Participatory Capitalism.

    Power to the people!

    Planet P Blog - Liberty with Technology.

  17. Data Mining as used by Colombian Drug Cartels ... by Anonymous Coward · · Score: 4, Interesting

    Here is a real life story about data mining and its potential for brutal consequences. This was a very early application. Those who were fingered were killed. Of course, they adopted our new (lack of) due process rules a decade ago...

    http://www.business2.com/articles/mag/0,1640,41206,00.html

  18. KnowledgeMiner 5.0 software for Mac OS 9. by alchemist68 · · Score: 2, Informative

    can be located here:

    http://www.knowledgeminer.net/

    I've thought about using this software to analyze stocks to purchase, but never got around to looking at the information required for the software to give me an edge in the market. Looks promising though.

  19. Objection to the numbers by rootmonkey · · Score: 4, Informative

    The article uses NASDAQ as an example of an organization that has to process terabytes of data daily, which data mining software can help filter. The software may be useful, but NASDAQ does not process terabytes of incoming data per day. I work in the market data industry, and we take exchange feeds from around the world, including NASDAQ, and we don't process anywhere close to that much. OPRA (options) has the most data per day, and that is only on the order of tens of GB.

    --

    Yes but every time I try to see it your way, I get a headache.
  20. Data mining companies by MrWa · · Score: 2, Interesting
    So "Data-mining companies have been among the hardest hit in recent years" is claimed by Time.com, which goes on to use MicroStrategy as a prime example of a company that skyrocketed in value and then plummeted in the "tech crash". Oh, and by the way, they also overstated earnings. What these articles about the "tech crash" need to do is normalize the comparisons, because these companies that ballooned in value so much and then crashed probably just experienced a slight correction from the stupid valuations they attained to begin with!

    As for data mining itself: more power to them. The government gaining the ability to mine the data it already has should mean that we don't need more organizations, more intrusive investigations, etc. Every report or credible news item about post-9/11 studies indicates that we already had enough information, so there should be no need to create new laws that allow for more information to be collected. Just use what you have already, kthx.

    What would be nice is if this data mining allowed Muslims living in the U.S. to stop having to worry whenever they go outside. Look at the publicly available information, which may reveal patterns of "nonobvious" connections, and let people live their lives in peace, regardless of background.

    As a consumer, everything I do in public I consider public information. If a business uses this to better serve me, all the better. Maybe this will mean I don't have to watch feminine ads on TV, or the phone gets answered faster when I call. Maybe it just means that the customer rep knows my name and what I bought already.

  21. Digging For Autism Correlations by Baldrson · · Score: 2, Interesting
    If you look closely at autism statistics, you'll notice it has a lower average correlation with all other statistics than 95% of the variables normally available to epidemiologists.

    So, I decided to mine almost 200 by-State demographic variables for correlates to autism by running through every combination of 2 variables via multiplication or division under a polynomial, exponential or null transformation -- then sorted them by their correlation to autism in the year 2000.

    This is a case where what was "mined" was not just the raw data but various arithmetic combinations of statistical variables derived from the data. Some additional work is needed to make the figure of merit not just correlation but statistical significance. I couldn't find Perl modules that provide a p-value (the probability of seeing a correlation at least this strong if the null hypothesis were true) for correlations.
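The screening described above can be sketched briefly. This is a minimal Python sketch (the comment mentions Perl, but no code was posted), assuming one value per state per variable, z-scoring each combined series, and using squaring and exp() as hypothetical stand-ins for the "polynomial" and "exponential" transformations. Standard-library Python has no t-distribution, so significance is estimated with a permutation test, a swapped-in technique rather than the poster's method. All function names are hypothetical.

```python
import math
import random
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def zscore(series):
    """Standardise a series; None if it has no variance."""
    mean = sum(series) / len(series)
    sd = math.sqrt(sum((v - mean) ** 2 for v in series) / len(series))
    return None if sd == 0 else [(v - mean) / sd for v in series]


# Hypothetical stand-ins for the "null, polynomial, exponential" transforms.
# Inputs are z-scored first, so exp() stays far from overflow for state-sized samples.
TRANSFORMS = {"null": lambda v: v, "square": lambda v: v * v, "exp": math.exp}


def combined_features(variables):
    """Yield (label, series) for every product and quotient of two variables,
    under each transform. `variables` maps name -> one value per state."""
    for (na, a), (nb, b) in combinations(sorted(variables.items()), 2):
        combos = {f"{na}*{nb}": [x * y for x, y in zip(a, b)]}
        if all(y != 0 for y in b):
            combos[f"{na}/{nb}"] = [x / y for x, y in zip(a, b)]
        if all(x != 0 for x in a):
            combos[f"{nb}/{na}"] = [y / x for x, y in zip(a, b)]
        for label, series in combos.items():
            z = zscore(series)
            if z is None:
                continue
            for tname, transform in TRANSFORMS.items():
                yield f"{tname}({label})", [transform(v) for v in z]


def rank_by_correlation(variables, target):
    """All combined features, strongest |r| with the target first."""
    scored = [(pearson(series, target), label)
              for label, series in combined_features(variables)]
    return sorted(scored, key=lambda item: abs(item[0]), reverse=True)


def permutation_p_value(xs, ys, trials=2000, seed=0):
    """Two-sided permutation p-value: how often shuffled data correlate at
    least as strongly as the observed data."""
    rng = random.Random(seed)
    observed = abs(pearson(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        if abs(pearson(xs, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (trials + 1)
```

One caution on the design: with almost 200 variables there are roughly 20,000 pairs, so this sketch produces well over 100,000 candidate features. At that scale some will correlate strongly with the target purely by chance, so a per-feature p-value needs a multiple-comparison correction (Bonferroni, for instance) before it means much.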

  22. Uber Loyalty Card in the UK (Nectar) by Boss,+Pointy+Haired · · Score: 5, Insightful

    Three large British retail companies have recently created a joint loyalty card.

    Nectar has been set up by Sainsbury's (a supermarket), Barclays (a financial services company) and BP (a petrol filling station company).

    I didn't mind Sainsbury's knowing that I eat junk, but now that they're telling Barclays what junk I eat I end up with Barclays putting my life insurance premiums up.

    Interesting stuff.

  23. don't forget NIH by Anonymous Coward · · Score: 3, Interesting

    At the end of the article, it mentions data mining helping to catch the DC snipers. Whoooooooa.

    The cops had profiled a white male Christian terrorist, and that's all they were looking for. You may not have caught the article, but the real perps were stopped **10** times at roadblocks; they were in custody that many times.

    And they were let go, their skin color contradicted what the data mining told them. They weren't caught until a Maryland state trooper leaked the license plate, then a trucker at a rest stop made the collar.

    Data mining won't solve the stupidity of leaders like Chief Moose.

    1. Re:don't forget NIH by vrmlguy · · Score: 2
      Did you read (as opposed to glance over) the article? Data mining was *NOT* used during the DC sniper case, only after the fact:
      The system was set up in Montgomery County, Md., only a day before the arrests were made, so it did not play a role in solving the shootings. Working through the hundreds of thousands of leads that were entered into various police computer systems, however, Coplink noted that witnesses reported seeing John Muhammad's blue Chevrolet Caprice near two of the Washington-area shootings, and local police ran computer checks on his license plate at least three times during the killing spree.
      The profiling was done entirely by humans, with no computer assistance.
      --
      Nothing for 6-digit uids?
  24. Plots that have been averted... by MyNameIsFred · · Score: 5, Insightful
    ...I haven't heard of a single terrorist act averted since 9/11...
    You haven't been paying much attention to the news, have you? Let's see, we had the averted plot to attack ships in the Straits of Gibraltar, the possibly overblown Jose Padilla "dirty bomb" case, and the capture of key operatives such as Abu Zubaydah, which surely put a dent in al-Qaida's plans.

    Frankly, the problem is that attacks such as the Twin Towers are always going to stick in your mind more than a brief news report that Abu Zubaydah was captured. Also, there is always more skepticism that capturing some guy actually averted a plot; see Jose Padilla. We will never know whether he would have actually done something. There will always be second-guessing about whether a plot was really averted.

  25. Data Mining is the wrong term by nrobert · · Score: 2, Interesting
    The term data mining is misleading. Mining is more a matter of sifting through lots of junk to get at the valuable material. That's not exactly what 'data mining' is about.

    If you want valuable information and you know what you're looking for, you just query. Find X in a pile of data. That's mining. I know it's a semantic comment, but mining's not what we're talking about doing here.

    Data mining is more like what geneticists searching for a genetic cause for a cancer are doing. Finding usable correlations and meaningful precursors. We don't call cancer-fighting biologists 'gene miners'. I think the term mining belittles a more complicated activity.

    A better term? Data Correlating? Mining also just sounds brutish.

    --
    --- Programmers do it with their digits!
    1. Re:Data Mining is the wrong term by geekoid · · Score: 2

      No, it's mining, not correlating.
      If I have a cube of data, I can find things outside of how the data is organized.
      Data mining is not finding X in data, it's finding X in data when X isn't necessarily a hard value.

      --
      The Kruger Dunning explains most posts on /. http://en.wikipedia.org/wiki/Dunning%E2%80%93Kruger_effect
  26. I don't believe by SHEENmaster · · Score: 2

    that any true Christian could do this any more than I believe a true Jew or a true Muslim could have done it.

    mod parent up

    --
    You can't judge a book by the way it wears its hair.
    1. Re:I don't believe by Xerithane · · Score: 2

      that any true Christian could do this any more than I believe a true Jew or a true Muslim could have done it.

      No, of course not. Because we all know that most wars have not been fought in the name of opposing religious beliefs...

      --
      Dacels Jewelers can't be trusted.
  27. The problem with automatic identification by Sgs-Cruz · · Score: 2, Insightful

    The problem with automatic identification of any specific type of person within a large group (say, the entire U.S. population, or, hey, the entire world! Why not?) is the obscenely low false positive rate you must have. To identify 100 terrorists among 270 million people, sure, a 50% false negative rate is fine (catching 50 terrorists is better than catching none, right?), but to keep those real terrorists from being swamped by innocent people who happen to match a profile, the false positive rate must be lower than about 100 / 270,000,000, or 0.000037%... that's almost impossible to achieve. And that is why automated terrorist (or anything) identification is still a long way off.
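The 0.000037% figure is base-rate arithmetic. A minimal sketch of the expected-count calculation, using the comment's own assumed numbers (100 targets, 270 million people, 50% false negatives); the 0.1% false positive rate in the comparison line is a hypothetical "99.9% specific" screen, and the function name is hypothetical too.

```python
POPULATION = 270_000_000   # the comment's figure for the U.S. population
TARGETS = 100              # the comment's assumed number of terrorists


def screening_outcome(population, targets, false_negative_rate, false_positive_rate):
    """Expected (targets caught, innocents flagged, precision) for a screen
    run over an entire population."""
    caught = targets * (1 - false_negative_rate)
    flagged_innocents = (population - targets) * false_positive_rate
    precision = caught / (caught + flagged_innocents)
    return caught, flagged_innocents, precision


# False positive rate at which flagged innocents roughly equal the number of targets
break_even = TARGETS / POPULATION
print(f"break-even FP rate: {break_even * 100:.6f}%")   # about 0.000037%

# A hypothetical "99.9% specific" screen (0.1% false positives) for comparison
caught, innocents, precision = screening_outcome(POPULATION, TARGETS, 0.5, 0.001)
print(f"caught {caught:.0f}, flagged innocents {innocents:,.0f}, precision {precision:.5f}")
```

Even at 99.9% specificity, the screen flags roughly 270,000 innocent people to catch 50 targets, so fewer than 2 in 10,000 of the people it flags are actual targets.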

    --

    Karma: pi (Mostly due to circular reasoning in posts).

    1. Re:The problem with automatic identification by nrobert · · Score: 2, Interesting

      I'm not sure the goal is to have the miner spit out names of confirmable terrorists with that kind of accuracy. Your comment is fair if you're looking for that kind of entirely automated solution, but that's not the goal. It doesn't need to be 100% accurate in order to mitigate risk and pay for itself. Neither does the J Crew web site product predictor.

      The goal is definitely to help single out people that are worth further investigation. By motivated, thinking, observant humans. That's all.

      I also think you might be a little bit reductionist in your estimate of 100 terrorists. It's quite possible that there are many more, though I suppose it doesn't matter because even if you're looking for just one person, it's still worth doing.

      Given that you're looking for a reasonably good filter to find qualifiers for a round of investigation, a better metric might be the number of people you're willing to investigate per person you hope to positively I.D. You might argue that you'd be happy to investigate 5,000 people just to find one 'terrorist'. If so, and you're looking for an estimated 100 terrorists, multiplying gives 500,000 'persons of interest', or about 0.19% of the US population. That percentage is much more achievable, and besides, you then use a different algo to decide which of these to interview or research further first.
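The two-stage argument is also simple arithmetic. A minimal sketch using the thread's assumed figures (100 targets, 270 million people, 5,000 investigations per target); the function name is hypothetical, and treating the shortlist's share of the population as the first-stage filter's allowable flag rate is a simplifying assumption.

```python
POPULATION = 270_000_000          # figure used in the thread
TARGETS = 100                     # the thread's (disputed) estimate
INVESTIGATIONS_PER_TARGET = 5_000


def persons_of_interest(targets, investigations_per_target, population):
    """Size of the shortlist and its share of the population, in percent."""
    shortlist = targets * investigations_per_target
    return shortlist, 100 * shortlist / population


shortlist, share_pct = persons_of_interest(TARGETS, INVESTIGATIONS_PER_TARGET, POPULATION)
print(f"{shortlist:,} persons of interest = {share_pct:.2f}% of the population")

# The parent comment's fully automated bound: flag no more innocents than targets.
strict_pct = 100 * TARGETS / POPULATION
print(f"allowed flag rate is about {share_pct / strict_pct:,.0f}x looser")
```

This reproduces the 500,000 and roughly 0.19% above, and shows the first-stage filter may flag about 5,000 times as many people as the fully automated bound in the parent comment allows.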

      It seems pretty manageable to me. I also think your assessment of the 50% false negative rate is too rosy. It seems to me that the risk of even one getting away would be serious enough (as in scanning baggage, for instance) that you'd want to cast the widest net possible and then narrow it carefully. False negatives may be more costly than you are suggesting.

      --
      --- Programmers do it with their digits!
  28. The Beast by macdaddy357 · · Score: 3, Interesting

    Does this data mining stuff remind anyone of the old urban legend about "The Beast"? A supercomputer in Antwerp or Brussels that knows everything about everyone? Is that idea still as ridiculous as it was back in the day?

    --
    How ya like dat?
  29. Re:WHAT?!? by kfg · · Score: 2

    A small chest.

    KFG

  30. Sauron commands you by SHEENmaster · · Score: 3, Funny

    to murder all Harry Potter fans!

    --
    You can't judge a book by the way it wears its hair.
    1. Re:Sauron commands you by Xerithane · · Score: 2

      Sauron commands you to murder all Harry Potter fans!

      Damn man, I don't need Sauron or anyone to command me to do that...

      --
      Dacels Jewelers can't be trusted.
  31. Re:an advertisement for privatization of security? by scubacuda · · Score: 2
    Not that I advocate insulting "policemen, firemen, or military officers"...

    ...but I'd say that the difference is that these people are on the "front lines", so to speak. I'd rather have an IT job where I can surf /. in my spare time than have to investigate shootings, put out fires, or make strategy decisions that could potentially cost millions of lives.

  32. yep, me.. by geekoid · · Score: 3, Funny

    ..and six other dwarfs grab our pickaxes and lanterns, and go to the data mines.
    those 1's and 0's can be tricky..

    --
    The Kruger Dunning explains most posts on /. http://en.wikipedia.org/wiki/Dunning%E2%80%93Kruger_effect
  33. That's not data mining! by djkitsch · · Score: 2

    Software developed by Autonomy, based in Cambridge, England, connected BAE's research databases and alerted civilian aircraft engineers to the fact that the wing-construction problem they were working on was also being addressed by the company's military division.

    That's not exactly a task for data miners - it's just bad communication! They could have done exactly the same thing just by making sure the directors were paying attention...there seems to be a big market for telling people the perfectly obvious.

    --
    sig:- (wit >= sarcasm)