Slashdot Mirror


Data Mining Goes 3D

Roland Piquepaille writes "At Sandia National Laboratories (SNL), a data mining and visualization software suite developed in the last two years is now able to extract information from many sources of data and to return 3D images as results. In "Sandia's intelligence lab converts business data into 3-D images," the New Mexico Business Weekly reports that Sandia's Information Visualization Lab is able to search structured documents, such as scientific journals, or unstructured ones, such as the Web or an intranet. Since the lab was established five months ago, this software has already been used to determine the potential of several partnerships with SNL. Other firms, such as Lockheed Martin, also are starting to use the lab. Let's hope that SNL releases this software as open source. It should be fun to use it. For more details and pictures, please read this overview."

23 of 79 comments

  1. 3D images? by Dorothy 86 · · Score: 2, Funny

    I think they should report it as music!! (If you don't get the reference, it's from Dirk Gently's Holistic Detective Agency by Douglas Adams.)

  2. Oh god! They've discovered... by lxt · · Score: 3, Funny

    ...Excel and PowerPoint! The nightmare has been unleashed!

    "In Sandia's intelligence lab converts business data into 3-D images," ...ie, really dodgy pie charts and bar graphs!

    1. Re:Oh god! They've discovered... by LostCluster · · Score: 2, Funny

      Let's just hope they don't stumble into Microsoft Access next....

  3. 3-d data mining.... by drfrog · · Score: 5, Informative

    is over 5 years old already

    google search

    people have been doing real time data mining in VRML since the vrml2.0 plugins came out back in 97

    --
    back in the day we didnt have no old school
  4. HollywoodOS by Xpilot · · Score: 4, Funny

    Just think, if Hollywood bothers to at least try and get some technical stuff even remotely realistic (and look cool), they could incorporate such things into movies. But no... we get a fusion reaction which you can control with metal tentacles (just push the little flames back in!).

    --
    "Backups are for wimps. Real men upload their data to an FTP site and have everyone else mirror it." -- Linus Torvalds
  5. How much? by DAldredge · · Score: 5, Insightful

    How much will they license this for? I know the taxpayers paid for it, but it always seams like it gets exclusivly licensed to some company for next to nothing then that company charges the people that paid for it in the first place a lot of money to use it.

    1. Re:How much? by orthogonal · · Score: 4, Insightful

      Sandia's intelligence lab converts business data into 3-D images

      I know the taxpayers paid for it, but it always seams like it gets exclusivly [sic] licensed to some company for next to nothing then that company charges the people that paid for it in the first place a lot of money to use it.

      You're a wisely cynical man.

      In the light of the 9/11 Commission's report of the multiple failures of the CIA and FBI that allowed the terrorists to attack us in 2001, in the light of Sibel Edmonds's allegations that the FBI intentionally destroyed translations of intercepted terrorist conversations, in light of the Senate Intelligence Committee's report about systemic CIA failures to provide accurate intelligence about WMDs in Iraq, why am I less than thrilled to discover that Sandia National Laboratories' businesses?

      When I further learn that "Sandia officials say tech firms or venture capitalists can use the lab on a per-request basis," I begin to understand that Sandia's Corporate Business Development and Partnerships aren't using my tax dollars to protect me, they're providing corporate welfare by doing the Research and Development that business wants but doesn't want to pay for.

      Remember, these are the same businesses that vociferously object to government programs that might compete with them, whether that's sponsorship of Open Source Software or rural electric cooperatives or IRS software that might be efficient enough to cost H&R Block. These are the same corporations that got a provision added to the Medicare Prescription Drug Bill to prevent the government from getting discounts by buying those drugs in bulk, but which profit from research funded by the National Institutes of Health.

      These are the same corporations that want Ashcroft's Department of Justice to stop worrying so much about fixing the FBI's failures, so it can spend government time -- and your money -- prosecuting civil -- civil, not criminal -- suits against file traders under the PIRATE Act on behalf of those corporations. If you need to sue a corporation, you're on your own; maybe you'll get some coupons out of a class action suit. But if the corporation wants to sue you, they get the assistance of top government lawyers and FBI agents packing guns and warrants.

      And this just after the U.S. House passed the biggest corporate tax cuts in twenty years, because existing direct subsidies -- or less politely, corporate welfare -- will no longer be permitted under World Trade Organization rules. Even House Republicans admit this tax cut "is riddled with special-interest provisions that would further complicate the tax code, send jobs overseas and worsen a federal deficit already at record highs."

      Does anyone really expect Sandia's going to release the source code to the data mining software to us, the citizens who have to pay for it?

      Be proud, Americans, of how fat your labor makes your corporate masters! What a joy it is to serve them! It is your privilege to work long hours and pay high taxes so your masters can buy their yachts -- and buy the laws that enslave you.

      America, Of the People, By the People, for the Pe^H^H Corporations

  6. More important than the capability... by DeepDarkSky · · Score: 3, Insightful

    Is having the knowledge, experience, and creative talent to know how to use the capability to design meaningful and easy to understand data visualization. Anybody can be an Excel monkey and drag and drop charts and graphs, but it doesn't mean they'd make sense. Leaping to 3D is not a panacea for data mining visualization, but the potential is certainly there.

  7. OSS can't be everything... by LostCluster · · Score: 3, Interesting

    Come on.... Let's hope that SNL releases this software as open source.

    Wouldn't the work of a government-funded national lab be public domain if it ever were to be released?

    As great as OSS is, the only truly free license with absolutely no restrictions is public domain, and that's what works of the government usually become.

    1. Re:OSS can't be everything... by wintermute42 · · Score: 2, Informative

      Wouldn't the work of a government-funded national lab be public domain if it ever were to be released?

      As far as I know the Department of Energy labs, which include the Sandia labs, Lawrence Livermore, Los Alamos, are all managed by contractors. The contractor does work for the government, but frequently maintains co-ownership with the government for the work performed.

      I have worked with commercial contractors that worked under similar arrangements. The customer paid the contractor for software development work, but the contractor also owned a copy, which they could sell to others. Only work that was explicitly identified as proprietary was exempted from this. Some consulting companies, like Wind River, in its early days, have built a significant amount of intellectual property following this model. Once they build up a software base they have a competitive advantage in licensing it for new applications. The fact that some software can be provided "off the shelf" rather than developed provides an incentive for the prospective customer to agree to co-ownership.

      The organizations that manage the national labs seem to take a similar approach. They also own much of the intellectual information they develop. Release of software into the public domain at the University of California managed labs requires a review by a UC office that is in charge of licensing.

  8. Light on details... by TheQuestion · · Score: 5, Insightful

    I wish this story went into more details into the algorithms used. Saying stuff like "we take tons of data and out comes a 3D image" is great, but what does the 3D image actually represent? What are the dimensions being graphed?

    My company manages a very large portfolio of auto loans. I'd like to know more details as to what they are actually doing so that I can judge whether we can use this technology or one like it to predict trends in our consumer base, or to develop better scoring models.

    1. Re:Light on details... by Coryoth · · Score: 4, Informative

      I wish this story went into more details into the algorithms used. Saying stuff like "we take tons of data and out comes a 3D image" is great, but what does the 3D image actually represent? What are the dimensions being graphed?

      If I had to guess I would guess that they are doing 3D Self Organizing Maps, or something very similar.

      The principle is: create a huge feature space for the documents in question (something like word counts for each document for each word in the corpus, with appropriate fixes: drop the most and least common words, do stemming, etc.). You can now "visualize" the documents in a massive 20,000-dimensional space. However, what you can do is try to create a projection from 20,000 dimensions down to 2 or 3 dimensions in a way that best preserves distances in the 20,000-dimensional space. This automatically creates a clustering of the documents as well, and you now have something that you can visualize practically. If you start doing things like labelling clusters and subclusters by the words unique to/defining that cluster you can start to make some sense of the visualisation.

      Effectively this is just a means of doing clustering on a large document space in such a way that the final output can be visualized (instead of the sort of results you get from k-means, or hierarchical clustering, which are a lot harder to visualize in a meaningful way for laymen). The benefit of being able to visualize it in that sense is that you can "see" patterns of other document attributes by adding that to the visualization (via colors, labels, etc.) and see a global overview of those attributes across the entire document space.
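The idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Sandia's method and not a Self Organizing Map: it swaps in a much simpler technique, plain gradient descent on the mismatch between high-dimensional and 3D pairwise distances (a multidimensional-scaling style projection). The toy corpus `DOCS` and the helper names `word_count_vectors`, `euclidean`, and `project` are all made up for the example, and the stop-word and stemming steps are left out.

```python
import math
import random
from collections import Counter

# Hypothetical toy corpus: two documents about solar cells, two about proteins.
DOCS = [
    "solar cell efficiency silicon solar panel",
    "silicon solar cell panel cost",
    "protein folding enzyme protein structure",
    "enzyme structure protein binding",
]


def word_count_vectors(docs):
    """Turn each document into a vector of word counts over the shared vocabulary."""
    counts = [Counter(d.lower().split()) for d in docs]
    vocab = sorted(set().union(*counts))
    return [[c[w] for w in vocab] for c in counts], vocab


def euclidean(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def project(vectors, dims=3, steps=2000, rate=0.01, seed=0):
    """Place each vector in `dims` dimensions so that pairwise distances
    roughly match the high-dimensional ones (gradient descent on the
    squared distance error; an assumed stand-in for an SOM)."""
    rng = random.Random(seed)
    n = len(vectors)
    target = [[euclidean(vectors[i], vectors[j]) for j in range(n)] for i in range(n)]
    pos = [[rng.uniform(-1, 1) for _ in range(dims)] for _ in range(n)]
    for _ in range(steps):
        for i in range(n):
            grad = [0.0] * dims
            for j in range(n):
                if i == j:
                    continue
                d = euclidean(pos[i], pos[j]) or 1e-9
                # Positive when the pair is too far apart, negative when too close.
                err = (d - target[i][j]) / d
                for k in range(dims):
                    grad[k] += err * (pos[i][k] - pos[j][k])
            for k in range(dims):
                pos[i][k] -= rate * grad[k]
    return pos


vectors, vocab = word_count_vectors(DOCS)
coords = project(vectors)
for doc, xyz in zip(DOCS, coords):
    print([round(v, 2) for v in xyz], doc)
```

With a real corpus the vectors would have thousands of columns rather than eleven, and the printed coordinates are what you would hand to a 3D plotting tool, coloring or labelling points by whatever other document attribute you want to see.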

      Just to reiterate: I do not know that this is what is being done, and they don't say a lot in the article, but I do have some experience in this field, and what I gleaned from the article would tend to imply an approach like this.

      Jedidiah.

  9. for the trees by Doc Ruby · · Score: 4, Funny

    The technology is called "ClearForest", in homage to the continents of forests cleared for paper printouts of these 3D reports that PHBs will have shredded once they've "read" them.

    --
    make install -not war

  10. ok let's move to piquepaille blog by clarkie.mg · · Score: 2, Insightful

    Wow almost every story from Roland Piquepaille is selected into slashdot.

    --
    Men are born ignorant, not stupid; they are made stupid by education. Bertrand Russell
  11. MacSpin by Anonymous Coward · · Score: 3, Informative

    MacSpin was a 3-d data mining tool that is over 16 years old now.

  12. paper? by cyklo · · Score: 2, Funny

    for 3D, they're going to have to carve them out of entire trunks. imagine the shredders you'd have to use...

  13. 3D data visualization by Coryoth · · Score: 3, Informative

    Anyone interested in doing powerful 3D data visualization should make a mandatory stop at VTK, the Visualization Toolkit. It's an open source visualization toolkit written in C++, but with bindings for Java and Python as well. This is a very powerful and very impressive system, and ought to be rated as one of the great open source projects. It doesn't seem to get much attention - I'm not sure why.

    Have a look, and look at what it is actually capable of doing. If you want to do any sort of 3D visualization, it really is worth your time to learn a bit about VTK.

    Jedidiah.

  14. Silicon Graphics MineSet by marmite · · Score: 2, Informative

    SGI had a product called "MineSet" which did this kind of stuff, only a long long time ago. Originally it was inspired by the 3D filemanager SGI did for Jurassic Park. Cool idea, but old hat :).

    --ralpht

    --
    I do not represent myself.
  15. No surprise that Lockheed uses Sandia work by wintermute42 · · Score: 2, Informative

    Other firms, such as Lockheed Martin, also are starting to use the lab.

    I don't find it surprising that Lockheed Martin is one of the firms "starting to use the lab". Lockheed Martin runs Sandia as a contractor for the Department of Energy. Lockheed has a built-in bias to show how applicable the work at Sandia is.

  16. This is so 90s by Don Tobin · · Score: 2, Informative

    I feel like I'm playing Civilization and my agent is reporting that another civilization has just invented something my people have had for the last hour.

    Seriously, I was doing this at the Census Bureau years ago with VRML and enhanced it with those dodgy Performance Copilot (SGI) type tools. Since then products such as, oh, I don't know, Cognos and Crystal Reports (4+) have implemented 3D data set controls and reports in spades (Tivoli Business Decision Manager, anyone?).

    Open source tends to lack the robust (read: overcomplicated buggy) features of the commercial variants but the underlying technology is still mesozoic for us terrans. And yeah, many MBA dinosaurs lack the ability to visualize data like this (compare business typical figures to an economist's throughput figures and the economist has no trouble understanding this stuff, odd how they make so little when they show off that title). Still, there are countless open minded business ppl with econ backgrounds who love these kinds of tools. Not to mention the courses being offered for the past decade in the mindset of 3D management.

    Nachos for all, but not all the nachos.

  17. ObSimpsons by generationxyu · · Score: 2, Funny

    And turning to the 3D graph, we see an unmistakable cone of ignorance.

    --
    I mod down pyramid schemes in sigs.
  18. Data Mining *WENT* 3D... by dwater · · Score: 2, Interesting

    ... a long time ago. Purple|Insight

    Some nice screen shots there too :)

    Mineset detail
    Network Analysis
    Intrusion Detection
    Fraud Detection

    --
    Max.
  19. Open Source Data Visualisation based on IBM code by trillion · · Score: 2, Informative

    I have been tinkering with this since I came across it last year sometime. But it too is nothing new; first release was in 1998

    http://www.opendx.org