Slashdot Mirror


Data Mining Goes 3D

Roland Piquepaille writes "At Sandia National Laboratories (SNL), a data mining and visualization software suite developed in the last two years is now able to extract information from many sources of data and to return 3D images as results. In 'Sandia's intelligence lab converts business data into 3-D images,' the New Mexico Business Weekly reports that Sandia's Information Visualization Lab is able to search structured documents, such as scientific journals, or unstructured ones, such as the Web or an intranet. Since the lab was established five months ago, this software has already been used to determine the potential of several partnerships with SNL. Other firms, such as Lockheed Martin, are also starting to use the lab. Let's hope that SNL releases this software as open source. It should be fun to use. For more details and pictures, please read this overview."

7 of 79 comments

  1. 3-d data mining.... by drfrog · · Score: 5, Informative

    is over 5 years old already

    google search

    people have been doing real-time data mining in VRML since the VRML 2.0 plugins came out back in '97

    --
    back in the day we didnt have no old school
  2. HollywoodOS by Xpilot · · Score: 4, Funny

    Just think, if Hollywood bothered to at least try to get some technical stuff even remotely realistic (and still look cool), they could incorporate such things into movies. But no... we get a fusion reaction which you can control with metal tentacles (just push the little flames back in!).

    --
    "Backups are for wimps. Real men upload their data to an FTP site and have everyone else mirror it." -- Linus Torvalds
  3. How much? by DAldredge · · Score: 5, Insightful

    How much will they license this for? I know the taxpayers paid for it, but it always seams like it gets exclusivly licensed to some company for next to nothing then that company charges the people that paid for it in the first place a lot of money to use it.

    1. Re:How much? by orthogonal · · Score: 4, Insightful

      Sandia's intelligence lab converts business data into 3-D images

      I know the taxpayers paid for it, but it always seams like it gets exclusivly [sic] licensed to some company for next to nothing then that company charges the people that paid for it in the first place a lot of money to use it.

      You're a wisely cynical man.

      In the light of the 9/11 Commission's report of the multiple failures of the CIA and FBI that allowed the terrorists to attack us in 2001, in the light of Sibel Edmonds's allegations that the FBI intentionally destroyed translations of intercepted terrorist conversations, in light of the Senate Intelligence Committee's report about systemic CIA failures to provide accurate intelligence about WMDs in Iraq, why am I less than thrilled to discover that Sandia National Laboratories' businesses?

      When I further learn that "Sandia officials say tech firms or venture capitalists can use the lab on a per-request basis," I begin to understand that Sandia's Corporate Business Development and Partnerships aren't using my tax dollars to protect me, they're providing corporate welfare by doing the Research and Development that business wants but doesn't want to pay for.

      Remember, these are the same businesses that vociferously object to government programs that might compete with them, whether that's sponsorship of Open Source Software or rural electric cooperatives or IRS software that might be efficient enough to cost H&R Block. These are the same corporations that got a provision added to the Medicare Prescription Drug Bill to prevent the government from getting discounts by buying those drugs in bulk, but which profit from research funded by the National Institutes of Health.

      These are the same corporations that want Ashcroft's Department of Justice to stop worrying so much about fixing the FBI's failures, so it can spend government time -- and your money -- prosecuting civil -- civil, not criminal -- suits against file traders under the PIRATE Act on behalf of those corporations. If you need to sue a corporation, you're on your own; maybe you'll get some coupons out of a class action suit. But if the corporation wants to sue you, they get the assistance of top government lawyers and FBI agents packing guns and warrants.

      And this just after the U.S. House passed the biggest corporate tax cuts in twenty years, because existing direct subsidies -- or less politely, corporate welfare -- will no longer be permitted under World Trade Organization rules. Even House Republicans admit this tax cut "is riddled with special-interest provisions that would further complicate the tax code, send jobs overseas and worsen a federal deficit already at record highs."

      Does anyone really expect Sandia's going to release the source code to the data mining software to us, the citizens who have to pay for it?

      Be proud, Americans, of how fat your labor makes your corporate masters! What a joy it is to serve them! It is your privilege to work long hours and pay high taxes so your masters can buy their yachts -- and buy the laws that enslave you.

      America, Of the People, By the People, for the Pe^H^H Corporations

  4. Light on details... by TheQuestion · · Score: 5, Insightful

    I wish this story went into more details into the algorithms used. Saying stuff like "we take tons of data and out comes a 3D image" is great, but what does the 3D image actually represent? What are the dimensions being graphed?

    My company manages a very large portfolio of auto loans. I'd like to know more details as to what they are actually doing so that I can judge whether we can use this technology or one like it to predict trends in our consumer base, or to develop better scoring models.

    1. Re:Light on details... by Coryoth · · Score: 4, Informative

      I wish this story went into more details into the algorithms used. Saying stuff like "we take tons of data and out comes a 3D image" is great, but what does the 3D image actually represent? What are the dimensions being graphed?

      If I had to guess I would guess that they are doing 3D Self Organizing Maps, or something very similar.

      The principle is: create a huge feature space for the documents in question (something like word counts for each document for each word in the corpus, with appropriate fixes: drop the most and least common words, do stemming, etc.). You can now "visualize" the documents in a massive 20,000-dimensional space. However, what you can do is try to create a projection from 20,000 dimensions down to 2 or 3 dimensions in a way that best preserves distances in the 20,000-dimensional space. This automatically creates a clustering of the documents as well, and you now have something that you can visualize practically. If you start doing things like labelling clusters and subclusters by the words unique to/defining that cluster you can start to make some sense of the visualisation.
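
      To make that concrete, here is a rough sketch of the idea in plain Python. It is only a guess at the general technique, not what Sandia actually runs. The tiny corpus, the function names (tokenize, build_features, train_som, best_cell), the suffix-stripping "stemmer", the document-frequency thresholds and the 3x3x3 grid are all made up for illustration.

```python
import math
import random
from collections import Counter

def tokenize(text):
    # Lowercase, keep letters only, and strip a trailing "s" as a crude stand-in for stemming.
    words = []
    for raw in text.lower().split():
        word = "".join(ch for ch in raw if ch.isalpha())
        if len(word) > 4 and word.endswith("s"):
            word = word[:-1]
        if word:
            words.append(word)
    return words

def build_features(docs, min_docs=2, max_doc_fraction=0.9):
    # Drop the least common words (seen in fewer than min_docs documents)
    # and the most common ones (seen in more than max_doc_fraction of them).
    token_lists = [tokenize(d) for d in docs]
    doc_freq = Counter(w for tokens in token_lists for w in set(tokens))
    limit = max_doc_fraction * len(docs)
    vocab = sorted(w for w, df in doc_freq.items() if min_docs <= df <= limit)
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        vec = [float(counts[w]) for w in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])  # unit-length word-count vector
    return vocab, vectors

def best_cell(weights, vec):
    # Best matching unit: the grid cell whose weight vector is closest to vec.
    return min(weights, key=lambda c: sum((a - b) ** 2 for a, b in zip(weights[c], vec)))

def train_som(vectors, grid=(3, 3, 3), epochs=50, seed=0):
    # A very small 3D self-organizing map with a Gaussian neighbourhood.
    rng = random.Random(seed)
    dim = len(vectors[0])
    cells = [(x, y, z) for x in range(grid[0]) for y in range(grid[1]) for z in range(grid[2])]
    weights = {c: [rng.random() for _ in range(dim)] for c in cells}
    for epoch in range(epochs):
        frac = epoch / epochs
        rate = 0.5 * (1.0 - frac)                       # learning rate decays over time
        radius = max(grid) / 2.0 * (1.0 - frac) + 0.5   # so does the neighbourhood size
        for vec in vectors:
            bmu = best_cell(weights, vec)
            for c in cells:
                d2 = sum((a - b) ** 2 for a, b in zip(c, bmu))
                influence = math.exp(-d2 / (2 * radius * radius))
                w = weights[c]
                for i in range(dim):
                    w[i] += rate * influence * (vec[i] - w[i])
    return weights

# Hypothetical toy corpus; a real one would have thousands of documents.
docs = [
    "neural networks learn image features",
    "deep neural networks classify image data",
    "auto loans and consumer credit scoring",
    "credit scoring models for auto loans",
    "neural networks learn image features",
]
vocab, vectors = build_features(docs)
weights = train_som(vectors)
positions = [best_cell(weights, v) for v in vectors]  # one (x, y, z) grid cell per document
```

      Each document ends up with an (x, y, z) cell you can plot, and documents that share a cell (or sit in neighbouring cells) form the clusters. A SOM preserves neighbourhoods rather than exact distances, so if you care about distances specifically you would swap in something like multidimensional scaling.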

      Effectively this is just a means of doing clustering on a large document space in such a way that the final output can be visualized (instead of the sort of results you get from k-means, or hierarchical clustering, which are a lot harder to visualize in a meaningful way for laymen). The benefit of being able to visualize it in that sense is that you can "see" patterns of other document attributes by adding that to the visualization (via colors, labels, etc.) and see a global overview of those attributes across the entire document space.
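
      As a sketch of the labelling and overlay steps, assuming you already have a cluster (here a grid cell) for each document: the cluster assignments, word lists and years below are invented sample data, and label_clusters and overlay_attribute are hypothetical helper names, not anything from the article.

```python
from collections import Counter, defaultdict

def label_clusters(assignments, doc_words, top_n=2):
    # Label each cluster with the words most concentrated inside it.
    by_cluster = defaultdict(Counter)
    overall = Counter()
    for cluster, words in zip(assignments, doc_words):
        by_cluster[cluster].update(words)
        overall.update(words)
    labels = {}
    for cluster, counts in by_cluster.items():
        # Rank by the share of a word's occurrences that fall in this cluster,
        # then by raw count, then alphabetically to keep ties deterministic.
        ranked = sorted(counts, key=lambda w: (-counts[w] / overall[w], -counts[w], w))
        labels[cluster] = ranked[:top_n]
    return labels

def overlay_attribute(assignments, attribute):
    # Count the values of some other document attribute within each cluster,
    # which is what you would then map to colours in the picture.
    summary = defaultdict(Counter)
    for cluster, value in zip(assignments, attribute):
        summary[cluster][value] += 1
    return {cluster: dict(counts) for cluster, counts in summary.items()}

# Invented sample data: four documents already placed in two grid cells.
assignments = [(0, 0, 0), (0, 0, 0), (2, 2, 1), (2, 2, 1)]
doc_words = [
    ["neural", "network", "image"],
    ["neural", "network", "data"],
    ["auto", "loan", "credit"],
    ["loan", "credit", "data"],
]
years = [2003, 2004, 2004, 2004]

labels = label_clusters(assignments, doc_words)
by_year = overlay_attribute(assignments, years)
```

      The labels give each cluster a human-readable name, and the per-cluster attribute counts are the sort of thing you would colour the points by to get the global overview.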

      Just to reiterate: I do not know that this is what is being done, and they don't say a lot in the article, but I do have some experience in this field, and what I gleaned from the article would tend to imply an approach like this.

      Jedidiah.

  5. for the trees by Doc+Ruby · · Score: 4, Funny

    The technology is called "ClearForest", in homage to the continents of forests cleared for paper printouts of these 3D reports that PHBs will have shredded once they've "read" them.

    --
    make install -not war