Slashdot Mirror


Visualizing Complex Data Sets?

markmcb writes "A year ago my company began using SAP as its ERP system, and there is still a great deal of focus on cleaning up the 'master data' that ultimately drives everything the system does. The issue we face is that the master data set is gigantic and not easy to wrap one's mind around. As powerful as SAP is, I find it does little to aid with useful visualization of data. I recently employed a custom solution using Ruby and Graphviz to help build graphs of master data flow from manual extracts, but I'm wondering what other people are doing to get similar results. Have you found good out-of-the-box solutions in things like data warehouses, or is this just one of those situations where customization has to fill a gap?"
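
For readers who want to try the extract-to-Graphviz approach described in the question, here is a minimal sketch, written in Python rather than the Ruby the submitter used. The column names (`source`, `target`), the sample rows, and the function names are hypothetical stand-ins for a manual master-data extract; the only thing taken as given is Graphviz's DOT syntax for directed edges.

```python
import csv
import io

# Hypothetical extract: each row says one master-data object feeds another.
SAMPLE_EXTRACT = """source,target
Material,BOM
Material,Purchasing Info Record
Vendor,Purchasing Info Record
BOM,Production Order
"""


def quote(name):
    """Wrap a node name in double quotes for DOT, escaping embedded quotes."""
    return '"' + name.replace('"', '\\"') + '"'


def extract_to_dot(csv_text, graph_name="master_data"):
    """Turn source,target rows into a DOT digraph, dropping duplicate edges."""
    reader = csv.DictReader(io.StringIO(csv_text))
    edges = []
    seen = set()
    for row in reader:
        edge = (row["source"].strip(), row["target"].strip())
        if edge not in seen:
            seen.add(edge)
            edges.append(edge)
    lines = [f"digraph {graph_name} {{", "  rankdir=LR;"]
    lines += [f"  {quote(s)} -> {quote(t)};" for s, t in edges]
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(extract_to_dot(SAMPLE_EXTRACT))
```

Saving the output as `master_data.dot` and running `dot -Tsvg master_data.dot -o master_data.svg` should render it, assuming Graphviz is installed.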

26 of 180 comments

  1. perhaps worth looking at? by Anonymous Coward · · Score: 3, Informative
    Portraits of complex networks

    Abstract: We propose a method for characterizing large complex networks by introducing a new matrix structure, unique for a given network, which encodes structural information; provides useful visualization, even for very large networks; and allows for rigorous statistical comparison between networks. Dynamic processes such as percolation can be visualized using animations. Applications to graph theory are discussed, as are generalizations to weighted networks, real-world network similarity testing, and applicability to the graph isomorphism problem.

    1. Re:perhaps worth looking at? by Anonymous Coward · · Score: 5, Funny

      If you think that's worth looking at... how about this? http://www.phdcomics.com/comics/archive.php?comicid=1121

  2. PtolemyPlot by technofix · · Score: 2, Informative

    PtolemyPlot and Java.

  3. Great source of data visualization inspiration by Anonymous Coward · · Score: 3, Interesting

    http://visualcomplexity.com

    Have fun!

  4. Re:Reminescent by gravos · · Score: 2, Insightful

    If you have access to a plotter, Graphviz gives you a great deal of flexibility with regard to how big these images can physically be. Maybe you could consider posting them up on the wall and having a roundtable session at your office.

  5. I have a question for you by zappepcs · · Score: 4, Insightful

    How are you supposed to handle the data if you do not understand it? Sure, there can be too much to see/think about at one time, but if you don't understand it, how can you visualize it usefully?

    I am asking because I have a problem: Where I work, I understand the data and I make efforts to visualize it for others. The trouble starts when they don't understand the data and its sources and limitations, so what they see in my visualization is all they know of it, and they make assumptions about it. I've even had people worry that the network is down because there were holes in the collected data which then showed up in the visualizations.

    If anyone has some good URLs for such thinking, I'd be grateful.

    I simply do not understand how you can visualize data for people if you yourself do not understand it.

    1. Re:I have a question for you by Cutting_Crew · · Score: 2, Interesting

      here is a *sample* of some of my early work that i did long ago when i was just starting out. i don't have any mature 100% working screenshots but you get the idea.

      the lat, lon and depth values are courtesy of NOAA, freely available. this is a screenshot of a real-time frame in OpenGL of the world with each vertex pair colored by depth. you can rotate it, probe it and a few other things.

      link

    2. Re:I have a question for you by TapeCutter · · Score: 3, Interesting

      Sorry but I think the GP is spot on.

      What you are doing in your post is investigating the data until you UNDERSTAND what is useful and then presenting (visualising) it for your boss, who probably adds another layer of "visualization" for his boss, etc. (ie: You are acting as a human visualisation tool that the boss can use to visualise the output of silicon visualisation tools)

      To scale up your simple X/Y plot of two variables to corporate size you propose using a visualization tool that UNDERSTANDS database structures and UNDERSTANDS the fact that to plot strings against integers you need a default transform, etc, etc. You are handed a bunch of DBs with hundreds of tables, thousands of columns and countless transaction transforms ferrying data from one DB to the other.

      So you start with all possible pairs to see if there is a nice easy curve that can relate them. You get 10,000 statistically significant relationships - the problem posed in TFS is how do you now visualize all those graphs to find the relevant relationships without UNDERSTANDING the data.
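
      The brute-force scan described above can be sketched roughly as follows. This is only an illustration: the column names and values are hypothetical, the function names are made up for this example, and a plain cut-off on the Pearson coefficient stands in for a proper significance test.

      ```python
      from itertools import combinations
      from math import sqrt


      def pearson(xs, ys):
          """Pearson correlation coefficient of two equal-length numeric lists."""
          n = len(xs)
          mean_x = sum(xs) / n
          mean_y = sum(ys) / n
          cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
          var_x = sum((x - mean_x) ** 2 for x in xs)
          var_y = sum((y - mean_y) ** 2 for y in ys)
          if var_x == 0 or var_y == 0:
              return 0.0  # a constant column carries no linear relationship
          return cov / sqrt(var_x * var_y)


      def strong_pairs(columns, threshold=0.9):
          """Return (name_a, name_b, r) for every column pair with |r| >= threshold."""
          hits = []
          for (a, xs), (b, ys) in combinations(columns.items(), 2):
              r = pearson(xs, ys)
              if abs(r) >= threshold:
                  hits.append((a, b, round(r, 3)))
          return hits


      # Hypothetical columns; a real extract would have thousands of these.
      columns = {
          "order_qty": [1, 2, 3, 4, 5],
          "order_value": [10, 20, 30, 40, 50],
          "lead_time": [7, 3, 9, 1, 5],
      }
      print(strong_pairs(columns))
      ```

      With n columns there are n(n-1)/2 pairs to test, so a few thousand columns already means millions of candidate plots, which is exactly why the list of "hits" explodes.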

      As to TFS, visualization relies on data mining, which will never be "solved" because given enough data you can always add one more level of UNDERSTANDING (see: Gödel). This is not to say that trying to solve it is pointless. On the contrary, Google News is an excellent and accessible example of how far things have progressed in the last couple of decades.

      Simply presenting multiple known facts/relationships in an easily accessible format takes a deep UNDERSTANDING of the data. Even if you do UNDERSTAND the facts/relationships, creating the format is an art that has few masters.

      --
      And did you exchange a walk on part in the war for a lead role in a cage? - Pink Floyd.
  6. R language by QuietLagoon · · Score: 4, Informative

    There was a thread about the R language a couple of weeks ago. Look it up and read it....

    1. Re:R language by koutbo6 · · Score: 5, Informative

      I second that. If you are visualizing graphs be sure to get the igraph package which can be used with R, Python, C, or Ruby.
      http://cneurocvs.rmki.kfki.hu/igraph/
      Processing is another package that is geared towards data visualization, which Java developers might find easier to use.
      http://www.processing.org/

      --
      You speak London? I speak London very best.
  7. Try the InfoVis community by Mithrandir · · Score: 5, Informative

    The infovis community has been dealing with these subjects for years. There are many different visualisation techniques around. Here's a list of the past conferences and the papers:

    http://conferences.computer.org/Infovis/

    Plenty of good products out there, but the one that I like most is from Tableau Software (http://www.tableausoftware.com/).

    --
    Life is complete only for brief intervals in between toys or projects -- John Dalton
  8. Spotfire by DebateG · · Score: 2, Informative

    I work in biology, and we use Spotfire DecisionSite to visualize and analyze a lot of our massive genetic data. It's a very powerful program that I barely know how to use. It seems to have packages able to analyze pretty much anything you want, and you can even write your own scripts to help things along.

  9. Am I missing something or... by Shados · · Score: 4, Informative

    Wouldn't any everyday cube browser, along with any tool to detect base dimensions in a data warehouse schema, do the trick? You may have to add a few custom dimensions on your own depending on how shitty the master data is (I don't think that can be helped, no matter the solution: if a dimension is "these two fields multiplied together times a magic number appended to the value of another table", you need to know, no tool will guess), but aside from that?

    That's usually what I do anyway. I dump my data in a data warehouse, use whatever built-in wizard can auto-generate dimensions, then play with them in a cube browser. That has worked even for the pretty archaic home-made multi-thousand-tables-without-normalization ERP systems I had to work with in the past, anyhow.
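
    To make the idea concrete, here is a toy sketch of what a cube browser does when you pick dimensions and a measure: it sums the measure over every combination of dimension values. The rows, field names and the `roll_up` helper are all hypothetical; a real warehouse would do this in its OLAP engine rather than in a script.

    ```python
    from collections import defaultdict

    # Hypothetical fact rows dumped from an ERP table.
    rows = [
        {"plant": "P1", "material_group": "steel", "year": 2008, "qty": 120},
        {"plant": "P1", "material_group": "paint", "year": 2008, "qty": 30},
        {"plant": "P2", "material_group": "steel", "year": 2008, "qty": 75},
        {"plant": "P2", "material_group": "steel", "year": 2009, "qty": 90},
    ]


    def roll_up(rows, dimensions, measure):
        """Sum `measure` over every combination of the chosen dimension values."""
        totals = defaultdict(int)
        for row in rows:
            key = tuple(row[d] for d in dimensions)
            totals[key] += row[measure]
        return dict(totals)


    # Slice the same facts two different ways.
    print(roll_up(rows, ["plant"], "qty"))
    print(roll_up(rows, ["material_group", "year"], "qty"))
    ```

    Changing the list of dimensions is the scripted equivalent of dragging a different field onto the rows or columns of a pivot view.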

  10. Business Intelligence by Anonymous Coward · · Score: 3, Informative

    Your ERP isn't supposed to directly analyze the data. You're supposed to use a Business Intelligence software package for that. This being SAP, I believe they'll try to sell you Hyperion.

  11. Just take the first 65k rows by Anonymous Coward · · Score: 2, Funny

    Just take the first 65k rows and dump them into excel and create a pivot table.

  12. Just pipe it by sleeponthemic · · Score: 2, Funny

    Into a matrix screensaver.

    --
    I record my sleeptalking
  13. IBM data explorer by shish · · Score: 2, Interesting

    I have no idea how I stumbled across this, but it looks very pretty...

    --
    I mod down anyone who says "I will be modded down for this", regardless of the rest of their comment
  14. Traditionally... by FurtiveGlancer · · Score: 3, Funny

    Large, highly complex data sets are best described on the back of four cocktail napkins or on a fixed white board in a shared conference room. ~

    --
    Invenio via vel creo
  15. Re:Edward Tufte, anyone? by Mithrandir · · Score: 2, Interesting

    Tufte's ideas are good for presenting simple information. He gets many things right (eg if the visualisation doesn't work in black and white, adding colour won't fix it). However, many in the infovis community are outright sceptical, if not dismissive of his ideas for analysing high dimensional datasets.

    Where his ideas really work is once you have "the answer" that you want to present to someone else. However, the basic exploration of the data to find interesting keypoints is not what he specialises in. There are whole communities devoted to techniques for data mining and presentation, principally infovis/Visual Analytics.

    --
    Life is complete only for brief intervals in between toys or projects -- John Dalton
  16. more details on Tableau by morton2002 · · Score: 2, Informative

    Tableau Desktop is an interactive analysis and visualization product that connects to relational and cube data sources to help people see and understand their data. There was a webinar (slides - PDF) back in November 2008 covering Blastrac Global's success in using Tableau with their ERP system.

    Disclaimer: I work at Tableau Software, so I encourage you to see for yourself with a free trial: http://www.tableausoftware.com/products/tour

  17. Cytoscape by adamkennedy · · Score: 4, Informative

    I had a similar situation to yours recently, except I was trying to detangle a horridly complex product substitution graph for a logistics company.

    I used a bunch of Perl to crunch the raw databases into various abstract graph structures, but instead of graphviz or something created by/for developers, I found that the best software for graph visualisation is the stuff that the genetics and bio people use.

    The standout for me was a program called Cytoscape which can import enormous graph datasets and then gives you literally dozens of different automated layout algorithms to play with (most of which I'd never heard of, but it's easy to just go through them one at a time till something works)

    It's got lots of plugins for talking to genetics databases and such, but if you ignore all that and use Perl/Ruby/whatever for the data production part of the problem, it's a great way to visualise it.
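
    As a rough sketch of that "data production" step (in Python here, though Perl or Ruby would do equally well): Cytoscape can import plain-text edge lists in its Simple Interaction Format (SIF), where each line is a source node, a relationship type and a target node. The SKU names, the `substituted_by` relation and the `to_sif` helper below are hypothetical.

    ```python
    def to_sif(edges):
        """Render (source, relation, target) triples as tab-delimited SIF lines."""
        lines = []
        for source, relation, target in edges:
            for field in (source, relation, target):
                # Tabs and newlines are the delimiters, so they cannot appear in a field.
                if "\t" in field or "\n" in field:
                    raise ValueError(f"field not representable in SIF: {field!r}")
            lines.append("\t".join((source, relation, target)))
        return "\n".join(lines) + "\n"


    # Hypothetical substitution data: the first product may be replaced by the second.
    edges = [
        ("SKU-1001", "substituted_by", "SKU-1002"),
        ("SKU-1002", "substituted_by", "SKU-1003"),
        ("SKU-1001", "substituted_by", "SKU-1003"),
    ]
    sif_text = to_sif(edges)
    print(sif_text, end="")
    ```

    Writing `sif_text` to a file with a `.sif` extension should give something Cytoscape's network import can read, after which the layout algorithms can be tried one at a time.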

  18. Re:Four dimensions by SillyPerson · · Score: 2, Insightful

    I'm sure it's my mathematics background, but when I saw the headline I assumed the author would be discussing something involving the square root of negative one, to which my response was, "Silly author, you can't visualize four dimensions. (Sober.)"

    You have a mathematical background and cannot visualize four dimensions? Here is how you do it: Just visualize the problem in n dimensions and then set n=4.

  19. Re:Looks like SAP tricked another sucker by Hognoxious · · Score: 2, Insightful

    SAP is NOT a business application. It's a programming environment where you get to build and customize your own.

    Looks like you work for another company that tried to reimplement their old system word for word and step by step in SAP.

    And customization has a specific meaning in SAP that doesn't involve any coding. It appears you don't know that, which doesn't improve your credibility.

    A "good" business software package allows you to customize "it" to match your business processes.

    If you want that, don't use a package. Now I'm not saying SAP is perfect - there are some companies whose processes certainly don't fit SAP. The solution in that case is to write your own from scratch.

    Make vs buy is an old dilemma, but doing both isn't really a good solution.

    --
    Confucius say, "Find worm in apple - bad. Find half a worm - worse."
  20. SAP MDM ? BI ? by obUser · · Score: 2, Insightful

    Since you've already bought licences for SAP ERP, you could get a bargain on the Master Data Mgt component. It also offers support to control the Master Data harmonisation process, which you probably need if you have such a large amount of data.

    1. Re:SAP MDM ? BI ? by codepunk · · Score: 2, Funny

      Bargain and SAP are not two words to be used together!

      --
      Got Code?
  21. Re:get rich slow by OSXCPA · · Score: 2, Funny

    Their priorities are different. I come from three generations of German VW assembly line workers (Wolfsburg plant). I grew up in the US. Last time I saw my now-retired uncle, he asked me about the (then) new VW Bug and why the Americans kept putting flowers in the dashboard-mounted gun rack. Explanation was pointless.