I ran across Edward Tufte via another webzine, one as excellent in its own way as Slashdot proves itself to be daily. For years, as a creator of network traffic analysis tools, I had depended on my colleagues' code to visualize the data I collected and transformed via various numerical methods. The results were almost always disappointing in some way and, not to put too fine a point on it, a constant source of internal tension as we pushed and pulled on each other over 'how to graph time series and event data'.
For me, a deeper understanding came from buying the first three books. They are neither long nor difficult reads, so I quickly finished the trio and was inspired to strike out on my own to see if a Perl w/ GD solution would be something I could accomplish. First and foremost, working from a 'less is more' concept quickly resulted in a functional module with which I could experiment.
While I am far from 'priding myself' or suggesting I have produced a superior solution, I have succeeded in building a Perl CGI (GD-based) GUI that lets the analysts dynamically select horizontal and vertical scaling, including n-scaling of the vertical axis so that very small and very large values can be displayed w/o either suppressing the other while the 'mid-section' still gets its 'equal share' of the vertical space. It offers a variety of numerical measurements (avg, median, trend (least-squares best-fit line), min, max, (1st) std, 1st/3rd quartile, etc.), presented as scatter, line, bar, and fill, with active map regions allowing 'drilldown'.
While it is not for everyone, the inspiration I have received from Edward Tufte has given me the motivation to explore my understanding of his ideas using my own data, in a dynamic way that lets me adjust the visualization to the data.
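For anyone wanting to try the numerical measures listed above on their own series, here is a minimal sketch. It is not the Perl/GD module described in this comment; it is Python using only the standard library, and the function name `summarize` and the choice of the sample index as the x value for the trend line are assumptions made for illustration.

```python
import statistics

def summarize(values):
    """Summary measures for an evenly spaced series (needs at least 2 points).

    Hypothetical helper: the x coordinate of each sample is assumed to be
    its index, which only makes sense for evenly spaced time series.
    """
    n = len(values)
    xs = list(range(n))
    mean_x = statistics.mean(xs)
    mean_y = statistics.mean(values)

    # Least-squares best-fit line: y = slope * x + intercept.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Quartile cut points; quantiles() returns [Q1, median, Q3] for n=4.
    q1, _, q3 = statistics.quantiles(values, n=4)

    return {
        "avg": mean_y,
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
        "std": statistics.stdev(values),  # sample standard deviation
        "q1": q1,
        "q3": q3,
        "slope": slope,
        "intercept": intercept,
    }

if __name__ == "__main__":
    for name, value in summarize([3, 5, 4, 8, 9, 12, 11]).items():
        print(f"{name:>9}: {value:.3f}")
```

The trend line is computed by hand rather than with a library call so that the sketch runs on older Python 3 releases too; the quartile values depend on the interpolation method, so other tools may report slightly different numbers.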
One key section for me was the discussion of how the effect of temperature on the seals used on the space shuttle was represented, and how an alternative visualization would clearly have led to a more cautious conclusion than the one that resulted in the Challenger disaster. That, like all of Edward Tufte's examples, has made a profound and lasting impression on me and my visualization coding. I can therefore recommend Edward Tufte's books w/o reservation and sincerely hope that you will be inspired as I have been.
I too have read this book, and it remains one of several that are open somewhere under all the clutter on my desk. Since I write network applications for a living, mostly in the monitoring and traffic analysis space, such things are of keen interest to me. I can thus recommend it without reservation.
When reading the linked article, please double-check the writer's math. While the numbers given for the capture rate are large, the example seems to state that 10,000 million DSL links at a speed of 256K would be captured @ layer 4 on the 10Gbps links. My math suggests that there are approx. 3900 DSL links @ 256Kbps each in 1Gbps, and therefore approx. 39K @ 10Gbps. What is really missing here for me is detail about the 'flow' rate per second, a la Cisco NetFlow, which would affect the layer 4 processing rate. As we 'converse', i.e. traverse the tier 1 transits, we send many frames in a single flow, which can last several seconds. In a NetFlow-like accounting, all such frames would comprise a single flow record, and thus the 'data rate' to the probe, Narus or otherwise, would be considerably smaller. Those tier 1 transits probably do have 1000's, even 10's of 1000's, of concurrent flows per second. That number is still not overwhelming: I myself have coded and operated NetFlow processing systems that load normalized records into an RDBMS at a rate of 1G records per day, which is a rate of under 12,000 flows per second, on a 6x 450MHz Sun system. And I don't find SPARC to be the most powerful of processing environments! Surely full capture and storage of such feeds at the full 10Gbps IS impressive, and any such solution would need massive storage capacity on many storage channels operating concurrently just to capture the data for later analysis. But those solutions can be purchased; just think EMC and a bunch of fiber channels. You could even experiment with this on your own DSL or cable line by loading up Ethereal and storing everything to your ATA drive, just to see that it is feasible. From there you could bypass all the canned solutions by going straight to libpcap and your own homegrown code, Perl being my preference, and readily add your own indexing/tagging scheme to the data being grabbed by libpcap.
So, certainly there is a great issue here; however, it is not one about the amount of hardware needed. I suggest that a 'wire speed' collector writing to large high-speed storage, with backend systems having read access to that storage for subsequent processing, is rather straightforward for the 'average' homebrew.
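To make the arithmetic and the flow argument above easy to check, here is a small sketch. It is Python rather than the Perl mentioned in this comment, it assumes decimal units (1Gbps = 1,000,000,000 bits/s, 256Kbps = 256,000 bits/s), and the packet tuple layout and all function names are hypothetical. Real NetFlow accounting also involves timeouts and additional key fields, which this deliberately leaves out.

```python
from collections import defaultdict

SECONDS_PER_DAY = 86_400

def links_per_pipe(pipe_bps, link_bps):
    """How many fully loaded links of link_bps fit into pipe_bps."""
    return pipe_bps // link_bps

def records_per_second(records_per_day):
    """Average per-second rate for a given daily record count."""
    return records_per_day / SECONDS_PER_DAY

def aggregate_flows(packets):
    """Collapse packets into one accounting record per 5-tuple.

    Each packet is assumed to be a tuple of
    (src, dst, sport, dport, proto, size_bytes).
    """
    flows = defaultdict(lambda: {"packets": 0, "bytes": 0})
    for src, dst, sport, dport, proto, size in packets:
        key = (src, dst, sport, dport, proto)
        flows[key]["packets"] += 1
        flows[key]["bytes"] += size
    return dict(flows)

if __name__ == "__main__":
    print(links_per_pipe(1_000_000_000, 256_000))    # 3906
    print(links_per_pipe(10_000_000_000, 256_000))   # 39062
    print(round(records_per_second(1_000_000_000)))  # 11574

    # Three frames of one conversation plus one unrelated frame
    # collapse into two flow records.
    sample = [
        ("10.0.0.1", "10.0.0.2", 40000, 80, "tcp", 1500),
        ("10.0.0.1", "10.0.0.2", 40000, 80, "tcp", 1500),
        ("10.0.0.1", "10.0.0.2", 40000, 80, "tcp", 400),
        ("10.0.0.3", "10.0.0.2", 40001, 53, "udp", 80),
    ]
    print(len(aggregate_flows(sample)))               # 2
```

The point of the last part is only that the number of records a layer 4 probe has to account for grows with the number of conversations, not with the number of frames on the wire.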
Oh my. I guess it could happen. If the BOINC SETI code had a bug that caused execution to pick up in the data area (think of something like a NOP sled to get you there), AND the alien signal was h/w specific, the alignment was just right, and the code bump occurred on the right architecture (the above dependency), then voilà! You're off and running.
NOW THAT IS SOME FINE PIECE OF ALIEN INTELLIGENCE!
But this isn't Independence Day, so ID4 or not... Peace!