Slashdot Mirror


Book Review: Data-Driven Security: Analysis, Visualization and Dashboards

benrothke writes There is a not-so-fine line between data dashboards and other information displays that provide pretty but otherwise useless and unactionable information, and those that provide effective answers to key questions. Data-Driven Security: Analysis, Visualization and Dashboards is all about the latter. In this extremely valuable book, authors Jay Jacobs and Bob Rudis show you how to find security patterns in your data logs and extract enough information from them to create effective information security countermeasures. The authors show how, by using data correctly and truly understanding what that data means, you can achieve much greater levels of security. Keep reading for the rest of Ben's review.

Data-Driven Security: Analysis, Visualization and Dashboards
author: Jay Jacobs and Bob Rudis
pages: 352
publisher: Wiley
rating: 10/10
reviewer: Ben Rothke
ISBN: 978-1118793725
summary: Superb book for effective use of data for information security

The book is meant for a serious reader who is willing to put in the time and effort to learn the programming necessary (mainly in Python and R) to truly understand what information exists deep in the recesses of their logs. As to R, it is a GNU project and a free software programming language and software environment for statistical computing and graphics. The R language is widely used among statisticians and data miners for developing statistical software and data analysis. For analysis at the level Jacobs and Rudis prescribe, R is a godsend.

After completing the book, the reader will know which questions to ask to gain security insights, and how to use that data to ensure the overall security of their data and networks. Getting to that level is not at all a trivial task, even if there are vendors who promise to do that.

For many people performing data analysis, the dependable Excel spreadsheet is their basic choice for data manipulation. The book calls the spreadsheet a gateway tool between a text editor and programming, and notes that spreadsheets work as long as the data is not too large or complex. The book quotes a 2013 report to shareholders from J.P. Morgan in which part of their $6 billion in 2012 losses was attributed to problems with their Excel spreadsheets.

The authors suggest using Excel as a temporary solution for quick one-shot tasks. For those who have repeating analytical tasks or models that are used repeatedly, it's best to move to some type of structured programming language, specifically those the book suggests and for which it provides a significant number of code examples, all of which are available on the companion website.
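
As a rough illustration of what moving a repeated task out of a spreadsheet can look like, here is a minimal Python sketch (not taken from the book) that counts failed logins per source address. The log format, the column names and the `count_failed_logins` helper are all assumptions made up for this example.

```python
import csv
import io
from collections import Counter

# Hypothetical sample of an authentication log exported as CSV.
# The column names (timestamp, src_ip, user, result) are assumptions.
SAMPLE_LOG = """timestamp,src_ip,user,result
2014-09-01T10:00:01,10.0.0.5,alice,success
2014-09-01T10:00:07,203.0.113.9,root,failure
2014-09-01T10:00:09,203.0.113.9,admin,failure
2014-09-01T10:00:12,198.51.100.4,bob,failure
2014-09-01T10:00:15,203.0.113.9,root,failure
"""


def count_failed_logins(log_file):
    """Return a Counter mapping source IP -> number of failed logins."""
    reader = csv.DictReader(log_file)
    return Counter(row["src_ip"] for row in reader if row["result"] == "failure")


if __name__ == "__main__":
    failures = count_failed_logins(io.StringIO(SAMPLE_LOG))
    # List the sources with the most failures first.
    for ip, count in failures.most_common():
        print(f"{ip}\t{count}")
```

Unlike a one-off spreadsheet filter, the same function can be pointed at tomorrow's export without any manual steps, which is the kind of repeatability the authors are arguing for.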

The goal of all data extraction is to use data analysis to answer real questions. A large part of the book focuses on how to ask the right question. In chapter 1, the authors write that every good data analysis project begins with setting a goal and creating one or more research questions. Without a well-formed question guiding the analysis, you may be wasting time and energy seeking convenient answers in the data, or worse, you may end up answering a question that nobody was asking in the first place.

The value of the book is that it shows the reader how to focus on the context and purpose of the data analysis by setting the research question appropriately, rather than simply parsing large amounts of data. It's ultimately irrelevant if you can use Hadoop to process petabytes of data if you don't know what you are looking for.

Visualization is a large part of what this book is about, and in chapter 6, Visualizing Security Data, the book notes that the most efficient path to human understanding is via the visual sense. It goes on to detail the many advantages data visualization has, and the key to making it work.

As important as visualization is, describing the data is equally important. In chapter 7, the book introduces the VERIS (Vocabulary for Event Recording and Incident Sharing) framework. VERIS is a set of metrics designed to provide a common language for describing security incidents in a structured and repeatable manner. VERIS helps organizations collect useful incident-related information and share that information, anonymously and responsibly, with others.
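
To make "structured and repeatable" concrete, here is a small Python sketch. It is only loosely modeled on the actor/action/asset/attribute categories that VERIS is organized around; the field names and values are simplified assumptions for illustration, not the official VERIS schema, and the incidents are invented.

```python
from collections import Counter

# Invented incidents, each described with the same four keys.
# Keys and values are simplified placeholders, not the real VERIS schema.
INCIDENTS = [
    {"actor": "external", "action": "hacking", "asset": "web server", "attribute": "confidentiality"},
    {"actor": "internal", "action": "misuse", "asset": "database", "attribute": "confidentiality"},
    {"actor": "external", "action": "malware", "asset": "desktop", "attribute": "integrity"},
    {"actor": "external", "action": "hacking", "asset": "database", "attribute": "availability"},
]


def tally(incidents, field):
    """Count how often each value of `field` appears across incidents."""
    return Counter(incident[field] for incident in incidents)


if __name__ == "__main__":
    # Because every record uses the same vocabulary, records from
    # different sources can be compared and aggregated directly.
    print(tally(INCIDENTS, "actor"))
    print(tally(INCIDENTS, "action"))
```

The point of a shared vocabulary is visible in `tally`: it needs no per-source cleanup because every record answers the same questions in the same terms.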

The book shows how you can use dashboards for effective data visualization. But the authors warn that a dashboard is not an art show. They caution that given the graphical nature of dashboards, it's easy to fall into the trap of making them look like pieces of modern or fringe art; when they are far more akin to architectural and industrial diagrams that require more controlled, deliberate and constrained design.

As to dashboards the authors do not like, they consider the Cyber Security Situational Awareness dashboard to be glitzy but not informative. Personally, I thought the dashboard had a lot of good information.

The book uses the definition of dashboard according to Stephen Few, in that it's a "visual display of the most important information needed to achieve one or more objectives that has been consolidated in a single computer screen so it can be monitored at a glance". The book enables the reader to create dashboards like that.

Data-Driven Security: Analysis, Visualization and Dashboards is a superb book written by two experts who provide significant amounts of valuable information in every chapter. Those who are willing to put the time and effort into the serious amount of work that the book requires will find it a vital resource that will certainly help them achieve much higher levels of security.

Reviewed by Ben Rothke.

You can purchase Data-Driven Security: Analysis, Visualization and Dashboards from amazon.com. Slashdot welcomes readers' book reviews (sci-fi included) -- to see your own review here, read the book review guidelines, then visit the submission page. If you'd like to see what books we have available from our review library please let us know.

26 comments

  1. Question, what does R do that other lingos cannot? by Anonymous Coward · · Score: 0

    Does it just have statistical functions built in and ready to go?

  2. Re:Question, what does R do that other lingos cann by vux984 · · Score: 3, Informative

    Question, what does R do that other lingos cannot?

    Nothing. I'm sure other languages can do everything R can do.

    Does it just have statistical functions built in and ready to go?

    It does have that, along with an active community and growing popularity in scientific circles, so there is lots of cutting-edge, interesting work being done with R -- and a lot of it is free and open source. Plus it has multi-core support in several libraries, and even GPU support in some.

  3. Re:Question, what does R do that other lingos cann by Anonymous Coward · · Score: 1

    http://en.wikipedia.org/wiki/R_%28programming_language%29

    R is a free software programming language and software environment for statistical computing and graphics. The R language is widely used among statisticians and data miners for developing statistical software[2][3] and data analysis.[3] Polls and surveys of data miners are showing R's popularity has increased substantially in recent years.[4][5][6]

    R is an implementation of the S programming language combined with lexical scoping semantics inspired by Scheme. S was created by John Chambers while at Bell Labs. R was created by Ross Ihaka and Robert Gentleman[7] at the University of Auckland, New Zealand, and is currently developed by the R Development Core Team, of which Chambers is a member. R is named partly after the first names of the first two R authors and partly as a play on the name of S.[8]

    R is a GNU project.[9][10] The source code for the R software environment is written primarily in C, Fortran, and R.[11] R is freely available under the GNU General Public License, and pre-compiled binary versions are provided for various operating systems. R uses a command line interface; however, several graphical user interfaces are available for use with R.

  4. Re:Question, what does R do that other lingos cann by majid_aldo · · Score: 1

    Question, what does R do that other lingos cannot?

    Nothing. I'm sure other languages can do everything R can do.

    Does it just have statistical functions built in and ready to go?

    It does have that, along with an active community and growing popularity in scientific circles, so there is lots of cutting-edge, interesting work being done with R -- and a lot of it is free and open source. Plus it has multi-core support in several libraries, and even GPU support in some.

    since it has cutting-edge stat functions that's plenty of functionality that R has that other languages DON'T have.

    --
    --- widget evolution: enhanced, plus, super, ultra, extreme, exxxtreme, ultra-extreme, ..etc.
  5. Re:Question, what does R do that other lingos cann by vux984 · · Score: 1

    since it has cutting-edge stat functions that's plenty of functionality that R has that other languages DON'T have.

    MATLAB, Python and other languages have stuff in the same class as R. R is particularly well suited for stats functionality... but it is not UNIQUELY suited for it.

  6. Re:Question, what does R do that other lingos cann by Anonymous Coward · · Score: 0

    that is correct ...but MATLAB is expensive. R is a free and open framework.

  7. true. All languages can do exactly the same things by raymorris · · Score: 1

    Question, what does R do that other lingos cannot?

    Nothing. I'm sure other languages can do everything R can do.

    This is an interesting point, which I'm going to veer slightly off topic with. All general purpose programming languages* can do _precisely_ the same things. All fit the requirements to be "Turing complete". ANY Turing complete language "A" can emulate any other Turing complete language "B", and therefore "A" can do anything that "B" can do. Since "B" can also emulate "A", the two languages can do precisely the same things. (Church-Turing thesis). An interesting example of this is that JavaScript can do everything that CPU microcode can do, as shown at http://bellard.org/jslinux/ .

    Therefore, the question is never "which language can do more", it's always "which language can do it most quickly, most securely, etc." C is often faster than Java for many operations. R is more convenient for statistics, PHP 5.3 makes security bugs less likely than PHP 4.0, but all of those languages can run the exact same programs.

    Contrast HTML and XML, which being markup languages rather than general purpose programming languages, are not Turing complete. Standard regexs are also not Turing complete, though Perl's extended regexs very well may be.

  8. Basically Bullshit by gweihir · · Score: 0

    This may have some use against script-kiddies, bot-nets and similarly non-sophisticated adversaries. It is worse than nothing against other adversaries, as it creates a false sense of security.

    --
    Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
    1. Re:Basically Bullshit by Anonymous Coward · · Score: 0

      What are you responding to?

      It has nothing to do with the book.

    2. Re:Basically Bullshit by Anonymous Coward · · Score: 0

      Recursive...your comment on Basically Bullshit is what your comment is.

    3. Re:Basically Bullshit by Anonymous Coward · · Score: 0

      shut up...u r an angry man!

    4. Re:Basically Bullshit by strikethree · · Score: 1

      Hm. I am going to have to disagree with you there. A false sense of security can be gleaned from such data; however, a false sense of security can be had from NO information at all. A false sense of security is a failing in the security practitioner, not a result of the data. For example, let's say someone has done this analysis and thinks they are secure and then reads your comment. They then know the limitations of what their data can expose and can continue to look for more subtle traces leading to discovery of more sophisticated agents operating within their systems.

      A false sense of security is in essence another formulation of what I call the arrogance problem. Always keep your mind open to new ideas and interpretations and you can avoid this issue more often than not. I hope you did not get modded down for displaying a potential problem. You were a bit harsh but skins should not be thin here.

      To be fair, it is always good to eliminate the obvious first. Stopping there is stupid, but starting there is good.

      --
      "Someone needs to talk to the tree of liberty about its ghoulish drinking problem." by ohnocitizen
    5. Re:Basically Bullshit by Anonymous Coward · · Score: 0

      you make zero...wait...less than zero sense!

  9. Re:true. All languages can do exactly the same thi by vux984 · · Score: 1

    All general purpose programming languages* can do _precisely_ the same things.

    For a rather broad and mathematically abstract definition of "precisely".

    The Church-Turing thesis applies to computers and computation in the abstract. Actual computer languages on actual hardware may theoretically be able to do the same things in an abstract sense, but not necessarily do precisely the same things with the actual physical hardware they run on.

    Not necessarily due to the language itself, but the nature of how they are compiled, interpreted, and/or otherwise used in practice.

  10. Re:Question, what does R do that other lingos cann by Anonymous Coward · · Score: 0

    I hope you appreciated that softball. ;)

  11. Re:true. All languages can do exactly the same thi by Anonymous Coward · · Score: 0

    wateva....

    the truth is that R is a specialized language...and for this purposes...it WORKS!

    really does.

  12. Exmpl? If an interpreter for A can be written in B by raymorris · · Score: 1

    If an interpreter for language A can be written in language B, then B can therefore do everything A does, by running that interpreter. Do you have an example in mind of two languages that can do very different things?
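
    A small sketch of the "interpreter for A written in B" idea, with Python playing the role of B. The choice of Brainfuck as language A is an assumption made for this illustration (the thread does not mention it), picked only because its interpreter is short; `run_bf` is a made-up name.

```python
def run_bf(program, input_bytes=b""):
    """Interpret a Brainfuck program and return its output as bytes."""
    # Pre-compute matching bracket positions for the loop commands.
    jumps, stack = {}, []
    for pos, ch in enumerate(program):
        if ch == "[":
            stack.append(pos)
        elif ch == "]":
            start = stack.pop()
            jumps[start], jumps[pos] = pos, start

    tape = [0] * 30000        # fixed-size tape of byte cells, all zero
    ptr = 0                   # index of the current cell
    pc = 0                    # index of the current instruction
    inp = list(input_bytes)
    out = bytearray()
    while pc < len(program):
        ch = program[pc]
        if ch == ">":
            ptr += 1
        elif ch == "<":
            ptr -= 1
        elif ch == "+":
            tape[ptr] = (tape[ptr] + 1) % 256
        elif ch == "-":
            tape[ptr] = (tape[ptr] - 1) % 256
        elif ch == ".":
            out.append(tape[ptr])
        elif ch == ",":
            tape[ptr] = inp.pop(0) if inp else 0
        elif ch == "[" and tape[ptr] == 0:
            pc = jumps[pc]    # skip the loop body
        elif ch == "]" and tape[ptr] != 0:
            pc = jumps[pc]    # jump back to the start of the loop
        pc += 1
    return bytes(out)


if __name__ == "__main__":
    # 8 * 8 + 1 = 65, the ASCII code for "A".
    print(run_bf("++++++++[>++++++++<-]>+."))
```

    Anything expressible in the interpreted language can now be run from Python, which is the sense of "can do everything A does" used above. Whether that is the sense that matters in practice is a separate question.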

  13. Re:Exmpl? If an interpreter for A can be written i by vux984 · · Score: 1

    If an interpreter for language A can be written in language B, then B can therefore do everything A does, by running that interpreter.

    Mathematically speaking yes. Practically speaking no.

    Do you have an example in mind of two languages that can do very different things?

    Postscript is Turing complete. Now go write an interpreter for C / C++ with it, and use it to play Call of Duty.

    You can write an interpreter for C/C++ with it.

    Hypothetically speaking it would compile and run the source code for Call of Duty.

    Practically speaking however, it would not work. This abomination would not have access to directX, game controller inputs, sound, multiplayer / networking because postscript doesn't have those things, and therefore the interpreted C code would not have those things.

    Those things aren't required to be Turing Complete, but they are required to play Call of Duty on a modern PC precisely the same way one might play it.

    You could create something mathematically equivalent of the computation required for Call of Duty, but you could not "play" it precisely the same way.

    If one were to build the relevant APIs/libraries and make them available to postscript then you could, but that doesn't exist right now and it's worth pointing out that those graphics, sound, networking, and input APIs could not themselves be written entirely in postscript.

    So while postscript and C are mathematically equivalent in a Church-Turing thesis sense they really aren't equivalent on real hardware in the real world.

    You could not start with what is in the world today, and writing postscript, only postscript code, and nothing else, come up with a playable call of duty.

  14. sure it does. If you sandbox J, it's sandboxed too by raymorris · · Score: 1

    If you sandbox Java in the browser, or sandbox a plugin written in C, it can't access DirectX either. The fact that people often choose to run a program in a sandbox doesn't mean anything about the language(s) the program is written in. Try writing a C compiler in C. It's not easy in any language. It's possible in any.

  15. ps - I wouldn't want to write COD in Postscript by raymorris · · Score: 1

    Ps, it would certainly be EASIER to write Call of Duty in some languages than it would in others. It would be difficult to get it to run QUICKLY in some languages (actually that's true of all languages). It could be done, though, and that's the point. The question isn't what CAN the language do, the question is what it's best suited for. Just because you CAN write a pixel shader in Perl doesn't mean you should.

  16. Re:sure it does. If you sandbox J, it's sandboxed by vux984 · · Score: 1

    The fact that people often choose to run a program in a sandbox doesn't mean anything about the language(s) the program is written in.

    The fact that you can theoretically put any language into a given sandbox or theoretically take it out of one is not equivalent to a real ability to actually do it in the real world today.

    It's not easy in any language. It's possible in any.

    Imagine a turing complete toy language which only operates on binary values. The only implementation of that language allocates a byte to hold a one or a zero. An "8 bit integer" would require 8 bytes of computer memory. Horribly wasteful I know, but it's still a turing complete language.

    You cannot implement an x86 C compiler in this language. (At least not today.) Not because the language itself is incapable of computing it, but because the implementation of the language lacks the ability to output valid x86 machine code that an x86 CPU will execute. If, for example, my executable program needs to have its first byte as 11010001, this language cannot output that. I'll get

    00000001, 00000001, 00000000, 00000001, 00000000, 00000000, 00000000, 00000001

    which can be shown to be mathematically equivalent by a simple function of ignoring the leading 7 bits of each byte and placing the remaining 8th bit into a single byte. BUT this language lacks the capability to actually do THAT in practice.
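
    The "simple function" described above can be sketched in Python, which of course sits outside the hypothetical toy language. `pack_bits` and `toy_output` are names invented for this illustration; the eight byte values are the ones listed above.

```python
def pack_bits(one_bit_per_byte):
    """Collapse a sequence of 0/1-valued bytes into real bytes, 8 bits each."""
    if len(one_bit_per_byte) % 8 != 0:
        raise ValueError("need a multiple of 8 bits")
    packed = bytearray()
    for i in range(0, len(one_bit_per_byte), 8):
        value = 0
        for b in one_bit_per_byte[i:i + 8]:
            # Keep only the lowest bit of each byte, ignoring the leading 7.
            value = (value << 1) | (b & 1)
        packed.append(value)
    return bytes(packed)


# What the toy implementation would write: eight bytes, each holding one bit.
toy_output = bytes([0b00000001, 0b00000001, 0b00000000, 0b00000001,
                    0b00000000, 0b00000000, 0b00000000, 0b00000001])

if __name__ == "__main__":
    print(toy_output.hex())             # the eight bytes as stored
    print(pack_bits(toy_output).hex())  # the single packed byte
```

    The two representations carry the same information, but only the packed one is a byte a CPU could fetch as part of an instruction, and producing it requires a step the toy implementation, by assumption, does not offer.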

    I could even write an x86 CPU emulator in my toy language and use it to run my "equivalent to x86 but not x86" machine code. The emulator would emulate the 32-bit CPU registers with 32 bytes each containing 1 bit, etc...

    But no matter how much I twist and contort, I can't get 11010001 into a single hardware byte. I don't need to do that for the language to be turing complete, since it can *simulate* the ability to do that, without actually doing it.

    And THAT is my point. Church-Turing is satisfied by simulation. A simulation of a thing isn't necessarily precisely the same thing as being able to actually do the thing directly.

    Now of course, one could re-implement the language differently (allowing bits to be set within a physical byte), and then one could do this. But the reimplementation would have to be done in a different language -- one simply could not bootstrap what was needed from the original implementation of the language.

    Despite it being Turing complete.

  17. Interesting, but not a Turing machine, unless is by raymorris · · Score: 1

    We're way off in the weeds here, of course, but that's cool. I don't mind playing in the weeds.

    What you've done there is analogous to Dear Leader's argument "it's Constitutional because it is not a tax and is a tax". You've tried to say "it can write the single value 00000001, which is eight values". Either that's one value or eight, pick one.

    The definition of a Turing machine requires very few capabilities. One of the very few things required by the definition of a Turing machine is that it has to be able to update memory one value at a time (block writes aren't good enough). That's the DEFINITION of a Turing machine - it's a machine that writes individual symbols to a strip of tape or other storage.

    You've defined a language that can only update eight bits at a time, and additionally you've said it updates them only in certain patterns. That's not Turing complete.

    If we want it to be Turing complete, we can interpret it as one value by saying that the LANGUAGE writes "1" and the HARD DRIVE happens to store that physically with eight molecules. The language would then be Turing complete since it's updating the single value "1". Fine. The language can write 1010101, 11111, 0000, 01010, or any other series since it's writing one value at a time. Perhaps the hard drive stores "10" physically as 1111111100000000, but the hard drive is going to read back what was written to it. Write a "1", get a "1" back. That's part of the definition of Turing complete because the storage in a turing complete system can be like a dumb piece of paper - it doesn't change what you write to it. Given that the tape doesn't change what's written to it, the language can write valid machine code and get valid machine code back.

    You can't have it both ways. If "1" is one value, it can write "1", then write "0", in whatever pattern is needed to produce valid machine code. If it can only write the eight separate values 0,0,0,0,0,0,0,1 that's not a Turing machine.

  18. Re:Interesting, but not a Turing machine, unless i by vux984 · · Score: 1

    We're way off in the weeds here, of course, but that's cool. I don't mind playing in the weeds.

    Way out there. :)

    You've defined a language that can only update eight bits at a time, and additionally you've said it updates them only in certain patterns. That's not Turing complete.

    No you are mistaken.

    From the point of view of the LANGUAGE, each bit is individually and directly accessible. All the language sees is

    0, 1, 0, 1, 0 ...

    The implementation of the language, however, runs on x86, and uses a byte to represent each 0 and each 1 (as 00000000 and 00000001 respectively).

    When the language saves a file out to disk, it writes out its binary bits one bit at a time, but they are each saved as a byte. When it reads them back in, the byte 00000001 is read, and stored in memory as 00000001 in a byte... but the language, just sees 1.

    The problem if you try to write a C compiler in the language is that, as far as the language is concerned, the first 8 bits of the program it compiled *IS* 11010001; however, what is in the physical computer memory to represent that is 8 bytes, each 00000000 or 00000001. What gets written to the file on the disk is the same. The representation of the compiled C program I generated is mathematically equivalent to the actual C program ... but it cannot be run as the representation is wrong.

    One of the very few things required by the definition of a Turing machine is that it has to be able to update memory one value at a time

    The definition of a turing machine places NO restriction on the representation of its "memory".

    So you are mistaken or misunderstood what I've done. I am *absolutely* free to use arrays of 8 bit bytes that each contain one of 2 bit patterns to simulate a turing tape.

    You've defined a language that can only update eight bits at a time, and additionally you've said it updates them only in certain patterns. That's not Turing complete.

    The language doesn't see 8 bits at a time. The arrays of bit patterns are an implementation detail that the language is not 'aware' of.

    If we want it to be Turing complete, we can interpret it as one value by saying that the LANGUAGE writes "1" and the HARD DRIVE happens to store that physically with eight molecules.

    Ok... yes. Exactly Right. Except that rather than molecules I'm asserting the language simply dumps it to the hard disk using the logical file system already in place (say by Windows or Linux or whatever) the way its stored in memory... as a stream of bytes, containing one of two patterns.

    . Fine. The language can write 1010101, 11111, 0000, 01010, or any other series since it's writing one value at a time. Perhaps the hard drive stores "10" physically as 1111111100000000, but the hard drive is going to read back what was written to it. Write a "1", get a "1" back.

    Right again. So far so good. The language writes a "1" and it reads back a "1". Yes.

    Given that the tape doesn't change what's written to it, the language can write valid machine code and get valid machine code back.

    Swing and a miss... ok not a miss... foul ball.

    It can read and write "valid" machine code back, subject to the constraint that we view it from within the language / language implementation. If we look at what is actually in the physical computer memory, it is not valid machine code from the physical computers perspective.

    This language can compute results that are logically equivalent to machine code, but they are not actually usable as machine code. We can't simply set the ACTUAL CPU instruction pointer to the spot in physical computer memory where I'm storing my compiled C program, because it is just a sequence of 00000001 and 00000000 bytes ... a representation of the machine code, but not usable machine code itself.

    Further because the language *implementation* provides no way of setting the physical b

  19. Re:Interesting, but not a Turing machine, unless i by Anonymous Coward · · Score: 0

    this is most interesting...why not post this as an article on the slashdot...

    cuz this has nothing though to do with this books review.

  20. Re:Question, what does R do that other lingos cann by Anonymous Coward · · Score: 0

    R has more than just the statistical functions. It has language syntax and assumptions that are different from other languages. Also it is class based. Some of the base-level types are arrays, matrices, vectors and data frames. Personally, I don't know the subtle differences (I am not an R programmer, I work with R programmers).
    Example: Consider a classic array of numbers (integers, not reals), 5 columns (called a, b, c, d, e) by N rows; the goal is to square each and every cell.
    A simple bit of code would be a loop over the rows, within which each column would be squared.
    Assuming the array is called M, in R you could simply code:

    M <- M^2

    and that's it. You do not need to be concerned about either the number of rows or the number of columns, because this functionality is built into the language itself.
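
    For contrast with the one-line R version, the explicit loop the comment describes might look like this in plain Python. This is only a sketch; `square_cells` and the sample values in `M` are made up for the example.

```python
def square_cells(rows):
    """Return a new table with every cell squared, using explicit loops."""
    squared = []
    for row in rows:                 # loop over the N rows
        new_row = []
        for value in row:            # loop over columns a, b, c, d, e
            new_row.append(value ** 2)
        squared.append(new_row)
    return squared


# A small 2-row, 5-column table of integers.
M = [
    [1, 2, 3, 4, 5],
    [6, 7, 8, 9, 10],
]

if __name__ == "__main__":
    M = square_cells(M)
    print(M)
```

    Here the programmer spells out both loops by hand, which is exactly the bookkeeping the R expression leaves to the language.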
    At my company the R programmers create plots, graphs, and such. While I organize, categorize and present them on a web application.
    Using R (or SAS) or another data-centric programming language is a different way of thinking.

  21. Re:Question, what does R do that other lingos cann by Anonymous Coward · · Score: 0

    Ready...but not to go.