Slashdot Mirror


Bayesian Tail

flok writes "We all know anti-spam software that uses Bayesian filtering. The results with these are amazingly good. That got me thinking: why not create a tool that monitors logfiles and uses a Bayesian filter to decide which events to display and which to hide? That's why I created btail. Btail is just that: it monitors a logfile and filters it with a Bayesian filter. The results exceeded my own expectations!"
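
For readers who have not seen the technique, here is a minimal sketch of naive Bayes classification applied to log lines. It is not btail's code: the class name `LineFilter`, the `show`/`hide` labels, the tokenizer, and the Laplace smoothing are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


class LineFilter:
    """Tiny naive Bayes classifier for log lines (illustrative only)."""

    LABELS = ("show", "hide")

    def __init__(self):
        self.tokens = {label: Counter() for label in self.LABELS}
        self.lines = {label: 0 for label in self.LABELS}

    @staticmethod
    def tokenize(line):
        # Keep alphabetic words only, so timestamps and PIDs do not become features.
        return re.findall(r"[a-z]+", line.lower())

    def train(self, line, label):
        # Record one labeled example line.
        self.lines[label] += 1
        self.tokens[label].update(self.tokenize(line))

    def p_show(self, line):
        # Return the estimated probability that the line should be displayed.
        vocab = set(self.tokens["show"]) | set(self.tokens["hide"])
        total_lines = sum(self.lines.values())
        log_score = {}
        for label in self.LABELS:
            # Laplace (add-one) smoothing keeps unseen words from zeroing the score.
            score = math.log((self.lines[label] + 1) / (total_lines + 2))
            denom = sum(self.tokens[label].values()) + len(vocab) + 1
            for tok in self.tokenize(line):
                score += math.log((self.tokens[label][tok] + 1) / denom)
            log_score[label] = score
        # Turn the two log scores into a probability; clamp to avoid overflow.
        diff = min(log_score["hide"] - log_score["show"], 700.0)
        return 1.0 / (1.0 + math.exp(diff))


f = LineFilter()
f.train("ERR disk full on /dev/sda1", "show")
f.train("ERR connection refused by database", "show")
f.train("user alice logged in", "hide")
f.train("user bob logged out", "hide")

error_score = f.p_show("ERR database unreachable")
login_score = f.p_show("user carol logged in")
```

With this toy training set, `error_score` comes out well above 0.5 and `login_score` well below it. A real tool would need far more training lines and a way to correct mistakes.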

10 of 63 comments

  1. Cool idea but may be dangerous by PhilippeT · · Score: 2, Insightful

    This is a cool idea, but I wouldn't want to use it to filter logs on important systems... every line may be crucial.

    Anyhow, credit for a decent idea.

    --
    A psychopath can't tell the difference between right and wrong. A sociopath knows the difference - he just doesn't care.
    1. Re:Cool idea but may be dangerous by dougmc · · Score: 4, Insightful
      This is a cool idea, but I wouldn't want to use it to filter logs on important systems... every line may be crucial.
      Perhaps, but doesn't the same apply to your email? Every email may be crucial as well -- but if you miss a crucial email because it was buried in spam, isn't the effect the same as if it were caught by an overzealous spam filter?
    2. Re:Cool idea but may be dangerous by bzebarth · · Score: 1, Insightful

      I guess it depends on the kind of log file and what you are looking for. If you are talking about a database log file, for instance, all error lines may start with ERR or something, and the log file may contain entries for every login and logout that you really don't care about. If you are only interested in certain types of entries, it certainly seems useful.

    3. Re:Cool idea but may be dangerous by cpuffer_hammer · · Score: 4, Insightful

      Why not use it to colorize, or to rebuild the logs in HTML?
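
A rough sketch of the colorizing variant: nothing is dropped, and the filter's score only chooses a color. The function name `colorize`, the thresholds, and the color choices are assumptions for illustration; the score is whatever probability the filter assigns to a line being interesting.

```python
# Standard ANSI escape sequences for terminal colors.
RED, YELLOW, DIM, RESET = "\033[31m", "\033[33m", "\033[2m", "\033[0m"


def colorize(line, p_interesting, high=0.9, low=0.1):
    """Wrap a log line in an ANSI color chosen from a score in [0, 1]."""
    if p_interesting >= high:
        color = RED      # almost certainly worth reading
    elif p_interesting <= low:
        color = DIM      # almost certainly noise, but still visible
    else:
        color = YELLOW   # the filter is unsure
    return f"{color}{line}{RESET}"


shown = colorize("ERR disk full", 0.97)
faded = colorize("user alice logged in", 0.02)
```

Because every line is still printed, a misclassified line is merely dimmed rather than lost.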

    4. Re:Cool idea but may be dangerous by GlassHeart · · Score: 2, Insightful
      The far more important difference is that we cannot control the generation of incoming email, which is why we are reduced to filtering as intelligently as possible.

      Server logs are not the same at all. The administrator has some control over the logs that get generated, and the programmer has full control. There isn't supposed to be the equivalent of email spam at all, because useless messages should just be filtered or redirected at the source. Leaving everything at "verbose" and relying on filtering just doesn't seem like the right approach to the problem.

      It is a cute idea, though, and probably applicable to some specific cases (no source code, etc).
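
To make "filtered at the source" concrete, here is a small sketch using Python's standard `logging` module. The logger name `db` and the messages are made up; the in-memory stream stands in for a real log file.

```python
import io
import logging

stream = io.StringIO()  # stand-in for a real log file
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))

log = logging.getLogger("db")   # hypothetical logger name
log.addHandler(handler)
log.propagate = False           # keep the example's output in our stream only
log.setLevel(logging.WARNING)   # INFO and DEBUG are discarded at the source

log.info("user alice logged in")         # below WARNING: never written
log.error("connection pool exhausted")   # written

output = stream.getvalue()
```

Only the error line reaches the log, so there is nothing left for a downstream filter to guess about.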

  2. If this were Trek... by AndroidCat · · Score: 5, Insightful
    01:37 Overheat in plasma injector #1.
    01:56 Plasma injector #1 offline, switching to #2 backup.
    02:23 Overheat in plasma injector #2.
    02:44 Failure to shut down plasma injector #2.
    02:58 Overheat in reactor core.
    03:20 Containment weakening.
    03:25 Containment weakening.
    03:30 Containment weakening.
    03:35 Five minutes to containment failure.
    03:40 FIVE SECONDS TO WARP CORE BREACH!!!

    Better be careful to train the filter about those warnings that don't happen very often, but when they do, you really want to know about them.

    --
    One line blog. I hear that they're called Twitters now.
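
One possible safeguard for rare warnings, sketched under assumptions: always display a line that contains a word the filter never saw during training, whatever its score. The names `should_display` and `known_tokens` and the threshold are hypothetical, and nothing here is claimed to be how btail behaves.

```python
import re


def tokenize(line):
    # Alphabetic words only, lowercased; timestamps and numbers are ignored.
    return set(re.findall(r"[a-z]+", line.lower()))


def should_display(line, p_show, known_tokens, threshold=0.5):
    """Show a line if it scores as interesting OR contains never-seen words."""
    unseen = tokenize(line) - known_tokens
    return p_show >= threshold or bool(unseen)


# Pretend the filter was trained only on routine overheat messages.
known = tokenize("01:37 Overheat in plasma injector #1.")

routine = should_display("02:23 Overheat in plasma injector #2.", 0.01, known)
novel = should_display("03:40 FIVE SECONDS TO WARP CORE BREACH!!!", 0.01, known)
```

Here `routine` is `False` and `novel` is `True`: the breach warning gets through even with a very low score, because its words are new to the filter.
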
  3. Re:This code belongs on by rmohr02 · · Score: 2, Insightful

    Give him a break--it is the first release, and I doubt he's had much feedback yet.

  4. Well, no it doesn't ... by Chromodromic · · Score: 4, Insightful

    All due respect, you're being a bit hard on the guy. He's not doing badly here.

    The [brackets] used in the usage message are standard in the Unix world for specifying an optional or default argument. Just look at any man page. So that, actually, is pretty straightforward. The name of the default config file would likely also be spelled out in the man page, which I would expect, so that's not confusing.

    As for changing the if construct into a switch, well, I'm trusting the accuracy of your excerpt, but I didn't find his code to be very difficult to read, to be honest, and certainly not a candidate for DailyWTF, which typically contains laughably horrible code.

    As far as other code may go, the guy states that this is in a nascent stage, so jumping on his source files seems like a bit of an easy shot :|

    --
    Chr0m0Dr0m!C
  5. Re:This code belongs on by Hard_Code · · Score: 4, Insightful
    That aside, your code would be easier to read (slashcode's broken formatting notwithstanding) if you used a switch construct.
    Speak for yourself. Given that the switch cases are all mutually exclusive and that, disregarding the default case, there are only 2 paths, a switch is more obfuscatory than clarifying, in my opinion.
    --

    It's 10 PM. Do you know if you're un-American?
  6. Not for Me by schestowitz · · Score: 1, Insightful

    People who monitor log files know best where to look and what to ignore. It is better to incorporate filtering into the application that generates the logs.

    --
    My Linux - (L)ove (I)s (N)ever (U)tterly eXPensive