Slashdot Mirror


Bayesian Tail

flok writes "We all know anti-spam software that uses Bayesian filtering. The results are amazingly good. That got me thinking: why not create a tool that monitors logfiles and uses a Bayesian filter to decide which events to display and which to suppress? That's why I created btail. Btail does just that: it monitors a logfile and filters it with a Bayesian filter. The results exceeded my own expectations!"

6 of 63 comments

  1. Site getting sluggish already by Kiaser Zohsay · · Score: 4, Informative
    Blockquote from the readme.txt:


    Step 1. compile & install

    make install

    Step 2. configure btail

    Default configuration file:
    db_bad = .btail_db_bad
    db_good = .btail_db_good
    db_conf = .btail_db_conf
    logfile = /var/adm/messages

    db_... are the database files which are filled by blearn. They are
    used as reference when btail calculates if an event is bad or good.
    logfile is the logfile which you want to monitor. As you see, one
    needs a separate configuration file AND databases(!) for each file
    to monitor.
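The configuration shown above is a flat list of `key = value` pairs. A minimal Python sketch of reading that format follows. It is not btail's own parser, which the readme does not describe; the function name `parse_btail_conf` and the rule of skipping lines without an `=` are assumptions made for illustration.

```python
def parse_btail_conf(text):
    """Parse 'key = value' lines into a dict.

    Assumption: blank lines and lines without '=' are ignored.
    The readme does not say how btail itself treats them.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or "=" not in line:
            continue
        # Split on the first '=' only, so values may contain '='.
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


# The default configuration quoted in the readme.
DEFAULT_CONF = """\
db_bad = .btail_db_bad
db_good = .btail_db_good
db_conf = .btail_db_conf
logfile = /var/adm/messages
"""

conf = parse_btail_conf(DEFAULT_CONF)
print(conf["logfile"])
```

Because each monitored logfile needs its own configuration and databases, a setup watching several files would hold one such dictionary per file.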

    Step 3. learn logging

    blearn -g good_logging
    blearn -b bad_logging

    good_logging should contain events which are considered ok.
    bad_logging should contain logging of events you want to see, e.g.
    disk errors, invalid logins, etc.

    Step 4. use btail

    btail

    This will read the logfile defined in btail.conf and emit events
    which are considered not-ok by the bayesian filter.

    --- folkert@vanheusden.com
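The learn-then-filter workflow in the readme can be sketched in Python as below. This is a generic naive Bayes classifier with add-one smoothing, not btail's actual algorithm: the readme does not say how btail tokenizes lines or computes its scores. The class name `LogFilter`, the letters-only tokenizer, and the sample log lines are all made up for illustration.

```python
import math
import re
from collections import Counter

# Assumption: only runs of letters count as tokens, so numbers,
# timestamps and PIDs are ignored.
TOKEN_RE = re.compile(r"[A-Za-z_]+")


def tokenize(line):
    return [t.lower() for t in TOKEN_RE.findall(line)]


class LogFilter:
    """Toy stand-in for blearn (learn) plus btail (is_bad)."""

    def __init__(self):
        self.counts = {"good": Counter(), "bad": Counter()}
        self.lines = {"good": 0, "bad": 0}

    def learn(self, label, lines):
        # Roughly what 'blearn -g' / 'blearn -b' do: update one database.
        for line in lines:
            self.counts[label].update(tokenize(line))
            self.lines[label] += 1

    def _log_score(self, label, tokens):
        counts = self.counts[label]
        total = sum(counts.values())
        vocab = len(set(self.counts["good"]) | set(self.counts["bad"]))
        # Smoothed class prior, then smoothed per-token likelihoods.
        score = math.log((self.lines[label] + 1) / (sum(self.lines.values()) + 2))
        for tok in tokens:
            score += math.log((counts[tok] + 1) / (total + vocab + 1))
        return score

    def is_bad(self, line):
        tokens = tokenize(line)
        return self._log_score("bad", tokens) > self._log_score("good", tokens)


log_filter = LogFilter()
log_filter.learn("good", [
    "sshd session opened for user alice",
    "cron job started",
    "session closed for user alice",
])
log_filter.learn("bad", [
    "kernel disk error on sda",
    "sshd invalid user root from host",
])

# Emit only the lines the filter considers not-ok, as btail does.
for line in ["kernel disk error on sdb", "session opened for user alice"]:
    if log_filter.is_bad(line):
        print(line)
```

Working in log space avoids numeric underflow on long lines, and the smoothing keeps a token seen in only one database from forcing the verdict on its own.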


    Still very preliminary at this point, but shows promise. Now, to build and try it out!
    --
    I am not your blowing wind, I am the lightning.
  2. Re:Cool idea but may be dangerous by flok · · Score: 3, Informative

    If you need something that colorizes and/or does regular expression filtering, merging with other (log-)files, multiple windows, etc. etc. then maybe multitail might come in handy.

    Initially I wanted to integrate btail into multitail, but multitail is bloated enough already :-)

    --

    www.vanheusden.com - home of Multitail, HTTPing, CoffeeSaint, EntropyBroker, rsstail, bsod, listener, nagcon, nagi
  3. Re:What I would like to see by tonkdude · · Score: 4, Informative

    I currently use CRM114, and on its mailing list someone (Evan Prodromou) has created a program that does just this using the CRM114 language. It is called "Monkeyplexer", based on the idea that you could train a monkey to sort your mailbox into folders.

    Pop over to the CRM114 site and search the general list archives for monkeyplexer to find the discussions about it.

    Here is the last version announcement that I could find in my mailbox:

    monkeyplexer is a tool for automatically sorting incoming email messages into appropriate folders. A new version of monkeyplexer, 0.7, is now available. http://bad.dynu.ca/~evan/monkeyplexer/monkeyplexer-0.7.tar.gz

    This version includes the following changes:

      You can specify which mailboxes to use, instead of which mailboxes to exclude. This can save some typing and some time at runtime, at the expense of dynamically updating the list.

      You can tell the monkeytrainer to only train messages that were received in the last few weeks, days, hours, minutes -- whatever.

      The monkeyplexer remembers which messages have been trained for which folders. If you train a message for a different folder, the monkeyplexer will automatically forget the first folder before training for the new one.

    Thanks to everyone who has installed monkeyplexer already. I hope this new version helps some people out. I find it easier and more accurate.

    ~ESP

  4. Re:What I would like to see by rmohr02 · · Score: 2, Informative

    POPFile is exactly what you're looking for.

  5. Reinvent the Wheel Much? by runswithd6s · · Score: 4, Informative
    (Stage Left) Enters the Controllable Regex Mutilator, crm114, with a noticeable strut. He's been there, done that.
    CRM114 is a system to examine incoming e-mail, system log streams, data files or other data streams, and to sort, filter, or alter the incoming files or data streams according to the user's wildest desires. Criteria for categorization of data can be by satisfaction of regexes, by sparse binary polynomial matching with a Bayesian Chain Rule evaluator, a Hidden Markov Model, or by other means. Accuracy of the SBPH/BCR classifier has been seen in excess of 99 per cent, for 1/4 megabyte of learning text. In other words, CRM114 learns, and it learns fast.
    --
    assert(expired(knowledge)); /* core dump */
  6. Re:Cool idea but may be dangerous by lars_stefan_axelsson · · Score: 2, Informative
    Why not use it to colorize, or to rebuild the logs in HTML?

    A few months back I published a paper, with GPL source code (you need Python etc.), that uses visualisation (colorisation) to give the user insight into the operation of a Bayesian classifier.

    It actually works pretty well, and the idea could be applied to other uses of the Naive Bayesian classifier.

    --
    Stefan Axelsson