Bayesian Tail


Posted by timothy on Wednesday December 29, 2004 @05:48AM from the seek-novelty dept.

flok writes "We all know anti-spam software that uses Bayesian filtering, and its results are amazingly good. That got me thinking: why not create a tool that monitors logfiles and uses a Bayesian filter to decide which events to display and which to hide? That's why I created btail. Btail does just that: it monitors a logfile and filters it with a Bayesian filter. The results have exceeded my own expectations!"

examples by rogueuk · 2004-12-29 05:52 · Score: 3, Interesting

Do you have any examples of what type of stuff it learns to filter and what it learns to show? The btail site doesn't really show what it outputs versus what it filters.
Well, where's the story? by noselasd · 2004-12-29 06:09 · Score: 1, Interesting

Did that story end a bit quickly?
Heck, I wanna know what the results are, goddamnit. What made the thing so great?
What I would like to see by bhima · 2004-12-29 06:19 · Score: 1, Interesting

The environment I work in is highly E-mail centric and I work on many projects. I would like to see some sort of Bayesian filtering employed to sort all of the e-mails I get into folders based on projects.

--
Nothing in the world is more dangerous than sincere ignorance and conscientious stupidity.
Bayesian is good for almost everything by Ki+Master+George · 2004-12-29 06:59 · Score: 4, Interesting

Bayesian filtering could be used for lots of things outside of spam. One example could possibly be Wikis, determining spam from ham modifications (well, yes, it is spam here). I've had some other ideas that involve Bayesian, but they've escaped me for the moment.

--
Before you walk a mile in someone's shoes, you should insult them so you know how they are and what they're doing.
1. Re:Bayesian is good for almost everything by dasunt · 2004-12-29 20:33 · Score: 2, Interesting
  Bayesian filtering could be used for lots of things outside of spam. One example could possibly be Wikis, determining spam from ham modifications (well, yes, it is spam here). I've had some other ideas that involve Bayesian, but they've escaped me for the moment.
  
  Email sorting filters: imagine a Bayesian setup that can decide if a new mail should be sorted into "work", "friends", "ebay", "amazon", "project", etc.
  
  Interest filters: Run slashdot stories and comments through your own trained Bayesian sorting system and filter out the stories you probably don't want to see. Do the same for news.google.com, cnn, or usenet.
  
  Music sorter: Can Bayesian filters be taught to understand music (pitch, amplitude, etc.)? If so, can they sort on it? If I see a song playing in xmms, can I use my nifty bayesian_sort plugin to play more songs that sound like that for the rest of the day? Consider tying it in to the 'next' button -- if I don't play a song completely, I probably don't want to hear songs like that for the next few days.
  
  IM secretary: Add a 'secretary' feature to your IM client. When you enable it, it will show you only messages that it thinks you want to see.
  
  There are a ton of possibilities available.
Re:If this were Trek... by aoteoroa · 2004-12-29 07:21 · Score: 2, Interesting

True. But if the Star Trek error log resembled real life then it might look more like:
01:37 [error] Overheat in plasma injector #1.
01:37 [warning] Cargo bay door 2 is open.
01:38 [warning] Cargo bay door 2 is open.
01:38 [warning] Oxygen sensor on deck 2 not responding.
01:39 [warning] Cargo bay door 2 is open.
01:40 [warning] Cargo bay door 2 is open.
01:41 [warning] Oxygen sensor on deck 2 not responding.
01:56 [error] Plasma injector #1 offline, switching to #2 backup.

In other words, the really interesting errors in the logs can get hidden by a bunch of trivial log entries.

I use tail all the time when developing PHP applications. PHP logs errors to the Apache log file, so I type:
tail -f /var/log/apache/mysite.com-error.log
to track changes to the Apache logs as I test the PHP pages.

But the truth of the matter is that I am only interested in PHP errors, not broken links and missing images. So if I can train btail to pay attention to PHP errors like:
[Wed Dec 29 10:58:04 2004] [error] PHP Fatal error: Call to undefined function: badFunction() in /home/aoteoroa/www/pages/info.php on line 1
and ignore file not found errors like:
[Wed Dec 29 11:16:22 2004] [error] [client 192.168.0.2] File does not exist: /home/aoteoroa/www/pages/info-over.gif
it would make my job just a little bit easier.
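
A minimal sketch of the kind of training described above, assuming a plain two-class naive Bayes over words. This is not btail's actual code or interface: the `LineFilter` class, its `train` and `classify` methods, and the `show`/`hide` labels are all made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(line):
    # Drop the leading [timestamp] so dates do not become features,
    # then split what is left into lowercase words.
    line = re.sub(r"^\[[^\]]*\]\s*", "", line)
    return re.findall(r"[a-z]+", line.lower())


class LineFilter:
    """Hypothetical two-class naive Bayes over log lines: 'show' vs 'hide'."""

    def __init__(self):
        self.counts = {"show": Counter(), "hide": Counter()}
        self.lines = {"show": 0, "hide": 0}

    def train(self, line, label):
        self.counts[label].update(tokenize(line))
        self.lines[label] += 1

    def classify(self, line):
        vocab = set(self.counts["show"]) | set(self.counts["hide"])
        total_lines = sum(self.lines.values())
        scores = {}
        for label, counts in self.counts.items():
            # Log prior and log likelihoods, both with add-one smoothing.
            score = math.log((self.lines[label] + 1) / (total_lines + 2))
            denom = sum(counts.values()) + len(vocab) + 1
            for tok in tokenize(line):
                score += math.log((counts[tok] + 1) / denom)
            scores[label] = score
        return max(scores, key=scores.get)


f = LineFilter()
f.train("[Wed Dec 29 10:58:04 2004] [error] PHP Fatal error: Call to undefined "
        "function: badFunction() in /home/aoteoroa/www/pages/info.php on line 1", "show")
f.train("[Wed Dec 29 11:16:22 2004] [error] [client 192.168.0.2] File does not "
        "exist: /home/aoteoroa/www/pages/info-over.gif", "hide")
print(f.classify("[Wed Dec 29 12:00:00 2004] [error] PHP Parse error: "
                 "unexpected T_STRING in /home/aoteoroa/www/pages/index.php on line 7"))
```

Two training lines are far too few for real use; the point is only that words like "php" and "fatal" pull a line toward show, while "client", "file" and "exist" pull it toward hide.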
Why learning with supervision? by MoobY · 2004-12-29 10:39 · Score: 2, Interesting

I thought this app was learning everything that was in the log and then showing only the new, out-of-the-ordinary entries that didn't quite fit in with the rest. That would filter the freak events out of the log and show them to the user. How different would such an app be from the proposed btail? And how confident would you be about such an unsupervised log analyzer?

--
--- Sigmentation Fault - Comments Dumped
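
For comparison, a rough sketch of the unsupervised variant asked about above. It does not describe how btail works; the `NoveltyScorer` class is hypothetical and simply ranks a line by how rare its words are among the lines seen so far.

```python
import math
import re
from collections import Counter


class NoveltyScorer:
    """Hypothetical scorer: higher means the line's words were seen less often."""

    def __init__(self):
        self.token_counts = Counter()
        self.total = 0

    def _tokens(self, line):
        return re.findall(r"[a-z]+", line.lower())

    def score(self, line):
        # Average surprisal (negative log probability) per word, add-one smoothed.
        tokens = self._tokens(line)
        if not tokens:
            return 0.0
        vocab = len(self.token_counts) + 1
        surprisal = sum(
            -math.log((self.token_counts[t] + 1) / (self.total + vocab))
            for t in tokens
        )
        return surprisal / len(tokens)

    def observe(self, line):
        # Fold the line into the running word counts.
        tokens = self._tokens(line)
        self.token_counts.update(tokens)
        self.total += len(tokens)


scorer = NoveltyScorer()
for _ in range(20):
    scorer.observe("[warning] Cargo bay door 2 is open.")
print(scorer.score("[warning] Cargo bay door 2 is open."))
print(scorer.score("[error] Plasma injector #1 offline, switching to #2 backup."))
```

On the confidence question: a scorer like this flags anything rare, whether or not it matters, and it goes quiet about a real problem once that problem has repeated often enough to look normal. Supervised training avoids both at the cost of the training effort.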
Bayesian is good for almost everything-Dessert. by Anonymous Coward · 2004-12-29 12:22 · Score: 1, Interesting

"I've had some other ideas that involve Bayesian, but they've escaped me for the moment."

Recovering the Slashdot lost since 2000 by eliminating most (-1) material, e.g. GNAA, FP, etc. Eliminating the human bias in the moderation system (since client-side moderation is out). Tagging interesting material (a Bayesian agent).
Here's how to make this a lot more useful by Julian+Morrison · 2004-12-29 18:21 · Score: 3, Interesting

Step 1: Allow the option to automatically discover and load canned training packages, e.g. from a directory under /etc. Make it automatically pick the right training file to use when called with a logfile (so e.g. btail httpd.conf knows to look for the training for httpd.conf files).

Step 2: Include btail with major distros

Step 3: Any package for an app that generates logs can come with a ready-made canned training package, which gets dropped into the /etc directory.

That way, you could apt-get a package, start btail-ing its logfiles immediately without the need to tediously train the filter first. Training would still be possible, to personalise the filter.
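
Step 1 could look roughly like the sketch below. Everything named here is an assumption for illustration: btail defines no such directory, the `/etc/btail.d` path and the `.train` suffix are invented, and `find_training_file` is a hypothetical helper.

```python
from pathlib import Path

# Hypothetical location for canned training packages; not something btail defines.
DEFAULT_TRAINING_DIRS = [Path("/etc/btail.d")]


def find_training_file(logfile, search_dirs=DEFAULT_TRAINING_DIRS, suffix=".train"):
    """Return the first canned training file named after the logfile, or None."""
    name = Path(logfile).name
    for directory in search_dirs:
        candidate = Path(directory) / (name + suffix)
        if candidate.is_file():
            return candidate
    return None


# Looks for /etc/btail.d/error.log.train and falls back to None if it is absent.
print(find_training_file("/var/log/apache/error.log"))
```

Passing a per-user directory ahead of the system one in `search_dirs` would let a personalised training file override the canned one.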
Bayesian by inertia187 · 2004-12-30 06:00 · Score: 2, Interesting

Bayesian tail might be neat. I like the idea of broadening the use, but I'd much rather see Bayesian filters used on my in-box for more than just spam. I envision a filter that would sort e-mails based on subject matter. This would have the net effect of improving the filter technology because it's trying to sort e-mails you actually want to look at.

We all know that if the filter makes a mistake and hides a message in the Spam box, chances are you'll miss it, and another chance to train the filter has been lost. But if an e-mail that was intended to land in the Irate Customer box instead lands in the Clueless Customer box, the likelihood of noticing it is much greater.

--
A programmer is a machine for converting coffee into code.
Bayesian AIM bot by duncangough · 2004-12-30 21:31 · Score: 3, Interesting

I love Bayes stuff - and there's a very nice Python module written by divmod.

I was playing around with AIML to cobble together a basic chat bot when I realised that I could use a Bayesian parser to radically cut down the amount of AIML that I needed to write. AIML is an XML style of chat bot responses; it's clever in that it's highly recursive, but the downside is that you need to create a rule for every eventuality.

By adding in a bit of Bayesian guessing before the AIML parser got its hands on the conversation, I'm able to keep the AIML files very focused and give the chat bot a bit more sparkle - you don't have to train him about everything. After a while he realised that 'yo', 'hi' and 'hello' are all the same thing, so he just guesses that you're saying hello and pulls out the correct response from the AIML file (rather than creating an AIML rule to deal with all the variations on 'hello').

If you're interested I'd strongly recommend installing GrokitBot. You can get the source and a bit more explanation at my site, Suttree.com

Playaholics : Free Online Games

--
Suttree, a weblog about casual games development