Bayesian Tail
flok writes "We all know anti-spam software that uses Bayesian filtering. The results with these are amazingly good. So that got me thinking: why not create a tool which monitors logfiles and uses a Bayesian filter to determine which events to display and which not? That's why I created btail. Btail is just that: it monitors a logfile and filters it with a Bayesian filter. The results exceeded my own expectations!"
This is a cool idea, but I wouldn't want to use it to filter logs on important systems... every line may be crucial.
Anyhow, credit for a decent idea.
A psychopath can't tell the difference between right and wrong. A sociopath knows the difference - he just doesn't care.
Do you have any examples of what type of stuff it learns to filter and what it learns to show? The btail site is kind of lacking in examples of what it outputs versus what it filters.
Still very preliminary at this point, but shows promise. Now, to build and try it out!
I am not your blowing wind, I am the lightning.
Did that story end a bit quickly?
Heck, I wanna know what the results are goddamnit. What made the thing so great.
Go Folkert, go! ;-]
The environment I work in is highly E-mail centric and I work on many projects. I would like to see some sort of Bayesian filtering employed to sort all of the e-mails I get into folders based on projects.
Nothing in the world is more dangerous than sincere ignorance and conscientious stupidity.
01:56 Plasma injector #1 offline, switching to #2 backup.
02:23 Overheat in plasma injector #2.
02:44 Failure to shutdown plasma injector #2.
02:58 Overheat in reactor core.
03:20 Containment weakening.
03:25 Containment weakening.
03:30 Containment weakening.
03:35 Five minutes to containment failure.
03:40 FIVE SECONDS TO WARP CORE BREACH!!!
Better be careful to train the filter about those warnings that don't happen very often, but when they do, you really want to know about them.
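To make the worry about rare warnings concrete, here is a minimal sketch of a two-class naive Bayes line filter. It is not btail's code; the class name, the "show"/"hide" labels and the `hide_below` threshold are all made up for illustration. The one design choice it shows is hiding a line only when the filter is fairly sure it is noise, so a warning it has never seen still gets through.

```python
import math
import re
from collections import Counter


class LogLineFilter:
    """Toy two-class naive Bayes filter: each line is 'show' or 'hide'.

    Hypothetical illustration only; this is not how btail is implemented.
    """

    LABELS = ("show", "hide")

    def __init__(self):
        self.words = {label: Counter() for label in self.LABELS}
        self.lines = Counter()

    @staticmethod
    def tokens(line):
        # Keep alphabetic words only, so timestamps and numbers are ignored.
        return re.findall(r"[a-z]+", line.lower())

    def train(self, line, label):
        self.words[label].update(self.tokens(line))
        self.lines[label] += 1

    def p_show(self, line):
        """Estimated probability that the line belongs to 'show'."""
        vocab = len(set(self.words["show"]) | set(self.words["hide"]))
        n_lines = sum(self.lines.values())
        score = {}
        for label in self.LABELS:
            total = sum(self.words[label].values())
            # Add-one smoothing on the prior and on every word count.
            s = math.log((self.lines[label] + 1) / (n_lines + 2))
            for tok in self.tokens(line):
                s += math.log((self.words[label][tok] + 1) / (total + vocab + 1))
            score[label] = s
        # Turn the two log scores into a probability; clamp to avoid overflow.
        diff = max(-700.0, min(700.0, score["hide"] - score["show"]))
        return 1.0 / (1.0 + math.exp(diff))

    def should_display(self, line, hide_below=0.2):
        # Hide only when the filter is fairly sure the line is noise.
        return self.p_show(line) >= hide_below


flt = LogLineFilter()
for _ in range(2):
    flt.train("01:00 cron job finished", "hide")
flt.train("01:05 session opened for user backup", "hide")
flt.train("02:23 Overheat in plasma injector #2.", "show")

for line in ("03:00 cron job finished",
             "02:58 Overheat in reactor core.",
             "03:20 Containment weakening."):
    print(flt.should_display(line), line)
```

With this tiny training set the never-seen "Containment weakening" line scores close to 0.5, so it is still displayed. A filter that hid everything below 0.5 would sit right on the edge for lines like that, which is the poster's point.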
One line blog. I hear that they're called Twitters now.
Bayesian filtering could be used for lots of things outside of spam. One example could possibly be Wikis, determining spam from ham modifications (well, yes, it is spam here). I've had some other ideas that involve Bayesian, but they've escaped me for the moment.
Before you walk a mile in someone's shoes, you should insult them so you know how they are and what they're doing.
Give him a break--it is the first release, and I doubt he's had much feedback yet.
Pfff.
Did you see the version number?
There are tons of things that need to be redone.
For example the commandline parameters. No, not with a switch statement (or even my if-construction) but with getopt() etc.
www.vanheusden.com - home of Multitail, HTTPing, CoffeeSaint, EntropyBroker, rsstail, bsod, listener, nagcon, nagi
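For readers who have not met getopt: the idea is to describe the options once and let the library walk the argument list, instead of a hand-written if or switch chain. The comment above is about C's getopt(); the sketch below uses Python's standard `getopt` module, which follows the same conventions. The option letters (-c, -v, -h) are invented for this example and are not btail's actual switches.

```python
import getopt


def parse_args(argv):
    """Parse a hypothetical command line; these are NOT btail's real options."""
    settings = {"config": None, "verbose": False, "help": False}
    # "c:" means -c takes an argument; "v" and "h" are plain flags.
    opts, rest = getopt.getopt(argv, "c:vh", ["config=", "verbose", "help"])
    for opt, value in opts:
        if opt in ("-c", "--config"):
            settings["config"] = value
        elif opt in ("-v", "--verbose"):
            settings["verbose"] = True
        elif opt in ("-h", "--help"):
            settings["help"] = True
    # Whatever is left over is treated as the list of logfiles.
    settings["files"] = rest
    return settings


print(parse_args(["-c", "my.conf", "-v", "/var/log/messages"]))
```

An unknown switch raises `getopt.GetoptError`, so the caller can print the usage message in one place.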
All due respect, you're being a bit hard on the guy. He's not doing badly here.
:|
The [brackets] used in the usage message are standard in the Unix world for specifying an optional or default argument. Just look at any man page. So that, actually, is pretty straightforward. The name of the default config file would likely also be spelled out in the man page, which I would expect, so that's not confusing.
As for changing the if construct into a switch, well, I'm trusting the accuracy of your excerpt, but I didn't find his code to be very difficult to read, to be honest, and certainly not a candidate for DailyWTF, which typically contains laughably horrible code.
As far as other code may go, the guy states that this is in a nascent stage, so jumping on his source files seems like a bit of an easy shot
Chr0m0Dr0m!C
Go Folkert! Your site is still standing, so let's wait and see what happens when your story hits the front page ;-)
This space is intentionally staring blankly at you
It's 10 PM. Do you know if you're un-American?
assert(expired(knowledge));
I thought this app was learning everything that was in the log, and then only showed the new out-of-the-ordinary log entries that didn't quite fit in with the rest. This would allow it to pick freak events out of the log and show them to the user. How different would such an app be from the proposed btail? And how confident would you be about such an unsupervised log analyzer?
--- Sigmentation Fault - Comments Dumped
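One way to picture the unsupervised variant asked about above: learn word frequencies from a stretch of ordinary log, then score each new line by how surprising its words are. This is a plain frequency-based novelty score, not Bayesian classification and not what btail does; the class name and the averaging rule are assumptions made for the sketch.

```python
import math
import re
from collections import Counter


def tokens(line):
    # Alphabetic words only, so timestamps and PIDs do not count as novelty.
    return re.findall(r"[a-z]+", line.lower())


class NoveltyScorer:
    """Hypothetical unsupervised scorer: rare words make a line 'surprising'."""

    def __init__(self):
        self.counts = Counter()
        self.total = 0

    def observe(self, line):
        toks = tokens(line)
        self.counts.update(toks)
        self.total += len(toks)

    def surprisal(self, line):
        """Average bits per word under the smoothed word frequencies seen so far."""
        toks = tokens(line)
        if not toks:
            return 0.0
        vocab = len(self.counts) + 1  # one extra slot for unseen words
        bits = 0.0
        for tok in toks:
            p = (self.counts[tok] + 1) / (self.total + vocab)
            bits -= math.log2(p)
        return bits / len(toks)


scorer = NoveltyScorer()
for _ in range(20):
    scorer.observe("session opened for user backup")
    scorer.observe("session closed for user backup")

for line in ("session opened for user backup", "overheat in reactor core"):
    print(round(scorer.surprisal(line), 2), line)
```

The catch with this approach is the same one raised earlier in the thread: a genuinely new but harmless message scores just as high as a genuinely new disaster, so some threshold or human review is still needed.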
"I've had some other ideas that involve Bayesian, but they've escaped me for the moment."
Recovering the Slashdot lost since 2000, by eliminating most (-1) material, e.g. GNAA, FP, etc. Eliminating the human bias in the moderation system (since client-side moderation is out). Tagging interesting material (a Bayesian agent).
Why would you run this on an MS system? The critical errors are so common that btail would discard them with the rest of the log file.
do I get a discount if I already have a subscription to Black Tail?
Heil Sig! -Rob
Now even geeks can get a little tail!
"Flyin' in just a sweet place,
Never been known to fail..."
Why is everyone giving this guy a hard time? Just wait until Apple wants him to work on the Panther server Admin tool. Think about all those Grandmas using a new G5 in some new complex in Florida. A tool like this might prevent a huge DDoS! It might also be good for people using Linspire, but I doubt Linspire is in use on that many high speed networks.
Yes, the first poster was right. People who should read every line of their logs should read every line of their logs.
.\.\att Clare
grep -v ssh
Also, my current installation pretty much breaks things down into
The reporting from the logwatch package also seems to be pretty good.
Step 1: Allow the option to automatically discover and load canned training packages, eg: a directory under /etc. Make it automatically pick the right training file to use when called with a logfile (so eg: btail httpd.conf knows to look for the training for httpd.conf files).
Step 2: Include btail with major distros
Step 3: Any package for an app that generates logs can come with a ready-made canned training package, which gets dropped into the /etc directory.
That way, you could apt-get a package, start btail-ing its logfiles immediately without the need to tediously train the filter first. Training would still be possible, to personalise the filter.
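The lookup in Step 1 could be as simple as matching the logfile's name against files in one directory. A rough sketch follows; the `.train` suffix, the function names and the idea of stripping rotation suffixes are all assumptions for illustration, and nothing like this is known to exist in btail. The demo uses a temporary directory instead of a real path under /etc.

```python
import os
import re
import tempfile


def training_candidates(logfile):
    """Yield names to try, most specific first: 'access.log.1', then 'access.log'."""
    name = os.path.basename(logfile)
    yield name
    # Drop rotation suffixes such as '.1' or '.2.gz' (an assumed convention).
    stripped = re.sub(r"(\.\d+)?(\.gz)?$", "", name)
    if stripped != name:
        yield stripped


def find_training(logfile, training_dir):
    """Return the first matching canned training file, or None if there is none."""
    for name in training_candidates(logfile):
        path = os.path.join(training_dir, name + ".train")  # hypothetical suffix
        if os.path.isfile(path):
            return path
    return None


with tempfile.TemporaryDirectory() as demo_dir:
    # Pretend a package dropped its canned training file here.
    open(os.path.join(demo_dir, "access.log.train"), "w").close()
    print(find_training("/var/log/apache/access.log.1", demo_dir))
    print(find_training("/var/log/unknown.log", demo_dir))
```

Returning None when nothing matches leaves room for the fallback mentioned above: start untrained and let the user train the filter by hand.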
People who monitor log files know best where to look and what to ignore. It is better to incorporate filtering into the application that generates the logs.
My Linux - (L)ove (I)s (N)ever (U)tterly eXPensive
If anyone else is interested in what getopt is: GNU usage and example.
That's akin to only filling a dictionary with words that you use. Most people only use 5000-10,000 words. A dictionary contains a multitude more.
In most cases, I'll be looking to spell something mundane (to check it's not "mundain"). Occasionally I'd like to spell something a lot more difficult, eg: "disenfranchisement". (I wish Firefox had a spellchecker...) I'd be out of luck, if some filter decided not to include it in the book... I hope you see my point.
click-clack, front and back. I'm not moving this car otherwise.
> Actually a better design would be to not use ANY default config. What's the point? Reduce [crunch] useless choices
You, my friend, have understood the art of software design.
Too bad most people I work with have not.
Bayesian tail might be neat. I like the idea of broadening the use, but I'd much rather see bayesian filters used on my in-box for more than just spam. I envision a filter that would sort out e-mails based on subject matter. This would have the net effect of improving the filter technology because it's trying to sort e-mails you actually want to look at.
We all know that if the filter makes a mistake and hides a message in the Spam box, chances are you'll miss it, and another chance to train the filter has been lost. But if an e-mail that was intended to land in the Irate Customer box instead lands in the Clueless Customer box, the likelihood of noticing it is much greater.
A programmer is a machine for converting coffee into code.
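Sorting mail into several boxes is the same naive Bayes idea with more than two labels: score the message against every folder and pick the best. A small sketch, with a made-up class name and made-up training messages; the two folder names are borrowed from the comment above.

```python
import math
import re
from collections import Counter, defaultdict


class FolderSorter:
    """Hypothetical multi-folder naive Bayes sorter, for illustration only."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)
        self.msg_counts = Counter()

    @staticmethod
    def tokens(text):
        return re.findall(r"[a-z']+", text.lower())

    def train(self, text, folder):
        self.word_counts[folder].update(self.tokens(text))
        self.msg_counts[folder] += 1

    def scores(self, text):
        """Log score per folder: log prior plus smoothed log word likelihoods."""
        vocab = set()
        for counts in self.word_counts.values():
            vocab.update(counts)
        n_msgs = sum(self.msg_counts.values())
        result = {}
        for folder, counts in self.word_counts.items():
            total = sum(counts.values())
            score = math.log(self.msg_counts[folder] / n_msgs)
            for tok in self.tokens(text):
                score += math.log((counts[tok] + 1) / (total + len(vocab) + 1))
            result[folder] = score
        return result

    def classify(self, text):
        scores = self.scores(text)
        if not scores:
            return None  # nothing has been trained yet
        return max(scores, key=scores.get)


sorter = FolderSorter()
sorter.train("this is unacceptable I want a refund now", "Irate Customer")
sorter.train("third time it broke, I am furious, refund", "Irate Customer")
sorter.train("how do I turn it on", "Clueless Customer")
sorter.train("where is the power button, how do I start", "Clueless Customer")

print(sorter.classify("I want my refund"))
print(sorter.classify("how do I switch it on"))
```

Every misfiled message you drag to the right folder becomes one more `train` call, which is the feedback loop the comment is after.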
Yes, I contradicted myself somehow. You need to look at the two parts of my reply separately. About Firefox, try what I do and use kedit or the like to run a spellchecker (ALT+T+S). It takes 2 seconds to copy and paste text and I invoke kedit using CTRL+ALT+E (xbindkeys).
My Linux - (L)ove (I)s (N)ever (U)tterly eXPensive
I love Bayes stuff - and there's a very nice Python module written by divmod.
I was playing around with AIML to cobble together a basic chat bot when I realised that I could use a Bayesian parser to radically cut down the amount of AIML that I needed to write. AIML is an XML style of chat bot responses; it's clever in that it's highly recursive, but the downside is that you need to create a rule for every eventuality.
By adding in a bit of Bayesian guessing before the AIML parser got its hands on the conversation, I'm able to keep the AIML files very focused and give the chat bot a bit more sparkle - you don't have to train him about everything. After a while he realised that 'yo', 'hi' and 'hello' are all the same thing, so he just guesses that you're saying hello and pulls out the correct response from the AIML file (rather than creating an AIML rule to deal with all the variations on 'hello').
If you're interested I'd strongly recommend installing GrokitBot. You can get the source and a bit more explanation at my site, Suttree.com
Playaholics : Free Online Games
Suttree, a weblog about casual games development