Text-Mining Your E-mail
Misha writes "There have been a number of weeks/months in anyone's life that called for a better organization of your Inbox. filtering and folders work, but it'd be nice to have an text-mining tool running in the background that categorized incoming messages by topic as they arrive. It's nice to see that besides NLP research, there are some great algorithmic advances being done, as seen in this paper. Perhaps even one of them Perl monkeys will quickly hack such a background tool." Note: it's a PostScript file.
Somewhat to my astonishment when I clicked on the link up popped a box asking me to confirm Postscript Renderer options! I had no idea that I had anything on this box that could read Postscript.
Some minutes of 100% CPU later up pops a PSP window, with the document rendered in a font about five pixels square. Fair enough, I suppose, for what's basically a photograph editing application.
But really, how bizarre, posting something in a low level printer file format. We'll have people posting documents in PCL5 next.
DBMAIL looks cool, once it supports postgresql it would be awesome.
I have been dissapointed in general with most SMTP, IMAP and POP servers. A real database is the proper way to do things. Email is my #1 app and I want to do complex queries on my archives.
So last year I bit the bullet and wrote a 50 line python program which imported all my mbox and Maildir format archives into a simple postgresql database. 600 megs worth over the last 4 years.
And another simple 50 line php program gives me a web database query interface. It suits my needs now and is much faster than searching through a big (but much much smaller) imap folder with almost every mail program I've tried. With some good design it really shouldn't be too hard to make an industrial strength email database system and I am surprised that it hasn't happened sooner in the open source world.
I think that direct SQL access to the mail database is preferred over IMAP. SQL gives you more capabilities and I find it less problematic than all the various combinations of IMAP servers and mail programs.
Jeff
ipv6 is my vpn