Slashdot Mirror


Improving Unix Mail Storage?

At first, there was mbox, then there was Maildir, and Bill begat Outlook and .mbx. CaraCalla wonders if there is a better way to store mail than the way we currently store it today. I admit, with the changes that email has undergone over the past 5 years (changes in what is being sent, not necessarily in how it is sent), it may be time to reinvent the mail format. Read on for CaraCalla's analysis of the current mail options, and his thoughts on where we may go in the future. If you were to design your own MUA, how would you design its mail storage? CaraCalla asks: "Does anybody know a good, free solution for storing mail on unix hosts? The reason that I ask this question is my discontent with available techniques:
  • mbox: There are problems with locking, corruption, access-times, and bloat.
  • Maildir: Do you really want to clutter your system with millions of small files? That's a waste of inodes and space (unless perhaps you use Linux/ReiserFS or SGI), and just try to open a Maildir with 1000+ mails and see how long it takes your favorite mail program just to display the subjects.
  • Cyrus: Basically the same as Maildir with database features.
  • UW-Imap mbx: That's classical mbox with extensions allowing multiple access.
  • Evolution: Basically mbox with database features.
  • Windows clients: Typically some proprietary db-format. Pathetic.

But the thing that bugs me most is disk space. In a typical inbox, only 5% to 10% is text, including headers and HTML. The rest is BASE64- (or UU-) encoded pictures, Word documents, zip archives and so on. The problem here is the encoding, which wastes considerable amounts of space (the encoded form is at least one third larger than the raw data).
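To put a number on the "one third" figure, here is a minimal sketch that measures the size ratio of MIME-style Base64 against the raw bytes. The helper name `base64_overhead` and the random test payload are illustrative assumptions, not part of any mail tool.

```python
import base64
import os

def base64_overhead(raw: bytes) -> float:
    """Return encoded_size / raw_size for line-wrapped Base64.

    base64.encodebytes emits 76 characters plus one newline for every
    57 input bytes, which is the line layout MIME bodies use (real mail
    uses CRLF, so it is one byte per line worse than measured here).
    """
    encoded = base64.encodebytes(raw)
    return len(encoded) / len(raw)

raw = os.urandom(57 * 1000)          # stand-in for an attachment
ratio = base64_overhead(raw)
print(f"{len(raw)} raw bytes grow by a factor of {ratio:.3f}")
```

The bare 4-for-3 encoding gives a factor of 1.333; the line breaks push it to roughly 1.35, which matches the "at least one third" estimate.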

Some ideas about the ideal mail-storage:

  • One file per mailbox folder, allowing multiple folders per user. Should those files reside in one central location or in users' home directories?
  • Compression: Should messages be broken into pieces and the MIME-attachments stored separately (thus searching of the text parts would still be possible without decompressing the whole file)?
  • File format: gdbm, Sleepycat db? Something new?
  • Should the security model allow users to directly access their files, grep them, copy them around?
  • Shared folders, virtual domains?
  • Unicode support in folder names? IMAP message-IDs, flags, user-agent-specific state information?
  • How would MTAs deliver mail? How would clients access? File-locking (NFS)?
  • What about backwards compatibility? Writing libmailstore (anyone)? Adopting UW c-client?

Does my ideal mail storage exist somewhere? Is somebody working on a project addressing this? Does anybody have some other hints? And please no mbox/Maildir flamewar!"
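The compression idea in the list above (keep text parts searchable, store attachments separately and compressed) can be sketched with Python's standard `email` and `zlib` modules. This is only an illustration under stated assumptions: the function name `split_message`, the in-memory return values, and the demo message are all hypothetical, and a real store would also need to keep enough MIME structure to rebuild the original message.

```python
import zlib
from email import message_from_bytes
from email.message import EmailMessage

def split_message(raw: bytes):
    """Return (text_parts, attachments) for one RFC 822 message.

    Text parts stay as plain strings so they can be searched directly.
    Attachments are decoded from their transfer encoding (Base64 etc.)
    and stored as (filename, zlib-compressed bytes).
    """
    msg = message_from_bytes(raw)
    texts, blobs = [], []
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no payload of their own
        payload = part.get_payload(decode=True)
        if part.get_content_maintype() == "text":
            charset = part.get_content_charset() or "ascii"
            texts.append(payload.decode(charset, "replace"))
        else:
            blobs.append((part.get_filename(), zlib.compress(payload)))
    return texts, blobs

# Demo message: one text body plus one highly compressible attachment.
attachment = b"\x00" * 30000
demo = EmailMessage()
demo["Subject"] = "report"
demo.set_content("See attached.")
demo.add_attachment(attachment, maintype="application",
                    subtype="octet-stream", filename="blob.bin")

texts, blobs = split_message(demo.as_bytes())
print(len(demo.as_bytes()), "bytes on the wire,",
      len(blobs[0][1]), "bytes for the stored attachment")
```

How much this saves depends entirely on the attachment: already-compressed formats such as zip or JPEG gain little beyond removing the Base64 overhead.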

172 of 554 comments

  1. One folder to rule them all... by Pig Hogger · · Score: 5, Interesting
    Stuff the mail in one folder, one to rule them all.

    But put multiple indexes (by sender, subject, date, whatever key-classes you want to assign messages) and the possibility to restrict the range displayed. With careful programming, you can manage many users who won't be able to read each other's mail, except as required.

    This way, you can arrange your mail as you please.

    No more message duplication. Send a memo to 250 people? Just send it once, but tag it as readable by the 250 sendees.

    Of course, this calls for an SQL database... :) :) :)
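A minimal sketch of that idea, using SQLite from the Python standard library as a stand-in for whatever SQL database one would actually pick. The table names (`message`, `readable_by`), the column layout, and the helpers `deliver` and `inbox` are assumptions made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE message (
    id      INTEGER PRIMARY KEY,
    sender  TEXT, subject TEXT, sent TEXT, body TEXT
);
-- One row per (message, user) pair that is allowed to read it.
CREATE TABLE readable_by (
    message_id INTEGER REFERENCES message(id),
    username   TEXT,
    PRIMARY KEY (message_id, username)
);
-- "Multiple indexes": one per key class you want to browse by.
CREATE INDEX idx_sender  ON message(sender);
CREATE INDEX idx_subject ON message(subject);
CREATE INDEX idx_sent    ON message(sent);
""")

def deliver(sender, subject, sent, body, recipients):
    """Store the message once and tag it as readable by each recipient."""
    cur = conn.execute(
        "INSERT INTO message (sender, subject, sent, body) VALUES (?, ?, ?, ?)",
        (sender, subject, sent, body))
    conn.executemany("INSERT INTO readable_by VALUES (?, ?)",
                     [(cur.lastrowid, r) for r in recipients])
    return cur.lastrowid

def inbox(user):
    """Return the subjects this user may see, oldest first."""
    return conn.execute(
        "SELECT m.subject FROM message m "
        "JOIN readable_by r ON r.message_id = m.id "
        "WHERE r.username = ? ORDER BY m.sent", (user,)).fetchall()

recipients = [f"user{i}" for i in range(250)]
deliver("boss", "Memo", "2002-06-01", "Please read.", recipients)
print(inbox("user7"), inbox("outsider"))
```

The memo body is stored in one row; the 250 recipients cost only 250 small rows in `readable_by`, and a user who is not listed there sees nothing.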

    1. Re:One folder to rule them all... by TheBracket · · Score: 4, Insightful
      You do realise that you just described MS Exchange (albeit in a chronically simplified form), right? :-)

      Exchange is actually a pretty decent mail server, although only using it for mail is pretty dumb - its groupware features are the killer app. It exposes both benefits (in particular, single storage of messages with multiple recipients) and flaws (if your db goes boom, it affects all your users - or at least all your users in a given mail partition) of database-based mail storage.

      I remember seeing a project to combine mail storage with PostgreSQL a while ago. Anyone know what happened to it?

      --
      Lead developer, http://wisptools.net
    2. Re:One folder to rule them all... by mikefoley · · Score: 3, Interesting

      Do they still not have single mailbox restore? Do you still need to build a separate Exchange server just to restore a single mailbox or message?

      Exchange made my life miserable for many years in the 93-95 timeframe. It might be better now.

      The concepts weren't bad (db for mail, etc) but the execution was terrible. I was field testing Exchange (for Alpha NT) when I was at DEC and asked the Exchange manager point blank about single mailbox restore and he said "Why?" My answer "When my boss wants that email he really needs yesterday, you're telling me I have to build a totally new system and restore 8GB (at the time) of data just to restore a single mail message????"

      "Uh, yea?"

      No thanks...

      --
      What's my Karma Mr. Burns? "Excellent"
    3. Re:One folder to rule them all... by alen · · Score: 2

      In Exchange 5.5 it depends on your backup program. We have it with Veritas NetBackup. In Exchange 2000 it's out of the box. And with Veritas or CommVault software you can do single message restore. Or there is deleted item retention out of the box: even if the user empties the trash folder, the message will still be there for the number of days you specify in the server-side admin program. You just recover the message in the Outlook client.

    4. Re:One folder to rule them all... by ffattizzi · · Score: 4, Informative

      It's called a brick-level backup, and most Exchange admins don't use them. The better setup is to set a reasonable deleted item retention policy. I set mine for 60 days. If I need any email deleted in the last 60 days, I can get it without any restore, mailbox or otherwise. Works great.
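The retention policy described above amounts to a soft delete with an expiry window. The sketch below shows the general idea only; it is not how Exchange implements it, and the `Mailbox` class, its method names, and the in-memory dictionaries are hypothetical.

```python
from datetime import datetime, timedelta

class Mailbox:
    """Toy mailbox with soft delete and a retention window."""

    def __init__(self, retention_days=60):
        self.retention = timedelta(days=retention_days)
        self.messages = {}   # msg_id -> body
        self.deleted = {}    # msg_id -> (body, time of deletion)

    def delete(self, msg_id, now):
        # Move the message aside instead of destroying it.
        self.deleted[msg_id] = (self.messages.pop(msg_id), now)

    def recover(self, msg_id, now):
        body, deleted_at = self.deleted[msg_id]
        if now - deleted_at > self.retention:
            raise KeyError(f"{msg_id} is past its retention window")
        del self.deleted[msg_id]
        self.messages[msg_id] = body

    def purge(self, now):
        """Really remove everything older than the window; return the count."""
        expired = [i for i, (_, t) in self.deleted.items()
                   if now - t > self.retention]
        for i in expired:
            del self.deleted[i]
        return len(expired)

box = Mailbox()
box.messages["m1"] = "Quarterly numbers"
box.messages["m2"] = "Lunch?"
t0 = datetime(2002, 6, 1)
box.delete("m1", t0)
box.delete("m2", t0)
box.recover("m1", t0 + timedelta(days=30))   # still inside the 60 days
purged = box.purge(t0 + timedelta(days=61))  # m2 is gone for good now
```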

    5. Re:One folder to rule them all... by Malcontent · · Score: 3, Informative

      James, the mail server from the Apache group, can use an SQL datastore.

      --

      War is necrophilia.

    6. Re:One folder to rule them all... by Ageless · · Score: 3, Insightful

      If you think that you can replicate what Exchange does in "a couple hours of time" you have not been at it long enough.
      There are two excellent reasons that so many people use Exchange.

      1) In general, it works out of the box. A company with someone with meager knowledge can set up a fairly complex mail handling system without much help.

      2) It does A LOT. In its most basic configuration it does what you need 10 or more programs in Linux to do, not to mention that most of those 10 don't exist.

      Rage against the machine all you want, but when your boss says you will have shared contacts and calendars and your clients will run Windows; find me a solution that comes within miles of the ease of Outlook and Exchange and I'll give you a cookie.
      Actually, I'll probably give you several thousand dollars.

    7. Re:One folder to rule them all... by tupps · · Score: 2, Informative

      Samsung Contact:

      http://www.samsungcontact.com

      Which is based on HP OpenMail. About 1/6 the cost.

      --
      Go out and get sailing!
    8. Re:One folder to rule them all... by jcoy42 · · Score: 2, Interesting
      Exchange is actually a pretty decent mail server

      The problem is, mail is a critical app, and in some ways exchange *really* misses the point.

      I think the most frustrating situation I've ever seen as a system administrator was when we were doing a scheduled reboot of the Exchange server. After about 20 minutes waiting for it to resync itself and shut down, my boss, one of those "smart enough to be dangerous" types, decided it was a typical NT hang and went ahead and hit the reset button (he did it in a single motion without a word; there was nothing anyone could do).

      It took almost 2 weeks to get things straightened out. We had backups, but it turned out there was a 2048 meg bug with NT restores that had been re-introduced by a recent server upgrade, and we had problems getting the patch rolled into the new code (Legato tech support, need I say more?).

      350 people *screaming* for 2 weeks. I was very glad I was not the mail administrator.. but very sad to be sitting next to him.
      --
      Never trust an atom. They make up everything.
    9. Re:One folder to rule them all... by H310iSe · · Score: 4, Interesting

      Ditto. Either Exchange is impossible to administer well or just very, very hard. Until recently you couldn't restore a single user's mailbox (there was brick backup, but eventually even MS admitted it didn't work); you had to restore the entire server to get back one corrupted data store.

      At one firm I was at exchange went down and it took 3 days of 24 hour work to get it back up (I guess we were 'lucky') - the solution? a $50,000 backup server that does absolutely nothing but wait for the main exchange server to go down. First time we had to use it, we were down for a day, it didn't come up.

      I'm currently looking for a mail server, any server, that does mail well for 50 - 500 users (I'd settle for 50-100). I've played w/ xMail, it's tough to config. Heard good things about WorldMail (qualcomm?) but not used it. Heard free BSD's qmail (?) is good as well. I'm very interested in anyone who has info about free or cheap mail servers that can be configured in a day or two of work. If that exists.

      --
      closed minded is as closed minded does
    10. Re:One folder to rule them all... by BlueUnderwear · · Score: 3, Funny
      If I need any email deleted in the last 60 days, I can get it...

      ... and so can the FBI, the SEC, and the Attorney General. Using Exchange should not be an excuse to also repeat Bill's other mistakes ;-)

      --
      Say no to software patents.
    11. Re:One folder to rule them all... by qeL3-i · · Score: 2, Informative

      Postfix is good, free, and open source. It's also easy to configure. You should be able to get it going in about half an hour.

      As for Microsoft and Exchange Server, aren't they convicted criminals? I don't want to use software made by criminals. If they are willing to break the anti-trust laws, what other laws might they be willing to break? I don't trust them with my email.

    12. Re:One folder to rule them all... by lynnroth · · Score: 3, Informative

      I would definitely recommend XMail Server. Cross platform (Linux, FreeBSD, WinNT/2K/XP, Solaris), runs multiple domains with no problem. Not really that hard to set up if you read all the docs. There are several web config apps for it now and it's not that hard to program against the TCP config interface. It's being actively developed (new release every month or more often if a rare bug comes up.) It's licensed under the GPL. I use it with about 30 domains with 4-20 users per domain. I have had 0 problems with it. Easy to use, easy to upgrade (just copy the new binaries) no complaints.

    13. Re:One folder to rule them all... by Zathrus · · Score: 2

      Domino doesn't need more publicity. It needs to be put into a grave. And then shot.

      Domino well predates Exchange (Lotus Notes was the precursor to Domino), and it's generally considered expensive, hard to admin, and difficult to use. It is, however, much more flexible than Exchange.

    14. Re:One folder to rule them all... by mikefoley · · Score: 2

      Yea, ok, whatever.. I messed up the dates. I was involved with Exchange from the beta period on. Made 2 trips to Microsoft during the beta. That was around 95. It was all painful.

      First thing that goes after 40 is the memory.. Second is the...er...ah....

      --
      What's my Karma Mr. Burns? "Excellent"
    15. Re:One folder to rule them all... by Ageless · · Score: 2

      I am actually a Certified Lotus Notes Programmer Thing, it being required for a previous job. I know Notes / Domino inside and out and it doesn't come anywhere close to the ease of use of Outlook.

      Outlook is so popular because its GUI is quite good (like many of the MS products). I freely admit that its storage sucks and its servers and protocols suck, but to the end user that doesn't matter. They want a nice GUI they can use without a tech guy standing over their shoulder, and Outlook does provide that.

    16. Re:One folder to rule them all... by Ageless · · Score: 2

      It won't be stamped out until the need for email is stamped out. Every single company, no matter how small needs email these days with few exceptions. The problem arises that getting a decent admin, and certainly one that can make a Linux or UNIX server sing does not come cheap. It's much cheaper to pay $5000 (or much less, if you buy it with your boxes) for Exchange and have the one guy that knows a little about computers set it up.

    17. Re:One folder to rule them all... by swb · · Score: 2

      Exchange is actually a pretty decent mail server, although only using it for mail is pretty dumb - its groupware features are the killer app.

      Icky. Exchange does well enough for email and scheduling, but anything else requires you to dump your life into the black hole of the exchange database.

      We've been fortunate with our Exchange installation -- lots of AV software, excessive hardware and limited use of the groupware functionality have kept it stable and functional.

    18. Re:One folder to rule them all... by guinsu · · Score: 2

      Try iMail for NT/2000; my company has been using it for about 3 years. It's not perfect, but it's pretty good, and it hasn't lost any data yet.

    19. Re:One folder to rule them all... by smnolde · · Score: 2
      exim or postfix.

      'nuff said.

    20. Re:One folder to rule them all... by Ben Hutchings · · Score: 2

      Those just deliver mail, either to other programs or into simple mailboxes; they don't provide any facilities for reading or searching the mail afterwards. It's easy enough to integrate either of these with Cyrus, though, which will do that.

    21. Re:One folder to rule them all... by morzel · · Score: 2
      Exchange is actually a pretty decent mail server, although only using it for mail is pretty dumb - its groupware features are the killer app.
      <IRONIC>
      What groupware features?
      </IRONIC>
      Exchange is a reasonably stable email platform, but calling it 'groupware' is tantamount to calling a Mini Cooper a luxury sedan.
      Single Copy Object Stores (that's what they're called in LotusSpeak) can be advantageous in some cases, but there ain't no free lunches: it's more difficult to manage, and if something's corrupted or needs to be restored from back-up, you're SOL.

      --
      Okay... I'll do the stupid things first, then you shy people follow.
      [Zappa]
    22. Re:One folder to rule them all... by Anonymous Coward · · Score: 2, Informative

      when your boss says you will have shared contacts

      LDAP

      and calendars

      CorporateTime (http://www.steltor.com)

      and your clients will run Windows

      Both of these work fine on Windows


      find me a solution that comes within miles of the ease of Outlook


      You can even keep using Outlook; LDAP is supported by Outlook, and Steltor provides an Outlook plugin that talks to their server instead of Exchange.

    23. Re:One folder to rule them all... by darkonc · · Score: 2
      I had sendmail running a server for 100,000 users. It was slightly modified, however.
      The user database for most of the users was a very simple fixed-record database. This made for fast access. (Later migrated to IMAP.)
      Email was stored in a separate directory for each user, and the user directories were hashed into a tree, with about 100 (or was it 10? This was a while ago) users per leaf directory.
      Heavily RAIDed: more than a dozen disks to store the email. This distributed the I/O cost.
      The server had 4 processors (200MHz each; it was a while ago) and 2GB of RAM.

      I was the only person directly responsible for the mail server, and I considered it a sign that something was wrong if it took more than 5-10 seconds for email to get delivered (users with 400MB mailboxes that insisted on checking them every 5 minutes excepted).
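For readers who have not seen a hashed directory tree, here is a small sketch of the technique. It is not the poster's actual setup: the root path, the choice of SHA-256, the two-level layout, and the 1024-bucket count (picked so that 100,000 users land at roughly 100 per leaf) are all assumptions.

```python
import hashlib
from collections import Counter
from pathlib import PurePosixPath

BUCKETS = 1024  # 32 x 32 leaf directories

def user_dir(user: str, root: str = "/var/spool/mailstore") -> PurePosixPath:
    """Map a username to root/NN/NN/username via a stable hash."""
    digest = hashlib.sha256(user.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % BUCKETS
    return PurePosixPath(root, f"{bucket // 32:02d}", f"{bucket % 32:02d}", user)

# How evenly do 100,000 made-up usernames spread over the leaves?
leaves = Counter(user_dir(f"user{i}").parent for i in range(100_000))
print(len(leaves), "leaf directories, fullest holds", max(leaves.values()))
```

Because the path depends only on the username, delivery and retrieval can both compute it without a lookup table, and no single directory grows large enough to make listings slow.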

      --
      Sometimes boldness is in fashion. Sometimes only the brave will be bold.
    24. Re:One folder to rule them all... by ahde · · Score: 2

      The problem is that people don't want an alternative. They want Exchange+Outlook and if there is anything different (down to the icons) they won't use it. There are alternatives to Exchange. Anyone can whip up a web-based alternative to the "killer apps" of Exchange, the address book and scheduling. But people won't use anything that doesn't work exactly the same with Outlook, or that requires installation, because Dell will sell you a server with Exchange already installed, and clients with Outlook.

    25. Re:One folder to rule them all... by crucini · · Score: 2
      Want a standards-based SMTP server with server-side calendaring that works nicely with Outlook and the plethora of email clients? You want this affordable Intel based application!

      From http://www.bynari.net/bynari/products.html.
      The server runs on Linux, of course.
      Unfortunately, the linked page does not render for me in Netscape/Linux.

      Steltor, whose site seems to be broken, makes good scheduling apps that can connect to Outlook. Their server runs on lots of OS's, including Linux. I know one customer, and he's happy.
    26. Re:One folder to rule them all... by Electrum · · Score: 2

      Are you sure that you want to be using Postfix? I don't...

      http://cr.yp.to/maildisasters/postfix.html
    27. Re:One folder to rule them all... by Electrum · · Score: 2

      Yes, and there is a VERY good reason why it does that:

      http://cr.yp.to/proto/verp.txt
    28. Re:One folder to rule them all... by Beliskner · · Score: 2
      Here's the secret: many IIS have no admin. Some manager dude double-clicked on setup.exe and that was it.
      Administering of server by people with "meager knowledge" as you put it... should really be stamped out
      What are you suggesting? Making it illegal to set up a webserver without a license?? Who'll issue this license? The Government? Great, that's game over in the censorship area.

      Personally I'd much rather have Code Red flooding my servers all day than having to hand control of webservers over to the Government. How would you police this? Block port 80? Wouldn't this set a precedent for the (RI|MP)AA to block Kazaa ports by requiring the reconfiguring of routers that give connectivity to all Kazaa supernodes? Your statement is absolutely unacceptable.

      --
      A caveman dreams of being us, the incalculable power and riches. We dream of being Q, then what?
  2. My fast, easy solution. by crazney · · Score: 2, Interesting

    Well. My solution for storing A LOT of BIG email but still browsing fast is to use MySQL. My mail client is Pronto! (written by Muhri, in Perl, GTK, etc.). I have several tens of thousands of emails in about 10 different folders. Reaction time is immediate and searching is pretty damn quick as well.
    The MySQL server is at work, and I can view my mail from anywhere simply by pointing my client at my IP. Presto.

    I'm also slowly writing a MySQL-based IMAP server which will hopefully be compatible with Pronto!... But as with so many projects, it'll probably take some time to complete...

    David

    --
    stuff
    1. Re:My fast, easy solution. by crazney · · Score: 2, Informative

      when you say big emails....I assume that you mean less than 16 MB to handle MySQL row limitation. We have users who want to send 30 MB messages. Damn artists.

      Nope, this limitation disappeared ages ago.

      Information can be found here here and here

      I suggest opening up the config file (generally /etc/mysql/my.cnf) and ensuring everything like "max_allowed_packet" etc are > 50-ish MB.

      David

      --
      stuff
    2. Re:My fast, easy solution. by cybermage · · Score: 2

      We have users who want to send 30 MB messages. Damn artists.

      In all seriousness, get these lunatics to use some kind of P2P solution. Heck, even AIM is better than hosing your mail server.

      A friend sent me 42MB worth of zipped MP3's over AIM with no trouble. Took about 5 minutes, or so. (me=cable, him=~T1)

    3. Re:My fast, easy solution. by vsync64 · · Score: 2

      But tar.gz is compressed. Maybe you just meant .tar.

      --
      TO BUY A NEW CAR WOULD MAKE YOU SEXUALLY ATTRACTIVE.
    4. Re:My fast, easy solution. by cybermage · · Score: 2

      Actually, now that I look, the separate files total 47.8MB and zipped were 46.3MB; so, there was some compression. However, it would still be more handy as a .zip than separate files even if the "compression" made the sum greater than its parts.

      The real point of my post though was that large files have no business being sent via email. P2P solutions are faster and friendlier; And, although not typically used for this, AIM is probably the most common.

      Peace.

  3. yEnc by Glytch · · Score: 2

    If you're so worried about encoded binaries, why not try yEnc instead of base64 or uuencoding? It works well in newsgroups. It might work well for email storage as well.

    1. Re:yEnc by fejjie · · Score: 2, Informative

      *sigh*

      yEnc is a complete waste of time. Had the author of yEnc actually gone out and read some pre-existing MIME specifications before going out and re-inventing (a square) wheel, he would have found that MIME already defines an encoding that gets even better compression than yEnc. It's called "binary". Yes, MIME can handle binary content.

      Content-Transfer-Encoding: binary

      it's as simple as that.
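As an illustration of that header in practice, Python's standard `email` package can build a part whose payload is kept as raw 8-bit data instead of Base64. This is a sketch only; the sample payload is arbitrary, and whether a given transport accepts such a message is a separate matter (plain SMTP needs the BINARYMIME extension), although a mail store is free to keep parts this way.

```python
from email.message import EmailMessage

data = bytes(range(256)) * 4   # arbitrary 8-bit payload, 1024 bytes

part = EmailMessage()
# cte="binary" asks the content manager to skip Base64 and
# quoted-printable and label the part accordingly.
part.set_content(data, maintype="application",
                 subtype="octet-stream", cte="binary")

print(part["Content-Transfer-Encoding"])
restored = part.get_payload(decode=True)
print(len(restored), "bytes back out, no expansion")
```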

      Btw, I've implemented the yEnc specification in my library GMime

      My favourite part of the yEnc author's defense for why he implemented yEnc is "but most news clients don't implement MIME". Hah, join the real world where NO news reader implemented yEnc. (Yes, I know there are clients that implement it now; in fact my code is used in a few of them.)

      Believe me as someone who spends time hacking on news and mail readers, yEnc is nothing but a headache.

  4. I like MS Exchange by alen · · Score: 5, Informative

    A single database to hold all of the users' email. Single instance storage ensures that only one copy of any attachment is in the database at once, no matter how many email messages it was sent in. APIs for backup let you back up the whole database or individual mailboxes. And depending on your backup solution you can restore mailboxes and individual emails. Anti-virus software integrates into the server side of the software. In Exchange 2000, if you accidentally delete a mailbox you can easily bring it back with all emails without restoring from tape. The only files to worry about on the user end are a personal address book and archived email. Unless you use POP3 or it's archived in personal folders, the email always stays on the server, preventing problems like accidentally downloading important emails you need at the office onto a home PC. And it's stable. Not as stable as UNIX, I admit, but it stays up for months without a reboot. And in my experience most problems are solved by a simple reboot. In 4 years of exposure to Exchange, the only non-admin-related problems I've seen were 1 database corruption, where I needed to run a utility and wait 45 minutes for it to work again, and a corrupted MTA that needed a reboot to get it working right again.
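Single instance storage can be approximated with a content-addressed store: key each attachment by a hash of its bytes and keep a reference count. To be clear, this is a swapped-in technique for illustration and not a description of how Exchange does it; the `AttachmentStore` class and its methods are hypothetical.

```python
import hashlib

class AttachmentStore:
    """Keep one copy of each distinct attachment, however often it is sent."""

    def __init__(self):
        self.blobs = {}      # sha256 hex digest -> attachment bytes
        self.refcount = {}   # sha256 hex digest -> number of messages using it

    def put(self, data: bytes) -> str:
        """Store data if it is new; return the key a message should keep."""
        key = hashlib.sha256(data).hexdigest()
        if key not in self.blobs:
            self.blobs[key] = data
        self.refcount[key] = self.refcount.get(key, 0) + 1
        return key

    def release(self, key: str) -> None:
        """Drop one reference; free the blob when nobody uses it any more."""
        self.refcount[key] -= 1
        if self.refcount[key] == 0:
            del self.blobs[key]
            del self.refcount[key]

store = AttachmentStore()
slides = b"pretend this is a 5 MB presentation"
keys = [store.put(slides) for _ in range(250)]   # same file in 250 messages
print(len(store.blobs), "stored copy,", store.refcount[keys[0]], "references")
```

The flip side raised elsewhere in this thread still applies: with one shared copy, corruption of that copy affects every message that points to it.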

    1. Re:I like MS Exchange by BrookHarty · · Score: 2

      We do the same thing: 1 large multi-TB Oracle database, LDAP front ends. Of course this is for voice mail (encoded), SMS and email. Not cheap, but it's pretty standard; all the vendors seem to offer the same configuration.

      I think the sweetest thing is how 1 object (voicemail, etc.) can be tagged for a select group of people. There's garbage collection, extra storage, all kinds of handy features. Just a well thought out, easy to manage solution. Though it costs :)

      Oh by the way, its Unix baby, ya ya. (-;

    2. Re:I like MS Exchange by alen · · Score: 2

      Backups are important. I work in internal IT at a company and we do them every night. Fulls on weekends.

    3. Re:I like MS Exchange by alen · · Score: 2

      And I forgot to add. Deleted item retention. You set the number of days. The user deletes an email and empties the trash. The email is still in the database for that number of days and can be restored without back up software.

    4. Re:I like MS Exchange by alen · · Score: 2

      I ran isinteg with all options and to fix everything on a 12GB database and it took 30 minutes to fix it. Another time we lost power a few times in a few hours. Decided to run isinteg since we had sudden shutdowns on the server. Same database took 45 minutes that time. We had 20,000 warnings after multiple power losses and a UPS failure at the same time. And this was a huge APC UPS that wasn't wired right by the previous generation of admins and electricians. 6 hours? You must have some old hardware.

    5. Re:I like MS Exchange by afidel · · Score: 2, Insightful

      I liked this until the server (well, cluster actually) that served our EMEA operation fell over. EMC, Compaq and Microsoft fought over who was at fault, and in the end, 22 hours later, the thing had been rebuilt and restored from tape. This was a solution put together by a Microsoft Premier Support partner that was supposed to have 5 9's availability, and it fell over in its first couple of months! Instead of 0 lost email, we lost all emails that hadn't been in the last tape cycle, along with any emails that timed out waiting for the server to come back up. Not only that, but no one could read their email for an entire day (2 business days actually).

      --
      There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
    6. Re:I like MS Exchange by Slashamatic · · Score: 2
      All these people posting anonymously. Nobody should punish you because you run Microsoft; you must suffer enough already. However, the original poster and the replies really put their finger on it.

      MS Exchange isn't a bad thing. It is quite useful and a lot easier than having zillions of mail files. Unfortunately, being proprietary, it is difficult to repair because you don't have the sources to hack around yourself. Even if a standard MS database like SQL Server were used, there would be more possibility of successfully repairing the thing. With a fully open-source message repository, it would be even better.

    7. Re:I like MS Exchange by Slashamatic · · Score: 3, Insightful
      Backups are important

      Sorry, restores are even more important. I hope you check your backup strategy by trying a recovery every so often. Many a time I have heard people who "thought they had a backup" and then it turns out that the thing that was being backed up was in an inconsistent state.

    8. Re:I like MS Exchange by Telastyn · · Score: 2

      One thing to consider with Exchange though is the gigantic bloat the thing is, and how much hardware is required to run the thing in comparison to *nix mail servers.

      Example: 100 person little tiny company:

      Exchange:

      Win2k server + MS Exchange ($5k? for proper licensing)
      Dell Poweredge 2550 2x1.8ghz 4gb ram ($10k)

      This should support you to (*maybe*) 150 people.

      Unix (assuming i386 hardware)

      *BSD + sendmail/qmail (free)
      Dell Poweredge 2550 2x1.8ghz 4gb ram ($10k)

      This should support you to probably around 250 people.

      Anti-virus software can run on the server side here as well (Norton has a version for Unices), and if you accidentally delete a mailbox (why did you do that again?) you can restore from tape much more easily than with Exchange (yes, I've had to do both).

      And god knows you don't have to take the bsd machine down for an hour each week to patch it.

      And you can actually go through and run proper scripting on mail with the unix solution (spam catchers, conditional distribution lists, proper server side OoO replies).

      And the unix solution will give you proper logging of messages and mailboxes.

      And the unix solution will require far less attention by the IT admin(s) that will likely be woefully understaffed to begin with.

      And the unix solution will be faster, cleaner, more reliable, more scalable, more compatible with clients, while still being about a third cheaper.

      Not to mention of course that Exchange machines are a security liability, and should never ever ever be deployed into a hostile environment (ie: the internet, where mail comes from; or any company, where there's (statistically) someone with less-than-good intentions trying to get payroll info and the such.)

    9. Re:I like MS Exchange by Sabalon · · Score: 2

      Uh...

      Dell PE2550 2x1.13 w/2GB RAM - $7k
      Supporting about 800 people just fine.

      I will agree on the "hidden" costs though - Ex2k requires active directory, so you need an NT server w/ however many licenses and that'll hurt you, in addition to the exchange license.

      You can run anti-virus right on the Exchange server, integrated into Exchange, and with Exchange you can set it to retain deleted mailboxes for a period of time before they are purged, removing the tape restore altogether. If you back up individual mailboxes, then it's just as easy for tape restore.

      The logging and scripting is true, though I don't agree with the IT admin attention - our exchange and Unix mail machines are both pretty problem free. As for security - uh...how many times has sendmail been the root of problems? ;)

      Now that I'm done with that :
      To quote our exchange person when upgrading to Ex2000 "Why don't we just go ahead and put all the e-mail on Linux?"

    10. Re:I like MS Exchange by Sabalon · · Score: 2

      How many times has exchange been the root?

      Now outlook is a different story!

      We have 2GB RAM; right now it has 600M available and 726K in system cache, and has peaked at 1.3G of RAM used. That is with about 70 active users out of our total.

    11. Re:I like MS Exchange by ahde · · Score: 2

      of course, that Dell Poweredge 2550 2x1.8ghz 4gb ram isn't used for the email. That's the admin's toy. He put the mail server on an old P100.

    12. Re:I like MS Exchange by Sabalon · · Score: 2

      We buy almost all of our servers with the same specs; that way they are pretty much interchangeable and can handle extra load if they ever need it.

  5. Re:why use a 'file' at all? by alen · · Score: 3, Funny

    MS Exchange has been doing this for almost a decade now. Version 5 was the first decent one to get and 5.5 is great.

  6. I vote for a filesystem-based database by Dr. Awktagon · · Score: 5, Insightful

    Something like Maildir .. if the FS is slow and can't handle that kind of application, then we need to improve our filesystems!

    Lots of applications need lightweight databases with indexes, locking, and atomic operations. Why not bake this into the filesystem, and it won't have to be just for email, it will have many uses.

    I was thinking about this the other day as I was working on a logging system for a large in-house email filtering system.. similar problem, except instead of storing emails, I'm storing small XML fragments describing the structure of each email and what was done to each. So far the easiest solution was large monolithic XML files, and an external index pointing in the large file (i.e., like mbox + a DB index). As it grows we'll probably have to move it to a "real" database.
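The "large file plus external index" layout mentioned above (the same shape as mbox with a DB index) can be sketched in a few lines. This is an assumption-laden illustration, not the poster's system: the `IndexedLog` class, the record IDs, and the in-memory dictionary standing in for the index are all made up, and a real version would persist the index and handle locking.

```python
import io

class IndexedLog:
    """Append-only data file plus an index of (offset, length) per record."""

    def __init__(self, stream):
        self.stream = stream   # any seekable binary stream (file, BytesIO)
        self.index = {}        # record id -> (byte offset, byte length)

    def append(self, record_id, payload: bytes) -> None:
        self.stream.seek(0, io.SEEK_END)
        offset = self.stream.tell()
        self.stream.write(payload)
        self.index[record_id] = (offset, len(payload))

    def get(self, record_id) -> bytes:
        # One seek and one read, no scanning of the big file.
        offset, length = self.index[record_id]
        self.stream.seek(offset)
        return self.stream.read(length)

log = IndexedLog(io.BytesIO())
log.append("msg-1", b"<mail id='1' action='pass'/>\n")
log.append("msg-2", b"<mail id='2' action='quarantine'/>\n")
log.append("msg-3", b"<mail id='3' action='pass'/>\n")
print(log.get("msg-2"))
```

The weak spot is the one the thread keeps circling: the index and the data file can drift apart after a crash, which is exactly the consistency problem a "real" database or a smarter filesystem is meant to solve.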

    There is a need for something like sleepycat DB + ReiserFS on steroids..

    1. Re:I vote for a filesystem-based database by GoRK · · Score: 3, Interesting

      This, as several other threads note, is the approach that Hans Reiser is taking with his filesystem. That is, if the filesystem is not good enough for storing our (large-grained) data, so that we resort to what basically amounts to indexed archive files or databases full of BLOB objects, then our filesystems are broken. A directory with 1,000,000 files in it shouldn't take any longer to return a sorted directory listing than one with, say, 10 files - because it all should be indexed behind the scenes. Same for the problems of inode starvation, fsck, etc. Programs such as mail clients and servers (for most people anyway) -- or any other apps that need simple storage -- should use the filesystem as the storage mechanism.

    2. Re:I vote for a filesystem-based database by Slashamatic · · Score: 2
      RDB is not built into any version of VMS. It is layered on top of VMS (these days, it isn't even from the same people). RDB layers its storage containers on top of a standard VMS file system. However the standard VMS filesystem is rather a lot more powerful than many with integral ISAM support and the possibility of recovery-unit journalling (you pay extra to turn it on, but it comes as part of RMS).

      There were systems with DB filesystems, but that was stuff like MUMPS or Pick.

    3. Re:I vote for a filesystem-based database by ahde · · Score: 2

      The trick to this is to make it compatible with other file systems. And that means deciding what metadata gets tossed when you convert to raw text. A file system should hold data. The more data you include in the filesystem itself, the more data you will lose in conversion. And the harder it will be to maintain and restore data. Because no file system is permanent. Think of MICRO~1.

  7. The Reiser guys have some ideas. by SwellJoe · · Score: 5, Informative

    I've followed ReiserFS development for years now, shipping our first servers with it some two years ago (and every box we've shipped since then), and I believe they have the best long-term plan for this kind of thing. Hans has written some excellent white-papers on making small files extremely cheap.

    The eventual goal of Reiser is a filesystem that is indistinguishable from a powerful database (if a special purpose database). The plan is to make small files so cheap that every extension of a file, directory, etc. is just another file. Another interesting turn is that files would no longer be, necessarily, of the form '/big/long/path/to/some/file'...because the filesystem is a database, one could also access it by a category, so that one file read pulls in all of the data of that category (from any number of files). Directories become just one view of the data available, with any number of other views possible depending on the application.

    As was mentioned in the parent, this would lead to things like 250 email recipients and only one actual file. But of course, this leaves out the copy-on-write functionality needed to make this seamless.

    So I think the solution is probably to fix the filesystem--not to fix the email storage mechanism. A number of very smart people have 'fixed' email storage in the past, leading to all of the options we have today, none of which works extremely well on really large mailboxes. Yes, many are good enough, and many work fabulously for small to mid-sized applications. But the day will come when they do not work so well, due to the higher volume and growing average size of emails.

    A good place to start for information about these ideas (which are primarily a consolidation of the most interesting research in the field of filesystems and databases):

    http://www.namesys.com/whitepaper.html

    ReiserFS is good stuff. Give Hans' papers a read sometime.

    BTW-Don't gripe at me about ReiserFS instability, etc. I know better. As I mentioned I've been shipping servers with it for 2 years, and we've never had a single ReiserFS-caused corruption. Not one.

    1. Re:The Reiser guys have some ideas. by SwellJoe · · Score: 5, Insightful
      I have also heard from someone who does Linux consulting who won't use ReiserFS. Overall, I don't call it stable.


      Heheh...I read a funny quote here on slashdot earlier today that I think applies:

      The plural of anecdote is not data.


      I've heard from a lot of people who consider themselves experts that ReiserFS is not stable, never has been, never will be, all that fun stuff. But I know better, because I have data. Hard numbers...I know I can run a Squid box harder and at higher loads for longer on ReiserFS than ext2 or ext3. I know that I can run a Squid machine for 2 years with ReiserFS cache partitions with uptimes over a year, with the reboot after all that time being for a kernel upgrade.

      Yes, there have been data corruption issues for some people for ReiserFS. But I'm on the ext3 and jfs mailing lists as well...I know they have data corruptions of their own. It's a fact of life when dealing with computers, things go wrong for everyone at some point. I simply don't believe the masses when they tell me ReiserFS is not suitable for production use, because I have more machines to administer than the vast majority of slashdotters, and I believe I can trust ReiserFS. I trust my opinion above most.

    2. Re:The Reiser guys have some ideas. by Ian+Bicking · · Score: 2
      I've been studying WebDAV, and have been excited about how it presents a network storage that seems much more general than a typical filesystem-based metaphor -- kind of making the dynamism of web applications available at a lower level to the OS.

      What is intriguing here, is that the level of granularity that you talk about with ReiserFS would map well with WebDAV. Running mod_dav for Apache is fine, but unexciting -- the underlying filesystem storage that Apache is so closely tied to is awkward and lacks good granularity and flexibility. But with a more powerful filesystem, it could go much further.

      In a lot of monolithic client-server architectures, I see systems created where the OS is insignificant -- just a dumb layer of hardware compatibility. You dump all the data in one file, with internal structure you define yourself. You almost always do your own permission structure -- traditional Unix permissions are worthless in most new domains (IMHO). All you need is a socket interface and a disk interface, everything else you write yourself.

      This is a shame, really, because you are reimplementing things the OS should be doing... but OS design is stagnant. Maybe that's fine, but I don't even see much ambition among Linux kernel programmers (or BSD or other Unices)... they're working off an old model that is fine at what it does, but not helpful for new systems. They don't seem to mind that they are being made more and more insignificant... maybe that's good, they aren't holding onto power or being territorial, but it really is true that there isn't much innovation there. (There is innovation in non-kernel applications, mind you, just not much in the kernel or most basic libraries like libc)

      It's nice to hear ReiserFS people are thinking about real progress, not just little tweaks.

    3. Re:The Reiser guys have some ideas. by dinotrac · · Score: 2

      The problem seems to be that everybody's mileage may vary, and the thing you just had a problem with sucks worse than an overclocked Hoover until the thing you replaced it with has a problem.

      Personally, I've used Reiser for the last couple of years thanks to cats, small children and a less than 100% reliable power supply. I had suffered corruption with ext2 that made my life Hell. That doesn't happen any more.

      Whether it's good for anybody else in this world, Reiser is great for me.

    4. Re:The Reiser guys have some ideas. by Uruk · · Score: 2

      The eventual goal of Reiser is a filesystem that is indistinguishable from a powerful database (if a special purpose database)

      Why? Why do we need the all-singing, all-dancing filesystem when we've already got database packages that are mature and effective?

      A filesystem should be a filesystem. You don't see mail applications trying to add features to remotely configure the server they're sending mail to - that's because they stick to what they need to do, mail.

      UNIX - do one thing, and do it well. Leave database functionality to the packages that already do it well and have been for more than 10 years.

      --
      -- Truth goes out the door when rumor comes innuendo. -- Groucho Marx
    5. Re:The Reiser guys have some ideas. by phee · · Score: 2

      Ok. Real-life, hard data?

      I ran Reiser over a year on an 80-gig partition. It started out just fine; speedy, recovered instantly upon reboots, etc. But as time went on, it got slower... and slower... and slooooower... until I got so fed up with it I got another 80-gig drive just so I could get rid of Reiser. It's all on ext3 now and literally five to ten times faster. It took a good 10 seconds just to start xv on Reiser (my machine is a 1.2-GHz, ATA100, 320M of RAM, linux 2.4.18); on ext3, it takes half a second. Netscape went from a minute to load to a mere five seconds. Adding indices to a Postgres database to speed up searches made it SLOWER because of the extra disk access. I was lucky to get 2 meg/second data transfer rates on file copies with Reiser; with ext3, I get 15 meg/second. Sustained. And bear in mind; these comparisons are all done on the same hardware, same kernel version. And no, since I can sense you all about to ask this, I didn't have the Reiser Debugging enabled in the kernel.

      Stay away from Reiser unless you only make 50-meg partitions. Trust me. Sure, it's "stable" and doesn't corrupt data, but ext3 combines the best of ext2 and Reiser... and doesn't bring your machine to its knees.

    6. Re:The Reiser guys have some ideas. by ahde · · Score: 2

      the reason files are flat is because that is how they are stored. There is this little magnetic disk that is just a long circular line of ones and zeros. In order to read or write anything from that disk, it has to be done in sequential order (it could be parallelized, but so far that isn't practical).

      So, your rich data has to be flattened out. Whether you like it or not. And at some point, whether it's the filesystem, the hardware drivers, or a chip with hardwired instructions and a cache (essentially re-implementing a large part of the os), eventually, someone is going to have to deal with that line of ones and zeros.

      And that's what a filesystem does. You can obscure it, you can use links and indexes and graphical icons and properties files and block level meta-data, and hide it all from the user, but you can't take away the functionality by covering it with more layers of abstraction.

      Abstraction is good. But not at the cost of functionality. I'd argue filesystems are too complex already. Look at all the tools needed already for dealing with inodes and blocks and timestamps and so forth. That's why some databases want to chuck it all out and start from scratch and work with the raw data.

      Of course, some things, like binary tree search and journalling suffer when abstracted too far, so they're built into the filesystem. But stuffing the filesystem full of metadata isn't a compromise at all. The only thing that saves is a few file descriptors. And Reiser's answer is that if files are too expensive, make them cheaper. Why not store that metadata in a separate file (or several) so that you know exactly where it is, without having to parse every file header for every access?

  8. Portability by Pretzalzz · · Score: 2, Insightful

    The great advantage of the current system is that it is very easy to move your e-mail from one program or computer to another with little hassle and/or risk. With any type of database system, you introduce a level of complexity that virtually assures that only one e-mail program will be able to read your e-mail. The best solution as far as I am concerned is to just stick with the current mbox format, but allow attachments to be deleted independently, though that is just personal preference. But I think we should be wary of adding any complexity that endangers the portability of mail. Also, the other thing to be said for the mbox format is that if worst comes to worst you can still access your e-mail with a text editor and/or grep.

  9. Something to keep in mind... by cwinters · · Score: 5, Insightful

    /. punchingbag jwz has some strong opinions about using databases (etc.) for mail storage. I tend to agree: everything can read from and write to files, there are no versioning issues, they can be easily transported among different operating and file systems, they can be backed up easily. But it's another wheel to reinvent, so everyone hop to it at once and then lose interest in two or three weeks!

    --

    Chris
    M-x auto-bs-mode

  10. Quantum-like Storage by cybermage · · Score: 3, Funny

    I've been joking for years about getting two shell accounts on opposite sides of the planet and setting each up with procmail to bounce all my mail between the two (always rewriting the header so as to avoid a loop.) I figure at any given time, my mail would be in both places and neither simultaneously.

    If I want to read some, I'd just chmod .procmailrc for a few seconds and change it back. Plenty of mail storage without chewing-up precious file-system quota.

    1. Re:Quantum-like Storage by akh · · Score: 2, Interesting

      A similar idea has already been implemented. Some Canadian researchers used an existing 8000km fiber optic network as a storage device. Basically, the network is configured as a loop and the data to be stored is simply sent onto the network. Packets of data are placed onto the network and can be pulled from it as they pass a node on the network. It's kind of like a cross between a token ring network and a mercury delay line. You can find a few more details from this link.

      --
      Accept Eris as your Fnord and personally sate her
    2. Re:Quantum-like Storage by red_dragon · · Score: 2

      The BOFH had already thought about it about four years earlier. It landed him an award, even.

      --
      In Soviet Russia, Jesus asks: "What Would You Do?"
  11. Re:why use a 'file' at all? by frank_adrian314159 · · Score: 2

    Hello! Why don't you talk to IBM about this? It's called Domino.

    --
    That is all.
  12. Eudora mbox by 1u3hr · · Score: 4, Informative
    Eudora (Win and Mac) handles encoded attachments by decoding them and storing them in an attachments folder, replacing the encoded text in the message with a line like

    Attachment Converted: "C:\EUDORA\ATTACH\NEW YORK.pps"

    Click on that in Eudora and the attachment opens.


    This keeps the actual text in the mbox file lean. I've got almost a decade of correspondence that totals about 20 MB, if it included all the attachments it'd be much more.

    Also it allows you to edit messages after receipt, (this might trouble some people, but it just simplifies what I used to do by opening the mbx file in a text editor). I can select all the text, then paste it back in. This has the effect of removing all the HTML coding that is especially crufty from Word generated mail -- a 20k message reduces to 1k.
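
    For anyone who wants the same effect outside Eudora, here is a rough Python sketch of the idea: decode each attachment into a folder and leave a one-line pointer in the message text. This is not Eudora's actual code; `strip_attachments`, the header list and the demo message are my own assumptions, and it overwrites files that share a name.

```python
import os
import tempfile
from email import message_from_bytes, policy
from email.message import EmailMessage


def strip_attachments(raw_bytes, attach_dir):
    """Save attachments under attach_dir and return a lean text-only message."""
    msg = message_from_bytes(raw_bytes, policy=policy.default)
    body = msg.get_body(preferencelist=("plain",))
    text = body.get_content() if body is not None else ""
    notes = []
    for part in msg.iter_attachments():
        # basename() drops any directory components from the sender's filename.
        name = os.path.basename(part.get_filename() or "unnamed")
        path = os.path.join(attach_dir, name)
        with open(path, "wb") as fh:
            fh.write(part.get_payload(decode=True) or b"")
        notes.append('Attachment Converted: "%s"' % path)
    lean = EmailMessage()
    for header in ("From", "To", "Subject", "Date"):
        if msg[header] is not None:
            lean[header] = str(msg[header])
    lean.set_content(text.rstrip("\n") + "\n\n" + "\n".join(notes) + "\n")
    return lean


# Demo with a synthetic message carrying a 2 KiB binary attachment.
original = EmailMessage()
original["From"] = "alice@example.com"
original["Subject"] = "slides"
original.set_content("See attached.\n")
original.add_attachment(bytes(range(256)) * 8, maintype="application",
                        subtype="octet-stream", filename="report.bin")
raw = original.as_bytes()
attach_dir = tempfile.mkdtemp()
lean = strip_attachments(raw, attach_dir)
```

    The lean message can then go into the mbox as usual, while the decoded file sits on disk at its real size instead of its base64 size.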

    1. Re:Eudora mbox by Ziviyr · · Score: 2

      Eudora (Win and Mac) handles encoded attachments by decoding them and storing them in an attachments folder, replacing the encoded text in the message with a line like

      Attachment Converted: "C:\EUDORA\ATTACH\NEW YORK.pps"

      Click on that in Eudora and the attachment opens.


      For the sake of portability I turn features like that off; it also makes it harder for me to lose my attachments and makes manual maintenance easier.

      --

      Someone set us up the bomb, so shine we are!
    2. Re:Eudora mbox by martin · · Score: 2

      Have you tried moving this from Mac to PC to Linux - won't work without messing with the files with Perl or something.

      I've done this with Netscape (4.7x and 6/7) and moved all the files etc easily from platform to platform with no problems...

    3. Re:Eudora mbox by wadetemp · · Score: 2

      It sounds cleaner, but it also sounds like it's easier for a virus or worm to start a raging party... say by looking in your c:\eudora\attach directory and running all files that end with .exe or .vbs.

    4. Re:Eudora mbox by Krieger · · Score: 2

      I am actually trying to figure out how to reverse this process and re-uuencode all of my emails so that I can port them to a different system and not lose the attachments. Have you run into anything that can do this?

  13. Compression. by TellarHK · · Score: 2

    The next generation of mail storage should definitely work on taking optimal advantage of compression technologies. Preferably in a way that compresses the data from end to end, not just in the receiving mailbox. As to managing the kind of data sent, I'd suggest using a twofold approach. Save binary attachments in the natural state in a subfolder linked to the message itself, which would be kept in a compressed database format.

    As to the database format itself, I'd like to see a form of redundancy in the structure of it. Give the design some self-healing ability in case flaws develop as the information gets shuffled around. Media isn't perfect, but mail stability should try and be as good as it can get.

    If you want to speed searches, index the data in a separate file and use that. Just keep the actual data storage as simple and reliable as possible, anything like searching or sorting is just a bonus.
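
    To make the "index on the side" idea concrete, here is a small sketch in Python. It assumes messages are already available as plain text keyed by some ID; the names (`build_index`, `search`, the sample IDs) are invented for the example, and the index would normally be written to its own file so it can be thrown away and rebuilt.

```python
import re
from collections import defaultdict


def build_index(messages):
    """Map each lowercase word to the set of message IDs containing it."""
    index = defaultdict(set)
    for msg_id, text in messages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(msg_id)
    return index


def search(index, *words):
    """Return the IDs of messages that contain every given word."""
    sets = [index.get(word.lower(), set()) for word in words]
    return set.intersection(*sets) if sets else set()


# Hypothetical message store: ID -> decoded text.
MESSAGES = {
    "msg-1": "Quarterly report attached, please review",
    "msg-2": "Lunch on Friday?",
    "msg-3": "Review of the Friday report",
}
index = build_index(MESSAGES)
hits = search(index, "report", "review")
```

    The message store itself is never touched by a search, so a damaged index costs a rebuild and nothing else.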

  14. Well.... by BJH · · Score: 2, Interesting

    One file per Mailbox-folder, allowing multiple folders per user. Should those files reside in one central location or in users Homedirs?

    Depends on how the user accesses their mail. If they read their mail only on the local machine, it should be in their home dir. If the server allows multiple forms of access (like local + IMAP), central storage makes sense. There's a lot of other issues here, like backup methods.

    Compression: Should messages be broken into pieces and the MIME-attachments stored separately (thus searching of the text parts would still be possible without decompressing the whole file)?

    No. Separating a single mail into its component parts is just asking for trouble (not to mention that it massively increases your locking problems).

    File format: gdbm, Sleepycat db? Something new?

    Personally, I like Maildir, since it lets me use standard tools like grep to find particular mails. I admit that a more efficient method is probably required these days.

    Should the security model allow users to directly access their files, grep them, copy them around?

    Yes, of course. It's their mail - let them do what they want with it. The mail app must be able to deal with that.

    Shared folders, virtual domains?

    Shared folders would be nice - IMAP can do that now, although it's overengineered and not necessarily fully implemented in any particular IMAP server. Virtual domains I've never had any use for myself...

    Unicode support in folder names?

    Why not?

    Imap message-IDs, flags, useragent specific state-information?

    As you say, IMAP does that already...

    File-locking (NFS)?

    More the fault of NFS than the mail software (and I believe NFS4 handles locking better).

  15. Re:XML all gzipped up by ObviousGuy · · Score: 2, Funny

    No. This is Unix we're talking about, not Windows. Unix doesn't have any problems.

    --
    I have been pwned because my /. password was too easy to guess.
  16. one file per message by g4dget · · Score: 3, Insightful
    One file per mail message is the right thing to do. That lets you use standard UNIX tools for manipulating mail and it gives you convenient locking semantics. And the hierarchical UNIX file system structure, together with links, matches mail semantics nearly perfectly.

    Of course, with traditional UNIX file systems, this is a bit slow. The thing to do is to fix the file system, not to kludge ever more complex mail formats on top of it. ReiserFS goes much of the way; we now also need some system calls to open and read multiple files with a single call.

    Until file systems catch up, one kludge is as good as another. UNIX mbox format is at least simple, so I stick with that.
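
    The "convenient locking semantics" of one file per message mostly come down to this: write the message under a unique temporary name, then rename it into place, so a reader never sees a half-written file. A sketch in Python, assuming a Maildir-like tmp/new/cur layout; the `deliver` helper and the filename scheme are illustrative, not a complete implementation of any spec.

```python
import itertools
import os
import socket
import tempfile
import time

_counter = itertools.count()


def deliver(maildir, data):
    """Write data as a new message file without taking any lock."""
    for sub in ("tmp", "new", "cur"):
        os.makedirs(os.path.join(maildir, sub), exist_ok=True)
    # Time + pid + per-process counter + hostname keeps names unique.
    unique = "%d.P%dQ%d.%s" % (int(time.time()), os.getpid(),
                               next(_counter), socket.gethostname())
    tmp_path = os.path.join(maildir, "tmp", unique)
    with open(tmp_path, "xb") as fh:  # "x" fails if the name already exists
        fh.write(data)
        fh.flush()
        os.fsync(fh.fileno())
    new_path = os.path.join(maildir, "new", unique)
    os.rename(tmp_path, new_path)  # atomic within one filesystem
    return new_path


root = tempfile.mkdtemp()
first = deliver(root, b"Subject: one\n\nfirst message\n")
second = deliver(root, b"Subject: two\n\nsecond message\n")
```

    Two deliveries never contend for the same file, which is the whole point; the cost is the directory full of small files that the rest of this thread argues about.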

    1. Re:one file per message by Lennie · · Score: 2, Informative

      That's what maildir is.

      --
      New things are always on the horizon
    2. Re:one file per message by spitzak · · Score: 2

      Absolutely! A lot of the libraries and kludges being written for both Unix and Windows are to implement hierarchies of data in files, because individual files are too slow or have too much overhead. This needs to be fixed and we would be much better off if the effort going into designing the next mail format went into designing a filesystem that allowed the "obvious" way to store mail to work.

  17. We need an XML standard to move mail around by astrashe · · Score: 2

    People have been arguing about the balance between standard formats that are easy to parse and move between systems and complex formats that make searching easier.

    What we need is a standard DTD or schema for mail data that all well written email systems can understand. If everything can import and export XML representations of email, the internals aren't so important.

    1. Re:We need an XML standard to move mail around by Rudd-O · · Score: 2, Insightful

      There is a standard to move E-mail around. It's called RFC 2822.

      --
      Rudd-O - http://rudd-o.com/
    2. Re:We need an XML standard to move mail around by NotZed · · Score: 3, Informative

      No we definitely do not need another standard to move mail around.

      MIME *is* a transport. MIME *IS* easy to decode. MIME *must* be supported by any email client already.

      MIME *is* the solution, it already exists, it supports everything you need (multiple binary attachments, multilingual headers), and it *works*.

      XML is *not* a good idea.

      --
      _ // `Thinking is an exercise to which all too few brains
      \\/ are accustomed' - First Lensman
    3. Re:We need an XML standard to move mail around by rpeppe · · Score: 2
      MIME *IS* easy to decode.

      ha ha ha ha!
      HA HA HA HA!
      ROFL.
      have you actually read any of the (many) MIME RFCs? there are so many traps and pitfalls lurking there that to say that MIME is easy to decode is just untrue.
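
      For a taste of what a decoder has to handle even in the easy case, here is a small Python sketch that leans on the standard `email` package: an RFC 2047 encoded header, a multipart container, and a base64 body part. The `list_parts` helper and the test message are made up for the example; the nasty cases (broken boundaries, nested messages, odd charsets) are exactly what it does not cover.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def list_parts(raw_bytes):
    """Return the decoded subject and (type, filename, size) for each leaf part."""
    msg = message_from_bytes(raw_bytes, policy=policy.default)
    parts = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no payload of their own
        payload = part.get_payload(decode=True) or b""
        parts.append((part.get_content_type(), part.get_filename(), len(payload)))
    return str(msg["Subject"]), parts


# Build a message, serialize it to 7-bit-safe bytes, then decode it again.
original = EmailMessage()
original["From"] = "someone@example.com"
original["Subject"] = "Gr\u00fc\u00dfe"
original.set_content("Hallo\n")
original.add_attachment(b"\x89PNG fake", maintype="image",
                        subtype="png", filename="logo.png")
raw = original.as_bytes()
subject, parts = list_parts(raw)
```

      The non-ASCII subject travels as an encoded word on the wire and only becomes readable again after decoding, which is one of the smaller traps.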

  18. Maildirs by mrsam · · Score: 4, Informative

    Maildir : Do you really want to clutter your system with millions of small files? That's waste of inodes, space (unless perhaps you use Linux/ReiserFS or SGi) ...

    In case you haven't noticed, the default setting for the Linux ext[23] filesystems is to allocate one inode per 4096 or 8192 bytes of disk space. Which happens to be pretty much the size of an average E-mail message. So, in other words, you are unlikely to run out of inodes before you run out of disk space, since both are going to be used up pretty much at the same clip.

    It may come as a shocking surprise to some, but the average large filesystem is just littered with small files here, and small files there, all over the place. Here's my workstation -- a fairly large box with all sorts of crap loaded:

    Filesystem 1k-blocks ...
    /dev/sdb5 8159388 ...

    Filesystem Inodes ...
    /dev/sdb5 1036288 ...

    I'm using up almost exactly 8192 bytes per inode.
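
    The arithmetic behind that figure, as a quick Python sketch. It reads the two numbers shown as total 1k-blocks and total inodes (the other df columns are elided above, so that reading is an assumption); the function names are mine, and `os.statvfs` only exists on Unix-like systems.

```python
import os


def bytes_per_inode(kib_blocks, inodes):
    """Filesystem size in bytes divided by the number of inodes."""
    return kib_blocks * 1024 / inodes


def bytes_per_inode_on(path):
    """Same ratio for a mounted filesystem; None if it reports no inode count."""
    st = os.statvfs(path)
    if st.f_files == 0:
        return None
    return st.f_blocks * st.f_frsize / st.f_files


ratio = bytes_per_inode(8159388, 1036288)
print("%.0f bytes per inode" % ratio)
```

    With the numbers quoted above this comes out a little under 8192, the difference presumably being filesystem overhead.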

    and just try to open a Maildir with 1000+ mails and see how long it takes your favorite Mailprogram to only display the subjects.

    How about instantly? Most GUI E-mail clients cache mail headers, so they don't have to go and wait for the server to reply each time you click on the folder index window to re-sort, or scroll the folder index.

    ...

    Some ideas about the ideal mail-storage:
    * One file per Mailbox-folder, allowing multiple folders per user.


    Using one file per folder essentially forces you to use some form of locking each time folder access is necessary. Locking of any sort has been problematic for years whenever NFS (or pretty much any other network filesystem) is involved. A single circuit failure will now take out your entire network spool, as all clients are now spinning on lock requests out on the unreachable server.

    Compression: Should messages be broken into pieces and the MIME-attachments stored separately (thus searching of the text parts would still be possible without decompressing the whole file)?

    I thought you wanted to save everything in a single file per folder, and using multiple files for messages is supposed to waste inodes, remember?

    File format: gdbm, Sleepycat db? Something new?

    Ask an Exchange admin about joys of a corrupted Exchange database. If mail is stored in simple, plain files, a single instance of corruption will affect at most one mailbox, instead of taking out the entire monolithic database.

    Unicode support in folder names? Imap message-IDs, flags, useragent specific state-information?

    IMAP already uses Unicode to encode folder names. Not sure what "useragent specific state-information" means...

    1. Re:Maildirs by Ian+Bicking · · Score: 2, Troll
      In case you haven't noticed, the default setting for the Linux ext[23] filesystems is to allocate one inode per 4096 or 8192 bytes of disk space. Which happens to be pretty much the size of an average E-mail message. So, in other words, you are unlikely to run out of inodes before you run out of disk space, since both are going to be used up pretty much at the same clip.
      This doesn't make sense to me, at least not as presented. You are going to run out of inodes at exactly the same time you run out of disk space, because they are one and the same thing. In fact, I believe all the inodes are created when you create your filesystem, all space is mapped to an inode (though of course one file can use multiple inodes).

      The issue is not the waste of inodes, but the waste of diskspace because the smallest file chunk is one inode worth of space. It's usually said that if you have 4k inodes, you'll lose 2k (on average) per file. This is not really correct, because inodes themselves take up space -- I remember reading a paper somewhere many years ago where they estimated that most users would find 4k inodes better than smaller values, because in normal file distributions the space you save with the smaller inode is less than the space of the increased number of inodes themselves. However, this would lead one to believe you should have really big inodes and really big files, and then you'll be very efficient.

      But really, none of this should be given much weight until someone does a statistical analysis of just how inefficient a one-mail-per-file system is. It might not be significant, or it might be insignificant compared to storing base64 messages, or it may be insignificant compared to the benefits of compression. It's bad form to optimize before profiling, and the many-file inefficiency concerns feel like they are more based on intuition and less on fact. But then, someone must have studied it, so maybe not.
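
      As a starting point for that kind of measurement, a sketch in Python of how the slack could be counted. It assumes the unit that matters is the allocation block size (what the comment above calls the inode size) and ignores per-file metadata; the sample sizes are invented, so the numbers mean nothing until real mailbox sizes are fed in.

```python
def slack_bytes(sizes, block_size=4096):
    """Bytes lost to rounding each file up to a whole number of blocks."""
    total = 0
    for size in sizes:
        remainder = size % block_size
        if remainder:
            total += block_size - remainder
    return total


def overhead_ratio(sizes, block_size=4096):
    """Slack as a fraction of the actual payload bytes."""
    payload = sum(sizes)
    return slack_bytes(sizes, block_size) / payload if payload else 0.0


# Hypothetical message sizes in bytes; replace with sizes from a real spool.
SAMPLE_SIZES = [900, 2500, 4096, 7000, 150000]
for block in (1024, 4096, 8192):
    print(block, slack_bytes(SAMPLE_SIZES, block),
          "%.1f%%" % (100 * overhead_ratio(SAMPLE_SIZES, block)))
```

      Running it over the sizes in an actual spool directory would say whether the many-file overhead is worth worrying about next to base64 bloat.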

  19. mbx by zsmooth · · Score: 2

    I believe that UW-IMAP .mbx also includes indexing in the mail file, along with the concurrent access stuff. It's definitely WAY faster than mbox.

  20. Oh great by elefantstn · · Score: 3, Funny
    If you were to design your own MUA, how would you design its mail storage?


    Now I'll never get to sleep tonight.
    --
    If it ain't broke, you need more software.
  21. "Why yEnc is bad for Usenet" by Wyzard · · Score: 2, Informative

    yEnc isn't all that great. See http://www.exit109.com/~jeremy/news/yenc.html.

  22. Life is not that simple by coyote-san · · Score: 3, Insightful

    Life is not that simple. All databases are limited by the size of the basic block, and if you can't fit your data into that block performance takes a hit.

    With PostgreSQL this a compile-time option, default 8k and it can go up to 32k.

    It *is* possible to store larger items, esp. if they're 'TOASTable' or blobs, but this often just pushes the problem of dealing with thousands of files onto the database. Only now it's a lot harder to figure out why performance sucks.

    Does this mean that database solutions won't work? Of course not. But it does mean that simple solutions won't scale well when you're dealing with massive amounts of data.

    --
    For every complex problem there is an answer that is clear, simple, and wrong. -- H L Mencken
    1. Re:Life is not that simple by chill · · Score: 2

      Good points.

      In defense of databases, they are probably the single most scalable, performance-tuned app in existence. LOTS of people put LOTS of money and LOTS of effort into addressing database performance.

      Yes, mega-databases require high priests to manage properly but nothing beats Oracle, DB2, Sybase and the like for massive data storage, retrieval and searching.

      Add in proper asynch I/O, raw partition access, transaction support, dedicated monitoring and backup engines and you have a system that is damned hard to beat for large mail storage.

      --
      Learning HOW to think is more important than learning WHAT to think.
  23. Re:About 1.4 seconds? by Anonymous Coward · · Score: 3, Informative

    i concur. there's nothing wrong with maildir or the linux filesystem, at least for me. my mailbox has about 3000 messages, and it opens pretty much instantly, using Maildir, Courier-IMAP and EXT2, from a server running a 700mhz Athlon and 7200 RPM IDE disks.

    the author's comments about Maildir make it sound like they've been using it and having problems. perhaps the problem is with their imap daemon? or their client? or their hardware? if running out of inodes or space for small files is such a problem, why not use ReiserFS? reformatting your filesystem is probably a lot quicker than inventing another new UNIX mailbox standard and getting people to support it.

    i use the OS X mail client, and it indexes my messages in the background as they arrive, so i can do instantaneous (i mean in-stan-tan-e-ous!) searches through my 3000 message mailbox by subject, to, from, or the entire message text. i can't imagine how this could work much better.

    your experience is clearly different; but i think there are other factors you should consider before blaming the mailbox format.

  24. Encryption by coyote-san · · Score: 2

    You can go a step further - don't bother with setting up a new compression layer, just encrypt it with existing tools. Most encryption routines compress it first, to make cryptanalysis more difficult (and for performance, since there's less data to encrypt), but this is partially offset by the continuing need for 7-bit safe transport layers.
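
    The trade-off between compressing and staying 7-bit safe is easy to see in numbers. A small Python sketch, using zlib and base64 as stand-ins (no encryption step is shown, and the `pack`/`unpack` names and sample text are only for illustration): compression shrinks the data, then the ASCII armor grows it back by a third.

```python
import base64
import zlib


def pack(data):
    """Compress, then armor into 7-bit-safe ASCII."""
    return base64.b64encode(zlib.compress(data, 9))


def unpack(armored):
    """Reverse of pack()."""
    return zlib.decompress(base64.b64decode(armored))


def sizes(data):
    """Return (raw, compressed, armored) lengths in bytes."""
    compressed = zlib.compress(data, 9)
    return len(data), len(compressed), len(base64.b64encode(compressed))


# Repetitive sample text, which compresses far better than real mail would.
sample = b"Subject: status report\nNothing new to report today.\n" * 200
raw_len, comp_len, armored_len = sizes(sample)
print(raw_len, comp_len, armored_len)
```

    Base64 always turns every 3 bytes into 4 characters, so the armor costs the same third no matter how well the compression did.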

    --
    For every complex problem there is an answer that is clear, simple, and wrong. -- H L Mencken
  25. Citadel uses Berkeley DB by IGnatius+T+Foobar · · Score: 2

    Check out the Citadel system. (Disclaimer: I am one of the developers, so my opinion on this is kind of strong.) We use Berkeley DB from Sleepycat Software for the data store. Yes, this is the same Berkeley DB that Sendmail uses to store its alias tables, access tables, etc. But it's capable of so much more than that. It's a robust, non-relational database that is hugely scalable and even has transactions/logging support!

    We store all messages in the database.

    Works like a charm. No pounding through ugly directory hierarchies or insanely long flat files. No need to escape out the word "From" when it appears at the start of a line. None of the cruft.
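
    For anyone who hasn't met the "From" cruft mentioned above: in mbox every message starts with a line beginning "From ", so a body line that starts the same way has to be quoted when it is written. A minimal Python sketch of the common mboxrd-style quoting follows; the function names are made up for illustration and this is not Citadel's code.

```python
import re

# mboxrd convention: a body line matching ^>*From  gets one more ">" prepended,
# so the quoting can be undone without guessing.
_FROM_LINE = re.compile(rb"^(>*From )", re.MULTILINE)
_QUOTED_FROM_LINE = re.compile(rb"^>(>*From )", re.MULTILINE)

def escape_from_lines(body: bytes) -> bytes:
    """Quote body lines that would look like an mbox message separator."""
    return _FROM_LINE.sub(rb">\1", body)

def unescape_from_lines(body: bytes) -> bytes:
    """Strip exactly one level of quoting added by escape_from_lines()."""
    return _QUOTED_FROM_LINE.sub(rb"\1", body)

body = b"Hi,\nFrom here on it gets ugly.\n>From a quoted reply.\n"
stored = escape_from_lines(body)
print(stored.decode())
```

    A store that keeps each message as its own record (database row, Maildir file) never needs this step, which is the point being made.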

    Ok, so it's a black box. But it's an open source server that uses an open source database backend, and since it supports SMTP/POP/IMAP plus webmail all by itself, you can still plug your favorite utilities into it (Pine, elm, fetchmail, etc.) and you don't have to graft together Sendmail+IMAP+whatever to make your mail system work.

    The traditional Unix mail utilities are getting a little long in the tooth. I'm going to get flamed for saying this but look at what's happened to the email world: Lotus and Microsoft have run away with most of the market because Unix traditionalists won't give up their flat files. It's time for us to evolve, folks.

    --
    Tired of FB/Google censorship? Visit UNCENSORED!
  26. Re:Spam Assassin!!! by swimfastom · · Score: 2, Funny

    So far I have blocked about 94% of the SPAM coming in through our mail server. It only misses a couple and is highly configurable! Download and install it!

    OFFTOPIC!? With that great deal of spam reduction, the space required to store the emails is greatly reduced!

    Cheers!
    Tom

    --
    http://tomgould.com/
  27. don't forget by Hollins · · Score: 3, Funny

    Your newfangled system better have DRM built in. I don't have any data, but it must be obvious that artists are losing billions in revenues every week due to mp3s being sent as attachments. This criminal behavior must be stopped or the practice of free expression will come to a screeching halt.

  28. Moving to a better mail box - HUMOR by buss_error · · Score: 3, Funny

    We have to have the remote hammer to pop out of the monitor to whack the end user. This is a must for any admin that works with more than 300 people. Hammer trigger from e-mail, pager, SMS, or telephone number.

    Power mains must be connected to the user's chair, see above for trigger.

    The MUA must forward all p0rn to the admin account. Likewise with credit card info.

    The MUA must know when the user is about to do something to tick off the admin, like sending a "me too" to everyone in the office, or replying to a confidential e-mail to the whole office and prevent the user from reproducing. X-rays are fine for impromptu sterilizations. The side effect of losing all your body hair is no problem, as it alerts others to a stupid co-worker.

    The MUA must alert the admin when a coworker he has got the hots for changes her home phone number. Just to be fair, if the admin is female, the reverse applies.

    The MUA must analyze the admin's e-mail and throw a bucket of cold water if (s)he attempts to send a really stupid e-mail.

    Also, the MUA must be able to launch nuclear missiles at spammers automatically. After that, it should refer the e-mail to the admin to see if a stronger response is warranted. Better yet, the MUA should employ a time machine to go back and choke the spamming creep when the spammer is still a baby, then use X-rays on the parents as above.

    The MUA should have a hypnotic effect on the object of the Admin's desire and cause that person to perform disgusting oral acts on the Admin's body each time a new e-mail arrives. (HOORAY FOR KELZ!)

    For the PHB, he should (by the same hypnotic effect) do a "Full Monty" when the big cheese walks in. Twice.

    The MUA should be able to cause back dated confirmation messages from HR approving a 51 week paid vacation upon pressing a special key combination, unless it's the PHB pressing the keys, then it should cause an e-mail to be sent to HR from the PHB's account turning in notice.

    Sorry, if you had a day like mine, you'd need a laugh about now...

    --
    Necessity is the plea for every infringement of human freedom. It is the argument of tyrants; it is the creed of slaves.
  29. Maildir? by Dionysus · · Score: 2, Insightful

    What is the problem with Maildir? I mean, if you're going to store email, might as well use ReiserFS. I don't have any problems with big mailboxes, and the extra integrity of the email messages is worth the (non-noticeable) delay.

    Used to get the mbox corrupted once in a while. Never had problems with Maildir.

    --
    Je ne parle pas francais.
  30. Databases Ptewey. by Jason+Pollock · · Score: 3, Insightful

    The problem with using database formats is that you can't access them with vi. How many times has your mail client crashed attempting to read an email, but you still _need_ to get access to it? If it's in a database (proprietary or not), you're up the creek. If it's stored in a flat file, you at least have the option of using vi/emacs/grep to find and read the email, and then excise it.

    This has happened to me in Netscape, Kmail, Outlook, Evolution, Eudora, etc. Every single one has had problems at one point or another. The best programs are the ones that are _truly_ open, and let you get at the mail from other directions.

    Don't doubt the power of the text utilities in Unix. :)

    Jason Pollock

    1. Re:Databases Ptewey. by Dynedain · · Score: 2

      Or, you use IMAP and just access it from another machine. Simple.

      --
      I'm out of my mind right now, but feel free to leave a message.....
    2. Re:Databases Ptewey. by wadetemp · · Score: 2

      Select Body from Mail where User like 'Jason Pollock'?

    3. Re:Databases Ptewey. by ka9dgx · · Score: 2
      Uhm... never.

      Text is nice for simple, low volume applications with one infrequent user. When you move into multiuser, transaction oriented, high volume systems, a versioning database is the way to fly.

      I've never had an email client crash, I get tons of email, spam, and the occasional trojan/worm/hoax, but no crashes of email programs since 1982 or so...

      --Mike--

    4. Re:Databases Ptewey. by Luyseyal · · Score: 2

      Huh, I've never had Evolution corrupt my mbox unalterably or keep me from ssh'ing into the box and grepping some text out that I needed from elsewhere. I do this fairly regularly and I get a decent amount of mail. An old alpha version of Evolution _did_ crash on me while trying to import some crappy Netscape mail, but they fixed that bug.

      If libcamel really corrupted your mbox files, you need to file a bug.

      -l

      --
      Help cure AIDS, cancer, and more. Donate your unused computer time to worldcommunitygrid.org. Join Team Slashdot!
    5. Re:Databases Ptewey. by Jason+Pollock · · Score: 2

      Notice, the ssh in and grep text out of it. That's my point. :) If it was in a database, you wouldn't be able to grep it.

      I was using a beta copy of evolution at the time it happened to me, but that doesn't mean that it won't happen again. Next time, I want the same ability to get at my email using grep. :)

      Jason

  31. Re: Am I missing something? by Dionysus · · Score: 2

    Courier IMAP uses an enhanced Maildir format (original Maildir didn't support subfolders)

    --
    Je ne parle pas francais.
  32. MTA/IMAP server for MySQL message-store by chrisv · · Score: 3, Informative
    Personally, I'd love to see a Linux MTA/IMAP system which uses an SQL message-store. The ability to replicate a message-store across multiple physical sites without having to get into distributed filesystems like Coda would be a huge benefit for those who need to provide a redundant mail service.

    I actually found a nifty little package called dbmail which uses an SQL messagestore. I've been playing with such things at work since they wanted me to write them a web-based mail client, and I wanted something which would let me deal with a MySQL database on the web client, but also allow people to connect to it via IMAP or POP3.

    Of course, the whole replication part of it might be a bit more difficult, but it could probably be arranged as well. I'm pretty sure there are tools in existence for doing replication on a MySQL database (of course, don't ask me the names of any of them...)

    --

    Dogma: Dead (mostly because your Karma ran it over)

    1. Re:MTA/IMAP server for MySQL message-store by kris · · Score: 2

      You do not want BLOBs in a MySQL store, at least not unless MySQL's BLOB API changed a lot since I looked last (which has been some time, admittedly).

      MySQL limits BLOB size to max-packet (1 MB per default), which is very small and stupid anyway.

      MySQL has no proper BLOB API which allows you to download a BLOB only partially. You cannot read bytes 10.000 to 20.000 of a BLOB in MySQL.

      MySQL tables perform abysmally with BLOBs of varying size being part of the table.

    2. Re:MTA/IMAP server for MySQL message-store by joib · · Score: 2

      PostgreSQL blob api also sucks golfballs through a waterhose, but as of pgSQL 7.1 there is no row length limit any longer. So you could stuff an arbitrarily long attachment as a TEXT or VARCHAR field.

  33. Don't do it! Compress in the file system. by billstewart · · Score: 3, Insightful

    Shredding and compressing mail messages is almost always a bad idea. Essentially *nobody* does it correctly, and you can't reconstruct messages in their original byte-for-byte formats, which trashes digital signatures. You won't save much disk space, because real text doesn't take up enough space for anybody except a big ISP mailsystem to worry about, and binary attachments usually only compress well if they've been encoded in some non-8-bit-transparency format like base64 or uuencode. About the only time it wins is when one person on your keep-mail-on-server mailsystem is sending an attachment to a bunch of people who can then all use the original, which is to say they should probably have stored the file on the web and mailed a URL. If you're going to do things like this, get yourself a compression-equipped filesystem and just store your raw mail messages there.
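
    The base64 point is easy to check with a few lines of Python. This is only a sketch: seeded pseudo-random bytes stand in for an already-compressed attachment (zip, JPEG), and zlib stands in for whatever a compressing filesystem would actually use.

```python
import base64
import random
import zlib

rng = random.Random(0)  # fixed seed so the numbers are repeatable
# Pseudo-random bytes: a stand-in for an already-compressed attachment.
raw = bytes(rng.getrandbits(8) for _ in range(30_000))

encoded = base64.b64encode(raw)           # roughly what sits in the mail file
packed_raw = zlib.compress(raw, 9)        # compressing the binary itself
packed_encoded = zlib.compress(encoded, 9)  # compressing the base64 text

print("raw bytes:           ", len(raw))
print("base64 encoded:      ", len(encoded))
print("zlib(raw):           ", len(packed_raw))
print("zlib(base64 encoded):", len(packed_encoded))
```

    The encoded copy is a third larger than the raw bytes, and compression claws back most of that third but not much else: the gain comes from undoing the encoding overhead, not from the attachment itself.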

    --

    Bill Stewart
    New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
  34. XML = unnecessary performance hit by acb · · Score: 2

    Why waste CPU cycles on parsing a human-readable text file format such as XML?

    There should be a standard byte-compiled representation of XML (CXML), which has been flattened into an easily readable data structure. It would be portable, with byte orders indicated in flags (or would just use network byte order, i.e., big-endian), and with fixed-length element start/end headers, and could be used in lieu of XML for machine-machine communications. If a human wants to inspect the data, CXML could be trivially converted to and from XML.

    Why go to the trouble of running a parser for files that 99% of the time no human will ever look at?
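
    To make the idea concrete, here is a rough Python sketch of a fixed-length-header binary record in network byte order. The layout (2-byte tag id, 4-byte length) and the tag numbers are invented for illustration; this is not CXML, ASN.1, or any existing standard.

```python
import struct

# Network byte order ("!"): 2-byte tag id followed by 4-byte payload length.
HEADER = struct.Struct("!HI")

def encode(tag: int, payload: bytes) -> bytes:
    """Prefix the payload with a fixed-length header."""
    return HEADER.pack(tag, len(payload)) + payload

def decode(buf: bytes):
    """Yield (tag, payload) pairs from back-to-back records."""
    pos = 0
    while pos < len(buf):
        tag, length = HEADER.unpack_from(buf, pos)
        pos += HEADER.size
        yield tag, buf[pos:pos + length]
        pos += length

SUBJECT, BODY = 1, 2  # hypothetical tag ids
record = encode(SUBJECT, b"hello") + encode(BODY, b"world!")
print(list(decode(record)))
```

    A reader can skip an element by adding its length to the current offset, without scanning for a closing tag, which is where the parsing savings would come from.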

    1. Re:XML = unnecessary performance hit by spectecjr · · Score: 5, Informative

      There should be a standard byte-compiled representation of XML (CXML), which has been flattened into an easily readable data structure. It would be portable, with byte orders indicated in flags (or would just use network byte order, i.e., big-endian), and with fixed-length element start/end headers, and could be used in lieu of XML for machine-machine communications. If a human wants to inspect the data, CXML could be trivially converted to and from XML.

      There is one. It's called ASN.1

      Simon

      --
      Coming soon - pyrogyra
    2. Re:XML = unnecessary performance hit by jbert · · Score: 2

      You have just described ASN.1.

      People like XML because humans can read it.

      It is the same reason people like the inefficient text-based encoding in email.

      (That doesn't mean they are right, of course).

  35. Exchange brain-damage by Anonymous Coward · · Score: 5, Informative

    Exchange is actually a pretty decent mail server

    As part of my job, I've written software to send out HTML mails to people (no, it's not spam). When these messages pass through an Exchange server, Exchange does us the "service" of creating a text version of the mail from the HTML. I guess this is so that people without HTML-capable mailers can have a readable version...

    The problem is, we include our own text/plain version alongside the HTML (ain't multipart/alternative great?). Nicely formatted and everything. Instead of leaving our mail alone, Exchange rips out the text version and creates a new one from the HTML. The result is an ugly mess of URLs because we use some graphics in the HTML version. Our nicely formatted text version ends up in the bit bucket so that Exchange can dump its url-barf on people.

    This is really stupid behaviour for an MTA. And for some reason, it's always CEOs of important clients who use text-based MUAs while sitting behind an MS Exchange server. They call us up asking which URL to click on.

    This, combined with other mail-rewriting bogons, has led me to the conclusion that Exchange has no respect for the messages passing through it.
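
    For reference, this is roughly what sending both versions looks like with Python's standard email package. It is a sketch only: the addresses, subject and URLs are placeholders, not the poster's actual mailer.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["Subject"] = "Monthly report"
msg["From"] = "sender@example.com"
msg["To"] = "recipient@example.com"

# First part: the hand-formatted plain-text version.
msg.set_content("Report for March\n\nSee https://example.com/report\n")

# Second part: the HTML version. add_alternative() turns the message into
# multipart/alternative and keeps the existing text part first.
msg.add_alternative(
    "<html><body><h1>Report for March</h1>"
    '<p><a href="https://example.com/report">Full report</a></p></body></html>',
    subtype="html",
)

part_types = [part.get_content_type() for part in msg.iter_parts()]
print(msg.get_content_type(), part_types)
```

    A client that can't render HTML is supposed to pick the text/plain part as written, which is why an MTA regenerating that part from the HTML defeats the purpose.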

    1. Re:Exchange brain-damage by ortholattice · · Score: 4, Informative
      Mod parent up. More people need to know about this blatant RFC 2046 violation that corrupts carefully composed multipart/alternative emails. Sometimes it makes me want to scream that Microsoft gets away with this stuff and no one seems to care. Or maybe they don't realize their correspondents are receiving a corrupted version on the other side, and the correspondents just assume the sender is sloppy and lazy.

      BTW a good way to get nicely formatted text/plain content is:

      links -dump abc.html > abc.txt
      with neatly formatted tables and everything. Unfortunately only your non-Exchange recipients will see it.

      Now if Exchange automatically put in a text/plain for attached Word documents, I might buy that... :)

    2. Re:Exchange brain-damage by Anonymous Coward · · Score: 4, Insightful

      Sometimes it makes me want to scream that Microsoft gets away with this stuff and no one seems to care.

      One thing to keep in mind about Exchange is that it's really a X.400 mail system, with some proprietary routing features kludged on top, lots of back-compat MS Mail features kludged on top of that, and then (as the last afterthought) SMTP kludged on top of all that. Next time you are at a computer book store, gander at the architecture diagram for Exchange -- it's so complex that it _should_ make you queasy. The thing just reeks of early-90s incorrect design assumptions.

      So it shouldn't be a shock that it can't handle a large number of SMTP edge cases. Frankly, nobody would buy a product like Exchange if it didn't have the Microsoft logo on it and a nice client which gets installed with Word and Excel.

      Microsoft, in their heart of hearts knows that it's a piece of shit, but it's _their_ piece of shit. And it happens to sell well, and any product that profitable can't be all that bad.

      I wouldn't be shocked if numerous skunkworks projects have come and gone at MS to replace their Big X.400 Jet DB Kludge with a real Internet-savvy mail server, but the politics of the place probably dictate that they lumber on with what's working (that also explains products like Windows ME).

  36. row size limits are gone now... by Lazy+Jones · · Score: 3, Informative
    With PostgreSQL this is a compile-time option, default 8k and it can go up to 32k.

    Current versions of PostgreSQL no longer have such limits (they're much higher, a single field can use up to 1GB ...).

    --
    "I love my job, but I hate talking to people like you" (Freddie Mercury)
  37. How about usenet? by thogard · · Score: 2

    If you store your messages in an usenet server you get all kinds of neat features like auto expiration and tools that can put the binaries together and let the server deal with the file format.

    Back when C-news was new, there was a system called "Notes" that kept usenet posts in a database. From what I can tell that became the ancestor of lotus notes at some point.

  38. Don't speculate. Profile. by Doktor+Memory · · Score: 5, Insightful
    Maildir: Do you really want to clutter your system with millions of small files? That's waste of inodes, space (unless perhaps you use Linux/ReiserFS or SGi)
    Psssst. It's not 1978 any more. Inodes are cheap. So is disk space. Stop spreading FUD.
    and just try to open a Maildir with 1000+ mails and see how long it takes your favorite Mailprogram to only display the subjects.
    Quite right. Just try it. You might be a bit surprised by the results.
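
    One way to actually try it, as a Python sketch using the standard mailbox module. The message count and body size are arbitrary assumptions, and a freshly written Maildir will be served from the page cache, so this measures the best case rather than a cold disk.

```python
import mailbox
import tempfile
import time
from email.message import EmailMessage

def build_maildir(path: str, count: int) -> None:
    """Create a Maildir holding `count` small synthetic messages."""
    box = mailbox.Maildir(path, create=True)
    for i in range(count):
        msg = EmailMessage()
        msg["From"] = "sender@example.com"
        msg["To"] = "me@example.com"
        msg["Subject"] = f"Test message {i}"
        msg.set_content("x" * 2000)
        box.add(msg)
    box.close()

def list_subjects(path: str) -> list:
    """Open every message file and pull out its Subject header."""
    box = mailbox.Maildir(path, create=False)
    return [msg["Subject"] for msg in box]

with tempfile.TemporaryDirectory() as tmp:
    path = tmp + "/Maildir"
    build_maildir(path, 1000)
    start = time.perf_counter()
    subjects = list_subjects(path)
    elapsed = time.perf_counter() - start
    print(f"{len(subjects)} subjects in {elapsed:.3f}s")
```

    Point it at a real Maildir on the filesystem in question (and drop the cache first) before drawing conclusions either way.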
    --

    News for Nerds. Stuff that Matters? Like hell.

  39. a few comment by an experienced mail hacker by fejjie · · Score: 5, Informative

    A couple things:

    1. Evolution is NOT "Basically mbox with database features". It can use Maildir or MH as the backend (and you can write your own plugin to extend this if you like).

    2. Evolution's body indexing and summary files are extremely fast and efficient, about the best you'll get. I hear MySQL has text indexing capabilities that are extremely fast, but I'm not sure if they are faster than Evolution's indexer or not. Might be interesting to check this out.

    3.

    > But the thing that bugs me most is disk space. Typical inboxes are
    > made of 5% to 10% of Text including Headers and HTML. The rest are
    > BASE64- (or UU-) encoded pictures, word documents, zip archives and so
    > on. The problem here is the encoding which wastes considerable amounts
    > of space (at least one third).

    It's theoretically possible, if you wrote your own Evolution storage plugin, to change the Content-Transfer-Encoding header value of binary attachments to "binary" (and text attachments to "8bit") before writing the message out to disk (or wherever) thus magically making it so that you no longer save the encoded text of the attachments but rather in-line binary data content. (Yes, it's as easy as setting an enum value in the CamelMimePart structure).

    However, you have to be aware of the consequences of this. Most importantly, you will not be able to validate any of your PGP/MIME or S/MIME signed messages as according to the RFCs for these types, the signed MIME parts MUST be treated as opaque (meaning that you may not modify them in any way).
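
    A small Python sketch of both effects. It does not use Camel or any real signature scheme: random bytes stand in for an attachment, and a SHA-256 digest of the body stands in for whatever a PGP/MIME or S/MIME signature would cover.

```python
import base64
import hashlib
import os

attachment = os.urandom(30_000)             # stand-in for a binary attachment
as_sent = base64.encodebytes(attachment)    # base64 body as it arrived on the wire
as_stored = base64.decodebytes(as_sent)     # what a "binary" store would keep

saved = len(as_sent) - len(as_stored)
digest_sent = hashlib.sha256(as_sent).hexdigest()      # bytes the signer saw
digest_stored = hashlib.sha256(as_stored).hexdigest()  # bytes a verifier would see

print(f"saved {saved} of {len(as_sent)} bytes")
print("digests match:", digest_sent == digest_stored)
```

    The space saving is real, but the stored bytes are no longer the bytes that were signed, so verification fails unless the original encoding can be reproduced exactly.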

    Now on to your ideas...

    > One file per Mailbox-folder, allowing multiple folders per user.
    > Should those files reside in one central location or in users
    > Homedirs?

    How is this different from mbox? (btw, CVS Evolution can handle mbox files and directory trees in external locations - ie, not within the ~/evolution directory).

    > Compression: Should messages be broken into pieces and the
    > MIME-attachments stored separately (thus searching of the text parts
    > would still be possible without decompressing the whole file)?

    If you break apart the MIME parts, you run into the same problem I described above about not being able to verify signatures.

    However... if you took a normal mbox and gzipped it, you would certainly save space (at the expense of speed). I've been thinking about writing a CamelMimeFilterGzip class for gzip compressing/decompressing streams which would allow Evolution to read and write to gzipped mbox files for example.

    Once the class is written (which should be fairly simple), allowing Evolution to read gzipped mboxes should be as simple as doing:

    camel_stream_filter_add (MboxStream, GzipFilter);

    ...before feeding 'MboxStream' to the MIME parser.
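
    The same trade-off can be tried outside Evolution with Python's standard library. This is a sketch, not the Camel filter described above: the mbox contents are synthetic, and since mailbox.mbox wants a real file path rather than a stream, the gzipped copy is unpacked to a scratch file before reading.

```python
import gzip
import mailbox
import os
import shutil
import tempfile
from email.message import EmailMessage

workdir = tempfile.mkdtemp()
plain_path = os.path.join(workdir, "inbox.mbox")
gz_path = plain_path + ".gz"

# Build a small mbox to play with.
box = mailbox.mbox(plain_path)
for i in range(50):
    msg = EmailMessage()
    msg["From"] = "sender@example.com"
    msg["Subject"] = f"Message {i}"
    msg.set_content("The quick brown fox jumps over the lazy dog.\n" * 40)
    box.add(msg)
box.close()

# Compress the whole mbox in one go.
with open(plain_path, "rb") as src, gzip.open(gz_path, "wb") as dst:
    shutil.copyfileobj(src, dst)

# Reading it back means decompressing everything first.
scratch_path = os.path.join(workdir, "scratch.mbox")
with gzip.open(gz_path, "rb") as src, open(scratch_path, "wb") as dst:
    shutil.copyfileobj(src, dst)

reader = mailbox.mbox(scratch_path)
subjects = [m["Subject"] for m in reader]
reader.close()

plain_size, gz_size = os.path.getsize(plain_path), os.path.getsize(gz_path)
print(f"{plain_size} bytes plain, {gz_size} bytes gzipped, {len(subjects)} messages")
shutil.rmtree(workdir)
```

    The repetitive test bodies flatter gzip; real mail full of already-compressed attachments will shrink far less, and every read or append pays for a full decompress.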

    > File format: gdbm, Sleepycat db? Something new?

    Please not Sleepycat. If you are so sure that a generic database backend will be better than what Evolution's got, at least have the sense to use MySQL or PostgreSQL.

    I'm personally against using a generic database as a storage and here's why:

    1. The average user does not have an SQL database installed on their desktop systems, and so this is a completely ridiculous dependency for them. If you think library dependencies are bad, just wait till you have to go installing, configuring, and maintaining a multi-user database running on your system. This may be fine for a company solution, but not the average end-user.

    2. I'm not too familiar with MySQL or PostgreSQL, but I recall there being problems with mailers that use SQL database backends that tried to store the content of the messages as part of the table (due to them making the size of the table too small or whatever). If you can set the size to be "infinite", then I guess that's not a problem.

    If your plan is just to have the database index the folder and actually store the contents as separate files, then you've instantly gained nothing over Maildir except that now you have a hefty database that you have to maintain and very little to no speed improvements (especially if you have a well designed/implemented summary index like Evolution does).

    The only improvements you might gain here is body indexing? As I said earlier, MySQL supposedly has a REALLY good text indexer and so it might be a little faster than Evolution's. I'm really not sure on the comparison here.

    > Should the security model allow users to directly access their
    > files, grep them, copy them around?

    Is there a reason NOT to? I don't see one. It's their mail.

    > Shared folders, virtual domains?

    This doesn't really have anything to do with folder formats and everything to do with features of the client itself.

    (Evolution can do this).

    > Unicode support in folder names? Imap message-IDs, flags, useragent
    > specific state-information?

    Unicode support in folder names I'd say is a pretty important feature. I'm not sure what you mean by "Imap message-IDs". Do you mean UIDs? Evolution, for example, has a UID assigned for each message whether it be in an mbox folder, Maildir folder, MH folder, or IMAP folder. So this isn't necessarily dependent on folder format (though it could be if you used a database backend for example, you might want a UID in the table).

    I don't feel that UIDs are a must though, but I would suggest them. They are definitely useful especially for folders that can be accessed by multiple clients at once.

    Flags are good. I'd go so far as to say a MUST have.

    As far as user-agent specific state-information, it'd be nice to not need it. But if the client needs to keep its own info, it'd be nice to be able to map the info to UIDs and keep its own state file somewhere else (not necessarily alongside of the mail storage).

    For example, IMAP doesn't have any means for the client to store state information on it, but that's perfectly fine. If a client chooses to have its own state, then it can save it locally.

    It would be nice if the storage could handle user-defined flags/tags though. This would allow the client to extend the native features of the format (Flag-for-Followup, message colouring, etc).

    > How would MTAs deliver mail? How would clients access? File-locking
    > (NFS)?

    This is one reason to just stick with what's available :-)

    File locking is a MUST have (or a scheme to make it not needed, such as Maildir).
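
    For readers who haven't looked at how Maildir avoids locks: the delivering process writes the message under tmp/ and then renames it into new/, so a reader never sees a half-written file. A minimal Python sketch of that delivery step follows; the unique-name recipe is simplified and real MTAs add more entropy than this.

```python
import os
import socket
import tempfile
import time

def deliver(maildir: str, message: bytes) -> str:
    """Write to tmp/, then rename into new/; no lock is taken at any point."""
    for sub in ("tmp", "new", "cur"):
        os.makedirs(os.path.join(maildir, sub), exist_ok=True)
    # Simplified unique name: timestamp, pid and hostname.
    unique = f"{time.time():.6f}.{os.getpid()}.{socket.gethostname()}"
    tmp_path = os.path.join(maildir, "tmp", unique)
    new_path = os.path.join(maildir, "new", unique)
    with open(tmp_path, "xb") as fh:  # "x" refuses to overwrite an existing name
        fh.write(message)
        fh.flush()
        os.fsync(fh.fileno())         # make sure the data is on disk first
    os.rename(tmp_path, new_path)     # atomic within one POSIX filesystem
    return new_path

maildir = os.path.join(tempfile.mkdtemp(), "Maildir")
path = deliver(maildir, b"Subject: hi\n\nhello\n")
print("delivered to", path)
```

    Because each message gets its own file and the rename is atomic, concurrent deliveries and readers don't need to coordinate, which is what makes the scheme attractive over NFS.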

    --
    You know, I have one simple request...and that is to have messages with freakin' laser beams attached to their headers. Now evidently my MIME specification informs me that that can't be done. Uh, can you remind me what I pay you people for? Honestly, throw me a bone here. What do we have?

  40. DB by jukal · · Score: 2

    I have at times used a custom-made MTA and indexed the incoming mails (headers and first message body part) into a MySQL database. Attachments are compressed and stored within the "regular filesystem". The whole kludge is then interfaced to IMAP. User authentication is also done via MySQL, thus making it unnecessary to create "real users". The solution has lasted without problems for years now already. MySQL in general works like a dream; I have never had any problems with it.

    It is good (and fast) for some purposes, which I am not going to discuss here, everyone is probably very well equipped to figure out the plusses and minuses of this way of doing it.

  41. No Notes on Linux by BlueUnderwear · · Score: 2
    Plus, Domino runs on Linux, Aix, Solaris, NT, 2000, OS/2, AS/400... The list goes on and on. As far as a shared database, just setup shared mail.

    But unfortunately, the Notes client does not. We still need to dick around with wine to access the corporate Notes server. If anybody from IBM (who likes to show their commitment to Linux...) is listening: are there any plans for a native Linux Notes client? If so: when? If not: why not?

    --
    Say no to software patents.
    1. Re:No Notes on Linux by Anonymous Coward · · Score: 2, Interesting

      as i understand it, and i know i don't know much, it looks a lot like notes is going to be ditched in favor of web-ish access. i would guess that notes and the sametime client are probably going to get obsoleted at the same time...

      of course i have no real information, but that's how it looks. i don't think anyone believes notes is actually a good piece of work. i certainly hope not.

      [posting anonymously so as not to irritate my superiors]

    2. Re:No Notes on Linux by BlueUnderwear · · Score: 2
      You can access your mail (and other apps) via a standard web browser.

      How exactly would you do this? N.B. I'm not speaking about specially set up Web applications (these work quite well), but about just the general Notes databases.

      You can access your mail with your favourite POP3 or IMAP client.

      Pop and Imap need to be specifically enabled on the server, which is often not done. Moreover, Notes' pop and imap interfaces are rather stripped down versions, which didn't allow moving messages into folders or deleting them. Only reading is possible. For any maintenance, you still need to log in using a Notes client. At least, that was the case when I last checked (about a year ago).

      --
      Say no to software patents.
    3. Re:No Notes on Linux by twinpot · · Score: 2, Informative
      And none of these emails mention that when you send an attachment to multiple users in Notes, the attachment gets recreated for each user and takes up space for each user on the Notes server


      Depends on how the mail side is set up. Single instance store solves this. BUT, few places run it, as the admin overhead is generally not worth it.


      Never thought I'd like an email client less than I liked Exchange, but Notes wins that prize.


      You're confusing the client (Notes) with a server (Exchange). You can run Outlook against a Domino mail server. The Domino mail server, which does have its quirks, is in my experience way more reliable than Exchange. Plus with Notes clients, mail-borne viruses are very unlikely.

  42. Usenet-style, with overview database. by strredwolf · · Score: 4, Interesting

    Plain and simple. Switch from mail to Usenet. Maildir-like structure, but with a .overview (XOVER) file to help out with indexing.

    Storage is another problem, though... but Usenet messages can be sidetracked a bit with the encoding.
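
    A rough Python sketch of what such an index could look like for a Maildir. Assumptions: the column order follows the usual overview layout (number, subject, from, date, message-id, references, bytes, lines), the file name ".overview" is just taken from the suggestion above, and the line count here covers the whole message rather than only the body.

```python
import email
import mailbox
import os
import tempfile
from email.message import EmailMessage

FIELDS = ("Subject", "From", "Date", "Message-ID", "References")

def overview_line(number: int, raw: bytes) -> str:
    """Build one tab-separated index line for a raw message."""
    msg = email.message_from_bytes(raw)
    cols = [str(number)]
    # Tabs or newlines inside a header would break the format, so flatten them.
    cols += [" ".join(str(msg.get(name, "")).split()) for name in FIELDS]
    cols += [str(len(raw)), str(raw.count(b"\n"))]  # whole-message line count
    return "\t".join(cols)

def write_overview(maildir_path: str) -> str:
    """Write one index line per message and return the index file's path."""
    box = mailbox.Maildir(maildir_path, create=False)
    out_path = os.path.join(maildir_path, ".overview")
    with open(out_path, "w", encoding="utf-8") as out:
        for number, key in enumerate(sorted(box.keys()), start=1):
            out.write(overview_line(number, box.get_bytes(key)) + "\n")
    return out_path

# Small demo Maildir with three synthetic posts.
demo = os.path.join(tempfile.mkdtemp(), "Maildir")
demo_box = mailbox.Maildir(demo, create=True)
for i in range(3):
    m = EmailMessage()
    m["Subject"] = f"Post {i}"
    m["From"] = "poster@example.com"
    m["Message-ID"] = f"<{i}@example.com>"
    m.set_content("body\n")
    demo_box.add(m)

with open(write_overview(demo), encoding="utf-8") as fh:
    overview = fh.read().splitlines()
print("\n".join(overview))
```

    A client can then show a subject list by reading this one file instead of opening every message, at the cost of keeping the index in sync with the directory.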

    --
    # Canmephians for a better Linux Kernel
    $Stalag99{"URL"}="http://stalag99.net";
  43. Easy solution by BlueUnderwear · · Score: 2
    And for some reason, it's always CEOs of important clients who use text-based MUAs while sitting behind an MS Exchange server. They call us up asking which URL to click on.

    Easy solution: Build a list of "VIP" users who will get a text-only version. Or who will get the text and the HTML version in 2 separate mails.

    --
    Say no to software patents.
    1. Re:Easy solution by e_n_d_o · · Score: 2

      That's not easy and it's not a solution.

      I have no idea on the specifics of the original problem, but in my experience not every user complains about a problem. Depending on the system the mailing list is sourced from, adding a "prefers text/html mail option" could be non-trivial or just not possible, so as to require implementing a "parasite" database from scratch to keep track of such preferences, which would be quite difficult.

    2. Re:Easy solution by a_n_d_e_r_s · · Score: 2, Insightful

      Even more simple solution - just send everyone a text message.

      In it you can put a link to the html variant for those that want that - and put that variant on a web server.

      Email should be text. Web pages should be HTML.

      --
      Just saying it like it are.
    3. Re:Easy solution by ninewands · · Score: 2

      I agree completely. Nobody ever got a virus from merely opening a plain-text e-mail.

    4. Re:Easy solution by ahde · · Score: 2

      This is the correct solution.

      Mail clients can be built to "automatically" open a specified URL if you want to send it that way. While this might seem a potential security risk, it's no more dangerous than the current featureset of some mail clients. This would reduce internet traffic (and server storage space) enormously. How many megabytes of spam are sent to every user? Alternatively, attachments could be "sent" the same way. Even using FTP (or SCP) to reduce the overhead of HTTP.

      The solution isn't to re-engineer the server, but the clients.

  44. qmail by jabbo · · Score: 4, Informative
    This is just too easy.

    Life With Qmail

    Building a Linux Qmail Toaster

    Same thing, but with FreeBSD (more scalable, in my experience)

    have fun

    --
    Remember that what's inside of you doesn't matter because nobody can see it.
  45. Stays up for *days* before losing mail and reboot by SgtChaireBourne · · Score: 2, Informative
    Single instance storage is only good for intranets, except that there one should use file sharing to collaborate on documents rather than sending virii^H^H^H^H^Hattachments.
    Alen, your experiences with MS-Exchanges are so many worlds of difference away from mine that I nearly suspect that you've written a troll. Rebooting a mission critical service like a mail server during working hours is unsatisfactory. If other mission critical services like file and print sharing are also disabled during that reboot, then it's time to look for a more robust product.

    I have worked closely with three shops in the previous three years that used Microsoft Exchange. Each had at least 3 full time equivalents of MCSEs to babysit their Exchange servers, probably more if you count overtime. This is not counting the occasional high priced consultant. None of these shops could keep Exchange running for a full week. Nor could they keep it from losing mail (when I measured, it was 10-15%). Nor could they get it to communicate well with other mail servers. Nor could they keep it from getting wiped out once every three months by MSTDs (especially worms and virii).

    In contrast, Novell servers run years at a time unattended (nearly every consultant has at least two such anecdotes of their own) and many UNIX-based MTA's need only a few hours of non-hardware maintenance per year, when set up tight. I guess running MS-Exchange is a new status thing to flaunt resources, like having a tubercular wife was during the Victorian era.

    Needless to say the management's support was/is a real PITA for anyone doing work via e-mail with people outside of the house's MS-Intranet. In one case it even delayed publishing a book by several weeks. In house use of Exchange was fine -- when it was down for you, it was down for everyone else so it was a nice time out and a chance to go have coffee with the others. When put to the test, file sharing couldn't, wouldn't, didn't function often enough to be useful either. For file sharing, those without access to a Novell or Unix file server used sneaker net or mailed attachments. Yes, Exchange does look good in the 4-color glossy marketing brochure, but that's where it ends and reality sets in.

    Puh.

    Back to mail databases. RFC 2822, Internet Message Format, specifies the general structure of a message. This can be oversimplified as a header, with its standard and non-standard fields, and one or more message bodies. RFC 2046 specifies multipart bodies. These structures do seem very well suited to a relational database.
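
    To make the header/bodies mapping concrete, here is a minimal sketch using Python's standard `sqlite3` and `email` modules. The three-table layout and the names `message`, `header`, `part` and `store` are assumptions made up for illustration; nothing in the thread prescribes this schema.

```python
import sqlite3
from email import message_from_string

# Hypothetical schema: one row per message, per header field, per body part.
SCHEMA = """
CREATE TABLE message (id INTEGER PRIMARY KEY);
CREATE TABLE header  (message_id INTEGER REFERENCES message(id),
                      name TEXT, value TEXT);
CREATE TABLE part    (message_id INTEGER REFERENCES message(id),
                      position INTEGER, content_type TEXT, payload BLOB);
"""

def store(conn, raw):
    """Parse one message and spread it over the three tables."""
    msg = message_from_string(raw)
    msg_id = conn.execute("INSERT INTO message DEFAULT VALUES").lastrowid
    # Standard and non-standard header fields are stored the same way.
    for name, value in msg.items():
        conn.execute("INSERT INTO header VALUES (?, ?, ?)",
                     (msg_id, name, str(value)))
    # walk() visits the message and every nested part; keep only the leaves.
    leaves = [p for p in msg.walk() if not p.is_multipart()]
    for pos, p in enumerate(leaves):
        # decode=True undoes base64/quoted-printable, so raw bytes are stored.
        payload = p.get_payload(decode=True) or b""
        conn.execute("INSERT INTO part VALUES (?, ?, ?, ?)",
                     (msg_id, pos, p.get_content_type(), payload))
    return msg_id
```

    Storing the decoded payload is a design choice of this sketch; it flattens the multipart nesting, so a real schema would probably also need a parent column to rebuild the original tree.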

    --
    Beta is broken and the link to classic doesn't work. Stop wasting our time or there won't be anybody left here.
  46. Problems & Ideas by Twylite · · Score: 4, Informative

    Oh dear, another file format debate. I'm glad there was a library suggestion though ... that allows us to change our mind when we do it wrong the first time ;)

    First, you need to consider the possibility of moving the mailbox to a different computer or a different platform. This means it must be easy to access in any environment, and the tools must be portable.

    This doesn't completely rule out a database solution (like MySQL), but it certainly makes it less than ideal.

    Second, having used many mailers which separate out attachments ... Please Don't Do It! You can't easily move your mailbox, because there are a host of associated attachment files. There is ALWAYS a synchronisation issue between attachments and messages, so you end up scanning and cleaning out the attachment folder every so often to prevent dead files from accumulating.

    Compression is nifty, but isn't really important. Disk space is seldom a concern these days, and the really big stuff (binaries) is often already compressed or doesn't compress well.

    The real issue with most mailbox formats is how to deal with removing dead space from the mailbox. Some programs just leave it there until you hit "compact", which is wasteful and confuses users. Others rewrite the entire mailbox every time, which causes the software to "hang" for a while on shutdown.

    The best suggestion I can come up with off the top of my head is this: One file per mailbox folder, and that file is its own filesystem. The "root node" contains a group of summaries (from, to, subject, date, etc) and node links. Other nodes are chained to contain the message and attachments.

    Handling attachments: attachments are separated out and stored as binary in the mailbox. This conserves space but keeps the attachment with the message.

    Compacting: is avoided. When a mail is deleted, it is merely flagged in the root node (index). So each mailbox has its own deleted items folder, so to speak. When the deleted items folder is emptied, the index is rewritten and nodes freed - every freed node not at the end of the file is overwritten with a node from the end of the file (and appropriate reindexing done), so the file is automatically compacted.
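
    A rough sketch of that fill-holes-from-the-end step, in Python. It models the file as a list of fixed-size nodes and the root node as a dict; both representations, and the name `compact`, are assumptions for illustration rather than a real on-disk layout.

```python
def compact(nodes, index):
    """Fill freed nodes with nodes taken from the end, then shrink.

    nodes: list standing in for fixed-size file blocks; None marks a freed node
    index: dict mapping message id -> list of node positions (the "root node")
    Returns the number of nodes that had to be moved.
    """
    # Reverse map so that a moved node's owner can be re-indexed.
    owner = {pos: (mid, i)
             for mid, chain in index.items()
             for i, pos in enumerate(chain)}
    moved = 0
    hole = 0
    while True:
        # Freed nodes at the very end are simply truncated away.
        while nodes and nodes[-1] is None:
            nodes.pop()
        # Find the next freed node, scanning from the front.
        while hole < len(nodes) and nodes[hole] is not None:
            hole += 1
        if hole >= len(nodes):
            return moved
        # Move the last live node into the hole and fix its index entry.
        last = len(nodes) - 1
        nodes[hole] = nodes.pop()
        mid, i = owner.pop(last)
        index[mid][i] = hole
        owner[hole] = (mid, i)
        moved += 1
```

    Only as many nodes move as there are holes, so the cost follows the amount deleted rather than the size of the mailbox. A crash in the middle would still need the transaction log mentioned in this comment.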

    Ideally the file needs some sort of transaction logging area to ensure its integrity at all times.

    Shared access to files is best handled through a library or a service. File locking is notoriously prone to bugs and security issues, and avoiding multiple implementations in different mail clients would be beneficial.

    --
    i-name =twylite [http://public.xdi.org/=twylite], see idcommons.net
  47. /var/spool/mail/martin is my friend by martinflack · · Score: 3, Informative

    For Pete's sake, leave mail alone. If I can't fix it in less than 20 minutes with grep and perl, I don't want to know about it.

    Divide mail into 20-30 logical "folders" (files), use procmail to help sort/scan/unspam, do IMAP to get to it from Win machines, archive mail out of your working files once it gets a year old, and you're all set. Strive to keep your inbox empty (you need a proper "action" orientation with your mail folders to accommodate this). No big deal.
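
    The "archive mail once it gets a year old" step can be sketched with Python's standard `mailbox` module. The function name `archive_old`, the choice of the `Date:` header as the age test, and the decision to leave undated messages alone are all assumptions of this sketch.

```python
import mailbox
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

def archive_old(src_path, dst_path, days=365, now=None):
    """Move messages whose Date: header is older than `days` to another mbox."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=days)
    src, dst = mailbox.mbox(src_path), mailbox.mbox(dst_path)
    moved = 0
    src.lock()
    dst.lock()
    try:
        for key in list(src.keys()):
            msg = src[key]
            try:
                when = parsedate_to_datetime(msg["Date"])
            except (TypeError, ValueError):
                continue  # missing or unparsable date: leave the message alone
            if when.tzinfo is None:
                when = when.replace(tzinfo=timezone.utc)  # assume UTC
            if when < cutoff:
                dst.add(msg)
                src.remove(key)
                moved += 1
        # Write the archive first so a failure cannot lose mail.
        dst.flush()
        src.flush()
    finally:
        src.unlock()
        dst.unlock()
        src.close()
        dst.close()
    return moved
```

    Run from cron against each working folder, this keeps the files small enough that grep stays pleasant.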

  48. Re:Want to save space? by ObviousGuy · · Score: 2

    Son, I'm posting at 2.

    --
    I have been pwned because my /. password was too easy to guess.
  49. Mail is a protocol! by Kynde · · Score: 2

    There is no need to toy around with the mail as-is. It's a little like an IP packet: it doesn't matter what's in it; the essential thing is that it has destination and source addresses and it travels over the net. No technical solution will _ever_ overcome the failings of current email, because current email is as unrestrictive as IP packets.

    Take spam for example. The problem will always be present, theoretically, as long as you want to receive mail from people you've never heard from before and/or haven't given your public crypto key to. The other side of it is that _we_ allow people to send spam with spoofed source addresses.

    The problems with email are people; as with almost every damn problem in the IT sector, it's always us. People lean towards stricter rulesets to avoid abuse, which in many cases is not the way to go, let alone the solution to the root of the problem. Somebody here on /. said it really well once: the best solution would be "Cheap plasma handguns and justice for us all".

    --
    1 Earth is warming, 2 It's us, 3 it's royally bad, 4 we need to take action NOW
  50. Looking for a problem that doesn't exist by NotZed · · Score: 5, Interesting

    I lost the plot half way through this, but here's some food for thought anyway. Now I should get back to work ...

    Z

    I think that this is looking for the solution to a problem that doesn't really exist in the first place. Although I guess it depends somewhat on what you define as 'Unix mail'.

    I'm a developer on Evolution, and primarily on Camel, Evolution's email library. I'm not sure I'd rave about it (although I think Camel is a mostly beautiful piece of code ;), but it works reasonably well, and we've had a chance to try and deal with users with lots of email.

    What IS 'Unix mail'?

    I would define Unix mail as mail (rfc822 format) downloaded and stored locally on a per-user basis. IMAP, Exchange, and other remote protocols are very different beasts.

    Why are DBMS's not suitable for 'Unix mail'?

    Once you have a remote server you have to do things differently than if you have local access. Using a DBMS, and having a trained administrator to manage it, are practical considerations, as are the benefits you might get from this configuration. These solutions don't really make sense for standalone users. They shouldn't need to install and manage databases, complex backup procedures, and so forth, just to read their email.

    i.e. rdbms's are:
    hard to setup
    hard to maintain
    another major point of failure

    If, however, I were to design a multi-user groupware server, then a DBMS would come into serious consideration - at the backend at least. It allows you to do things like easily consolidate authentication outside of the operating system (the idea of having a 'shell account' to access mail is somewhat outdated), and it allows you to save space by storing common data, like attachments and email content, in a single place and redirecting it to multiple recipients (which is a common practice within organisations). It may be practical to use a mixture: an RDBMS to store textual parts or indices to data stored in a more conventional filesystem.

    But even with an RDBMS backend, I would personally probably still stick to IMAP to serve it to actual clients. The IMAP protocol is a bit heavy, but not really that bad, and it serves email; I don't think there's really any need to reinvent the wheel here.

    So ...

    If you define unix mail as I have, and separate it from a *mail server*, then you rule out full blown RDBMS's, and are left with:

    single file database
    multiple file database

    I'm not even going to mention XML because I think it is the single most stupid idea anyone's come up with. It is completely unsuitable for this purpose.

    And well, there's really no reason not to use MIME to store the messages. MIME already does everything you can possibly do with email (since, uh, it is how the email *will* be sent), any client will already have to deal with it, and mime decoding is for the most part really quite simple and fast anyway. Translating the mime format into some other storage format really doesn't make sense.

    single file databases

    mbox

    Mbox is a single file database. It's just that everyone who uses it generally writes their own access code. This is where problems with 'locking' come about, either because the underlying filesystem doesn't support it properly (e.g. some NFS implementations), or because everyone's clients don't use the same locking mechanism. This is really just an implementation issue anyway. There would be nothing to stop someone writing a common 'mbox.db' library that stored everything in completely compatible mbox files, which took all the work out of it, and then you'd have an mbox DBMS ...

    mbox scales OK: without any caching of header information it handles on the order of 2K messages in an interactive timescale, and quite a lot more if you don't mind some short delays (i.e. on the order of the time it takes Mozilla to start up).

    Appending and reading is quick, and reliable - assuming the filesystem works, which is a pretty safe assumption to make. This is assuming the mailbox is first summarised at first opening, otherwise looking up messages can be slow, because you have to scan the whole file first.
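
    "Summarising at first opening" can be as small as one pass that records where each message starts. The sketch below is an assumption about how such a summary might look, not Camel's actual code: it treats a line starting with "From " after a blank line as a separator, which would be wrong for mailboxes that rely on Content-Length instead.

```python
def build_offsets(f):
    """Scan a binary mbox file object once; return each message's byte offset."""
    f.seek(0)
    offsets, pos, prev_blank = [], 0, True  # file start counts as "after blank"
    for line in f:
        if prev_blank and line.startswith(b"From "):
            offsets.append(pos)
        prev_blank = line in (b"\n", b"\r\n")
        pos += len(line)
    return offsets

def read_message(f, offsets, n):
    """Fetch message n with a single seek instead of rescanning the file."""
    start = offsets[n]
    f.seek(start)
    if n + 1 < len(offsets):
        return f.read(offsets[n + 1] - start)
    return f.read()  # last message runs to end of file
```

    After the initial scan, lookups cost one seek and one read. Appends only add an offset at the end, which is why only expunging forces the index to be rebuilt.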

    The only operation that is slow is expunging messages, and in the worst case it isn't really any slower than copying a whole file across to another file.

    The only other issue is agreement on the 'standard' for what constitutes an mbox file. For example, Solaris uses and honours the 'Content-Length' header, and thus it does not translate any lines beginning with "From " into the conventional ">From ". Some mail clients translate "(>*)From " into ">\1From " (using sed syntax) and vice versa, others do not. There is no standard, just some conventions, some of which aren't easy to determine either.
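
    The reversible variant of that quoting rule (often called "mboxrd") is easy to state as a pair of regular expressions. The function names below are made up for this sketch; the substitution itself is the "(>*)From " to ">\1From " rule quoted above.

```python
import re

# Match "From " at the start of a line, with any number of leading ">".
_QUOTE = re.compile(r"^(>*From )", re.MULTILINE)
# Match the same thing with at least one leading ">" to strip.
_UNQUOTE = re.compile(r"^>(>*From )", re.MULTILINE)

def quote_from(body):
    """Applied when writing a message body into the mbox."""
    return _QUOTE.sub(r">\1", body)

def unquote_from(body):
    """Applied when reading a message body back out."""
    return _UNQUOTE.sub(r"\1", body)
```

    Because already-quoted lines get one more ">", the round trip is lossless. Clients that quote only bare "From " lines cannot tell afterwards whether a ">From " was in the original message, which is one reason mixing conventions on the same file causes trouble.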

    Because you need to keep the whole index in memory at once, this can become expensive, but you could use a secondary database as an index into the real file. But eventually you hit a point where the cost of expunging does get too expensive. You could just archive the mail regularly, or use a format like maildir instead.

    gdbm/db/etc

    db files wrap the single file in a common api that handles all of the locking issues and access issues for you. Some have different features, e.g. querying capability, logging and transactions, etc.

    We've never tried to use db for this purpose, more just because we didn't think it was worth it. All you really get with a minimal implementation is the ability to store and retrieve a blob of data using a single key. Writing is fairly slow because the database has to manage more details for you (locking, allocating blocks, unlocking, etc). You could use multiple db files as indices to perform multiple-key searches, but they are quite slow at creating them (we tried using db for the content indices and it was way too slow).

    i.e. even if you store the data in a db file, which gives you a slight benefit of inbuilt referential integrity, you still need to provide additional indices to actually be able to use it in any useful way. Evolution suffers this problem with the addressbook which stores vCards in db records.

    Most db libraries (all?) also don't provide any mechanism to stream data. You either get the whole lot into memory, or you get none of it. So for large messages you're limited by memory (well, Evolution is anyway, but it doesn't have to be). Yes, memory is cheap, but it is still a consideration, and it would certainly rule out a simple database in a multi-user environment.

    db files are also slower than native files, especially for large objects. You're mapping an arbitrarily sized chunk of data to some 'database blocks', which are then stored in an arbitrarily sized 'database file' which the operating system is then mapping to its 'filesystem blocks'.

    multifile solutions

    Well I guess this comes down to mh and maildir. mh isn't really suitable for anything, because of its just plain bad design and lack of defined semantics. There's no way to guarantee anything about its operation.

    maildir - I like. It moves the scourge of trying to implement a reliable, scalable, multiple-access database almost entirely into the operating system layer. Operating systems already do this very well - they manage hundreds of thousands of files randomly written across your disks, without skipping a beat.

    No operation requires more than a single message size of data, and the operating system already indexes the message, via its filename. Sure, ext2 doesn't do such a swell job with long directories, but that can be addressed (and the same problem can be addressed on just about any platform). For 'free' you get concurrent multiple-reader, multiple-writer database access, without any of the considerable problems you have to solve to implement it otherwise.

    The maildir 'protocol' is simple, reliable, and it works.
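
    For readers who haven't seen it, the delivery side of maildir comes down to "write into tmp/, then rename into new/". The sketch below follows that convention; the exact unique-name recipe and the function name `deliver` are assumptions of this example, not a complete implementation of the format.

```python
import itertools
import os
import socket
import time

_seq = itertools.count()  # distinguishes deliveries made by the same process

def deliver(maildir, raw):
    """Deliver one message (bytes) so that readers never see a partial file."""
    for sub in ("tmp", "new", "cur"):
        os.makedirs(os.path.join(maildir, sub), exist_ok=True)
    # "/" and ":" are not allowed in maildir names, so escape them in the host.
    host = socket.gethostname().replace("/", "\\057").replace(":", "\\072")
    name = "%d.P%dQ%d.%s" % (time.time(), os.getpid(), next(_seq), host)
    tmp_path = os.path.join(maildir, "tmp", name)
    with open(tmp_path, "xb") as f:  # "x": fail rather than overwrite
        f.write(raw)
        f.flush()
        os.fsync(f.fileno())  # make sure the data is on disk before publishing
    new_path = os.path.join(maildir, "new", name)
    os.rename(tmp_path, new_path)  # atomic within one filesystem
    return new_path
```

    No lock is taken anywhere: the rename is the commit, and any number of deliverers and readers can work on the same folder at once.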

    Again, it can easily be augmented by a client with additional indices, but delivery agents, which don't care about existing email, don't need to suffer that overhead at all.

    Some other comments specific to the question:

    Compression. Personally I don't see the point. But a maildir-like structure would fit well with compression. Flat files would be the worst (e.g. mbox), and block-file formats (like db files) would also work well with compression. The good thing about email is it is 'write once', you don't edit or change the messages in the mailbox.

    External attachments. I guess it's possible, but again, it isn't really worth it in most cases. Parsing MIME is *fast*. It is much faster than parsing XML, and besides, people rarely look at an email more than once or twice. There isn't much use going off and storing the attachment in a high-performance reading format if it isn't going to be accessed often, and it just places a greater burden on your server.

    base64, etc. Well, it's entirely possible simply to store the messages in 'binary' format. Assuming the boundary markers are checked properly, Camel can work with binary encoded mail messages, and probably at least some other mail clients can too. There are some problems with some of the extremely broken openpgp/pgp/mime specs which suddenly say that mail transports aren't allowed to alter the *transport* encodings of some parts, but well, these specs are just braindead, and can be worked around.

    Security model. Well, talking about Unix mail, not server mail, the filesystem is adequate.

    Shared folders - is not an issue for unix mail.

    Unicode. Well, you can write unicode filenames to most unix filesystems, even if 'ls' doesn't show them right.

    MTA. Nothing could be simpler or safer than maildir as a delivery format. The MTA doesn't have to care about any client-side indices; the MUA will simply update them when it incorporates the new messages, etc.

    Writing libmailstore? Mate, it's called Camel, and it's already written. Camel already does mbox, maildir and mh, it can read spool files directly (it doesn't create a summary file or build any indexes), and it can talk IMAP and POP, with partial support for NNTP. If someone gave me a decent RDBMS table schema and a carton of pale, I could probably write a MySQL backend in a couple of days, well, assuming the MySQL API is mt-safe.

    Finally, some comments on evolution.

    Evolution isn't reinventing any wheel. We use standard mbox format (if such a thing really exists anyway). We use standard maildir format, etc. Yes we may optionally create body indices, and we do usually create on-disk binary/compressed 'summaries' of the data, but these are really just on-disk caches of in-memory data structures, rather than anything to do with the mail storage format.

    We put mail in another location, but everyone else has done that too, elm:Mail, pine:mail (or is it the other way around?), netscape:ns_mail, etc. At least we now offer the option to read most of this 'in place'.

    The main problems Evolution has with scalability are:

    indexing.

    Indexing is quite costly. The original index code was written somewhat like a database: it handled all internal data structures, used blocks of data, etc. It was slow, and it scaled poorly. Definitely some of the algorithm choices and the implementation weren't that hot, but it shows that such a solution isn't as simple as it first seems. Using libdb was impossibly slow (like several orders of magnitude slower).

    The new stuff is a lot better, but can still use a lot of resources while indexing, and copies the whole file (well 2 files) across when performing expunges, but they are only performed occasionally, and the indices are smaller than the original indices, so in practice it scales much much better.

    the summaries

    The summaries are indices of a sort anyway. They are an in-memory tree of a subset of the information on each message. Enough information to display a list of messages, and perform vfoldering operations. Even though we do some tricks, like sharing common strings, the summary can get very large.

    But it's a tradeoff I thought was worth it, rather than using on-disk summaries. The APIs are much easier to use, and the problem gets pushed to the user - if they want to have folders with 100K messages, they should expect it to use a bit of memory. The on-disk size of the summaries is very small too, although I guess it could be made even smaller if we consolidated common strings.

    per-message memory use

    Currently, a lot of data gets copied around in memory. Every time you read a message, at least 1 whole copy of the (decoded) message is in memory at a given time (yes, including attachments). For IMAP this can get even worse (2-3 copies of a given attachment at a given time), because it doesn't stream enough. Most of this could use a disk backing without changing any APIs though, and well, I'm rewriting IMAP.

    Wrapping up ...

    And yeah, we're talking 100K messages here, not 1400. My 500MHz Celeron laptop has about 35K messages stored over about 10 mbox files, and it starts up in under 10 seconds, and that includes all of the bonobo/activation overhead (which is very significant). Yeah it uses a bit of memory, but memory is cheap on a personal workstation.

    In short. The current mailbox formats we have suffice for "Unix mail". Add some archiving abilities to your mail client (even RDBMS backed mail clients need archiving), and you'll never have to delete a message again, and still get work done and still use mbox.

    If you want to talk about writing a server - well, who cares, you can do whatever you want, because everyone has to go through your interface anyway (you DO NOT want clients accessing data under you, that's what DBMSs are all about in the first place ... and you don't want 1-tier applications), so it doesn't matter what format you use under the hood - you can choose the format which best suits what you're trying to do.

    It seems some people think 1-tier applications (client code talking directly to a database) are the way to go for multi-user environments. They're not: they don't scale and are impossible to maintain. Nobody writes any real software like that anymore, unless you're writing dodgy VB toy apps.

    --
    _ // `Thinking is an exercise to which all too few brains
    \\/ are accustomed' - First Lensman
  51. Use standards by dybdahl · · Score: 2, Interesting

    There is absolutely no reason to abandon the standard e-mail file format, including uuencode for file formats. Doing that, you would end up with a file format that depends on certain versions of the e-mail file format to work optimally. If you want to reduce harddisk space, zip it like OpenOffice.org does.

    E-mails are documents. Documents belong into the home directory, and so do e-mails. If you want to do something new, you should use the harddisk folders as e-mail storage, so that e-mails, spreadsheets and documents mix. This probably requires inventing a new ".e-mail" file format so that e-mails can be properly recognized and indexed.

    Storing one e-mail in one file is not a problem as long as you index the filenames properly, for which you can use gdbm.

    Dybdahl.

  52. Back up your critique with some numbers please! by Jack+Hughes · · Score: 2, Insightful
    It would be interesting to see some real measurements. For example, disk storage and access times for various functions of the different file formats (you could access different messages in a large "mailbox" randomly, or search subjects and bodies and see how long things took).

    I don't think things are that bad - for example, Cyrus with its indexes works pretty well with large (20,000+) folders. And things like searches are pretty fast with a client like Evolution that does a lot of caching.

    I would take the simple structure of Cyrus over the easy to break "database" files of Exchange server any day.

  53. mbox.funkified by yem · · Score: 2, Interesting

    This is all very interesting because I'm slowly writing an IMAP server at the moment..

    But here's the setup I'm currently using:

    Inbox:
    /var/mail/$USER
    Subfolders:
    /var/mail/$USER-folders/$FOLDER/.messages

    Eg:

    /var/mail/
    |-- root
    |-- fred
    `-- fred-folders
        |-- 1ZB
        |   `-- .messages
        |-- Friends
        |   `-- .messages
        |-- Games
        |   |-- .messages
        |   |-- Rune-Beta
        |   |   `-- .messages
        |   `-- Tribes
        |       `-- .messages
        `-- Mailing Lists
            |-- .messages
            |-- EFNZ chat
            |   `-- .messages
            `-- Hard News
                `-- .messages

    I started with uw-imap but I want to store messages and subfolders together. Plain uw-imap doesn't do this and last time I checked, neither does Maildir. So I did a [kludgy, incomplete] mod and produced the above. Works for me :)

    Get the patch: http://home.y3m.net/uw-imap-2001a-nested-folders.patch
    (diff against imap-2001a)

    In the server I'm working on you will be able to implement a relatively simple C++ API to do your own storage. So you can use Maildir, mbox, PostgreSQL, whatever. We'll see.

    flame away :P

    --
    No, I did not read the f***ing article!
  54. Some thoughts by ChrisJones · · Score: 3, Interesting

    There seem to be two discussions going on in the comments today, one about mail storage for an MUA and one for storing mail on servers.
    As far as the client end is concerned, from the point of view of writing an MUA, having an SQL backend is a complete godsend because you have to write virtually no IO code; you can put all the logic in the queries. However, there are some tricks you need to use to keep up the speed, most importantly using two tables, one for metadata and one for the mails themselves. This keeps the speed up by keeping the metadata table small (maybe on a better RDBMS than MySQL this wouldn't make a difference, but I found that >10,000 mails all in a single table in MySQL got quite slow until I moved the metadata into a separate table).
    The obvious downside of using a DB for client end storage is that you have to have a central DB server, or one on each client, and you need to admin one more set of authentication/permission details, plus you can't move the mail very easily to other MUAs. IMO a much better solution would be to keep the use of SQL/RDBMS, but move the DB into the filesystem so you can just have a bunch of files with metadata stored in the fs. Need to make an mbox? "cat ~/mail/* >>/tmp/my_new_mbox".
    From the server point of view, many people have been mentioning Exchange/Domino etc. Personally I can't stand Exchange; I've had to admin it on several occasions and it's generally done everything it can to stop me from having an easy life (just thought I'd air my prejudice against Exchange in the spirit of fairness and honesty ;). I've never used OpenMail/Domino/Notes/whatever, but I guess they do roughly the same thing, which is a pretty good idea. However, these things all have the distinct disadvantage that they use proprietary protocols and aren't particularly cheap. There's always IMAP, which many people really like, but which I feel is too complex a protocol (compare with the infant levels of complexity in POP3).
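
    The two-table trick described above for SQL-backed client storage might look like this. SQLite stands in for MySQL purely to keep the sketch self-contained, and every table, column and function name here is hypothetical.

```python
import sqlite3

def open_store(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS meta (
            id INTEGER PRIMARY KEY, folder TEXT,
            sender TEXT, subject TEXT, date TEXT);
        CREATE TABLE IF NOT EXISTS body (
            id INTEGER PRIMARY KEY REFERENCES meta(id), raw BLOB);
    """)
    return conn

def add(conn, folder, sender, subject, date, raw):
    cur = conn.execute(
        "INSERT INTO meta (folder, sender, subject, date) VALUES (?, ?, ?, ?)",
        (folder, sender, subject, date))
    conn.execute("INSERT INTO body (id, raw) VALUES (?, ?)", (cur.lastrowid, raw))
    return cur.lastrowid

def list_folder(conn, folder):
    # Only the small table is touched; message bodies are never read here.
    return conn.execute(
        "SELECT id, sender, subject FROM meta WHERE folder = ? ORDER BY date",
        (folder,)).fetchall()

def fetch(conn, msg_id):
    row = conn.execute("SELECT raw FROM body WHERE id = ?", (msg_id,)).fetchone()
    return row[0] if row else None
```

    Drawing a folder listing then scans only short rows, and the large blobs are read one at a time when a message is actually opened.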

    With a colleague of mine, I'm working on a set of POP3 extensions that give some IMAP-like features, but are really designed to keep multiple mail clients in sync with each other by way of a transaction log. There are still some limitations, but I think I know what they are and how to fix them (e.g. not enough metadata can be associated with each mail yet). It adds about 6 or 7 commands to POP3 and currently lacks any decent client support, but I have written a fairly usable library and patch to gnu-pop3d for it. I've just submitted it as my University final year project, so I'll try and get the protocol description documentation online soon. In the meantime, if you're interested, it's on SourceForge.

    --
    Chris "Ng" Jones
    cmsj@tenshu.net
    www.tenshu.net
  55. Didn't we solve this with NNTP? by speedenator · · Score: 3, Insightful

    So NNTP solved this, IMHO, in a rather elegant way...

    You have directories corresponding to newsgroups or mail folders or whatnot. i.e. alt.swedish.chef.bork.bork.bork is really alt/swedish/chef/bork/bork/bork

    Articles are numeric, i.e. \d+ for Perl types. The raw message is stored in each file.

    In each directory, there's a file called .overview, which is just the summary information for all the files.

    Thus, you can have zillions of small files, and happily grep and copy them to your heart's content. But you never do a 'ls' on a huge directory, you always just look through the .overview file. Or grep through it, if you like.
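
    As a sketch of the idea, the following Python builds such a summary file for one directory of numbered article files. The tab-separated layout and the particular fields are a simplification chosen for this example; real news servers keep more columns, and the name `write_overview` is made up.

```python
import os
from email.parser import BytesHeaderParser

FIELDS = ("Subject", "From", "Date", "Message-ID")  # assumed field selection

def write_overview(directory):
    """Summarise every numeric article file into <directory>/.overview."""
    parser = BytesHeaderParser()
    # Sort numerically so article 10 comes after article 2.
    numbers = sorted(int(n) for n in os.listdir(directory) if n.isdigit())
    lines = []
    for n in numbers:
        with open(os.path.join(directory, str(n)), "rb") as f:
            headers = parser.parse(f)
        values = []
        for field in FIELDS:
            value = str(headers.get(field, ""))
            # Tabs and newlines would break the one-line-per-article layout.
            values.append(" ".join(value.split()))
        lines.append("\t".join([str(n)] + values))
    with open(os.path.join(directory, ".overview"), "w") as out:
        out.write("".join(line + "\n" for line in lines))
    return len(lines)
```

    A reader then opens one small file to draw the message list and touches an individual article only when someone actually reads it.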

    So, in that sense, it's very much the best of both worlds. And, on the same box, you can specify rules on who can access the folders, so one file can be read by multiple people. Ooh.

    GNUS, an Emacs based mail/news reader, uses a variant of this called nnml, which rocks.

    Of course, when you get down to it, JWZ arguments aside, databases start to really look like what you want, especially on a corporate level when you're tossing the same piece of mail around to tons of different folks.

    -e

  56. Re:Database storage by sxpert · · Score: 2

    better yet, make the files downloadable via a secure web server.

  57. oh come on isn't .NSF the greatest? by ellem · · Score: 3, Funny

    Are you trying to tell me that a 5MB empty mailbox is asking too much? A text message that says "Hi!" costing 1.2MB is somehow wasteful?

    Lotus Notes Uber Alles!

    --
    This .sig is fake but accurate.
  58. A alternate proposal by Anonymous Coward · · Score: 2, Interesting

    What bugs me the most with current mail technology is the problems with distributed mail handling.

    I access my mail on all kinds of devices, sometimes online sometimes not.

    My main problem is not so much which mail server / retrieval / presentation to use, since they all share the same inability to give me a working distributed solution.

    For online usage IMAP is sufficient, but if I go offline with my laptop or iPAQ, I'm lost.

    POP isn't very efficient either: since only one of my clients can be the deleter, I must make sure that I have synced all my other devices before the deleter removes the message.

    Since I use tons of folders for my mail, and some of my stored mails date back to the late '80s, I'm basically forced to use IMAP so my folders are in sync on all the devices, but again that only works online.

    Further, it only works if my IMAP server is online. That can be trouble if I'm in some far-off part of the world and for some reason or other I have no contact with my mail server.

    What I would like is a concept I call SyncMail

    A distributed db-system. First I set up some 3-4 primaries, spread out on the net with completely different access routes. Each of them gets an MX record.

    The sending MTA is happy to deliver to a secondary mail server if the primary is offline.

    But here comes the magic!

    The system regarded as a secondary MX by the rest of the world is in fact a primary!

    Instead of queuing the message, it sucks it into its db, tags it with its own internal server id, and tries to sync it to all other SyncMail primaries.

    Sooner or later the new mail is propagated to all the primaries.

    On the client side, the SyncMail app contacts all the primaries, checks against a private index, and syncs all new mails, trying the closest server first.

    Since all mails are tagged with the primaries they've been delivered to, no mail is retrieved to the client more than once.
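
    The client-side half of this can be sketched in a few lines of Python. Everything here is hypothetical: primaries are modelled as plain dicts keyed by a globally unique message id, which is a simplification of the per-server tagging described above, and unreachable primaries are simply left out of the list.

```python
def sync(primaries, seen, local):
    """Pull every message not yet seen from whichever primaries are reachable.

    primaries: list of {message_id: raw_bytes} dicts, nearest server first
    seen:      set of message ids already fetched (the client's private index)
    local:     dict acting as the client's local mail tree
    Returns the number of messages fetched in this run.
    """
    fetched = 0
    for server in primaries:
        for mid, raw in server.items():
            if mid in seen:
                continue  # already retrieved, possibly from another primary
            local[mid] = raw
            seen.add(mid)
            fetched += 1
    return fetched
```

    Because the index is keyed by message id rather than by server, it does not matter which primary a message is first seen on; it is still fetched exactly once.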

    Now I have a complete local mail-tree in my client, regardless of which primary I was able to contact. Sure, if a mail was delivered to a primary that goes offline before the client syncs, and it hasn't been able to sync it to the other primaries, I won't get it until that primary comes back online. But what the heck: with POP/IMAP, if my mail server is offline I'm completely out of business, so the loss is definitely smaller in this case.

    And for my iPAQ I just configure the client to work with a few important folders, and to skip attachments, to save storage.

    And for sending, all clients store the mail in an outbox, which is then synced to the primaries; once it gets to a primary it is sent by normal SMTP.
    This way I solve the problem of being able to send mail with proper originating SMTP headers. Of course the outbox is synced as well, so I get a ref copy of my mail on all systems.

    I have started on a SyncMail application and someday I might be able to complete it, but there is so much work all the time :(

    Would anybody else be interested in this concept, maybe we could complete it together.

    Or if this is a really stupid idea, I'd be glad if someone would point it out, so that I can focus on finding a better solution.

    1. Re:A alternate proposal by ahde · · Score: 2

      as others have pointed out, a large part of what you want is accomplished with NNTP. For those who spend a lot of time on mailing lists, etc. this seems like an ideal solution, but really that's because what they are doing is actually using email like a newsgroup. I think you're right though about needing to be able to have the ease of retrieval that comes with IMAP and the disconnect of POP. I'm not sure of a better way to do this than NNTP (maybe that is the solution), but I think having the synched servers is a bit wasteful.

  59. Re:Don't speculate. Profile. by mgedmin · · Score: 3, Interesting
    An interesting comparison, but it's a comparison of Courier-IMAP vs UW IMAP, and not just Maildir vs mbox.

    I once tried benchmarking Maildir vs mbox for my mail archives (mailboxes with ~3000 messages). On ext2 Maildir was a loss:

    • Mutt took twice as long to open a Maildir than mbox from cold cache.
    • Mutt still took a bit longer to open Maildir than mbox from hot cache.
    • On ext2 with 4K blocks mbox ate 13 MB of space, Maildir ate 21 MB.
    • Small UI degradation: Mutt wouldn't show the number of lines in a message from a Maildir, and it wouldn't show percent progress indicator while reading the Maildir.
    Basically for my situation (read-only mail archives with large numbers of messages, which are rarely in filesystem cache, ext2 and constant disk space shortage) mbox was better. But my situation (personal, mostly static mail archives) is remarkably different from running an IMAP server.
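
    The space difference is mostly block rounding: every Maildir message occupies a whole number of filesystem blocks, while an mbox rounds up only once. A small sketch of the arithmetic follows; it ignores inodes, directory entries and the mbox separator lines, and the function names are invented for this example.

```python
def on_disk(sizes, block=4096):
    """Bytes occupied when each file size is rounded up to whole blocks."""
    return sum(-(-s // block) * block for s in sizes)  # ceiling division

def compare(sizes, block=4096):
    """Return (mbox_bytes, maildir_bytes) for the same set of message sizes."""
    mbox = on_disk([sum(sizes)], block)  # one file holding everything
    maildir = on_disk(sizes, block)      # one file per message
    return mbox, maildir
```

    As a rough cross-check under the usual assumption that about half a block is wasted per file, ~3000 messages at 4K blocks would waste about 3000 x 2 KB, or roughly 6 MB. That is in the same neighbourhood as the 13 MB vs 21 MB measured above.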

    I did this test in 2000. I should probably try again some day with Reiserfs, but I heard various people telling me it doesn't improve Maildir performance. Can't say anything until I try myself.

    I therefore recommend that you try it yourself and see if Maildirs really help in your situation.

  60. the plan 9 approach by rpeppe · · Score: 5, Interesting
    as a basis for an approach i like what plan 9 does. the mail is made available to clients as a filesystem (provided by a user level program). each mail message gets its own directory; each mime attachment gets its own subdirectory within that message (and recursively, as MIME is recursive).

    here's a little transcript:

    % cd /mail/fs/mbox
    % lc
    Directories:
    1 113 128 142 157 171 186 20 214 229 243 258 272 287 300 315 33 344 359 373 388 401 416 430 445 46 474 56 70 85
    [...]
    % cd 318
    % lc
    Files:
    bcc date filename info messageid rawbody sender type
    body digest from inreplyto mimeheader rawheader subject unixheader
    cc disposition header lines raw replyto to

    Directories:
    1 2 3
    % head raw
    Return-Path:
    Received: from punt-1.mail.demon.net by mailstore for rog@vitanuova.com
    id 1021665470:10:17045:138; Fri, 17 May 2002 19:57:50 GMT
    Received: from psuvax1.cse.psu.edu ([130.203.4.6]) by punt-1.mail.demon.net
    id aa1016828; 17 May 2002 19:57 GMT
    Received: from psuvax1.cse.psu.edu (psuvax1.cse.psu.edu [130.203.6.6])
    by mail.cse.psu.edu (CSE Mail Server) with ESMTP
    id 27DA4199BE; Fri, 17 May 2002 15:57:13 -0400 (EDT)
    Delivered-To: 9fans@cse.psu.edu
    Received: from acl.lanl.gov (plan9.acl.lanl.gov [128.165.147.177])
    % head body
    This is a multi-part message in MIME format.
    --upas-mbyuptynpdsmbjuyeermihdgur
    Content-Disposition: inline
    Content-Type: text/plain; charset="US-ASCII"
    Content-Transfer-Encoding: 7bit

    Hi,

    If you seek excitement and thrills you need to look no further than
    Plan9 -- it gives you everything and then some, but in a good way (or
    % cd 2
    % lc
    Files:
    bcc date filename info messageid rawbody sender type
    body digest from inreplyto mimeheader rawheader subject unixheader
    cc disposition header lines raw replyto to
    % cat mimeheader
    Content-Type: image/jpeg
    Content-Disposition: attachment; filename=iostats.jpg
    Content-Transfer-Encoding: base64
    % page body
    reading through graphics...
    %
    "raw" contains the raw data that makes up the message. "body" contains the data after the transfer encodings have been decoded (hence in that case /mail/fs/mbox/318/2/body is a jpeg file, viewable directly by any usual jpeg viewer).

    the beauty of this scheme is that it hides the underlying storage scheme from the mail clients. if i wish to change things so that the underlying storage format is many files [currently it uses a traditional mbox format], none of the mail client programs have to change.
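
    To make the layout concrete, here is a rough sketch in Python that writes a similar tree to an ordinary directory: one directory per message, one numbered subdirectory per MIME part, recursively. This is not how Plan 9 does it (there the tree is served by a user-level file server, not written to disk), and it only produces a handful of the files shown in the transcript. The function names `explode` and `build_sample` are invented for the example.

```python
import tempfile
from email import message_from_bytes, policy
from email.message import EmailMessage
from pathlib import Path

# Subset of the per-message files from the transcript, mapped to header names.
HEADER_FILES = {"from": "From", "to": "To", "subject": "Subject", "date": "Date"}


def explode(part, directory):
    """Write one message or MIME part into `directory`, recursing into subparts."""
    directory.mkdir(parents=True, exist_ok=True)
    (directory / "raw").write_bytes(part.as_bytes())
    (directory / "type").write_text(part.get_content_type(), encoding="utf-8")
    for filename, header in HEADER_FILES.items():
        if part[header] is not None:
            (directory / filename).write_text(str(part[header]), encoding="utf-8")
    if part.is_multipart():
        # Subparts are numbered from 1, as in the transcript.
        for number, child in enumerate(part.iter_parts(), start=1):
            explode(child, directory / str(number))
    else:
        # decode=True undoes base64 / quoted-printable, so "body" is the real file.
        (directory / "body").write_bytes(part.get_payload(decode=True) or b"")


def build_sample():
    """Return a parsed two-part message: some text plus a fake JPEG attachment."""
    msg = EmailMessage()
    msg["From"] = "someone@example.com"
    msg["Subject"] = "stats"
    msg.set_content("Hi,\n")
    msg.add_attachment(b"\xff\xd8fake-jpeg-bytes", maintype="image",
                       subtype="jpeg", filename="iostats.jpg")
    return message_from_bytes(msg.as_bytes(), policy=policy.default)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        explode(build_sample(), Path(tmp) / "318")
        for path in sorted(Path(tmp).rglob("*")):
            print(path.relative_to(tmp))
```

    Once the tree exists, plain grep or diff over it works the way described, which is the whole attraction.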

    plus i can use grep, diff, shell scripts, etc directly on the messages in my mailbox. procmail eat your heart out.

    1. Re:the plan 9 approach by glv · · Score: 4, Insightful
      You alluded to this, but I know slashdot, and it's worth being explicit about it to avoid all the flames:

      This is not how mail is actually stored on disk in Plan 9. The "real" mail storage is just mbox files. What rpeppe has described is the view that the mail storage system provides to clients.

      I agree it's very sweet, but the question is primarily dealing with the actual storage format.

      --
      ---glv
  61. I also have hard data that ReiserFS is NOT Ready by FreeUser · · Score: 2

    ... in the form of 8 different machines, all of which were running reiserfs on various GNU/Linux distros ranging from Suse to Mandrake to Debian, all of which suffered data corruption, data loss, and even the mysterious vanishing of entire directory trees (while disk usage exploded). In short, all had unrecoverably corrupted filesystems, not as a result of unscheduled shutdowns (which journalling is supposed to help protect against anyway), but on machines that were operating normally, without interruption. None of these filesystems survived more than 9 months of normal, everyday activity (without improper shutdowns, I will stress once again).

    These machines were located at three disparate sites, had different base configs, and in two cases were installed and maintained by different people.

    The only things they had in common were that they used Reiser, they lost data (severely), and had to be reconstructed from backups (this time without using Reiserfs).

    You may believe that you can trust ReiserFS, but I know for an absolute fact that I cannot, and I think it is very possible you will discover that at some point as well. Of course, having relegated everyone else's experience to mere anecdote, it is clear you won't learn this until it hits you in the face, personally. That's OK, not everyone is willing to learn from the experience of others.

    However, to those who are interested in learning from the experience of others I will say this: tread very, very carefully with ReiserFS. It is not ready for prime time, and should not be used in any production system. If you really need journalling, use XFS. It is very stable and quite difficult to damage (so far it has survived every stress test I've been able to throw at it).

    Now, go ahead and relegate this to anecdote if it makes you feel better ... I have hard data to back up my claims, and, quite frankly, a filesystem is sufficiently important that "your mileage may vary" should be an unacceptable answer. By all accounts, if those who haven't (yet) suffered data loss with ReiserFS are to be believed, with ReiserFS YM indeed V.

    --
    The Future of Human Evolution: Autonomy
  62. Take a look at Mercury by Havokmon · · Score: 2
    Mercury Mail, from David Harris, the author of Pegasus Mail, I believe does what you're looking for.

    I think it's the best of both worlds. Your 'INBOX' is like MailDir, where each 'new' message is a separate text file. Once you've 'Filed' that message, however, it's compressed into a single file along with the rest of the emails for that folder.

    Personally, I think you're looking at the WRONG aspects of mail servers. You're getting way too technical. Nobody gives a shit about wasted inodes. When's the last time you defragmented ANY disk?

    The reason I use Mercury is because of its exceptional Netware NDS integration. Combine that with Pegasus Mail's NDS integration, and you have 'Roaming' users without all the profile garbage (Pegasus will use NDS calls to see 'who' you are, and read your email from your home directory). Oh, and it's free.

    Too bad it hasn't been ported to Linux.. along with the PAM stuff needed to keep up the kick-ass user integration :)

    --
    "I can't give you a brain, so I'll give you a diploma" - The Great Oz (blatantly stolen sig)
  63. I've thought about this, and.. by Fweeky · · Score: 2

    I came up with mboxdir. It was actually a preliminary specification for a Win32 client.

  64. Learn from non-Unix models, too by mwood · · Score: 2, Insightful

    VMSmail's storage format is instructive. Each message is represented by a single record in an indexed file. A short message body is simply tucked into the record along with the headers and other metadata. Long bodies (more than around 2kb IIRC) are stored as individual files and their header records point to the files by name.

    Of course you all realized at once that the main file can get out of sync with the directory which holds the external bodies. It does, sometimes, and fixing it up can be a pain. Any storage method which partitions a single message among multiple files is going to have similar problems. But it works pretty well, and it shouldn't be too hard to write a tool to groom the message store in case of inconsistency. It's worth study.

    It was a natural choice on VMS, which has really good multi-indexed file support in the base package. It works well with text messages, which often do fall within the size limit for avoiding external storage of the message body. Today it suffers the same problem that mbox does -- people use email differently now.
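
    A small sketch of the same idea in Python, for anyone who wants to play with it. Everything here is an assumption made for illustration: SQLite stands in for the VMS indexed file, the 2048-byte threshold is just the "around 2kb" figure above, and the `Store` class and its method names are invented. The `check` method is the kind of grooming tool mentioned above: it reports external bodies the index points at but which are missing, and body files nothing points at.

```python
import sqlite3
import tempfile
from pathlib import Path

INLINE_LIMIT = 2048  # assumed threshold, taken from the "around 2kb" recollection


class Store:
    """Index records in SQLite; short bodies inline, long bodies in separate files."""

    def __init__(self, root):
        self.root = Path(root)
        self.bodies = self.root / "bodies"
        self.bodies.mkdir(parents=True, exist_ok=True)
        self.db = sqlite3.connect(str(self.root / "index.db"))
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS mail ("
            "id INTEGER PRIMARY KEY, subject TEXT, inline_body BLOB, body_file TEXT)"
        )

    def add(self, subject, body):
        """Store a message body (bytes) and return its record id."""
        msg_id = self.db.execute(
            "INSERT INTO mail (subject) VALUES (?)", (subject,)
        ).lastrowid
        if len(body) <= INLINE_LIMIT:
            self.db.execute(
                "UPDATE mail SET inline_body = ? WHERE id = ?", (body, msg_id)
            )
        else:
            name = f"{msg_id}.body"
            (self.bodies / name).write_bytes(body)
            self.db.execute(
                "UPDATE mail SET body_file = ? WHERE id = ?", (name, msg_id)
            )
        self.db.commit()
        return msg_id

    def body(self, msg_id):
        """Fetch a body from the record itself or from its external file."""
        inline, name = self.db.execute(
            "SELECT inline_body, body_file FROM mail WHERE id = ?", (msg_id,)
        ).fetchone()
        return inline if name is None else (self.bodies / name).read_bytes()

    def check(self):
        """Return (missing, orphaned) lists of external body file names."""
        referenced = {
            row[0]
            for row in self.db.execute(
                "SELECT body_file FROM mail WHERE body_file IS NOT NULL"
            )
        }
        on_disk = {p.name for p in self.bodies.iterdir()}
        return sorted(referenced - on_disk), sorted(on_disk - referenced)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        store = Store(tmp)
        store.add("short note", b"see you at noon")
        store.add("big report", b"x" * 10000)
        print(store.check())
        store.db.close()
```

    Note that the index row and the body file are written in two separate steps, so a crash between them leaves exactly the kind of inconsistency described above.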

  65. I have to differ by b0bby · · Score: 2, Informative

    I have never used Exchange, but a friend of mine admins a large (50,000+ users) Exchange system. Even a few years ago, running on NT4, their servers did NOT go down, ever. They scheduled a reboot for patches etc every 6 months, that's it. I have had lots of Netware boxes up for over a year, but not Netware 5 running mail. I inherited such a box & it needed to be rebooted every month or two. Now I've replaced it with a Linux based mail server & I'm much happier. Still have a 4.11 box cranking along happily, even happier since the 5 box is no longer giving annoying messages about its licences. And my 2000 Server has been up for coming up on a year with no problems.

  66. Maildir for 1000+ messages? by Jobe_br · · Score: 2, Insightful

    I have in excess of 46K email messages in my account alone, not to mention everyone else's accounts on my company's mail server. We use cyrus IMAP and qmail, both of which use the Maildir format mailboxes ... every client I've used (Mozilla, Communicator, Outlook/OL Express, Mail.app on OS X, Eudora, and Papi-Mail on PalmOS) seems to have absolutely no problem with this setup. Most MUAs are intelligent enough not to download all your headers every time you connect, so unless you're getting 1000+ new emails every time you open a particular folder, you're generally not going to need to read all those headers every time.

    The server that runs this is a measly 600MHz PIII w/ 128MB RAM running RedHat 6.2 w/ a 20GB hard drive. I haven't gotten even close to running out of inodes, to my knowledge, and my server never goes down (really, the only times it's gone down were when power was cut to it, and this has only happened twice in the past 1.8 yrs ... long live Rackspace).

    Maildir is specifically designed to handle mailboxes with large numbers of emails in them, contrary to other formats such as mbox. The problem with any sort of DB approach is the waste of space, even if you compress. A basic course in file structures will teach you a wealth of knowledge in this regard.

    Imagine this: you have a table that stores everything you need to know about an email. You have a few distinct fields for commonly accessed headers (subject, from, to, cc, etc.) each of which would need to be 'text' blobs, since you cannot limit their size (you've seen the emails that have to/cc fields that are miles long, right?) - well, 'text' fields are notoriously poorly optimized in database engines and quite difficult to search (you can create an index on a part of a text field, but that might not be enough, right?). Next you have the message body which would also need to be a text field since you don't limit its length, either.

    Now, since the space for these fields (which don't *ever* change) is not optimized in the slightest, you might think that compressing them is a good idea, right? Well, what if an email is deleted - then you start looking at fragmented space in your database table which would need to be compacted periodically (much as mbox/.mbx files do today, if I recall).

    All in all, storing each message to its own file is not really *that* bad ... optimize the file subsystem beneath it, maybe allow for compression/encryption or that sort of thing, but otherwise, the folks that put together Maildir have certainly done a decent job!

    1. Re:Maildir for 1000+ messages? by Matthew+Weigel · · Score: 2
      All in all, storing each message to its own file is not really *that* bad

      Correct. The only complaint of Cyrus's format seems to be that it uses too many inodes. Well, in any situation where you're at risk of running out of inodes for mail, you're going to a) keep Cyrus's playground on separate disks, and b) take advantage of Cyrus's partitioning ability to spread it out over several different filesystems on multiple disks...

      This can be a problem for Maildir, since in the general setup Maildirs are spread out all over the place, making it hard to consolidate to 'mail only' partitions.

      --
      --Matthew
  67. No. by Doktor+Memory · · Score: 2

    You are going to run out of inodes at exactly the same time you run out of disk space, because they are one and the same thing.

    No.

    Running out of inodes is not the same thing as running out of space. Some of the symptoms of the two are the same ("can't create new files"), but they are completely different failure modes.

    Consult your local man pages for further details.

    --

    News for Nerds. Stuff that Matters? Like hell.

  68. Re:hmmm by AdTropis · · Score: 2, Interesting

    when i read your post, i immediately thought of a Jamie Zawinski article that i read a few weeks ago:

    http://www.jwz.org/doc/mailsum.html

    he talks about this very thing. quite interesting if you ask me.

  69. Unix philosophy vs. Borg philosophy by bee · · Score: 2

    There are two excellent reasons that so many people use Exchange.

    1) In general, it works out of the box. A company with someone with meager knowledge can set up a fairly complex mail handling system without much help.


    And that same person with meager knowledge is going to get hacked six ways from Sunday when the next Exchange exploit comes around, because what's not included in that meager knowledge is that you have to keep up on security patches if you want your easy-to-install mail server to not be an easy-to-hack mail server.

    2) It does A LOT. In its most basic configuration it does what you need 10 or more programs in Linux to do, not to mention that most of those 10 don't exist.

    And God help you if one (or many) of those pieces of Exchange is broken or doesn't do what you want it to do. Can't change it, it's part of Exchange! At least if one of those 10 Linux programs is broken or doesn't work right, you can replace it with something better without affecting all the other parts.

    These are simple philosophic differences between Unix and Borg. Borg stuff usually has a shallow learning curve at the beginning, but then it ramps up as you discover things that are difficult or impossible to do. Whereas, the initial Unix learning curve may be steep, but it flattens out further in.

    --
    At least mafia-owned pizzerias make excellent pizza. Compare to Bill Gates.
    1. Re:Unix philosophy vs. Borg philosophy by bee · · Score: 2

      If you think that viruses are the only way to exploit security holes, then you're the perfect example of that person with meager knowledge from 1).

      --
      At least mafia-owned pizzerias make excellent pizza. Compare to Bill Gates.
  70. Re:Stays up for *days* before losing mail and rebo by Sabalon · · Score: 2

    Well, how many MCSE's on paper equal one person who has actually done the stuff?

    We have an Exchange server - one person manages it along with tons of other stuff. It pretty much runs itself. We just moved to Ex2k, but were on Ex5.5 for quite a while - I can think of only one time it crashed and that most likely had to do with a 3rd party virus scanner integrated onto the server. Removed that and no more problems.

  71. I think a hybrid solution is called for. by mellon · · Score: 3, Insightful

    I don't think it makes sense to store email in dbm files. It's too sketchy - what happens when the dbm file gets corrupted? The nice thing about flat files is that if something goes wrong, you can fix it with vi.

    I think the right solution to the problem is to key off the message ID, which is supposed to be unique. Then define a mail folder as simply a list of message IDs. Messages can appear in more than one folder, but hopefully not in no folders.

    To make this efficient, I'd hash the message ID, and use a hierarchy of directories, because Unix doesn't do well with large flat directories. The hierarchy could auto-extend, so that as one subdirectory fills up, you do a sub-hash and split it into more directories.

    The problem of tiny files is a real one. The solution is probably to make the bottom of a hash a file rather than a directory, and store more than one message in each such file. You don't have to store a lot of messages in these files to win - even ten messages would produce a big win, and would be pretty efficient.
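
    As a rough illustration of the hashing part, here is a sketch in Python. The choice of SHA-1, the two-hex-digit directory names and the function name `bucket_path` are all assumptions made for the example, not part of the proposal. Because each level is just the next slice of the same digest, raising `depth` by one splits a full bucket into sub-buckets without moving anything that lives in other branches.

```python
import hashlib
from pathlib import PurePosixPath


def bucket_path(message_id, depth=2, width=2):
    """Map a Message-ID to a relative path whose last component is a bucket file.

    Each path component is the next `width` hex digits of the ID's SHA-1 digest,
    so the path at depth N is always a prefix of the path at depth N + 1.
    """
    digest = hashlib.sha1(message_id.encode("utf-8")).hexdigest()
    levels = [digest[i * width:(i + 1) * width] for i in range(depth)]
    return PurePosixPath(*levels)


# A folder is nothing more than a list of message IDs; the same ID may appear
# in several folders and still resolves to a single stored copy.
folders = {
    "inbox": ["<1021665470.17045@example.com>", "<27DA4199BE@example.com>"],
    "plan9": ["<27DA4199BE@example.com>"],
}

if __name__ == "__main__":
    for folder, ids in folders.items():
        for message_id in ids:
            print(folder, message_id, bucket_path(message_id))
```

    With two hex digits per level, each directory holds at most 256 entries, which keeps every directory small even on filesystems that handle large flat directories badly.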

    The format of the individual files should probably be indexed sequential access - that is, a TOC at the front, and then the contents as plain text, nothing fancy. The TOC should be in ASCII, not binary, and you should be able to rebuild the TOC by looking at the file.

    Babyl used to use a control character as a delimiter, which worked pretty nicely - much better than using "^From ". Ever seen >From in an email message? That's because Unix mail uses "^From " as an inter-message delimiter, so it has to quote it, and it does so stupidly. So use ^_ as a delimiter, and if ^_ appears in the email message, just double it. Take a doubled ^_ out when reading a message.
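
    The doubling scheme can be sketched in a few lines of Python. One detail is an assumption added here: the delimiter is written as ^_ followed by a newline. With a bare ^_ as the delimiter, a message that ends in ^_ followed by one that starts with ^_ would produce a run that cannot be split unambiguously; with the newline, a lone ^_ is always a record end and a doubled one is always data. The names `pack` and `unpack` are made up.

```python
DELIM = b"\x1f"               # ^_ (ASCII unit separator)
RECORD_END = DELIM + b"\n"    # assumption: a newline follows each delimiter


def pack(messages):
    """Join messages into one blob, doubling any ^_ inside a message."""
    return b"".join(m.replace(DELIM, DELIM + DELIM) + RECORD_END for m in messages)


def unpack(blob):
    """Split a blob written by pack() back into the original messages."""
    messages = []
    current = bytearray()
    i = 0
    while i < len(blob):
        if blob[i:i + 1] == DELIM:
            if blob[i + 1:i + 2] == DELIM:
                # Doubled ^_ is one literal ^_ belonging to the message.
                current += DELIM
            else:
                # A lone ^_ ends the record; the byte after it is the newline.
                messages.append(bytes(current))
                current = bytearray()
            i += 2
        else:
            current.append(blob[i])
            i += 1
    # Bytes after the last delimiter (a truncated record) are ignored here.
    return messages


if __name__ == "__main__":
    blob = pack([b"From the desk of...\n", b"odd \x1f byte\n"])
    print(unpack(blob))
```

    Note that a body line starting with "From " passes through untouched, so no >From mangling is needed.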

    As for compression, I don't think it's worth doing at first. Disk space is cheap. Yes, my email folder is pretty huge, but it's really not a major problem. Making the storage system extra-complicated by uncompressing MIME is something to add on after you've got something more basic that works - you don't have to solve every problem all at once.

    As for folder scan performance, you can make a cache, and have the mail program scan the cache from time to time when it's idle to clean up errors. This is much better than trying to come up with a format that's optimized toward folders - if you try to optimize toward folders, you wind up creating all kinds of problems, IMHO.

  72. Maildir and 1000+ Mails by PCGod · · Score: 2, Interesting
    and just try to open a Maildir with 1000+ mails and see how long it takes your favorite Mailprogram to only display the subjects.

    Until about 3 days ago, I had 1700+ messages in my Maildir, and pine (patched to support Maildir) opened my inbox in about two seconds. Compare this with my sent-mail folder, which had about the same number of messages in it. This folder is stored in mbox format and it took 5+ seconds to open AND CLOSE this folder. I believe that Maildir is the fastest option, short of keeping a separate database.

    1. Re:Maildir and 1000+ Mails by vidarh · · Score: 2
      Maildir is great as long as you're using a filesystem that can handle it well, such as reiserfs... mbox format can have some advantages on filesystems that handle lots of medium to small files badly (such as ext2fs), but I still think the risk of manipulating a single mailbox file from multiple applications is too big to be worth it.

      I designed the mail system Nameplanet.com ran on (about 1.5 million mail accounts), and we used qmail with Maildir, but wrote our own highly optimized POP3 server with some extensions (for our web frontend) and caching of size and header data etc., to reduce the number of stat() calls, and with the few enhancements we did, Maildir was extremely fast (and robust).

  73. Re:I also have hard data that ReiserFS is NOT Read by FreeUser · · Score: 2

    Searches of google and google groups turn up no one else that shares your experiences of "unrecoverably corrupted filesystems" with reiserfs.

    ahem. You really didn't look very hard, did you?

    filesystem corruption (2.4.18, reiserfs)

    Bug#122230: reiserfsprogs: filesystem corruption with reiserfs

    Re: ReiserFS / 2.4.6 / Data Corruption

    ReiserFS desaster - advice please !

    and about 829 other matches. Need I go on?

    Oh, BTW, as I noted, two of those systems didn't belong to me, they belonged to people I know who experienced similar difficulties (and documented them as well).

    Enough people, of enough diverse walks of life, are having issues like this with Reiserfs that it is clearly not something that is safe to be deploying in a production environment. Even if only 1% of the people using it are being so bitten, that number is way too high (and based on my own experiences and those of several people I know, I suspect that number is a lot higher than 1 per cent).

    --
    The Future of Human Evolution: Autonomy
  74. Don't agree with your definition by Ashurbanipal · · Score: 2
    Tons of interesting information and viewpoint. Thank you.
    I would define Unix mail as mail (rfc822 format) downloaded and stored locally on a per-user basis. IMAP, Exchange, and other remote protocols are very different beasts.
    I would define that as "home user" Email. Very specifically not corporate or academic strength.

    I don't think "unix mail" is all that useful a handle, but if I was going to use it I'd be referring to mail that stayed on unix hosts - usually in mbox format - as opposed to mail downloaded to user PCs with unknown operating systems.

    Corporations and other profit-making legal entities can't dedicate specific PCs to single users cost-effectively in most situations, and they certainly can't effectively manage storage and back-up email stores if the Email messages are scattered over many failure-prone end-user hard drives. IMAPv4 and whatever the proprietary boyz are shopping this week purposely keep the email on the server, so that evidence can be extracted (or destroyed, if you work for Enron) from server backups, and so that filtering and surveying of mail data is easily possible.

    For example, some corporations sweep their drives for return & delivery receipts over a month old and delete them.

    Another example, corporations doing highly sensitive government contracts will sweep their email stores for classified information leaks.

    Another example, I need to get my Email regardless of whether I'm on my laptop at a remote site, at my desk in town, or at home tunneled through SSH. Downloading it to one of these boxes makes it inaccessible to the others.

    The list goes on, but basically downloading email to a local drive is primarily for AOL users and basement hackers. That being the case, your points about maildir are excellent - let the filesystem handle most of the details. I'd add that if you must run a db for speed reasons (such as a subject line db used by an IMAP server) do it so that it can be deleted and/or recreated on the fly from the contents of the maildir. No need to create additional dependencies.
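
    A minimal sketch of such a throwaway index, using Python's standard `mailbox` module. The cache location (a `.subjects.json` file next to the Maildir) and all function names are invented for the example. The point is only that the cache holds nothing that cannot be recomputed: delete it, or let it go stale, and the next call rebuilds it from the Maildir itself.

```python
import json
import mailbox
import tempfile
from email.message import EmailMessage
from pathlib import Path


def cache_file_for(maildir_path):
    """Hypothetical cache location: a sibling file next to the Maildir."""
    return Path(str(maildir_path) + ".subjects.json")


def scan_subjects(maildir_path):
    """The slow path: open every message and read its Subject header."""
    box = mailbox.Maildir(str(maildir_path), create=False)
    try:
        return {key: str(msg["Subject"] or "") for key, msg in box.items()}
    finally:
        box.close()


def load_subjects(maildir_path):
    """Return {key: subject}, rebuilding the cache if it is missing or stale."""
    box = mailbox.Maildir(str(maildir_path), create=False)
    keys = set(box.keys())  # only lists new/ and cur/, no message parsing
    box.close()
    cache_file = cache_file_for(maildir_path)
    try:
        cached = json.loads(cache_file.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        cached = None  # missing or corrupt cache: just rebuild it
    if cached is None or set(cached) != keys:
        cached = scan_subjects(maildir_path)
        cache_file.write_text(json.dumps(cached), encoding="utf-8")
    return cached


def add_message(maildir_path, subject):
    """Helper for the demo: drop one small message into the Maildir."""
    msg = EmailMessage()
    msg["Subject"] = subject
    msg.set_content("hello\n")
    box = mailbox.Maildir(str(maildir_path), create=True)
    box.add(msg)
    box.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "Maildir"
        add_message(path, "first")
        add_message(path, "second")
        print(sorted(load_subjects(path).values()))
```

    A real server would update the cache incrementally instead of rescanning everything, but the rebuild-from-scratch path is what keeps the cache from becoming a dependency.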

  75. Who modded this guy up? by RelliK · · Score: 2
    You are going to run out of inodes at exactly the same time you run out of disk space, because they are one and the same thing.

    No they are not. The parent post is correct.

    In fact, I believe all the inodes are created when you create your filesystem, all space is mapped to an inode (though of course one file can use multiple inodes).

    What you believe has nothing to do with reality. I suggest you take an OS course. Or read up on how Unix filesystems work.

    It's usually said that if you have 4k inodes, you'll lose 2k (on average) per file.

    There is no such thing as a "4k inode". You got your terminology wrong. You are thinking about blocks. On average, you waste 1/2 the block size for each file on your filesystem, since the last block is, on average, half-full. An inode is not the same as a block! They are two completely different things, which is why your entire post makes no sense. Think of an inode as a "file header". I don't have time or energy to post the full description but I already mentioned where you can get relevant information.
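
    To put numbers on the difference, here is a small Python sketch. It computes the space a file occupies when sizes are rounded up to whole blocks (where the "half a block wasted per file" average comes from) and, separately, asks the filesystem how many free blocks and free inodes it has. The function names are made up, the arithmetic ignores inode tables, indirect blocks and tail packing, and `os.statvfs` is only available on Unix-like systems.

```python
import math
import os


def allocated_bytes(file_size, block_size=4096):
    """Space a file's data occupies when rounded up to whole blocks."""
    return math.ceil(file_size / block_size) * block_size


def slack(file_sizes, block_size=4096):
    """Total bytes lost in the partly filled last block of each file."""
    return sum(allocated_bytes(size, block_size) - size for size in file_sizes)


def free_resources(path="/"):
    """Free space and free inodes are reported as two independent numbers."""
    st = os.statvfs(path)
    return {"free_bytes": st.f_bavail * st.f_frsize, "free_inodes": st.f_favail}


if __name__ == "__main__":
    sizes = [1000, 5000, 300, 4096]  # arbitrary example message sizes in bytes
    print("slack with 4K blocks:", slack(sizes))
    print(free_resources("/"))
```

    Either number returned by `free_resources` can reach zero while the other is still large, which is why running out of inodes and running out of space are different failures.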

    --
    ___
    If you think big enough, you'll never have to do it.
  76. I agree by RelliK · · Score: 2

    The poster of the article just assumes that filesystem must be slow when working with 1000+ files per directory and we need a database to save us. That's nonsense, from my experience.

    Apart from that, there are some very important reasons why maildir is much better than a DB. With maildir you can use standard Unix tools to manipulate your email. With a DB you can't do that. Mailbox corruption is not a problem with maildir -- even if corruption were to happen it would be limited to one message, or a small number of messages (not even a mailbox). With a corrupted DB storage, you lose everything -- all the mail of all the users in all the mailboxes. Ask an Exchange admin about it some time.

    --
    ___
    If you think big enough, you'll never have to do it.
  77. Re: I was thinking of viewsets by os2fan · · Score: 2
    I have a shared mail account with a fixed name at work, so I know what you are talking about.

    If you are looking at a file system as a hierarchical structure, why can't you have more than one such table?

    The idea being that some mail clients would be only in the "person" tree, and that others would only be in a "function" tree. One could then be given access to both the person and function trees, and shunt mail between them for others to see.

    The other thing that we should do is do things that encourage the use of these things. Make the tools for doing this easier to use and understand, and make the concepts easier to grasp.

    --
    OS/2 - because choice is a terrible thing to waste.
  78. Re:Cyrus does *not* use the maildir format by Matthew+Weigel · · Score: 2
    I don't think it's fair to say that cyrus doesn't use the Maildir format ... it certainly does

    Eh? Your 'counter' to the factual claim that it doesn't is... an unsupported claim that it does?

    It is not Maildir format. Maildir specifies the delivery method as well as the file format; by your logic, Maildir is nothing but mh. But it's not just "single file per message," and neither is Cyrus; and they're not mh in different ways. Cyrus does not use the new/tmp/cur subdirectory setup, Maildir does not use CRLF to represent newlines.

    Cyrus mailboxes are not designed to permit multiple processes, unaware of each other, to access mail without failure - that was the primary design consideration of Maildir. Cyrus side-stepped that problem, and was therefore able to improve performance more (do an 'ls' in a Maildir with a thousand messages - that's what a client has to do to read that folder).
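
    For readers who haven't looked at it, the Maildir delivery method mentioned above boils down to "write the whole message under tmp/, then move it into new/". A simplified Python sketch follows. The function names are invented, the unique-name format is only one plausible variant of the time.pid.host convention, and `os.rename` is used for the final step here as a simplification.

```python
import os
import socket
import tempfile
import time
from pathlib import Path

_delivery_count = 0


def unique_name():
    """Build a name from time, process id, a per-process counter and hostname."""
    global _delivery_count
    _delivery_count += 1
    host = socket.gethostname().replace("/", "_").replace(":", "_")
    return f"{int(time.time())}.P{os.getpid()}Q{_delivery_count}.{host}"


def deliver(maildir, data):
    """Write `data` (bytes) to tmp/, then move it into new/ and return its path."""
    maildir = Path(maildir)
    for sub in ("tmp", "new", "cur"):
        (maildir / sub).mkdir(parents=True, exist_ok=True)
    name = unique_name()
    tmp_path = maildir / "tmp" / name
    # Mode "x" fails instead of overwriting if the name somehow already exists.
    with open(tmp_path, "xb") as handle:
        handle.write(data)
        handle.flush()
        os.fsync(handle.fileno())
    new_path = maildir / "new" / name
    # Readers only look in new/ and cur/, so they never see a half-written file.
    os.rename(tmp_path, new_path)
    return new_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = deliver(Path(tmp) / "Maildir", b"Subject: hi\n\nhello\n")
        print(path.relative_to(tmp))
```

    No lock is taken at any point; the safety comes entirely from unique names plus the move from tmp/ to new/.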

    In short, there are superficial similarities of design, but they are different.

    --
    --Matthew
  79. Re:Database-Type Storage, Hybrid by ahde · · Score: 2

    common sense tells you to store binary files in the file system. Include a URI or path or whatever in the DB. Believe it or not, direct file access is faster (on most OSes) than a database. You gain nothing by including a blob in the DB. It's not searchable, and it slows other searches down. The only drawback is that you couldn't do this if you were trying to code your solution completely in SQL or for some other reason are not able to access the filesystem directly.

  80. Re:Client issue by vidarh · · Score: 2

    Perhaps because at some point you as a user are likely to switch mail client, and may have mail you want to migrate. But anyway, this discussion doesn't exactly seem to be a discussion of what's nice for end users, does it? It's a useful discussion for anyone designing or deploying mail systems of various types, including MTAs and MUAs.

  81. Re:Don't speculate. Profile. by ahde · · Score: 2

    I wonder...

    would a tarred maildir decrease the number of disk reads (renames would be trickier, but possible) and inodes, or would the tar overhead be greater than that of the filesystem?

  82. confusion resolved by Doktor+Memory · · Score: 2

    I see the problem here. You are attempting to use Evolution when the mail client you were actually wanting to install is called "mutt".

    If you don't like GNOME and GTK+, for the love of pete don't use a mailer that says in big flaming letters "I am a GNOME program!".

    --

    News for Nerds. Stuff that Matters? Like hell.