Slashdot Mirror


Improving Unix Mail Storage?

At first, there was mbox, then there was Maildir, and Bill begat Outlook and .mbx. CaraCalla wonders if there is a better way to store mail than the way we currently store it today. I admit, with the changes that email has undergone over the past 5 years (changes in what is being sent, not necessarily in how it is sent), it may be time to reinvent the mail format. Read on for CaraCalla's analysis of the current mail options, and his thoughts on where we may go in the future. If you were to design your own MUA, how would you design its mail storage? CaraCalla asks: "Does anybody know a good, free solution for storing mail on unix hosts? The reason that I ask this question is my discontent with available techniques:
  • mbox: There are problems with locking, corruption, access-times, and bloat.
  • Maildir: Do you really want to clutter your system with millions of small files? That's a waste of inodes and space (unless perhaps you use Linux/ReiserFS or SGI), and just try to open a Maildir with 1000+ mails and see how long it takes your favorite mail program just to display the subjects.
  • Cyrus: Basically the same as Maildir with database features.
  • UW-Imap mbx: That's classical mbox with extensions allowing multiple access.
  • Evolution: Basically mbox with database features.
  • Windows clients: Typically some proprietary db-format. Pathetic.

But the thing that bugs me most is disk space. Typical inboxes are made of 5% to 10% of Text including Headers and HTML. The rest are BASE64- (or UU-) encoded pictures, word documents, zip archives and so on. The problem here is the encoding which wastes considerable amounts of space (at least one third).
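The "one third" figure follows from how Base64 works: every 3 input bytes become 4 ASCII characters, and MIME adds a line break after every 76 output characters. A minimal sketch in Python; the 3000-byte payload is only an arbitrary stand-in for an attachment, and `base64_overhead` is a name made up for this illustration:

```python
import base64

def base64_overhead(raw: bytes) -> float:
    """Return the fractional size increase when raw is Base64-encoded MIME-style."""
    # encodebytes() emits lines of at most 76 characters, each ending in a newline
    encoded = base64.encodebytes(raw)
    return (len(encoded) - len(raw)) / len(raw)

payload = bytes(3000)  # arbitrary stand-in for a binary attachment
print(f"Base64 overhead: {base64_overhead(payload):.1%}")
```

The result lands slightly above one third because of the added line breaks; a store that keeps attachments decoded would win that space back before any compression is applied.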

Some ideas about the ideal mail-storage:

  • One file per Mailbox-folder, allowing multiple folders per user. Should those files reside in one central location or in users' home directories?
  • Compression: Should messages be broken into pieces and the MIME-attachments stored separately (thus searching of the text parts would still be possible without decompressing the whole file)?
  • File format: gdbm, Sleepycat db? Something new?
  • Should the security model allow users to directly access their files, grep them, copy them around?
  • Shared folders, virtual domains?
  • Unicode support in folder names? Imap message-IDs, flags, useragent specific state-information?
  • How would MTAs deliver mail? How would clients access? File-locking (NFS)?
  • What about backwards-compatibility? Writing libmailstore (anyone)? adopting UW c-client?
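One way the compression idea in the list above could look, as a rough sketch only: text parts are kept in decoded, searchable form, while every other MIME part is decoded from its transfer encoding and compressed separately. It uses Python's standard `email` and `zlib` modules; the function name `split_message` and the `part-N` placeholder names are assumptions made for this example, not part of any existing mail store.

```python
import zlib
from email import message_from_bytes, policy
from email.message import EmailMessage

def split_message(raw: bytes):
    """Split a MIME message into searchable text and compressed binary parts.

    Returns (text_parts, blobs): text_parts is a list of decoded text bodies,
    blobs maps a filename (or a generated placeholder) to the zlib-compressed,
    transfer-decoded payload.
    """
    msg = message_from_bytes(raw, policy=policy.default)
    text_parts, blobs = [], {}
    for index, part in enumerate(msg.walk()):
        if part.is_multipart():
            continue  # containers carry no payload of their own
        if part.get_content_maintype() == "text":
            text_parts.append(part.get_content())
        else:
            payload = part.get_payload(decode=True)  # undo Base64 / quoted-printable
            name = part.get_filename() or f"part-{index}"
            blobs[name] = zlib.compress(payload)
    return text_parts, blobs

# Build a small test message with one text body and one binary attachment.
demo = EmailMessage()
demo["Subject"] = "report"
demo.set_content("hello")
demo.add_attachment(bytes(3000), maintype="application",
                    subtype="octet-stream", filename="blob.bin")

texts, blobs = split_message(demo.as_bytes())
print(len(texts), {name: len(data) for name, data in blobs.items()})
```

Searching then only touches `text_parts`; the attachments stay compressed until a client actually asks for them. A real store would also have to record enough MIME structure to rebuild the original message byte for byte, which this sketch does not attempt.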

Does my ideal mailstorage exist somewhere? Is somebody working on a project addressing this? Does anybody have some other hints? And please no mbox/Maildir flamewar!"

554 comments

  1. One folder to rule them all... by Pig+Hogger · · Score: 5, Interesting
    Stuff the mail in one folder, one to rule them all.

    But put multiple indexes (by sender, subject, date, whatever key-classes you want to assign messages) and the possibility to restrict the range displayed. With careful programming, you can manage many users who won't be able to read each other's mail, except as required.

    This way, you can arrange your mail as you please.

    No more message duplication. Send a memo to 250 people? Just send it once, but tag it as readable by the 250 sendees.

    Of course, this calls for an SQL database... :) :) :)
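A minimal sketch of that single-copy idea, assuming SQLite via Python's standard `sqlite3` module. The table names, column names, and helper functions here are all hypothetical, chosen only to illustrate "store once, tag as readable by many":

```python
import sqlite3

def open_store(path=":memory:"):
    """Create a hypothetical schema: one row per message, one row per reader."""
    db = sqlite3.connect(path)
    db.executescript("""
        CREATE TABLE message (
            id      INTEGER PRIMARY KEY,
            sender  TEXT NOT NULL,
            subject TEXT NOT NULL,
            sent_at TEXT NOT NULL,   -- ISO 8601 timestamp, sorts correctly as text
            body    BLOB NOT NULL
        );
        CREATE TABLE readable_by (
            message_id INTEGER NOT NULL REFERENCES message(id),
            username   TEXT NOT NULL,
            PRIMARY KEY (message_id, username)
        );
        CREATE INDEX message_by_sender  ON message(sender);
        CREATE INDEX message_by_subject ON message(subject);
        CREATE INDEX message_by_date    ON message(sent_at);
        CREATE INDEX readable_by_user   ON readable_by(username);
    """)
    return db

def deliver(db, sender, subject, sent_at, body, usernames):
    """Store the body once and tag it as readable by every recipient."""
    cur = db.execute(
        "INSERT INTO message (sender, subject, sent_at, body) VALUES (?, ?, ?, ?)",
        (sender, subject, sent_at, body),
    )
    db.executemany(
        "INSERT INTO readable_by (message_id, username) VALUES (?, ?)",
        [(cur.lastrowid, name) for name in usernames],
    )
    db.commit()
    return cur.lastrowid

def inbox(db, username):
    """List only the messages this user has been tagged to read."""
    return db.execute(
        "SELECT m.id, m.sender, m.subject FROM message m "
        "JOIN readable_by r ON r.message_id = m.id "
        "WHERE r.username = ? ORDER BY m.sent_at",
        (username,),
    ).fetchall()

db = open_store()
staff = [f"user{n}" for n in range(250)]
deliver(db, "boss", "Memo", "2002-05-01T09:00:00", b"Please read.", staff)
print(db.execute("SELECT COUNT(*) FROM message").fetchone()[0], inbox(db, "user42"))
```

The memo to 250 people ends up as one `message` row plus 250 small `readable_by` rows, and each user's view is just a join restricted to their own name.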

    1. Re:One folder to rule them all... by TheBracket · · Score: 4, Insightful
      You do realise that you just described MS Exchange (albeit in a chronically simplified form), right? :-)

      Exchange is actually a pretty decent mail server, although only using it for mail is pretty dumb - its groupware features are the killer app. It exposes both benefits (in particular, single storage of messages with multiple recipients) and flaws (if your db goes boom, it affects all your users - or at least all your users in a given mail partition) of database-based mail storage.

      I remember seeing a project to combine mail storage with PostgreSQL a while ago. Anyone know what happened to it?

      --
      Lead developer, http://wisptools.net
    2. Re:One folder to rule them all... by plierhead · · Score: 1

      Of course almost a decade ago Oracle's (now defunct?) oramail product did this, i.e. 250 people who got a single message were actually linked to a single, shared copy of that message (easy enough to do using the good old Oracle DB, natch :). It actually worked pretty well but never really caught on. Perhaps this approach would catch on now though, what with corporate morons emailing massive Word docs, PowerPoints and mp3 files to each other, not to mention virus-laden executables.

      --

      [x] auto-moderate all posts by this user as insightful

    3. Re:One folder to rule them all... by southpolesammy · · Score: 1

      Add to this idea the encryption of the maildb via pgp or gpg, and access to the folders within the maildb granted on an ACL basis. Saves space, preserves rights to access/modify/delete, ensures integrity of the message, and allows for easy and fast redundancy. All the pillars of security and reliability are covered in a single paradigm shift.

      --
      Rule #1 -- Politics always trumps technology.
    4. Re:One folder to rule them all... by alen · · Score: 1, Troll

      That's MS Exchange alright. It's been doing all this and more for almost a decade. And managed properly it's very stable. It's one of the good products that MS makes.

    5. Re:One folder to rule them all... by Ignominious+Cow+Herd · · Score: 0

      I think the problem there is that anyone outside your mail server's influence still sends the mail to 250 people, even if they are all in your mail server's domain. To really support that kind of system you need a smarter SMTP protocol such that sendmail or whatever can say "here is a message, send it to these 250 people, since I can see they all belong to you". AFAIK sendmail/SMTP can't do that today.

      --
      Lump lingered last in line for brains, and the ones she got were sorta rotten and insane.
    6. Re:One folder to rule them all... by DrZaius · · Score: 1

      You're probably looking for pronto.. and it can use a few different database back ends. It works well, just don't use the CSV driver :)

      --
      -- DrZaius - Minister of Sciences and Protector of the Faith
    7. Re:One folder to rule them all... by intermodal · · Score: 0, Flamebait

      the biggest problem with Exchange is that it runs on M$....anything that isn't run on a native TCP/IP OS (read: *nix) isn't suitable if you ask me. It's not blind M$ hate...it's a reliability and customizability factor.

      --
      In SOVIET RUSSIA... erm...NSA AMERICA, the Internet logs onto YOU!
    8. Re:One folder to rule them all... by mikefoley · · Score: 3, Interesting

      Do they still not have single mailbox restore? Do you still need to build a separate Exchange server just to restore a single mailbox or message?

      Exchange made my life miserable for many years in the 93-95 timeframe. It might be better now.

      The concepts weren't bad (db for mail, etc) but the execution was terrible. I was field testing Exchange (for Alpha NT) when I was at DEC and asked the Exchange manager point blank about single mailbox restore and he said "Why?" My answer "When my boss wants that email he really needs yesterday, you're telling me I have to build a totally new system and restore 8GB (at the time) of data just to restore a single mail message????"

      "Uh, yea?"

      No thanks...

      --
      What's my Karma Mr. Burns? "Excellent"
    9. Re:One folder to rule them all... by alen · · Score: 2

      In Exchange 5.5 it depends on your backup program. We have it with Veritas NetBackup. In Exchange 2000 it's out of the box. And with Veritas or CommVault software you can do single message restore. Or there is deleted item retention, out of the box. Even if the user empties the trash folder, the message will still be there for the number of days you specify in the server-side admin program. You just recover the message in the Outlook client.

    10. Re:One folder to rule them all... by adamjaskie · · Score: 1

      Problem is, it's overpriced and bloated. If you spent a little time you could do the same thing on a dual Pentium Pro 200 system for about $300 using Linux. We just are trying to figure out a way to take out the "spend a little time" part. Sorry, but I don't think that a couple hours of time is worth $5000*(users/10) [or whatever MS charges for Exchange server now]

      --
      /usr/games/fortune
    11. Re:One folder to rule them all... by ffattizzi · · Score: 4, Informative

      It's called a brick level backup, and most Exchange admins don't use them. The better setup is to set a reasonable deleted item retention policy. I set mine for 60 days. If I need any email deleted in the last 60 days, I can get it without any restore, mailbox or otherwise. Works great.

    12. Re:One folder to rule them all... by cuyler · · Score: 1

      Actually, if you use Veritas Netbackup along with an Exchange agent you can back up an exchange database at the brick level. This allows you to restore a single mailbox or even parts of the mailbox (appointments for example).

      In many companies, however, they prefer to have a separate Exchange server and just do regular backups (fulls and differentials) and then restore the full server. Then they'd get the system administrator to restore the mailbox individually. This can be considered a third check. It's not hard to restore the wrong thing - doing that to a production machine isn't good.

    13. Re:One folder to rule them all... by MrCreosote · · Score: 1
      --
      MrCreosote Meow!Thump!Meow!Thump!Meow!Thump! "You're right! There isn't enough room to swing a cat in here!"
    14. Re:One folder to rule them all... by Malcontent · · Score: 3, Informative

      James from the apache group can use an SQL datastore.

      --

      War is necrophilia.

    15. Re:One folder to rule them all... by Ageless · · Score: 3, Insightful

      If you think that you can replicate what Exchange does in "a couple hours of time" you have not been at it long enough.
      There are two excellent reasons that so many people use Exchange.

      1) In general, it works out of the box. A company with someone with meager knowledge can set up a fairly complex mail handling system without much help.

      2) It does A LOT. In its most basic configuration it does what you need 10 or more programs in Linux to do, not to mention that most of those 10 don't exist.

      Rage against the machine all you want, but when your boss says you will have shared contacts and calendars and your clients will run Windows; find me a solution that comes within miles of the ease of Outlook and Exchange and I'll give you a cookie.
      Actually, I'll probably give you several thousand dollars.

    16. Re:One folder to rule them all... by Tayknight · · Score: 1

      Domino does all this. It's been doing it longer and much better. Domino is _the_ answer for collaboration. Where do I pick up my several thousand dollars?

      --
      Pair up in threes. - Yogi Berra
    17. Re:One folder to rule them all... by Anonymous Coward · · Score: 0

      Domino is a pizza, not a frigging mail server

      :: Rim Shot::

      Seriously though, Domino is not the solution. It is overpriced, waaaaaaay bloated and a right pain in the arse to admin. I have admined both Domino and Exchange and all I can say is Gimme Exchange anyday.

    18. Re:One folder to rule them all... by whirred · · Score: 0, Troll

      Are you on crack? Calling Exchange's "groupware features" anything but an utter joke is absurd. They're still trying to catch up to what Lotus has been doing for years, and they aren't doing a very good job of it.

      If you just want to run email, Exchange/Outlook is fine. If you want a collaborative groupware solution with work flow built in, Domino/Notes is the only answer, currently.

      Plus, Domino runs on Linux, AIX, Solaris, NT, 2000, OS/2, AS/400... The list goes on and on. As far as a shared database, just set up shared mail.

      Not to mention, unlike Exchange, when one mail database gets hosed your whole server doesn't get scrapped. And you aren't supporting Microsoft.

    19. Re:One folder to rule them all... by Anonymous Coward · · Score: 0
      It's not blind M$ hate

      Actually it is. Blind, because you're an ignorant moron. WTF is a "native TCP/IP OS"? You mean you have to run IP in the kernel to build a decent mail *reader* or *database* on a given platform?

      Here's a hint, clueless: TCP/IP has NOTHING to do with mail storage, and in fact is a very small part of a complete mail system.

    20. Re:One folder to rule them all... by tupps · · Score: 2, Informative

      Samsung Contact:

      http://www.samsungcontact.com

      Which is based on HP OpenMail. About 1/6 the cost.

      --
      Go out and get sailing!
    21. Re:One folder to rule them all... by Bert64 · · Score: 1

      But a server being set up by someone with meager knowledge will only end up becoming a Code Red drone.
      Administering of servers by people with "meager knowledge", as you put it, should really be stamped out. If there weren't so many people who believe the "so easy to use.. anyone can use it!" bullshit and have someone with little or no knowledge run a server, then Code Red wouldn't have been such an issue. I still get 30+ hits a day, and there will always be a new worm/virus in the works.
      Even the most lazy or overworked admin with half a clue would have managed to install a patch within 6 months of its release.

      --
      http://spamdecoy.net - free throwaway anonymous email - avoid spam!
    22. Re:One folder to rule them all... by jamesbromberger · · Score: 1

      And like Novell GroupWise, IIRC.

    23. Re:One folder to rule them all... by jcoy42 · · Score: 2, Interesting
      Exchange is actually a pretty decent mail server

      The problem is, mail is a critical app, and in some ways exchange *really* misses the point.

      I think the most frustrating situation I've ever seen as a system administrator was when we were doing a scheduled reboot of the exchange server. After about 20 minutes waiting for it to resync itself & shutdown, my boss, one of those "smart enough to be dangerous types", decided it was a typical NT hang & to go ahead and hit the reset button (he did it in a single motion without a word, there was nothing anyone could do).

      It took almost 2 weeks to get things straightened out. We had backups, but it turned out there was a 2048 meg bug with NT restores that had been re-introduced by a recent server upgrade, and we had problems getting the patch rolled into the new code (Legato tech support - need I say more?).

      350 people *screaming* for 2 weeks. I was very glad I was not the mail administrator.. but very sad to be sitting next to him.
      --
      Never trust an atom. They make up everything.
    24. Re:One folder to rule them all... by H310iSe · · Score: 4, Interesting

      ditto. Either Exchange is impossible to administer well or just very, very hard. Until recently you couldn't restore a single user's mailbox (there was brick backup, but eventually even MS admitted it didn't work); you had to restore the entire server to get back one corrupted data store.

      At one firm I was at exchange went down and it took 3 days of 24 hour work to get it back up (I guess we were 'lucky') - the solution? a $50,000 backup server that does absolutely nothing but wait for the main exchange server to go down. First time we had to use it, we were down for a day, it didn't come up.

      I'm currently looking for a mail server, any server, that does mail well for 50 - 500 users (I'd settle for 50-100). I've played w/ xMail, it's tough to config. Heard good things about WorldMail (qualcomm?) but not used it. Heard free BSD's qmail (?) is good as well. I'm very interested in anyone who has info about free or cheap mail servers that can be configured in a day or two of work. If that exists.

      --
      closed minded is as closed minded does
    25. Re:One folder to rule them all... by TheCarp · · Score: 1

      And if you actually call the warmed over shit that Dominos makes "pizza" then you have some absolutely amazingly low standards. Then again, you actually like Exchange so I guess we shouldn't be surprised.

      -Steve
      (who just can't pass up the easy shot sometimes)

      --
      "I opened my eyes, and everything went dark again"
    26. Re:One folder to rule them all... by BlueUnderwear · · Score: 3, Funny
      If I need any email deleted in the last 60 days, I can get it...

      ... and so can the FBI, the SEC, and the Attorney General. Using Exchange should not be an excuse to also repeat Bill's other mistakes ;-)

      --
      Say no to software patents.
    27. Re:One folder to rule them all... by ffattizzi · · Score: 1

      My company is not a monopoly, so I think we're safe. ;)

    28. Re:One folder to rule them all... by Anonymous Coward · · Score: 0

      Have you ever thought of Lotus Notes?
      It is one of the best mailing-clients I've ever seen. Every user gets his own mail-database.
      It contains not only mail, but a calendar with a meeting planner (automatic sending of invitations and so on). You have a very secure system for access rights, and a full-text index is also included.
      And with its replication-concept you are even able to keep it on different systems (transfer it to your local machine for speed or to your notebook) without any difference to normal usage.
      Of course, Notes has its problems, too, but as mailing system it is the best you can have today (imho, of course).
      regards

      Keep open minded, but do not let your brain fall out of your head!

    29. Re:One folder to rule them all... by Anonymous Coward · · Score: 0

      yes, but notes/domino is also a piece of shit. there *has* to be a better way. this stuff sucks *inside IBM*.

      [of course this is anonymous]

    30. Re:One folder to rule them all... by qeL3-i · · Score: 2, Informative

      Postfix is good, free, and open source. It's also easy to configure. You should be able to get it going in about half an hour.

      As for Microsoft and Exchange Server, aren't they convicted criminals? I don't want to use software made by criminals. If they are willing to break the anti-trust laws, what other laws might they be willing to break? I don't trust them with my email.

    31. Re:One folder to rule them all... by Harik · · Score: 1
      Mmm.
      MAIL FROM:<me@me.com>
      RCPT TO:<you@you.com>
      RCPT TO:<boss@you.com>
      RCPT TO:<joe@you.com>
      DATA
      .....
      Hint: That's the way SMTP works. If you send a message to 250 recipients, it's sent to your mailserver as 250 RCPT TO lines and 1 body. If they're all delivered locally, it gets delivered locally. If 220 of them go to another server, it gets 220 RCPT TO lines, and one copy of the message.

      Think first, then post.

      --Dan

      (Yes, my protocol may be a bit off. I don't send mail manually as often as I used to)
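The batching described above can be sketched without touching the network: group the recipients by domain, then emit one envelope per destination with a single `DATA` phase. This is only an illustration of the protocol shape; `smtp_transactions` is a hypothetical helper, it leaves out the greeting (`HELO`/`EHLO`), the message body and `QUIT`, and how a particular MTA actually batches recipients is up to that MTA.

```python
from collections import defaultdict

def smtp_transactions(sender, recipients):
    """Group recipients by domain and return one SMTP command list per domain."""
    by_domain = defaultdict(list)
    for address in recipients:
        domain = address.rpartition("@")[2].lower()
        by_domain[domain].append(address)

    plans = {}
    for domain, addresses in by_domain.items():
        commands = [f"MAIL FROM:<{sender}>"]
        commands += [f"RCPT TO:<{address}>" for address in addresses]
        commands.append("DATA")  # the message body follows once, not once per recipient
        plans[domain] = commands
    return plans

plans = smtp_transactions(
    "me@me.com",
    ["you@you.com", "boss@you.com", "joe@you.com", "al@other.example"],
)
for domain, commands in plans.items():
    print(domain)
    for line in commands:
        print("   ", line)
```

Three recipients at the same domain share one envelope and one copy of the message on the wire; the fourth, at a different domain, gets its own transaction.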

    32. Re:One folder to rule them all... by Anonymous Coward · · Score: 0

      I have built a system on top of PostgreSQL and Python ( http://sourceforge.net/projects/pyimap ). It _is_ working, it is fully debugged and it works perfectly with Outlook, Netscape, Evolution. I'm not sure if SQL is the right thing though. We have about 50 000 emails in the system. People _often_ use folders with 3000+ messages. There are about 6 users on the Pentium Celeron machine with 128MB RAM. I would say the thing has the following features:
      + fast reading of large mailboxes
      + I have added some indexing on subjects, thus fast searching
      + the actual mail text is stored outside of the database - simple backups
      - it is written in Python - message parsing is slow. Someone should write message parsing in C as a module
      - IMO although you have indexed database, some operations are not as efficient as they can be if you selected proprietary database format
      - Large memory footprint - every client process must cache data as the SQL operations are quite expensive...

      I do not think SQL is ideal for mail though. Especially for PostgreSQL, I missed subtransactions, and table locking is BAD in SQL operations. IMO something like maildir with special features that would allow pointers (not duplicating mail msgs) would be better. Or just some kind of own database format, because there are specific needs, e.g. for IMAP access.
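One existing mechanism that behaves like such "pointers" is the filesystem hard link. The sketch below is an assumption about how that could be applied to Maildir-style folders, not a description of any existing mail server: the message is written once and then linked into each recipient's `new/` directory. The helper names and the example file name are made up, and all the Maildirs must sit on the same filesystem for `os.link` to work.

```python
import os
import tempfile

def make_maildir(path):
    """Create the three standard Maildir subdirectories."""
    for sub in ("tmp", "new", "cur"):
        os.makedirs(os.path.join(path, sub), exist_ok=True)

def deliver_shared(message, maildirs, unique_name):
    """Write message once, then hard-link it into every other Maildir.

    Hard links cannot cross filesystem boundaries, so every Maildir
    passed in is assumed to live on the same filesystem.
    """
    first, *others = maildirs
    tmp_path = os.path.join(first, "tmp", unique_name)
    new_path = os.path.join(first, "new", unique_name)
    with open(tmp_path, "wb") as handle:
        handle.write(message)
        handle.flush()
        os.fsync(handle.fileno())
    os.rename(tmp_path, new_path)  # the usual Maildir tmp -> new step
    for maildir in others:
        os.link(new_path, os.path.join(maildir, "new", unique_name))
    return new_path

with tempfile.TemporaryDirectory() as root:
    boxes = [os.path.join(root, user) for user in ("alice", "bob", "carol")]
    for box in boxes:
        make_maildir(box)
    path = deliver_shared(b"Subject: memo\n\nhello\n", boxes, "1020000000.demo.host")
    print(os.stat(path).st_nlink)  # directory entries sharing the same data blocks
```

Each user can still move or delete their own entry independently; the data blocks are freed only when the last link goes away. It does not help with the inode-count complaint about Maildir, since every message still needs one inode, just not one per recipient.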

    33. Re:One folder to rule them all... by moderators_are_w*nke · · Score: 0

      Ever used a Windows box? TCP/IP has very little to do with mail performance. Exchange is slow because it's big and bloaty, not because it's running on Windows. An interesting experiment would be to run a Unix mail system on a Windows/Linux dual-boot box and compare performance in each OS. Probably the UNIX system would win, due to thread / process optimisations (Windows has fast threads / slow processes, Linux has slow threads (pthreads, anyway) and quick processes).

      --
      "XML is like violence. If it doesn't solve your problem, use more." - Anonymous Coward
    34. Re:One folder to rule them all... by Anonymous Coward · · Score: 0

      My company is not a monopoly

      That's exactly what Bill said!

    35. Re:One folder to rule them all... by adamjaskie · · Score: 1

      Hmm...
      How does this look?
      SuSE Linux Groupware Server with Lotus Domino
      I dunno, but it looks like this is a pretty good system. The MAJOR problem with Exchange is its price. Every copy I have seen sold comes with limited client licences. You STILL need to buy software for the clients, even though you already paid for a limited number up front. Every time you want to connect more clients, you have to purchase both client software and a client licence. They (MS) charge you out the nose TWICE for just letting people share a calendar and email. All this still has to run on a Microsoft server OS, which in itself costs another $1000 (for only 5 client licences) and which won't even run on older hardware.
      Another thing: can you tell me an easy way to get DNS, file/printer sharing, NAT, firewall and DHCP client/server on a Pentium 83MHz (overdrive for 486 mobo) system using MS software? That works reliably? I set it up in a couple of hours with Linux. It was a great learning experience. I learned how to configure BIND, Samba, DHCPd, IPTables, etc. If Linux had a good groupware server, that would be great, because it would mean that people don't need to have an MS server at all. From what I have seen around the web, Linux can basically do just about anything an MS server can do, except Exchange. If there was a good Linux alternative, Linux could finish taking over the server market. It looks like Domino is pretty promising. It just needs a bit more publicity. (oh, I found it with a quick google search in case you were wondering)

      --
      /usr/games/fortune
    36. Re:One folder to rule them all... by Anonymous Coward · · Score: 0

      Want to hear scary. The place I am at has 15000 users on 2 (two!) dual P2-400 servers using MS Exchange

    37. Re:One folder to rule them all... by lynnroth · · Score: 3, Informative

      I would definitely recommend XMail Server. Cross platform (Linux, FreeBSD, WinNT/2K/XP, Solaris), runs multiple domains with no problem. Not really that hard to set up if you read all the docs. There are several web config apps for it now and it's not that hard to program against the TCP config interface. It's being actively developed (new release every month or more often if a rare bug comes up.) It's licensed under the GPL. I use it with about 30 domains with 4-20 users per domain. I have had 0 problems with it. Easy to use, easy to upgrade (just copy the new binaries) no complaints.

    38. Re:One folder to rule them all... by Zathrus · · Score: 2

      Domino doesn't need more publicity. It needs to be put into a grave. And then shot.

      Domino well predates Exchange (Lotus Notes was the precursor to Domino), and it's generally considered expensive, hard to admin, and difficult to use. It is, however, much more flexible than Exchange.

    39. Re:One folder to rule them all... by phil+reed · · Score: 1
      Rage against the machine all you want, but when your boss says you will have shared contacts and calendars and your clients will run Windows; find me a solution that comes within miles of the ease of Outlook and Exchange and I'll give you a cookie.


      Groupwise. (And no, it doesn't need to run on a Novell server - the daemons run very nicely on NT. And the clients use TCP/IP these days. And, no more Outlook viruses.)

      --

      ...phil
      "For a list of the ways which technology has failed to improve our quality of life, press 3."
    40. Re:One folder to rule them all... by overturf · · Score: 1

      > Exchange made my life miserable for many years in the 93-95 timeframe. It might be better now.

      Wow, that's amazing since MS Exchange was first released in 1996.

    41. Re:One folder to rule them all... by Ubertech · · Score: 1

      I take care of a Eudora Worldmail system as part of my duties with my employer. For what it does, it works fine. It has never crashed on me, and handles our small office (60 people) just fine.

      My only problem is the whole shared address book thing. I think that is the reason why so many people use MS Exchange with the Outlook client. We have a kludgy solution to that problem, but it is only usable because there are so few people here.

      I wish I had the coding skills to do this - I would like nothing more than to be able to install a Linux system that had a single server application to handle all of the following:

      Email, with accounts based on system users and/or custom users.
      Multiple shared address books managed on the server. (Accessed with Evolution/Outlook/...)
      Group and List management abilities
      Shared user/group scheduling

      I know there are separate applications to handle many of these things, but one comprehensive package would be nice.

      Does anyone know what is out there in the Open Source world to handle the scheduling and the shared address parts of all this? (Unfortunately, I have been a slave to MS for far too long and am only now starting to learn the ways of UNIX. So far, it's quite refreshing.)

      --
      Be quick to listen, slow to speak, and slow to anger.
    42. Re:One folder to rule them all... by Anonymous Coward · · Score: 0


      "I don't want to use software made by criminals." Jesus Tap Dancing Christ.
      GET OVER IT!!!!!

    43. Re:One folder to rule them all... by mikefoley · · Score: 2

      Yea, ok, whatever.. I messed up the dates. I was involved with Exchange from the beta period on. Made 2 trips to Microsoft during the beta. That was around 95. It was all painful.

      First thing that goes after 40 is the memory.. Second is the...er...ah....

      --
      What's my Karma Mr. Burns? "Excellent"
    44. Re:One folder to rule them all... by Ageless · · Score: 2

      I am actually a Certified Lotus Notes Programmer Thing, it being required for a previous job. I know Notes / Domino inside and out and it doesn't come anywhere close to the ease of use of Outlook.

      Outlook is so popular because its GUI is quite good. (Like many of the MS products). I freely admit that its storage sucks, its servers and protocols suck, but to the end user that doesn't matter. They want a nice GUI they can use without a tech guy standing over their shoulder and Outlook does provide that.

    45. Re:One folder to rule them all... by Ageless · · Score: 2

      It won't be stamped out until the need for email is stamped out. Every single company, no matter how small needs email these days with few exceptions. The problem arises that getting a decent admin, and certainly one that can make a Linux or UNIX server sing does not come cheap. It's much cheaper to pay $5000 (or much less, if you buy it with your boxes) for Exchange and have the one guy that knows a little about computers set it up.

    46. Re:One folder to rule them all... by derch · · Score: 1

      Go with qmail.

      I work for a company that sells POP3 accounts. Currently on one server, we have something like 10,000+ addresses under almost as many different domain names.

      I wasn't the one who set the server up, but I know it required a few readily available patches to handle the volume. Other than that it's a pretty standard install.

      It also uses vpopmail to manage the different addresses and to make scripting easier.

    47. Re:One folder to rule them all... by swb · · Score: 2

      Exchange is actually a pretty decent mail server, although only using it for mail is pretty dumb - its groupware features are the killer app.

      Icky. Exchange does well enough for email and scheduling, but anything else requires you to dump your life into the black hole of the exchange database.

      We've been fortunate with our Exchange installation -- lots of AV software, excessive hardware and limited use of the groupware functionality have kept it stable and functional.

    48. Re:One folder to rule them all... by guinsu · · Score: 2

      Try iMail for NT/2000, my company has been using it for about 3 years. Its not perfect, but its pretty good, and it hasn't lost any data yet.

    49. Re:One folder to rule them all... by Anonymous Coward · · Score: 0

      We have had great success with CommuniGate Pro. It runs on pretty much every platform and is very cheap compared to Exchange or Groupwise. We have ours running on two Cobalt RaQ servers that are mirrored: no downtime, very little overhead, easy to administer, and lots of happy users (supports thousands of users easily).

    50. Re:One folder to rule them all... by Anonymous Coward · · Score: 0

      Domino is a great system, but it hardly "works out of the box" -- you need to know quite a bit to get it up and running effectively.

      (I used to do Domino consulting, and every time I'd set up a server for some small business with a Stockboy-turned-MCSE for a support guy I'd cringe. I knew they'd never be able to put in the development time to get the value out of the system and the administrative overhead would just kill them. It would usually take 2 days just to get the server set up with all the features working, and I knew what I was doing.)

    51. Re:One folder to rule them all... by Pointer80 · · Score: 1

      That actually depends on your MTA...

      (IIRC) qmail, for example, would send 250 separate messages in this instance.

      --
      [%- PROCESS life -%]
    52. Re:One folder to rule them all... by smnolde · · Score: 2
      exim or postfix.

      'nuff said.

    53. Re:One folder to rule them all... by Ben+Hutchings · · Score: 2

      Those just deliver mail, either to other programs or into simple mailboxes; they don't provide any facilities for reading or searching the mail afterwards. It's easy enough to integrate either of these with Cyrus, though, which will do that.

    54. Re:One folder to rule them all... by morzel · · Score: 2
      Exchange is actually a pretty decent mail server, although only using it for mail is pretty dumb - its groupware features are the killer app.
      <IRONIC>
      What groupware features?
      </IRONIC>
      Exchange is a reasonably stable email-platform, but calling it 'groupware' is tantamount to calling a mini cooper a luxury sedan.
      Single Copy Object Stores (that's what they're called in LotusSpeak) can be advantageous in some cases, but there ain't no free lunches: it's more difficult to manage, and if something's corrupted or needs to be restored from back-up, you're SOL.

      --
      Okay... I'll do the stupid things first, then you shy people follow.
      [Zappa]
    55. Re:One folder to rule them all... by Anonymous Coward · · Score: 0

      I work for Qualcomm (and yes, an AC for this post) and have first-hand knowledge of their corporate mail infrastructure. We take mail VERY seriously - at least inside =]

      For 90% of the mail system, Sendmail does the dirty work. There's some WorldMail (yes Qualcomm), and some Exchange, neither of which is used for anything other than for direct user access. But we do eat our own dogfood with QPopper. Yup, 5000+ employees use POP3 to get their mail.

      The servers are Sun Ultra 60s, dual proc, 2gig of ram, fibre disk, and each one handles 500 - 1000 users.

      I'm actually amazed how well it works. Especially since a good number of users see the need to pop their mail every 5 minutes. Most leave mail on the server. A rare few whose spools spill over the gigabyte mark. And to top it off, there is ONE guy who can claim the name "postmaster" fulltime.

      Luckily since it's unix based, I can ssh to my mail server and use procmail and pine. =] (Diversity, another reason to shy away from the MS solutions)

    56. Re:One folder to rule them all... by Bert64 · · Score: 1

      The cost of hiring a decent admin should be nothing compared to the potential costs a cracker could cause you. Besides, there are external firms who will come and install a server for you, and regularly maintain it. Perhaps people who get hit with worms or viruses for which patches already exist should be fined, even more heavily if it's a company; after all, the actions of a worm running on one company's machine can cause financial losses (bandwidth costs) to other companies.

      --
      http://spamdecoy.net - free throwaway anonymous email - avoid spam!
    57. Re:One folder to rule them all... by mcg1969 · · Score: 1

      ... and so can the FBI, the SEC, and the Attorney General. Using Exchange should not be an excuse to also repeat Bill's other mistakes ;-)

      I know you went off-topic just to be funny, but at the same time you're missing the point. The poster was suggesting a way to provide users with the ability to restore emails that they have inadvertently deleted. No matter how you slice it, that means you have to back up pretty much every email for at least a fixed amount of time...

      ...and any such system is going to be "vulnerable" to this kind of government search. Exchange is not unique in this regard.

    58. Re:One folder to rule them all... by justinjtp · · Score: 1

      FirstClass

      It's a pretty good email system, not perfect, but it has worked fine for us. As a groupware system it has calendaring, a Global Address Book (everyone in the system can be found just by typing the first few letters of their name) and conferencing.

      The administration for the system is not hard. It is easy enough that we have delegated most of the administration to the less technical people of our organization. The user rights controls are also pretty extensive.

      One other thing of interest is if I send a message to every user on the system only one copy of the message will exist unless the user makes a copy of the message.

      Backup with the system allows us to restore one or more mailboxes if we choose to.

      As for clients it has support on Mac OS, Mac OS X, Windows, and includes a telnet interface and Web interface.

    59. Re:One folder to rule them all... by Anonymous Coward · · Score: 0

      "Besides, there are external firms who will come and install a server for you, and regularly maintain it."

      If you pick up the phone and call the closest small business vendor, you will end up getting Exchange installed.

      Personally, I think that it would be best to outsource the mail handling to an ISP or someone, but if you have your own server, the MS support infrastructure out in the field is much better.

    60. Re:One folder to rule them all... by broody · · Score: 1

      SuSE has a prepackaged box just for this purpose.

      --
      ~~ What's stopping you?
    61. Re:One folder to rule them all... by Dazza · · Score: 1

      Either exchange is impossible to administer well or just very, very hard. Until recently you couldn't restore single users mailboxes

      This is simply untrue. Using ( for example ) BackupExec you can restore the whole database, a single mailbox, a single folder in that mailbox, or even a single message in a particular folder in that mailbox.

      First time we had to use it, we were down for a day, it didn't come up.

      Perhaps you should try configuring it, and then testing next time ?


      In my experience, for up to around 500 users Exchange works perfectly well, with almost no maintenance overhead, and when used with Outlook ( as it nearly always is ) the integrated scheduling, calendaring, address books, etc, etc, make it extremely simple and powerful to use.

      --
      -- "I know that this is vitriol, no solution, spleen-venting, but I feel better having screamed, don't you ?"
    62. Re:One folder to rule them all... by Anonymous Coward · · Score: 2, Informative

      when your boss says you will have shared contacts

      LDAP

      and calendars

      CorporateTime (http://www.steltor.com)

      and your clients will run Windows

      Both of these work fine on Windows


      find me a solution that comes within miles of the ease of Outlook


      You can even keep using Outlook; LDAP is supported by Outlook, and Steltor provides an Outlook plugin that talks to their server instead of Exchange.

    63. Re:One folder to rule them all... by Anonymous Coward · · Score: 0

      I did not have sexual relations with that woman!

    64. Re:One folder to rule them all... by Anonymous Coward · · Score: 0

      Which is technically correct. A Cuban cigar is not a dick, and a mouth is not a cunt.

    65. Re:One folder to rule them all... by darkonc · · Score: 2
      I had sendmail running a server for 100,000 users. It was slightly modified, however.
      The user database for most of the users was a very simple fixed-record database. this made for fast access. (later migrated to IMAP)
      Email was stored in a separate directory for each user, and the user directories were hashed into a tree, with about 100 (or was it 10?, this was a while ago) users per leaf directory.
      heavily RAIDed -- more than a dozen disks to store the email. This distributed the I/O cost.
      The server had 4 processors (200MHz each -- it was a while ago) and 2GB of RAM.

      I was the only person directly responsible for the mail server, and I considered it a sign that something was wrong if it took more than 5-10 seconds for email to get delivered (users with 400MB mailboxes that insisted on checking them every 5 minutes excepted).
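
A hashed spool layout along those lines might look like the sketch below. The hash function, the two-level fan-out, and the `/var/spool/mailtree` root are all assumptions made for illustration; the post does not say how that server actually computed its tree.

```python
import hashlib
from pathlib import PurePosixPath

# Hypothetical spool root; the original server's path is not given in the post.
SPOOL_ROOT = PurePosixPath("/var/spool/mailtree")


def user_mail_dir(username: str, levels: int = 2) -> PurePosixPath:
    """Map a username to a per-user directory inside a hashed tree.

    Each level uses two hex digits of a digest of the username, so every
    intermediate directory has at most 256 children and no lookup has to
    scan one enormous flat directory.
    """
    digest = hashlib.sha256(username.encode("utf-8")).hexdigest()
    parts = [digest[2 * i:2 * i + 2] for i in range(levels)]
    return SPOOL_ROOT.joinpath(*parts, username)


if __name__ == "__main__":
    for name in ("alice", "bob", "carol"):
        print(name, "->", user_mail_dir(name))
```

Using fewer hash digits (or a single level) gives fewer, fuller leaf directories, which is the knob that decides how many users land in each leaf.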

      --
      Sometimes boldness is in fashion. Sometimes only the brave will be bold.
    66. Re:One folder to rule them all... by bafu · · Score: 1

      Hint: That's the way SMTP works. If you send a message to 250 recipients, it's sent to your mailserver as 250 RCPT TO lines and 1 body. If they're all delivered locally, it gets delivered locally. If 220 of them go to another server, it gets 220 RCPT TO lines, and one copy of the message.

      He/she was saying it would store one copy, no matter how many local recipients you had. That way you'd only be storing one copy, along with 250 pointers to that copy. That's got nothing to do with SMTP... it's all a question of what your local delivery agent and mail reader access can handle.

      Personally, that kind of thing makes me a bit nervous... I'd rather get more drives than complicate my mail storage scheme. I'm an old fuddy-duddy, though.
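
To make the "one copy plus 250 pointers" idea concrete, here is a minimal in-memory sketch. The class and method names are hypothetical and this is not how Exchange, Domino or FirstClass implement it; it only shows the reference-counting bookkeeping that such a store needs.

```python
import hashlib


class SingleInstanceStore:
    """Keep one copy of each message; mailboxes hold only references."""

    def __init__(self):
        self.bodies = {}     # digest -> message bytes (stored once)
        self.refcount = {}   # digest -> number of mailbox entries pointing at it
        self.mailboxes = {}  # user -> list of digests

    def deliver(self, recipients, message: bytes) -> str:
        digest = hashlib.sha256(message).hexdigest()
        self.bodies.setdefault(digest, message)
        for user in recipients:
            self.mailboxes.setdefault(user, []).append(digest)
            self.refcount[digest] = self.refcount.get(digest, 0) + 1
        return digest

    def delete(self, user: str, digest: str) -> None:
        self.mailboxes[user].remove(digest)
        self.refcount[digest] -= 1
        if self.refcount[digest] == 0:  # last pointer gone: drop the body
            del self.bodies[digest]
            del self.refcount[digest]


if __name__ == "__main__":
    store = SingleInstanceStore()
    users = [f"user{i}" for i in range(250)]
    store.deliver(users, b"Subject: all hands\n\nMeeting at 3.\n")
    print(len(store.bodies), "stored copy for", len(store.mailboxes), "mailboxes")
```

The extra bookkeeping is exactly the complication being worried about: a wrong reference count either leaks space or deletes a message somebody still has in their mailbox.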

    67. Re:One folder to rule them all... by ahde · · Score: 2

      The problem is that people don't want an alternative. They want exchange+outlook and if there is anything different (down to the icons) they won't use it. There are alternatives to Exchange. Anyone can whip up a web-based alternative to the "killer apps" of exchange, the addressbook and scheduling. But people won't use anything that doesn't work exactly the same with outlook, or that requires installation, because Dell will sell you a server with exchange already installed, and clients with outlook.

    68. Re:One folder to rule them all... by crucini · · Score: 2
      Want a standards-based SMTP server with server-side calendaring that works nicely with Outlook and the plethora of email clients? You want this affordable Intel based application!

      From http://www.bynari.net/bynari/products.html.
      The server runs on Linux, of course.
      Unfortunately, the linked page does not render for me in Netscape/Linux.

      Steltor, whose site seems to be broken, makes good scheduling apps that can connect to Outlook. Their server runs on lots of OS's, including Linux. I know one customer, and he's happy.
    69. Re:One folder to rule them all... by Electrum · · Score: 2

      Are you sure that you want to be using Postfix? I don't...

      http://cr.yp.to/maildisasters/postfix.html
    70. Re:One folder to rule them all... by Electrum · · Score: 2

      Yes, and there is a VERY good reason why it does that:

      http://cr.yp.to/proto/verp.txt
    71. Re:One folder to rule them all... by Beliskner · · Score: 2
      Here's the secret: many IIS have no admin. Some manager dude double-clicked on setup.exe and that was it.
      Administering of server by people with "meager knowledge" as you put it... should really be stamped out
      What are you suggesting? Making it illegal to set up a webserver without a licence?? Who'll issue this license? The Government? Great, that's game over in the censorship area.

      Personally I'd much rather have Code Red flooding my servers all day than having to hand control of webservers over to the Government. How would you police this? Block port 80? Wouldn't this set a precedent for the (RI|MP)AA to block Kazaa ports by requiring the reconfiguring of routers that give connectivity to all Kazaa supernodes? Your statement is absolutely unacceptable.

      --
      A caveman dreams of being us, the incalculable power and riches. We dream of being Q, then what?
    72. Re:One folder to rule them all... by Inthewire · · Score: 1

      The cost of hiring a decent admin should be nothing compared to the potential costs a cracker could cause you

      Right. Mom and Pop might have a fire and lose a half of a million in inventory, a quarter mil in fixtures, and another quarter mil in lost business. Should they spend a million on fire prevention?
      No. Perhaps they need a million in insurance, but the protection isn't always as expensive as the worst case scenario.
      Not every piece of info needs to be on a machine that is connected to the web, and most lost emails are not worth the salary of an admin.
      Remember, IT is an expense.

      --


      Writers imply. Readers infer.
  2. hmmm by Cyno · · Score: 1

    I'd store it in either an XML format or possibly as separate files in a directory structure with a filesystem that could handle the extra load like XFS. But it's nice to have a single file to back up, or at least a single directory. I think evolution, netscape, mbox handle mail just fine as it is. Or how about a filesystem in a file that you mount loopback and compress old mail as needed. Store all mail in separate directories and files for attachments and have XML metadata describing everything in an easily parsable system.

    1. Re:hmmm by ewoods · · Score: 1

      So you suggest adding the bloat of XML to the bloat of BASE-64 encoding?

    2. Re:hmmm by Cyno · · Score: 1

      Yep. Disk space is extremely efficient in both cost and performance and the extra metadata would speed up access. But there wouldn't be any base-64 encoding. Store the mail in separate files in a directory, text for your html/text email content and binaries for each of the attachments. Compress old directories. But the problem with email currently is not the bloat but the problem of parsing a 1,000,000 line file or syncing up to a mail server. If it was stored in a loopback filesystem it might help with the organization and performance of your mail servers. But I wouldn't know.
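
As a rough illustration of storing the text and each attachment as separate decoded files, the sketch below uses Python's standard `email` package. The function name `explode_message` and the `part-N.txt` naming scheme are made up for this example.

```python
import tempfile
from email import message_from_bytes, policy
from email.message import EmailMessage
from pathlib import Path


def explode_message(raw: bytes, dest: Path) -> list:
    """Write each leaf MIME part of `raw` into `dest` as its own decoded file."""
    msg = message_from_bytes(raw, policy=policy.default)
    dest.mkdir(parents=True, exist_ok=True)
    written = []
    for index, part in enumerate(msg.walk()):
        if part.is_multipart():
            continue  # containers carry no payload of their own
        payload = part.get_payload(decode=True)  # undoes base64 / quoted-printable
        name = part.get_filename() or f"part-{index}.txt"
        target = dest / Path(name).name  # drop any directory components in the name
        target.write_bytes(payload)
        written.append(target)
    return written


if __name__ == "__main__":
    demo = EmailMessage()
    demo["Subject"] = "demo"
    demo.set_content("hello\n")
    demo.add_attachment(b"\x00\x01\x02", maintype="application",
                        subtype="octet-stream", filename="blob.bin")
    with tempfile.TemporaryDirectory() as tmp:
        for path in explode_message(demo.as_bytes(), Path(tmp)):
            print(path.name, path.stat().st_size, "bytes")
```

Attachment names come from the sender, so a real implementation would need stricter sanitising and collision handling than the one-line `Path(name).name` used here.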

    3. Re:hmmm by Ignominious+Cow+Herd · · Score: 0

      Right! Why are we still Base64 encoding mail? This is to support old (non-existent?) gateways that couldn't support 8-bit encoding. I say screw em if any still exist. Make em update, save bandwidth, who cares about space?

      --
      Lump lingered last in line for brains, and the ones she got were sorta rotten and insane.
    4. Re:hmmm by Anonymous Coward · · Score: 0

      disk space isnt a problem for who?

      damnit man, i've got 24 megs at the high end of my product line...

    5. Re:hmmm by fejjie · · Score: 1

      XML will not speed up disk access. You'd still have to parse the file... and MIME is a XML-ish type format to begin with, so adding XML on top of that just adds bloat and nothing more.

      if you want to speed up disk access to a folder in, say, mbox format - what you do is keep a summary file which stores the offsets of each message (and probably some other pre-parsed information such as from/to/cc/subject/date/in-reply-to/references headers so that you can easily display a message-list and optionally thread it without having to parse the messages again).

      You could perhaps encode this file in XML, but it'd be a lot faster if it were a binary file.

      read (fd,

      is a lot faster than decoding XML which requires massive amounts of mallocing and freeing on top of the file I/O.

      it is a well-known fact that XML is slow.

      the only plus that XML has is that it gives you a lexer for free and it's in a human readable format. These are not necessary for a file that can be auto-generated at will from an mbox file (for example).
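
The summary-file idea can be sketched roughly as follows. This assumes the common mbox convention that a message starts at a line beginning with `From ` that is either the first line or follows a blank line; the function names are hypothetical, and real mbox variants differ in how they quote such lines inside bodies.

```python
import io


def index_mbox(stream) -> list:
    """Return the byte offset of every message in an mbox opened in binary mode."""
    offsets = []
    previous_blank = True  # the very first line may start a message
    position = 0
    for line in stream:
        if previous_blank and line.startswith(b"From "):
            offsets.append(position)
        previous_blank = line in (b"\n", b"\r\n")
        position += len(line)
    return offsets


def read_message(stream, offsets, n: int) -> bytes:
    """Fetch message n by seeking, without parsing the messages before it."""
    start = offsets[n]
    stream.seek(start)
    if n + 1 < len(offsets):
        return stream.read(offsets[n + 1] - start)
    return stream.read()


if __name__ == "__main__":
    box = io.BytesIO(
        b"From a@example.com Thu Jan  1 00:00:00 1970\nSubject: one\n\nfirst\n\n"
        b"From b@example.com Thu Jan  1 00:00:00 1970\nSubject: two\n\nsecond\n"
    )
    offsets = index_mbox(box)
    print(offsets)
    print(read_message(box, offsets, 1).decode())
```

Once the offsets are cached, displaying or fetching one message costs a seek and a read instead of a scan of the whole folder; the cache only has to be rebuilt when the mbox changes.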

    6. Re:hmmm by fejjie · · Score: 1

      that should read:

      read (fd, &offset, 4);
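
For comparison, a rough Python equivalent of reading fixed-width offsets from a binary summary file. It assumes unsigned 32-bit little-endian records, which is a choice made for this sketch; the C call above would read in the machine's native byte order, and the post does not specify a file layout.

```python
import io
import struct

# Assumed record layout: one unsigned 32-bit little-endian offset per message.
RECORD = struct.Struct("<I")


def write_offsets(stream, offsets) -> None:
    """Append each offset as a fixed-width 4-byte record."""
    for offset in offsets:
        stream.write(RECORD.pack(offset))


def read_offset(stream, n: int) -> int:
    """Read the n-th offset directly; fixed-width records need no parsing."""
    stream.seek(n * RECORD.size)
    (offset,) = RECORD.unpack(stream.read(RECORD.size))
    return offset


if __name__ == "__main__":
    summary = io.BytesIO()
    write_offsets(summary, [0, 1834, 90210])
    print(read_offset(summary, 2))
```

A 32-bit offset caps the folder at 4 GB, so a format meant for large spools would want 64-bit records (`"<Q"`) instead.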

    7. Re:hmmm by AdTropis · · Score: 2, Interesting

      when i read your post, i immediately thought of a Jamie Zawinski article that i read a few weeks ago:

      http://www.jwz.org/doc/mailsum.html

      he talks about this very thing. quite interesting if you ask me.

  3. why use a 'file' at all? by nullchar · · Score: 1, Redundant

    Perhaps storing the message, attachments, etc. in an RDBMS would be a better way. Give each user a table-space with a table per folder/directory or just each user with a single table. With a decent RDBMS, storage on disk is no longer your concern. This way web, local text/gui, and remote text/gui clients could easily access the same information. There's probably several solutions out there already (with wrappers for your favorite mail clients).
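
A minimal sketch of the database approach, using SQLite from Python's standard library purely because it needs no server. It uses a single table with `owner` and `folder` columns rather than a table per folder, and the schema and function names are invented for illustration.

```python
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Create (or open) a message store with one row per message."""
    db = sqlite3.connect(path)
    db.execute("""
        CREATE TABLE IF NOT EXISTS messages (
            id      INTEGER PRIMARY KEY,
            owner   TEXT NOT NULL,
            folder  TEXT NOT NULL,
            subject TEXT,
            sender  TEXT,
            raw     BLOB NOT NULL
        )""")
    # Listing a folder should not require touching other users' rows.
    db.execute("CREATE INDEX IF NOT EXISTS idx_folder ON messages (owner, folder)")
    return db


def add_message(db, owner, folder, subject, sender, raw: bytes) -> int:
    cursor = db.execute(
        "INSERT INTO messages (owner, folder, subject, sender, raw) VALUES (?, ?, ?, ?, ?)",
        (owner, folder, subject, sender, raw))
    return cursor.lastrowid


def list_subjects(db, owner, folder) -> list:
    """Return subjects for one folder without reading any message bodies."""
    rows = db.execute(
        "SELECT subject FROM messages WHERE owner = ? AND folder = ? ORDER BY id",
        (owner, folder))
    return [subject for (subject,) in rows]


if __name__ == "__main__":
    db = open_store()
    add_message(db, "alice", "INBOX", "hello", "bob@example.com", b"raw message bytes")
    print(list_subjects(db, "alice", "INBOX"))
```

Because the subject lives in its own column, a client can draw the message list from the index and metadata alone, which is the access pattern that is slow with large mbox files.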

    1. Re:why use a 'file' at all? by alen · · Score: 3, Funny

      MS Exchange has been doing this for almost a decade now. Version 5 was the first decent one to get and 5.5 is great.

    2. Re:why use a 'file' at all? by Icy · · Score: 1

      Using a database server would be ideal. You would get fast lookups for things such as subjects, but I am not sure how quota would be handled. I did see a patch that used qmail for delivery, but injected the entire mail message into mysql or postgresql (probably it was mysql), but it was young and I can't find it anymore. There are many database password lookup patches, but that's about it. pop3 and imap servers could be patched easily, so you would probably have your local mail clients use those services rather than the traditional mbox/Maildir lookup to get around having to make messy wrappers. It could be done.

    3. Re:why use a 'file' at all? by frank_adrian314159 · · Score: 2

      Hello! Why don't you talk to IBM about this? It's called Domino.

      --
      That is all.
    4. Re:why use a 'file' at all? by Anonymous Coward · · Score: 0

      Looks like databases are the way to go.

      Oracle:
      http://www.oracle.com/ip/deploy/ias/email/index.html

      Novell:
      http://www.novell.com/products/groupwise/

    5. Re:why use a 'file' at all? by Anonymous Coward · · Score: 0

      holy crap! somebody else knows whats best for them!

  4. My fast, easy solution. by crazney · · Score: 2, Interesting

    Well. My solution for storing A LOT of BIG email but still browsing fast is to use MySQL. My mail client is Pronto! (written by Muhri, in perl, gtk, etc).. I have several tens of thousands of emails in about 10 different folders. Reaction time is immediate and searching is pretty damn quick as well.
    The mysql server is at work, and I can view my mail from anywhere simply by pointing my client at my IP. Presto.

    I'm also slowly writing a MySQL-based IMAP server which will hopefully be compatible with Pronto!... But as with so many projects, it'll probably take some time to complete...

    David

    --
    stuff
    1. Re:My fast, easy solution. by Anonymous Coward · · Score: 0

      when you say big emails....I assume that you mean less than 16 MB to handle MySQL row limitation. We have users who want to send 30 MB messages. Damn artists.

    2. Re:My fast, easy solution. by crazney · · Score: 2, Informative

      when you say big emails....I assume that you mean less than 16 MB to handle MySQL row limitation. We have users who want to send 30 MB messages. Damn artists.

      Nope, this limitation disappeared ages ago..

      Information can be found here here and here

      I suggest opening up the config file (generally /etc/mysql/my.cnf) and ensuring everything like "max_allowed_packet" etc are > 50-ish MB.
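
For reference, a fragment along these lines is what that usually looks like. The 64M value is only an example chosen to clear the 50-ish MB mark mentioned above, and the section name assumes a stock MySQL server configuration; clients that send or fetch large rows may need the same setting in their own section.

```ini
[mysqld]
# Largest single packet (and so the largest single row value) the server will accept.
max_allowed_packet = 64M
```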

      David

      --
      stuff
    3. Re:My fast, easy solution. by cybermage · · Score: 2

      We have users who want to send 30 MB messages. Damn artists.

      In all seriousness, get these lunatics to use some kind of P2P solution. Heck, even AIM is better than hosing your mail server.

      A friend sent me 42MB worth of zipped MP3's over AIM with no trouble. Took about 5 minutes, or so. (me=cable, him=~T1)

    4. Re:My fast, easy solution. by Anonymous Coward · · Score: 0

      Good thing he zipped 'em. I'm sure it saved TONS of bandwidth...

    5. Re:My fast, easy solution. by MaxVlast · · Score: 1

      Gee whiz, I'm glad I have a Mac so I can use a sensible format like tar and not have to wait for non-existent compression to take place. I send my PC friends tarfiles and they freak out. It's usually all right after I assure them that WinZip can deal with crazy alien formats like tar.gz.

      --
      There should be a moratorium on the use of the apostrophe.
      Max V.
      NeXTMail/MIME Mail welcome
    6. Re:My fast, easy solution. by vsync64 · · Score: 2

      But tar.gz is compressed. Maybe you just meant .tar.

      --
      TO BUY A NEW CAR WOULD MAKE YOU SEXUALLY ATTRACTIVE.
    7. Re:My fast, easy solution. by cybermage · · Score: 2

      Actually, now that I look, the separate files total 47.8MB and zipped were 46.3MB; So, there was some compression. However, it would still be more handy as a .zip than separate files even if the "compression" made the sum greater than its parts.

      The real point of my post though was that large files have no business being sent via email. P2P solutions are faster and friendlier; And, although not typically used for this, AIM is probably the most common.

      Peace.

    8. Re:My fast, easy solution. by Gossy · · Score: 1

      Don't forget WinZip (like all other compression programs I've seen that use .zip) has a 'no compression' option...

    9. Re:My fast, easy solution. by MaxVlast · · Score: 1

      That's what I said. Just because I finished the post with a different thought from the original thought doesn't mean that the first thought wasn't there.

      --
      There should be a moratorium on the use of the apostrophe.
      Max V.
      NeXTMail/MIME Mail welcome
    10. Re:My fast, easy solution. by tricorn · · Score: 1

      Have you looked at dbmail? I believe I found it on freshmeat.

      Available via CVS at

      :pserver:cvs@lightning.fastxs.net:/cvsroot-dbmail
      module dbmail, no password necessary.

      Originally used just mysql, but I believe they've added interfaces to other DBs. Some of the coding was horrible, but it basically worked. Has an MDA that can be made to work with sendmail, has an IMAP and POP server. Haven't looked at it for about 4 months, we were going to use it for an e-mail system (along with Columba as a client) but the project was cancelled. There was lots of room for improvement, but there's been quite a bit of work done on it since then, so it might be worth using as a starting point at least.

  5. XML by Anonymous Coward · · Score: 0

    ...and why not?

    20.....15.....10......5......doo!

    1. Re:XML by Anonymous Coward · · Score: 0


      But xml is probably much slower to parse than mbox style folders.

      But you were probably a troll.

    2. Re:XML by Anonymous Coward · · Score: 0

      I don't see how it will be 'much slower'.
      And look at the other benefits:
      1. validation
      2. more parsing tools
      3. transformation tools

  6. Bah! by Anonymous Coward · · Score: 0

    I like how you included the Microsoft proprietary client format blah blah blah. Gotta have that eh?

    Are you aiming for this discussion to be server side, or client side? Or a general slugfest as long as it's anti-Microsoft?

    1. Re:Bah! by Anonymous Coward · · Score: 0

      How do Microsoft clients have anything to do with it if you're comparing unix server-side mail formats?

      Troll.

  7. yEnc by Glytch · · Score: 2

    If you're so worried about encoded binaries, why not try yEnc instead of base64 or uuencoding? It works well in newsgroups. It might work well for email storage as well.

    1. Re:yEnc by fejjie · · Score: 2, Informative

      *sigh*

      yEnc is a complete waste of time. Had the author of yEnc actually gone out and read some pre-existing MIME specifications before going out and re-inventing (a square) wheel, he would have found that MIME already defines an encoding that gets even better compression than yEnc. It's called "binary". Yes, MIME can handle binary content.

      Content-Transfer-Encoding: binary

      it's as simple as that.

      Btw, I've implemented the yEnc specification in my library GMime

      My favourite part of the yEnc author's defense for why he implemented yEnc is "but most news clients don't implement MIME". Hah, join the real world where NO news reader implemented yEnc. (yes, I know there are clients that implement it now, in fact my code is used in a few of them).

      Believe me as someone who spends time hacking on news and mail readers, yEnc is nothing but a headache.
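
To put a number on the space argument, this short sketch compares the size of a payload stored as raw bytes (what `Content-Transfer-Encoding: binary` would carry) with the same payload in base64. The function name is made up for the example.

```python
import base64
import os


def encoded_sizes(payload: bytes) -> dict:
    """Compare the size of a payload as raw bytes and as base64."""
    return {
        "binary": len(payload),
        # Plain base64: 4 output characters for every 3 input bytes.
        "base64": len(base64.b64encode(payload)),
        # MIME-style base64: the same, plus a newline after every 76 characters.
        "base64_mime": len(base64.encodebytes(payload)),
    }


if __name__ == "__main__":
    sizes = encoded_sizes(os.urandom(3000))
    for name, size in sizes.items():
        print(f"{name:12} {size:6d} bytes  ({size / sizes['binary']:.2f}x)")
```

The 4-for-3 expansion is where the "at least one third" overhead in the original question comes from; the line breaks required in mail push it slightly higher.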

    2. Re:yEnc by erc · · Score: 1

      MIME isn't all that hot, either. Don't forget that before we had MIME, we had uuencode. The only reason to use MIME is if you want to actually tell what's in that attachment before you open it. Content type and file name. Other than that, MIME is just reinventing the wheel.

      --
      -- Ed Carp, N7EKG erc@pobox.com PGP KeyID: 0x0BD32C9B What I'm up to: http://intuitives.mine.nu
    3. Re:yEnc by fejjie · · Score: 1

      I gather you've never READ the MIME rfcs, have you? ;-)

      MIME solves i18n issues as well. Not to mention that uuencoded blocks in the middle of a text stream do not even come close to solving all the problems that MIME solves.

      MIME was designed to solve problems that could not be addressed using uuencode. Think of uuencode as a quick hack that worked so long as you only intended to send US-ASCII email messages.

      Then along came MIME, the properly designed transport medium, that solved i18n issues, solved the problem of some transport agents not understanding 8bit characters, and allowed logical structuring of data (not to mention "better compression" which seems to be the rave these days with yEnc, which simply does a horrible job of trying to solve the same problem that MIME has already solved).

      so no, uuencode did not solve what MIME solves and has solved since ~1995.

    4. Re:yEnc by Anonymous Coward · · Score: 0
      "And please no mbox/Maildir flamewar!"

      Can we have a yEnc Flamewar instead ?

      Please ?

    5. Re:yEnc by Anonymous Coward · · Score: 0
      Just one problem. NNTP doesn't support binary transfer.

      In addition, yEnc encodes the cr and lf characters, which is important in keeping binary files from becoming corrupted when they are transferred via FTP. (Users often forget to transfer in binary mode.)
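
The escaping mentioned here can be sketched as below. This follows the byte transformation as I understand it from the yEnc draft (add 42 to every byte, and escape NUL, LF, CR and `=` with a leading `=` plus a further offset of 64); it deliberately leaves out the `=ybegin`/`=yend` header lines, line wrapping and CRC, so it is not a complete encoder.

```python
# Bytes that must never appear unescaped in the encoded stream:
# NUL, LF, CR and the escape character '=' itself.
CRITICAL = {0x00, 0x0A, 0x0D, 0x3D}


def yenc_encode(data: bytes) -> bytes:
    out = bytearray()
    for byte in data:
        value = (byte + 42) % 256
        if value in CRITICAL:
            out.append(0x3D)            # '=' marks the next byte as escaped
            value = (value + 64) % 256
        out.append(value)
    return bytes(out)


def yenc_decode(encoded: bytes) -> bytes:
    out = bytearray()
    escaped = False
    for value in encoded:
        if not escaped and value == 0x3D:
            escaped = True
            continue
        if escaped:
            value = (value - 64) % 256
            escaped = False
        out.append((value - 42) % 256)
    return bytes(out)


if __name__ == "__main__":
    sample = bytes(range(256))
    encoded = yenc_encode(sample)
    print(len(sample), "->", len(encoded), "bytes")
```

Only four of the 256 byte values need a second byte, which is why the overhead is a few percent rather than base64's third, and why the output contains no bare CR or LF for a text-mode transfer to mangle.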

  8. Database storage by AngusSF · · Score: 1

    Receive new mail as ASCII files (just as now), store them in a database. Attachments should be decoded and stored as binary objects in the database, with the ability to extract them and save them. The extraction process would leave behind info about when they were extracted and where they were saved. Following them after that would be up to the user. Database could be MySQL or some other common OS SQL database.

    --
    "A gun is a tool, Marian. No better, no worse than any other tool. An axe, a shovel, or anything." Shane (1953)
    1. Re:Database storage by Anonymous Coward · · Score: 0

      Don't store binary data in a database, unless you have a good reason for doing it. One reason:

      1. You have many clients accessing the database that have different requirements for file IO. If you have a mixed unix/windows environment, samba/nfs may not be an option.

      Generally speaking, keep a pointer to a file on a FILESYSTEM, where files are meant to live. Really, why do you need to put them in a (soon to be) bloated database?

    2. Re:Database storage by Anonymous Coward · · Score: 0

      Generally speaking, keep a pointer to a file on a FILESYSTEM, where files are meant to live. Really, why do you need to put them in a (soon to be) bloated database?

      My gut tells me there's a security hole here someplace. It might be ok, if you left the files encoded. But decoding a file and dropping it on the hard drive would require some fairly paranoid precautions, me thinks.

    3. Re:Database storage by sxpert · · Score: 2

      better yet, make the files downloadable via a secure web server.

    4. Re:Database storage by Anonymous Coward · · Score: 0

      It is generally not recommended to store binary data in a database. Unless you are going to actually query on that binary data it would be more efficient to store the binary object in the file system and place a pointer in the database table.
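
A small sketch of the pointer-in-the-table approach, again using SQLite only for convenience. The `attachments` schema, the function names, and the choice to name files after a hash of their contents are all assumptions for illustration.

```python
import hashlib
import sqlite3
import tempfile
from pathlib import Path

# Hypothetical schema: the table holds metadata and a path, never the bytes.
SCHEMA = """CREATE TABLE attachments (
    id         INTEGER PRIMARY KEY,
    message_id INTEGER NOT NULL,
    path       TEXT NOT NULL,
    size       INTEGER NOT NULL)"""


def store_attachment(db, blob_dir: Path, message_id: int, data: bytes) -> Path:
    """Write `data` to a file named after its digest and record only the path."""
    path = blob_dir / hashlib.sha256(data).hexdigest()
    if not path.exists():  # identical attachments end up sharing one file
        path.write_bytes(data)
    db.execute(
        "INSERT INTO attachments (message_id, path, size) VALUES (?, ?, ?)",
        (message_id, str(path), len(data)))
    return path


def load_attachment(db, attachment_id: int) -> bytes:
    """Follow the stored pointer back to the file on disk."""
    (path,) = db.execute(
        "SELECT path FROM attachments WHERE id = ?", (attachment_id,)).fetchone()
    return Path(path).read_bytes()


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    db.execute(SCHEMA)
    with tempfile.TemporaryDirectory() as tmp:
        store_attachment(db, Path(tmp), 1, b"\x89PNG fake image bytes")
        print(len(load_attachment(db, 1)), "bytes read back")
```

The trade-off is that the database and the filesystem can now drift apart, so backups and deletes have to treat the table and the blob directory as one unit.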

  9. I like MS Exchange by alen · · Score: 5, Informative

    A single database to hold all of the users' email. Single instance storage ensures that only one copy of any attachment is in the database at once, no matter how many email messages it was sent in. APIs for backup let you back up the whole database or individual mailboxes. And depending on your backup solution you can restore mailboxes and individual emails. Anti-virus software integrates into the server side of the software. In Exchange 2000 if you accidentally delete a mailbox you can easily bring it back with all emails without restoring from tape. The only files to worry about on the user end are a personal address book and archived email. Unless you use POP3 or it's archived in personal folders, the email always stays on the server, preventing problems like accidentally downloading important emails you need at the office to a home PC. And it's stable. Not as stable as UNIX I admit, but it stays up for months without a reboot. And in my experience most problems are solved by a simple reboot. In 4 years of exposure to Exchange, the only non-admin related problems I've seen were 1 database corruption where I needed to run a utility and wait 45 minutes for it to work again, and a corrupted MTA that needed a reboot to get it working right again.

    1. Re:I like MS Exchange by BrookHarty · · Score: 2

      We do the same thing, 1 large multi-TB Oracle database, LDAP front ends. Of course this is for Voice Mail (encoded), SMS and Email. Not cheap, but it's pretty standard, all the vendors seem to offer the same configuration.

      I think the sweetest thing is how 1 object (voicemail/etc) can be tagged for a select group of people. There's Garbage Collection, extra storage, all kinds of handy features. Just a well thought out, easy to manage solution. Though it costs :)

      Oh by the way, its Unix baby, ya ya. (-;

    2. Re:I like MS Exchange by Anonymous Coward · · Score: 0

      I've seen some rather nasty problems where a single item in db caused store.exe to crash. The db was in a consistent state according to all available tools. The client unfortunately had bad backups, stupid backup exec and false warnings creating laziness, and had to go back to ancient backups to get back up and running. Tons of lost data.

      I'm happy to be spending my time with postfix and other Free Software apps, my life is infinitely more pleasant.

    3. Re:I like MS Exchange by Anonymous Coward · · Score: 0

      Then you have clearly never worked on a large exchange system. Exchange's DB format makes you suffer at the whim of the limited repair tools MS gives you. And as for 45 min, riiiiight. A full db check on a 2gig exchange db that got corrupted (crashed) can take up to 6 hours. Watch this sometime: Do a search in a large public folder, including the body of the messages. Now try to post a new message from system B. Please wait while windows.... has locked everything up tighter than a baby's bottom.

    4. Re:I like MS Exchange by alen · · Score: 2

      Backups are important. I work in internal IT at a company and we do them every night. Fulls on weekends.

    5. Re:I like MS Exchange by alen · · Score: 2

      And I forgot to add. Deleted item retention. You set the number of days. The user deletes an email and empties the trash. The email is still in the database for that number of days and can be restored without back up software.

    6. Re:I like MS Exchange by alen · · Score: 2

      I ran isinteg with all options and to fix everything on a 12GB database and it took 30 minutes to fix it. Another time we lost power a few times in a few hours. Decided to run isinteg since we had sudden shutdowns on the server. Same database took 45 minutes that time. We had 20,000 warnings after multiple power losses and a UPS failure at the same time. And this was a huge APC UPS that wasn't wired right by the previous generation of admins and electricians. 6 hours? You must have some old hardware.

    7. Re:I like MS Exchange by hspatel · · Score: 1


      Have you guys played with Oracle's solution? Seems definitely more scalable than MS's.

      http://www.oracle.com/ip/deploy/ias/email/esfo.html

    8. Re:I like MS Exchange by afidel · · Score: 2, Insightful

      I liked this until the server (well cluster actually) that served our EMEA operation fell over. EMC, Compaq and Microsoft fought over who was at fault and in the end 22 hours later the thing had been rebuilt and restored from tape. This was a solution put together by a Microsoft Premier Support partner that was supposed to have 5 9's availability and fell over in its first couple of months! Instead of 0 lost email we had all emails that hadn't been in the last tape cycle lost along with any emails that timed out waiting for the server to come back up, not only that but no one could read their email for an entire day (2 business days actually).

      --
      There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
    9. Re:I like MS Exchange by Mika_Lindman · · Score: 1

      What, you're saying good things about an MS product? Your IP has just been banned by /. admins.

    10. Re:I like MS Exchange by Slashamatic · · Score: 2
      All these people posting anonymously. Nobody should punish you because you run Microsoft; you must suffer enough already. However, the original poster and the replies really put their finger on it.

      MS Exchange isn't a bad thing. It is quite useful and a lot easier than having zillions of mail files. Unfortunately, being proprietary, it is difficult to repair because you don't have the sources to hack around yourself. Even if a standard MS database like SQL Server were used, there would be more possibility of successfully repairing the thing. With a fully open-source message repository, it would be even better.

    11. Re:I like MS Exchange by Slashamatic · · Score: 3, Insightful
      Backups are important

      Sorry, restores are even more important. I hope you check your backup strategy by trying a recovery every so often. Many a time I have heard people who "thought they had a backup" and then it turns out that the thing that was being backed up was in an inconsistent state.

    12. Re:I like MS Exchange by Telastyn · · Score: 2

      One thing to consider with Exchange though is the gigantic bloat the thing is, and how much hardware is required to run the thing in comparison to *nix mail servers.

      Example: 100 person little tiny company:

      Exchange:

      Win2k server + MS Exchange ($5k? for proper licensing)
      Dell Poweredge 2550 2x1.8ghz 4gb ram ($10k)

      This should support you to (*maybe*) 150 people.

      Unix (assuming i386 hardware)

      *BSD + sendmail/qmail (free)
      Dell Poweredge 2550 2x1.8ghz 4gb ram ($10k)

      This should support you to probably around 250 people.

      Anti-virus software can run on the server side here as well (Norton has a version for unices), and if you accidentally delete a mailbox (why did you do that again?) you can restore from tape much more easily than with Exchange (yes, I've had to do both).

      And god knows you don't have to take the bsd machine down for an hour each week to patch it.

      And you can actually go through and run proper scripting on mail with the unix solution (spam catchers, conditional distribution lists, proper server side OoO replies).

      And the unix solution will give you proper logging of messages and mailboxes.

      And the unix solution will require far less attention by the IT admin(s) that will likely be woefully understaffed to begin with.

      And the unix solution will be faster, cleaner, more reliable, more scalable, more compatible with clients, while still being about a third cheaper.

      Not to mention of course that Exchange machines are a security liability, and should never ever ever be deployed into a hostile environment (ie: the internet, where mail comes from; or any company, where there's (statistically) someone with less-than-good intentions trying to get payroll info and the such.)

    13. Re:I like MS Exchange by cfb · · Score: 1

      If you want all of this on the Unix side in a single product from a vendor, I would check out iPlanet Messaging Server, as well as others.

      Single instance storage - Got it
      API's for backups (or doing anything) - Got it
      Restoring Mailboxes - Got it
      Anti-virus - Got it
      No need to restore a mailbox from tape if it was deleted recently - Got it
      and it's unix.

    14. Re:I like MS Exchange by Sabalon · · Score: 2

      Uh...

      Dell PE2550 2x1.13 w/2GB RAM - $7k
      Supporting about 800 people just fine.

      I will agree on the "hidden" costs though - Ex2k requires active directory, so you need an NT server w/ however many licenses and that'll hurt you, in addition to the exchange license.

      You can run anti-virus right on the exchange server, integrated into exchange, and with exchange you can set it to retain deleted mailboxes for a period of time before they are purged - removing the tape restore altogether. If you back up individual mailboxes, then it's just as easy for tape restore.

      The logging and scripting is true, though I don't agree with the IT admin attention - our exchange and Unix mail machines are both pretty problem free. As for security - uh...how many times has sendmail been the root of problems? ;)

      Now that I'm done with that :
      To quote our exchange person when upgrading to Ex2000 "Why don't we just go ahead and put all the e-mail on Linux?"

    15. Re:I like MS Exchange by Anonymous Coward · · Score: 0
      Dell Poweredge 2550 2x1.8ghz 4gb ram ($10k) [with BSD/qmail] This should support you to probably around 250 people.

      What are you smoking? Our aging mail server is a Sun, Ultra-1 based system, with 256MB of RAM. It runs sendmail, pop, and IMAP, servicing about 120 people. The CPU in that thing can't be over 200MHz (guessing 133).

      Using *BSD and qmail (both faster than Solaris and Sendmail) on that monster of a box should serve 500-1000 users quite easily.

    16. Re:I like MS Exchange by Telastyn · · Score: 1

      How many times has sendmail been the root of problems in the past 3-4 years?

      Perhaps it's just my experience, but a PE2550 with 2gb of ram (current setup) is pegging its memory with a mere 80 users. And it's only being used for calendaring...

      Like I said, perhaps it's just my experience...

    17. Re:I like MS Exchange by Telastyn · · Score: 1

      I was trying to be generous, as most shops that use Exchange will also be sending massive office attachments which chew up ram, and storage.

      perhaps the message would be better s/around/atleast/

    18. Re:I like MS Exchange by Sabalon · · Score: 2

      How many times has exchange been the root?

      Now outlook is a different story!

      We have 2GB ram; right now it has 600M avail and 726K in system cache, and it has peaked at 1.3G of RAM used. That is with about 70 active users out of our total.

    19. Re:I like MS Exchange by Anonymous Coward · · Score: 0

      That sounds spookily familiar. So, er, whose fault was it then?

    20. Re:I like MS Exchange by Zule_Boy · · Score: 1

      Uhh, a company I know of named ZoomTown.com had 30,000+ mailboxes on 2 PII 350s with 512MB ram each, plus some NetworkAppliance NAS devices to hold the mail in MailDir format. You do not need a 1,000,000,000Hz computer to save EMAIL! It is a game of I/O, not cpu power. Save your money and buy a RAID controller or something.

    21. Re:I like MS Exchange by ahde · · Score: 2

      of course, that Dell Poweredge 2550 2x1.8ghz 4gb ram isn't used for the email. That's the admin's toy. He put the mail server on an old P100.

    22. Re:I like MS Exchange by Anonymous Coward · · Score: 0

      I think 2 X 1.8GHz is the wrong approach for an MTA. You would be better off striping 6 disks on a SAN for the /var file system on a single 800MHz CPU, I think. PGP will get the CPU going a bit more, but it's I/O at the office.

    23. Re:I like MS Exchange by Sabalon · · Score: 2

      We buy almost all of our servers the same specs - that way they are pretty much interchangable and can handle extra load if they ever need it.

    24. Re:I like MS Exchange by Anonymous Coward · · Score: 0

      Would have been EMC's fault.

      Of course no EMC client can talk about that without getting their asses sued off. There's a small "if you have a problem you can't talk about it" clause in the contracts that most people are only made aware of when they get pissed off at EMC

  10. MSFT will save us! by Boulder+Geek · · Score: 1

    By replacing the file system with SQL Server!

    --
    A well-crafted lie appears unquestionable - Dama Mahaleo
  11. Stench by Anonymous Coward · · Score: 0

    ...and a pox. Definitely... both a stench and a pox on you and your benign ancestral parade. Now, bring me another Dr. Pepper.

    And I don't want to hear any further talk of this 'male' thing.

  12. XML all gzipped up by global_diffusion · · Score: 1

    Personally, I would store mail in one big XML file, all gzipped up. XML is large and bulky, but it's repetitive and texty so it should compress well. The alternative is a SQL style database, but that seems like overkill; there aren't really that many relationships there. Just use XML and search it for what you want.

    1. Re:XML all gzipped up by Anonymous Coward · · Score: 0

      Wouldn't uncompressing a huge file all the time, and multiple times at the same time kill the mail server?

    2. Re:XML all gzipped up by ObviousGuy · · Score: 2, Funny

      No. This is Unix we're talking about, not Windows. Unix doesn't have any problems.

      --
      I have been pwned because my /. password was too easy to guess.
    3. Re:XML all gzipped up by fejjie · · Score: 1

      Why would you want to waste even more space by putting an easily parsable MIME message into XML? What could you POSSIBLY hope to gain here?

      The whole point of MIME is to make messages easily parsable. XML will do absolutely nothing but bloat the file. Not to mention you can't actually pre-parse the message into XML and get rid of the MIME formatting because then the format would be completely useless in that you wouldn't be able to verify signatures. Not to mention that XML can't handle binary data, so you'd still have to encode.
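To put a number on the "you'd still have to encode" point, here is a small sketch using Python's standard `base64` module. The `base64_sizes` helper and the 3000-byte dummy attachment are made up for illustration; the point is only that the encoding cost applies whether the wrapper is MIME or XML.

```python
import base64


def base64_sizes(raw: bytes) -> tuple[int, int, int]:
    """Return (raw size, base64 size, base64 size with MIME-style line breaks)."""
    plain = base64.b64encode(raw)        # 4 output bytes for every 3 input bytes
    wrapped = base64.encodebytes(raw)    # same, plus a newline after each 76 characters
    return len(raw), len(plain), len(wrapped)


if __name__ == "__main__":
    raw, plain, wrapped = base64_sizes(bytes(3000))  # stand-in for an attachment
    print(f"raw={raw} base64={plain} wrapped={wrapped}")
    print(f"overhead without line breaks: {plain / raw - 1:.0%}")
```

Storing the decoded attachment separately would avoid that one-third overhead, but the original encoded bytes are what a signature covers, so they would have to be reproducible exactly.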

    4. Re:XML all gzipped up by Anonymous Coward · · Score: 0

      Good thing you haven't been modded up yet, at least the moderators still have their wits about them.

      You're not solving the problem. The problem is, maildir/mbox are not sufficient to handle thousands of users with hundreds of messages each, checking their mail every 2 minutes. Compressing the mailbox, and wrapping it in XML is really the absolute dumbest idea I've heard on this topic.

      Hopefully, if you think about what you wrote for the same amount of time it took you to write it, you'll see it too.

  13. I vote for a filesystem-based database by Dr.+Awktagon · · Score: 5, Insightful

    Something like Maildir .. if the FS is slow and can't handle that kind of application, then we need to improve our filesystems!

    Lots of applications need lightweight databases with indexes, locking, and atomic operations. Why not bake this into the filesystem, and it won't have to be just for email, it will have many uses.

    I was thinking about this the other day as I was working on a logging system for a large in-house email filtering system.. similar problem, except instead of storing emails, I'm storing small XML fragments describing the structure of each email and what was done to each. So far the easiest solution was large monolithic XML files, and an external index pointing in the large file (i.e., like mbox + a DB index). As it grows we'll probably have to move it to a "real" database.
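A minimal sketch of that "big file plus external index" layout, for the curious. This is not the actual in-house system: the function names, the in-memory dict standing in for the DB index, and the XML fragments are all assumptions made up for illustration.

```python
import os
import tempfile


def append_record(log_path, index, key, payload):
    """Append payload to the monolithic file and record its (offset, length)."""
    with open(log_path, "ab") as log:
        log.seek(0, os.SEEK_END)     # make the append position explicit
        offset = log.tell()
        log.write(payload)
    index[key] = (offset, len(payload))


def read_record(log_path, index, key):
    """Fetch one record by seeking straight to it, without scanning the file."""
    offset, length = index[key]
    with open(log_path, "rb") as log:
        log.seek(offset)
        return log.read(length)


def demo():
    index = {}  # stand-in for a real on-disk index such as a DB file
    with tempfile.TemporaryDirectory() as tmp:
        log_path = os.path.join(tmp, "filter-log.xml")
        append_record(log_path, index, "msg-1",
                      b"<mail id='1'><action>passed</action></mail>\n")
        append_record(log_path, index, "msg-2",
                      b"<mail id='2'><action>quarantined</action></mail>\n")
        return read_record(log_path, index, "msg-2")


if __name__ == "__main__":
    print(demo())
```

The weak spots are the same as with mbox: the index and the big file can drift apart after a crash, and deleting a record in the middle means rewriting or leaving a hole.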

    There is a need for something like sleepycat DB + ReiserFS on steroids..

    1. Re:I vote for a filesystem-based database by Antos700 · · Score: 1

      Be careful what you wish for. According to The Register, Microsoft is thinking the same thing. In theory it's a great idea, but in reality... I get the feeling that SQL is suddenly getting some new non-standard extensions coming on.

    2. Re:I vote for a filesystem-based database by Anonymous Coward · · Score: 0

      You are basically describing the mainframe file systems like RDB built right into VMS and DB2 structures on S390. It works very well, but it conflicts with the basic structure of UNIX. In UNIX everything is a file, and a file is a stream of ASCII or binary. Thus you have all kinds of tools built right into the OS, and most programs consist of simple scripts which link these tools together with command line arguments.

      Once you move over to a database structure the actual internals of the file are known only to the OS. Applications interface files indirectly through functions provided by the DBMS.

    3. Re:I vote for a filesystem-based database by ObviousGuy · · Score: 1

      Standards! We need to stick to Standards!

      But we offer so much more functionality that the Standard doesn't provide for or does but only with serious work.

      Functionality must come second to Standards!

      --
      I have been pwned because my /. password was too easy to guess.
    4. Re:I vote for a filesystem-based database by Anonymous Coward · · Score: 0

      Recent versions of ReiserFS already do this.

    5. Re:I vote for a filesystem-based database by GoRK · · Score: 3, Interesting

      This, as several other threads note, is the approach that Hans Reiser is taking with his filesystem. That is, if the filesystem is not good enough for storing our (large-grained) data, that we are resorting to what basically amounts to indexed archive files or databases full of BLOB objects to store our data, then our filesystems are broken. A directory with 1,000,000 files in it shouldn't take any longer to return a sorted directory listing than one with say, 10 files - because it all should be indexed behind the scenes. Same for the problems of inode starvation, fsck, etc. A program such as mail clients and servers (for most people anyway) -- or any other apps that need simple storage should use the filesystem as the storage mechanism.

    6. Re:I vote for a filesystem-based database by Slashamatic · · Score: 2
      RDB is not built into any version of VMS. It is layered on top of VMS (these days, it isn't even from the same people). RDB layers its storage containers on top of a standard VMS file system. However the standard VMS filesystem is rather a lot more powerful than many with integral ISAM support and the possibility of recovery-unit journalling (you pay extra to turn it on, but it comes as part of RMS).

      There were systems with DB filesystems, but that was stuff like MUMPS or Pick.

    7. Re:I vote for a filesystem-based database by erc · · Score: 1

      "The thing I like most about standards is that there are so many to choose from."

      --
      -- Ed Carp, N7EKG erc@pobox.com PGP KeyID: 0x0BD32C9B What I'm up to: http://intuitives.mine.nu
    8. Re:I vote for a filesystem-based database by ahde · · Score: 2

      The trick to this is to make it compatible with other file systems. And that means deciding what metadata gets tossed when you convert to raw text. A file system should hold data. The more data you include in the filesystem itself, the more data you will lose in conversion. And the harder it will be to maintain and restore data. Because no file system is permanent. Think of MICRO~1.

  14. Want to save space? by ObviousGuy · · Score: 0, Insightful

    Automatically deliver mail to recipients who can then save the mail on their own machines. It's like distributed processing except it's distributed storage.

    If someone isn't logged on to receive their mail (like those saps who turn their machines off every night), then forward the mail to /dev/null

    --
    I have been pwned because my /. password was too easy to guess.
    1. Re:Want to save space? by Jarvo · · Score: 1

      I hope that your comment was intended to be moderated as "funny".

      Firstly, I am not one of "those saps" that logs off each night, but what if my PC goes down (power/dialup/network failure) ?

      In your scenario, many people will shy away from email because delivery isn't guaranteed. Admittedly, email delivery isn't guaranteed if a domain's mail server goes down, but at least the system has provisions for retrying when the server is back up.

      If you want a system to be accepted in the real world, you have to cater for many tastes - even ones that are abhorrent to our palate.

    2. Re:Want to save space? by ObviousGuy · · Score: 1

      I hope that your comment was intended to be moderated as "funny".

      Not at all. While the 'saps' comment was a deferent stab at humor, the rest of the post is intended as insightful, if not informative and interesting commentary.

      The fact of the matter is that mail delivery is not a guaranteed product in any way. That engineers attempt to make it so with regards to email is laughable.

      The overhead involved in keeping mail on servers, even temporarily, is so great that an entire industry has popped up to solve the problem. Here in this discussion we are trying to nail down a plausible way to serve mail without loss. I'm here to tell you that with the onslaught of spam and large file attachments mail-storing servers will become a thing of the past, a relic of a time long forgotten and happily buried.

      We have the means of transmitting files, documents, and text without the need for any email server. It is called Instant Messaging. The only drawback is that IM hasn't really been focused on group-based message sending. Once such a system is in place, email will become a thing of the past.

      In fact, it is debatable whether email is necessary at all except in the case of multiple recipients. Normal chatter: IM. Collaborative discussion: Virtual meeting software. Files: IM. There simply isn't any space that email picks up any slack.

      If the original question poser wants to know how to reduce email storage overhead, why not go whole hog and simply delete any email that can't be delivered right away.

      --
      I have been pwned because my /. password was too easy to guess.
    3. Re:Want to save space? by Anonymous Coward · · Score: 0

      You're still posting at 1? And getting modded to insightful?

      You're not even that good of a troll, although you're amusing enough. Whatever it is, I wish I had some of that mod-crack about now, 'cos I'm almost bored enough to start trolling myself.

      Waitaminute, no I'm not. You had me going for a minute there, though.

    4. Re:Want to save space? by ObviousGuy · · Score: 2

      Son, I'm posting at 2.

      --
      I have been pwned because my /. password was too easy to guess.
    5. Re:Want to save space? by GravySkin · · Score: 1

      What about users who log into different machines throughout the day and would like to get their email? Or what if the reliable, reliable client computer goes tits up?

      KEEP THE SHIT ON THE SERVER!!!!!!!

      --
      "never met a Microsoft zealot"
    6. Re:Want to save space? by Anonymous Coward · · Score: 0

      That's funny! Why didn't I think of that?

  15. The Reiser guys have some ideas. by SwellJoe · · Score: 5, Informative

    I've followed ReiserFS development for years now, shipping our first servers with it some two years ago (and every box we've shipped since then), and I believe they have the best long-term plan for this kind of thing. Hans has written some excellent white-papers on making small files extremely cheap.

    The eventual goal of Reiser is a filesystem that is indistinguishable from a powerful database (if a special purpose database). The plan is to make small files so cheap that every extension of a file, directory, etc. is just another file. Another interesting turn is that files would no longer be, necessarily, of the form '/big/long/path/to/some/file'...because the filesystem is a database, one could also access it by a category, so that one file read pulls in all of the data of that category (from any number of files). Directories become just one view of the data available, with any number of other views possible depending on the application.

    As was mentioned in the parent, this would lead to things like 250 email recipients and only one actual file. But of course, this leaves out the copy-on-write functionality needed to make this seamless.
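On today's filesystems the nearest approximation to "many recipients, one actual file" is hard links. The sketch below is only that approximation, not anything ReiserFS provides: `deliver_once`, the `msg-0001` name, and the directory layout are hypothetical, and it assumes all mailboxes live on one filesystem that supports hard links.

```python
import os
import tempfile


def deliver_once(message, spool_dir, mailboxes, name="msg-0001"):
    """Write the message once, then hard-link it into every mailbox directory."""
    os.makedirs(spool_dir, exist_ok=True)
    master = os.path.join(spool_dir, name)
    with open(master, "wb") as fh:
        fh.write(message)
    paths = []
    for box in mailboxes:
        os.makedirs(box, exist_ok=True)
        target = os.path.join(box, name)
        os.link(master, target)      # new directory entry, same inode, no copy
        paths.append(target)
    os.unlink(master)                # the data lives on until the last link is removed
    return paths


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        boxes = [os.path.join(tmp, user) for user in ("alice", "bob", "carol")]
        paths = deliver_once(b"Subject: hi\n\nhello\n",
                             os.path.join(tmp, "spool"), boxes)
        print(os.stat(paths[0]).st_nlink)  # number of names sharing the one inode
```

This shows exactly the gap mentioned above: if one recipient's client modifies its copy in place, every recipient sees the change, because nothing here does copy-on-write.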

    So I think the solution is probably to fix the filesystem--not to fix the email storage mechanism. A number of very smart people have 'fixed' email storage in the past, leading to all of the options we have today, none of which works extremely well on really large mailboxes. Yes, many are good enough, and many work fabulously for small to mid-sized applications. But the day will come when they do not work so well, due to the higher volume and growing average size of emails.

    A good place to start for information about these ideas (which are primarily a consolidation of the most interesting research in the field of filesystems and databases):

    http://www.namesys.com/whitepaper.html

    ReiserFS is good stuff. Give Hans' papers a read sometime.

    BTW-Don't gripe at me about ReiserFS instability, etc. I know better. As I mentioned I've been shipping servers with it for 2 years, and we've never had a single ReiserFS-caused corruption. Not one.

    1. Re:The Reiser guys have some ideas. by Ignominious+Cow+Herd · · Score: 0

      Sure, but don't you then want to get all that lovely Metadata out of the email headers, and break-out the attachments, and 'hook' all that up again in the "database"? It still means you need a new format for storing the mail. Whether it is a traditional database, or a database-cum-filesystem is the next step (or a parallel one).

      --
      Lump lingered last in line for brains, and the ones she got were sorta rotten and insane.
    2. Re:The Reiser guys have some ideas. by SectoidRandom · · Score: 1

      Reiser are not the only ones working on this; the guys over at M$ are hard at it. Basically it will be a part of .NET in the future: once MSSQL is embedded into every copy of Windows, NTFS.NET (whatever) will be just like you say on every windows boxen out there...

      I think it's a great idea, especially that the Reiser people are doing it too, it really sucks watching Linux play catch-up..

    3. Re:The Reiser guys have some ideas. by krogoth · · Score: 1

      Let me guess - the servers never shut down uncleanly? The only problems I've had were on unclean shutdown (one messed up file might be understandable. KDE nearly unusable because of corrupted settings (files|directories) is another thing), and if that wasn't a problem I wouldn't be likely to use a journaling filesystem.

      I have also heard from someone who does Linux consulting who won't use ReiserFS. Overall, I don't call it stable.

      --

      They that quote Benjamin Franklin on liberty and safety deserve neither.
    4. Re:The Reiser guys have some ideas. by Anonymous Coward · · Score: 0

      Don't go bandying about that 'database' term when it comes to filesystems or the Postgres guys will pop a cap in yo ass.

    5. Re:The Reiser guys have some ideas. by SwellJoe · · Score: 5, Insightful
      I have also heard from someone who does Linux consulting who won't use ReiserFS. Overall, I don't call it stable.


      Heheh...I read a funny quote here on slashdot earlier today that I think applies:

      The plural of anecdote is not data.


      I've heard from a lot of people who consider themselves experts that ReiserFS is not stable, never has been, never will be, all that fun stuff. But I know better, because I have data. Hard numbers...I know I can run a Squid box harder and at higher loads for longer on ReiserFS than ext2 or ext3. I know that I can run a Squid machine for 2 years with ReiserFS cache partitions with uptimes over a year, with the reboot after all that time being for a kernel upgrade.

      Yes, there have been data corruption issues for some people for ReiserFS. But I'm on the ext3 and jfs mailing lists as well...I know they have data corruptions of their own. It's a fact of life when dealing with computers, things go wrong for everyone at some point. I simply don't believe the masses when they tell me ReiserFS is not suitable for production use, because I have more machines to administer than the vast majority of slashdotters, and I believe I can trust ReiserFS. I trust my opinion above most.

    6. Re:The Reiser guys have some ideas. by leei · · Score: 1

      This is a great idea, but I think it needs to go further. I'm involved with a research project that is building a distributed filesystem and object database called NODAL. It is described in a white paper at http://nodal.sf.net.

      It is focused on the general problem of knowledge and structured information sharing between collaborators, but will definitely be useful for personal information organization as well.

    7. Re:The Reiser guys have some ideas. by Anonymous Coward · · Score: 0

      Thanks for your anecdote, ahem, opinion.

    8. Re:The Reiser guys have some ideas. by SwellJoe · · Score: 1

      You're right, it is an anecdote. And I don't expect anyone to take me at my word and take up ReiserFS today. I was merely saying that I have my own data, I don't need anecdotes to tell me what filesystem to use.

      Gather your data, choose your filesystem. Don't listen to me, except perhaps to get the advice: Gather your data for yourself. Opinions are often meaningless.

    9. Re:The Reiser guys have some ideas. by Ian+Bicking · · Score: 2
      I've been studying WebDAV, and have been excited about how it presents a network storage that seems much more general than a typical filesystem-based metaphor -- kind of making the dynamicism of web applications available at a lower level to the OS.

      What is intriguing here, is that the level of granularity that you talk about with ReiserFS would map well with WebDAV. Running mod_dav for Apache is fine, but unexciting -- the underlying filesystem storage that Apache is so closely tied to is awkward and lacks good granularity and flexibility. But with a more powerful filesystem, it could go much further.

      In a lot of monolithic client-server architectures, I see systems created where the OS is insignificant -- just a dumb layer of hardware compatibility. You dump all the data in one file, with internal structure you define yourself. You almost always do your own permission structure -- traditional Unix permissions are worthless in most new domains (IMHO). All you need is a socket interface and a disk interface, everything else you write yourself.

      This is a shame, really, because you are reimplementing things the OS should be doing... but OS design is stagnant. Maybe that's fine, but I don't even see much ambition among Linux kernel programmers (or BSD or other Unices)... they're working off an old model that is fine at what it does, but not helpful for new systems. They don't seem to mind that they are being made more and more insignificant... maybe that's good, they aren't holding onto power or being territorial, but it really is true that there isn't much innovation there. (There is innovation in non-kernel applications, mind you, just not much in the kernel or most basic libraries like libc)

      It's nice to hear ReiserFS people are thinking about real progress, not just little tweaks.

    10. Re:The Reiser guys have some ideas. by Gekke+Eekhoorn · · Score: 1
      Just to make you happy, the Hurd people have been doing what you describe for quite some time now.

      In the Hurd kernel, you basically have a microkernel and lots and lots of server programs, which implement (in this case) a Unix system. Everything is very tweakable, and you can run servers as an ordinary user, so that to programs you run against them, they are part of the kernel. And this happens independent from the other users.

      Also, you have these servers that you can attach to inodes, so that you can have virtual filesystems in the kernel, at user level. So all the things that KDE and so on do, like ftp:// and gzip://, can be just a server attached to an inode.

      Well, that is the theory at least :). I never tried it, although I'm sure I will in the near future. Check it out at the Hurd homepage

    11. Re:The Reiser guys have some ideas. by Anonymous Coward · · Score: 0

      I know of several (half a dozen) ReiserFS users who have had their filesystems corrupted after unclean shutdowns.

      If your servers are well kept and don't experience unclean shutdowns then that explains why you haven't had any problems. The guy who complained about a problem was using KDE and this may well have been a desktop system that is prone to unclean shutdowns.

      The irony is that in respect to this issue desktop systems place greater demands on the filesystem than servers.

    12. Re:The Reiser guys have some ideas. by dinotrac · · Score: 2

      The problem seems to be that everybody's mileage may vary, and the thing you just had a problem with sucks worse than an overclocked Hoover until the thing you replaced it with has a problem.

      Personally, I've used Reiser for the last couple of years thanks to cats, small children and a less than 100% reliable power supply. I had suffered corruption with ext2 that made my life Hell. That doesn't happen any more.

      Whether it's good for anybody else in this world, Reiser is great for me.

    13. Re:The Reiser guys have some ideas. by Uruk · · Score: 2

      The eventual goal of Reiser is a filesystem that is indistinguishable from a powerful database (if a special purpose database)

      Why? Why do we need the all-singing, all-dancing filesystem when we've already got database packages that are mature and effective?

      A filesystem should be a filesystem. You don't see mail applications trying to add features to remotely configure the server they're sending mail to - that's because they stick to what they need to do, mail.

      UNIX - do one thing, and do it well. Leave database functionality to the packages that already do it well and have been for more than 10 years.

      --
      -- Truth goes out the door when rumor comes innuendo. -- Groucho Marx
    14. Re:The Reiser guys have some ideas. by pacman+on+prozac · · Score: 1

      I run several reiser systems, both desktops and the nameservers at work. All of them have been shut down uncleanly without any problems at all. I never bother shutting my desktop down properly any more, as it's never had any problems at all. Why wait for it when you can just power straight off? (yes I do backup my work:) )

      With ext2 I've had all sortsa problems, the worst being when fsck "fixed" my disk by putting the old ppp chatscript with the per-minute charged ISP phone numbers back on. That's my anecdote and it cost me a fortune. Long live reiser!

    15. Re:The Reiser guys have some ideas. by Anonymous Coward · · Score: 0

      I got burnt by ReiserFS + NFS issues some time ago, but the ReiserFS team are doing an awesome job getting these issues resolved, and stability really no longer is an issue.

      Now that data journalling is on its way, ReiserFS 3 is going to be faster and more reliable (data journalling is why I am with ext3fs now).

      I'm confident that Namesys are preparing to send us some good stuff named ReiserFS 4 pretty soon. Looking forward to it.

    16. Re:The Reiser guys have some ideas. by erc · · Score: 1

      Yup, it sounds good in theory and all that, but the fact is, we've been waiting on Hurd for 10+ YEARS. Hurd was being worked on before Linux was even an idea that Linus Torvalds had while quaffing beer and eating pizza in college. In all the time that RMS and friends have been working on what, to them, was the "perfect OS", Linux and the BSD variants have come and taken over.

      So much for Hurd.

      --
      -- Ed Carp, N7EKG erc@pobox.com PGP KeyID: 0x0BD32C9B What I'm up to: http://intuitives.mine.nu
    17. Re:The Reiser guys have some ideas. by kentborg · · Score: 1

      In other words, if it ain't broke don't fix it?

      Certainly if you have an immediate problem that Unix and Postgresql address nicely, don't go getting burned with something too innovative. And don't necessarily computerize your card file either. (I miss library card files.)

      Conversely, just because "more than 10 years" solves one person's problems doesn't mean there is no room for innovation.

      Once upon a time (probably well over 10 years ago) I was involved in (or at least watching) a Usenet discussion over file name length, you know the one: "My OS is better than yours, nyah, nyah!" One person, in exasperation, asked: "What do you want to do, put the whole file in the name?!"

      At that point the light bulb went on for me and I said "Yes, maybe sometimes."

      File systems are for persistent storage of data. When that data is in active use it is seldom in an unadorned linear order, yet when we store anything we are expected to flatten it to a linear stream and give it a single short name. Why? Tradition.

      I suggest it would be good for OSs to offer higher level data storage services.

      XML has come up in this thread. In some sense XML exists because we seem to be forever stuck with flat files but our data still insists on having richer structure. XML is a way to force multidimensional rich data into a flat space.

      Why does our data misbehave so? Why can't it just be nice and flat like our files are? Damn data.

      -kb, the Kent who liked the Macintosh's original resource manager.

    18. Re:The Reiser guys have some ideas. by phee · · Score: 2

      Ok. Real-life, hard data?

      I ran Reiser over a year on an 80-gig partition. It started out just fine; speedy, recovered instantly upon reboots, etc. But as time went on, it got slower... and slower... and slooooower... until I got so fed up with it I got another 80-gig drive just so I could get rid of Reiser. It's all on ext3 now and literally five to ten times faster. It took a good 10 seconds just to start xv on Reiser (my machine is a 1.2-GHz, ATA100, 320M of RAM, linux 2.4.18); on ext3, it takes half a second. Netscape went from a minute to load to a mere five seconds. Adding indices to a Postgres database to speed up searches made it SLOWER because of the extra disk access. I was lucky to get 2 meg/second data transfer rates on file copies with Reiser; with ext3, I get 15 meg/second. Sustained. And bear in mind; these comparisons are all done on the same hardware, same kernel version. And no, since I can sense you all about to ask this, I didn't have the Reiser Debugging enabled in the kernel.

      Stay away from Reiser unless you only make 50-meg partitions. Trust me. Sure, it's "stable" and doesn't corrupt data, but ext3 combines the best of ext2 and Reiser... and doesn't bring your machine to its knees.

      --

    19. Re:The Reiser guys have some ideas. by Anonymous Coward · · Score: 0

      Kinda like how I can do any of the following to get to stuff on my Exchange 2K box?

      \\server\mailshare\username\inbox\message
      http: \\server\exchange\username\inbox\message

      Kinda cool to see the contents of my exchange database available as either a file share or a web page natively

    20. Re:The Reiser guys have some ideas. by ahde · · Score: 2

      the reason files are flat is because that is how they are stored. There is this little magnetic disk that is just a long circular line of ones and zeros. In order to read or write anything from that disk, it has to be done in sequential order (it could be parallelized, but so far that isn't practical).

      So, your rich data has to be flattened out. Whether you like it or not. And at some point, whether it's the filesystem, the hardware drivers, or a chip with hardwired instructions and a cache (essentially re-implementing a large part of the os), eventually, someone is going to have to deal with that line of ones and zeros.

      And that's what a filesystem does. You can obscure it, you can use links and indexes and graphical icons and properties files and block level meta-data, and hide it all from the user, but you can't take away the functionality by covering it with more layers of abstraction.

      Abstraction is good. But not at the cost of functionality. I'd argue filesystems are too complex already. Look at all the tools needed already for dealing with inodes and blocks and timestamps and so forth. That's why some databases want to chuck it all out and start from scratch and work with the raw data.

      Of course, some things, like binary tree search and journalling, suffer when abstracted too far, so they're built into the filesystem. But stuffing the filesystem full of metadata isn't a compromise at all. The only thing that saves is a few file descriptors. And Reiser's answer is that if files are too expensive, make them cheaper. Why not store that metadata in a separate file (or several) so that you know exactly where it is, without having to parse every file header for every access?

    21. Re:The Reiser guys have some ideas. by Zygo · · Score: 1

      I've run ReiserFS against ext3 head-to-head (redundant servers, mirrored data, load-balanced services, hardware as identical as it gets) since 2.4.10 or so. For 8 years prior to that, I've run ext2 on hundreds of Linux boxes. Crashes, corruption, software bugs, memory defects, bad sectors, overheating RAM, viruses...I or my clients have seen it all.

      When it works(*), reiserfs and ext3 are nearly identical in terms of performance and reliability, except for some corner cases with data appended to files close to an unclean shutdown (ext3 discards uncommitted data, while reiserfs (and ext2 for that matter) puts garbage at the end of the file). Of course, if you have a directory structure that obviously favours ReiserFS (flat namespace or small files), then reiserfs may be faster; however, for typical Windoze/Linux file service, the advantage or disadvantage of ReiserFS is negligible.

      Now for the (*): Murphy works at your company. Things will always go wrong, be they due to human failure, hardware failure, or software failure. When ext3 goes wrong, you can apply e2fsck (a very fine although still somewhat imperfect automated recovery tool), debugfs (a very fine hand-reconstruct-your-filesystem-tool), some third-party data recovery tools (open-source and proprietary), and the filesystem structure is sufficiently straightforward that you can easily recover most lost data "by hand" if you can't get it back from any of those.

      Of course backups are part of any good data recovery strategy; however, it often comes down to a choice between spending 20 days restoring data from tape, 36 hours waiting for reiserfsck, 54 hours waiting for rebuild-from-scratch from a mirror server, or six hours recovering 99% of it with ext2 recovery tools. The ability to crawl through the filesystem looking for data is as important as any other data recovery strategy.

      ReiserFS has data recovery tools which--in their own documentation--say "these tools are only of beta quality." reiserfsck simply cannot recover from several common kinds of filesystem corruption. The data structure of a reiserfs tree is a balanced tree with very little predefined structure--unlike ext3, in reiserfs you can't just search all inodes to look for files that have disappeared from any directory. If a reiserfs interior tree node is lost, a few percent of your filesystem at random will become inaccessible--contrast with a damaged ext3 block, which at worst loses a few dozen files, or loses filenames but without affecting the contents of the files. A full-blown tree-reconstructing reiserfsck involves reading the entire filesystem volume (or at least the nominally occupied parts of it) more than once, and there is no guarantee it will be successful.

      Another important distinction between reiserfsck and e2fsck is that e2fsck will leave you with a usable filesystem, even if the data within is corrupted--that is, you'll be able to store new data on the filesystem safely. reiserfsck will not leave you with a usable filesystem in common cases--you'll either have to live with inaccessible storage, or rebuild the filesystem from scratch.

      My current recommendation to clients is to use reiserfs only for workloads where recoverability of data is not a requirement--e.g. filesystems for squid caches, or mirrored servers, where it is acceptable to respond to filesystem corruption by starting over from mkreiserfs. In other cases, use ext3 where e2fsck after a reboot is unacceptable; otherwise use ext2. Other filesystems are too new or too slow to care about.

      Frankly, the only currently available, mature, robust filesystem with proper, well-supported tools on Linux is ext2--fsck's and all. Everything else is just a tool for discovering bugs the hard way.

      --
      -- I avoid spam by accepting only OpenPGP encrypted or signed email at this address. Clear-signed, RFC2015, heck, even
    22. Re:The Reiser guys have some ideas. by Anonymous Coward · · Score: 0

      > I run several reiser systems both desktops and the
      > nameservers at work. All of them have been shutdown
      > uncleanly without any problems at all.

      Then you've been lucky. Read the Reiser FAQ if you don't believe me; it states it's not safe to shut down uncleanly.

  16. Parroting the masses... by Webratta · · Score: 1

    Slashdot never disappoints. You wait a minute, and 10 people have already said what you want to say.

    I, like others, suggest an RDBMS to implement a secure and quick mail system. This way you get the benefits of administrative security, file locking already in place, performance, redundancy, and potential for easy management. This, of course, all hinges on how well your RDBMS handles those specific details. That might also lead to some cool server-side email apps, as well. A blinding-fast email search utility on a MySQL mail system, or a nice way to relate your user information to email statistics (shudder).

    --
    Beef! Beef! Beef!
  17. full text indexing by Anonymous Coward · · Score: 0

    I just want full text indexing. Don't care about nuthin' else.

  18. Portability by Pretzalzz · · Score: 2, Insightful

    The great advantage of the current system is that it is very easy to move your e-mail from one program or computer to another with little hassle or risk. With any type of database system, you introduce a level of complexity that virtually assures that only one e-mail program will be able to read your e-mail. I think the best solution, as far as I am concerned, is to stick with the current mbox format but allow attachments to be deleted independently, though that is just personal preference. But I think we should be wary of adding any complexity that endangers the portability of mail. The other thing to be said for the mbox format is that, if worst comes to worst, you can still access your e-mail with a text editor and/or grep.

    1. Re:Portability by Anonymous Coward · · Score: 1, Insightful

      If you use IMAP or some other protocol that allows you to push mail into folders, then portability isn't an issue. You just have to ensure the old server is up long enough for you to copy the messages from the old format to the new format via your mail client. Then, having a portable format isn't an issue. This is how I converted my mailboxes from Cyrus to MBX using UW IMAP as a server and Outlook Express as a client. Moral: you don't need a portable format, just a common mail protocol. So, exploit the format to give you as many features and as much robustness as possible.

    2. Re:Portability by Antos700 · · Score: 1

      Maybe a hybrid system then? Have a program that uses the normal interface for e-mail, then have that program archive old e-mails to the RDBMS for easy historical queries. That way, you have the emails you actually want to use now in a quick and accessible format, and the older ones 'filed' in an orderly manner.

    3. Re:Portability by Jubal+Kessler · · Score: 1

      The storage method has nothing to do with the mail-user agent reading the mail. If the MUA speaks IMAP, and the server speaks IMAP, then the server just relates the IMAP command to the filesystem/database and presents the mail to the MUA.

      Conceptually, only the email protocol matters between the client and server.

    4. Re:Portability by stickb0y · · Score: 1

      The ability to delete attachments independently is something I've wanted for quite a while.

      A lot of times I get (or send) email with attachments for which I want to keep the message but don't need the attachment. (This is especially true for sending: I already have the file in my file system; why do I want to make another copy in my mailbox, which usually isn't designed to handle files efficiently?)

    5. Re:Portability by pipacs · · Score: 1
      > Conceptually, only the email protocol matters between the client and server.
      Please don't forget the original question was about storing mail within the MUA. In this context, using the filesystem and a standard format like mbox, together with some kind of indexing is a very reasonable solution.
  19. Linux Mail by Anonymous Coward · · Score: 0

    All I have to say is that Linux rules! The author's bias against Windows is not without merit.

    - g>>(o)atse

  20. Something to keep in mind... by cwinters · · Score: 5, Insightful

    /. punching bag jwz has some strong opinions about using databases (etc.) for mail storage. I tend to agree: everything can read from and write to files, there are no versioning issues, they can be easily transported among different operating and file systems, and they can be backed up easily. But it's another wheel to reinvent, so everyone hop to it at once and then lose interest in two or three weeks!

    --

    Chris
    M-x auto-bs-mode

  21. Quantum-like Storage by cybermage · · Score: 3, Funny

    I've been joking for years about getting two shell accounts on opposite sides of the planet and setting each up with procmail to bounce all my mail between the two (always rewriting the header so as to avoid a loop.) I figure at any given time, my mail would be in both places and neither simultaneously.

    If I want to read some, I'd just chmod .procmailrc for a few seconds and change it back. Plenty of mail storage without chewing-up precious file-system quota.

    1. Re:Quantum-like Storage by Anonymous Coward · · Score: 0

      Sounds like a funny joke. You don't date much do you?

    2. Re:Quantum-like Storage by Anonymous Coward · · Score: 0

      what, have you never heard of bang paths?

    3. Re:Quantum-like Storage by tyoud1 · · Score: 1

      "The network _is_ the storage" (tm)

    4. Re:Quantum-like Storage by akh · · Score: 2, Interesting

      A similar idea has already been implemented. Some
      Canadian researchers used an existing 8000km fiber
      optic network as a storage device. Basically, the network
      is configured as a loop and the
      data to be stored is simply sent onto the network.
      Packets of data are placed onto the network and can be
      pulled from it as they pass a node on the network.
      It's kind of like a cross between a token ring network
      and a mercury delay line. You can find a few more
      details from this link.

      --
      Accept Eris as your Fnord and personally sate her
    5. Re:Quantum-like Storage by jrothlis · · Score: 1, Informative

      Byte magazine had an article on this, at least 15 years ago (possibly more). It involved bouncing a laser off one of (apparently) many golf ball satellites orbiting the earth (tiny spheres covered in mirrors, designed for measuring continental drift) and encoding data in the laser. The roundtrip of the light beam allowed the "storage" of quite a few megabytes IIRC. The fact that I still remember this vividly so many years later is a testament to how cool the idea was back then. *Sigh* Those were the days.

    6. Re:Quantum-like Storage by red_dragon · · Score: 2

      The BOFH had already thought about it about four years earlier. It landed him an award, even.

      --
      In Soviet Russia, Jesus asks: "What Would You Do?"
  22. NS3 mail summary files by Anonymous Coward · · Score: 0

    Probably no one will ever see this since I'm posting AC but anyway...

    jwz has quite a case for mail summary files that were used in pre-NS4 mailreaders. See http://www.jwz.org/doc/mailsum.html

    The basic idea is to use the old (relatively space efficient, compatible with everything) mbox format but also keep a "summary file" to allow quick threading/seeking/etc. within the file. Actually quite workable. Worth a read if you're going off and designing (what you think will be) a grand new mail storage scheme. Don't repeat the same mistakes Netscape made with NS4!
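
A minimal sketch of the summary-file idea described above, assuming a plain mbox file in which every message starts with a "From " line preceded by a blank line (or the start of the file). The JSON layout and the names `message_offsets`, `read_headers` and `build_summary` are made up for illustration; this is not the summary format Netscape actually used.

```python
import email
import json


def message_offsets(mbox_path):
    """Yield the byte offset of every "From " separator line in an mbox file."""
    offset = 0
    prev_blank = True  # the start of the file counts as a message boundary
    with open(mbox_path, "rb") as f:
        for line in f:
            if prev_blank and line.startswith(b"From "):
                yield offset
            prev_blank = line in (b"\n", b"\r\n")
            offset += len(line)


def read_headers(f, offset):
    """Parse only the header block of the message that starts at `offset`."""
    f.seek(offset)
    f.readline()  # skip the "From " separator line itself
    header_lines = []
    for line in iter(f.readline, b""):
        if line in (b"\n", b"\r\n"):  # blank line ends the headers
            break
        header_lines.append(line)
    return email.message_from_bytes(b"".join(header_lines))


def build_summary(mbox_path, summary_path):
    """Write a JSON summary (hypothetical layout) next to the mbox and return it."""
    offsets = list(message_offsets(mbox_path))
    summary = []
    with open(mbox_path, "rb") as f:
        for offset in offsets:
            headers = read_headers(f, offset)
            summary.append({
                "offset": offset,
                "subject": str(headers.get("Subject", "")),
                "from": str(headers.get("From", "")),
            })
    with open(summary_path, "w", encoding="utf-8") as out:
        json.dump(summary, out)
    return summary
```

With something like this, a reader could show the subject list from the small summary file and seek straight to an entry's `offset` when a message is opened. The summary would have to be rebuilt whenever the mbox changes underneath it, for example by comparing file size and modification time; that check is left out here.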

  23. Eudora by cpaluc · · Score: 1, Interesting
    I'm quite happy with Eudora's mail storage technique. The messages are stored in a format much like mbox except that the attachments are stripped out and dumped in a user-specified directory. This leaves text-only mailboxes that are reasonably small in size. They can be searched easily/quickly and they can be compressed even smaller for storage/backup. I really don't see the point of retaining attachments within the mbox file - apart from the inefficiency, they're not accessible from the shell/OS (eg. you can't grep your attachments unless you manually export them).

    This is one feature I miss in Linux mail clients. At one stage I wrote a Perl filter to achieve this functionality with KMail.

    1. Re:Eudora by kentborg · · Score: 1

      Don't remind me how angry I am with Eudora. I have lost mail in Eudora because it couldn't keep its bits straight.

      Maybe Eudora is trustworthy now, maybe the search can even be made to work. (What a confusing user interface it had.) But that doesn't mean I am not pissed.

      -kb

    2. Re:Eudora by Anonymous Coward · · Score: 0
      I use Eudora more than anything else.

      Being a hacker, I like to get access to my mail in my own scripts/programs. My biggest gripe about Eudora is that it will strip out the text/plain part of a message if it also contains text/html, leaving only the HTML. Other gripes: the MIME type is never changed, so after it strips out the text/plain part or an attachment, the MIME type remains multipart/alternative or multipart/mixed, which then violates the MIME standard. Finally, message attributes are stored in the .toc file. If the .toc file ever gets corrupted, the program will rebuild it as best it can, but attributes such as 'read', 'answered', 'forwarded', etc. are lost.

  24. Eudora mbox by 1u3hr · · Score: 4, Informative
    Eudora (Win and Mac) handles encoded attachments by decoding them and storing them in an attachments folder, replacing the encoded text in the message with a line like:

    Attachment Converted: "C:\EUDORA\ATTACH\NEW YORK.pps"

    Click on that in Eudora and the attachment opens.


    This keeps the actual text in the mbox file lean. I've got almost a decade of correspondence that totals about 20 MB, if it included all the attachments it'd be much more.

    Also it allows you to edit messages after receipt, (this might trouble some people, but it just simplifies what I used to do by opening the mbx file in a text editor). I can select all the text, then paste it back in. This has the effect of removing all the HTML coding that is especially crufty from Word generated mail -- a 20k message reduces to 1k.
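
The detach-and-replace behaviour described above can be sketched with Python's standard `email` package. This is only an illustration under stated assumptions: `detach_attachments` is a made-up name, the placeholder text merely imitates the "Attachment Converted" line quoted above, only text/plain parts are kept, and filename collisions are not handled. It says nothing about how Eudora itself is implemented.

```python
import os
from email import message_from_bytes, policy
from email.message import EmailMessage


def detach_attachments(raw_message, attach_dir):
    """Save attachments to attach_dir and return a slim text-only message."""
    msg = message_from_bytes(raw_message, policy=policy.default)
    os.makedirs(attach_dir, exist_ok=True)
    text_parts = []
    notes = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no payload of their own
        filename = part.get_filename()
        if filename:
            # Decode (base64, quoted-printable, ...) and store the raw bytes.
            path = os.path.join(attach_dir, os.path.basename(filename))
            with open(path, "wb") as f:
                f.write(part.get_payload(decode=True))
            notes.append('Attachment Converted: "%s"' % path)
        elif part.get_content_type() == "text/plain":
            text_parts.append(part.get_content())
    # Build a new single-part message so the MIME type matches the content.
    slim = EmailMessage()
    for name in ("From", "To", "Subject", "Date", "Message-ID"):
        if msg[name]:
            slim[name] = msg[name]
    slim.set_content("\n".join(text_parts + notes) + "\n")
    return slim
```

Rebuilding the message as a single text/plain part avoids leaving a multipart header on something that no longer has multiple parts, which is one of the complaints raised about Eudora earlier in the thread.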

    1. Re:Eudora mbox by Ziviyr · · Score: 2

      > Eudora (Win and Mac) handles encoded attachments by decoding them and storing them in an attachments folder, replacing the encoded text in the message with a line like
      >
      > Attachment Converted: "C:\EUDORA\ATTACH\NEW YORK.pps"
      >
      > Click on that in Eudora and the attachment opens.


      For the sake of portability I turn features like that off; it also makes it harder for me to lose my attachments and makes manual maintenance easier.

      --

      Someone set us up the bomb, so shine we are!
    2. Re:Eudora mbox by 1u3hr · · Score: 1
      > For the sake of portability I turn features like that off; it also makes it harder for me to lose my attachments and makes manual maintenance easier.

      I don't think you can turn it off, at least in the version I use. Most of the attachments are viruses, Word docs, or "funny" photos. I want to lose most of these if not the message they came with.

      Sometimes I open an attached Word doc with Quickview (to avoid any macro virus), select and copy the text, paste it into the original text message, then I can delete the doc and have the text neatly filed as plain text (since that's sufficient for the kind of correspondence I do). Wish that could be automated. Otherwise, I'm exchanging large pdf or doc files while working on publishing projects. I move these to a project-specific folder.

      And really, the huge bloat of attachments in mbx files would be a problem for me if they were kept encoded, both in using up 20 times as much space and in making searching slow or impossible.

    3. Re:Eudora mbox by martin · · Score: 2

      Have you tried moving this from Mac to PC to Linux? It won't work without messing with the files with Perl or something.

      I've done this with Netscape (4.7x and 6/7) and moved all the files etc easily from platform to platform with no problems...

    4. Re:Eudora mbox by wadetemp · · Score: 2

      It sounds cleaner, but it also sounds like it's easier for a virus or worm to start a raging party... say by looking in your c:\eudora\attach directory and running all files that end with .exe or .vbs.

    5. Re:Eudora mbox by Krieger · · Score: 2

      I am actually trying to figure out how to reverse this process and re-uuencode all of my emails so that I can port them to a different system and not lose the attachments. Have you run into anything that can do this?

    6. Re:Eudora mbox by Ziviyr · · Score: 1

      Nope, this is why I turned that fancy feature off in the first place. Spare myself THAT problem.

      --

      Someone set us up the bomb, so shine we are!
  25. Compression. by TellarHK · · Score: 2

    The next generation of mail storage should definitely work on taking optimal advantage of compression technologies, preferably in a way that compresses the data from end to end, not just in the receiving mailbox. As to managing the kind of data sent, I'd suggest a twofold approach: save binary attachments in their natural state in a subfolder linked to the message itself, which would be kept in a compressed database format.

    As to the database format itself, I'd like to see a form of redundancy in the structure of it. Give the design some self-healing ability in case flaws develop as the information gets shuffled around. Media isn't perfect, but mail stability should try and be as good as it can get.

    If you want to speed up searches, index the data in a separate file and use that. Just keep the actual data storage as simple and reliable as possible; anything like searching or sorting is just a bonus.

  26. Well.... by BJH · · Score: 2, Interesting

    > One file per Mailbox-folder, allowing multiple folders per user. Should those files reside in one central location or in users Homedirs?

    Depends on how the user accesses their mail. If they read their mail only on the local machine, it should be in their home dir. If the server allows multiple forms of access (like local + IMAP), central storage makes sense. There are a lot of other issues here, like backup methods.

    > Compression: Should messages be broken into pieces and the MIME-attachments stored separately (thus searching of the text parts would still be possible without decompressing the whole file)?

    No. Separating a single mail into its component parts is just asking for trouble (not to mention that it massively increases your locking problems).

    > File format: gdbm, Sleepycat db? Something new?

    Personally, I like Maildir, since it lets me use standard tools like grep to find particular mails. I admit that a more efficient method is probably required these days.

    > Should the security model allow users to directly access their files, grep them, copy them around?

    Yes, of course. It's their mail - let them do what they want with it. The mail app must be able to deal with that.

    > Shared folders, virtual domains?

    Shared folders would be nice - IMAP can do that now, although it's overengineered and not necessarily fully implemented in any particular IMAP server. Virtual domains I've never had any use for myself...

    > Unicode support in folder names?

    Why not?

    > Imap message-IDs, flags, useragent specific state-information?

    As you say, IMAP does that already...

    > File-locking (NFS)?

    More the fault of NFS than the mail software (and I believe NFS4 handles locking better).

  27. About 1.4 seconds? by ryochiji · · Score: 1

    >just try to open a Maildir with 1000+ mails and see how long it
    >takes your favorite Mail program to only display the subjects.


    With a mailbox of over 1400 messages, using Courier-IMAP and viewed through my webmail interface (see shameless plug below), it takes about 1.4 seconds to sort all messages by size and display the subject, sender, date and size of the first 20 messages.

    Am I missing something?

    1. Re:About 1.4 seconds? by Anonymous Coward · · Score: 3, Informative

      i concur. there's nothing wrong with maildir or the linux filesystem, at least for me. my mailbox has about 3000 messages, and it opens pretty much instantly, using Maildir, Courier-IMAP and EXT2, from a server running a 700 MHz Athlon and 7200 RPM IDE disks.

      the author's comments about Maildir make it sound like they've been using it and having problems. perhaps the problem is with their imap daemon? or their client? or their hardware? if running out of inodes or space for small files is such a problem, why not use ReiserFS? reformatting your filesystem is probably a lot quicker than inventing another new UNIX mailbox standard and getting people to support it.

      i use the OS X mail client, and it indexes my messages in the background as they arrive, so i can do instantaneous (i mean in-stan-tan-e-ous!) searches through my 3000 message mailbox by subject, to, from, or the entire message text. i can't imagine how this could work much better.

      your experience is clearly different; but i think there are other factors you should consider before blaming the mailbox format.

    2. Re:About 1.4 seconds? by CaraCalla · · Score: 1
      Either Courier-IMAP or your super-duper webmail client caches the subjects. You don't open and parse 1400 files in 1.4 seconds unless you use either:
      • ReiserFS
      • some massive, parallel storage-array

      Well, the performance-problem can be overcome, but that still leaves the diskspace-bloat-problem.

      CaraCalla
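
The subject caching guessed at above could look roughly like this. It is a sketch using Python's standard `mailbox` module, not a description of what Courier-IMAP or any webmail client actually does; `read_subject` and `refresh_subjects` are hypothetical names, and the cache is a plain dict that a real client would persist somewhere.

```python
import email
import mailbox


def read_subject(box, key):
    """Read only the header block of one Maildir message and return its subject."""
    header_lines = []
    with box.get_file(key) as f:
        for line in f:
            if line in (b"\n", b"\r\n"):  # blank line ends the headers
                break
            header_lines.append(line)
    headers = email.message_from_bytes(b"".join(header_lines))
    return str(headers.get("Subject", ""))


def refresh_subjects(maildir_path, cache):
    """Bring a {key: subject} cache in line with the Maildir on disk."""
    box = mailbox.Maildir(maildir_path, create=False)
    live = set(box.keys())
    # Only files that are new since the last run are opened and parsed.
    for key in live.difference(cache):
        cache[key] = read_subject(box, key)
    # Forget messages that have been deleted in the meantime.
    for key in set(cache) - live:
        del cache[key]
    return cache
```

Because a Maildir message file is not rewritten after delivery and its key stays the same when flags change, a cache keyed this way only costs one directory listing per refresh plus one file open per new message.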

  28. SQL by Mowog · · Score: 1

    One of the things that MS Exchange does well is its storage of messages. It uses a database for the private store (i.e. mailboxes).. the only problem is that it's in a format not unlike MS Access.
    A while ago I went looking for a Linux MTA/IMAP server which supported MySQL message-storage. The closest match was Courier; it allows authentication and mailbox-location by MySQL, but not message-storage.. and there was a pretty hostile response to the suggestion that it be added.
    Personally, I'd love to see a Linux MTA/IMAP system which uses an SQL message-store. The ability to replicate a message-store across multiple physical sites without having to get into distributed filesystems like Coda would be a huge benefit for those who need to provide a redundant mail service.
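
As a rough illustration of what an SQL message store might look like, here is a sketch that uses Python's built-in `sqlite3` as a stand-in for MySQL. The table layout and the names `open_store`, `deliver` and `list_subjects` are assumptions made up for this example; no existing MTA or IMAP server is being described, and replication across sites is not covered.

```python
import sqlite3
from email import message_from_bytes, policy

# Hypothetical schema: a few indexed header columns plus the untouched message.
SCHEMA = """
CREATE TABLE IF NOT EXISTS messages (
    id      INTEGER PRIMARY KEY,
    folder  TEXT NOT NULL,
    sender  TEXT,
    subject TEXT,
    raw     BLOB NOT NULL
);
CREATE INDEX IF NOT EXISTS idx_messages_folder ON messages (folder);
"""


def open_store(path=":memory:"):
    """Open (or create) the message store and make sure the schema exists."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def deliver(conn, folder, raw_message):
    """Store one raw RFC 822 message in a folder and return its row id."""
    msg = message_from_bytes(raw_message, policy=policy.default)
    with conn:  # commit on success, roll back on error
        cur = conn.execute(
            "INSERT INTO messages (folder, sender, subject, raw) VALUES (?, ?, ?, ?)",
            (folder, str(msg["From"] or ""), str(msg["Subject"] or ""), raw_message),
        )
    return cur.lastrowid


def list_subjects(conn, folder):
    """Return (id, sender, subject) rows for a folder without touching bodies."""
    rows = conn.execute(
        "SELECT id, sender, subject FROM messages WHERE folder = ? ORDER BY id",
        (folder,),
    )
    return rows.fetchall()
```

Keeping the raw message as an opaque blob next to the extracted headers means the original bytes can always be exported again, which speaks to the portability worry raised elsewhere in the thread.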

    1. Re:SQL by mlk · · Score: 1

      There is a Java email server which uses JDBC to store the emails.
      It was pretty funky; google Java EMail Server...

      --
      Wow, I should not post when knackered.
    2. Re:SQL by Mule666 · · Score: 1

      Just for the record, Exchange 2000 has shifted to using the file system for storing individual messages/contacts/tasks etc, where each user has a folder, and then an Inbox/Contacts/Tasks etc subfolder under that.

      I have yet to work with Exchange 2000 in a large site, so I can't comment on how well it works though.

    3. Re:SQL by JayJayEm · · Score: 1

      I've written a PHP based web client that uses a MySQL backend DB to store all the messages. Useful for all those who have a machine at home running 24/7 on DSL and can therefore pick up _all_ your mail from wherever you can find an internet connection. It currently imports from POP, I would like to write a sendmail plugin that allows sendmail to deliver direct to the database. I have written a PHP/Xinetd based POP3 daemon for it if you need to get your mail to another client.

      It also does contacts, tasks, filters, multiple pop accounts, multiple smtp identities, stripping of html attachments, mime, etc etc

      The authentication relies totally on MySQL, you are prompted (HTTP Auth) for a specially constructed username which consists of a database host, database name and mysql username. This also allows multiple users on the same system; just create them their own mysql account and db.

      Download at http://jayjayem.d2g.com:3353/timd/ (latest version ALPHA6.1a).

    4. Re:SQL by thona · · Score: 0

      Yes, right - seems stupidity is required as a Mule, or?

      Exchange 2000 still has its own storage mechanism. What it does is expose your mailbox folder hierarchy (yes, it can have folders in mailboxes) through a simulated drive (M:). It does not store anything in M; M is not a physical disc. It is a file system built on top of the MS internal proprietary storage.

      If you guys would just start reading at least the technical documentation before posting.

    5. Re:SQL by Anonymous Coward · · Score: 0

      " One of the things that MS Exchange does well is its storage of messages"

      I'm curious how much experience with Exchange you've got. Sure it's a nice idea, but in practice the Giant Mailstore has had the historical tendency to get corrupted, require that you down the server for DB maintenance, and make recovery very difficult. Furthermore, there's that embarrassingly small 16GB size limit that MS charges you an arm and a leg to remove.

      Maybe with the latest and greatest versions, these problems have been mitigated, but they were a pain in the ass for years and years.

      Love it or hate it, Lotus had the right idea. Mail is stored in a proprietary database, but each user has their own database file. Therefore repair/recovery can take place by only taking one user offline and not the whole system.

    6. Re:SQL by erc · · Score: 1

      And if you try to display more than a few messages, PHP is horribly slow. No, thanks.

      --
      -- Ed Carp, N7EKG erc@pobox.com PGP KeyID: 0x0BD32C9B What I'm up to: http://intuitives.mine.nu
  29. reiserfs+Maildir by {X-Frog} · · Score: 1

    What about reiserfs + Maildir + imap4?

    Works very well, quite fast, easy to install, and happy users.
    What more to ask? :)

    (and NO Outlook imap client PLEASE!)

    1. Re:reiserfs+Maildir by Lennie · · Score: 1

      see this then:
      http://www.jedi.claranet.fr/reisersmtp.html

      Also they are/have working, well, it at least mentions sendfile too.

      --
      New things are always on the horizon
  30. one file per message by g4dget · · Score: 3, Insightful
    One file per mail message is the right thing to do. That lets you use standard UNIX tools for manipulating mail and it gives you convenient locking semantics. And the hierarchical UNIX file system structure, together with links, matches mail semantics nearly perfectly.

    Of course, with traditional UNIX file systems, this is a bit slow. The thing to do is to fix the file system, not to kludge ever more complex mail formats on top of it. ReiserFS goes much of the way; we now also need some system calls to open and read multiple files with a single call.

    Until file systems catch up, one kludge is as good as another. UNIX mbox format is at least simple, so I stick with that.

    1. Re:one file per message by Bert64 · · Score: 1

      Not surprising, considering email was designed for unix :)

      --
      http://spamdecoy.net - free throwaway anonymous email - avoid spam!
    2. Re:one file per message by Lennie · · Score: 2, Informative

      That's what maildir is.

      --
      New things are always on the horizon
    3. Re:one file per message by spitzak · · Score: 2

      Absolutely! A lot of the libraries and kludges being written for both Unix and Windows are to implement hierarchies of data in files, because individual files are too slow or have too much overhead. This needs to be fixed, and we would be much better off if the effort going into designing the next mail format went into designing a filesystem that allowed the "obvious" way to store mail to work.

  31. Has anyone ever tried the XML approach? by jaaron · · Score: 1

    Okay, so XML is still quite the buzzword, but that aside, it might be a nice format for storing mail. XML is very cross-platform and it would be easy to write a number of front ends that would access the XML files, which can be organized like a database. Plus, with proper parsing, you could just link to all your attachments, storing them elsewhere and allowing you to compress the XML, which is all text. Has anyone ever done something like this?

    --
    Who said Freedom was Fair?
    1. Re:Has anyone ever tried the XML approach? by Anonymous Coward · · Score: 0

      The problem with XML is that applications will spend a lot of time converting the < symbols to &lt;. With so much HTML mail being sent, there will be a lot of them. I just don't think XML is a good format to wrap a lot of really bad HTML.

    2. Re:Has anyone ever tried the XML approach? by Anonymous Coward · · Score: 0

      That's what CDATA marked sections are for; all you have to escape then is "]]>".
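
      A minimal sketch of that CDATA approach, in Python. The function name `wrap_cdata` and the `<message>`/`<body>` element names are made up for illustration; the one real rule it relies on is that "]]>" cannot appear inside a CDATA section, so the usual trick is to split it across two adjacent sections.

```python
import xml.etree.ElementTree as ET


def wrap_cdata(text: str) -> str:
    """Wrap text in a CDATA section, splitting any embedded ']]>'."""
    # "]]>" would terminate the section early, so end the section after
    # "]]" and open a new one that starts with ">".
    safe = text.replace("]]>", "]]]]><![CDATA[>")
    return "<![CDATA[" + safe + "]]>"


if __name__ == "__main__":
    # Deliberately ugly HTML-ish body, including the forbidden sequence.
    body = "<html><b>bad markup & stuff ]]> more</b>"
    doc = "<message><body>" + wrap_cdata(body) + "</body></message>"
    # The parser should hand the original text back unchanged.
    print(ET.fromstring(doc).find("body").text == body)
```

      Nothing inside the section needs entity escaping, so the HTML goes in as-is; only the terminator sequence gets special treatment.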

  32. We need an XML standard to move mail around by astrashe · · Score: 2

    People have been arguing about the balance between standard formats that are easy to parse and move between systems and complex formats that make searching easier.

    What we need is a standard DTD or schema for mail data that all well written email systems can understand. If everything can import and export XML representations of email, the internals aren't so important.

    1. Re:We need an XML standard to move mail around by Rudd-O · · Score: 2, Insightful

      There is a standard to move E-mail around. It's called RFC 2822.

      --
      Rudd-O - http://rudd-o.com/
    2. Re:We need an XML standard to move mail around by NotZed · · Score: 3, Informative

      No, we definitely do not need another standard to move mail around.

      MIME *is* a transport. MIME *IS* easy to decode. MIME *must* be supported by any email client already.

      MIME *is* the solution, it already exists, it supports everything you need (multiple binary attachments, multilingual headers), and it *works*.

      XML is *not* a good idea.

      --
      _ // `Thinking is an exercise to which all too few brains
      \\/ are accustomed' - First Lensman
    3. Re:We need an XML standard to move mail around by a_n_d_e_r_s · · Score: 1

      Sorry, but MIME is an ugly monster that should be shot.

      Pure text messages are the only portable messages.
      It's the way to transfer email.

      You can even send binary files by uuencoding them and including them in the message.

      That is the only thing that always works fine.

      --
      Just saying it like it are.
    4. Re:We need an XML standard to move mail around by Anonymous Coward · · Score: 0

      I fail to see how you can POSSIBLY think that uuencoded text is better than MIME.

      You obviously don't have a clue as to what you are talking about.

    5. Re:We need an XML standard to move mail around by a_n_d_e_r_s · · Score: 1

      Well, that's your problem.

      Since I prefer to use a non-MIME-compliant email reader, I personally see what most of these MIME-encoded messages look like. So I can say that I know a lot about MIME since I basically read it fluently - well, except for the HEX-encoded parts.

      --
      Just saying it like it are.
    6. Re:We need an XML standard to move mail around by rpeppe · · Score: 2
      MIME *IS* easy to decode.

      ha ha ha ha!
      HA HA HA HA!
      ROFL.
      have you actually read any of the (many) MIME RFCs? there are so many traps and pitfalls lurking there that to say that MIME is easy to decode is just untrue.

    7. Re:We need an XML standard to move mail around by fejjie · · Score: 1

      considering the fact that he pretty much knows the RFCs by heart (he wrote most of the MIME parser in Evolution), i'd say he knows it a lot better than you :-)

  33. Maildirs by mrsam · · Score: 4, Informative

    Maildir : Do you really want to clutter your system with millions of small files? That's waste of inodes, space (unless perhaps you use Linux/ReiserFS or SGi) ...

    In case you haven't noticed, the default settings for the Linux ext[23] filesystems is to allocate one inode per 4096 or 8192 bytes of disk space. Which happens to be pretty much the size of an average E-mail message. So, in other words, you are unlikely to run out of inodes before you run out of disk space, since both are going to be used up pretty much at the same clip.

    It may come as a shocking surprise to some, but the average large filesystem is just littered with small files here, and small files there, all over the place. Here's my workstation -- a fairly large box with all sorts of crap loaded:

    Filesystem 1k-blocks ...
    /dev/sdb5 8159388 ...

    Filesystem Inodes ...
    /dev/sdb5 1036288 ...

    I'm using up almost exactly 8192 bytes per inode.
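
    For anyone who wants to check their own box, here is a rough sketch of that arithmetic in Python. The helper names are invented for this example; `os.statvfs` is the standard Unix call, and the hard-coded figures are the two df values quoted above. Note that df's 1k-blocks column is not exactly the raw partition size, so treat the result as approximate.

```python
import os


def bytes_per_inode(total_bytes: int, total_inodes: int) -> float:
    """Average filesystem capacity per inode."""
    return total_bytes / total_inodes


def bytes_per_inode_for(path: str) -> float:
    """Same ratio for a mounted filesystem, via statvfs (Unix only).

    Some filesystems report zero inodes (they allocate them dynamically),
    in which case this raises ZeroDivisionError.
    """
    st = os.statvfs(path)
    return bytes_per_inode(st.f_frsize * st.f_blocks, st.f_files)


if __name__ == "__main__":
    # Figures from the df output above: 8159388 1k-blocks, 1036288 inodes.
    print(round(bytes_per_inode(8159388 * 1024, 1036288)))
```

    With the numbers above this comes out a little under 8192, which fits a filesystem created at one inode per 8192 bytes minus some overhead.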

    and just try to open a Maildir with 1000+ mails and see how long it takes your favorite Mailprogram to only display the subjects.

    How about instantly? Most GUI E-mail clients cache mail headers, so they don't have to go and wait for the server to reply each time you click on the folder index window to re-sort, or scroll the folder index.

    ...

    Some ideas about the ideal mail-storage:
    * One file per Mailbox-folder, allowing multiple folders per user.


    Using one file per folder essentially forces you to use some form of locking each time folder access is necessary. Locking of any sort has been problematic for years whenever NFS (or pretty much any other network filesystem) is involved. A single failed circuit will now take out your entire network spool, as all clients are now spinning on lock requests out on the unreachable server.
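
    By contrast, delivery into a Maildir needs no lock at all. A simplified sketch in Python follows; the function name and the exact file-naming scheme are assumptions for illustration (real implementations follow the Maildir spec more closely, and some use link() rather than rename()). The idea is just: write the message under tmp/, then move it into new/ in one atomic step.

```python
import os
import socket
import tempfile
import time


def deliver(maildir: str, message: bytes) -> str:
    """Deliver one message into a Maildir without taking any lock."""
    for sub in ("tmp", "new", "cur"):
        os.makedirs(os.path.join(maildir, sub), exist_ok=True)
    # Unique name from time, pid, a monotonic counter and the hostname
    # (a simplified naming scheme, not the full spec).
    host = socket.gethostname().replace("/", "\\057")
    name = "%d.P%dM%d.%s" % (int(time.time()), os.getpid(),
                             time.monotonic_ns(), host)
    tmp_path = os.path.join(maildir, "tmp", name)
    new_path = os.path.join(maildir, "new", name)
    # O_EXCL makes creation fail instead of overwriting an existing file.
    fd = os.open(tmp_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "wb") as f:
        f.write(message)
        f.flush()
        os.fsync(f.fileno())
    # Readers only look in new/ and cur/, so they never see a partial file.
    os.rename(tmp_path, new_path)
    return new_path


if __name__ == "__main__":
    box = tempfile.mkdtemp()
    print(deliver(box, b"Subject: test\n\nhello\n"))
```

    Because each message gets its own uniquely named file, two deliveries never touch the same file, which is why no lock server has to be reachable.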

    Compression: Should messages be broken into pieces and the MIME-attachments stored separately (thus searching of the text parts would still be possible without decompressing the whole file)?

    I thought you wanted to save everything in a single file per folder, and using multiple files for messages is supposed to waste inodes, remember?

    File format: gdbm, Sleepycat db? Something new?

    Ask an Exchange admin about the joys of a corrupted Exchange database. If mail is stored in simple, plain files, a single instance of corruption will affect at most one mailbox, instead of taking out the entire monolithic database.

    Unicode support in folder names? Imap message-IDs, flags, useragent specific state-information?

    IMAP already uses Unicode to encode folder names. Not sure what "useragent specific state-information" means...

    1. Re:Maildirs by Etcetera · · Score: 1

      It kind of depends on what your file system is optimized for.

      The classic Mac OS HFS+ (and HFS) was optimized for directory depth (large numbers of subdirectories, with fewer files each) versus directory breadth (a more shallow directory structure, but with 1000's of files each).

      That's one reason why Mac OS X is somewhat slow, and why classic Mac apps don't store information in tons of tiny little files everywhere like Windows or even *nix apps do. I'm sure my MacOS 9 machine next to me would be slow as a dog opening up my 4000-file c:\winnt\system32\ directory, but Windows takes forever drilling down into a directory like the Mac's System Folder.

      Of course, with the built in database aspects of the Resource Manager, storing structured data in one file was easier on MacOS than it was on other OS's too.

      My point is, if a FS or OS is optimized to deal with billions and billions of tiny files all in the same directory, then that's great. If it's optimized to deal with fewer files, but with a more structured layout, you should take that into account when designing the storage for your app.

      Who said every platform needs to use the same thing?

    2. Re:Maildirs by Ian+Bicking · · Score: 2, Troll
      In case you haven't noticed, the default settings for the Linux ext[23] filesystems is to allocate one inode per 4096 or 8192 bytes of disk space. Which happens to be pretty much the size of an average E-mail message. So, in other words, you are unlikely to run out of inodes before you run out of disk space, since both are going to be used up pretty much at the same clip.
      This doesn't make sense to me, at least not as presented. You are going to run out of inodes at exactly the same time you run out of disk space, because they are one and the same thing. In fact, I believe all the inodes are created when you create your filesystem, all space is mapped to an inode (though of course one file can use multiple inodes).

      The issue is not the waste of inodes, but the waste of disk space because the smallest file chunk is one inode's worth of space. It's usually said that if you have 4k inodes, you'll lose 2k (on average) per file. This is not really correct, because inodes themselves take up space -- I remember reading a paper somewhere many years ago where they estimated that most users would find 4k inodes better than smaller values, because in normal file distributions the space you save with the smaller inode is less than the space of the increased number of inodes themselves. However, this would lead one to believe you should have really big inodes and really big files, and then you'll be very efficient.

      But really, none of this should be given much weight until someone does a statistical analysis of just how inefficient a one-mail-per-file system is. It might not be significant, or it might be insignificant compared to storing base64 messages, or it may be insignificant compared to the benefits of compression. It's bad form to optimize before profiling, and the many-file inefficiency concerns feel like they are more based on intuition and less on fact. But then, someone must have studied it, so maybe not.

    3. Re:Maildirs by mce · · Score: 1

      With multi-GB disks costing nearly nothing, who cares about a bit of space wastage for small mails? I use 1 file per message, and I can assure you that the ease of being able to manipulate them in arbitrary ways (think grep & Co!) is worth much more than the few bytes of disk space that I might save by changing format.

      Besides, as the original story correctly notes: more and more mails are wasteful multimegabyte monsters with more attachments than any sane human wants to see. This pretty much dwarfs any concerns about the small files!

  34. mbx by zsmooth · · Score: 2

    I believe that UW-IMAP .mbx also includes indexing in the mail file, along with the concurrent access stuff. It's definitely WAY faster than mbox.

  35. er, that's just incorrect... by Anonymous Coward · · Score: 0

    >Windows clients: Typically some proprietary db
    >-format. Pathetic.

    Both Netscape and Eudora use the regular old mbox format.

    Outlook may use something else, I've never touched it.

    Pegasus uses something different, you'll have to track down either of the guys using it and ask them.

  36. Oh great by elefantstn · · Score: 3, Funny
    If you were to design your own MUA, how would you design its mail storage?


    Now I'll never get to sleep tonight.
    --
    If it ain't broke, you need more software.
  37. Database-Type Storage, Hybrid by Aloekak · · Score: 1
    I think a database would be a great place to store mail. At least the text portion, anyway. AFAIK, DBs such as MySQL/Postgres are great for storing text; binary formats probably wouldn't be a good idea because you'd have to use something like "blob", which I believe may not be read as quickly as if the binary (picture, .doc, .ppt, etc.) was on the actual disk (on the filesystem). I also love the idea of integrated mail systems, groupware if you will, such as Exchange, and even PHPGroupWare. So what I propose would be a sort of hybrid.

    You could have the Database for:
    • All the mail stuff, subject, addressing, message
    • Groupware Apps Info(look at phpgroupware, they have a good thing going for them)
    • For files, such as attachments, a file path would be all that's needed, and it can be abstracted so that if an attachment is sent to 15 people on the server, it's only stored once. When a person removes it, their link in the db is removed, not the file.
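
    One way the attachment part of this could look, sketched in Python with SQLite standing in for MySQL/Postgres. Every table, column and function name here is hypothetical, and keying the file on a SHA-256 of its content is an assumption of this sketch, not something from the proposal. It also goes one step further than the list above by deleting the file once the last link is gone.

```python
import hashlib
import os
import sqlite3
import tempfile

SCHEMA = """
CREATE TABLE message (id INTEGER PRIMARY KEY, sender TEXT, subject TEXT, body TEXT);
CREATE TABLE attachment (sha256 TEXT PRIMARY KEY, path TEXT NOT NULL);
CREATE TABLE message_attachment (
    message_id INTEGER REFERENCES message(id),
    sha256 TEXT REFERENCES attachment(sha256),
    filename TEXT
);
"""


def store_attachment(db, store_dir, message_id, filename, data):
    """Write the bytes to disk once per distinct content, then link it."""
    digest = hashlib.sha256(data).hexdigest()
    known = db.execute("SELECT 1 FROM attachment WHERE sha256 = ?",
                       (digest,)).fetchone()
    if known is None:
        path = os.path.join(store_dir, digest)
        with open(path, "wb") as f:
            f.write(data)
        db.execute("INSERT INTO attachment VALUES (?, ?)", (digest, path))
    db.execute("INSERT INTO message_attachment VALUES (?, ?, ?)",
               (message_id, digest, filename))
    return digest


def remove_link(db, message_id, digest):
    """Drop one recipient's link; remove the file when no links remain."""
    db.execute("DELETE FROM message_attachment "
               "WHERE message_id = ? AND sha256 = ?", (message_id, digest))
    left = db.execute("SELECT COUNT(*) FROM message_attachment "
                      "WHERE sha256 = ?", (digest,)).fetchone()[0]
    if left == 0:
        (path,) = db.execute("SELECT path FROM attachment WHERE sha256 = ?",
                             (digest,)).fetchone()
        os.remove(path)
        db.execute("DELETE FROM attachment WHERE sha256 = ?", (digest,))


if __name__ == "__main__":
    store = tempfile.mkdtemp()
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    for recipient_message in range(1, 16):  # same file sent to 15 people
        store_attachment(db, store, recipient_message, "report.zip", b"zipdata")
    print(len(os.listdir(store)))  # files actually on disk
```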


    Now a simple mail system would only need a few of the DBs/Tables, but you could easily add the other options later without breaking something you already have going. Which wouldn't be the case if you were to move from just about anything To MS Exchange.

    This would almost inevitably break any form of backwards compatibility, except for some possibility of a wrapper that sat around the database and pretended it was another format. But I think the pros outweigh the cons....
    1. Re:Database-Type Storage, Hybrid by Anonymous Coward · · Score: 0

      funk blobs, just leave your attachments encoded.

    2. Re:Database-Type Storage, Hybrid by spauldo · · Score: 1

      Perhaps have something similar to the /proc filesystem - /var/spool/mail and friends would be virtual. That way /bin/mail and every other email client would work fine.

      Of course, it'd probably require kernel support for that, like the loopback, proc, and dev filesystems do. Linux would probably have kernel support in a few months, as well as the various *BSDs, but the commercial UNIX vendors would likely take years. There's a lot of solaris-based mail servers out there...

      --
      Those who can't do, teach. Those who can't teach either, do tech support.
    3. Re:Database-Type Storage, Hybrid by Aloekak · · Score: 1

      Everything I've been reading says to store binary files in blobs (tinyblob, blob, medium and large). Of course this is in MySQL and I'm not sure about the other DBs.

    4. Re:Database-Type Storage, Hybrid by ahde · · Score: 2

      Common sense tells you to store binary files in the file system. Include a URI or path or whatever in the DB. Believe it or not, direct file access is faster (on most OSes) than a database. You gain nothing by including a blob in the DB. It's not searchable, and it slows other searches down. The only drawback is that you couldn't do this if you were trying to code your solution completely in SQL or for some other reason are not able to access the filesystem directly.

  38. BeOS by mlk · · Score: 1

    I know the author did not like 1-file-per-email, but when used with a VERY good fs (like BFS) it's a very effective method of storing email.
    You don't store the subject, date, other metadata in the text file, but in the attributes.

    Mlk

    --
    Wow, I should not post when knackered.
  39. Compression is nice, but... by pHDNgell · · Score: 1

    Cyrus rejected my zlib patches for their IMAP server because, ``disk is cheap.'' I've been using my zlib patch everywhere I use cyrus and it's saved me tons of disk space (it's been so long since I've done a conversion, I don't remember details, but I know it's more than 50% on average).

    Cyrus is one of the better systems out there, IMO. Individual files take up a lot of inodes, sure, but the ``database'' files counter the performance lost to having to open all those files when you don't need them.

    Before that, I concatenated gzip files in mbox format. Modifying anything that can read mbox to read gzip files can't be terribly hard, but I assure you, the benefit is huge.
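
    A sketch of the concatenated-gzip idea in Python; this is not the actual patch, just an illustration with made-up function names. It leans on one property of the gzip format: several gzip members written back to back decompress as one continuous stream, so appending a message never means recompressing the mailbox.

```python
import gzip
import os
import tempfile


def append_message(path: str, message: bytes) -> None:
    """Append one message to the mailbox as its own gzip member."""
    with open(path, "ab") as f:
        f.write(gzip.compress(message))


def read_mbox(path: str) -> bytes:
    """Return the whole uncompressed mbox; members are read in sequence."""
    with gzip.open(path, "rb") as f:
        return f.read()


if __name__ == "__main__":
    box = os.path.join(tempfile.mkdtemp(), "inbox.gz")
    append_message(box, b"From a@example.com Mon Jan  1 00:00:00 2001\n"
                        b"Subject: one\n\nfirst body\n\n")
    append_message(box, b"From b@example.com Mon Jan  1 00:01:00 2001\n"
                        b"Subject: two\n\nsecond body\n\n")
    # The reader sees a plain mbox with both messages.
    print(read_mbox(box).decode())
```

    The trade-off is that compressing each message separately gives a worse ratio than compressing the whole file at once, especially for short messages.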

    --
    -- The world is watching America, and America is watching TV.
    1. Re:Compression is nice, but... by Anonymous Coward · · Score: 0

      Maildir type formats make sense for IMAP servers. And the way that cyrus handles it is great. The author of this posting (not the one I'm replying directly to, but the actual slashdot weblog entry itself) does not mention that those 'database features' in cyrus fix the problem of it taking a long time to load even headers. The inode thing is still an issue, but frankly it just makes sense to have IMAP messages stored in this fashion. You don't want to have to parse through a large mbox file in order to find 1 message (which is typically how IMAP clients load mail). What mbox makes more sense for is POP3. Anyway, from what I've seen, cyrus is great at handling traffic and at handling mail. Not to mention it has a good security track record.

      Cheers,
      -JD-

    2. Re:Compression is nice, but... by Anonymous Coward · · Score: 0

      When last I used Cyrus, there was also a single-store patch that could drastically reduce disk usage when a single message was delivered to multiple users. It did this by creating only 1 file with N links to it, as opposed to N files. The UNIX filesystem semantics took care of the rest.

      Very nice, and I think it may have worked with the zlib patch too.
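
      The hard-link trick itself is easy to show. This is only a sketch of the idea in Python, not the Cyrus patch; the function name and the per-user paths are invented for the example, and all mailboxes have to live on the same filesystem for link() to work.

```python
import os
import tempfile


def deliver_single_store(message: bytes, first_path: str, other_paths: list) -> None:
    """Write the message once, then hard-link it into the other mailboxes."""
    with open(first_path, "wb") as f:
        f.write(message)
    for path in other_paths:
        # Each link is just another directory entry for the same inode.
        os.link(first_path, path)


if __name__ == "__main__":
    spool = tempfile.mkdtemp()
    boxes = [os.path.join(spool, user) for user in ("alice", "bob", "carol")]
    deliver_single_store(b"Subject: all-hands\n\nhi\n", boxes[0], boxes[1:])
    # One inode, three names.
    print(os.stat(boxes[0]).st_nlink)
    # Deleting one user's copy leaves the others intact; the data is
    # freed only when the last link goes away.
    os.remove(boxes[0])
    print(os.stat(boxes[1]).st_nlink)
```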

  40. Spam Assassin!!! by swimfastom · · Score: 1, Offtopic

    SPAM is a burden to everyone. As a system admin, I was told to do something about it. After some research, the best solution was to implement SpamAssassin on our Linux mail server. I tried sendmail SPAM filters, procmail rules, etc. SpamAssassin is undoubtedly the best solution and I recommend it to everyone. It needs to be implemented at the server level, so email your ISP if you don't have root access. It is a simple perl script that can be run with sendmail (using a C++ version) or in procmail (perl). It is very easy to set up using perl CPAN.

    How does it work so well? Spamassassin checks the headers and body of every email passing in to the mail server. It searches the email for certain keywords and phrases and other SPAM characteristics and assigns points to the email based on these. It works very well and has many options --including the ability to have "black lists" and "white lists" in file glob format.

    So far I have blocked about 94% of the SPAM coming in through our mail server. It only misses a couple and is highly configurable! Download and install it!

    Cheers,
    Tom

    --
    http://tomgould.com/
    1. Re:Spam Assassin!!! by swimfastom · · Score: 2, Funny

      So far I have blocked about 94% of the SPAM coming in through our mail server. It only misses a couple and is highly configurable! Download and install it!

      OFFTOPIC!? With that great deal of spam reduction, the space required to store the emails is greatly reduced!

      Cheers!
      Tom

      --
      http://tomgould.com/
    2. Re:Spam Assassin!!! by Anonymous Coward · · Score: 0

      What is this posting doing in this thread??? now THIS sounds like spam to me you advertising bastard.

    3. Re:Spam Assassin!!! by forevermore · · Score: 1

      This really is on topic, in the sense that because of the bulk of spam coming in, there should be some consideration for it in how mail is stored. Maybe even by creating a special bit of meta-data for handling things just like this (so spamassassin wouldn't have to modify the actual message body) - or anything else for that matter. Personally, I'd try to stay away from any mail storage format that doesn't let me get at the ORIGINAL email source so I can report the spam to spamcop and/or the WA State Attorney General (being that some suggestions mentioned splitting off the attachments/etc - this is something that the CLIENT should do, not the MDA/server). Personally, I really like Maildir. It works great for IMAP, and with the file system improvements mentioned before (and maybe some db-type improvements like indexing, caching, etc) it does its job well.. but NEVER take away my email source. Heck, with drive space so cheap lately, you could easily store both the source AND (a cache of) the extracted pieces..

      --
      Do you really need reason for beer? Wingman Brewers
  41. Mail2DB by Aloekak · · Score: 1

    A project for storing mail in a Postgres DB actually exists: Mail2DB.

    You can find it by searching the Qmail Site

  42. Maildir with thousands of emails = fast by Sosarian · · Score: 1

    I don't know about anyone else, but I have maildirs with thousands and thousands of email and the subjects display nice and fast.

    1. Re:Maildir with thousands of emails = fast by Tom+Finch · · Score: 0

      What's with the decimal? Don't you mean 100h's? (That's 256's). Only stupid people use decimal.

  43. The only real answer by quantaman · · Score: 1

    CowboyNeal! Have him read all your mail, type it out on a typewriter, delete the files, eat the carbon paper, stuff the messages in a backpack, and follow you around all day. Assuming his memory is dood you get fast and relevant responses to searches, excellent security, and easy access at all times! The only problem is space: while hard drive space isn't used, the physical size of the system is far from negligible, and it can also start to smell after a few days...

    --
    I stole this Sig
    1. Re:The only real answer by Anonymous Coward · · Score: 0
      Assuming his memory is dood

      Its always dood, we can count on that for sure.
  44. begat by *xpenguin* · · Score: 0, Funny

    At first, there was mbox, then there was Maildir, and Bill begat Outlook and .mbx.

    How do you misspell "began"? The "t" key is up above on the first row and the "n" is on the third row.

    1. Re:begat by Anonymous Coward · · Score: 0
      How do you misspell "began"?

      He didn't misspell it, fucktard. Begat is the past tense of beget.

    2. Re:begat by Anonymous Coward · · Score: 0

      Usually a misspelling like this occurs when a person with fine language skills recognizes that a word such as "begat" is the correct word to use in the sentence.

      Merriam Webster shows you

    3. Re:begat by pixel.jonah · · Score: 1

      no, because begat is the right word - look it up.

  45. get rid of the attachments... by Anonymous Coward · · Score: 0

    Email attachments are really just for people who are too stupid to use ftp or http.

    1. Re:get rid of the attachments... by Anonymous Coward · · Score: 0

      No shit man. And screw those people who actually use "ftp" for a client. netcat all the way, by hand mothafacko!

  46. "Why yEnc is bad for Usenet" by Wyzard · · Score: 2, Informative

    yEnc isn't all that great. See http://www.exit109.com/~jeremy/news/yenc.html.

    1. Re:"Why yEnc is bad for Usenet" by Anonymous Coward · · Score: 0

      yEnc isn't all that great. See http://www.exit109.com/~jeremy/news/yenc.html

      Funny.. after reading that, all I could think was "Sounds like almost every other proprietary format I've ever heard of."

      Seriously.. this Jurgen guy should be working at Microsoft.

  47. Oracle's Solution by hspatel · · Score: 1

    I did some research a few years back and found only one company that seemed to have a solid mail solution; Oracle.

    They seem to have developed a mail server (SMTP, IMAP, POP3, and LDAP) package that runs on top of their 9i database.

    Check it out at:
    http://www.oracle.com/ip/deploy/ias/email/index.html

    I have tried it out and it seems pretty solid. But definitely not easy to set up.

    -hope this helps

  48. There's open source projects in the works already! by Anonymous Coward · · Score: 0

    The DBMail project is already well underway, with a fabulous beta 3 release and an active development team pushing towards a 1.0 release. The project is being supported by a Dutch ISP called IC&S. It provides an MDA interface for Postfix/Sendmail/Exim/Procmail/etc and POP3 and IMAP servers. MySQL and PostgreSQL are supported backends. The CVS tree has a non-relaying SMTP server, too.

    http://www.dbmail.org/

    There's also a solo developer working on a mail server called mmmail. It provides a non-relaying SMTP and POP3 with a MySQL backend.

    http://mmondor.gobot.ca/software.html

  49. Life is not that simple by coyote-san · · Score: 3, Insightful

    Life is not that simple. All databases are limited by the size of the basic block, and if you can't fit your data into that block performance takes a hit.

    With PostgreSQL this is a compile-time option, default 8k and it can go up to 32k.

    It *is* possible to store larger items, esp. if they're 'TOASTable' or blobs, but this often just pushes the problem of dealing with thousands of files onto the database. Only now it's a lot harder to figure out why performance sucks.

    Does this mean that database solutions won't work? Of course not. But it does mean that simple solutions won't scale well when you're dealing with massive amounts of data.

    --
    For every complex problem there is an answer that is clear, simple, and wrong. -- H L Mencken
    1. Re:Life is not that simple by chill · · Score: 2

      Good points.

      In defense of databases, they are probably the single most scalable, performance-tuned app in existence. LOTS of people put LOTS of money and LOTS of effort into addressing database performance.

      Yes, mega-databases require high priests to manage properly but nothing beats Oracle, DB2, Sybase and the like for massive data storage, retrieval and searching.

      Add in proper asynch I/O, raw partition access, transaction support, dedicated monitoring and backup engines and you have a system that is damned hard to beat for large mail storage.

      --
      Learning HOW to think is more important than learning WHAT to think.
    2. Re:Life is not that simple by Anonymous Coward · · Score: 0

      With PostgreSQL this is a compile-time option, default 8k and it can go up to 32k.

      sorry this isn't true anymore... check postgresql documentation

    3. Re:Life is not that simple by Anonymous Coward · · Score: 0

      "Life is not that simple. All databases are limited by the size of the basic block, and if you can't fit your data into that block performance takes a hit."

      Then how am I able to quickly access data from a 100GB table (when my db block size is 8k)? If your data won't fit into one db block, then any self-respecting RDBMS will be able to handle this just fine! The above statement is simply false.

  50. Re:missed the point. by Anonymous Coward · · Score: 0

    You don't need to store them encoded. You unencode it and it uses less space than even yEnc. If you need to forward it etc., then it gets re-encoded.
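
    A quick sketch of what that saves, in Python. The `store`/`forward` names are invented for this example. One caveat that is an observation of this sketch rather than part of the comment above: the re-encoded text is not guaranteed to be byte-identical to what originally arrived (line lengths can differ), which matters if you need the exact original source, e.g. for signed mail.

```python
import base64
import os


def store(encoded_attachment: bytes) -> bytes:
    """Decode a base64 MIME part to raw bytes for storage."""
    return base64.b64decode(encoded_attachment)


def forward(raw: bytes) -> bytes:
    """Re-encode stored bytes as base64 in 76-column lines for transport."""
    return base64.encodebytes(raw)


if __name__ == "__main__":
    raw = os.urandom(30000)          # stand-in for a picture or zip file
    on_the_wire = forward(raw)
    on_disk = store(on_the_wire)
    assert on_disk == raw
    # Size of the encoded form relative to the raw form; base64 turns
    # every 3 bytes into 4 characters, plus a newline per line.
    print(len(on_the_wire) / len(on_disk))
```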

  51. Maildir access time by SealBeater · · Score: 1, Redundant

    It takes me 5 seconds exactly to open a maildir folder with 1315 emails in it.

    SealBeater

    --
    -- Its survival of the fittest...and we got the fucking guns!!!
    1. Re:Maildir access time by thogard · · Score: 1

      On my server it takes elm about .78 seconds to read in and index 1651 messages from a 21 meg mbox file. Of course most of that is living in cache somewhere.

    2. Re:Maildir access time by Anonymous Coward · · Score: 0

      My Cyrus lkml mailbox with 55,500 messages opens in under a second over imaps and can sort in about a second :)

    3. Re:Maildir access time by Anonymous Coward · · Score: 0

      Exactly? The probability that the time is an algebraic number, much less a rational number or an integer, is 0.

  52. Encryption by coyote-san · · Score: 2

    You can go a step further - don't bother with setting up a new compression layer, just encrypt it with existing tools. Most encryption routines compress it first, to make cryptanalysis more difficult (and for performance, since there's less data to encrypt), but this is partially offset by the continuing need for 7-bit safe transport layers.

    --
    For every complex problem there is an answer that is clear, simple, and wrong. -- H L Mencken
  53. No reason... by NFW · · Score: 1
    ...other than tradition, I suppose.

    File-based mail storage makes sense on a resource-constrained device, but on a machine with enough CPU and disk to run an RDBMS, the database would be a better plan for many reasons, not least of which is that database developers have already spent countless hours producing efficient storage and retrieval systems so that you won't have to.

    Given a schema, it should be pretty straightforward to write an SMTP server to put messages in, and POP3/IMAP/HTTP+CGI servers to pull messages out.

    If anyone knows of any existing open-source RDBMS-centric mail systems, I'd love to know where to learn more about them.

    --
    Build stuff. Stuff that walks, stuff that rolls, whatever.
    1. Re:No reason... by erc · · Score: 1

      Actually, you've got it backward. Write code to pull data out of your flat file email via IMAP, then you don't have to rewrite the world ... and you can still support IMAP/POP users, if that's what they want.

      Why not just write your own? I wrote a C program to parse email and stick it in a database, and it took me all of a weekend to design, write, and test. It takes email out of a flat-file mailbox, parses it, sticks it in a database, and then runs whatever rules the user specifies against the data to put it into appropriate folders.

      By the way, just to piss off the Perl zealots, I first wrote it in Perl - the performance sucked. When I rewrote it in C, I got about a 5x increase in speed and about a 4x decrease in memory.

      --
      -- Ed Carp, N7EKG erc@pobox.com PGP KeyID: 0x0BD32C9B What I'm up to: http://intuitives.mine.nu
  54. Wrong paradigm by Anonymous Coward · · Score: 0

    ...the age of Desktop/Application/Document is done.

    Data/Application/View

    ..is what you should be aiming for as long as you're doing an overhaul. Anything else is just paint on rust.

  55. Citadel uses Berkeley DB by IGnatius+T+Foobar · · Score: 2

    Check out the Citadel system. (Disclaimer: I am one of the developers, so my opinion on this is kind of strong.) We use Berkeley DB from Sleepycat Software for the data store. Yes, this is the same Berkeley DB that Sendmail uses to store its alias tables, access tables, etc. But it's capable of so much more than that. It's a robust, non-relational database that is hugely scalable and even has transactions/logging support!

    We store all messages in the database.

    Works like a charm. No pounding through ugly directory hierarchies or insanely long flat files. No need to escape out the word "From" when it appears at the start of a line. None of the cruft.

    Ok, so it's a black box. But it's an open source server that uses an open source database backend, and since it supports SMTP/POP/IMAP plus webmail all by itself, you can still plug your favorite utilities into it (Pine, elm, fetchmail, etc.) and you don't have to graft together Sendmail+IMAP+whatever to make your mail system work.

    The traditional Unix mail utilities are getting a little long in the tooth. I'm going to get flamed for saying this but look at what's happened to the email world: Lotus and Microsoft have run away with most of the market because Unix traditionalists won't give up their flat files. It's time for us to evolve, folks.

    --
    Tired of FB/Google censorship? Visit UNCENSORED!
  56. Go back to the beginning, everything is a file... by Anonymous Coward · · Score: 0

    Wrong approach. Instead, use directories as folders and individual files for each message. So it takes a little more disk, but you can use all the unix tools on your mail without being trapped by a poor MUA design for features. If directory traversal speed is a problem, just use a cache database for the message headers as a hidden file in the directory. Reiserfs is really nice for an underlying FS.

    This should look familiar: mh & Andrew Messages (aka AMS) both use this. With AMS you could have 30,000+ messages in a folder without slowdowns.
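    A rough sketch of the hidden header cache idea, assuming one message per file in a folder directory. The cache file name, the JSON layout, and the function name are all invented here; this is not how mh or AMS actually do it. Only files whose mtime has changed get re-read, so listing subjects stays cheap.

```python
import json
import os
from email.parser import BytesHeaderParser
from email.policy import default

CACHE_NAME = ".header_cache.json"  # hypothetical hidden cache file


def build_header_cache(folder: str) -> dict:
    """Return {filename: {mtime, from, subject}}, re-reading only changed files."""
    cache_path = os.path.join(folder, CACHE_NAME)
    try:
        with open(cache_path, encoding="utf-8") as fh:
            old = json.load(fh)
    except (FileNotFoundError, ValueError):
        old = {}

    fresh = {}
    parser = BytesHeaderParser(policy=default)
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if name.startswith(".") or not os.path.isfile(path):
            continue  # skip the cache itself and any subfolders
        mtime = os.stat(path).st_mtime
        entry = old.get(name)
        if entry is None or entry["mtime"] != mtime:
            with open(path, "rb") as fh:
                headers = parser.parse(fh)  # headers only, body is not parsed
            entry = {
                "mtime": mtime,
                "from": str(headers.get("From", "")),
                "subject": str(headers.get("Subject", "")),
            }
        fresh[name] = entry

    with open(cache_path, "w", encoding="utf-8") as fh:
        json.dump(fresh, fh)
    return fresh
```

    The messages themselves stay plain files, so grep and friends keep working; the cache can be deleted at any time and simply gets rebuilt.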

  57. Re: Am I missing something? by Trevin · · Score: 1
    A mailbox with over 1400 messages, using Courier-IMAP, viewing through my webmail interface
    Somebody correct me if I'm wrong, but I was under the impression that IMAP is simply a protocol for remote access to a mailbox. The mailbox on the imap server side could still be the standard unix mbox format (single file for all messages). This is not what the original post was asking:
    Maildir: Do you really want to clutter your system with millions of small files? That's waste of inodes, space (unless perhaps you use Linux/ReiserFS or SGi) and just try to open a Maildir with 1000+ mails and see how long it takes your favorite Mailprogram to only display the subjects.
  58. don't forget by Hollins · · Score: 3, Funny

    Your newfangled system better have DRM built in. I don't have any data, but it must be obvious that artists are losing billions in revenues every week due to mp3s being sent as attachments. This criminal behavior must be stopped or the practice of free expression will come to a screeching halt.

  59. Moving to a better mail box - HUMOR by buss_error · · Score: 3, Funny

    We have to have the remote hammer to pop out of the monitor to whack the end user. This is a must for any admin that works with more than 300 people. Hammer trigger from e-mail, pager, SMS, or telephone number.

    Power mains must be connected to the user's chair, see above for trigger.

    The MUA must forward all p0rn to the admin account. Likewise with credit card info.

    The MUA must know when the user is about to do something to tick off the admin, like sending a "me too" to everyone in the office, or replying to a confidential e-mail to the whole office and prevent the user from reproducing. X-rays are fine for impromptu sterilizations. The side effect of losing all your body hair is no problem, as it alerts others to a stupid co-worker.

    The MUA must alert the admin when a coworker he has got the hots for changes her home phone number. Just to be fair, if the admin is female, the reverse applies.

    The MUA must analyze the admin's e-mail and throw a bucket of cold water if (s)he attempts to send a really stupid e-mail.

    Also, the MUA must be able to launch nuclear missiles at spammers automatically. After that, it should refer the e-mail to the admin to see if a stronger response is warranted. Better yet, the MUA should employ a time machine to go back and choke the spamming creep when the spammer is still a baby, then use X-rays on the parents as above.

    The MUA should have a hypnotic effect on the object of the Admins desire and cause that person to preform disgusting oral acts on the Admins body each time a new e-mail arrives. (HOORAY FOR KELZ!)

    For the PHB, he should (by the same hypnotic effect) do a "Full Monty" when the big cheese walks in. Twice.

    The MUA should be able to cause back dated confirmation messages from HR approving a 51 week paid vacation upon pressing a special key combination, unless it's the PHB pressing the keys, then it should cause an e-mail to be sent to HR from the PHB's account turning in notice.

    Sorry, if you had a day like mine, you'd need a laugh about now...

    --
    Necessity is the plea for every infringement of human freedom. It is the argument of tyrants; it is the creed of slaves.
    1. Re:Moving to a better mail box - HUMOR by Anonymous Coward · · Score: 0

      U have written far too much "BOFH" ...

  60. Maildir? by Dionysus · · Score: 2, Insightful

    What is the problem with Maildir? I mean, if you're going to store email, you might as well use reiserfs. I don't have any problems with big mailboxes, and the extra integrity of the email messages is worth the (non-noticeable) delay.

    Used to get the mbox corrupted once in a while. Never had problems with Maildir.

    --
    Je ne parle pas francais.
    1. Re:Maildir? by Metatron · · Score: 1

      Quite ... I run evolution with many different folders all using Maildir ... I currently have 30+ folders (with many filters) ranging from 5000+ in one, going through 2000+ down through the hundreds to one with just 9 in ... no problems with speed / searching / loading / anything ... very quick very usable. (so overall 10-20,000+ mails)

      That's Linux / ext2 / Athlon 800 / 512M RAM.

      works great for me.

  61. Databases Ptewey. by Jason+Pollock · · Score: 3, Insightful

    The problem with using database formats is that you can't access them with vi. How many times has your mail client crashed attempting to read an email, but you still _need_ to get access to it? If it's in a database (proprietary or not), you're up the creek. If it's stored in a flat file, you at least have the option of using vi/emacs/grep to find and read the email, and then excise it.

    This has happened to me in Netscape, Kmail, Outlook, Evolution, Eudora, etc. Every single one has had problems at one point or another. The best programs are the ones that are _truly_ open, and let you get at the mail from other directions.

    Don't doubt the power of the text utilities in Unix. :)

    Jason Pollock

    1. Re:Databases Ptewey. by Anonymous Coward · · Score: 1, Insightful

      Amen. Also this gives you far more options for
      backing up email. Sure you can back up databases
      as well but if it takes more than 2 seconds to do
      often it doesn't get done. Sure your company's IT
      dept may back up your email for you but do you
      really want to rely on that? If it's in a flat file
      you can just cp it to a floppy.. mail it offsite
      whatever.. and it's easy and fast and you know it
      will work. If I had to use a database based email
      I'd probably set it up to copy every email offsite
      to a regular flat file storage acct.

    2. Re:Databases Ptewey. by Dynedain · · Score: 2

      Or, you use IMAP and just access it from another machine. Simple.

      --
      I'm out of my mind right now, but feel free to leave a message.....
    3. Re:Databases Ptewey. by wadetemp · · Score: 2

      Select Body from Mail where User like 'Jason Pollock'?

    4. Re:Databases Ptewey. by YetAnotherDave · · Score: 1

      / ~f 'Jackson Pollock'

      Mutt can do this stuff with mbox, I hardly see that as a reason to add DB complexity...

    5. Re:Databases Ptewey. by ka9dgx · · Score: 2
      Uhm... never.

      Text is nice for simple, low volume applications with one infrequent user. When you move into multiuser, transaction oriented, high volume systems, a versioning database is the way to fly.

      I've never had an email client crash, I get tons of email, spam, and the occasional trojan/worm/hoax, but no crashes of email programs since 1982 or so...

      --Mike--

    6. Re:Databases Ptewey. by drauh · · Score: 1

      Yup. That's why I use nmh/exmh.

      --
      This is a tautology.
    7. Re:Databases Ptewey. by Luyseyal · · Score: 2

      Huh, I've never had Evolution corrupt my mbox unalterably or keep me from ssh'ing into the box and grepping some text out that I needed from elsewhere. I do this fairly regularly and I get a decent amount of mail. An old alpha version of Evolution _did_ crash on me while trying to import some crappy Netscape mail, but they fixed that bug.

      If libcamel really corrupted your mbox files, you need to file a bug.

      -l

      --
      Help cure AIDS, cancer, and more. Donate your unused computer time to worldcommunitygrid.org. Join Team Slashdot!
    8. Re:Databases Ptewey. by maks · · Score: 1

      are you sure you can do this with mutt on an mbox file > 1 Gig?
      sure... more than 1 gig...
      have you ever had a mailbox in which you receive faxes... or high res pictures from your graphic or dtp staff?

      i think mutt will kill your machine...

      bye

    9. Re:Databases Ptewey. by Jason+Pollock · · Score: 1

      Actually, it's more usually searching for something in the body of the email. Databases aren't known for text searches in unindexed columns, which the body column would be. However, this is _exactly_ what fgrep/et al are for.

      ie select body from mail where body like %database%
      (yeah, it's wrong, but you get the idea)
      vs fgrep database mdir/* mbox
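
      For the record, a runnable version of the query side might look something like this. It uses SQLite purely as a stand-in, and the table and column names (mail, body) are just the made-up ones from above. Note the quoted pattern, and that without a full-text index the LIKE has to scan every row, which is roughly the same work fgrep does over the files.

```python
import sqlite3


def search_bodies(conn: sqlite3.Connection, word: str) -> list:
    """Return bodies containing `word`; an unindexed LIKE scans every row."""
    pattern = f"%{word}%"
    rows = conn.execute("SELECT body FROM mail WHERE body LIKE ?", (pattern,))
    return [body for (body,) in rows]


def make_demo_store() -> sqlite3.Connection:
    """Build a tiny in-memory store with the hypothetical `mail` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mail (id INTEGER PRIMARY KEY, body TEXT)")
    conn.executemany(
        "INSERT INTO mail (body) VALUES (?)",
        [
            ("We moved the database to a new host.",),
            ("Lunch on Friday?",),
        ],
    )
    return conn


if __name__ == "__main__":
    print(search_bodies(make_demo_store(), "database"))
```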

      Jason Pollock

    10. Re:Databases Ptewey. by Jason+Pollock · · Score: 2

      Notice, the ssh in and grep text out of it. That's my point. :) If it was in a database, you wouldn't be able to grep it.

      I was using a beta copy of evolution at the time it happened to me, but that doesn't mean that it won't happen again. Next time, I want the same ability to get at my email using grep. :)

      Jason

    11. Re:Databases Ptewey. by Jason+Pollock · · Score: 1

      That just shifts the point of failure to the IMAP server. The problems with using databases as mail stores still exist.

    12. Re:Databases Ptewey. by Jason+Pollock · · Score: 1

      Strange that you haven't had a mail crash. What are you using?

      Here's an example of what I've seen:
      Outlook had an interesting bug: if you forwarded an email with an attachment, the program would crash the sending queue. You couldn't delete the email because it was locked by the sending queue process (which was dead). And Outlook wouldn't let you start up without attempting to send the email.

      Kmail 2.2.2 has had some problems recently with mime attachments. I have an email that when I attempt to save the attachment, the file dialog never appears. grep/mimencode to the rescue.

      Eudora has had various problems (been a while since I've used it).

      Netscape, I've never used it, but the complaining from the other end of the office is pretty indicative. :)

      Kmail 1.x had a problem with compressing folders.

      All that being said, this was a discussion on MUA, which typically is considered single user.

      I don't have a problem with using a database as an index, with the actual messages stored elsewhere, but I would like to be able to get at the email with fgrep, vi, etc. if needed.

      Jason Pollock

  62. Perfect file format by kinthalas · · Score: 1, Offtopic

    Just edit your .procmailrc:

    :0
    /dev/null

    And all your problems are solved.

  63. Re: Am I missing something? by Dionysus · · Score: 2

    Courier IMAP uses an enhanced Maildir format (original Maildir didn't support subfolders)

    --
    Je ne parle pas francais.
  64. MTA/IMAP server for MySQL message-store by chrisv · · Score: 3, Informative
    Personally, I'd love to see a Linux MTA/IMAP system which uses an SQL message-store. The ability to replicate a message-store across multiple physical sites without having to get into distributed filesystems like Coda would be a huge benefit for those who need to provide a redundant mail service.

    I actually found a nifty little package called dbmail which uses an SQL messagestore. I've been playing with such things at work since they wanted me to write them a web-based mail client, and I wanted something which would let me deal with a MySQL database on the web client, but also allow people to connect to it via IMAP or POP3.

    Of course, the whole replication part of it might be a bit more difficult, but it could probably be arranged as well. I'm pretty sure there are tools in existence for doing replication on a MySQL database (of course, don't ask me the names of any of them...)

    --

    Dogma: Dead (mostly because your Karma ran it over)

    1. Re:MTA/IMAP server for MySQL message-store by kris · · Score: 2

      You do not want BLOBs in a MySQL store, at least not unless MySQL's BLOB API changed a lot since I looked last (which has been some time, admittedly).

      MySQL limits BLOB size to max-packet (1 MB per default), which is very small and stupid anyway.

      MySQL has no proper BLOB API which allows you to download a BLOB only partially. You cannot read bytes 10.000 to 20.000 of a BLOB in MySQL.

      MySQL tables perform abysmally with BLOBs of varying size being part of the table.

    2. Re:MTA/IMAP server for MySQL message-store by joib · · Score: 2

      PostgreSQL blob api also sucks golfballs through a waterhose, but as of pgSQL 7.1 there is no row length limit any longer. So you could stuff an arbitrarily long attachment as a TEXT or VARCHAR field.

    3. Re:MTA/IMAP server for MySQL message-store by erc · · Score: 1

      Yet another Slashdotter on crack.

      Putting aside for the moment the idiotic idea of storing a BLOB in a database, MySQL has absolutely no problem storing BLOBs of many megabytes.

      --
      -- Ed Carp, N7EKG erc@pobox.com PGP KeyID: 0x0BD32C9B What I'm up to: http://intuitives.mine.nu
  65. Back in 1993 by Anonymous Coward · · Score: 0

    I was working at a Governmental institution which was one of the first to run Linux servers already in 1993 (or was it 1994?).

    I recall I once tried to send a 12 MB attachment with zipped GIS data to a colleague at the other end of the corridor via e-mail.

    I subsequently had a long conversation with my superiors on dos and don'ts.

    As I understand it most servers still don't like files larger than 1 or 2 MB. Why is that? Can't one use an ftp-undercover or something?

  66. Unencoded Attachments. by sPaKr · · Score: 1

    While I think unencoded binary stores of attachments would be great, I don't think it will work. I have yet to find a mail client that will *never* fail on decoding an attachment. Now granted, many of these are messages from broken mail clients, but still I have been able to extract the attachments after a little work. Email virus scanners often have this same problem. Until we can guarantee that the software can extract ALL encoded documents, we can't assume that the original message won't be required at a later date. In the end we are talking about disk space. Last I checked, drives were cheap and prices were falling (I just picked up a 100GB WD drive for a few dollars north of $100). The database store and search may be fast, and should be fast at the expense of space. So do we really need (want) unencoded attachments?

  67. The problem is with pack-rats... by doomdog · · Score: 1

    Not with the mbox/maildir formats... After all, who in their right mind REALLY needs to keep 10,000 emails? That's absurd. Even if you received 20 emails a day (that were worth keeping -- and most email is definitely NOT worth keeping), to amass 10,000 emails in your little kingdom would take nearly two years! And realistically, how many emails really maintain their validity after 2 entire years???? I would venture very few... The problem here is not technology, it's people and their lazy habits....

    1. Re:The problem is with pack-rats... by Anonymous Coward · · Score: 0

      No, for a lot of people thousands of emails go unread, but they're mailing lists about a subject, that may never be webarchived and yet are of great interest to the person. Therefore, they can act as a good archive when troubleshooting or looking for an answer to a question. Hence the need for reliable storage.

    2. Re:The problem is with pack-rats... by doomdog · · Score: 1

      Get real! You simply cannot have "thousands" of emails that are of "great interest"! I regularly go through my email archives and delete all of the cruft that has accumulated -- and without fail, the stuff I'm deleting is stuff that I *thought* would be important, but really wasn't (or at least, is no longer important after keeping it for a year)...

    3. Re:The problem is with pack-rats... by dossen · · Score: 1

      I'd just like to know, why not keep it. As a lot of people have said, disk is cheap. And though it might not happen very often, one of those old mails may contain an important bit of information that I lost (phone number, small util, advice on some rare problem etc). Keeping the mails also helps when searching for the stuff you get, since they provide you with context.

      And as far as the realism goes, there is nothing like high-traffic mailing-lists to fill your mailboxes (and what you don't have time to read now, might be what saves your hide tomorrow).

    4. Re:The problem is with pack-rats... by Anonymous Coward · · Score: 0

      I'm the (informal) archivist for a mailing list. My solution: stuff the messages into a series of mbox files (mail.N, mail.N+1 ...) and back them up. Sometime I'll use MHonArc and other tools to make them searchable, but for archival purposes breaking them up into ~200kb files works well.

      Organizing massive amounts of mail needs to be done by the recipient, the way they want, not by a MTA.

  68. OS400 has been doing this for years by Starbuck · · Score: 1, Interesting

    On a REAL computer (albeit big iron), OS400 does exactly what they are proposing. Sure, the AS400 has a bunch of smaller processors that operate the individual subsystems, but isn't this somewhat like what the video card industry is stepping towards in terms of GPUs? If your hard drive handled all of the hard drive tasks (meaning it only requests/sends data to the CPU) things would be a lot faster. Also a lot of proprietary hardware, but that's what standards are for. Something like this is years away, but there is a limit on how bloated and stupid an OS can get. (Sorry XP, but your 1000MB butt is too big for my taste.)

  69. Don't do it! Compress in the file system. by billstewart · · Score: 3, Insightful

    Shredding and compressing mail messages is almost always a bad idea. Essentially *nobody* does it correctly, and you can't reconstruct messages in their original byte-for-byte formats, which trashes digital signatures. You won't save much disk space, because real text doesn't take up enough space for anybody except a big ISP mailsystem to worry about, and binary attachments usually only compress well if they've been encoded in some non-8-bit-transparency format like base64 or uuencode. About the only time it wins is when one person on your keep-mail-on-server mailsystem is sending an attachment to a bunch of people who can then all use the original, which is to say they should probably have stored the file on the web and mailed a URL. If you're going to do things like this, get yourself a compression-equipped filesystem and just store your raw mail messages there.
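
    A quick way to see the base64 point for yourself, as a sketch only: random bytes stand in for an already-compressed attachment (zip, JPEG and the like), which is an assumption, and the function name is made up. The raw bytes barely shrink under deflate, while the base64 text shrinks by roughly the amount the encoding added in the first place.

```python
import base64
import os
import zlib


def size_report(raw: bytes) -> dict:
    """Compare sizes of raw vs. base64 data, before and after deflate."""
    encoded = base64.encodebytes(raw)  # MIME-style base64 with line breaks
    return {
        "raw": len(raw),
        "raw_deflated": len(zlib.compress(raw, 9)),
        "base64": len(encoded),
        "base64_deflated": len(zlib.compress(encoded, 9)),
    }


if __name__ == "__main__":
    # Assumption: os.urandom() output behaves like an incompressible attachment.
    for name, size in size_report(os.urandom(200_000)).items():
        print(f"{name:16} {size}")
```

    So compressing at the file system level mostly just claws back the encoding overhead, without ever having to take the message apart.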

    --

    Bill Stewart
    New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
  70. The trouble with e-mail by os2fan · · Score: 1, Offtopic
    The main hassle I have with email is that there is little thought given to official positions. Under snail mail, I could write to "recruitment" or "finances", and expect my mail to get to the right area.

    Very few systems give alternate functional views to different views. In order to send a letter to a section, I'd have to find out a name of a person in the section, send it to that person's email, and then hope that person is in.

    What is needed is a parallel view where people can add functions to their role (at user level, for example).

    So an email to recruitment@sample.com will get to the recruitment folder, and any of the recruitment officers can deal with it.

    The only way around this is then to look at the issue of spam. If everyone has a "recruitment" address, then one could send out mail to "recruitment@[each domain]" a lot easier than getting the right name for each domain.

    The idea is that a section inbox should be available to a section, and not an individual, and that people in the section should be given access when appropriate. A section would then retain the same name, regardless of the personnel making it up.

    None of the mail systems that I see grasp this point.

    One could have view sets, which are alternate tree structures, with the accounts at the leaf objects. One could be in the flat "name" tree, or access the personnel\recruitment intray, or whatever.

    --
    OS/2 - because choice is a terrible thing to waste.
    1. Re:The trouble with e-mail by Jubal+Kessler · · Score: 1

      IMAP, shared folders and ACLs.

      RFCs 2086 and 2342.

    2. Re:The trouble with e-mail by kephunk · · Score: 1
      None of the mail systems that I see grasp this point.

      The last time I used MS Exchange (1998) it could do this. It's called Public Folders. You can assign email addresses to the public folders, and create custom views for those folders for the users. I had public folders set up with mailing lists going to them, with email retention set. It was quite nice. It appeared in your Outlook/Exchange client in the Public Folders section, and you could create a nice hierarchy, and even set permissions on it, etc. The main reason I wouldn't use this would be spam.

    3. Re:The trouble with e-mail by Anonymous Coward · · Score: 0
      The idea is that a section inbox should be available to a section, and not an individual, and that people in the section should be given access when appropriate. A section would then retain the same name, regardless of the personnel making it up.

      uh, yeah. we do this with notes. it's called a shared ID. basically you just throw more people into the ACL, assign roles (if you feel like it), and you're done. i think notes is a PoS but this works well most of the time. not enough of the time by my lights, but most of the time.

    4. Re:The trouble with e-mail by erc · · Score: 1

      Most UNIX boxes have this feature built in, if admins would use it - it's called /etc/aliases. Most companies I've been at, I've used it.

      mismanager: fred

      Also works great when someone moves on:

      fred: fred@newdomain.com

      Works just fine, no reinventing the wheel needed.

      As for giving a group of people access to email, that's what groups are for:

      -rw-r----- owner group 12345 owner

      --
      -- Ed Carp, N7EKG erc@pobox.com PGP KeyID: 0x0BD32C9B What I'm up to: http://intuitives.mine.nu
  71. XML = unnecessary performance hit by acb · · Score: 2

    Why waste CPU cycles on parsing a human-readable text file format such as XML?

    There should be a standard byte-compiled representation of XML (CXML), which has been flattened into an easily readable data structure. It would be portable, with byte orders indicated in flags (or would just use network byte order, i.e., big-endian), and with fixed-length element start/end headers, and could be used in lieu of XML for machine-machine communications. If a human wants to inspect the data, CXML could be trivially converted to and from XML.

    Why go to the trouble of running a parser for files that 99% of the time no human will ever look at?

    1. Re:XML = unnecessary performance hit by spectecjr · · Score: 5, Informative

      There should be a standard byte-compiled representation of XML (CXML), which has been flattened into an easily readable data structure. It would be portable, with byte orders indicated in flags (or would just use network byte order, i.e., big-endian), and with fixed-length element start/end headers, and could be used in lieu of XML for machine-machine communications. If a human wants to inspect the data, CXML could be trivially converted to and from XML.

      There is one. It's called ASN.1

      Simon

      --
      Coming soon - pyrogyra
    2. Re:XML = unnecessary performance hit by jbert · · Score: 2

      You have just described ASN.1.

      People like XML because humans can read it.

      It is the same reason people like the inefficient text-based encoding in email.

      (That doesn't mean they are right, of course).

  72. Exchange brain-damage by Anonymous Coward · · Score: 5, Informative

    Exchange is actually a pretty decent mail server

    As part of my job, I've written software to send out HTML mails to people (no, it's not spam). When these messages pass through an Exchange server, Exchange does us the "service" of creating a text version of the mail from the HTML. I guess this is so that people without HTML-capable mailers can have a readable version...

    The problem is, we include our own text/plain version alongside the HTML (ain't multipart/alternative great?). Nicely formatted and everything. Instead of leaving our mail alone, Exchange rips out the text version and creates a new one from the HTML. The result is an ugly mess of URLs because we use some graphics in the HTML version. Our nicely formatted text version ends up in the bit bucket so that Exchange can dump its url-barf on people.
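
    For those who haven't built one by hand, here is roughly what such a message looks like, sketched with Python's standard email package. This is not our actual mailer; the addresses, subject, and function name are placeholders. The text/plain part goes first and the HTML part last, and a well-behaved MTA passes both through untouched.

```python
from email.message import EmailMessage


def build_alternative(text: str, html: str) -> EmailMessage:
    """Build a multipart/alternative message carrying both renderings."""
    msg = EmailMessage()
    msg["Subject"] = "Monthly update"   # placeholder values for illustration
    msg["From"] = "news@example.com"
    msg["To"] = "reader@example.com"
    msg.set_content(text)                      # becomes the text/plain part
    msg.add_alternative(html, subtype="html")  # converts msg to multipart/alternative
    return msg


if __name__ == "__main__":
    message = build_alternative(
        "Hello,\n\nSee the report at https://example.com/report\n",
        "<html><body><p>Hello,</p><p><a href='https://example.com/report'>"
        "See the report</a></p></body></html>",
    )
    print(message.get_content_type())
    print([part.get_content_type() for part in message.iter_parts()])
```

    A text-only MUA picks the text/plain part, which is exactly the part that gets thrown away and regenerated in the case described here.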

    This is really stupid behaviour for an MTA. And for some reason, it's always CEOs of important clients who use text-based MUAs while sitting behind an MS Exchange server. They call us up asking which URL to click on.

    This, combined with other mail-rewriting bogons, has led me to the conclusion that Exchange has no respect for the messages passing through it.

    1. Re:Exchange brain-damage by ortholattice · · Score: 4, Informative
      Mod parent up. More people need to know about this blatant RFC 2046 violation that corrupts carefully composed multipart/alternative emails. Sometimes it makes me want to scream that Microsoft gets away with this stuff and no one seems to care. Or maybe they don't realize their correspondents are receiving a corrupted version on the other side, and the correspondents just assume the sender is sloppy and lazy.

      BTW a good way to get nicely formatted text/plain content is:

      links -dump abc.html > abc.txt
      with neatly formatted tables and everything. Unfortunately only your non-Exchange recipients will see it.

      Now if Exchange automatically put in a text/plain for attached Word documents, I might buy that... :)

    2. Re:Exchange brain-damage by Pii · · Score: 1, Redundant
      Easy with the "dipshit," dipshit.

      In addition to the 'lynx' web browser, there is another, similar, console based text only browser called 'links.' It's actually a better browser, in that it correctly renders tables.

      Maybe if you weren't such an asshole, just looking to pounce on other people, you'd know that.

      Anonymous Coward, indeed.

      --
      For those that would die defending it, Freedom
      has a sweet taste that the protected will never know.
    3. Re:Exchange brain-damage by Anonymous Coward · · Score: 4, Insightful

      Sometimes it makes me want to scream that Microsoft gets away with this stuff and no one seems to care.

      One thing to keep in mind about Exchange is that it's really an X.400 mail system, with some proprietary routing features kludged on top, lots of back-compat MS Mail features kludged on top of that, and then (as the last afterthought) SMTP kludged on top of all that. Next time you are at a computer book store, gander at the architecture diagram for Exchange -- it's so complex that it _should_ make you queasy. The thing just reeks of early-90s incorrect design assumptions.

      So it shouldn't be a shock that it can't handle a large number of SMTP edge cases. Frankly, nobody would buy a product like Exchange if it didn't have the Microsoft logo on it and a nice client which gets installed with Word and Excel.

      Microsoft, in their heart of hearts knows that it's a piece of shit, but it's _their_ piece of shit. And it happens to sell well, and any product that profitable can't be all that bad.

      I wouldn't be shocked if numerous skunkworks projects have come and gone at MS to replace their Big X.400 Jet DB Kludge with a real Internet-savvy mail server, but the politics of the place probably dictate that they lumber on with what's working (that also explains products like Windows ME).

    4. Re:Exchange brain-damage by bmetz · · Score: 1, Troll

      "Gets away with it"? Come on. Someone made a mistake in their team. Yell at MS and the next version will fix it. It happens.

      I doubt there's a sinister plot in MS to mess up
      people's emails. They've got to use these products too, you know.

      --
      What did you eat today? http://www.atetoday.com/
    5. Re:Exchange brain-damage by honold · · Score: 0

      they dropped the jet database starting with exchange 2000

  73. row size limits are gone now... by Lazy+Jones · · Score: 3, Informative
    With PostgreSQL this is a compile-time option, default 8k and it can go up to 32k.

    Current versions of PostgreSQL no longer have such limits (they're much higher, a single field can use up to 1GB ...).

    --
    "I love my job, but I hate talking to people like you" (Freddie Mercury)
    1. Re:row size limits are gone now... by Anonymous Coward · · Score: 0

      that doesn't mean you'll be able to read 1 gig efficiently. That was the original author's point from what I gather.

      You have to understand how databases work. In order to get the best performance possible, they use "blocks" which are all of the same size. This allows them to do random-access.

      how much data can be saved has nothing to do with block sizes (well, ok - it *can* but I gather that PostgreSQL has made it so that data can span across multiple blocks).

      Hopefully I cleared up the original author's post for you.

  74. Experiences with Mail Storage by Da_man · · Score: 1

    Hi,

    Having worked for an IT sales organisation as a Systems Engineer, you quickly become used to corrupt email stores and sluggish mail systems, due mostly to the ignorance of the sales people when it comes to their mission critical application. Outlook .pst file sizes in the order of 1GB are not unusual. Try opening something like that in pine! And this is not going to change for the better with the advent of unified messaging (1 information store for voice, email, video??).

    Exchange is still a reasonable mail system even if the database is Access. Where I am working now (Large network kit manufacturer, sssh!) a team is in the process of rolling out what will be a mission critical Exchange 2000 implementation. Their major gripe / concern: having to place the exchange db's on several machines / EMC storage array. Despite the mission critical nature of this, no clustering is involved.

    Oracle have really latched on to the storage / manageability / reliability problems inherent in large mail systems (and these are only going to get bigger and bigger) and have a great system based on Oracle 9i. Its benefits: huge scalability and easy clustering, management etc. What it doesn't have: the online calendaring and collaboration tools that Exchange has.

    Notes is db backended as well, and being based on views, it can be very quick. It's cross platform, has a very rich client, and all the collaboration tools you could need. I can't understand why it seems to be losing ground to Exchange (at least that's my opinion of the Irish marketplace)

    If anybody out there is considering developing a new mail system, the things I would look for are:
    * RDBMS Back end (ala Oracle or notes, not exchange)
    * LDAP integration for user management (ala Exchange)
    * Bolt on interfaces, such as http / imap / pop3 / wap / voice, others can be added as necessary
    * Support for clustering, replication
    * Perhaps built in HSM, allowing users to migrate old email at the server rather than at the client. Never give the user the opportunity to store email locally, it will come back to bite you !!!

    Just my few cent.

    Jonathan Bourke

    1. Re:Experiences with Mail Storage by Anonymous Coward · · Score: 0

      Notes is losing out to Exchange in many markets because of the "Companion" effects. A lot of businesses already have a version of MS Office licensed. Everybody is using it, often using Outlook in POP mode with their old simple mail solution and for their own individual calendars. Once everyone in the company is bought into the UI like this, Exchange becomes the easier choice.

    2. Re:Experiences with Mail Storage by Da_man · · Score: 1

      Very true. I haven't downloaded StarOffice/OpenOffice in a while, but maybe that is something they should focus on, which might generate a true competitor to MS Office.

      That said, I am not a technology evangelist by any means, and am an MCSE and Sun certified engineer. MS Office is winning the desktop war by being quite reasonable to good. Of all the alternative mail clients / office suites, StarOffice is the first one I have seen in use in the wild in a long long time.

      Some may think that this is off-topic, but choice of client seems to determine the backend in the vast majority of cases.

      JB

  75. How about usenet? by thogard · · Score: 2

    If you store your messages in a Usenet server you get all kinds of neat features like auto expiration and tools that can put the binaries together and let the server deal with the file format.

    Back when C-news was new, there was a system called "Notes" that kept Usenet posts in a database. From what I can tell that became the ancestor of Lotus Notes at some point.

    1. Re:How about usenet? by Meowing · · Score: 1

      Gnus is capable of storing mail in the same format used by C News (numbered files in each folder with an .overview, and an active to rule them all). It works well enough, but Gnus is too damned slow overall to take real advantage of it.

      One problem with using a real news server to store mail is in the message IDs. This needs to be a unique key. No problem for news where message IDs are required to be unique, but a big problem for mail where this is not so. (Also consider a mail message sent to two recipients on the same server, or one that you receive twice, say through two lists. Sorting that out will be a pain using conventional news servers.)
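
      One way around the non-unique Message-ID problem is to stop using the Message-ID alone as the key. The following is only a sketch of that idea, not how any news server actually does it: the function name `storage_key` and the key layout are made up for illustration, and it assumes the raw message bytes and the recipient's mailbox name are both available at delivery time.

```python
import hashlib
from email import message_from_bytes


def storage_key(recipient: str, raw: bytes) -> str:
    """Build a storage key that stays unique even when Message-IDs repeat.

    Hypothetical layout: <recipient>/<Message-ID>/<short content hash>.
    """
    msg = message_from_bytes(raw)
    msg_id = (msg.get("Message-ID") or "").strip()
    # The content hash separates two different messages that (wrongly) share
    # a Message-ID, and two copies of one message that arrived by different
    # routes and therefore carry different headers.
    digest = hashlib.sha256(raw).hexdigest()[:16]
    return f"{recipient}/{msg_id}/{digest}"
```

      With a key like this, the same message delivered to two recipients on one server gets two entries, and a message received twice through two lists gets two entries as well, because the list headers change the hash.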

    2. Re:How about usenet? by tricorn · · Score: 1

      Unix Notes and Lotus Notes both have origins in the PLATO Notes system written in the early '70s at the University of Illinois. Notesfile "pad" is still going strong on the NovaNET system (which is what the PLATO system was renamed to, after CDC sold the rights to the name PLATO to TRO), with origins dating to around 1972 or 1973 (before Notes was written).

  76. I vote for plain mbox by Trevin · · Score: 1

    I still like the existing mbox format, primarily because it is in plain text. This makes it easy to manage with common text tools and editors, as well as making it portable across many different MUA's. I just cringe whenever somebody mentions storing mail in some kind of database; if users want to use an MUA that does that, that's fine, but if sendmail ever changes the mailbox format to something that I can't read with 'less', I'll stop upgrading.

    The only thing I don't really like in the mbox format is the separation between messages. It's very similar to the mail header lines, and I'm not sure what would happen if a message happened to contain a line with the same format. OTOH, I can't think of another separator that would work better while keeping the file in 7-bit ASCII.

    I also use Eudora as my MUA, one of the reasons being that all mailboxes it creates are also in mbox format. One thing I like about Eudora is that it stores almost all metadata about each message in a separate file, including indexes to each message for faster access. But the down side to that is there have been several times when Eudora crashed (it is a M$-Windows program, after all) and the metadata got out of sync with the mailbox, so the tags were often lost. I think a better solution would be to store metadata as extra header lines in the message (I think pine does this) -- although I wouldn't want *too* many extra headers cluttering the mail -- and have the MUA use a separate file just for indexing and re-sorting.

    I also like the fact that Eudora extracts attachments from messages so that the mbox file doesn't get too big. However, placing all attachments in a single directory creates its own cluttered mess, plus there are potential problems when two or more messages happen to have attachments with the same file name, and it's difficult to keep track of which mailbox each attachment came from. Perhaps a partial solution to this would be to have a separate attachment directory for each mailbox, and each attachment's filename would be modified to indicate which message it came from (such as a date prefix or message ID suffix). The downside is that it may not work on old systems with a short maximum filename length (POSIX programs must not depend on more than 14 chars). Attachment separation should also be limited to MUA's; MTA's ought to keep attachments inline so that user agents can do whatever they want with them.
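
    The per-mailbox attachment directory could look something like the sketch below. This is an illustration only and not anything Eudora does: `attachment_path`, the directory layout and the 8-character hash prefix are all assumptions. A short hash of the Message-ID is used as a prefix rather than the full ID, to keep names short.

```python
import hashlib
import re
from pathlib import PurePosixPath


def attachment_path(root: str, mailbox: str, message_id: str, filename: str) -> PurePosixPath:
    """Return <root>/<mailbox>/<hash of Message-ID>-<sanitised filename>."""
    # A short hash of the Message-ID ties the file to its message and keeps
    # two attachments with the same name from colliding.
    tag = hashlib.sha1(message_id.encode("utf-8")).hexdigest()[:8]
    # Drop directory components and odd characters from the sender-supplied name.
    safe = re.sub(r"[^A-Za-z0-9._-]", "_", PurePosixPath(filename).name) or "attachment"
    return PurePosixPath(root) / mailbox / f"{tag}-{safe}"
```

    Even with a short prefix the resulting names will usually exceed 14 characters, so the caveat about old filesystems still applies.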

    1. Re:I vote for plain mbox by Anonymous Coward · · Score: 0

      Uhm.. Maildir and MBX (A UW format) ARE still plain text.

      Also, in mbox format, with your From_ separator, any message line that starts with it is supposed to be escaped like: ">From_"
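
      A minimal sketch of that escaping, using the reversible variant usually called "mboxrd" quoting, where a line that already starts with one or more ">" followed by "From " gets one more ">". The function names are made up for the example. ("From_" above means the word From followed by a space.)

```python
import re

# A body line is a problem if it starts with "From " (possibly already quoted).
_FROM_LINE = re.compile(rb"^(>*From )", re.MULTILINE)
_QUOTED_FROM_LINE = re.compile(rb"^>(>*From )", re.MULTILINE)


def escape_body(body: bytes) -> bytes:
    """Prefix one '>' to every body line that looks like a From_ separator."""
    return _FROM_LINE.sub(rb">\1", body)


def unescape_body(body: bytes) -> bytes:
    """Undo escape_body() by stripping exactly one leading '>' again."""
    return _QUOTED_FROM_LINE.sub(rb"\1", body)
```

      Because already-quoted lines get an extra ">" too, unescaping restores the original body byte for byte. The plain scheme that only touches unquoted "From " lines cannot tell afterwards which ">From " lines it created.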

      What Eudora/Netscrape/Outcrap store their local folders in has nothing to do with mail SERVER storage..

  77. Qmail by babyruth · · Score: 1

    Use the Qmail native format, 1 file per message. Let the filesystem do its job. Qmail is a great alternative MTA to Sendmail, fast, secure (no exploits so far IIRC) and easy to configure.

  78. Re: Am I missing something? by Trevin · · Score: 1

    I stand corrected. :^)

    (Guess I could've/should've looked up Courier-IMAP before responding!)
  79. Don't speculate. Profile. by Doktor+Memory · · Score: 5, Insightful
    > Maildir: Do you really want to clutter your system with millions of small files? That's waste of inodes, space (unless perhaps you use Linux/ReiserFS or SGi)

    Psssst. It's not 1978 any more. Inodes are cheap. So is disk space. Stop spreading FUD.

    > and just try to open a Maildir with 1000+ mails and see how long it takes your favorite Mailprogram to only display the subjects.

    Quite right. Just try it. You might be a bit surprised by the results.
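
    For anyone who wants to actually run the experiment, here is a rough sketch of a timing harness. It is not how any particular mail program reads a Maildir: it simply walks `new/` and `cur/`, reads each file up to the first blank line, and pulls out the Subject header. The function names are made up for the example.

```python
import os
import time
from email.parser import BytesHeaderParser


def read_header_block(path: str) -> bytes:
    """Read a message file only up to the blank line that ends the headers."""
    lines = []
    with open(path, "rb") as fh:
        for line in fh:
            if line in (b"\n", b"\r\n"):
                break
            lines.append(line)
    return b"".join(lines)


def time_subject_listing(maildir: str):
    """Return (subjects, seconds) for listing every Subject in a Maildir."""
    parser = BytesHeaderParser()
    subjects = []
    start = time.perf_counter()
    for sub in ("new", "cur"):
        with os.scandir(os.path.join(maildir, sub)) as entries:
            for entry in entries:
                if entry.is_file():
                    headers = parser.parsebytes(read_header_block(entry.path))
                    subjects.append(str(headers.get("Subject", "")))
    return subjects, time.perf_counter() - start
```

    Run it twice against the same folder: the first run measures a cold cache, the second a warm one, and the difference is usually the interesting number.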
    --

    News for Nerds. Stuff that Matters? Like hell.

  80. XML by Anonymous Coward · · Score: 0

    Mail is crying out to be stored in XML.

    The greatest benefit would be the ability to specify accurately the structure of the hierarchical document (which MIME mail is) using DTD, schema, etc. This would lead to greater standardisation across the internet.

    There are a great number of high quality tools available for the manipulation, transformation and parsing of XML.

    There are a number of protocols already available for the transmission of XML data.

    And mail is text based.

  81. a few comment by an experienced mail hacker by fejjie · · Score: 5, Informative

    A couple things:

    1. Evolution is NOT "Basically mbox with database features". It can use Maildir or MH as the backend (and you can write your own plugin to extend this if you like).

    2. Evolution's body indexing and summary files are extremely fast and efficient, about the best you'll get. I hear MySQL has text indexing capabilities that are extremely fast, but I'm not sure if they are faster than Evolution's indexer or not. Might be interesting to check this out.

    3.

    > But the thing that bugs me most is disk space. Typical inboxes are
    > made of 5% to 10% of Text including Headers and HTML. The rest are
    > BASE64- (or UU-) encoded pictures, word documents, zip archives and so
    > on. The problem here is the encoding which wastes considerable amounts
    > of space (at least one third).

    It's theoretically possible, if you wrote your own Evolution storage plugin, to change the Content-Transfer-Encoding header value of binary attachments to "binary" (and text attachments to "8bit") before writing the message out to disk (or wherever) thus magically making it so that you no longer save the encoded text of the attachments but rather in-line binary data content. (Yes, it's as easy as setting an enum value in the CamelMimePart structure).

    However, you have to be aware of the consequences of this. Most importantly, you will not be able to validate any of your PGP/MIME or S/MIME signed messages as according to the RFCs for these types, the signed MIME parts MUST be treated as opaque (meaning that you may not modify them in any way).
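
    To put a number on the "at least one third" figure quoted above: base64 turns every 3 bytes into 4 characters, and MIME bodies add a line break after every 76 characters. A small sketch that measures this with Python's standard library (the function name is made up):

```python
import base64


def base64_overhead(raw: bytes) -> float:
    """Ratio of MIME-style base64 size to the original size."""
    # encodebytes() emits 76-character lines, each ending in "\n",
    # i.e. 77 bytes of output for every 57 bytes of input.
    encoded = base64.encodebytes(raw)
    return len(encoded) / len(raw)
```

    For input that is a multiple of 57 bytes this gives 77/57, roughly 1.35. With the CRLF line endings used on the wire it would be 78/57, roughly 1.37, so storing attachments as raw binary would save a bit more than a quarter of the encoded size.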

    Now on to your ideas...

    > One file per Mailbox-folder, allowing multiple folders per user.
    > Should those files reside in one central location or in users
    > Homedirs?

    How is this different from mbox? (btw, CVS Evolution can handle mbox files and directory trees in external locations - ie, not within the ~/evolution directory).

    > Compression: Should messages be broken into pieces and the
    > MIME-attachments stored separately (thus searching of the text parts
    > would still be possible without decompressing the whole file)?

    If you break apart the MIME parts, you run into the same problem I described above about not being able to verify signatures.

    However... if you took a normal mbox and gzipped it, you would certainly save space (at the expense of speed). I've been thinking about writing a CamelMimeFilterGzip class for gzip compressing/decompressing streams which would allow Evolution to read and write to gzipped mbox files for example.

    Once the class is written (which should be fairly simple), allowing Evolution to read gzipped mboxes should be as simple as doing:

    camel_stream_filter_add (MboxStream, GzipFilter);

    ...before feeding 'MboxStream' to the MIME parser.
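
    The same idea, a decompressing filter placed in front of the parser, can be sketched outside Camel with Python's standard library. This is not Camel code and says nothing about how CamelMimeFilterGzip would be written. It assumes a well-formed mbox in which "From " lines inside message bodies have been escaped, and `iter_gzipped_mbox` is a made-up name.

```python
import gzip
from email import message_from_bytes


def iter_gzipped_mbox(path: str):
    """Yield parsed messages from a gzip-compressed mbox file, one at a time."""
    current = None  # lines of the message being collected, or None before the first one
    with gzip.open(path, "rb") as stream:  # decompresses on the fly while reading
        for line in stream:
            if line.startswith(b"From "):
                # A separator line ends the previous message and starts a new one.
                if current is not None:
                    yield message_from_bytes(b"".join(current))
                current = []
            elif current is not None:
                current.append(line)
    if current is not None:
        yield message_from_bytes(b"".join(current))
```

    Reading is sequential, which is where the speed cost mentioned above comes from: there is no cheap way to seek to message N in a single gzip stream without decompressing everything before it.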

    > File format: gdbm, Sleepycat db? Something new?

    Please not Sleepycat. If you are so sure that a generic database backend will be better than what Evolution's got, at least have the sense to use MySQL or PostgreSQL.

    I'm personally against using a generic database as storage, and here's why:

    1. The average user does not have an SQL database installed on their desktop systems, and so this is a completely ridiculous dependency for them. If you think library dependencies are bad, just wait till you have to go installing, configuring, and maintaining a multi-user database running on your system. This may be fine for a company solution, but not the average end-user.

    2. I'm not too familiar with MySQL or PostgreSQL, but I recall there being problems with mailers that use SQL database backends that tried to store the content of the messages as part of the table (due to them making the size of the table too small or whatever). If you can set the size to be "infinite", then I guess that's not a problem.

    If your plan is just to have the database index the folder and actually store the contents as separate files, then you've instantly gained nothing over Maildir except that now you have a hefty database that you have to maintain and very little to no speed improvements (especially if you have a well designed/implemented summary index like Evolution does).

    The only improvement you might gain here is body indexing. As I said earlier, MySQL supposedly has a REALLY good text indexer and so it might be a little faster than Evolution's. I'm really not sure on the comparison here.

    > Should the security model allow users to directly access their
    > files, grep them, copy them around?

    Is there a reason NOT to? I don't see one. It's their mail.

    > Shared folders, virtual domains?

    This doesn't really have anything to do with folder formats and everything to do with features of the client itself.

    (Evolution can do this).

    > Unicode support in folder names? Imap message-IDs, flags, useragent
    > specific state-information?

    Unicode support in folder names I'd say is a pretty important feature. I'm not sure what you mean by "Imap message-IDs". Do you mean UIDs? Evolution, for example, has a UID assigned for each message whether it be in an mbox folder, Maildir folder, MH folder, or IMAP folder. So this isn't necessarily dependent on folder format (though it could be if you used a database backend for example, you might want a UID in the table).

    I don't feel that UIDs are a must though, but I would suggest them. They are definitely useful, especially for folders that can be accessed by multiple clients at once.

    Flags are good. I'd go so far as to say a MUST have.

    As far as user-agent specific state-information, it'd be nice to not need it. But if the client needs to keep its own info, it'd be nice to be able to map the info to UIDs and keep its own state file somewhere else (not necessarily alongside of the mail storage).

    For example, IMAP doesn't have any means for the client to store state information on it, but that's perfectly fine. If a client chooses to have its own state, then it can save it locally.

    It would be nice if the storage could handle user-defined flags/tags though. This would allow the client to extend the native features of the format (Flag-for-Followup, message colouring, etc).

    > How would MTAs deliver mail? How would clients access? File-locking
    > (NFS)?

    This is one reason to just stick with what's available :-)

    File locking is a MUST have (or a scheme to make it not needed, such as Maildir).
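
    For reference, the reason Maildir gets away without locks is that a message is written under a unique name in `tmp/` and only then moved into `new/`, so readers never see a half-written file. A rough sketch of that delivery step follows. It assumes `tmp/` and `new/` are on the same filesystem, and it uses `rename` where the original description uses link-and-unlink. The function name and the exact unique-name recipe are assumptions for the example.

```python
import itertools
import os
import socket
import time

_sequence = itertools.count()  # distinguishes deliveries made by one process


def deliver(maildir: str, raw: bytes) -> str:
    """Write raw into maildir/tmp, then move it into maildir/new. Returns the final path."""
    # Characters that are special in Maildir file names are replaced in the host name.
    host = socket.gethostname().replace("/", r"\057").replace(":", r"\072")
    unique = f"{time.time_ns()}.{os.getpid()}_{next(_sequence)}.{host}"
    tmp_path = os.path.join(maildir, "tmp", unique)
    new_path = os.path.join(maildir, "new", unique)
    # O_EXCL makes the create fail instead of overwriting an existing file.
    fd = os.open(tmp_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "wb") as fh:
        fh.write(raw)
        fh.flush()
        os.fsync(fh.fileno())  # make sure the data is on disk before it becomes visible
    os.rename(tmp_path, new_path)  # atomic within one filesystem
    return new_path
```

    No reader or writer ever has to wait for another one, which is why this scheme is comparatively safe over NFS.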

    --
    You know, I have one simple request...and that is to have messages with freakin' laser beams attached to their headers. Now evidently my MIME specification informs me that that can't be done. Uh, can you remind me what I pay you people for? Honestly, throw me a bone here. What do we have?

    1. Re:a few comment by an experienced mail hacker by Da_man · · Score: 1

      Just picking up one of your points:

      "The average user does not have an SQL database installed on their desktop systems"

      Should they even be allowed to store mail locally, SQL DB or no? I have seen so many "dummy spits" when locally stored email goes bang!!

      This is one area where coherent client-server is a must.

      JB

    2. Re:a few comment by an experienced mail hacker by blueroo · · Score: 1

      Right. So, when will Evolution stop requiring me to install an entire GTK and GNOME (You wanna whine about SQL dependencies? Try GNOME dependencies. Megs upon Megs of bloated libraries and other bullfuckingshit that I don't want to maintain on my fucking workstation. ... WELL I GET PISSED GODDAMNIT!) support environment just so I can read my boffing mail? If Evolution's mail handling features are so great, why weren't they wrapped in a single lovely library (or set of libraries) that I can use to write other mail clients with?

    3. Re:a few comment by an experienced mail hacker by Anonymous Coward · · Score: 0

      Hmmm, aren't they wrapped in the library "Camel"?

    4. Re:a few comment by an experienced mail hacker by CaraCalla · · Score: 1
      Thanks for responding...

      > 2. Evolution's body indexing and summary files are extremely fast and efficient, about the best you'll get. I hear MySQL has text indexing capabilities that are extremely fast, but I'm not sure if they are faster than Evolution's indexer or not. Might be interesting to check this out.

      So by default it's still mbox with some database features :-)

      > However, you have to be aware of the consequences of this. Most importantly, you will not be able to validate any of your PGP/MIME or S/MIME signed messages as according to the RFCs for these types, the signed MIME parts MUST be treated as opaque (meaning that you may not modify them in any way).

      That's why I love Slashdot. While most of the issues raised in the comments above are things I had already thought about, that's a new point. So whatever is implemented, it must be able to reproduce the original message exactly. Even the --cut here 01204737473829 -- parts. That's a very important design criterion.

      > Please not Sleepycat. If you are so sure that a generic database backend will be better than what Evolution's got, at least have the sense to use MySQL or PostgreSQL.

      Sleepycat, gdbm, ndbm are just libs containing database routines. Do you really think Evolution implements its own indexing functions, just because it's fun to write them? I bet Evolution also uses either standard Berkeley DB, gdbm or Sleepycat.

      I concur in that it would be a nightmare having to set up a RDBMS just for using a Mail-client. Although it would be justified for a big-scale POP-Toaster or similar.

      I know that my post didn't make it clear enough. I also like the simplicity of mbox and Maildir, because you can use it for huge mailservers down to minimalistic MUAs. Both have shortcomings though, so what I want is something similar, simplistic, but with optional compression and a standard way to add features like indexes, IMAP-flags, etc.

    5. Re:a few comment by an experienced mail hacker by CaraCalla · · Score: 1
      > Sleepycat, gdbm, ndbm are just libs containing database routines. Do you really think Evolution implements its own indexing functions, just because it's fun to write them? I bet Evolution also uses either standard Berkeley DB, gdbm or Sleepycat.

      Oops, apparently they don't. (db for vCards, but something new for indexes?)

    6. Re:a few comment by an experienced mail hacker by astroboscope · · Score: 1
      > But the thing that bugs me most is disk space. Typical inboxes are
      > made of 5% to 10% of Text including Headers and HTML. The rest are
      > BASE64- (or UU-) encoded pictures, word documents, zip archives and so
      > on. The problem here is the encoding which wastes considerable amounts
      > of space (at least one third).

      It's theoretically possible, if you wrote your own Evolution storage plugin, to change the Content-Transfer-Encoding header value of binary attachments to "binary" (and text attachments to "8bit") before writing the message out to disk (or wherever) thus magically making it so that you no longer save the encoded text of the attachments but rather in-line binary data content. (Yes, it's as easy as setting an enum value in the CamelMimePart structure).

      However, you have to be aware of the consequences of this. Most importantly, you will not be able to validate any of your PGP/MIME or S/MIME signed messages as according to the RFCs for these types, the signed MIME parts MUST be treated as opaque (meaning that you may not modify them in any way).

      It could still be made useful by adding a "Good signature received from bla bla..." blurb, modifying the message as you said above, then signing the modified message with the user's own key. That way the user would have a trustable receipt that the message was originally signed by the original sender. The only loss would be not being able to pass the message along to someone else with a verifiable sig from the orig sender.

      --
      If we were ants living on a Rubik's cube, differential geometry would be a little more confusing.
    7. Re:a few comment by an experienced mail hacker by fejjie · · Score: 1

      I'm actually the other Evolution mail hacker (NotZed being the other) :-)

      My biggest complaint about sleepycat (other than it being slow) is that they change formats between minor releases.

      You'll note that Evolution REQUIRES libdb-3.1.17 for the addressbook for this very reason. While we could most likely easily make Evolution able to use various versions of libdb, the simple fact that upgrading your libdb instantly breaks your contacts database is a major downside.

      It also means that copying the addressbook.db file to another machine running Evolution linked against a different version of libdb would simply not work.

      This is why we require exactly version 3.1.17 of the library.

      The new Evolution mail indexer is based on an on-disk hash table design (http://primates.ximian.com/~fejj/idealhashtrees.pdf).

      The original design of the indexer used libdb as a storage but that turned out to be unbearably slow and the indices were massive.

      The second design used a custom file format consisting of blocks but required you to load it into memory in order to use the information, which means that it's not very scalable (it scaled ok up to ~10,000 messages per folder).

      The third design (only in the CVS development branch) again uses custom file formats (there are now multiple files) based on the on-disk hash table design above. This gives us the advantage of not having to load the entire index into memory before being able to use it (making it much more scalable - easily handles >100,000 messages per folder? I don't really know the statistics on it). On the downside, it means we have to make I/O calls. However, the design of the file formats makes lookups extremely fast, so the user still hardly notices it (if they notice it at all?) even at 10,000 messages in a single folder.
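
      To illustrate what "not having to load the entire index into memory" buys you, here is a toy on-disk hash table. It is emphatically not Evolution's file format, and it is a flat open-addressing table rather than the hash-tree design in the paper linked above. Every name, the slot layout and the fixed table size are assumptions made for the example. The point is only that a lookup is a couple of seeks and small reads, whatever the size of the index.

```python
import hashlib
import struct

# One slot = (64-bit key hash, 64-bit value). A key hash of 0 marks an empty slot.
SLOT = struct.Struct("<QQ")


def _key_hash(word: str) -> int:
    digest = hashlib.blake2b(word.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "little") or 1  # 0 is reserved for "empty"


def create_index(path: str, slots: int) -> None:
    """Create an index file with a fixed number of empty slots."""
    with open(path, "wb") as fh:
        fh.write(b"\0" * (SLOT.size * slots))


def put(path: str, slots: int, word: str, value: int) -> None:
    """Store value under word, probing linearly for a free or matching slot."""
    h = _key_hash(word)
    with open(path, "r+b") as fh:
        for probe in range(slots):
            pos = ((h + probe) % slots) * SLOT.size
            fh.seek(pos)
            stored, _ = SLOT.unpack(fh.read(SLOT.size))
            if stored in (0, h):
                fh.seek(pos)
                fh.write(SLOT.pack(h, value))
                return
    raise RuntimeError("index full")


def get(path: str, slots: int, word: str):
    """Return the value stored under word, or None. Reads only the slots it probes."""
    h = _key_hash(word)
    with open(path, "rb") as fh:
        for probe in range(slots):
            fh.seek(((h + probe) % slots) * SLOT.size)
            stored, value = SLOT.unpack(fh.read(SLOT.size))
            if stored == h:
                return value
            if stored == 0:
                return None
    return None
```

      A real indexer also has to grow the table, store the keys themselves to rule out hash collisions, and map each word to a list of message UIDs rather than a single number. This sketch skips all of that.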

      Anyways, if you are really interested - you can checkout cvs evolution and read evolution/camel/developer-docs/camel-index.txt

      Jeff

  82. DB by jukal · · Score: 2

    I have at times used a custom-made MTA and indexed the incoming mails (headers and first message body part) into a MySQL database. Attachments are compressed and stored within the "regular filesystem". The whole kludge is then interfaced to IMAP. User authentication is also done via MySQL, thus making it unnecessary to create "real users". The solution has lasted without problems for years now. MySQL in general works like a dream; I have never had any problems with it.
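
    A rough sketch of what such a layout might look like. This is a guess at the general shape, not the actual setup described above: it uses SQLite from Python's standard library as a stand-in for MySQL so that it runs without a server, and the table and column names are invented. Attachments are only referenced by path here; compressing and writing them is left out.

```python
import sqlite3
from email import message_from_bytes

SCHEMA = """
CREATE TABLE IF NOT EXISTS messages (
    id      INTEGER PRIMARY KEY,
    owner   TEXT NOT NULL,
    subject TEXT,
    sender  TEXT,
    date    TEXT,
    body    TEXT
);
CREATE TABLE IF NOT EXISTS attachments (
    message_id INTEGER NOT NULL REFERENCES messages(id),
    filename   TEXT,
    path       TEXT NOT NULL
);
"""


def open_index(db_path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(db_path)
    conn.executescript(SCHEMA)
    return conn


def index_message(conn, owner: str, raw: bytes, attachment_paths=()) -> int:
    """Store headers and the first text part; attachments stay on the filesystem."""
    msg = message_from_bytes(raw)
    body = ""
    for part in msg.walk():
        if part.get_content_maintype() == "text":
            payload = part.get_payload(decode=True) or b""
            try:
                body = payload.decode(part.get_content_charset() or "utf-8", errors="replace")
            except LookupError:  # unknown charset name in the message
                body = payload.decode("utf-8", errors="replace")
            break
    cur = conn.execute(
        "INSERT INTO messages (owner, subject, sender, date, body) VALUES (?, ?, ?, ?, ?)",
        (owner, str(msg.get("Subject", "")), str(msg.get("From", "")), str(msg.get("Date", "")), body),
    )
    conn.executemany(
        "INSERT INTO attachments (message_id, filename, path) VALUES (?, ?, ?)",
        [(cur.lastrowid, name, path) for name, path in attachment_paths],
    )
    conn.commit()
    return cur.lastrowid
```

    An IMAP front end would then answer folder listings and searches from the `messages` table and only touch the filesystem when a client actually fetches an attachment.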

    It is good (and fast) for some purposes, which I am not going to discuss here, everyone is probably very well equipped to figure out the plusses and minuses of this way of doing it.

  83. No Notes on Linux by BlueUnderwear · · Score: 2
    > Plus, Domino runs on Linux, Aix, Solaris, NT, 2000, OS/2, AS/400... The list goes on and on. As far as a shared database, just setup shared mail.

    But unfortunately, the Notes client does not. We still need to dick around with wine to access the corporate Notes server. If anybody from IBM (who likes to show their commitment to Linux...) is listening: are there any plans for a native Linux Notes client? If so: when? If not: why not?

    --
    Say no to software patents.
    1. Re:No Notes on Linux by Anonymous Coward · · Score: 2, Interesting

      as i understand it, and i know i don't know much, it looks a lot like notes is going to be ditched in favor of web-ish access. i would guess that notes and the sametime client are probably going to get obsoleted at the same time...

      of course i have no real information, but that's how it looks. i don't think anyone believes notes is actually a good piece of work. i certainly hope not.

      [posting anonymously so as not to irritate my superiors]

    2. Re:No Notes on Linux by Anonymous Coward · · Score: 1, Insightful

      From what I've been told, there are currently no plans for a Linux Notes client. The reason is that there is just not enough money in the Linux desktop market right now to justify the expense and hassle of trying to port the Notes client (it would be hellish because of the GUI and for various other reasons). Most people at IBM who use Linux on the desktop use Notes under WINE. It works reasonably well for basic email and calendaring. Besides, as someone else said, the Web is a more practical cross-platform solution than a native GUI port anyway.

    3. Re:No Notes on Linux by twinpot · · Score: 1

      You don't need to run the notes client (although it does run reasonably well under wine - there are even some RPMs floating around with Notes/Wine pre-packaged). That's not to say a native client wouldn't be nice....

      You can access your mail (and other apps) via a standard web browser.

      You can access your mail with your favourite POP3 or IMAP client.

    4. Re:No Notes on Linux by BlueUnderwear · · Score: 2
      > You can access your mail (and other apps) via a standard web browser.

      How exactly would you do this? N.B. I'm not speaking about specially set up Web applications (these work quite well), but about just the general Notes databases.

      > You can access your mail with your favourite POP3 or IMAP client.

      POP and IMAP need to be specifically enabled on the server, which is often not done. Moreover, Notes' POP and IMAP interfaces are rather stripped-down versions, which didn't allow you to move messages into folders or to delete them. Only reading is possible. For any maintenance, you still need to log in using a Notes client. At least, that was the case when I last checked (about a year ago).

      --
      Say no to software patents.
    5. Re:No Notes on Linux by Carpathius · · Score: 1

      > You can access your mail (and other apps) via a standard web browser.

      Yes and no. It has to be allowed by the Notes admins, and you don't get anywhere near the same functionality as you do with a true client. We have vacation databases set up which can't be accessed except with a notes client.

      > You can access your mail with your favourite POP3 or IMAP client.

      Again, only if set up to do so by the notes admins. They won't allow it here.

      And none of these emails mention that when you send an attachment to multiple users in Notes, the attachment gets recreated for each user and takes up space for each user on the Notes server.

      Never thought I'd like an email client less than I liked Exchange, but Notes wins that prize.

      Sean.

    6. Re:No Notes on Linux by twinpot · · Score: 1

      Obviously it is dependent upon whether or not you can get the system owners to load the relevant tasks (HTTP/POP3/IMAP). Running the HTTP task to allow lots of users access to apps or web based mail does load the server up somewhat, so you need to be careful.

      The POP and IMAP interfaces work quite well now.

    7. Re:No Notes on Linux by twinpot · · Score: 2, Informative
      > And none of these emails mention that when you send an attachment to multiple users in Notes, the attachment gets recreated for each user and takes up space for each user on the Notes server

      Depends on how the mail side is set up. Single instance store solves this. BUT, few places run it, as the admin overhead is generally not worth it.
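
      The general idea behind a single instance store can be sketched as content-addressed storage: the attachment is written once under the hash of its bytes, and each recipient's mailbox keeps only that hash. This is an illustration of the concept, not how Domino implements shared mail, and `store_attachment` is a made-up name.

```python
import hashlib
import os


def store_attachment(store_dir: str, data: bytes) -> str:
    """Write data once under its SHA-256 digest and return the digest."""
    digest = hashlib.sha256(data).hexdigest()
    path = os.path.join(store_dir, digest)
    if not os.path.exists(path):
        # Write to a temporary name first so a crash never leaves a partial file
        # under the final name.
        tmp = path + ".tmp"
        with open(tmp, "wb") as fh:
            fh.write(data)
        os.replace(tmp, path)
    return digest
```

      The admin overhead mentioned above shows up on deletion: the stored file can only be removed once no mailbox references its digest any more, so something has to keep count.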


      > Never thought I'd like an email client less than I liked Exchange, but Notes wins that prize.

      You're confusing the client (Notes) with a server (Exchange). You can run Outlook against a Domino mail server. The Domino mail server, which does have its quirks, is in my experience way more reliable than Exchange. Plus with Notes clients, mail-borne viruses are very unlikely.

    8. Re:No Notes on Linux by Anonymous Coward · · Score: 0

      Load the http task on the Domino server.

      Point your browser at http://your.domino.server/yourmail.nsf?OpenDatabase

      The site formerly known as notes.net has some good documentation for this.

  84. Advanced Mail requires database capabilities by Anonymous Coward · · Score: 0

    The future is in storing emails into a database.

    What the GNU (or other open source) people need to do is establish an API, and then hooks for the mail app and/or database. With the right API it should be fairly easy to use MySQL, PostgreSQL, whatever as the database. The client could be anything from mutt to Pegasus or even Outlook.

    POP is nice.
    IMAP is nicer
    DbSQLMail_API will rule!

    Users will be able to easily categorize mail, filter, archive, etc.

    The current problem is that each mail program (ie. Outlook and even Evolution) is reinventing the wheel again, and again. The programmers cannot see the forest for the trees.

    Look beyond the trees and you will see that the current ways are not viable for storing email for the rest of our lives.

    1. Re:Advanced Mail requires database capabilities by Anonymous Coward · · Score: 0

      you're pretty much wrong.

      please see the post by fejjie:

      http://slashdot.org/comments.pl?sid=33331&threshold=0&commentsort=0&tid=130&mode=thread&cid=3600389

      I think that post has a lot of good points.

      btw, Evolution's backend, Camel, is exactly what you propose (except that it's not called DbSqlMail_API).

      the Evolution developers have not missed the forest for the trees, they've extended the well defined and proven mail formats so that they were extremely fast and efficient. Probably faster than any generic database could ever be for mail.

  85. Exchange 2000 and WebDAV by Anonymous Coward · · Score: 0

    This is kind of a little-known fact, but Exchange 2000 implements the IETF's WebDAV protocol, meaning you can access the whole database (Email, Personal Contacts, Calendar, shared folders, personal folders, etc.) from any standards-compliant WebDAV client library. And as to file formats, the files the server spits out are in standard formats: MIME, vCARD, iCAL, etc.

    Authentication and group contacts is LDAP, WebDAV uses HTTP auth. All standards-compliant.

    So you can write a fairly complete integration to Exchange using commonly available open source libraries and tools.

    It's not a bad mail server/groupware platform, really.

  86. just... by ComaVN · · Score: 1

    ask the FBI

    --
    Be wary of any facts that confirm your opinion.
  87. Usenet-style, with overview database. by strredwolf · · Score: 4, Interesting

    Plain and simple. Switch from mail to Usenet. Maildir-like structure, but with a .overview (XOVER) file to help out with indexing.
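
    For those who haven't looked inside a news spool: an overview file holds one tab-separated line per article, so a reader can list a folder without opening each message. A sketch of generating such a line for a mail message follows, using the usual overview field order (number, Subject, From, Date, Message-ID, References, byte count, line count). The function name is made up, and a real news server has its own rules for some corner cases that are glossed over here.

```python
import re
from email import message_from_bytes

OVERVIEW_HEADERS = ("Subject", "From", "Date", "Message-ID", "References")


def overview_line(number: int, raw: bytes) -> str:
    """Build one overview-style line: tab-separated fields for a single message."""
    msg = message_from_bytes(raw)
    # Everything after the first blank line is the body.
    parts = re.split(rb"\r?\n\r?\n", raw, maxsplit=1)
    body = parts[1] if len(parts) == 2 else b""
    fields = [str(number)]
    for name in OVERVIEW_HEADERS:
        # Tabs and line folds inside a header would break the format, so
        # collapse all whitespace runs to single spaces.
        fields.append(" ".join(str(msg.get(name, "")).split()))
    fields.append(str(len(raw)))                # size in bytes
    fields.append(str(len(body.splitlines())))  # number of body lines
    return "\t".join(fields)
```

    Appending one such line per delivered message gives a subject listing that costs a single sequential read of one file, which addresses the "1000+ mails" complaint about Maildir without changing how the messages themselves are stored.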

    Storage is another problem, though... but Usenet messages can be sidetracked a bit with the encoding.

    --
    # Canmephians for a better Linux Kernel
    $Stalag99{"URL"}="http://stalag99.net";
  88. Easy solution by BlueUnderwear · · Score: 2
    And for some reason, it's always CEOs of important clients who use text-based MUAs while sitting behind an MS Exchange server. They call us up asking which URL to click on.

    Easy solution: Build a list of "VIP" users who will get a text-only version. Or who will get the text and the HTML version in 2 separate mails.

    --
    Say no to software patents.
    1. Re:Easy solution by e_n_d_o · · Score: 2

      That's not easy and it's not a solution.

      I have no idea on the specifics of the original problem, but in my experience every user does not complain about a problem. Depending on the system the mailing list is sourced from, adding a "prefers text/html mail option" could be non-trivial or just not possible, so as to require implementing a "parasite" database from scratch to keep track of such preferences, which would be quite difficult.

    2. Re:Easy solution by a_n_d_e_r_s · · Score: 2, Insightful

      Even more simple solution - just send everyone a text message.

      In it you can put a link to the HTML variant for those that want that - and put that variant on a web server.

      Email should be text. Web pages should be HTML.

      --
      Just saying it like it are.
    3. Re:Easy solution by Anonymous Coward · · Score: 0

      Send out only good old plain text then. You know it won't get munged up.

    4. Re:Easy solution by Anonymous Coward · · Score: 0
      > Email should be text. Web pages should be HTML.

      Why? Just because that's the way it was in the old days? Because HTML is for those flaky non-nerd types? Because the mail infrastructure has problems with HTML email?

      The first two reasons are just geek bigotry. The last one is a reason to update mail infrastructure.

    5. Re:Easy solution by ninewands · · Score: 2

      I agree completely. Nobody ever got a virus from merely opening a plain-text e-mail.

    6. Re:Easy solution by Anonymous Coward · · Score: 0
      Why? Just because that's the way it was in the old days? Because HTML is for those flaky non-nerd types? Because the mail infrastructure has problems with HTML email?

      Because when I read my email, I want two things:
      1. I want the text.
      2. I want it now.

      I do not want to wait, and wait, and wait for some images to come up. It's even more annoying when the images that do come up are just for decoration, and don't add to the content of the message. I don't want my email to entertain me. As long as it's readable, I don't care how it looks. I just want to read it, with as little delay and hassle as possible.
    7. Re:Easy solution by ahde · · Score: 2

      This is the correct solution.

      Mail clients can be built to "automatically" open a specified URL if you want to send it that way. While this might seem a potential security risk, it's no more dangerous than the current feature set of some mail clients. This would reduce internet traffic (and server storage space) enormously. How many megabytes of spam are sent to every user? Alternatively, attachments could be "sent" the same way, even using FTP (or SCP) to reduce the overhead of HTTP.

      The solution isn't to re-engineer the server, but the clients.

    8. Re:Easy solution by RustyTaco · · Score: 1

      Uh, yes they did. MyParty had no HTML, just text. - RustyTaco

    9. Re:Easy solution by Inthewire · · Score: 1

      You have just received the Amish virus. Because we don't have any computers, or programming experience, this virus works on the honor system. Please delete all the files from your hard drive and manually forward this virus to everyone on your mailing list. Thank you for your cooperation.

      --


      Writers imply. Readers infer.
  89. BeOS Style by Anonymous Coward · · Score: 0

    Messages as individual files, parse headers into XFS attributes, implement a few indexes on those attributes... Cool.

  90. Re: Am I missing something? by ryochiji · · Score: 1

    Actually, the mailbox does use maildirs. I specifically installed Courier-IMAP because of its maildir support.

    However, as the poster above (or in between, whatever) points out, Courier-IMAP may have a nonstandard Maildir format...

  91. qmail by jabbo · · Score: 4, Informative
    This is just too easy.

    Life With Qmail

    Building a Linux Qmail Toaster

    Same thing, but with FreeBSD (more scalable, in my experience)

    have fun

    --
    Remember that what's inside of you doesn't matter because nobody can see it.
  92. What's Right with Maildirs by Ekman · · Score: 1
    People keep dismissing Maildirs because of an erroneous notion that performance (speed and/or inode usage) is somehow the most important consideration. It isn't. Let's look at what the Maildir format gives you.

    Maildirs are:

    • Crash proof: an interrupted delivery cannot cause folder-wide corruption or the delivery of an incomplete message.
    • Lockless: all Maildir operations (deliver, delete, read, etc) can be performed simultaneously by multiple processes on multiple machines without the need for any sort of file locking.
    What does this mean? Reliability. That's why you use Maildirs. It may be slightly slower than some other formats (although I've never noticed a difference) and it certainly consumes more inodes. But it's way more reliable. You never have to worry about someone's mail program crashing and leaving the mail folder in an inconsistent state. Maildirs don't have an inconsistent state. And when you're delivering over NFS you don't have to worry about whether or not file locking is going to work right. Maildirs don't need locking.
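
    To make the "crash proof" and "lockless" points concrete, here is a minimal sketch of a Maildir-style delivery in Python. The function name deliver and the exact unique-name scheme are assumptions made for illustration, not part of any particular MTA; the only idea it relies on is writing the whole message under tmp/ and then renaming it into new/, so a reader never sees a half-written file.

```python
import itertools
import os
import socket
import time

_counter = itertools.count()


def deliver(maildir: str, message: bytes) -> str:
    """Deliver one message into a Maildir without taking any lock."""
    for sub in ("tmp", "new", "cur"):
        os.makedirs(os.path.join(maildir, sub), exist_ok=True)

    # Hypothetical unique-name scheme: time, pid, per-process counter, host.
    host = socket.gethostname().replace("/", "_").replace(":", "_")
    unique = f"{time.time_ns()}.P{os.getpid()}Q{next(_counter)}.{host}"
    tmp_path = os.path.join(maildir, "tmp", unique)
    new_path = os.path.join(maildir, "new", unique)

    # O_EXCL makes creation fail instead of overwriting an existing file.
    fd = os.open(tmp_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "wb") as f:
        f.write(message)
        f.flush()
        os.fsync(f.fileno())  # make sure the data is on disk before publishing

    # The message only becomes visible in new/ once it is complete.
    os.rename(tmp_path, new_path)
    return new_path
```

    If the delivering process dies before the rename, the worst case is a stray file in tmp/; the folder itself is never left half-updated.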

    In email, reliability is everything. People may grumble a bit if they think their email isn't arriving fast enough. No big deal. But there is nothing more terrifying than a user with corrupted or (gasp) missing email. While using Maildirs won't solve all of your email problems, they are definitely a step in the right direction.

  93. Exchange, Notes, GroupWise, etc... by deviator · · Score: 1

    Exchange is actually a decent way to store e-mail on a server. But if you're gonna look at PC-based groupware solutions, DON'T use Exchange because it's loaded with holes. Its monolithic, proprietary JET-based data format is prone to corruption (I've seen this happen several times. :) They're trying to get it to work on SQL server, but I don't like MS SQL server that much, either.

    I don't have a lot of experience with Lotus Notes (though I hear it's good... :) but I can tell you GroupWise solved this problem (on UNIX, about ten years ago, when it was WordPerfect Office) with a proprietary database broken into different types of individual files. GroupWise these days consists of a few important files for each user:

    1) a smallish userxyzy.db (where xyzy is the unique user identifier, so you can change their e-mail address the items aren't duplicated; the pointers in the userxyzy.db files are updated to point to the shared items.

    3) an unlimited number of special-purpose directories (FD01...FDXX) that hold items that are bigger than a certain size (I think it's 4k or 8k?)

    All of the database files are encrypted & compressed (algorithms licensed from Stac). The connection between the clients & server is encrypted & compressed. You can also use POP/IMAP (+ POP+SSL/IMAP+SSL) to access a GroupWise post office (and a web-based interface written in java servlets)). But I'm drifting off the topic...

    anyhow, I always thought this setup was a really nice, well thought-out way to maximize performance for a large mail system without wasting lots of space (or inodes :) (Did I mention the whole database gets constantly reindexed, so you can find anything in seconds? Exchange does not do this on the back end without third-party software. Of course, it has no document management, either... but I digress. :)

    Perhaps some of this info could be adapted for a UNIX-based open-source e-mail solution? (of course it seems silly since GroupWise is already available for UNIX :) I'm still waiting for an open-source package that does everything GroupWise does... I think it'll be a while though. :(

    1. Re:Exchange, Notes, GroupWise, etc... by deviator · · Score: 1

      umm... I don't know why it chopped off some vital info, but here it is again:

      1) userxyz.db - headers & stuff.
      2) msgxx.db (30 per post office) - shared "big" databases that hold message-note-task-document bodies. this makes it easy to share folders... no duplicated data (except for the pointers in the userxyz.db files)
      3) FD00... directories that hold "big" messages and attachments beyond the msgxx.db limit (so those databases don't get too big)

    2. Re:Exchange, Notes, GroupWise, etc... by deviator · · Score: 1

      d'oh - I should also note that MOST people only use one msgXX.db file - they're designated by the db engine when the mailbox is created. But sometimes multiple msgXX.db files get traversed if the user uses shared folders.

      It's a cool system anyhow; there are lots of notes about the database structure & message flow at www.novell.com/documentation - I think it's fascinating 'cause it is well-designed.

  94. Thanks, I'll have that cookie by Anonymous Coward · · Score: 0

    ...as soon as you've visited MeetingMaker's web site.

    Real-time scheduling, planning, organising. Scalable, cross-platform, web-enabled.

    chomp, chomp, ...

    1. Re:Thanks, I'll have that cookie by mcg1969 · · Score: 1

      ...as soon as you've visited MeetingMaker [meetingmaker.com]'s web site. Real-time scheduling, planning, organising. Scalable, cross-platform, web-enabled.

      As its name implies, limited to scheduling only (both people and resources, though). No less a massive pain in the butt to work with, either. But hey, at least it works for what it was designed for.

  95. Stays up for *days* before losing mail and reboot by SgtChaireBourne · · Score: 2, Informative
    Single instance storage is only good for intranets, except that there one should use file sharing to collaborate on documents rather than sending virii^H^H^H^H^Hattachments.
    Alen, your experiences with MS-Exchange are so many worlds of difference away from mine that I nearly suspect that you've written a troll. Rebooting a mission critical service like a mail server during working hours is unsatisfactory. If other mission critical services like file and print sharing are also disabled during that reboot, then it's time to look for a more robust product.

    I have worked closely with three shops in the previous three years that used Microsoft Exchange. Each had at least 3 full time equivalents of MCSEs to babysit their Exchange servers, probably more if you count overtime. This is not counting the occasional high priced consultant. None of these shops could keep Exchange running for a full week. Nor could they keep it from losing mail (when I measured, it was 10-15%). Nor could they get it to communicate well with other mail servers. Nor could they keep it from getting wiped out once every three months by MSTDs (especially worms and virii).

    In contrast, Novell servers run years at a time unattended (nearly every consultant has at least two such anecdotes of their own) and many UNIX-based MTAs need only a few hours of non-hardware maintenance per year, when set up tight. I guess running MS-Exchange is a new status thing to flaunt resources, like having a tubercular wife was during the Victorian era.

    Needless to say the management's support was/is a real PITA for anyone doing work via e-mail with people outside of the house's MS-Intranet. In one case it even delayed publishing a book by several weeks. In house use of Exchange was fine -- when it was down for you, it was down for everyone else so it was a nice time out and a chance to go have coffee with the others. When put to the test, file sharing couldn't, wouldn't, didn't function often enough to be useful either. For file sharing, those without access to a Novell or Unix file server used sneaker net or mailed attachments. Yes, Exchange does look good in the 4-color glossy marketing brochure, but that's where it ends and reality sets in.

    Puh.

    Back to mail databases. RFC 2822, Internet Message Format, specifies the general structure of a message. This can be oversimplified as a header with its standard and non-standard fields and one or more message bodies. RFC 2046 specifies multipart bodies. These structures do seem very well suited to a relational database.
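
    As a rough illustration of that mapping, here is a sketch using Python's standard email and sqlite3 modules. The three-table schema and the store function are assumptions invented for this example, not an existing mail store; the point is only that header fields and MIME parts fall naturally into rows, and that decoded attachments can be kept as binary rather than base64.

```python
import sqlite3
from email import message_from_bytes, policy

# Hypothetical schema: one row per message, per header field, per body part.
SCHEMA = """
CREATE TABLE message (id INTEGER PRIMARY KEY);
CREATE TABLE header  (message_id INTEGER, name TEXT, value TEXT);
CREATE TABLE part    (message_id INTEGER, content_type TEXT, body BLOB);
"""


def store(db: sqlite3.Connection, raw: bytes) -> int:
    """Parse one RFC 2822 message and spread it over the tables."""
    msg = message_from_bytes(raw, policy=policy.default)
    message_id = db.execute("INSERT INTO message DEFAULT VALUES").lastrowid

    # Standard and non-standard header fields are stored the same way.
    for name, value in msg.items():
        db.execute("INSERT INTO header VALUES (?, ?, ?)",
                   (message_id, name.lower(), str(value)))

    # Leaf MIME parts only; decode=True undoes base64/quoted-printable.
    for part in msg.walk():
        if part.is_multipart():
            continue
        db.execute("INSERT INTO part VALUES (?, ?, ?)",
                   (message_id, part.get_content_type(),
                    part.get_payload(decode=True)))
    return message_id
```

    Finding a message by subject then becomes an ordinary SELECT on the header table.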

    --
    Beta is broken and the link to classic doesn't work. Stop wasting our time or there won't be anybody left here.
  96. Content-type: multipart/mailbox by Anonymous Coward · · Score: 0

    Content-type: multipart/folder

    -- Pretty easy.

  97. Problems & Ideas by Twylite · · Score: 4, Informative

    Oh dear, another file format debate. I'm glad there was a library suggestion though ... that allows us to change our mind when we do it wrong the first time ;)

    First, you need to consider the possibility of moving the mailbox. To a different computer, or a different platform. This means it must be easy to access in any environment, and the tools must be portable.

    This doesn't completely rule out a database solution (like mySQL), but it certainly makes it less-than-ideal.

    Second, having used many mailers which separate out attachments ... Please Don't Do It! You can't easily move your mailbox, because there are a host of associated attachment files. There is ALWAYS a synchronisation issue between attachments and messages, so you end up scanning and cleaning out the attachment folder every so often to prevent dead files from accumulating.

    Compression is nifty, but isn't really important. Disk space is seldom a concern these days, and the really big stuff (binaries) is often already compressed or doesn't compress well.

    The real issue with most mailbox formats is how do you deal with the problem of removing dead space from the mailbox? Some programs just leave it there until you hit "compact", which is wasteful and confuses users. Others rewrite the entire mailbox every time, which causes the software to "hang" for a while on shutdown.

    The best suggestion I can come up with off the top of my head is this: One file per mailbox folder, and that file is its own filesystem. The "root node" contains a group of summaries (from, to, subject, date, etc) and node links. Other nodes are chained to contain the message and attachments.

    Handling attachments: attachments are separated out and stored as binary in the mailbox. This conserves space but keeps the attachment with the message.

    Compacting: is avoided. When a mail is deleted, it is merely flagged in the root node (index). So each mailbox has its own deleted items folder, so to speak. When the deleted items folder is emptied, the index is rewritten and nodes freed - every node not at the end of the file is overwritten with a node from the end of the file (and appropriate reindexing done), so the file is automatically compacted.
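
    A small in-memory sketch of that "fill the holes from the end" step, in Python. Everything here is hypothetical: nodes stands in for the fixed-size blocks of the mailbox file, index for the root node's message-to-node chains, and empty_deleted is a name made up for the example.

```python
def empty_deleted(nodes: list, index: dict, deleted: set) -> None:
    """Free the nodes of deleted messages and compact in place.

    nodes   -- list of blocks (stand-in for fixed-size nodes in the file)
    index   -- {message id: [node numbers, in chain order]}
    deleted -- ids of messages sitting in the "deleted items" folder
    """
    free = set()
    for msg_id in deleted:
        free.update(index.pop(msg_id))

    # Which live message (and chain position) owns each remaining node.
    owner = {n: (msg_id, i)
             for msg_id, chain in index.items()
             for i, n in enumerate(chain)}

    last = len(nodes) - 1
    for hole in sorted(free):
        # Freed nodes already at the end just get cut off.
        while last >= 0 and last in free:
            last -= 1
        if hole > last:
            break
        # Overwrite the hole with the last live node and fix the index.
        nodes[hole] = nodes[last]
        msg_id, i = owner[last]
        index[msg_id][i] = hole
        free.discard(hole)
        last -= 1

    del nodes[last + 1:]  # the file shrinks; no separate "compact" pass
```

    Each freed node costs at most one block copy, so emptying the deleted items folder never rewrites the whole mailbox.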

    Ideally the file needs some sort of transaction logging area to ensure its integrity at all times.

    Shared access to files is best handled through a library or a service. File locking is notoriously prone to bugs and security issues, and avoiding multiple implementations in different mail clients would be beneficial.

    --
    i-name =twylite [http://public.xdi.org/=twylite], see idcommons.net
    1. Re:Problems & Ideas by mla_anderson · · Score: 1
      First, you need to consider the possibility of moving the mailbox. To a different computer, or a different platform. This means it must be easy to access in any environment, and the tools must be portable.
      This doesn't completely rule out a database solution (like mySQL), but it certainly makes it less-than-ideal.

      Actually, moving a database from one system to another can be very easy.

      I use IMAP whenever I can, and when I can't I use fetchmail to pull my mail to an IMAP server I own. IMAP is really nice but the Linux implementations I've seen have a great deal of overhead per user. From what I understand a lot of that overhead is message accessing, most server programs don't do it very well.

      We have methods for accessing large amounts of data and pulling out the relevant portions: a data base. A well written server (ending up as a db client) could be made to work with most database servers. If the data base structure is properly designed the data would be easily transportable from one system to another as well.

      Just my $0.02 ($0.03 Canadian)

      --
      Sig is on vacation
    2. Re:Problems & Ideas by groomed · · Score: 1

      Well, it's always possible to make things more complicated. The advantage of keeping things simple is that it turns sed, awk, grep, cat and all the others into potential mail processing tools.

  98. /var/spool/mail/martin is my friend by martinflack · · Score: 3, Informative

    For Pete's sake, leave mail alone. If I can't fix it in less than 20 minutes with grep and perl, I don't want to know about it.

    Divide mail into 20-30 logical "folders" (files), use procmail to help sort/scan/unspam, do IMAP to get to it from Win machines, archive mail out of your working files once it gets a year old, and you're all set. Strive to keep your inbox empty (you need a proper "action" orientation with your mail folders to accommodate this). No big deal.

  99. No Lotus Notes, Yes Database by Anonymous Coward · · Score: 0

    Lotus Notes used the Lotus database scheme to store email, among other things. I never knew anybody who really made full use of the groupware functionality of Notes, though (I'm sure somebody somewhere did). I would hope, based on my personal experience, that whatever the nextgen email tools are, they are NOT like Notes.

    Personally, I would love to see a database message store that would be compatible with IMAP access, especially with emails coming to my phone, my handheld, my laptop, my main machine, and various other places. With replication and virtual domains support.

    The mbox+index scheme seems to be a fairly decent second choice, but when someone says "but I like to be able to use normal text tools like grep" my answer would be "why don't you use other normal tools like SQL queries to do the same thing, and then some!"

  100. MH: been there, done that by ziegast · · Score: 1
    Am I the only old fart left that uses MH from the command line?

    If I received Slashdot Poll postings as e-mail messages, I might use the following to find recent common whining about lame slashdot poll choices (and faster too!):


    cd `mhpath +slashdot`
    pick -subject CowboyNeal | xargs egrep -i 'this poll sucks' | sed -e 's/:.*//' | uniq | xargs show


    Go to the link above, or look for MH or NMH in rpmfind.net or your local ports tree.

    -ez
    1. Re:MH: been there, done that by a_n_d_e_r_s · · Score: 1

      I prefer elm from the command line.

      I've never seen an email program that uses as few keystrokes to read mail as elm.

      You can't get to fewer keystrokes.

      All of this makes elm fast and easy to use since
      there are so few commands to learn.

      --
      Just saying it like it are.
    2. Re:MH: been there, done that by Lozzer · · Score: 1

      Use a mouse, et voilà no key strokes at all

      ** ducks **

      --
      Special Relativity: The person in the other queue thinks yours is moving faster.
    3. Re:MH: been there, done that by mce · · Score: 1

      From the command line, maybe. But I do use a graphical MH wrapper and MH style folders are my number 1 requirement whenever I go out shopping for an e-mail client. As mentioned before, the ability to apply standard UNIX commands to folders and messages is priceless.

  101. Databases and attachments by RPG+Advocate · · Score: 1
    In reply to Twylite's comment:
    Second, having used many mailers which separate out attachments ... Please Don't Do It! You can't easily move your mailbox, because there are a host of associated attachment files. There is ALWAYS a synchronisation issue between attachments and messages, so you end up scanning and cleaning out the attachment folder every so often to prevent dead files from accumulating.
    Certainly, but in a database solution, you could have the base64 or UU for the attachment simply be a field in the database. I would hope that any database solution would have the option to dump the database to an mbox-compatible format, and conversely to import any mbox file into the database. That way, you get the faster accesses, scalability, and organization a database provides, while also maintaining the flexibility of being able to port stuff. The only time you would experience "hanging" is during the conversion process, which would (hopefully) occur less often than a read of a large mbox file!
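
    A minimal sketch of that dump/import round trip, using Python's standard mailbox and sqlite3 modules. The single-table schema and the function names are assumptions for illustration only; a real store would keep more than the subject alongside the raw message.

```python
import mailbox
import sqlite3

# Hypothetical one-table store: the raw message plus one searchable field.
SCHEMA = "CREATE TABLE mail (id INTEGER PRIMARY KEY, subject TEXT, raw BLOB)"


def import_mbox(db: sqlite3.Connection, path: str) -> None:
    """Load every message of an existing mbox file into the database."""
    box = mailbox.mbox(path, create=False)
    try:
        for msg in box:
            db.execute("INSERT INTO mail (subject, raw) VALUES (?, ?)",
                       (str(msg.get("subject", "")), msg.as_bytes()))
    finally:
        box.close()


def export_mbox(db: sqlite3.Connection, path: str) -> None:
    """Dump the database to an mbox file that other tools can read."""
    box = mailbox.mbox(path)
    box.lock()
    try:
        for (raw,) in db.execute("SELECT raw FROM mail ORDER BY id"):
            box.add(bytes(raw))
    finally:
        box.close()  # flushes, unlocks and closes the file
```

    With something like this the database stays the working copy, and the mbox file is only produced when you need to move or archive the mail.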
  102. Re:The Reiser guys have some ideas. - I AGREE! by Tracy+Reed · · Score: 1

    MP3.com has had many terabytes of disk on reiserfs for over 18 months with great success. Again, hard data that reiserfs works. And it works well!

  103. On Trolls by Anonymous Coward · · Score: 0

    While trolling takes on many forms, many of them merely being nuisances (crapflooding, goat links, page widening, etc) you'll find the vast majority of trolling occurring in posts similar to posts such as your original. On Slashdot, well-thought out and reasoned posts have become indistinguishable from trolls. This is made all the more obvious by the dimness of the moderators who would mod you down -1 in a heartbeat if not for the length of your post (as if that were the measure of an argument).

    I too am a troll, much along the lines as you (though perhaps you don't realize yourself as such yet). I used to post, IMO, well argued posts and was consistently modded down by the Slashdot groupthink moderators. This is not to say that I didn't eventually hit the karma cap, but that along the way it was painfully obvious that my pro-Windows, anti-GPL opinion was not tolerated here.

    Upon the realization of that I had my epiphany that pearls are not to be given to swine (this seems to be the same satori experience you are having now). Pigs deserve slop, and now that is all they get from me.

    In any case, I'm not one of the nuisance trolls as I listed above, but one of the provocative trolls such as yourself (please do not take offense, this is not an insult as it may first appear). The Slashdot feeding frenzy that follows any post that attempts to support Microsoft or attack Linux or posit Creationism is a wondrous thing to watch, much like a thunderstorm or a supernova. The one difference is that you, the troll, have total control over the experience, much like a god who views his masterpiece from another dimension.

    This is not to say that Slashdot is void of intellectual content. On the contrary, you'll find quite a bit of interesting information in the Science and Developer sections. You will find *no* intellectual content in the YRO section.

    It's a travesty that a good idea like Slashdot, allowing users to create their own content, has succumbed to the mindless pursuit of mental masturbation of FSF zealots.

    So while this may be the end of your Slashdot infancy, I think you will find your maturation into a Slashdot provocateur quite fulfilling and fun. Isn't that why you joined the technology revolution in the first place?

  104. Plain Files! by Anonymous Coward · · Score: 0

    As I would like to have access to my mails in 10+ years, I vote for the usage of plain files, not some proprietary format that has to be converted.

    If access speed is problem, cache some of the files' contents in indices.

    Disk space is surely no problem...

  105. Use the filesystem as an interface. by imeller · · Score: 1

    Personally, I like the idea of Maildir, every e-mail is a file. It's very easy for apps to work with Maildir. Also, everyone above is talking about databases, which probably is a fast and nice solution. But communication with the database probably will be kind of messy. Why not just 'map' the mailbox on the filesystem somewhere and use whatever underlying system (database, plain files) you want for the actual storage. Exactly what plan9 is doing. With that, every mail could also easily be split up in headers, body, attachments etc. (I seem to remember they already do that.)

    This way, 'transparent compression' can be done, the 'file format' is very easy to use, people can still grep their mail (even copy attachments using normal commands), flags could also be mapped on the filesystem.
    For delivering mail a probably similar method can be used, perhaps like Maildir, which does fine over NFS.

    Ah, it's the entire plan9 'map it on the filesystem'-idea that just seems great to me, why don't we have it in the average BSD or linux?

  106. Mail is a protocol! by Kynde · · Score: 2

    There is no need to toy around with the mail as-is. It's a little like an IP packet: it doesn't matter what's in it, but the essential thing is that it has destination and source addresses and it travels in the net. No technical solution will _ever_ overcome the fallacies with current emails, because the current email is as unrestrictive as IP packets.

    Take spam for example. The problem will always be present theoretically when you want to receive mail also from people you've never received mail from before and/or haven't given your public crypto key, for example. Another side of the problem is that _we_ allow people to send source address spoofed spam.

    The problems with email are people, as with almost every damn problem in the IT sector, it's always us. People bend towards stricter rulesets, to avoid abuse, which in many cases is not the way to go, let alone the solution to the root of the problem. Somebody here in /. said it really well once, he said that the best solution would be "Cheap plasma handguns and justice for us all"

    --
    1 Earth is warming, 2 It's us, 3 it's royally bad, 4 we need to take action NOW
  107. CYRUS! by Anonymous Coward · · Score: 0

    If you're looking for a great mailserver for 1000 - 500 000 people, try Cyrus Imapd from CMU. It's fast, secure and stable.

  108. Use a real database by znerd · · Score: 1

    Why not use a real database for this, like MySQL? The advantages are obvious. You can search your email using plain SQL statements. Storage is handled by the database implementation, so you don't have to care about that. Performance can be improved in the standard ways, by having indexes and perhaps lookup tables and/or columns.
    Also, you can have multiple MUA's use the same mailbox since databases normally handle concurrency already.

  109. Re:Maildirs [wanders OT] by _Knots · · Score: 1

    Eeeeh?

    My System Folder is no more than 10 directories deep (just a cursory inspection, but I don't think I missed anything), with Extensions and Preferences having HUGE breadth. So why, again, would HFS be designed for depth based on the System Folder?

    Not that it isn't, just why did they do it that way, since the System Folder explanation doesn't make sense to me.

    -knots

    --
    Anarchy$ dd if=/dev/random of=~/.signature bs=120 count=1
  110. Disk is cheap by oddityfds · · Score: 1
    Seriously, disk is cheap. So are inodes. Use maildir and get on with your life. I've never used anything else, and I have about two gigs of mail in my home directory. I might switch to IMAP soon, and will then use our Cyrus IMAPd which also (as noted in the article) uses maildir.

    Don't store your mail on an NFS server. NFS is bad (use AFS instead, but not for mail either), and it's especially bad for storing mail. Use IMAP instead. IMAP is a secure file system designed for storing mail, and that's what you really need. Also, all relevant mail clients support IMAP, and for those that don't, Cyrus IMAPd contains a POPd as well.

  111. Looking for a problem that doesn't exist by NotZed · · Score: 5, Interesting

    I lost the plot half way through this, but here's some food for thought anyway. Now I should get back to work ...

    Z

    I think that this is looking for the solution to a problem that doesn't really exist in the first place. Although I guess it depends somewhat on what you define as 'Unix mail'.

    I'm a developer on Evolution, and primarily on Camel, evolution's email library. I'm not sure i'd rave about it (although I think Camel is a mostly beautiful piece of code ;), but it works reasonably well, and we've had a chance to try and deal with users with lots of email.

    What IS 'Unix mail'?

    I would define Unix mail as mail (rfc822 format) downloaded and stored locally on a per-user basis. IMAP, Exchange, and other remote protocols are very different beasts.

    Why are DBMS's not suitable for 'Unix mail'?

    Once you have a remote server you have to do things differently than if you have local access. Using a DBMS, and having a trained administrator to manage it are practical considerations, as are the benefits you might get from this configuration. These solutions don't really make sense for standalone users. They shouldn't need to install and manage databases, complex backup procedures, and so forth, just to read their email.

    i.e. rdbms's are:
    hard to setup
    hard to maintain
    another major point of failure

    If however, I was to design a multi-user groupware server, then a DBMS would come into serious consideration - at the backend at least. It allows you to do things like easily consolidate authentication outside of the operating system (the idea of having a 'shell account' to access mail is somewhat outdated), it allows you to save space by storing common data, like attachments and email content in a single place, and redirecting it to multiple recipients (which is a common practice within organisations). It may be practical to use a mixture, a RDBMS to store textual parts or indices to data stored in a more conventional filesystem.

    But even with a RDBMS backend, I would personally probably still stick to IMAP to serve it to actual clients. The IMAP protocol is a bit heavy, but not really that bad, and it serves email, I don't think there's really any need to reinvent the wheel here.

    So ...

    If you define unix mail as I have, and separate it from a *mail server*, then you rule out full blown RDBMS's, and are left with:

    single file database
    multiple file database

    I'm not even going to mention XML because I think it is the single most stupid idea anyone's come up with. It is completely unsuitable for this purpose.

    And well, there's really no reason not to use MIME to store the messages. MIME already does everything you can possibly do with email (since, uh, it is how the email *will* be sent), any client will already have to deal with it, and mime decoding is for the most part really quite simple and fast anyway. Translating the mime format into some other storage format really doesn't make sense.

    single file databases

    mbox

    Mbox is a single file database. It's just that everyone that uses it generally writes their own access code. This is where problems with 'locking' come about, either because the underlying filesystem doesn't support it properly (e.g. some nfs implementations), or everyone's clients don't use the same locking mechanism. This is really just an implementation issue anyway. There would be nothing to stop someone writing a common 'mbox.db' library that stored everything in completely compatible mbox files, which took all the work out of it, and then you'd have an mbox DBMS ...

    mbox scales ok, without any caching of header information it handles in the order of 2K messages in an interactive timescale, and quite a lot more if you don't mind some short delays (i.e. in the order of the time it takes mozilla to start up).

    Appending and reading is quick, and reliable - assuming the filesystem works, which is a pretty safe assumption to make. This is assuming the mailbox is first summarised at first opening, otherwise looking up messages can be slow, because you have to scan the whole file first.
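
    To show what "summarising at first opening" buys you, here is a minimal Python sketch: one linear scan records where each message starts, after which any single message can be read with a seek. The function names are made up for the example, and it assumes body lines starting with "From " have already been quoted, which is exactly the convention problem with mbox.

```python
def summarise(path: str) -> list:
    """Scan an mbox file once; return a (start, end) byte span per message."""
    starts = []
    pos = 0
    with open(path, "rb") as f:
        for line in f:
            # Assumption: only message separators begin with "From ".
            if line.startswith(b"From "):
                starts.append(pos)
            pos += len(line)
    ends = starts[1:] + [pos]
    return list(zip(starts, ends))


def read_message(path: str, span: tuple) -> bytes:
    """Fetch one message by seeking straight to it, without rescanning."""
    start, end = span
    with open(path, "rb") as f:
        f.seek(start)
        return f.read(end - start)
```

    The list of spans is the in-memory index; a client would normally also pull the subject and sender out of each header during the same scan.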

    The only operation that is slow is expunging messages, and at worst case isn't really any slower than copying a whole file across to another file.

    The only other issue is agreement on the 'standard' for what constitutes an mbox file. For example, Solaris uses and honours the 'Content-Length' header, and thus it does not translate any lines beginning with "From " into the conventional ">From ". Some mail clients translate "(>*)From " into ">\1From " (using sed syntax) and vice versa, others do not. There is no standard, just some conventions, some of which aren't easy to determine either.
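
    For the record, the reversible "(>*)From " convention looks like this as a small Python sketch (the function names are invented for the example). Quoting adds one ">" to any line that is "From " preceded by zero or more ">", and unquoting removes exactly one, so a body that already contained ">From " survives the round trip.

```python
import re

# A line that is "From " preceded by zero or more ">" characters.
_QUOTE = re.compile(rb"^(>*From )", re.MULTILINE)
# The same, but with at least one ">" that can be stripped again.
_UNQUOTE = re.compile(rb"^>(>*From )", re.MULTILINE)


def quote_body(body: bytes) -> bytes:
    """Add one '>' so no body line can be mistaken for a separator."""
    return _QUOTE.sub(rb">\1", body)


def unquote_body(body: bytes) -> bytes:
    """Undo quote_body by removing exactly one '>' per quoted line."""
    return _UNQUOTE.sub(rb"\1", body)
```

    A client that only turns "From " into ">From " and never touches ">From " cannot tell afterwards which lines it changed, which is one reason files written under one convention get garbled when read under another.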

    Because you need to keep the whole index in memory at once, this can become expensive, but you could use a secondary database as an index into the real file. But eventually you hit a point where the cost of expunging does get too expensive. You could just archive the mail regularly, or use a format like maildir instead.

    gdbm/db/etc

    db files wrap the single file in a common api that handles all of the locking issues and access issues for you. Some have different features, e.g. querying capability, logging and transactions, etc.

    We've never tried to use db for this purpose, more just because we didn't think it was worth it. All you really get with a minimal implementation is the ability to store and retrieve a blob of data using a single key. Writing is fairly slow because the database has to manage more details for you (locking, allocating blocks, unlocking, etc). You could use multiple db files as indices to perform multiple-key searches, but they are quite slow to create (we tried using db for the content indices and it was way too slow).

    i.e. even if you store the data in a db file, which gives you a slight benefit of inbuilt referential integrity, you still need to provide additional indices to actually be able to use it in any useful way. Evolution suffers this problem with the addressbook which stores vCards in db records.

    Most db libraries (all?) also don't provide any mechanism to stream data. You either get the whole lot into memory, or you get none of it. So for large messages you're limited by memory (well, evolution is anyway, but it doesn't have to be). Yes, memory is cheap, but it is still a consideration, and it would certainly rule out a simple database in a multi-user environment.

    db files are also slower than native files, especially for large objects. You're mapping an arbitrarily sized chunk of data to some 'database blocks', which are then stored in an arbitrarily sized 'database file' which the operating system is then mapping to its 'filesystem blocks'.

    multifile solutions

    Well I guess this comes down to mh and maildir. mh isn't really suitable for anything, because of its just plain bad design and lack of defined semantics. There's no way to guarantee anything about its operation.

    maildir - i like. It moves the scourge of trying to implement a reliable, scalable, multiple access database almost entirely into the operating system layer. Operating systems already do this very well - they manage hundreds of thousands of files randomly written across your disks, without skipping a beat.

    No operation requires more than a single message size of data, and the operating system already indexes the message, via its filename. Sure, ext2 doesn't do such a swell job with long directories, but that can be addressed (and the same problem can be addressed on just about any platform). For 'free' you get concurrent multiple-reader, multiple-writer database access, without any of the considerable problems you have to solve to implement it otherwise.

    The maildir 'protocol' is simple, reliable, and it works.
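
    A minimal Python sketch of the delivery half of that protocol: write the whole message under tmp/, then rename it into new/. The unique-name scheme here is simplified and is an assumption of the sketch; real delivery agents add more entropy.

```python
import os
import socket
import tempfile
import time

def deliver(maildir: str, message: bytes) -> str:
    """Deliver one message without any locking; returns the final path."""
    for sub in ("tmp", "new", "cur"):
        os.makedirs(os.path.join(maildir, sub), exist_ok=True)
    # Simplified unique name: time, pid and hostname.
    unique = f"{time.time():.6f}.{os.getpid()}.{socket.gethostname()}"
    tmp_path = os.path.join(maildir, "tmp", unique)
    new_path = os.path.join(maildir, "new", unique)
    with open(tmp_path, "xb") as f:      # "x": refuse to reuse an existing name
        f.write(message)
        f.flush()
        os.fsync(f.fileno())             # data is on disk before it is visible
    os.rename(tmp_path, new_path)        # atomic within one filesystem
    return new_path

box = os.path.join(tempfile.mkdtemp(), "Maildir")
delivered = deliver(box, b"Subject: test\n\nhello\n")
```

    Readers only ever look in new/ and cur/, so they never see a half-written file, and two writers never touch the same name.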

    Again, it can easily be augmented by a client with additional indices, but for things like delivery agents who don't care about existing email, they don't need to suffer that overhead at all.

    Some other comments specific to the question:

    Compression. Personally I don't see the point. But a maildir-like structure would fit well with compression. Flat files would be the worst (e.g. mbox), and block-file formats (like db files) would also work well with compression. The good thing about email is it is 'write once', you don't edit or change the messages in the mailbox.

    External attachments. I guess it's possible, but again, it isn't really worth it in most cases. Parsing MIME is *fast*. It is much faster than parsing xml, and besides, people rarely look at an email more than once or twice. There isn't much use going off and storing the attachment in a high-performance reading format if it isn't going to be accessed often, and it just places a greater burden on your server.

    base64, etc. Well, it's entirely possible simply to store the messages as 'binary' format. Assuming the boundary markers are checked properly, Camel can work with binary encoded mail messages, and probably at least some other mail clients can too. There are some problems with some of the extremely broken openpgp/pgp/mime specs which suddenly say that mail transports aren't allowed to alter the *transport* encodings of some parts, but well, these specs are just braindead, and can be worked around.
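
    For a sense of scale, the size penalty of base64 over binary storage can be worked out directly: every 3 bytes become 4 characters, plus a line ending per 76 characters. A small Python sketch (the 300 KB payload is an arbitrary stand-in for an attachment):

```python
import base64
import os

def base64_overhead(raw_size: int, line_len: int = 76) -> float:
    """Fraction by which CRLF-wrapped base64 text exceeds the raw payload."""
    encoded = 4 * ((raw_size + 2) // 3)                        # 3 bytes in, 4 chars out
    line_breaks = 2 * ((encoded + line_len - 1) // line_len)   # CRLF per line
    return (encoded + line_breaks) / raw_size - 1

payload = os.urandom(300_000)             # stand-in for an attachment
encoded = base64.encodebytes(payload)     # 76-char lines ending in "\n"
measured = len(encoded) / len(payload) - 1
```

    Both figures come out a little over one third, which is the saving on offer if parts are stored decoded.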

    Security model. Well, talking about Unix mail, not server mail, the filesystem is adequate.

    Shared folders - is not an issue for unix mail.

    Unicode. Well you can write unicode filenames to most unix filesystems, even if 'ls' doesn't show it right.

    MTA. Nothing could be simpler or safer than maildir as a delivery format. The mta doesn't have to care about any client-side indices, the mua will simply update them when it incorporates the new messages, etc.

    Writing libmailstore? Mate, it's called Camel, and it's already written. Camel already does mbox, maildir, mh, it can read spool files directly (it doesn't create a summary file or build any indexes), it can talk imap, pop, and has partial support for nntp. If someone gave me a decent RDBMS table schema and a carton of pale, I could probably write a MySQL backend in a couple of days, well, assuming the MySQL api is mt-safe.

    Finally, some comments on evolution.

    Evolution isn't reinventing any wheel. We use standard mbox format (if such a thing really exists anyway). We use standard maildir format, etc. Yes we may optionally create body indices, and we do usually create on-disk binary/compressed 'summaries' of the data, but these are really just on-disk caches of in-memory data structures, rather than anything to do with the mail storage format.

    We put mail in another location, but everyone else has done that too, elm:Mail, pine:mail (or is it the other way around?), netscape:ns_mail, etc. At least we now offer the option to read most of this 'in place'.

    The main problems evolution has with scalability are:

    indexing.

    Indexing is quite costly. The original index code was written somewhat like a database, it handled all internal data structures, used blocks of data, etc. It was slow, it scaled poorly. Definitely some of the algorithm choices and the implementation weren't that hot, but it shows that such a solution isn't as simple as at first thought. Using libdb was impossibly slow (like several orders of magnitude slower).

    The new stuff is a lot better, but can still use a lot of resources while indexing, and copies the whole file (well 2 files) across when performing expunges, but they are only performed occasionally, and the indices are smaller than the original indices, so in practice it scales much much better.

    the summaries

    The summaries are indices of a sort anyway. They are an in-memory tree of a subset of the information on each message. Enough information to display a list of messages, and perform vfoldering operations. Even though we do some tricks, like sharing common strings, the summary can get very large.

    But, it's a tradeoff I thought was worth it, rather than using on-disk summaries. The api's are much easier to use, and the problem gets pushed to the user - if they want to have folders with 100K messages, they should expect it to use a bit of memory. The on-disk size of the summaries is very small too, although I guess it could be made even smaller if we consolidated common strings.

    per-message memory use

    Currently, a lot of data gets copied around in memory. Every time you read a message, at least 1 whole copy of the (decoded) message is in memory at a given time (yes, including attachments). For IMAP this can get even worse (2-3 copies of a given attachment at a given time), because it doesn't stream enough. Most of this could use a disk-backing without changing any api's though, and well, i'm rewriting IMAP.

    Wrapping up ...

    And yeah, we're talking 100K messages here, not 1400. My 500Mhz celeron laptop has about 35K messages stored over about 10 mbox files, and it starts up in under 10 seconds, and that includes all of the bonobo/activation overhead (which is very significant). Yeah it uses a bit of memory, but memory is cheap on a personal workstation.

    In short. The current mailbox formats we have suffice for "Unix mail". Add some archiving abilities to your mail client (even RDBMS backed mail clients need archiving), and you'll never have to delete a message again, and still get work done and still use mbox.

    If you want to talk about writing a server - well who cares, you can do whatever you want, because everyone has to go through your interface anyway (you DO NOT want clients accessing data under you, that's what DBMS's are all about in the first place ... and you don't want 1-tier applications), so it doesn't matter what format you use under the belt - you can choose the format which best suits what you're trying to do.

    It seems some people think 1-tier applications (client code talking directly to a database) are the way to go for multi-user environments. They're not, they don't scale and are impossible to maintain. Nobody writes any real software like that anymore, unless you're writing dodgy vb toy apps.

    --
    _ // `Thinking is an exercise to which all too few brains
    \\/ are accustomed' - First Lensman
    1. Re:Looking for a problem that doesn't exist by CaraCalla · · Score: 1
      Thanks a lot for your reply, I'm the guy responsible for this post.

      Some thoughts:

      The beauty of 'Unix Mail' is that it works from large-scale Pop-Toasters down to minimalist clients. However it has some shortcomings. The same is true for Maildirs.

      Think about a format which has similar properties of scalability and portability but adding 2 features:

      • compression
      • a standard way to add features

      I thought about db* (perhaps sleepycat) for a particular reason: It is simple. It can be made portable. And it's already there on a fair amount of systems.

      Sendmail and other MTAs could deliver mail to it, MUAs could access it simultaneously, MUAs could store their indexes in it. Imap-Servers could store their UIDs and flags in it. Evolution could share flags and indexes with Imap-Servers.

      Think about writing a lib which overrides some libc-calls (open etc.). Insert it via LD_PRELOAD or at build-time and expose a Maildir or mbox-view of your db-file to legacy software (think qmail). I don't know if it would be practical, it would be geekish though :-)

      We put mail in another location, but everyone else has done that too, elm:Mail, pine:mail (or is it the other way around?), netscape:ns_mail, etc. At least we now offer the option to read most of this 'in place'.

      This last sentence sounds like you also think it stinks that clients do that. There must be a better way. Currently I use: Courier Imap, accessing a Maildir++ (Courier-speak) with subfolders, for remote reading. Pine using a custom built c-client with Maildir-support and overridden subfolder-paths (à la Maildir/.Sent) for console-reading. A .qmail file for filtering/sorting incoming mails. So in a way I already have a kind of unified mailstore. This is however a nightmare to set up. I don't even know whether it is possible to set that up using central configuration files. There must still be a better way.

      You are of course right, the problem of the missing grand-unified-mailstore is of course not essential, it does exist however. Is it worth fixing?

      Is Camel suited for use in MTAs? Can you link Sendmail, Postfix, Exim, qmail, etc. against it? What about POP/IMAP servers?

      Unfortunately I haven't yet tried out Evolution, I will do that soon though.

      Edgar

  112. Thoughts I had regarding distributed mailboxes by riflemann · · Score: 1

    Has anyone actually implemented a distributed email system based
    on NNTP? Not like the simple email to nntp gateways, but something
    far more featureful. This would work as follows:

    Every system that you would like to have full email access from has
    a local NNTP server. All these systems are hooked up using
    mostly standard NNTP configurations and protocols. Only relatively
    minor modifications would be needed to support authentication and
    the other features.
    Your domain(s) are configured to use all of these (net-reachable)
    systems as MX hosts. And each mailbox/mailspool is setup as a
    separate 'newsgroup', allowing for hierarchical mailboxes. Presumably
    your top level hierarchies are local usernames, and the server
    only allows authenticated users access to their 'mailbox'(hierarchy).
    Group mailboxes would be easy to implement though.
    Something like this:

    bb.inbox
    bb.inbox.lists.slug
    bb.sent-mail
    bb.sent-mail.lists.slug
    [..]
    public.somegroup.inbox

    etc

    Whenever a mail comes into one of the MX hosts, it is filtered
    out, using procmail or something, and dropped into the appropriate
    newsgroup. Alternatively have only the primary MX handle this,
    but then you cannot get any new mail if this box is unreachable.

    The magic of NNTP then comes into play, distributing that
    email across all of the hosts in the NNTP group.

    You then read your email using any nntp capable client. To delete
    messages, your client sends a usenet 'cancel' type message to the
    local server, and this gets distributed around the network.

    But to start with, it'd be simple to create a wrapper that
    gave an IMAP interface, so (almost) any mail client
    will work. But that would limit you to read and delete.
    Having sent items and saving items probably isn't supported in
    IMAP.
    Not a bad start though, a "full" client would be able to
    do the works, such as automatically moving messages across
    "folders", saving sent messages, etc.

    Sending an email sends via normal SMTP protocols, and optionally
    puts a message out via NNTP to update the sent-messages groups.

    This is incredibly useful especially with intermittently connected
    hosts like laptops. You can read/send/delete messages there, and
    when it gets put on line again, it will send the cancel messages,
    sent-messages and other things via the NNTP net to all other
    hosts, ensuring a consistent system across all hosts.

    What would be the limitations/weaknesses/etc that would make
    this a bad idea?

  113. Use standards by dybdahl · · Score: 2, Interesting

    There is absolutely no reason to abandon the standard e-mail file format, including uuencode for file formats. Doing that, you would end up with a file format that depends on certain versions of the e-mail file format to work optimally. If you want to reduce harddisk space, zip it like OpenOffice.org does.

    E-mails are documents. Documents belong in the home directory, and so do e-mails. If you want to do something new, you should use the harddisk folders as e-mail storage, so that e-mails, spreadsheets and documents mix. This probably requires inventing a new ".e-mail" file format so that e-mails can be properly recognized and indexed.

    Storing one e-mail in one file is not a problem as long as you index the filenames properly, for which you can use gdbm.

    Dybdahl.

    1. Re:Use standards by Anonymous Coward · · Score: 0

      Who still UUENCODEs files? That standard was pretty much abandoned years ago in favor of MIME encoding.

  114. Back up your critique with some numbers please! by Jack+Hughes · · Score: 2, Insightful
    It would be interesting to see some real measurements. For example, disk storage and access times for various functions of the different file formats (you could access different messages in a large "mailbox" randomly, or search subjects and bodies and see how long things took).

    I don't think things are that bad - for example, Cyrus with its indexes works pretty well with large (20,000+) folders. And things like searches are pretty fast with a client like evolution that does a lot of caching.

    I would take the simple structure of Cyrus over the easy to break "database" files of Exchange server any day.

  115. mbox.funkified by yem · · Score: 2, Interesting

    This is all very interesting because I'm slowly writing an IMAP server at the moment..

    But here's the setup I'm currently using:

    Inbox:
    /var/mail/$USER
    Subfolders
    /var/mail/$USER-folders/$FOLDER/.messages

    Eg:

    /var/mail/
    |-- root
    |-- fred
    `-- fred-folders
        |-- 1ZB
        |   `-- .messages
        |-- Friends
        |   `-- .messages
        |-- Games
        |   |-- .messages
        |   |-- Rune-Beta
        |   |   `-- .messages
        |   `-- Tribes
        |       `-- .messages
        `-- Mailing Lists
            |-- .messages
            |-- EFNZ chat
            |   `-- .messages
            `-- Hard News
                `-- .messages

    I started with uw-imap but I want to store messages and subfolders together. Plain uw-imap doesn't do this and last time I checked, neither does Maildir. So I did a [kludgy, incomplete] mod and produced the above. Works for me :)

    Get the patch: http://home.y3m.net/uw-imap-2001a-nested-folders.patch
    (diff against imap-2001a)

    In the server I'm working on you will be able to implement a relatively simple C++ API to do your own storage. So you can use Maildir, mbox, PostgreSQL, whatever. We'll see.

    flame away :P

    --
    No, I did not read the f***ing article!
  116. Some thoughts by ChrisJones · · Score: 3, Interesting

    There seem to be two discussions going on in the comments today, one about mail storage for an MUA and one for storing mail on servers.
    As far as the client end is concerned, from the point of view of writing an MUA, having an SQL backend is a complete godsend because you have to write virtually no IO code, you can put all the logic in the queries. However, there are some tricks you need to use to keep up the speed, most importantly to use two tables, one for metadata and one for the mails themselves. This keeps the speed up by keeping the metadata table small (maybe on a better RDBMS than MySQL this wouldn't make a difference, but I found that >10,000 mails all in a single table in MySQL got quite slow until I moved the metadata into a separate table).
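
    A rough sketch of that two-table layout, using Python's sqlite3 module as a stand-in for MySQL. The table and column names are invented for the example and are not the schema described above.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE meta (
        id      INTEGER PRIMARY KEY,
        folder  TEXT NOT NULL,
        sender  TEXT NOT NULL,
        subject TEXT NOT NULL,
        date    TEXT NOT NULL
    );
    CREATE TABLE body (
        id  INTEGER PRIMARY KEY REFERENCES meta(id),
        raw BLOB NOT NULL
    );
    CREATE INDEX meta_folder_date ON meta (folder, date);
""")

def store(folder, sender, subject, date, raw):
    """Insert one message: a small metadata row plus one large body row."""
    cur = db.execute(
        "INSERT INTO meta (folder, sender, subject, date) VALUES (?, ?, ?, ?)",
        (folder, sender, subject, date))
    db.execute("INSERT INTO body (id, raw) VALUES (?, ?)", (cur.lastrowid, raw))
    return cur.lastrowid

first = store("inbox", "a@example.org", "hello", "2002-05-17", b"hello\n")
store("inbox", "b@example.org", "re: hello", "2002-05-18", b"hi back\n")

# Listing a folder touches only the small table ...
listing = db.execute(
    "SELECT subject FROM meta WHERE folder = ? ORDER BY date", ("inbox",)
).fetchall()
# ... and the large one is read only when a message is opened.
(raw,) = db.execute("SELECT raw FROM body WHERE id = ?", (first,)).fetchone()
```
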
    The obvious downside of using a DB for client end storage is that you have to have a central DB server, or one on each client and you need to admin one more set of authentication/permission details, plus you can't move the mail very easily to other MUAs. IMO a much better solution would be to keep the use of SQL/RDBMS, but move the DB into the filesystem so you can just have a bunch of files with metadata stored in the fs. Need to make an mbox? "cat ~/mail/* >>/tmp/my_new_mbox".
    From the server point of view, many people have been mentioning Exchange/Domino etc. Personally I can't stand Exchange, I've had to admin it on several occasions and it's generally done everything it can to stop me from having an easy life (just thought I'd air my prejudice against Exchange in the spirit of fairness and honesty ;) I've never used OpenMail/Domino/Notes/whatever, but I guess they do roughly the same thing, which is a pretty good idea. However, these things all have the distinct disadvantage that they use proprietary protocols and aren't particularly cheap. There's always IMAP, which many people really like, but I feel is too complex a protocol (compare with the infant levels of complexity in POP3).

    With a colleague of mine, I'm working on a set of POP3 extensions that give some IMAP like features, but is really designed to keep multiple mail clients in sync with each other by way of a transaction log. There are still some limitations, but I think I know what they are and how to fix them (e.g. not enough metadata can be associated with each mail yet). It adds about 6 or 7 commands to POP3 and currently lacks any decent client support, but I have written a fairly usable library and patch to gnu-pop3d for it. I've just submitted it as my University final year project, so I'll try and get the protocol description documentation online soon. In the mean time, if you're interested, it's on SourceForge

    --
    Chris "Ng" Jones
    cmsj@tenshu.net
    www.tenshu.net
  117. Didn't we solve this with NNTP? by speedenator · · Score: 3, Insightful

    So NNTP solved this IMHO a rather elegant way...

    You have directories corresponding to newsgroups or mail folders or whatnot. i.e. alt.swedish.chef.bork.bork.bork is really alt/swedish/chef/bork/bork/bork

    Articles are numeric, i.e. \d+ for Perl types. The raw message is stored in each file.

    In each directory, there's a file called .overview, which is just the summary information for all the files.

    Thus, you can have zillions of small files, and happily grep and copy them to your heart's content. But you never do a 'ls' on a huge directory, you always just look through the .overview file. Or grep through it, if you like.
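
    A small Python sketch of the idea: one numbered file per article and a tab-separated .overview beside them. The field list below is an assumption chosen for the example, not the exact overview format a news server writes.

```python
import email
import os
import tempfile

FIELDS = ("Subject", "From", "Date", "Message-ID")   # illustrative subset

def write_overview(group_dir: str) -> None:
    """Summarise every numbered article file into group_dir/.overview."""
    numbers = sorted(int(n) for n in os.listdir(group_dir) if n.isdigit())
    with open(os.path.join(group_dir, ".overview"), "w") as out:
        for n in numbers:
            with open(os.path.join(group_dir, str(n)), "rb") as f:
                msg = email.message_from_binary_file(f)
            # Collapse tabs and folded-header newlines so each article is one line.
            cols = [" ".join(str(msg.get(h, "")).split()) for h in FIELDS]
            out.write("\t".join([str(n)] + cols) + "\n")

def list_subjects(group_dir: str):
    """Read subjects from .overview without opening any article file."""
    with open(os.path.join(group_dir, ".overview")) as f:
        return [line.rstrip("\n").split("\t")[1] for line in f]

group = os.path.join(tempfile.mkdtemp(), "alt", "swedish", "chef")
os.makedirs(group)
for n, subject in [(1, "bork"), (2, "Re: bork"), (10, "bork bork")]:
    with open(os.path.join(group, str(n)), "wb") as f:
        f.write(f"Subject: {subject}\nFrom: chef@example.org\n\nbody\n".encode())
write_overview(group)
subjects = list_subjects(group)
```

    Displaying a folder then costs one sequential read of one file, however many articles sit in the directory.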

    So, in that sense, it's very much the best of both worlds. And, on the same box, you can specify rules on who can access the folders, so one file can be read by multiple people. Ooh.

    GNUS, an Emacs based mail/news reader, uses a variant of this called nnml, which rocks.

    Of course, when you get down to it, JWZ arguments aside, databases start to really look like what you want, especially on a corporate level when you're tossing the same piece of mail around to tons of different folks.

    -e

  118. why do you need to know? by Anonymous Coward · · Score: 0

    From the MUA's point of view the storage is abstracted in that you use IMAP. (Don't you? :)
    If you need to run elm/mailx etc use fetchmail.

    From the MTA's point of it's abstracted via LMTP.

    Job done :)

    I'm a Cyrus admin, and the only reason I'd care about how mail is stored is for doing tape restores.

  119. Re:Use a real database (dbmail.org) by Anonymous Coward · · Score: 0

    no one ever try DBMail? http://www.dbmail.org

    works with mysql and postgres for now...

  120. One non goverment enforced standard by Anonymous Coward · · Score: 0

    Well I know who should *not* be in charge of the new mailstorage format standard, I guess These boys are not yet capable of getting the right mail in the right place.... and keeping it there, although it would be cool if they released the carnivore source just so we could add carnivore format file import to evolution ;-)

  121. Oracle Internet Filesystem by Anonymous Coward · · Score: 0

    Oracle has a product, Internet File System (iFS), that aims to provide a global solution.
    They store files, mails... in an Oracle 9i database.
    http://www.oracle.com/ip/deploy/database/features/index.html?ifs.html

  122. TheBat! by the_danielsan · · Score: 1

    In fact, one of these pathetic windows clients has found a quite good solution, IMHO: Files are extracted from the mail body and stored in a separate folder. This has many advantages:

    1. You can easily browse this folder, deleting files you don't want. As you pointed out, attachments use the most space and like this, you need only keep what you want.

    2. By directly writing them to binary files, no space is wasted (unlike keeping them as MIME).

    TheBat's Mail format is far from being perfect. Mails are still written sequentially into a mail file (We all know this effect of "deleted mails", which are physically still on disk).

  123. oh come on isn't .NSF the greatest? by ellem · · Score: 3, Funny

    Are you trying to tell me that a 5MB empty mailbox is asking too much? A text message that says "Hi!" costing 1.2MB is somehow wasteful?

    Lotus Notes Uber Alles!

    --
    This .sig is fake but accurate.
  124. Slashdot Labs by Anonymous Coward · · Score: 0

    I need to build a new mailbox file format.

    Let me ask the elite engineers and database gurus on slashdot.org.

  125. A alternate proposal by Anonymous Coward · · Score: 2, Interesting

    What bugs me the most with current mail technology is the problems with distributed mail handling.

    I access my mail on all kinds of devices, sometimes online sometimes not.

    My main problem is not so much which mail-server / retrieval / presentation to use, since they all have the same inability to give me a working distributed solution.

    For online usage imap is sufficient, but if I go offline with my laptop or ipaq, I'm lost.

    POP isn't very efficient either, since only one of my clients can be the deleter, I must make sure that I synced all my other devices before the deleter removes the message.

    Since I use tons of folders for my mail, and some of my stored mails date back to the late '80s, it basically forces me to use imap so my folders are in sync on all the devices, but again that only works online.

    Further, it only works if my imap server is online. That can be trouble if I'm in some far-off part of the world and for some reason or other I have no contact with my mailserver.

    What I would like is a concept I call SyncMail

    A distributed db-system. First I set up some 3-4 primaries, spread out on the net with completely different access routes. Each of them gets an MX record.

    The sending mta is happy to deliver to a secondary mailserver if the primary is offline.

    But here comes the magic!

    The system regarded as a secondary MX by the rest of the world is in fact a primary!

    It sucks the message into its db instead of queueing it, tags it with its own internal server id, and tries to sync it to all other SyncMail primaries.

    Sooner or later the new mail is propagated to all the primaries.

    On the client side, the SyncMail app contacts all the primaries, checks against a private index, and syncs all new mails, trying the closest server first.

    Since all mails are tagged with what primaries it's been delivered to, no mail is retrieved to the client more than one time.
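
    To make the client side concrete, here is a toy Python sketch of such a sync pass. Everything in it is hypothetical: primaries are plain dicts, None stands in for an unreachable server, and message ids are (origin server id, sequence number) pairs as if tagged on arrival.

```python
def sync(client_seen: set, client_store: dict, primaries: list) -> int:
    """Pull unseen messages from each reachable primary; return how many."""
    fetched = 0
    for primary in primaries:              # assumed ordered closest first
        if primary is None:                # None stands in for "offline"
            continue
        for msg_id, body in primary.items():
            if msg_id in client_seen:
                continue                   # already fetched from another primary
            client_store[msg_id] = body
            client_seen.add(msg_id)        # the client's private index
            fetched += 1
    return fetched

mx1 = {("mx1", 1): b"first", ("mx2", 1): b"second"}
mx2 = {("mx2", 1): b"second", ("mx2", 2): b"third"}
seen, store = set(), {}
first_pass = sync(seen, store, [mx1, None])    # mx2 unreachable
second_pass = sync(seen, store, [mx1, mx2])    # mx2 back online
```

    The second pass picks up only the one message that had not yet propagated, which is the "no mail retrieved more than once" property.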

    Now I have a complete local mail-tree in my client, regardless of which primary I was able to contact. Sure, if a mail was delivered to a primary that goes offline before the client syncs, and it hasn't been able to sync it to the other primaries, I won't get it until that primary comes online. But what the heck: with pop/imap, if my mailserver is offline I'm completely out of business, so the loss is definitely smaller in this case.

    And for my ipaq i just configure the client to work with a few important folders, and to skip attachments, to save storage

    And for sending, all clients store it in an outbox, which is then synced to the primaries; once it gets to a primary it is sent via normal SMTP.
    This way I solve the problem of being able to send mail with proper originating SMTP headers. Of course the outbox is synced as well, so I get a ref copy of my mail on all systems.

    I have started on a SyncMail application and someday I might be able to complete it, but there is so much work all the time :(

    Would anybody else be interested in this concept, maybe we could complete it together.

    Or if this is a really stupid idea, I'd be glad if someone would point it out, so that I can focus on finding a better solution.

    1. Re:A alternate proposal by ahde · · Score: 2

      as others have pointed out, a large part of what you want is accomplished with NNTP. For those who spend a lot of time on mailing lists, etc. this seems like an ideal solution, but really that's because what they are doing is actually using email like a newsgroup. I think you're right though about needing to be able to have the ease of retrieval that comes with IMAP and the disconnect of POP. I'm not sure of a better way to do this than NNTP (maybe that is the solution), but I think having the synched servers is a bit wasteful.

  126. What's wrong with your MUA? by jpn-sdot · · Score: 1
    just try to open a Maildir with 1000+ mails and see how long it takes your favorite Mailprogram to only display the subjects.
    About 1 sec / 1000 mail (no fancy stuff). Try switching MUA.
  127. Re:Don't speculate. Profile. by mgedmin · · Score: 3, Interesting
    An interesting comparison, but it's a comparison of Courier-IMAP vs UW IMAP, and not just Maildir vs mbox.

    I once tried benchmarking Maildir vs mbox for my mail archives (mailboxes with ~3000 messages). On ext2 Maildir was a loss:

    • Mutt took twice as long to open a Maildir than mbox from cold cache.
    • Mutt still took a bit longer to open Maildir than mbox from hot cache.
    • On ext2 with 4K blocks mbox ate 13 MB of space, Maildir ate 21 MB.
    • Small UI degradation: Mutt wouldn't show the number of lines in a message from a Maildir, and it wouldn't show percent progress indicator while reading the Maildir.
    Basically for my situation (read-only mail archives with large numbers of messages, which are rarely in filesystem cache, ext2 and constant disk space shortage) mbox was better. But my situation (personal mostly static mail archives) is remarkably different from running IMAP server.
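
    Part of the disk-space gap is plain block rounding: every Maildir file ends in a partly filled block, while an mbox has only one. A back-of-the-envelope Python sketch, with made-up message sizes and ignoring inode and directory overhead:

```python
import math

def allocated(sizes, block=4096):
    """Bytes actually occupied when each file is rounded up to whole blocks."""
    return sum(math.ceil(s / block) * block for s in sizes)

# Hypothetical archive: 3000 messages of 4500 bytes each (invented figures,
# not the sizes from the benchmark above).
messages = [4500] * 3000
as_mbox = allocated([sum(messages)])     # one file, at most one partial block
as_maildir = allocated(messages)         # one partial block per message
slack = as_maildir - as_mbox
```

    With small messages the slack can approach the size of the mail itself, so the effect depends heavily on block size and on the filesystem's handling of file tails.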

    I did this test in 2000. I should probably try again some day with Reiserfs, but I heard various people telling me it doesn't improve Maildir performance. Can't say anything until I try myself.

    I therefore recommend you to try it yourself and see if Maildirs really help in your situation.

  128. How many users would LIKE to use email by MarkedMan · · Score: 1

    There are many, many emails sent to one person that really need to be stored in a project folder, an administrative folder, etc. When someone is searching for info, they want to go to a central location and search all the documents, folders and emails that have to do with those documents. Storing email as one file or many is a discussion orthogonal to this need. I don't have an answer on this one, only a need. How can I quickly drop an email into a directory?

    I realize it is possible, but in Eudora or Mozilla's mail server, I have to do a Save As, rename, browse to a specific folder and finally save. It would be great to be able to put a folder somewhere and just drag and drop.

  129. Client issue by gidds · · Score: 1

    Perhaps I'm missing the point here, but isn't it down to your mail client to store your mail however it sees fit? Why should you as a user have to know or care?

    --

    Ceterum censeo subscriptionem esse delendam.

    1. Re:Client issue by vidarh · · Score: 2

      Perhaps because at some point you as a user are likely to switch mail client, and may have mail you want to migrate. But anyway, this discussion doesn't exactly seem to be a discussion of what's nice for end users, does it? It's a useful discussion for anyone designing or deploying mail systems of various types, including MTAs and MUAs.

  130. I agree, but probably not for the same reason by smcv · · Score: 1

    If your e-mail is in a binary DB, you're pretty much reliant on the developer of the DB format to let you export it. Outlook Express, in particular, is very reluctant to let you bulk export e-mail - it'll export .eml files, which are the e-mail in plain text just like OE received it, but only one at a time via right-click, Save As, which is a pain for large folders (at least in the version I used to use, 5.5, it might have got better since).

    Yes, it's possible to scan through binary DBs with 'less' if they contain the plain text somewhere, and I have been known to do this with my old OE .dbx files, but it's a bit ugly (half a paragraph of mail, 20 bytes or so of random binary, the other half of the paragraph).

    With a maildir or mbox format (I now use MH, which has a modified maildir as its native format) you can just grep through the files if you want to extract information from them and your e-mail client isn't working/installed/whatever (or you've switched to a different one).

  131. the plan 9 approach by rpeppe · · Score: 5, Interesting
    as a basis for an approach i like what plan 9 does. the mail is made available to clients as a filesystem (provided by a user level program). each mail message gets its own directory; each mime attachment gets its own subdirectory within that message (and recursively, as MIME is recursive).

    here's a little transcript:

    % cd /mail/fs/mbox
    % lc
    Directories:
    1 113 128 142 157 171 186 20 214 229 243 258 272 287 300 315 33 344 359 373 388 401 416 430 445 46 474 56 70 85
    [...]
    % cd 318
    % lc
    Files:
    bcc date filename info messageid rawbody sender type
    body digest from inreplyto mimeheader rawheader subject unixheader
    cc disposition header lines raw replyto to

    Directories:
    1 2 3
    % head raw
    Return-Path:
    Received: from punt-1.mail.demon.net by mailstore for rog@vitanuova.com
    id 1021665470:10:17045:138; Fri, 17 May 2002 19:57:50 GMT
    Received: from psuvax1.cse.psu.edu ([130.203.4.6]) by punt-1.mail.demon.net
    id aa1016828; 17 May 2002 19:57 GMT
    Received: from psuvax1.cse.psu.edu (psuvax1.cse.psu.edu [130.203.6.6])
    by mail.cse.psu.edu (CSE Mail Server) with ESMTP
    id 27DA4199BE; Fri, 17 May 2002 15:57:13 -0400 (EDT)
    Delivered-To: 9fans@cse.psu.edu
    Received: from acl.lanl.gov (plan9.acl.lanl.gov [128.165.147.177])
    % head body
    This is a multi-part message in MIME format.
    --upas-mbyuptynpdsmbjuyeermihdgur
    Content-Disposition: inline
    Content-Type: text/plain; charset="US-ASCII"
    Content-Transfer-Encoding: 7bit

    Hi,

    If you seek excitement and thrills you need to look no further than
    Plan9 -- it gives you everything and then some, but in a good way (or
    % cd 2
    % lc
    Files:
    bcc date filename info messageid rawbody sender type
    body digest from inreplyto mimeheader rawheader subject unixheader
    cc disposition header lines raw replyto to
    % cat mimeheader
    Content-Type: image/jpeg
    Content-Disposition: attachment; filename=iostats.jpg
    Content-Transfer-Encoding: base64
    % page body
    reading through graphics...
    %
    "raw" contains the raw data that makes up the message. "body" contains the data after the encoding formats have been applied (hence in that case /mail/fs/mbox/318/2/body is a jpeg file, viewable directly by any usual jpeg viewer).

    the beauty of this scheme is that it hides the underlying storage scheme from the mail clients. if i wish to change things so that the underlying storage format is many files [currently it uses a traditional mbox format], none of the mail client programs have to change.

    plus i can use grep, diff, shell scripts, etc directly on the messages in my mailbox. procmail eat your heart out.
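    a rough sketch of the same idea, for readers without a plan 9 box handy. this is not the plan 9 mail filesystem itself: it only flattens a MIME message into a dictionary of path → bytes that mimics the layout in the transcript above. the function name message_tree and the sample message are made up for illustration.

```python
from email import message_from_bytes
from email.message import Message


def message_tree(msg: Message, prefix: str = "") -> dict:
    """Flatten a MIME message into {path: bytes}, one 'directory' per part."""
    tree = {}
    for name in ("from", "to", "subject", "date"):
        value = msg.get(name)
        if value is not None:
            tree[prefix + name] = str(value).encode()
    tree[prefix + "type"] = msg.get_content_type().encode()
    if msg.is_multipart():
        # Parts are numbered from 1; nested multiparts recurse.
        for index, part in enumerate(msg.get_payload(), start=1):
            tree.update(message_tree(part, f"{prefix}{index}/"))
    else:
        # decode=True undoes base64 / quoted-printable, like the "body" file.
        tree[prefix + "body"] = msg.get_payload(decode=True) or b""
    return tree


# Hypothetical two-part message: a text part and a base64 attachment.
RAW = b"""\
From: alice@example.org
To: bob@example.org
Subject: stats
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="XYZ"

This is a multi-part message in MIME format.
--XYZ
Content-Type: text/plain; charset="US-ASCII"

Hi,
--XYZ
Content-Type: application/octet-stream
Content-Transfer-Encoding: base64

aGVsbG8=
--XYZ--
"""

tree = message_tree(message_from_bytes(RAW))
for path in sorted(tree):
    print(path, tree[path][:40])
```

    the attachment shows up under 2/body already decoded, which is the property that makes grep and friends useful on such a view.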

    1. Re:the plan 9 approach by glv · · Score: 4, Insightful
      You alluded to this, but I know slashdot, and it's worth being explicit about it to avoid all the flames:

      This is not how mail is actually stored on disk in Plan 9. The "real" mail storage is just mbox files. What rpeppe has described is the view that the mail storage system provides to clients.

      I agree it's very sweet, but the question is primarily dealing with the actual storage format.

      --
      ---glv
    2. Re:the plan 9 approach by Pegasus · · Score: 1

      Ugh, yeah, NeXT had this too back then ... and when I had to fsck a 180 GB server full of NeXT homes, it took close to a day to reassemble all those directories ... sigh ... Although this is a nice solution, it requires a file system that is capable of dealing with it.

    3. Re:the plan 9 approach by Prometheu · · Score: 1

      I've been working on a "SQL" database storage and searching backend for mail for some time now, and I can give some insight for those who would like to start.

      1. Figure on making an RFC 821 compliant SMTP agent for incoming mail. Sounds silly, but if you are going to also house usernames, domains, etc. in this single database, you can try to integrate with qmail's extension, which limits the structure of your database extensively, OR you will have to actually sit on port 25 and do the standard rcpt part of the conversation. If you expect to use the QF files from sendmail, don't. Sendmail checks the virtual users table and translates the incoming RCPT TO:'s to the correct user before it writes the files. According to RFC 821, mailers sending a message to a group of recipient addresses handled by a single MX should send a single message with multiple RCPT TO:'s. Unfortunately, sendmail drops the multiple part. This seems innocuous enough, figuring you will use the To: addresses in the message header (inside the DATA section), but don't forget two things. You will have to read each mail address contained there to see if you handle it, and again to determine which domain and user to match it to. And then there are BCCs. They never show up here (obviously), and are ONLY mentioned in the MX conversation with the actual RCPT TO: command. Just a heads up.

      2. One way around this would be to "sniff" that particular connection. Figure on losing much hair on that one. How do you match the sniffed information to the queue files? If you are using sendmail, you can match the timestamp inside the QF file to the timestamp of the conversation. But the overhead to do this is tough, and I haven't found a reliable way to do this.

      3. Once you HAVE all the data, what to do with it? My suggestion, similar to others here, is to split out the MIME-encoded data and store it as actual binary information to try to keep the size down. Also, since much of my projected traffic will be address-list types (multiple internal recipients), I'm also storing a single copy of each message. This means that there are four distinct tables, in a SQL format. One for users, one for message maps (userid, messageid), one for messages (no binary), and one for the binary attachments. I'm not a fan of keeping binary information in a database, so I'll be using a link format, autonaming the attachments based on the auto-increment field of the attachment table. Using the primary index for file naming is an easier way out, and every 100/1000/10000 files can be a delimiter for a new folder, etc. Not quite the one folder to rule them all, but still governed nicely.

      4. Try to find as much common ground as you can. Good database design should be the primary step. Keep redundancy to a minimum. For workgroup management / exchange emulation, figure on using the user/domain table for address lists, etc. I've set up a virtual user on a domain with a text field that contains a serialized array of all qualifying addresses. There are more specifics for adding external addresses for internal distribution lists, but that is really up to the designer as to how to handle those.

      5. Another piece to watch is the blackhole list. Any process can request verification, but making this piece can take more time than you expect. Sendmail and qmail will do this for you, but again, you may have issues getting the information from either process in a nice and neat way.

      6. As far as using this for document sharing, etc., figure on the auto-increment file naming process to help you. Checking in a particular file should be a process that fits your application. Keeping the original file as a single copy does help, and also, since the auto-increment naming structure prevents unwitting overwriting, the actual code to do that must be present, and, again, has to fit your design.

      7. This also allows for quick "virus scanning", as any files of a particular type can be renamed, modified or "quarantined" until okayed by an administrator.
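      For anyone who wants to play with the four-table layout described above, here is a minimal sketch using SQLite from Python. The table and column names, the store and attachment_path helpers, and the 1000-files-per-folder split are all assumptions made for illustration, not the poster's actual schema.

```python
import sqlite3

SCHEMA = """
CREATE TABLE users (id INTEGER PRIMARY KEY, address TEXT UNIQUE NOT NULL);
CREATE TABLE messages (id INTEGER PRIMARY KEY, headers TEXT NOT NULL, body TEXT NOT NULL);
CREATE TABLE message_map (
    user_id INTEGER NOT NULL REFERENCES users(id),
    message_id INTEGER NOT NULL REFERENCES messages(id),
    PRIMARY KEY (user_id, message_id)
);
CREATE TABLE attachments (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    message_id INTEGER NOT NULL REFERENCES messages(id),
    filename TEXT NOT NULL
);
"""


def attachment_path(attachment_id, per_dir=1000):
    """Name the on-disk file after the auto-increment id; new folder every per_dir files."""
    return f"{attachment_id // per_dir:06d}/{attachment_id}.bin"


def store(db, recipients, headers, body, attachment_names):
    """Store one copy of the message and map it to every local recipient."""
    cur = db.execute("INSERT INTO messages (headers, body) VALUES (?, ?)", (headers, body))
    message_id = cur.lastrowid
    for address in recipients:
        db.execute("INSERT OR IGNORE INTO users (address) VALUES (?)", (address,))
        (user_id,) = db.execute(
            "SELECT id FROM users WHERE address = ?", (address,)
        ).fetchone()
        db.execute("INSERT INTO message_map VALUES (?, ?)", (user_id, message_id))
    paths = []
    for name in attachment_names:
        cur = db.execute(
            "INSERT INTO attachments (message_id, filename) VALUES (?, ?)",
            (message_id, name),
        )
        # The binary itself would be written to this path, outside the database.
        paths.append(attachment_path(cur.lastrowid))
    return message_id, paths


db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
message_id, paths = store(
    db, ["a@example.org", "b@example.org"], "Subject: hi", "text", ["iostats.jpg"]
)
```

      Two recipients end up sharing one row in messages, and the attachment row only carries the information needed to find the file on disk.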

    4. Re:the plan 9 approach by CaraCalla · · Score: 1

      Ironically I thought about something like that: writing a lib which wraps the libc open etc. calls and provides legacy programs with an mbox or Maildir view of the mail-store.

      MTAs and MUAs could then access the mail-store via a native API or in the traditional way, only needing some LD_PRELOAD flag.

      (i submitted the story)

  132. Use news! by Inode+Jones · · Score: 1
    Use the Usenet news format, complete with overview files.

    • It's a well-defined format, so other tools grok it.
    • One file per message means reliability. Let the filesystem semantics work for you.
    • One folder per directory. Remember the Cnews days where you read news directly from the spool?
    • The overview file makes a good index.
    • If you implement mail and news in the same client, then you can share the code base.
  133. I had some plans, but i need to learn Java/Perl by eatmeat · · Score: 1

    After i learned Java and Perl some, i had planned to create an email app that was reasonably cross platform.
    The main detail is that i'd cause the email to be stored by return address, much like OE (and the like, i'm sure) stores EVERYTHING in one file, with the exception of storing the MIME-attachments in a separate directory.
    every email i'd get from friend A would be stored in file A and indexed by some sort of Perl db. Friend B would have their own file, and the db would keep up to date so it all displays in the app, but on the computer they're separate files. (spam might be easily removed by just removing them from your file system; repeat spammers needn't be deleted more than once).
    attachments could easily be moved to a separate directory and the db keeps notes on what mail entry each belongs to and could display it in the app.

    since i'm very new to Perl (and haven't even started learning Java), it's going to take me a while. So, i'm interested in some collaboration.
    once this is working, perhaps it wouldn't take too much work to be implemented into a fully fledged Unix mail thingy (with a cli instead of a Java-gui?) and be useful for this topic.

    --
    All Scottish food is based on a dare.
  134. Text client for Evolution mboxes by motyl · · Score: 1

    All is nice, but I have an additional requirement - I need to access my email from a text terminal sometimes. So I need a client which would be able to access mail processed by Evolution (possibly with reduced functionality).

    Another question: after many back-ups and moving between many hosts I ended up with many folders with partially duplicated emails. How to make "one big merge" to make mail IDs unique, but still link to the virtual folders they were originally in (so merge the data, but keep the original folder names as meta-data)?
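    One possible way to approach the "one big merge", sketched in Python: key every message on its Message-ID header and remember the set of folders it was seen in. The merge_folders helper, the folder names and the content-hash fallback for messages without a Message-ID are assumptions for illustration; real folders would be read with something like the standard mailbox module instead of the inline strings used here.

```python
import hashlib
from email import message_from_string


def merge_folders(folders):
    """folders: {folder_name: [raw message text, ...]}.

    Returns {key: (raw_text, set_of_folder_names)} with one entry per unique message.
    """
    merged = {}
    for folder, raws in folders.items():
        for raw in raws:
            msg = message_from_string(raw)
            key = (msg.get("Message-ID") or "").strip()
            if not key:
                # Assumption: fall back to a content hash when the header is missing.
                key = "sha1:" + hashlib.sha1(raw.encode()).hexdigest()
            # The first copy seen is kept; later duplicates only add their folder name.
            _, names = merged.setdefault(key, (raw, set()))
            names.add(folder)
    return merged


A = "Message-ID: <1@example.org>\nSubject: one\n\nfirst\n"
B = "Message-ID: <2@example.org>\nSubject: two\n\nsecond\n"
merged = merge_folders({"inbox": [A, B], "backup-2001": [A]})
```

    The folder sets are the "virtual folder" metadata: each unique message is stored once and can still be listed under every folder it came from.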

  135. SQL by Julian+Morrison · · Score: 1

    Backend the silly stuff onto PostgreSQL (which is the best for clean transactionality and locking). To heck with this low level mucking about in the filesystem, that should be left to the database designers.

  136. I also have hard data that ReiserFS is NOT Ready by FreeUser · · Score: 2

    ... in the form of 8 different machines, all of which were running reiserfs on various GNU/Linux distros ranging from Suse to Mandrake to Debian, all of which suffered data corruption, data loss, and even the mysterious vanishing of entire directory trees (while disk usage exploded). In short, all had unrecoverably corrupted filesystems, not as a result of unscheduled shutdowns (which journalling is supposed to help protect against anyway), but on machines that were operating normally, without interruption. None of these filesystems survived more than 9 months of normal, everyday activity (without improper shutdowns, I will stress once again).

    These machines were located at three disparate sites, had different base configs, and in two cases were installed and maintained by different people.

    The only things they had in common were that they used Reiser, they lost data (severely), and had to be reconstructed from backups (this time without using Reiserfs).

    You may believe that you can trust ReiserFS, but I know for an absolute fact that I cannot, and I think it is very possible you will discover that at some point as well. Of course, having relegated everyone else's experience to mere anecdote, it is clear you won't learn this until it hits you in the face, personally. That's OK, not everyone is willing to learn from the experience of others.

    However, to those who are interested in learning from the experience of others I will say this: tread very, very carefully with ReiserFS. It is not ready for prime time, and should not be used in any production system. If you really need journalling, use XFS. It is very stable and quite difficult to damage (so far it has survived every stress test I've been able to throw at it).

    Now, go ahead and relegate this to anecdote if it makes you feel better ... I have hard data to back up my claims, and, quite frankly, a filesystem is sufficiently important that "your mileage may vary" should be an unacceptable answer. By all accounts, if those who haven't (yet) suffered data loss with ReiserFS are to be believed, with ReiserFS YM indeed V.

    --
    The Future of Human Evolution: Autonomy
  137. One word. by 42forty-two42 · · Score: 1

    Mysql.

  138. Take a look at Mercury by Havokmon · · Score: 2
    Mercury Mail, from David Harris, the author of Pegasus Mail, I believe does what you're looking for.

    I think it's the best of both worlds. Your 'INBOX' is like Maildir, where each 'new' message is a separate text file. Once you've 'Filed' that message, however, it's compressed into a single file along with the rest of the emails for that folder.

    Personally, I think you're looking at the WRONG aspects of mail servers. You're getting way too technical. Nobody gives a shit about wasted inodes. When's the last time you defragmented ANY disk?

    The reason I use Mercury is its exceptional Netware NDS integration. Combine that with Pegasus Mail's NDS integration, and you have 'Roaming' users without all the profile garbage (Pegasus will use NDS calls to see 'who' you are, and read your email from your home directory). Oh, and it's free.

    Too bad it hasn't been ported to Linux.. along with the PAM stuff needed to keep up the kick-ass user integration :)

    --
    "I can't give you a brain, so I'll give you a diploma" - The Great Oz (blatently stolen sig)
  139. I'm using Courier Mailserver w/ assorted clients. by Anonymous Coward · · Score: 0


    Courier MTA/Courier IMAP w/ XFS on software raid5. Maildirs and XFS is a good thing. So far I'm pretty happy with it.

    Sam Varshavchik, the coder/maintainer/author is... a character. He can be rather acerbic and opinionated, but responds to EVERY issue raised on the mailing list within a day. Admittedly the answer is sometimes "their software is broken. don't use it with courier" but serious issues are addressed quickly and well. Nice change from having to wait 6 months for an institutional patch...

    So far I have mutt, pine, webmail, netscape, and OE clients working well with it. (pine needs to run through IMAP and OE can be a little iffy)

    For more info check out www.courier-mta.org

  140. I've thought about this, and.. by Fweeky · · Score: 2

    I came up with mboxdir. It was actually a preliminary specification for a Win32 client.

  141. cyrus? by jochen · · Score: 1

    Did you really try cyrus or did you just dump it because it looks similar to maildb?

    cyrus really has some interesting features and is way faster than mbox:

    - full IMAP-4.1 compliance with multi access
    - ACL
    - Quota
    - sieve support
    - hard link support for multiple recipients (yes, this means sending a 10 MB file to all local users will take 10 MB disk space on the mail server).

    And it proved to be very reliable.
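    The hard-link trick in the last bullet is easy to try out. The sketch below is not Cyrus code: it only shows the filesystem mechanism, writing the message once and linking it into the other mailboxes. The deliver_to_all helper and the directory layout are assumptions for illustration, and all mailboxes must live on the same filesystem for os.link to work.

```python
import os
import tempfile


def deliver_to_all(message, mailbox_dirs, name):
    """Write the message once, then hard-link it into every other mailbox."""
    first = os.path.join(mailbox_dirs[0], name)
    with open(first, "wb") as f:
        f.write(message)
    for directory in mailbox_dirs[1:]:
        # A hard link is a second directory entry for the same inode: no data is copied.
        os.link(first, os.path.join(directory, name))
    return first


spool = tempfile.mkdtemp()
boxes = [os.path.join(spool, user) for user in ("alice", "bob", "carol")]
for box in boxes:
    os.mkdir(box)

first = deliver_to_all(b"Subject: big file\n\n" + b"x" * 1024, boxes, "1.")
info = [os.stat(os.path.join(box, "1.")) for box in boxes]
print("links:", info[0].st_nlink, "inodes:", {s.st_ino for s in info})
```

    Each user can delete their own copy independently; the data blocks are freed only when the last link goes away.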

    --jochen

  142. Re:Stays up for *days* before losing mail and rebo by twinpot · · Score: 1

    My experiences mirror yours. I worked for a company that supported Exchange and Domino. We had a number of Exchange guys (who were pretty clued up) to support the client's Exchange system. We had just one person who supported a greater number of Domino clients.

    As for the NetWare stories: Netware 2.15c server that was up for 2 years. Shut down and moved to new site, and wouldn't restart. Investigation showed this to be incorrect termination of the SCSI drives (which had been like that for two years!). Corrected the termination, and off it starts ;-)

    There is no reason though why a combination approach cannot be used. Store binaries (and text) on the file system, and have the "meta info", pointers etc. stored in a DB. That way the DB doesn't need to be too flash or large.

  143. Learn from non-Unix models, too by mwood · · Score: 2, Insightful

    VMSmail's storage format is instructive. Each message is represented by a single record in an indexed file. A short message body is simply tucked into the record along with the headers and other metadata. Long bodies (more than around 2kb IIRC) are stored as individual files and their header records point to the files by name.

    Of course you all realized at once that the main file can get out of sync. with the directory which holds the external bodies. It does, sometimes, and fixing it up can be a pain. Any storage method which partitions a single message among multiple files is going to have similar problems. But it works pretty well, and it shouldn't be too hard to write a tool to groom the message store in case of inconsistency. It's worth study.
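    To make the split-storage idea and the "grooming" tool concrete, here is a small sketch in Python. It is not VMSmail's actual record layout: the index and the external files are plain dictionaries, the 2048-byte limit stands in for the "around 2kb" figure above, and the names store, find_inconsistencies and the body-N.txt file naming are invented for illustration.

```python
INLINE_LIMIT = 2048  # stands in for "around 2kb"; the exact value is an assumption


def store(index, external, msg_id, headers, body):
    """Keep short bodies inside the index record, long ones in a separate file."""
    if len(body) <= INLINE_LIMIT:
        index[msg_id] = {"headers": headers, "body": body, "file": None}
    else:
        filename = f"body-{msg_id}.txt"  # hypothetical naming scheme
        external[filename] = body
        index[msg_id] = {"headers": headers, "body": None, "file": filename}


def find_inconsistencies(index, external):
    """Report records pointing at missing files, and files no record points at."""
    referenced = {rec["file"] for rec in index.values() if rec["file"]}
    present = set(external)
    dangling = sorted(referenced - present)
    orphaned = sorted(present - referenced)
    return dangling, orphaned


index, external = {}, {}
store(index, external, 1, "Subject: short", "hello")
store(index, external, 2, "Subject: long", "x" * 5000)
external["body-99.txt"] = "left over from a deleted message"
del external["body-2.txt"]  # simulate the two halves drifting out of sync
dangling, orphaned = find_inconsistencies(index, external)
```

    A real groomer would then decide what to do with each list, for example dropping dangling records and either re-attaching or deleting orphaned files.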

    It was a natural choice on VMS, which has really good multi-indexed file support in the base package. It works well with text messages, which often do fall within the size limit for avoiding external storage of the message body. Today it suffers the same problem that mbox does -- people use email differently now.

  144. I have to differ by b0bby · · Score: 2, Informative

    I have never used Exchange, but a friend of mine admins a large (50,000+ users) Exchange system. Even a few years ago, running on NT4, their servers did NOT go down, ever. They scheduled a reboot for patches etc. every 6 months, that's it. I have had lots of Netware boxes up for over a year, but not Netware 5 running mail. I inherited such a box & it needed to be rebooted every month or two. Now I've replaced it with a Linux based mail server & I'm much happier. Still have a 4.11 box cranking along happily, even happier since the 5 box is no longer giving annoying messages about its licences. And my 2000 Server has been up for coming up on a year with no problems.

  145. Oracle Mail by nixkuroi · · Score: 1

    I remember a few months ago that Oracle was going to release some db oriented mail server that was supposed to revolutionize enterprise level email. Anyone know anything about this?

  146. Know your system administrator by gypsyx · · Score: 1
    The official response from the old "Know Your System Administrator" field guide (look on google):

    SITUATION: Balky mail.

    TECHNICAL THUG: Rewrites sendmail.cf from scratch. Rewrites sendmail in SNOBOL. Hacks kernel to implement file locking. Hacks kernel to implement "better" semaphores. Rewrites sendmail in assembly. Hacks kernel to . . .

    ADMINISTRATIVE FASCIST: Puts mail use policy in motd. Locks accounts that go over mail use quota. Keeps quota low enough that people go back to interoffice mail, thus solving problem.

    MANIAC:
    # kill -9 `ps -augxww | grep sendmail | awk '{print $2}'`
    # rm -f /usr/spool/mail/*
    # wall
    Mail is down. Please use interoffice mail until we have it back up.
    ^D
    # write max
    I've got my boots and backpack. Ready to leave for Mount Tam?
    ^D

    IDIOT: # echo "HELP!" | mail tech_support.AT.vendor.com%kremvax%bitnet!BIFF!!!

    Both "administrative fascist" and "maniac" seem to have the right idea about how we should handle users receiving too much mail.

  147. Why would you do that? by Anonymous Coward · · Score: 0

    I mean come on shit on the server? Think of the smell. What if you're running a hot server? You'll have some brownie lookin turds.
    Just my 2 cents but you may want to avoid shitting on the server.

  148. MySQL! by deadkarma · · Score: 0

    I often searched for a MUA that uses MySQL as storage and often thought of creating one myself.

  149. yEnc is an excess format written by ignorants by Anonymous Coward · · Score: 0

    yEnc reintroduces the problems the world had before MIME, suffers from the same "begin youcantreadthis.txt" "attachment" games as Outlook, but does not solve the transport reliability issues. Don't waste your time.

  150. Maildirs are not slow by DrProton · · Score: 1
    ... just try to open a Maildir with 1000+ mails and see how long it takes your favorite Mailprogram to only display the subjects.

    About 1 second with mutt (in an rxvt) on my dell dimension 4100 (1 GHz pIII; 512 MB ram; 7200 rpm IDE disk) running debian. The maildir contained 1429 messages and is on an xfs, the kernel is a recent 2.4.18+xfs. Idiot.

    --
    "Mit der Dummheit kaempfen Goetter selbst vergebens." - Schiller
  151. Maildir for 1000+ messages? by Jobe_br · · Score: 2, Insightful

    I have in excess of 46K email messages in my account alone, not to mention everyone else's accounts on my company's mail server. We use cyrus IMAP and qmail, both of which use the Maildir format mailboxes ... every client I've used (Mozilla, Communicator, Outlook/OL Express, Mail.app on OS X, Eudora, and Papi-Mail on PalmOS) seems to have absolutely no problem with this setup. Most MUAs are intelligent enough not to download all your headers every time you connect, so unless you're getting 1000+ new emails every time you open a particular folder, you're generally not going to need to read all those headers every time.

    The server that runs this is a measly 600MHz PIII w/ 128MB RAM running RedHat 6.2 w/ a 20GB hard drive. I haven't gotten even close to running out of inodes, to my knowledge, and my server never goes down (really, the only times it's gone down is when power has been cut to it and this has only happened twice in the past 1.8 yrs ... long live Rackspace).

    Maildir is specifically designed to handle mailboxes with large numbers of emails in them, contrary to other formats such as mbox. The problem with any sort of DB approach is the waste of space, even if you compress. A basic course in file structures will teach you a wealth of knowledge in this regard.

    Imagine this: you have a table that stores everything you need to know about an email. You have a few distinct fields for commonly accessed headers (subject, from, to, cc, etc.) each of which would need to be 'text' blobs, since you cannot limit their size (you've seen the emails that have to/cc fields that are miles long, right?) - well, 'text' fields are notoriously poorly optimized in database engines and quite difficult to search (you can create an index on a part of a text field, but that might not be enough, right?). Next you have the message body which would also need to be a text field since you don't limit its length, either.

    Now, since the space for these fields (which don't *ever* change) is not optimized in the slightest, you might think that compressing them is a good idea, right? Well, what if an email is deleted - then you start looking at fragmented space in your database table which would need to be compacted periodically (much as mbox/.mbx files do today, if I recall).

    All in all, storing each message to its own file is not really *that* bad ... optimize the file subsystem beneath it, maybe allow for compression/encryption or that sort of thing, but otherwise, the folks that put together Maildir have certainly done a decent job!

    1. Re:Maildir for 1000+ messages? by Matthew+Weigel · · Score: 2
      All in all, storing each message to its own file is not really *that* bad

      Correct. The only complaint of Cyrus's format seems to be that it uses too many inodes. Well, in any situation where you're at risk of running out of inodes for mail, you're going to a) keep Cyrus's playground on separate disks, and b) take advantage of Cyrus's partitioning ability to spread it out over several different filesystems on multiple disks...

      This can be a problem for Maildir, since in the general setup Maildirs are spread out all over the place, making it hard to consolidate to 'mail only' partitions.

      --
      --Matthew
  152. No. by Doktor+Memory · · Score: 2

    You are going to run out of inodes at exactly the same time you run out of disk space, because they are one and the same thing.

    No.

    Running out of inodes is not the same thing as running out of space. Some of the symptoms of the two are the same ("can't create new files"), but they are completely different failure modes.

    Consult your local man pages for further details.
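    If the man pages are not handy, the two limits can be inspected separately from Python with os.statvfs (Unix only). This is just a sketch; the fs_headroom helper name is made up, and some filesystems allocate inodes dynamically and may report zero for the inode counts.

```python
import os


def fs_headroom(path="."):
    """Report free space and free inodes as two independent numbers."""
    st = os.statvfs(path)
    return {
        # Space available to unprivileged users, in bytes.
        "bytes_free": st.f_bavail * st.f_frsize,
        # Inodes available to unprivileged users.
        "inodes_free": st.f_favail,
        # Total inodes on the filesystem (may be 0 where allocation is dynamic).
        "inodes_total": st.f_files,
    }


info = fs_headroom(".")
print(info)
```

    A mail spool full of tiny files can drive inodes_free to zero while bytes_free is still large, at which point file creation fails even though df shows free space.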

    --

    News for Nerds. Stuff that Matters? Like hell.

  153. Unix philosophy vs. Borg philosophy by bee · · Score: 2

    There are two excellent reasons that so many people use Exchange.

    1) In general, it works out of the box. A company with someone with meager knowledge can set up a fairly complex mail handling system without much help.


    And that same person with meager knowledge is going to get hacked six ways from Sunday when the next Exchange exploit comes around, because what's not included in that meager knowledge is that you have to keep up on security patches if you want your easy-to-install mail server to not be an easy-to-hack mail server.

    2) It does A LOT. In its most basic configuration it does what you need 10 or more programs in Linux to do, not to mention that most of those 10 don't exist.

    And God help you if one (or many) of those pieces of Exchange are broken or don't do what you want to do. Can't change it, it's part of Exchange! At least if one of those 10 linux programs is broken or doesn't work right, you can replace it with something better without affecting all the other parts.

    These are simple philosophic differences between Unix and Borg. Borg stuff usually has a shallow learning curve at the beginning, but then it ramps up as you discover things that are difficult or impossible to do. Whereas, the initial Unix learning curve may be steep, but it flattens out further in.

    --
    At least mafia-owned pizzarias make excellent pizza. Compare to Bill Gates.
    1. Re:Unix philosophy vs. Borg philosophy by Eristone · · Score: 1

      1) I still haven't seen an "Exchange" virus. Could you kindly point one out that exploits Exchange (and not Outlook)?

      2) Learning curve to 'change' the behavior of stuff is high no matter which product you're using... philosophy or not.

    2. Re:Unix philosophy vs. Borg philosophy by bee · · Score: 2

      If you think that viruses are the only way to exploit security holes, then you're the perfect example of that person with meager knowledge from 1).

      --
      At least mafia-owned pizzarias make excellent pizza. Compare to Bill Gates.
  154. Re:Stays up for *days* before losing mail and rebo by EatenByAGrue · · Score: 1

    Sounds like you worked closely with a bunch of clowns who had no idea how to run Exchange.

  155. Re:Stays up for *days* before losing mail and rebo by Sabalon · · Score: 2

    Well, how many MCSE's on paper equal one person who has actually done the stuff?

    We have an exchange server - one person manages it along with tons of other stuff. It pretty much runs itself. We just moved to Ex2k, but were on Ex5.5 for quite a while - I can think of only one time it crashed and that most likely had to do with a 3rd party virus scanner integrated onto the server. Removed that and no more problems.

  156. Cyrus does *not* use the maildir format by pHDNgell · · Score: 1

    While there are similarities, note that cyrus also keeps a couple of files per folder to enhance the performance.

    --
    -- The world is watching America, and America is watching TV.
    1. Re:Cyrus does *not* use the maildir format by Jobe_br · · Score: 1

      I don't think it's fair to say that cyrus doesn't use the Maildir format ... it certainly does, it just uses a few more files to optimize access, nothing wrong with that - nothing in the basic Maildir format is changed, as I use SqWebMail to access my Maildir folders remotely, when I can't connect to my server via SIMAP.

      Anyway, the fact that the format allows for this type of extension is a Good Thing(tm), right?

    2. Re:Cyrus does *not* use the maildir format by Matthew+Weigel · · Score: 2
      I don't think it's fair to say that cyrus doesn't use the Maildir format ... it certainly does

      Eh? Your 'counter' to the factual claim that it doesn't is... an unsupported claim that it does?

      It is not Maildir format. Maildir specifies the delivery method as well as the file format; by your logic, Maildir is nothing but mh. But it's not just "single file per message," and neither is Cyrus; and they're not mh in different ways. Cyrus does not use the new/tmp/cur subdirectory setup, Maildir does not use CRLF to represent newlines.

      Cyrus mailboxes are not designed to permit multiple processes, unaware of each other, to access mail without failure - that was the primary design consideration of Maildir. Cyrus side-stepped that problem, and was therefore able to improve performance more (do an 'ls' in a Maildir with a thousand messages - that's what a client has to do to read that folder).
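      For reference, the delivery half of Maildir that makes lock-free access possible looks roughly like this: write the message under a unique name in tmp/, then move it into new/ so readers never see a partial file. This is a sketch only. The unique-name recipe (timestamp, pid, counter, hostname) follows the general shape used by Maildir implementations, but the exact format and the maildir_deliver helper name are assumptions for illustration.

```python
import itertools
import os
import socket
import tempfile
import time

_counter = itertools.count()


def maildir_deliver(maildir, message):
    """Deliver one message: write it fully in tmp/, then rename it into new/."""
    for sub in ("tmp", "new", "cur"):
        os.makedirs(os.path.join(maildir, sub), exist_ok=True)
    host = socket.gethostname().replace("/", "_").replace(":", "_")
    unique = f"{int(time.time())}.P{os.getpid()}Q{next(_counter)}.{host}"
    tmp_path = os.path.join(maildir, "tmp", unique)
    # "x" mode refuses to overwrite, so a name collision fails loudly.
    with open(tmp_path, "xb") as f:
        f.write(message)
        f.flush()
        os.fsync(f.fileno())
    new_path = os.path.join(maildir, "new", unique)
    # rename() within one filesystem is atomic: the file appears in new/ complete.
    os.rename(tmp_path, new_path)
    return new_path


maildir = os.path.join(tempfile.mkdtemp(), "Maildir")
path = maildir_deliver(maildir, b"Subject: test\n\nhello\n")
print(os.listdir(os.path.join(maildir, "new")))
```

      Because every writer uses its own file name and readers only look in new/ and cur/, processes that know nothing about each other can deliver and read concurrently without a lock file.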

      In short, there are superficial similarities of design, but they are different.

      --
      --Matthew
  157. Re:I also have hard data that ReiserFS is NOT Read by Anonymous Coward · · Score: 0

    "..in the form of 8 different machines, all of which were running reiserfs"

    I have never heard of anyone having so many problems with reiserfs as you! I am using reiserfs on several squid boxes, 21 production qmail boxes and a handful of other production and testbed systems and I have never had so much as a hiccup that related to reiserfs. Searches of google and google groups turn up no one else that shares your experiences of "unrecoverably corrupted filesystems" with reiserfs.

  158. Cyrus by Lumber+Cartel+Czar · · Score: 1

    You forgot to mention the fastest and most scalable solution there exists, which is Cyrus imapd, see http://asg.web.cmu.edu/cyrus/

    Basically, it is maildir with a header database.
    It scales well for tens of thousands of very active users on a single small box, and has also support for clustering. I know of installations which serve many hundreds of thousands of users on a single host, so imagine what a cluster of them could do.

    It doesn't do much to economize on space, but that's a non-issue. Anyone who is willing to keep dozens of megabytes in his mailbox is willing to pay for the privilege, and hard disk space is cheap. Anyway, I think that any mail system which does not preserve the rfc822 format all the way from sender to recipient is evil.

  159. I think a hybrid solution is called for. by mellon · · Score: 3, Insightful

    I don't think it makes sense to store email in dbm files. It's too sketchy - what happens when the dbm file gets corrupted? The nice thing about flat files is that if something goes wrong, you can fix it with vi.

    I think the right solution to the problem is to key off the message ID, which is supposed to be unique. Then define a mail folder as simply a list of message IDs. Messages can appear in more than one folder, but hopefully not in no folders.

    To make this efficient, I'd hash the message ID, and use a hierarchy of directories, because Unix doesn't do well with large flat directories. The hierarchy could auto-extend, so that as one subdirectory fills up, you do a sub-hash and split it into more directories.
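    A minimal sketch of the hashing idea, assuming SHA-1 as the hash and two hex characters per directory level; both choices, and the message_path name, are illustrative rather than part of the proposal. Raising depth for a bucket that has filled up corresponds to the "sub-hash and split" step.

```python
import hashlib


def message_path(message_id, depth=2):
    """Map a Message-ID to a nested path such as 'ab/cd/abcd...'."""
    digest = hashlib.sha1(message_id.encode("utf-8")).hexdigest()
    # Each level uses the next two hex digits, so a level has at most 256 entries.
    levels = [digest[2 * i : 2 * i + 2] for i in range(depth)]
    return "/".join(levels + [digest])


# A folder is then just a list of message IDs (hypothetical examples).
inbox = ["<1021665470.17045@example.org>", "<27DA4199BE@example.org>"]
paths = [message_path(mid) for mid in inbox]
deeper = message_path(inbox[0], depth=3)
```

    The same ID always maps to the same path, so a message listed in several folders is still stored exactly once.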

    The problem of tiny files is a real one. The solution is probably to make the bottom of a hash a file rather than a directory, and store more than one message in each such file. You don't have to store a lot of messages in these files to win - even ten messages would produce a big win, and would be pretty efficient.

    The format of the individual files should probably be indexed sequential access - that is, a TOC at the front, and then the contents as plain text, nothing fancy. The TOC should be in ASCII, not binary, and you should be able to rebuild the TOC by looking at the file.

    Babyl used to use a control character as a delimiter, which worked pretty nicely - much better than using "^From ". Ever seen >From in an email message? That's because Unix mail uses "^From " as an inter-message delimiter, so it has to quote it, and it does so stupidly. So use ^_ as a delimiter, and if ^_ appears in the email message, just double it. Take a doubled ^_ out when reading a message.
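    The doubling scheme can be sketched in a few lines of Python. One detail is an assumption added here and not part of the proposal: the terminator is written as ^_ followed by a newline, because with a bare ^_ a message that begins with ^_ could not be told apart from one that ends with it. The pack and unpack names are made up for illustration.

```python
SEP = "\x1f"      # ^_ (ASCII unit separator)
END = SEP + "\n"  # record terminator; the trailing newline is an assumption


def pack(messages):
    """Join messages into one string, doubling any ^_ found inside a message."""
    return "".join(m.replace(SEP, SEP + SEP) + END for m in messages)


def unpack(data):
    """Split a packed string back into messages, undoing the doubling."""
    messages, current, i = [], [], 0
    while i < len(data):
        ch = data[i]
        nxt = data[i + 1 : i + 2]
        if ch != SEP:
            current.append(ch)
            i += 1
        elif nxt == SEP:
            current.append(SEP)  # doubled ^_ is a literal ^_
            i += 2
        elif nxt == "\n":
            messages.append("".join(current))  # ^_ newline ends a message
            current = []
            i += 2
        else:
            raise ValueError(f"stray delimiter at offset {i}")
    if current:
        raise ValueError("data ends without a terminator")
    return messages


samples = ["From here on, no quoting needed\n", "a\x1fb", "\x1f", "ends\x1f", "\x1f\nstart", ""]
packed = pack(samples)
roundtrip = unpack(packed)
```

    Lines starting with "From " pass through untouched, which is the whole point of not using "^From " as the delimiter.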

    As for compression, I don't think it's worth doing at first. Disk space is cheap. Yes, my email folder is pretty huge, but it's really not a major problem. Making the storage system extra-complicated by uncompressing MIME is something to add on after you've got something more basic that works - you don't have to solve every problem all at once.

    As for folder scan performance, you can make a cache, and have the mail program scan the cache from time to time when it's idle to clean up errors. This is much better than trying to come up with a format that's optimized toward folders - if you try to optimize toward folders, you wind up creating all kinds of problems, IMHO.

  160. Lotus Notes by Anonymous Coward · · Score: 0

    Lotus Notes has a special database format called Notes Storage Facility (NSF). It supports clustering, replication, and encryption inside the file. We run it on Unix and NT servers, and it performs great, even with hundreds of users each having hundreds or thousands of messages. Oh, and Notes does more than just email.

  161. Don't lose functionality! by Anonymous Coward · · Score: 0

    I'm a small-time mail admin. Since I run only a small hosting server delivering no more than 300 emails a day, I don't require these super-responsive and super-efficient MTAs....

    What I do require is the functionality I have found in qmail, and I know plenty of people hate Dr. Dan Bernstein for it.

    I've written two authentication modules for my hosting server. Since we use both name-based and IP-based vhosts, there's a requirement to default to $ENV{TCP_LOCAL_HOST} on IP-based connections and to use user%host suffixes for HTTP name-based vhosts. I could not have done this if my MTA didn't authenticate with an environment variable and a string of exec-loving apps.

    I've taken my authorization somewhat further, including courier imap auth modules and custom logging. Again, something that I could not do without the basic functionality offered by my favourite MTAs.

    I'm now in the loving hands of a custom chrooted setup with logging and authentication I dreamed up and developed and _know how to maintain_. Don't let any database-based MTx take this away from me!

    Matt

  162. Throwing out the baby with the bathwater. by Anonymous Coward · · Score: 1, Insightful

    The questioner makes the correct observation that Maildir is very slow with large directories when performing aggregate operations such as viewing the inbox.

    Unfortunately the questioner doesn't notice the corollary that the single-file-per-folder solution will tend to be slower for *unit* operations -- adding newly arrived mail becomes a problem because of locking issues, removing deleted mail necessitates compacting the file and so forth.

    I worked at the 8th largest web based e-mail provider -- they provide cobranded web based e-mail for over half a million domains, with over 12,000,000 mailboxes when I left.

    A gentleman we interviewed who had left a competitor told us about a major problem they had: They were using stock maildir to store messages, and with a *slightly* larger userbase than us they were crushing a $1,000,000 EMC SAN capable of handling some 8,000 NFS operations per second (Or was it 16,000? Can't recall...) -- 300-400 NFS operations to view an inbox just isn't good. My employer was using a low-end NetApp capable of handling something like 4,000 NFS operations per second (Again, don't remember for certain -- it was half or less of the EMC box's capacity though) and the box was only at 20% of its throughput capacity, with nearly as much mail coming through the system.

    The *one* key architectural difference we made was storing certain headers in a MySQL database -- from, subject, sent date, etc. The stuff you need to view an inbox or what have you.

    Following such an approach -- particularly with a DB capable of fine-grained locking gives you the best of both worlds: Fast aggregate operations (use the DB to aggregate and index data for inbox-viewing, searches, and so forth), and fast unit operations (using individual files to store messages). And writing software to interact with such a mailbox remains very simple.
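
    To make the split concrete, here is a rough Python sketch: one file per message, plus a small header table for inbox listings. It uses SQLite from the standard library purely so the example is self-contained (the system described above used MySQL), and the table layout and function names are assumptions for illustration.

```python
import sqlite3
from email import message_from_bytes
from pathlib import Path


def open_index(db_path=":memory:"):
    """Open (or create) the header index."""
    db = sqlite3.connect(db_path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS headers ("
        " path TEXT PRIMARY KEY, sender TEXT, subject TEXT, sent TEXT)"
    )
    return db


def index_message(db, path: Path):
    """Record the headers an inbox view needs; the body stays in its file."""
    msg = message_from_bytes(path.read_bytes())
    db.execute(
        "INSERT OR REPLACE INTO headers VALUES (?, ?, ?, ?)",
        (str(path), str(msg.get("From", "")),
         str(msg.get("Subject", "")), str(msg.get("Date", ""))),
    )
    db.commit()


def list_inbox(db):
    """One query instead of one open() per message file."""
    return db.execute(
        "SELECT sender, subject, sent FROM headers ORDER BY path"
    ).fetchall()
```

    Because every row is derived from a message file, the table can be dropped and rebuilt at any time by re-running index_message over the directory.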

    You can use compression on the individual files to save space, or you could be courageous and come up with a binary-safe hierarchical file format that can represent a MIME document efficiently in order to "undo" the 35+% penalty encoding poses. If you're really gutsy you could then compress that file. Or, in order to really maximize performance you could simply opt to compress *segments* of the file (think binary attachments -- leave headers and text/HTML sections uncompressed), so that viewing a mail doesn't involve decompressing it -- only accessing large attachments would incur that penalty. In fact, this gives you room to make user-definable performance vs. space tradeoffs: Let the user decide what sorts of things get compressed. Want to save the maximum amount of disk space? Compress everything. Maximum speed? Compress nothing. (And in that event you don't even have to pay the CPU penalty of MIME-decoding the attachment!)
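
    A small Python sketch of both points: measuring the base64 penalty, and compressing only the attachment segments. The one-byte "Z"/"P" tag and the function names are invented for the example; a real format would need proper framing.

```python
import base64
import zlib


def encoded_overhead(raw: bytes) -> float:
    """Fraction by which the MIME base64 form exceeds the raw bytes."""
    encoded = base64.encodebytes(raw)  # 76-column lines, as in MIME bodies
    return len(encoded) / len(raw) - 1.0


def store_part(payload: bytes, is_attachment: bool) -> bytes:
    """Compress attachment segments only; text parts stay readable as-is."""
    if is_attachment:
        return b"Z" + zlib.compress(payload)
    return b"P" + payload


def load_part(stored: bytes) -> bytes:
    """Undo store_part, decompressing only when the segment was compressed."""
    if stored[:1] == b"Z":
        return zlib.decompress(stored[1:])
    return stored[1:]


if __name__ == "__main__":
    print(f"base64 overhead: {encoded_overhead(bytes(57000)):.1%}")
```

    Four output characters per three input bytes, plus a newline per 76-column line, is where the roughly 35% figure comes from.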

  163. AMS : The Andrew Messages System by mpb · · Score: 1

    The Andrew Messages System is pretty neat.
    http://www-2.cs.cmu.edu/afs/cs.cmu.edu/Web/People/AUIS/ams.html

  164. OraMail (Re:One folder to rule them all...) by hpavc · · Score: 1

    i hope not ... using it right.

    now nothing better than having virtual mboxes that allow me to look through all my gigs of mail via imap in under a second.

    i can also organize the mail messages into folders but not move them ... so 'all messages with from or to these domains newer than 60 days in this folder' and 'all unread messages in this folder' and not have duplicated messages.

    this thread was obsolete a long time ago.

    --
    members are seeing something, your seeing an ad
  165. Re:/., come for the intellectual discussions by hazen_vs · · Score: 1

    fucktards fuck see - Here

    Tards isn't a word and therefore doesn't exist. I pity those who are quick to flame and never understand. Ours is an enlightened reality and unfortunately you will never be a part of it. Pity, ignorance is bliss and I guess you'll never understand enough to move on and evolve like the rest of us have.

    Desire is the first evil and it begets desire -Mohatma Buddah

    --
    Peace can only come as a natural consequence of universal enlightenment ~Tesla
  166. What about LDAP as the message store? by Anonymous Coward · · Score: 0

    What about LDAP as the message store?

    I keep all of these messages so that they can be re-read. But they are obviously only written once. If the MTA would write the headers (To, From, CC, Subject) and then the body to different attributes, it would be very searchable and fast. In addition, access control lists could be set up, and separate container nodes could replace the "folder" concept. I think it makes more sense and would be easier to do than a database.

  167. Maildir and 1000+ Mails by PCGod · · Score: 2, Interesting
    and just try to open a Maildir with 1000+ mails and see how long it takes your favorite Mailprogram to only display the subjects.

    Until about 3 days ago, I had 1700+ messages in my Maildir, and pine (patched to support Maildir) opened my inbox in about two seconds. Compare this with my sent-mail folder, which had about the same number of messages in it. This folder is stored in mbox format and it took 5+ seconds to open AND CLOSE this folder. I believe that Maildir is the fastest option, short of keeping a separate database.

    1. Re:Maildir and 1000+ Mails by vidarh · · Score: 2
      Maildir is great as long as you're using a filesystem that can handle it well, such as reiserfs. The mbox format can have some advantages on filesystems that handle lots of medium to small files badly (such as ext2fs), but I still think the risk of manipulating a single mailbox file from multiple applications is too big to be worth it.

      I designed the mail system Nameplanet.com ran on (about 1.5 million mail accounts), and we used qmail with Maildir, but wrote our own highly optimized POP3 server with some extensions (for our web frontend) and caching of size and header data etc., to reduce the number of stat() calls. With the few enhancements we did, Maildir was extremely fast (and robust).

  168. Re:I also have hard data that ReiserFS is NOT Read by FreeUser · · Score: 2

    Searches of google and google groups turn up no one else that shares your experiences of "unrecoverably corrupted filesystems" with reiserfs.

    ahem. You really didn't look very hard, did you?

    filesystem corruption (2.4.18, reiserfs)

    Bug#122230: reiserfsprogs: filesystem corruption with reiserfs

    Re: ReiserFS / 2.4.6 / Data Corruption

    ReiserFS desaster - advice please !

    and about 829 other matches. Need I go on?

    Oh, BTW, as I noted, two of those systems didn't belong to me, they belonged to people I know who experienced similar difficulties (and documented them as well).

    Enough people, of enough diverse walks of life, are having issues like this with Reiserfs that it is clearly not something that is safe to be deploying in a production environment. Even if only 1% of the people using it are being so bitten, that number is way too high (and based on my own experiences and those of several people I know, I suspect that number is a lot higher than 1 per cent).

    --
    The Future of Human Evolution: Autonomy
  169. SQL by Anonymous Coward · · Score: 0

    MySQL (and SQL in general) is a great way to store large amounts of data that later need to be searched in some obscure way. And with the addition of full-text search to MySQL you can do queries that return possible matches, not just exact or wildcard matches.

  170. One more by oli_freyr · · Score: 1

    Check out http://qvcs-guide.sourceforge.net

  171. Don't agree with your definition by Ashurbanipal · · Score: 2
    Tons of interesting information and viewpoint. Thank you.
    I would define Unix mail as mail (rfc822 format) downloaded and stored locally on a per-user basis. IMAP, Exchange, and other remote protocols are very different beasts.
    I would define that as "home user" Email. Very specifically not corporate or academic strength.

    I don't think "unix mail" is all that useful a handle, but if I was going to use it I'd be referring to mail that stayed on unix hosts - usually in mbox format - as opposed to mail downloaded to user PCs with unknown operating systems.

    Corporations and other profit-making legal entities can't dedicate specific PCs to single users cost-effectively in most situations, and they certainly can't effectively manage storage and back-up email stores if the Email messages are scattered over many failure-prone end-user hard drives. IMAPv4 and whatever the proprietary boyz are shopping this week purposely keep the email on the server, so that evidence can be extracted (or destroyed, if you work for Enron) from server backups, and so that filtering and surveying of mail data is easily possible.

    For example, some corporations sweep their drives for return & delivery receipts over a month old and delete them.

    Another example, corporations doing highly sensitive government contracts will sweep their email stores for classified information leaks.

    Another example, I need to get my Email regardless of whether I'm on my laptop at a remote site, at my desk in town, or at home tunneled through SSH. Downloading it to one of these boxes makes it inaccessible to the others.

    The list goes on, but basically downloading email to a local drive is primarily for AOL users and basement hackers. That being the case, your points about maildir are excellent - let the filesystem handle most of the details. I'd add that if you must run a db for speed reasons (such as a subject line db used by an IMAP server) do it so that it can be deleted and/or recreated on the fly from the contents of the maildir. No need to create additional dependencies.

  172. I'd bet he *has* read the MIME RFCs by Ashurbanipal · · Score: 1

    ...seeing as how he's a camel developer for Evolution. And the reason the RFCs are unreadable is because they use words like "pedagogical" (and byzantine grammatical structures) not because MIME is complicated.

  173. Re:I also have hard data that ReiserFS is NOT Read by RustyTaco · · Score: 1

    I've also had similar experiences, though all with one server. Of course, it's the server in co-lo, which is difficult to get to. Over the last year or two that server's been mostly happily camped out in co-lo (it was put in with 2.4.0-testsomething) using reiserfs for /home. Now on three separate occasions I've come across files that cannot be accessed, cannot be deleted, and sometimes cannot be seen even by root.
    Actually, I lied. It's been two separate servers, as the hardware was completely swapped out once because of random crashing and other instabilities. Now in the last couple of months it started randomly crashing again. This time I noticed occasional "access beyond end of device" messages in the syslog. After the last crash and hard reboot I ran "find /home -exec cat \{\} \; > /dev/null" (for the uninitiated: read every file in /home and discard the data) and found it spitting IO errors on one file. I inspected it as root, and sure enough I couldn't read it, or even delete it. AND, when I try to access the file, the "access beyond end of device" messages show up in the syslog.
    There are no IDE drive errors. lm_sensors shows everything within reasonable ranges, and I'm told the system passes a trivial visual inspection.
    So I pulled out a nice fresh chunk of space from the LVM pool and made a spiffy new /home ext3 and copied everything over. 5 days so far with no problems, which is sadly an improvement. I'll give it another couple weeks before I invest in a Hans Reiser voodoo doll.
    To be fair to the "it works for me, so it's perfect" crowd: I have been running it at home too, on /home and /media, and haven't run into this problem. Guess I must not be rubbing it the right way.

    - RustyTaco

  174. One file per e-mail part is the only way to go by rici · · Score: 1

    If you administer a corporate e-mail system, one thing you will find is that your mail system rapidly fills up with multiple copies of the same e-mails, most of them with uncompressed Excel spreadsheets weighing in at hundreds of kilobytes of wasted space.

    Furthermore, if you store these things in databases or mbox-type flat files, you also find that your "incremental" backup tapes fill up with the same stuff.

    One file per e-mail solves part of this problem. One file per MIME part would probably do it even better.
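
    As a rough illustration of the one-file-per-MIME-part idea, here is a Python sketch that names each part's file after a hash of its decoded content, so the same spreadsheet mailed to fifty people lands on disk once. The SHA-256 naming and the store_parts function are my own assumptions, not part of any existing mail store.

```python
import hashlib
from email import message_from_bytes
from pathlib import Path


def store_parts(raw_message: bytes, store: Path) -> list:
    """Write each leaf MIME part to store/<sha256>, once per unique content.

    Returns the digests in part order, which is what a per-message
    record would need to keep in order to reassemble the mail.
    """
    store.mkdir(parents=True, exist_ok=True)
    digests = []
    for part in message_from_bytes(raw_message).walk():
        if part.is_multipart():
            continue                      # containers hold no data themselves
        payload = part.get_payload(decode=True) or b""
        digest = hashlib.sha256(payload).hexdigest()
        target = store / digest
        if not target.exists():           # identical attachments share a file
            target.write_bytes(payload)
        digests.append(digest)
    return digests
```

    Since a stored part never changes once written, an incremental backup only ever picks up parts it has not seen before.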

    Sure, you can do the same thing with databases and fancy backup strategies, but why bother? If file systems aren't adequate to the struggle, use better ones. (Anyway, I'm not convinced of that -- if you're really concerned about inodes, change the setting on the partition which holds the mail. If you're concerned about the time it takes to read linearly through a directory, use directory trees.)

    Databases always make me cringe. The number of times I've failed to restore Outlook mail files after they have been incompletely transferred over a network has convinced me to never again even think about database mail storage. Use a database for indexing if you really want to (although IMHO an SQL server is a ridiculous extravagance and waste of cycles for a database which will comfortably fit in less RAM than a typical screensaver), but make sure you can rederive it from the original data.

    Breaking it down into single files makes everything simpler (starting with locking and going up from there).

    Probably too late to bother contributing and I'll bet all these points have been made already anyway, but I feel passionate about this. So there.

    Rici

  175. Evolution == mbox, mh, or Maildir by RonVNX · · Score: 1

    For some reason, no one seems to ever know what they're talking about on this subject. (*sigh*)

    Evolution uses things other than mbox. In fact, you'd be wise to choose Maildir with Evo: aside from avoiding the flaws of mbox, it can be much faster. (See the Evo archives.)

  176. Who modded this guy up? by RelliK · · Score: 2
    You are going to run out of inodes at exactly the same time you run out of disk space, because they are one and the same thing.

    No they are not. The parent post is correct.

    In fact, I believe all the inodes are created when you create your filesystem, all space is mapped to an inode (though of course one file can use multiple inodes).

    What you believe has nothing to do with reality. I suggest you take an OS course. Or read up on how Unix filesystems work.

    It's usually said that if you have 4k inodes, you'll lose 2k (on average) per file.

    There is no such thing as a "4k inode". You got your terminology wrong. You are thinking about blocks. On average, you waste 1/2 the block size for each file on your filesystem, since the last block is, on average, half-full. An inode is not the same as a block! They are two completely different things, which is why your entire post makes no sense. Think of an inode as a "file header". I don't have time or energy to post the full description but I already mentioned where you can get relevant information.
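
    The half-a-block figure is easy to check with a few lines of Python. This is only arithmetic under a simplified model (whole blocks, no tail packing or fragments, inode cost ignored), and the function names are made up for the example.

```python
def allocated_bytes(size: int, block_size: int = 4096) -> int:
    """Space a file occupies when rounded up to whole blocks."""
    return -(-size // block_size) * block_size  # ceiling division


def slack(sizes, block_size: int = 4096) -> int:
    """Total bytes wasted in the partly filled last block of each file."""
    return sum(allocated_bytes(s, block_size) - s for s in sizes)


if __name__ == "__main__":
    sizes = range(1, 4097)  # one file of every size from 1 byte to 4 KB
    print(slack(sizes) / len(sizes))  # average waste per file, in bytes
```

    Inodes are a separate budget: each file consumes one regardless of size, which is why a partition full of tiny messages can run out of inodes while blocks are still free.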

    --
    ___
    If you think big enough, you'll never have to do it.
  177. I agree by RelliK · · Score: 2

    The poster of the article just assumes that filesystem must be slow when working with 1000+ files per directory and we need a database to save us. That's nonsense, from my experience.

    Apart from that, there are some very important reasons why maildir is much better than a DB. With maildir you can use standard Unix tools to manipulate your email. With a DB you can't do that. Mailbox corruption is not a problem with maildir -- even if corruption were to happen it would be limited to one message, or a small number of messages (not even a mailbox). With a corrupted DB storage, you lose everything -- all the mail of all the users in all the mailboxes. Ask an Exchange admin about it some time.

    --
    ___
    If you think big enough, you'll never have to do it.
  178. abstraction.... by perlchild · · Score: 1

    I doubt I'll be alone in the opinion that the above discussion kinda emphasised the need for a particular method of access being tailored to your needs, and not everyone's needs being the same...

    Do you know WHY?

    Because the format is rather extensible, and adding/removing/rewriting headers when you don't know how many of each there are supposed to be isn't such a good idea (think about the "Received:" headers, for which there must be several formats, perhaps as many as a dozen). The X-headers are another kind of hard-to-mess-with "content". That leads the MDA and MUA to use the same format, to minimize the number of operations on each email, etc. (They don't NEED to use the same one: sylpheed converted an mbox I had into mx (or mbx), which was nice, if unexpected, yet not wholly what I wanted.)

    Now what does that have to do with anything? Well it's no coincidence that most MUA(mail user agent aka mail client) REUSE the work of the MDA as much as possible... It's easier to have high performance when you don't have to do anything... Now most people only use ONE mua to read all their mails... But most systems administrators manage servers where not everyone uses the same MUA (yes if you're a sysadmin I'm preaching to the choir...) Locking and compatibility become important... otherwise you have to remember joe wants his mail in mbx format and dave wants it in maildir, so you can deliver to them... hence the lowest common denominator has its advantages... and why do work when the client will make you redo it(think people with enough procmail rules to consume 1 minute of cpu per incoming message)

    Now think... all of this has to do with the link between mua and mda formats... What's the future for storage of emails? well if you are writing a client and have access to something like camel, which lets you choose the format as you see fit... You sure aren't hurting your chances, are you? You KNOW everyone likes their email du jour different...

    Now what does that mean for servers? Well, I can see the mailfront project (where the "front end", or customer-facing part, is separate from the "back end", or processing unit, allowing one to basically mix and match, or at least to integrate separate approaches more easily) as an approach with lots of future...

    What does that change? Well for having tried alternate file systems and alternate mail"drop" formats a lot in recent months, I can tell a smart sysadmin will want to choose the filesystem and mail"drop" format together... as an optimisation measure... Lots of people don't seem to like maildir... on e2fs... Where it's not at its best... But put it on reiserfs... and it flies.. why? Simple... The filesystem is a data retrieval method... and your mail"drop" is a database of sorts... Would you just pick the database that comes with your operating system, because it comes with the operating system... with no thought to size or performance or contention or locking? I know I wouldn't... Now databases et al... are all good ideas... for the right needs... Does everyone need the same mail server? No... I use courier, on reiserfs... it does what I need...

    For a larger setup... copying the headers for indexing purposes is a good idea... IF you search your email a lot... Which is why it makes sense for evolution to do it... most people don't search email ON THE SERVER... they search a local copy... (or hopefully cache some of the metadata instead of brute-force download all messages of a mailstore...) Does it make sense on a pop-toaster? Probably not, most people don't "Keep" mail long enough for it to make sense... But some do... And it probably was how the original idea of exchange/domino/etc... developed... a database you subscribe(as in publish/subscribe, not as in cash) to... that gives you access to your email/meetings/etc...

    From the namesys project's web page, it appears some people are working to integrate reiserfs into maildir to a greater degree, allowing more efficient searches, headers stored as attributes, etc... All lovely ideas... for the right client...

    The same with embedded databases(bdb, gdbm, cdb, etc...) or generic relational or object databases... For some people they make a lot of sense... For the pop toaster kind of setup, it seldom makes sense: the "end users" don't appreciate the kind of work involved into making searches fast(most of them don't search too often "live" over hundreds of folders in a webmail type of situation for example).

    Of course in a smaller office, with say a pair of email power users with a gig or two of emails for "data mining" purposes such databases might mean the difference between life and death...

    On that note mbox is probably fine for up to a hundred messages... if a bit slow... maildir might get scary after 100000 messages(especially on e2fs, inode vs directory table considerations...)

    Does it make sense to spend lots of work on performance vs compliance to standards vs interoperability? Depends on patterns of access, installed base, usage metrics and other such considerations... But email is a tool... Like all tools, what's important is: What are you going to use it for today?

  179. Re: I was thinking of viewsets by os2fan · · Score: 2
    I have a shared mail account with a fixed name at work, so I know what you are talking about.

    If you are looking at a file system as a hierarchical structure, why can't you have more than one such table?

    The idea being that some mail clients would be only in the "person" tree, and that others would only be in a "function" tree. One could then be given access to both the person and function trees, and shunt mail between them for others to see.

    The other thing that we should do is do things that encourage the use of these things. Make the tools for doing this easier to use and understand, and make the concepts easier to grasp.

    --
    OS/2 - because choice is a terrible thing to waste.
  180. Not XML! by Anonymous Coward · · Score: 0
    Mail tends to have a lot of HTML tags. That would require a lot of &lt; character entities.

    Why use XML anyway? Why not a format that is optimized for mailboxes? Put another way, what's the advantage of XML over mbox format?
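
    For a sense of what that escaping looks like, here is a tiny Python sketch using the standard library. The <body> element name and the wrap_body function are invented for the example; no actual XML mailbox schema is implied.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def wrap_body(body: str) -> str:
    """Embed a mail body in an XML element, escaping &, < and >."""
    return f"<body>{escape(body)}</body>"


if __name__ == "__main__":
    html = "<p>Hi & bye</p>"
    stored = wrap_body(html)
    print(stored)
    # Parsing gives the original text back, at the cost of the larger form.
    print(ET.fromstring(stored).text == html)
```

    Every tag in an HTML mail pays this cost twice (once for "<", once for ">"), which is the bloat the parent is pointing at.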

  181. My Long Two Cents Worth by Anonymous Coward · · Score: 0

    Just thought I would throw in my two cents. I manage the mail server for a company that has 10,000 mailboxes and handles about 100,000 email messages a day (that's minus SPAM because our SPAM filter stops 20,000 emails a day). Before I was able to convince TPTB that Linux is the best solution all around for a server, we were using NTMail. NTMail uses a format similar to mbox but also has an idx file that contains an index. NTMail finally got where it couldn't handle the load so I moved us up to SendMail on a RedHat system.
    We were using EXT2 for the filesystem and IDE drives with a software RAID. Although we never had any corruption problem, some of the larger mailboxes did take a while to open (10 seconds max). The processor load average also went up and the whole machine slowed down (not too bad tho) when a large glob of emails came in at once.
    We have finally upgraded to a Dell PowerEdge 1650 (with one processor) and hardware RAID SCSI drives. For the filesystem, I used XFS because it is a journaling filesystem and has at least the performance of ReiserFS. We are also using the RAV antivirus milter (The Most Affordable Virus Scanner for Linux Mailservers, for anyone not using a virus scanner on their mailserver). Our new server is very fast, even under high load. We have not had ANY corrupted mailboxes (except one belonging to a user who accessed it through IMAP and POP3 at the same time). I personally don't believe it is the format that needs changing; what needs changing is the hardware and software choices, so that they scale to the growing amount of email. Email use is growing faster than any other internet service. Picking the right hardware and filesystem is a must, as is a properly configured mailserver. But just because you are having file corruption problems, or the server is taking a long time to access your mailbox, is no reason to go back and totally rewrite the standard. Why are the mailboxes being corrupted? Why does it take so long to open a mailbox? This is when you get to the root cause rather than trying to go around it.

  182. We need a standard to move mail En Masse by magicianeer · · Score: 1

    Unfortunately, the standard way to move MIME encoded mail from one system to another is to mail it. This is *not* a good idea.

    Suppose you are changing to a new computer (Linux to MacOS X for example), and you want to keep your email. Or suppose you are changing jobs.

    Imagine emailing thousands of messages to yourself just to move them from one machine to another... If your former employer followed the prevailing advice here of locking up mail on a server, then this is the ONLY way you can keep your email.

    If you kept copies of your mail locally, then you can burn the mail archive to CD, but since your mail is still in some client-specific format, you must install the same mail client on your new machine. Perl help you if the mail client software does not run on it.

    --
    You can have it good, fast, or cheap. Pick any two.
  183. Re: Am I missing something? by rplacd · · Score: 1

    courier-imap doesn't use a non-standard format.
    see the maildir spec.

  184. And before Exchange... by Anonymous Coward · · Score: 0

    we used to call this idea cc:Mail...

    Notes/Domino could (and can) also be configured to use single-store. No one uses single-store in Domino because there used to be a dearth of backup tools for this configuration, and old habits die hard.

  185. VMS Notes and how VMS Mail does it... by I91MM · · Score: 1
    I heard that Lotus Notes was developed from/inspired from (take your pick) a product called DEC Notes, which ran on VMS systems.

    Having used DEC Notes and Lotus Notes, however, I can't really see any obvious connection between the two. Except, maybe DEC sold the name to Lotus...

    DEC Notes was designed primarily as an online, threaded conferencing system. A bit like /. really, but on text terminals. Not very many sites still running Notes, but it is quite a neat little system. AIUI, it also supports message replication over a network using DECnet. Don't know how the messages are stored.

    On the mail front, VMS initially (V2.0?) stored mail in flat text files, with form feeds between messages. Later versions of MAIL stored the mail in an indexed file; messages under 2,048 bytes are stored directly in the MAIL.MAI file; messages over that size are stored in external files of the form MAIL$nnnnnnnnnnnnnnnn.MAI, where nnnn...nnnn is the time of receipt in hexadecimal with a resolution of one hundredth of a second. The file organisation is handled by the file system. [Well, sort of. It's handled by RMS, the Record Management System, which isn't part of the filesystem per se, but is an adjunct to it. Like the name says, it manages records in files, not files themselves. The two are independent; the file system merely has to provide space for RMS to use to store file metadata. In situations where this is not possible (such as an ISO 9660 CDROM), the ACP (Ancillary Control Program) for the file system usually just makes a guess and fakes the RMS information: this is one of the major problems with trying to support foreign file systems in VMS...]

    Mail stored in the old, flat-file format can still be read by MAIL in the current version of VMS (7.3). [Backwards compatibility has always been one of VMS's strong points...]. IMAP and POP3 servers are available (commercial and freeware) to allow your mail to be read from the platform of your choice.

    One of the nice advantages of the way VMS mail works is that it is easy to merge mailboxes from two systems together. Just copy all the files into the same directory (the second MAIL.MAI becomes MAIL.MAI;2 thanks to version numbering), RENAME MAIL.MAI;2 MAIL_TO_BE_MERGED.MAI, then MERGE MAIL.MAI MAIL_TO_BE_MERGED.MAI.

    Of course, in Unix you can just cat two mbox files together <g>

    No doubt this post will be vigorously flamed for even mentioning the "legacy" operating system VMS, never mind explaining some of its mysteries :-) But, hey, I like it, even if no one else does these days. It has some really neat features and is incredibly stable (Until I get my hands on it!).

    -Malcolm (a VMS sysadmin).

    --

    Sen vord is thrall and thocht is fre,
    Keip veill thy tonge I conseill the.

  186. Re:Don't speculate. Profile. by ahde · · Score: 2

    I wonder...

    would a tarred maildir decrease the number of disk reads (renames would be trickier, but possible) and inodes, or would the tar overhead be greater than that of the filesystem?
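
    One way to start answering that is to measure it. A rough Python sketch follows: pack some messages into an in-memory tar and compare sizes. The plain tar format stores a 512-byte header per member and pads each member's data to a multiple of 512 bytes, so for small messages the overhead is of the same order as filesystem block slack; what you save is inodes, not bytes. The function names here are made up for the example.

```python
import io
import tarfile


def tar_messages(messages: dict) -> bytes:
    """Pack {name: body} into an uncompressed tar archive held in memory."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w",
                      format=tarfile.USTAR_FORMAT) as tar:
        for name, body in messages.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(body)
            tar.addfile(info, io.BytesIO(body))
    return buf.getvalue()


def read_message(archive: bytes, name: str) -> bytes:
    """Fetch one message; tar has no index, so members are scanned in order."""
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r") as tar:
        return tar.extractfile(name).read()


if __name__ == "__main__":
    msgs = {f"cur/{i}": b"x" * 100 for i in range(10)}
    archive = tar_messages(msgs)
    payload = sum(len(b) for b in msgs.values())
    print(f"{payload} bytes of mail, {len(archive)} bytes of tar")
```

    Deleting or renaming a member means rewriting the archive (or keeping a side index), which is the "trickier" part.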

  187. confusion resolved by Doktor+Memory · · Score: 2

    I see the problem here. You are attempting to use Evolution when the mail client you were actually wanting to install is called "mutt".

    If you don't like GNOME and GTK+, for the love of pete don't use a mailer that says in big flaming letters "I am a GNOME program!".

    --

    News for Nerds. Stuff that Matters? Like hell.

  188. One folder to rule them all... by Anonymous Coward · · Score: 0

    Are you on crack? Calling Exchange's "groupware features" anything but an utter joke is absurd. They're still trying to catch up to what Lotus has been doing for years, and they aren't doing a very good job of it.

    If you just want to run email, Exchange/Outlook is fine. If you want a collaborative groupware solution with work flow built in, Domino/Notes is the only answer, currently.

    Plus, Domino runs on Linux, AIX, Solaris, NT, 2000, OS/2, AS/400... The list goes on and on. As far as a shared database, just set up shared mail.

    Not to mention, unlike Exchange, when one mail database gets hosed your whole server doesn't get scrapped. And you aren't supporting Microsoft.

  189. Re:and now for some politeness. by Inthewire · · Score: 1

    Mention that Linux sucks, as it designed for and by Communists, homosexuals, and other undesirable sub-groups. Explain that computers cost money, thus software should cost money. Give directions to barbers, stores that sell soap, and churches. Make it clear that you have no problem with Unix, nor do you hate freedom. Finally, explain the role of currency in society.

    --


    Writers imply. Readers infer.