There's a difference between not being able to pay off a debt in one fell swoop and not being able to make the minimum payment. The latter is much more profitable to the credit companies in the long run.
Memory chunks provide a space-efficient way to allocate equal-sized pieces of memory, called atoms. However, due to the administrative overhead (in particular for G_ALLOC_AND_FREE, and when used from multiple threads), they are in practice often slower than direct use of g_malloc(). Therefore, memory chunks have been deprecated in favor of the slice allocator, which was added in 2.10. All internal uses of memory chunks in GLib have been converted to the g_slice API.
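The idea behind memory chunks is a pool of equal-sized atoms recycled through a free list, instead of going back to the general-purpose allocator for every allocation. The following is a toy, single-threaded Python model of that idea only; `AtomPool` and its methods are hypothetical names, and this is not GLib's actual implementation, which is considerably more sophisticated.

```python
class AtomPool:
    """Toy fixed-size allocator: one preallocated buffer split into equal atoms."""

    def __init__(self, atom_size: int, count: int) -> None:
        self.atom_size = atom_size
        self.buffer = bytearray(atom_size * count)
        # Stack of free slot indices; reversed so slot 0 is handed out first.
        self.free_slots = list(range(count - 1, -1, -1))
        self.in_use = set()

    def alloc(self) -> int:
        """Return the index of a free atom in O(1), without touching the system allocator."""
        if not self.free_slots:
            raise MemoryError("pool exhausted")
        slot = self.free_slots.pop()
        self.in_use.add(slot)
        return slot

    def view(self, slot: int) -> memoryview:
        """Writable window onto one atom's bytes."""
        start = slot * self.atom_size
        return memoryview(self.buffer)[start:start + self.atom_size]

    def release(self, slot: int) -> None:
        """Put an atom back on the free list so the next alloc() can reuse it."""
        if slot not in self.in_use:
            raise ValueError("double free or foreign slot")
        self.in_use.remove(slot)
        self.free_slots.append(slot)


pool = AtomPool(atom_size=16, count=4)
a = pool.alloc()
pool.view(a)[:5] = b"hello"
pool.release(a)
```

The per-atom bookkeeping here (a free list plus an in-use set) is exactly the kind of administrative overhead the paragraph above refers to; a real allocator has to pay for it, plus locking when shared between threads.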
I'm lucky enough not to suffer from that sort of thing, but I've known all too many people who have. It can definitely be symptomatic of poor communication and coordination, but discrete communication between individual developers (or groups of developers) really isn't necessary in a lot of environments. Adopting a base set of tools and code standards can eliminate a lot of ultimately trivial quibbling.
Wow, good luck debugging and maintaining a hodgepodge like that. I can just imagine the want ad: "Needed: Developer competent in C, Pascal, Perl, Ruby, Bash, and Business Basic. Must also have extensive experience using named pipes, CORBA, SOAP and OK/RPC! to glue disparate systems together." Seriously, there are sometimes good reasons to reimplement or to restrict choices in the first place.
If the environment is disparate enough to have multiple needs, fine, but choose one appropriate language. If you're doing web services, do you want one person writing in PHP, another in Java, another using Ruby on Rails, and yet another doing everything in Zope?
What I'm looking for is a WORM-optimized FS that also has good journaling methods to prevent data loss due to some natural disaster while data is being shifted around.
Erm, Write Once Read Many would imply data isn't shifted, period.
I feel that Ext3 is not optimal for this
You don't provide a reason for dismissing it out of hand. It's a nice solid mature filesystem (thanks to being ext2 + additional features) that's widely supported. And if you want to use it for large files, you just need to tune it appropriately, either by manually increasing block size and blocks per inode, or using the -T flag to use a preset like one of the other posters suggested.
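To see why bytes-per-inode matters for a large-file workload, the sketch below works out roughly how many inodes mke2fs would create and how much space their inode tables take. It assumes a 128-byte on-disk inode (the traditional ext3 size; newer mke2fs releases default to 256) and treats the inode count as simply filesystem size divided by the ratio, ignoring per-group rounding. The ratio is what you would pass via `mke2fs -i`; a `-T` preset such as largefile just selects a larger ratio from mke2fs.conf.

```python
def inode_count(fs_bytes: int, bytes_per_inode: int) -> int:
    """Approximate inode count: one inode per `bytes_per_inode` bytes of space."""
    return fs_bytes // bytes_per_inode


def inode_table_bytes(fs_bytes: int, bytes_per_inode: int, inode_size: int = 128) -> int:
    """Space consumed by inode tables (assumes 128-byte inodes by default)."""
    return inode_count(fs_bytes, bytes_per_inode) * inode_size


TIB = 2 ** 40
GIB = 2 ** 30

# A 16 KiB ratio (a common general-purpose setting) vs a 1 MiB ratio
# (the kind of value a large-file preset selects) on a 1 TiB volume.
for ratio in (16 * 1024, 1024 * 1024):
    print(f"ratio {ratio:>8}: {inode_count(TIB, ratio):>10} inodes, "
          f"{inode_table_bytes(TIB, ratio) / GIB:.3f} GiB of inode tables")
```

Under these assumptions the 1 MiB ratio cuts inode-table overhead on a 1 TiB volume from 8 GiB to 0.125 GiB, which is the kind of tuning meant by "increasing block size and blocks per inode".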
I'm curious about the issues reported by some of the other posters. I've been dealing with terabytes of data across hundreds of filesystems, including data with high turnover (e.g. mail spools, log servers, etc.), with no data loss that wasn't attributable to hardware, like RAID controllers without battery backup that were left in write-back mode. I don't care what filesystem you're running; it won't be able to recover data that was in volatile cache during a power event.
And then someone might be tempted to work hard at bending your standard to fit the problem. They might spend hours re-inventing the wheel. And what will that get them?
Why, the ability to say, "Yep, and we did it all with one language."
At the other extreme you've got people writing in whatever they want whenever they come across a problem and end up re-inventing the wheel because either "I don't like Perl!" or "Numbnuts wrote this code in Object Intercal 95, which doesn't have a compiler/interpreter on the platform I need."
And what does that get you?
Why, the ability to say, "Nope, we don't confine our employees' choice of languages." Well, that and a morass of code based as much on individual whim as on any logical need.
As always, there is a middle ground - having a standard (or standards) with an allowance for justified exceptions.
Single precision is often sufficient, but their single-precision mode isn't IEEE compliant, and doesn't support the full range of floating-point ops. This implementation was very much driven by the requirements of media and graphics.
It'll be interesting to see if IBM applies the same basic design philosophy to a more general purpose implementation though.
Actually, the current Cell is not particularly useful for scientific applications. They only achieved the speeds they're throwing around by using sloppy single-precision floating point. Put the processor into IEEE-compliant mode and it's a full order of magnitude slower.
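To illustrate one way a non-IEEE single-precision mode differs, the sketch below rounds Python floats to IEEE single precision with `struct`, then models a flush-to-zero mode that discards subnormal results. As I understand it, the Cell SPU's single-precision unit also rounds toward zero and lacks infinities and NaNs; this sketch models only the flush-to-zero part and is not an emulation of the SPU.

```python
import math
import struct

FLT_MIN_NORMAL = 2.0 ** -126  # smallest positive normal IEEE single-precision value


def to_float32(x: float) -> float:
    """Round to the nearest IEEE 754 single-precision value (subnormals kept).
    Raises OverflowError for values beyond single-precision range."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_float32_ftz(x: float) -> float:
    """Same, but flush subnormal results to a signed zero, as 'fast' modes do."""
    y = to_float32(x)
    if y != 0.0 and abs(y) < FLT_MIN_NORMAL:
        return math.copysign(0.0, y)
    return y


tiny = 1e-40  # below FLT_MIN_NORMAL (~1.18e-38) but representable as a subnormal
print(to_float32(tiny))      # non-zero: IEEE gradual underflow
print(to_float32_ftz(tiny))  # 0.0: the value is silently lost
```

For media and graphics this loss is usually invisible; for iterative scientific code, silently losing small values is one of the ways results drift.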
That's not a safe assumption. For one, most states honour the "employment at will" doctrine, which means no requirement of notice for termination (or for resignation).
Even if you're in a state that mandates notice, it doesn't require an employer to honour your requested notice; it's the state requirement, usually two weeks, that they'd be required to honour.
If the files are not in the buffer cache using fs storage, then they would also not be in the DB's cache using a DB for storage. You will have LESS RAM available for caching data if you use a database, since all the other stuff you don't need from a database is now using up RAM too.
The difference here is that in a database you're caching the index, i.e. the information you need for message operations, and not the message itself. Filesystem caching typically works on a block level only. With ext3, for example, the minimum block size is 1k (and the default is 4k). The relevant metadata is much, much smaller.
On top of that, most operating systems will read ahead a few blocks on top of the requested information.
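A back-of-the-envelope version of that argument: if a block-level cache has to hold at least one whole block per message to answer a header or flag query, while an index only needs a compact record, the same RAM covers many more messages. The 256-byte record size below is a hypothetical figure for illustration, and the 512 MiB cache echoes the per-100GB figure mentioned elsewhere in the thread; read-ahead would only make the block side worse.

```python
MIB = 2 ** 20


def messages_covered(cache_bytes: int, bytes_per_message: int) -> int:
    """How many messages' metadata fit in a cache of the given size."""
    return cache_bytes // bytes_per_message


CACHE = 512 * MIB
BLOCK = 4096   # ext3 default block size; caching one small file costs at least this much
RECORD = 256   # hypothetical size of one message's index entry

by_block = messages_covered(CACHE, BLOCK)
by_record = messages_covered(CACHE, RECORD)
print(by_block, by_record, by_record // by_block)
```

Under these assumptions that is 131,072 messages versus 2,097,152, a factor of 16, before counting any read-ahead.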
Having an index implemented by the IMAP daemon certainly solves some problems, particularly if the underlying message format encodes metadata into the filename. That way you're only stat()ing the directory and inspecting new or changed messages. The big problem with that approach, though, is that you're now indexing on access, not on delivery, which means you're pushing the machines harder during peak use periods, rather than taking advantage of spare cycles throughout the day.
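Maildir is the usual example of metadata in the filename: a message in `cur/` carries its flags after a `:2,` suffix (for example `...:2,RS` for replied and seen), so a flag summary can be built from a directory listing alone. A minimal sketch assuming standard Maildir naming; `parse_flags` and `scan_cur` are hypothetical helpers, not part of any particular IMAP server.

```python
import os

# Flag letters defined by the Maildir "info" convention (":2," followed by flags).
FLAG_NAMES = {"D": "draft", "F": "flagged", "P": "passed",
              "R": "replied", "S": "seen", "T": "trashed"}


def parse_flags(filename: str) -> set:
    """Return the flag names encoded in a Maildir filename, if any."""
    _, sep, info = filename.rpartition(":2,")
    if not sep:
        return set()  # no info suffix, e.g. a message still in new/
    return {FLAG_NAMES[c] for c in info if c in FLAG_NAMES}


def scan_cur(maildir: str) -> dict:
    """Map each message in cur/ to its flags without opening any message file."""
    cur = os.path.join(maildir, "cur")
    return {name: parse_flags(name) for name in os.listdir(cur)}
```

Note that this still runs when a client asks, which is the "indexing on access" cost described above; nothing here is precomputed at delivery time.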
Whether the idea is implemented is irrelevant to whether it's good. (And for that matter, there are database driven MTAs already.)
And whether something "complicates backups" is not always the major concern for a particular implementation. If that were true, no one would be using databases at all.
In the case of mailbox indexing, there's a simple solution to that concern anyway: don't put mail in the database; just shove metadata in there. Voila. Problem solved. You can even leave the actual mailbox in any format you'd like, whether your preference is mbox, mbx, maildir, or whatever. And then your index can be regenerated if it gets corrupted, or if the on-disk data has changed since last access.
Have a problem with using an external process for indexing? Fine, use an in-process database - sqlite, or maybe bdb if you don't insist on SQL semantics.
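A minimal sketch of the metadata-only approach with an in-process database, using Python's built-in `sqlite3`. It assumes a flat directory holding one RFC 822 message per file (Maildir-like, without the tmp/new/cur split); the schema, the helper names, and the mtime-plus-count staleness check are all hypothetical choices for illustration. Because the index is derived data, the recovery strategy for both corruption and staleness is simply to rebuild it from the files.

```python
import os
import sqlite3
from email.parser import BytesHeaderParser

SCHEMA = """
CREATE TABLE IF NOT EXISTS msg (path TEXT PRIMARY KEY, sender TEXT, subject TEXT);
CREATE TABLE IF NOT EXISTS meta (key TEXT PRIMARY KEY, value TEXT);
"""


def dir_signature(mail_dir: str) -> str:
    """Cheap staleness check: directory mtime plus entry count (not bulletproof)."""
    st = os.stat(mail_dir)
    return f"{st.st_mtime_ns}:{len(os.listdir(mail_dir))}"


def rebuild(conn: sqlite3.Connection, mail_dir: str) -> None:
    """Discard the index and regenerate it from the message files."""
    parser = BytesHeaderParser()
    with conn:  # one transaction: commit on success, roll back on error
        conn.execute("DELETE FROM msg")
        for name in sorted(os.listdir(mail_dir)):
            with open(os.path.join(mail_dir, name), "rb") as fh:
                headers = parser.parse(fh)
            conn.execute("INSERT INTO msg VALUES (?, ?, ?)",
                         (name, str(headers.get("From", "")),
                          str(headers.get("Subject", ""))))
        conn.execute("INSERT OR REPLACE INTO meta VALUES ('sig', ?)",
                     (dir_signature(mail_dir),))


def open_index(db_path: str, mail_dir: str) -> sqlite3.Connection:
    """Open the metadata index, regenerating it if it is corrupt or stale."""
    conn = sqlite3.connect(db_path)
    try:
        conn.executescript(SCHEMA)
        row = conn.execute("SELECT value FROM meta WHERE key = 'sig'").fetchone()
    except sqlite3.DatabaseError:
        # Corrupt index: it is only derived data, so throw it away.
        conn.close()
        os.remove(db_path)
        conn = sqlite3.connect(db_path)
        conn.executescript(SCHEMA)
        row = None
    if row is None or row[0] != dir_signature(mail_dir):
        rebuild(conn, mail_dir)
    return conn
```

Once built, a message list is a single query such as `SELECT path, sender, subject FROM msg`, and the mail itself never leaves the filesystem, so backups of the mail store are unaffected.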
The approach certainly doesn't make sense in every case, but as usual, simplistic approaches only make sense up to a point. Hell, even "plain filesystem" approaches have become more complex over the years, because the original ones were simple only because they were unoptimized (try putting a gig of mail in mbox format, and then deleting an old message, for a good example).
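The mbox example is easy to quantify: in a single-file mailbox, removing a message means at least every byte after it has to be shifted down and rewritten, whereas in a one-file-per-message layout it is a single unlink. A back-of-the-envelope sketch, assuming a hypothetical 10,000-message mailbox averaging 100 KiB per message (just under a gigabyte):

```python
def bytes_rewritten_mbox(sizes: list, index: int) -> int:
    """Bytes that must be moved when deleting message `index` from an mbox file:
    everything stored after it shifts down to close the gap."""
    return sum(sizes[index + 1:])


sizes = [100 * 1024] * 10_000          # hypothetical mailbox, oldest message first
print(bytes_rewritten_mbox(sizes, 0))  # deleting the oldest message
```

Deleting the oldest message here rewrites 1,023,897,600 bytes, and the oldest messages are exactly the ones people tend to delete.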
For my little mail server, even if it did catch fire, I could build a new one in under an hour. This is no different than a cluster of machines... if one dies, you simply replace it and move on. (With a properly oversized cluster, nobody will notice one machine's failure.)
An hour of downtime? That's incredibly unacceptable.
I don't know what "industry" you live in, but I've not seen anyone build a mail cluster[*]. NCSU is the closest, but that was simply departmentalized servers; the only reason any department had more than one server was storage, not scalability or reliability.
Among other things, large scale e-mail hosting.
NCSU is the closest, but that was simply departmentalized servers; the only reason any department had more than one server was storage, not scalability or reliability. (And that was for ~25k people.)
That's great. Some of us have SLAs.
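To put an hour of downtime in SLA terms, the allowed outage is just the period multiplied by the unavailability. The availability targets below are illustrative, not figures from this thread.

```python
HOURS_PER_YEAR = 365 * 24   # ignoring leap years
HOURS_PER_MONTH = HOURS_PER_YEAR / 12


def downtime_budget_minutes(availability_pct: float, period_hours: float) -> float:
    """Minutes of outage allowed in `period_hours` at the given availability."""
    return period_hours * 60 * (1 - availability_pct / 100)


for target in (99.9, 99.99):
    print(f"{target}%: {downtime_budget_minutes(target, HOURS_PER_MONTH):.1f} min/month, "
          f"{downtime_budget_minutes(target, HOURS_PER_YEAR):.1f} min/year")
```

At 99.9% measured monthly, a single one-hour rebuild already exceeds the budget of about 43.8 minutes.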
[The largest I've worked with had just shy of 70k mailboxes across about 5k domains... on ONE server. It could've handled much more than that.]
We've had mail domains as large as a million users. I know how much you can shove on a single box. That doesn't change the fact that if your single point of failure goes, you lose everything.
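The single-point-of-failure argument in rough numbers: if each box is independently down a fraction p of the time, a service that needs only one of n interchangeable boxes is down p**n of the time. This assumes the cluster is oversized enough that the survivors can carry the load, and independence is optimistic (shared power, shared storage, or a bad config push can take every node down at once); the 99.9% per-node figure is purely illustrative.

```python
def cluster_availability(node_availability: float, nodes: int) -> float:
    """Availability when any one of `nodes` independent, interchangeable nodes suffices."""
    return 1 - (1 - node_availability) ** nodes


for n in (1, 2, 3):
    print(n, cluster_availability(0.999, n))
```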
Reading files from a directory is faster than an indexed database query? Interesting theory, and perhaps true with a small number of files that happen to be hot in the buffer-cache.
If we're talking about a reasonably loaded mail server, with a lot less memory than disk space (we average about 512MB per 100GB of storage), no. The machine will choke on a flood of itsy bitsy teeny weeny disk ops. This is, however, where a database shines, whether we're talking about something fully relational, or something embedded (like a dbm or bdb file).
Databases are not only appropriate for when data is being queried in "arbitrary, complex ways". The majority of databases out there are used to provide simple, well-defined access to indexed persistent data. This use pattern suits mail just dandy.
It's less taxing in terms of memory, but having the metadata for each message in individual files means you have to open each file to extract it. For some common operations, like bringing up a message index, this means a ton of disk I/O.
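For contrast with an index lookup, here is roughly what building a message list looks like when the metadata lives only inside each message file: one open and header parse per message, each of which can turn into at least one random disk read when the file isn't already cached. A minimal sketch assuming one message per file in a flat directory; `summary_by_scanning` is a hypothetical helper.

```python
import os
from email.parser import BytesHeaderParser


def summary_by_scanning(mail_dir: str):
    """Build a message list the plain-files way: one open() and header parse
    per message. Returns (rows, number_of_files_opened)."""
    parser = BytesHeaderParser()
    rows, opened = [], 0
    for name in sorted(os.listdir(mail_dir)):
        with open(os.path.join(mail_dir, name), "rb") as fh:
            opened += 1
            headers = parser.parse(fh)
        rows.append((name, str(headers.get("Subject", ""))))
    return rows, opened
```

The number of opens grows linearly with mailbox size for every listing, which is where the disk contention comes from once many users do this concurrently.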
For a small scale server this doesn't really show up, but as you add users the disk contention will kill your performance. You can offset this by adding more memory to the machine, so frequently accessed mailboxes are likely to be pulled into memory, but then "doesn't use any userspace memory" seems kind of an empty benefit.
Pure file-based message formats are great for smaller installations, but the larger one scales, the more benefit one can reap from some form of intelligent indexing. Relational databases are overkill for this sort of thing, but they have the benefit of being well understood and widely deployed, they allow easy manipulation with existing tools, and they have established, intelligent backup methods.
Both.
Erm, I've run mail for 13 million users on about 60 machines. The average configuration was dual Pentium II 450s and 256MB of RAM.
That's something we in the industry like to call luck. Not the best reliability plan.
Erm, the page you linked to offers it in beige, gray, or black. No spray paint required. :)
There are at least two free software implementations of Flash, one LGPL (http://www.schleef.org/swfdec/) and one GPL (http://swift-tools.net/Flash/).
What about:
Me: "Well, boss, we're having problems with Linux at our datacenter, but don't
worry, I already found the answer on one of the newsgroups."
or
Me: "Well, boss, we're having problems with Linux at our datacenter, but don't
worry, I dug into the source code and found the issue."
or
Me: "Well, boss, we're having problems with Linux at our datacenter, but don't
worry, I messaged one of the original developers on IRC and worked out what the
problem was."
Not every shop has the in-house expertise to do without support, but there are plenty of us out here who do. Frankly, most vendor support is shit anyway. We have support contracts for some of the software we run, and I usually don't bother; it's quicker to figure it out myself.
There are only three entries on the list that aren't with OpenSSL directly: CAN-2004-0607, CAN-2004-0975, and the one you cited. So it's fifty rather than fifty-three.
Honest? Like de Raadt initially refusing to divulge details of an exploit in a bid to blackmail distributors into upgrading to 3.3 and enabling privilege separation, despite it being broken in numerous ways (e.g. privsep+compression broken on 2.2 kernels, broken or incomplete integration with PAM on numerous platforms).
The fact of the matter is that the two packages, individually and combined, have had many more vulnerability days than anything else we run in-house, including packages that are at least as widespread and that have a reputation (not always deserved with current versions) for insecurity (e.g. sendmail, BIND, wu-ftpd).
It's also worth noting that the window of vulnerability is not how long it takes for a patch to come out from the upstream vendor; you need to also account for packaging, testing and roll-out.
And regardless of how promptly one closes the hole, each upgrade still has a cost. People who've never done large-scale administration tend not to understand this, since as often as not they just compile a new version and slap it on, but when you deal with a large number of machines and have service level agreements to meet, you need to be a lot more rigorous in the QA phase and often need to schedule a downtime window with your customers.
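The point about the window of vulnerability can be written down as a simple sum: exposure runs from disclosure until the fix is actually deployed, not until upstream ships it. A minimal sketch with hypothetical stage names and numbers; summing exposure across advisories gives a rough "vulnerability days" figure of the kind used above.

```python
from dataclasses import dataclass


@dataclass
class PatchTimeline:
    """Days spent in each stage between disclosure and fleet-wide deployment."""
    upstream_days: float   # disclosure -> upstream fix released
    packaging_days: float  # building our own packages
    testing_days: float    # QA against our configurations
    rollout_days: float    # waiting for and using scheduled maintenance windows

    def exposure_days(self) -> float:
        return (self.upstream_days + self.packaging_days
                + self.testing_days + self.rollout_days)


def vulnerability_days(timelines: list) -> float:
    """Total exposure across several advisories for the same package."""
    return sum(t.exposure_days() for t in timelines)


# Hypothetical advisory fixed upstream on the day of disclosure:
print(PatchTimeline(0, 2, 5, 7).exposure_days())  # still 14 days of exposure
```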
It's hard to be impressed with "paranoid-level security" when one has dealt with OpenSSL and OpenSSH, which have had fifty-three vulnerabilities between them over the past five years: CVE-2000-0525, CVE-2000-0887, CVE-2000-1169, CVE-2001-0361, CVE-2001-0529, CVE-2001-0816, CVE-2001-0872, CVE-2001-1029, CVE-2001-1380, CVE-2001-1382, CVE-2002-0059, CVE-2002-0083, CVE-2002-0575, CVE-2002-0639, CVE-2002-0640, CVE-2002-0765, CAN-1999-0661, CAN-2000-0535, CAN-2001-0572, CAN-2001-1459, CAN-2001-1483, CAN-2001-1507, CAN-2003-0190, CAN-2003-0386, CAN-2003-0682, CAN-2003-0693, CAN-2003-0695, CAN-2003-0786, CAN-2003-0787, CAN-2004-0175, CAN-2004-1653, CAN-2004-2069, CVE-1999-0428, CVE-2001-1141, CVE-2003-0078, CAN-2000-0535, CAN-2002-0655, CAN-2002-0656, CAN-2002-0657, CAN-2002-0659, CAN-2002-1568, CAN-2003-0131, CAN-2003-0147, CAN-2003-0543, CAN-2003-0544,
Amazing revelations to start my morning off with.
IIRC, the iPod has at least one ARM chip, but I'm not sure who makes it.
Texas Instruments. It's mentioned in the article.