This does not solve the design problem
on
The Zen of SOA
·
· Score: 1
That's what SOA aims at: interchangeable components in systems. You're not crafting one big program, or complex of programs, from end-to-end, making it up as you go. You're building uniformly-structured and interchangeable components, and assembling them.
Yeah, in the programming world we call these things "libraries". People have been preaching componentized programming for decades. Reuse code. Use libraries. Don't reinvent the wheel. Easy, right? Nope. The problem is that most people don't have the skill to design a usable component interface, be it a library or a COM component (which is what I assume this "new" SOA thing is). Then, everyone who uses it has to write a wrapper on top of it so it would fit into their design. Then you have whole components wrapping components wrapping components, each layer written to its author's perverted taste and sucking in some particular way that the next wrapper writer will attempt to fix. Yes, I'm looking at you, GNOME... Delegating component design is always a difficult choice and when you buy from the minimum bidder, you end up with crap. Crap that fouls up your entire application by association unless you wrap it tightly into a leak-proof envelope, which kinda defeats the purpose of reusing code in the first place. Be it libraries, COM, SOA, or whatever new shiny name people come up with, the original problem remains: only a good programmer produces good code. No amount of delegation will make an army of bad programmers produce good code.
> Debian users are far more eclectic in their software choice, less likely to use any default options.
When most of your experienced users think your default options are crap and refuse to use any of them, perhaps it is a good time to change those defaults, eh?
> Actually, this is something the average person can see at home.
If anyone here had been a kid back in the last century, we could all have shared the memory of kite riders (no, I don't really know what they are called in English). While flying a kite, put a piece of paper on the string, and in a strong wind it will ratchet itself up. You could improve them, of course, one of the favorite mods being a release rod which would release the payload upon reaching the kite. You couldn't lift a lot of weight this way, but strategic stinkbomb bombardment was possible. Yes... Kids these days are sure missing out on a lot of things...
> It's really nice to be able to SSH directly to your boxes behind your router.
Really nice for hackers too. I don't know about you, but even if the whole internet switches to IPv6, I'm still keeping my NAT firewall. My computers have no business being poked from the internet.
Because when the computer just dies like that, there usually are no logs to explain it. If only because the computer died before they could be written.
> Then point out that 40% of the potential customers are being turned away....
That depends on who your customers are. If you are selling some technical product, or something that runs only Linux, you can ignore IE users completely, since IE does not run on Linux and no self-respecting geek would use it anyway. Likewise, if you would like to only sell to intelligent people, perhaps to save on tech support costs, then making a site that doesn't work in IE is an effective way to do it.
Go east, to a place called Klamath. K-l-a-m-a-t-h. Find Vic. V-i-c. Ask for beer. B-e-e-r. *sigh* You are the chosen one. Find the beer. Be our salvation.
Yeah, so I have a different password for every account I have. There is no friggin' way I'm going to remember them, so I keep them in a gpg encrypted file, which I consult when I need to. But the point of the password manager is not that you don't have to remember the password; it's that you don't have to type it. I do not want to type any passwords. All the sites these days are so paranoid about security, they make you type passwords all the time. Without a password manager I'd have to type a dozen passwords every day as I check all the sites on my morning list. You are welcome to be paranoid and type (don't forget the keyloggers!), but I'd rather not even have to click through. The computer knows who I am; it should be able to keep track of all that authentication info and negotiate connections automatically.
> Most people are not in a position where they can be sure that they will never run out of physical memory.
If you can fill 4GB of physical memory, you can fill your swap file just as quickly, and then your apps will crash anyway. So really, this is not a good argument. You can run out no matter how much you allocate. The right question to ask is what happens when you do, and that is that the process that is filling all that memory will be killed. Unless you are using OpenOffice, I highly doubt that that process will be your word processor.
As a person with sensitive eyes, I am constantly annoyed by applications setting backgrounds to white. White backgrounds hurt, people! And I mean actual physical pain here. So if you are writing some application, please use system colors, or at the very least let the user change the color scheme. In ten more years your eyes will get tired too, and trust me, you'll thank me.
Re:Turn in your nerd card...
on
American Nerd
·
· Score: 1
> Only with certain accents. In most American accents (and a great many British and Indian ones) the words "their", "they're" and "there" sound exactly the same.
There is no such thing as an "accent". There is speaking proper English, and there is not speaking proper English, and while I do not claim to always do the former (English is my second language, after all), I at least recognize that there is one proper thing to strive for and am able to recognize it when I hear it. Likewise, I may occasionally pronounce "their" and "there" similarly, but I always know which one I meant to say, and would certainly be able to say it correctly if asked to repeat myself.
Turn in your nerd card...
on
American Nerd
·
· Score: 1
> When words like "there/their/they're" come along, my brain just says "there".
There goes someone who learned to read at an advanced age, like, say, in school. If you had learned these words from reading and looking them up in the dictionary, you would have acquired the proper pronunciation for each. "There" is pronounced "the-ere", "their" rhymes with "Jane Eyre", and "they're" is sounded out as "they are", but with the middle vowels shortened. Learn to speak, and you'll learn to spell.
> In quick mode, my brain is much more inclined to type "loose" for the sound "lewz", because most every other word that has a double-o makes the "ew" sounds.
Here the pronunciation is also different. "Loose" has a softer s sound, like in "lusse", while the s in "lose" sounds more like a z. If we fixed all the spelling in the language into a proper English (as opposed to French or German perversions) form, where letters are used consistently to represent specific sounds, everyone would instantly gain the ability to spell perfectly. The way things are now, we're stuck with illogical letter assignments, many of which are the work of the evil French...
I'd buy the Xonar D2X. It compares reasonably well with the high-end HiFi cards, but also has EAX, and full Linux support. Naturally, you'll also need decent speakers. I have M-Audio AV40s, and that's about as much desk space as I can spare :) I'd add a 10" subwoofer if I could figure out where to put it.
> Shouldn't there be some law to protect the American people from legislators who commit felonies relating to their position?
Did you know that desertion from the army is a felony? Do you also know that the Pope was a deserter in WWII? While desertion in peacetime is just a felony (2-3 years max imprisonment in the US), during war it is punishable by death. If you insist on lumping all "felons" together as automatically evil, I would suggest that if you are good enough for the Holy See, you are probably good enough (too good?) to run for the US Senate.
> I've been using standby/sleep extensively on my desktops and laptops for the last 10 years, and I still can't understand why people with a modern machine don't use standby.
Because it still doesn't work for everyone. I tried it a month ago, followed the instructions in the suspend HOWTO, made that suspend script and ran it. The machine suspended, but didn't resume - everything spun up, but looked dead. Sure, it might be easy to fix, but doing so would entail poring through hundreds of forum posts written by clueless idiots for that one little bit of information, followed by dozens of reboots for test-fail-retry cycles. I might get around to it, when I have a month to spare. Many things on Linux [don't] work like that...
Since 9/11, of course, and the day we implemented "level 4 security", whatever the hell that means, and mandated that nobody dare show up without his ID badge.
> the most used way to determine if movie is successfull seems to be how well it sells during the first weekend: Before anyone has had the chance to see it and tell others if it is good or not.
Ah, but that's the point. They already know the movie is bad, so their only chance to sell it is before people start talking. Remember that you don't have to be good to be successful ;)
> "The hungry and cold unemployed masses aren't going to continue giving away their intellectual labor on the Internet in the speculative hope that they might get some 'back end' revenue,"
Yeah, right! As the author of a dozen OSS projects, I can tell you right away that I did not "give them away in the speculative hope that they might get some back end revenue". I haven't made a cent on them, didn't plan to, and am not going to in the future. I'm not selling them because nobody would pay anything for them. I wrote them for my personal benefit, and open-sourced them because I had no reason not to. Perhaps someday someone will make something useful out of them, and make money from them, but I'm not holding my breath. In the meantime, it costs me nothing to publish them, so I do. The depression will not change this. It may give people less free time to work on OSS, but that doesn't mean it's going away.
> Secondly, extra writes don't necessarily cause a performance hit. It all depends on the usage patterns. If the writes can be performed when the disk would otherwise be idle, no performance is lost.
That's what the disk cache already does for you. You are still making two writes instead of one, so no matter how long you decide to postpone it, it will catch up with you eventually. Not only that, but you are doubling the interval between the time you finish writing and the time when the data gets to the disk, thus increasing the chance that a hardware failure lands inside that window. I bet it also makes meaningful benchmarking incredibly hard. Oh well. I'll just drop this subject now due to lack of information.
> So you meant "space-efficient". I thought we were talking about time-efficiency.
On the disk, space efficiency is time efficiency. The bottleneck of the io system is on the physical disk surface, and the less of it you use, the faster you can get the data to you. Any form of compression should give significant benefits, especially with today's fast CPUs.
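To put rough numbers on that, here is a back-of-the-envelope sketch. The throughput figures and the 2:1 compression ratio are made up for illustration, and the model ignores seek time and assumes the worst case where decompression does not overlap with disk reads.

```python
def read_time_seconds(size_mb, disk_mb_per_s, ratio=1.0, decompress_mb_per_s=None):
    """Estimate the time to get size_mb of logical data off the disk.

    ratio is compressed_size / original_size (1.0 means uncompressed).
    decompress_mb_per_s is the CPU decompression rate measured on the
    output side; None means there is no decompression step.
    """
    disk_time = (size_mb * ratio) / disk_mb_per_s
    cpu_time = 0.0 if decompress_mb_per_s is None else size_mb / decompress_mb_per_s
    # Worst case: disk reads and decompression do not overlap at all.
    return disk_time + cpu_time


# Hypothetical numbers: 100 MB of data, a 50 MB/s disk, 2:1 compression,
# and a CPU that can decompress 400 MB/s.
plain = read_time_seconds(100, disk_mb_per_s=50)
packed = read_time_seconds(100, disk_mb_per_s=50, ratio=0.5, decompress_mb_per_s=400)
print(f"uncompressed: {plain:.2f}s, compressed: {packed:.2f}s")
```

With these assumed figures the compressed read wins as long as the CPU decompresses faster than the disk delivers; with a slow CPU or incompressible data the balance flips.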
> True, but the same goes for archive files, of course. Any metric that the archiver uses to determine which files should be compressed, the filesystem can use, too.
I was not talking about the archiver; I was talking about you. The difference is that with an archive, the programmer or the user decides which files go together, which files should be compressed, and which files are not very important. This is not a decision that can be made automatically because it depends on the contents of the files and the meaning of those contents to whoever needs them. I don't see the computer being able to make this decision in the general case, barring significant advances in AI.
> For example, the application might know which files should be compressed, and which ones shouldn't, but the VFS does not typically allow this information to be communicated to the filesystem. A purpose-built storage system specific to the particular application could, of course, incorporate anything and everything that might be useful.
Well, yes, you could go this far, and build a specialized backend for every application. Heck, that's what everyone has been doing for decades, with proprietary file formats. What I'm suggesting is a standardized general compound file API, much like the OLE compound documents on Windows, only not quite as ugly and bloated. Applications already write their own data formats, and that will continue, but we are missing an easy way to package multiple related files together.
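For a rough idea of what packaging related files together could look like from the application side, here is a minimal sketch that uses Python's standard zipfile module as a stand-in container. It is not the standardized compound file API I am asking for; the helper names (pack, list_members, read_member) and the member names are hypothetical.

```python
import io
import zipfile


def pack(members):
    """Pack {name: bytes} into one in-memory compound file and return its bytes."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, data in members.items():
            zf.writestr(name, data)
    return buf.getvalue()


def list_members(blob):
    """List member names, much like listing a directory."""
    with zipfile.ZipFile(io.BytesIO(blob)) as zf:
        return sorted(zf.namelist())


def read_member(blob, name):
    """Read a single member by name without unpacking the rest."""
    with zipfile.ZipFile(io.BytesIO(blob)) as zf:
        return zf.read(name)


# Hypothetical application data: a few tiny related files in one container.
blob = pack({
    "icons/ok.txt": b"ok icon",
    "icons/err.txt": b"error icon",
    "app.conf": b"color=system\n",
})
print(list_members(blob))
print(read_member(blob, "app.conf"))
```

The application sees named members it can list and fetch individually, while the filesystem sees a single file.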
> You are saying that adding at least one additional API and updating all the standard utilities is not a significant complexity increase. I beg to differ. This would be changing the foundations on which Unix is built.
"The foundations of Unix" are not sacred. They are just ossified into the OS. That does not mean they are perfect, only that we are used to them. A VFS extension, IMO, is not all that hard in principle, and it will not in any way change your "foundations". All it will do is permit you to treat compound files like directories. The basic capabilities otherwise stay the same.
> But that wasn't the case I was talking about. I was talking about the case where you have pieces of data that are meaningful to the user, like individual emails. I thought that was what you were talking about, as well.
No, what I had in mind specifically was the application internal data files. Look in the various subdirectories under /usr/share and you'll find plenty of tiny little files that ought to be aggregated. Another example is config files. I've seen proposals to create a registry-like storage where each value is a file! Sure, throw it in and let the filesystem figure it out. Right...
Saving a document into multiple files is another bad idea. Database packages, like, say, Ingres, do that. A documen
> Just because journalled ext3 is slower than unjournalled ext2 doesn't mean that journalling in general imposes an intolerable performance hit.
Journaling, by its very nature, implies turning every write into two writes. Furthermore, you have to add various hacks to avoid problems from out-of-order writes, to determine when the cache actually gets to the physical disk, and I am sure many other little things. I am not an fs designer, having learned everything I know about journaling from Wikipedia, so I am not going to speculate whether there are ways to implement these things efficiently. The only thing I am certain of is that an unjournaled filesystem will always be simpler and faster than a journaled one, if only by a little bit.
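The "two writes" point can be shown with a toy model. This is only a sketch of the full data-journaling case: ToyDisk and its methods are hypothetical names, commit records and batching are ignored, and real filesystems that journal only metadata write less than this.

```python
class ToyDisk:
    """Counts physical writes for plain versus journaled updates (toy model)."""

    def __init__(self):
        self.blocks = {}          # block number -> data written in place
        self.journal = []         # pending (block number, data) intents
        self.physical_writes = 0  # how many times we touched the "disk"

    def write_plain(self, block_no, data):
        # One write: straight to the final location.
        self.blocks[block_no] = data
        self.physical_writes += 1

    def write_journaled(self, block_no, data):
        # Write 1: record the intent in the journal.
        self.journal.append((block_no, data))
        self.physical_writes += 1
        # Write 2: put the data in its final location.
        self.blocks[block_no] = data
        self.physical_writes += 1
        # Retire the journal entry (treated as free in this model).
        self.journal.pop()


plain_disk = ToyDisk()
journaled_disk = ToyDisk()
for n in range(10):
    plain_disk.write_plain(n, b"data")
    journaled_disk.write_journaled(n, b"data")

print(plain_disk.physical_writes, journaled_disk.physical_writes)
```

Both disks end up with the same contents; the journaled one simply did twice the work to get there.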
> But one advantage of having a journal is that you can maintain consistency even if you suffer a failure in the middle of a write.
This is relevant and interesting if you are running a bank. It is completely uninteresting if you spend your day writing text files and compiling them, like I do. The most I can possibly lose in a crash is five minutes of work, and the likelihood of the crash is so low as to be beyond consideration. Since the magnitude of the disaster journaling averts is minuscule in my case, performance considerations far outweigh it, and I would always try to turn journaling off and trade it for performance and free memory that would save me time in the edit-compile-test cycle. That is my whole justification, and I see it as a damn good one.
> If you claim that compressed archives are more efficient for storing many small files, you had better actually know at least one compressed archive format that actually is more efficient. Otherwise, you would just be making an unfounded claim. Again, I ask you to back up your claims.
What sort of foundation would you like to hear? After all, if a particular compressed archive API does not exist, I cannot give you benchmarks for it, can I? Nevertheless, I would consider it obvious for the simple reason that compressing a hundred files into an archive will always take less space than a hundred uncompressed files, no matter how efficiently they may be aggregated. While you can compress the entire filesystem, that is usually not a good idea, because there is no generic way to determine which files should be compressed and which ones should not. The only way I can think of doing it is to collect access patterns on each file and compress the ones that are read more often than written. Unfortunately, the access log has to be stored somewhere, and will naturally wipe out any savings you may get on your 4 byte files :)
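Not a benchmark of any real archive format, but the size argument is easy to check in memory. This sketch invents a hundred tiny mail-like files (the contents are made up and deliberately similar to each other, which favors compression) and compares the raw byte total against zlib applied per file and against zlib applied to all of them together. It counts bytes only and ignores block-size slack and metadata on a real disk.

```python
import zlib

# A hundred hypothetical small files with similar structure.
files = {
    f"msg{i:03d}.txt": (
        f"From: user{i}@example.org\nSubject: hello {i}\n\nbody\n".encode()
    )
    for i in range(100)
}

raw_total = sum(len(data) for data in files.values())
# Compressing each file on its own: little redundancy to exploit per file.
per_file_total = sum(len(zlib.compress(data)) for data in files.values())
# Compressing them together: redundancy across files becomes visible.
archive_total = len(zlib.compress(b"".join(files.values())))

print(f"raw: {raw_total} bytes")
print(f"compressed one by one: {per_file_total} bytes")
print(f"compressed as one archive: {archive_total} bytes")
```

The interesting part is the gap between the last two numbers: a compressor that only ever sees one tiny file at a time has almost nothing to work with.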
> I _like_ the units of data that I think about to coincide with files; this allows me to use tools that manipulate files to manipulate a lot of things on my computer: documents, programs, configuration files, messages, etc. I don't like the idea of having to use specialized tools for each of these. If anything, I would try to _extend_ the number of things I can apply my standard tools to, rather than shrink it by hiding many objects in a single unit.
> Yes, exactly. You are proposing storing things in such a way that only that one application can access them. I don't like that idea at all, for the reasons I have pointed out above.
First of all, I do not think that allowing the user to muck with application internal files with "standard tools" is a good idea. Just as I encapsulate private implementation of a class, making it inaccessible to the code that uses it, likewise I would seek to encapsulate the private details of an application and "hide" them from the user. That does not mean that I would go out of my way to prevent the user from manipulating it, only to hide unnecessary details and clutter from his filesystem. Packaging all the application's files into a suitable collection of archives would be very appropriate. Not only is this neat and space-efficient on every fi
>> Latency is the most important criteria, and reiserfs is just too complicated to deliver it
> Excuse me? Do you have any numbers to back up that claim? Because I'm having a hard time taking it on face value.
You are the one trying to get me to switch file systems, so you are the one who has to provide the numbers. I look at those benchmarks, and ext2 looks like the winner. If you have others, I'll be happy to take a look at them.
> A filesystem that has been included in the mainline Linux kernel for several years, is offered as a prominent choice during installation of various distros, used to be the default fs on some distros, and is widely used by people who make conscious and informed choices about which filesystem to use.
Most people just take the default filesystem for their distro, and I don't know of any that have reiserfs as the default. That would meet my definition of "fringe". From this perspective, it would be the height of foolishness for me to write any code depending on tail packing for performance. Speaking as a user, I would probably just choose applications that don't create lots of files. The filesystem is not a replacement for compound files, having been designed for a different purpose.
> Kindly point me at this compressed archive format that lets me fetch files (small and large) by name and other attributes more efficiently than Reiser4 or even ReiserFS.
I don't know of any. Care to write one? 'Cause I'd sure like to have it. I think you can do this with Kioslaves, but I'm no expert there. My point is that it should exist, because it is the right solution. When you have many small files, that's just bad design. They are hard to keep track of, and there is no reason for them to be individually user-visible in the first place.
> Then please point out how I can use this as I would a filesystem: so that the good old Unix software can access the files.
If you want that, you can use FUSE on a loop, but then, of course, Linux would make you mount it first. I was talking about doing it inside the application, transparently to the user.
> ext2 is _not_ faster than all journalled filesystems for everyone, and that the performance hit of journalling, if any, is not "too high to tolerate" for everyone.
According to the benchmarks I linked to, ext2 is twice as fast as journalled ext3 for sequential access. I would call that "too high to tolerate". From personal experience, my system has felt noticeably more responsive since I switched to ext2, so I'm inclined to keep it.
> I think smart people realize that having a UPS is no guarantee that your system will never fail in the middle of a write.
Neither is a journal. But consider that the penalty is just an fsck, which on my disks takes about five minutes. I haven't had a power failure crash since I got the UPS a few years ago and I haven't seen a kernel panic since 1996. All in all, I think that performance gains outweigh the negligible "risk" of an fsck. And even if there is data loss, it would likely be very minor, and I keep backups anyhow.
Now, don't get me wrong, I would probably have been more concerned with fs reliability if I were running a data center. But on my own machine the chances of a disaster are so slight and the potential data loss is so small, that reliability in the face of a system crash is simply not an issue. As I keep saying, I'm not running a server for a Fortune 500 company; it's a whole different world out here, with an entirely different set of concerns, and it would be nice if kernel developers took notice of us once in a while.
> ReiserFS still is a great filesystem in terms of reliability and performance, from tiny files to huge ones, under a wide range of scenarios. Reiser4 was going to be even better: faster and more flexible and extensible, with fast arbitrary attributes and a lot of other goodness.
> That humans are the only "intelligent" species using radio transmission as a communications medium
Or are the only ones stupid enough to broadcast it in the clear. In case you haven't noticed, most radio communication of our own civilization is already in digital spread spectrum form, which is indistinguishable from white noise unless you have the proper key. In fact, this property is the very reason spread spectrum techniques were invented: military communications, noise radar, etc. Look it up. And all this only a hundred years after we discovered radio. Any advanced civilization would be completely undetectable by radio due to use of these techniques and of directional antennas. The only ones we'd be able to find are the stupid ones broadcasting their existence, and those won't exist for very long before they are wiped out by the Grox homing in on those transmissions.
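For anyone who would rather see it than look it up, here is a toy direct-sequence sketch of the "needs the proper key" idea. It is an illustration under heavy assumptions: the key just seeds Python's random module (which is not cryptographically secure), there is no channel noise, modulation, or synchronization, and the function names are hypothetical.

```python
import random


def pn_sequence(key, length):
    """Pseudo-noise chips (+1/-1) derived from a shared key (toy generator)."""
    rng = random.Random(key)
    return [rng.choice((-1, 1)) for _ in range(length)]


def spread(bits, key, chips_per_bit=64):
    """Multiply each data bit (as +1/-1) by its own run of pseudo-noise chips."""
    pn = pn_sequence(key, len(bits) * chips_per_bit)
    out = []
    for i, bit in enumerate(bits):
        symbol = 1 if bit else -1
        chunk = pn[i * chips_per_bit:(i + 1) * chips_per_bit]
        out.extend(symbol * chip for chip in chunk)
    return out


def despread(chips, key, chips_per_bit=64):
    """Correlate the received chips against the key's sequence to recover bits."""
    pn = pn_sequence(key, len(chips))
    bits = []
    for i in range(0, len(chips), chips_per_bit):
        corr = sum(x * c for x, c in zip(chips[i:i + chips_per_bit],
                                         pn[i:i + chips_per_bit]))
        bits.append(1 if corr > 0 else 0)
    return bits


message = [1, 0, 1, 1, 0, 0, 1, 0]
signal = spread(message, key=1234)
recovered = despread(signal, key=1234)
guessed = despread(signal, key=9999)  # wrong key: correlation is mostly noise
print(recovered == message, guessed)
```

With the right key every correlation comes out at full strength and the bits fall out exactly; with the wrong key the correlations hover around zero and the result is essentially a coin flip per bit.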
> The simple fact though is that modern processors and disks are so fast that the minimal speed impact of journaling is barely noticeable. It's certainly not worth giving up over some marginal speed gains.
If you look at the benchmarks I linked to, you'll see that ext2 is twice as fast as journalled ext3 for sequential access. I wouldn't call that "barely noticeable".
> Being the consumer sheeple they are, they're going to go with what hits their wallet the least
The Pirate Bay
Drinking from the toilets in Megaton is actually the easiest way to do Moira's "get glowing" quest. Moira sure is quite a character :)
> Shouldn't there be some law to protect the American people from legislators
> who commit felonies relating to their position?
Did you know that desertion from the army is a felony? Do you also know that the Pope was a deserter in WWII? While desertion at the time of peace is just a felony (2-3 years max imprisonment in the US), during war it is punishable by death. If you insist on lumping all "felons" together as automatically evil, I would suggest that if you are good enough for the Holy See, you are probably good enough (too good?) to run for the US Senate.
> I've been using standby/sleep extensively on my desktops and laptops for the last 10 years,
> and I still can't understand why people with a modern machine don't use standby.
Because it still doesn't work for everyone. I tried it a month ago, followed the instructions in the suspend HOWTO, made that suspend script and ran it. The machine suspended, but didn't resume - everything spun up, but looked dead. Sure, it might be easy to fix, but doing so would entail poring through hundreds of forums posts written by clueless idiots for that one little bit of information, followed by dozens of reboots for test-fail-retry cycles. I might get around to it, when I have a month to spare. Many things on Linux [don't] work like that...
Since 9/11, of course, and the day we implemented "level 4 security", whatever the hell that means, and mandated anyone to not dare show up without his ID badge.
> the most used way to determine if a movie is successful seems to be how well it sells during
> the first weekend: Before anyone has had the chance to see it and tell others if it is good or not.
Ah, but that's the point. They already know the movie is bad, so their only chance to sell it is before people start talking. Remember that you don't have to be good to be successful ;)
> "The hungry and cold unemployed masses aren't going to continue giving away their intellectual
> labor on the Internet in the speculative hope that they might get some 'back end' revenue,"
Yeah, right! As the author of a dozen OSS projects, I can tell you right away that I did not "give them away in the speculative hope that they might get some back end revenue". I haven't made a cent on them, didn't plan to, and am not going to in the future. I'm not selling them because nobody would pay anything for them. I wrote them for my personal benefit, and open-sourced them because I had no reason not to. Perhaps someday someone will make something useful out of them, and make money from them, but I'm not holding my breath. In the meantime, it costs me nothing to publish them, so I do. The depression will not change this. It may give people less free time to work on OSS, but that doesn't mean it's going away.
> Secondly, extra writes don't necessarily cause a performance hit. It all depends
> on the usage patterns. If the writes can be performed when the disk would
> otherwise be idle, no performance is lost.
That's what the disk cache already does for you. You are still making two writes instead of one, so no matter how long you decide to postpone it, it will catch up with you eventually. Not only that, but you are increasing the interval between the time you finish writing and the time when the data gets to the disk by a factor of two, thus increasing your chances of catching a hardware failure in it. I bet it also makes meaningful benchmarking incredibly hard. Oh well. I'll just drop this subject now due to lack of information.
> So you meant "space-efficient". I thought we were talking about time-efficiency.
On the disk, space efficiency is time efficiency. The bottleneck of the I/O system is on the physical disk surface, and the less of it you use, the faster the data gets to you. Any form of compression should give significant benefits, especially with today's fast CPUs.
> True, but the same goes for archive files, of course. Any metric that the archiver
> uses to determine which files should be compressed, the filesystem can use, too.
I was not talking about the archiver; I was talking about you. The difference is that with an archive, the programmer or the user decides which files go together, which files should be compressed, and which files are not very important. This is not a decision that can be made automatically because it depends on the contents of the files and the meaning of those contents to whoever needs them. I don't see the computer being able to make this decision in the general case, barring significant advances in AI.
> For example, the application might know which files should be compressed, and which
> ones shouldn't, but the VFS does not typically allow this information to be communicated
> to the filesystem. A purpose-built storage system specific to the particular application
> could, of course, incorporate anything and everything that might be useful.
Well, yes, you could go this far, and build a specialized backend for every application. Heck, that's what everyone has been doing for decades, with proprietary file formats. What I'm suggesting is a standardized general compound file API, much like the OLE compound documents on Windows, only not quite as ugly and bloated. Applications already write their own data formats, and that will continue, but we are missing an easy way to package multiple related files together.
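To give a rough idea of what I mean by a compound file API, here is a minimal sketch in Python that abuses the standard `zipfile` module as the container. The function names (`pack`, `read_member`, `list_members`) are made up for illustration; this is not an existing standard, just the shape such an API might take.

```python
import io
import zipfile

def pack(members):
    """Pack a {name: bytes} mapping into one in-memory compound file (a zip archive)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, data in members.items():
            zf.writestr(name, data)
    return buf.getvalue()

def read_member(blob, name):
    """Fetch a single member by name without unpacking the others."""
    with zipfile.ZipFile(io.BytesIO(blob)) as zf:
        return zf.read(name)

def list_members(blob):
    """List member names, much like listing a directory."""
    with zipfile.ZipFile(io.BytesIO(blob)) as zf:
        return zf.namelist()
```

The application sees named members, the user sees one file. Whether zip is the right container is a separate argument; the point is only that the packaging part is small.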
> You are saying that adding at least one additional API and updating all the
> standard utilities is not a significant complexity increase. I beg to differ.
> This would be changing the foundations on which Unix is built.
"The foundations of Unix" are not sacred. They are just ossified into the OS. That does not mean they are perfect, only that we are used to them. A VFS extension, IMO, is not all that hard in principle, and it will not in any way change your "foundations". All it will do is permit you to treat compound files like directories. The basic capabilities otherwise stay the same.
> But that wasn't the case I was talking about. I was talking about the case where
> you have pieces of data that are meaningful to the user, like individual emails.
> I thought that was what you were talking about, as well.
No, what I had in mind specifically was the application internal data files. Look in the various subdirectories under /usr/share and you'll find plenty of tiny little files that ought to be aggregated. Another example is config files. I've seen proposals to create a registry-like storage where each value is a file! Sure, throw it in and let the filesystem figure it out. Right...
Saving a document into multiple files is another bad idea. Database packages, like, say, Ingres, do that. A documen
> Just because journalled ext3 is slower than unjournalled ext2 doesn't mean that
> journalling in general imposes an intolerable performance hit.
Journalling, by its very nature, implies turning every write into two writes. Furthermore, you have to add various hacks to avoid problems from out-of-order writes, to determine when the cache actually gets to the physical disk, and I am sure many other little things. I am not an fs designer, having learned everything I know about journaling from Wikipedia, so I am not going to speculate whether there are ways to implement these things efficiently. The only thing I am certain of is that an unjournaled filesystem will always be simpler and faster than a journaled one, if only by a little bit.
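The "two writes" point can be sketched with a toy model in Python. This assumes the simplest case, where the full data goes through the journal, and it is in no way how ext3 is actually implemented; the class names and the `writes` counter are invented for the illustration.

```python
class PlainStore:
    """Unjournaled: each update is written straight to its block."""
    def __init__(self):
        self.blocks = {}
        self.writes = 0  # count of simulated physical writes

    def write(self, block_no, data):
        self.blocks[block_no] = data
        self.writes += 1


class JournaledStore:
    """Toy journal: every update goes to the journal first, then in place."""
    def __init__(self):
        self.journal = []  # pending (block_no, data) records
        self.blocks = {}
        self.writes = 0

    def write(self, block_no, data):
        self.journal.append((block_no, data))  # write 1: journal record
        self.writes += 1
        self.blocks[block_no] = data           # write 2: the real location
        self.writes += 1
        self.journal.pop()                     # committed, record no longer needed

    def replay(self):
        """After a crash, reapply journal records that never reached their block."""
        for block_no, data in self.journal:
            self.blocks[block_no] = data
        self.journal.clear()
```

Write three blocks to each and the journaled store has done six physical writes to the plain store's three. What you buy for that is `replay()`.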
> But one advantage of having a journal is that you can maintain consistency
> even if you suffer a failure in the middle of a write.
This is relevant and interesting if you are running a bank. It is completely uninteresting if you spend your day writing text files and compiling them, like I do. The most I can possibly lose in a crash is five minutes of work, and the likelihood of the crash is so low as to be beyond consideration. Since the magnitude of the disaster journaling averts is minuscule in my case, performance considerations far outweigh it, and I would always try to turn journaling off and trade it for performance and free memory that would save me time in the edit-compile-test cycle. That is my whole justification, and I see it as a damn good one.
> If you claim that compressed archives are more efficient for storing many small files, you had
> better actually know at least one compressed archive format that actually is more efficient.
> Otherwise, you would just be making an unfounded claim. Again, I ask you to back up your claims.
What sort of foundation would you like to hear? After all, if a particular compressed archive API does not exist, I cannot give you benchmarks for it, can I? Nevertheless, I would consider it obvious for the simple reason that compressing a hundred files into an archive will always take less space than a hundred uncompressed files, no matter how efficiently they may be aggregated. While you can compress the entire filesystem, that is usually not a good idea, because there is no generic way to determine which files should be compressed and which ones should not. The only way I can think of doing it is to collect access patterns on each file and compress the ones that are read more often than written. Unfortunately, the access log has to be stored somewhere, and will naturally wipe out any savings you may get on your 4 byte files :)
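Short of a real benchmark, here is a back-of-the-envelope sketch in Python of the comparison I have in mind. It assumes a filesystem that allocates whole 4096-byte blocks per file (so no tail packing), ignores inode and directory overhead, and uses zip as a stand-in archive format; `BLOCK` and both function names are mine, picked for the example.

```python
import io
import zipfile

BLOCK = 4096  # assumed allocation unit; real filesystems vary

def round_up(size):
    """Round a byte count up to a whole number of blocks."""
    return -(-size // BLOCK) * BLOCK

def loose_footprint(files):
    """Space used when every file occupies its own whole blocks."""
    return sum(round_up(len(data)) for data in files.values())

def archive_footprint(files):
    """Space used by one compressed archive holding the same files."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return round_up(len(buf.getvalue()))
```

Feed it a hundred tiny config-style files and the loose layout burns a full block on each one, while the archive fits in a handful of blocks. It says nothing about access time, which would need an actual benchmark.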
> I _like_ the units of data that I think about to coincide with files; this allows me to use tools
> that manipulate files to manipulate a lot of things on my computer: documents, programs, configuration
> files, messages, etc. I don't like the idea of having to use specialized tools for each of these. If
> anything, I would try to _extend_ the number of things I can apply my standard tools to, rather than
> shrink it by hiding many objects in a single unit.
> Yes, exactly. You are proposing storing things in such a way that only that one application can access them. I don't
> like that idea at all, for the reasons I have pointed out above.
First of all, I do not think that allowing the user to muck with application internal files with "standard tools" is a good idea. Just as I encapsulate private implementation of a class, making it inaccessible to the code that uses it, likewise I would seek to encapsulate the private details of an application and "hide" them from the user. That does not mean that I would go out of my way to prevent the user from manipulating it, only to hide unnecessary details and clutter from his filesystem. Packaging all the application's files into a suitable collection of archives would be very appropriate. Not only is this neat and space-efficient on every fi
>> Latency is the most important criterion, and reiserfs is just too complicated to deliver it
> Excuse me? Do you have any numbers to back up that claim?
> Because I'm having a hard time taking it on face value.
You are the one trying to get me to switch file systems, so you are the one who has to provide the numbers. I look at those benchmarks, and ext2 looks like the winner. If you have others, I'll be happy to take a look at them.
> A filesystem that has been included in the mainline Linux kernel for several years,
> is offered as a prominent choice during installation of various distros, used to be
> the default fs on some distros, and is widely used by people who make conscious and
> informed choices about which filesystem to use.
Most people just take the default filesystem for their distro, and I don't know of any that have reiserfs as the default. That would meet my definition of "fringe". From this perspective, it would be the height of foolishness for me to write any code depending on tail packing for performance. Speaking as a user, I would probably just choose applications that don't create lots of files. The filesystem is not a replacement for compound files, having been designed for a different purpose.
> Kindly point me at this compressed archive format that lets me fetch files
> (small and large) by name and other attributes more efficiently than Reiser4
> or even ReiserFS.
I don't know of any. Care to write one? 'Cause I'd sure like to have it. I think you can do this with KIO slaves, but I'm no expert there. My point is that it should exist, because it is the right solution. When you have many small files, that's just bad design. They are hard to keep track of, and there is no reason for them to be individually user-visible in the first place.
> Then please point out how I can use this as I would a filesystem:
> so that the good old Unix software can access the files.
If you want that, you can use FUSE on a loop, but then, of course, Linux would make you mount it first. I was talking about doing it inside the application, transparently to the user.
> ext2 is _not_ faster than all journalled filesystems for everyone, and that
> the performance hit of journalling, if any, is not "too high to tolerate" for everyone.
According to the benchmarks I linked to, ext2 is twice as fast as journalled ext3 for sequential access. I would call that "too high to tolerate". From personal experience, my system has felt noticeably more responsive since I switched to ext2, so I'm inclined to keep it.
> I think smart people realize that having a UPS is no guarantee that your system will
> never fail in the middle of a write.
Neither is a journal. But consider that the penalty is just an fsck, which on my disks takes about five minutes. I haven't had a power failure crash since I got the UPS a few years ago and I haven't seen a kernel panic since 1996. All in all, I think that performance gains outweigh the negligible "risk" of an fsck. And even if there is data loss, it would likely be very minor, and I keep backups anyhow.
Now, don't get me wrong, I would probably have been more concerned with fs reliability if I were running a data center. But on my own machine the chances of a disaster are so slight and the potential data loss is so small, that reliability in the face of a system crash is simply not an issue. As I keep saying, I'm not running a server for a Fortune 500 company; it's a whole different world out here, with an entirely different set of concerns, and it would be nice if kernel developers took notice of us once in a while.
> ReiserFS still is a great filesystem in terms of reliability and performance, from
> tiny files to huge ones, under a wide range of scenarios. Reiser4 was going to be
> even better: faster and more flexible and extensible, with fast arbitrary attributes
> and a lot of other goodness.
Oh, sure, I w
> That humans are the only "intelligent" species using radio transmission as a communications medium
Or are the only ones stupid enough to broadcast it in the clear. In case you haven't noticed, most radio communication of our own civilization is already in digital spread spectrum form, which is indistinguishable from white noise unless you have the proper key. In fact, this property is the very reason spread spectrum techniques were invented. Military communications, noise radar, etc. Look it up. And all this only a hundred years after we discovered radio. Any advanced civilization would be completely undetectable by radio due to use of these techniques and of directional antennas. The only ones we'd be able to find are the stupid ones broadcasting their existence, and those won't exist for very long until they are wiped out by the Grox homing in on those transmissions.
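For anyone who doesn't feel like looking it up, here is a toy direct-sequence sketch in Python. It assumes a noiseless channel and uses Python's `random.Random` seeded with the key as the chip generator, which is a stand-in for illustration only (real systems use purpose-built code generators); `CHIPS_PER_BIT` and the function names are made up.

```python
import random

CHIPS_PER_BIT = 64  # assumed spreading factor

def pn_sequence(key, length):
    """Pseudo-noise chips (+1/-1) derived from the shared key."""
    rng = random.Random(key)
    return [rng.choice((-1, 1)) for _ in range(length)]

def spread(bits, key):
    """Multiply each data bit (as +1/-1) by its own run of PN chips."""
    chips = pn_sequence(key, len(bits) * CHIPS_PER_BIT)
    signal = []
    for i, bit in enumerate(bits):
        symbol = 1 if bit else -1
        run = chips[i * CHIPS_PER_BIT:(i + 1) * CHIPS_PER_BIT]
        signal.extend(symbol * c for c in run)
    return signal

def despread(signal, key):
    """Correlate against the PN chips; the sign of each sum gives the bit."""
    chips = pn_sequence(key, len(signal))
    bits = []
    for i in range(0, len(signal), CHIPS_PER_BIT):
        corr = sum(s * c for s, c in zip(signal[i:i + CHIPS_PER_BIT],
                                         chips[i:i + CHIPS_PER_BIT]))
        bits.append(1 if corr > 0 else 0)
    return bits
```

With the right key every correlation comes out at plus or minus the full chip count and the bits fall right out. With the wrong key the sums wander around zero and you get garbage, which is all an eavesdropper ever sees.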
> The simple fact though is that modern processors and disks are so fast
> that the minimal speed impact of journaling is barely noticeable. It's
> certainly not worth giving up over some marginal speed gains.
If you look at the benchmarks I linked to, you'd see that ext2 is twice as fast as a journalled ext3 for sequential access. I wouldn't call that "barely noticeable".