Leaving thousands upon thousands of connections open on the server is a terrible idea no matter how well-implemented the TCP stack is. The real solution is to use some sort of distributed mirroring facility so everyone could connect to a nearby copy of the feed and spread the load. The even better solution would be to distribute asynchronous update notifications as well as data, because polling always sucks. Each client would then get a message saying "xxx has updated, please fetch a copy from your nearest mirror" only when the content changes, providing darn near optimal network efficiency.
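To make the notify-then-fetch idea concrete, here's a rough sketch rather than a real protocol. Everything in it is made up for illustration: the Mirror and Notifier classes, the "distance" table standing in for network proximity, and the feed name. The only point is that clients do nothing until they're told something changed, and then fetch from whichever mirror is closest to them.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Mirror:
    name: str
    distance: Dict[str, int]  # hypothetical "network distance" to each client
    content: Dict[str, str] = field(default_factory=dict)
    fetches: int = 0

    def fetch(self, feed: str) -> str:
        self.fetches += 1
        return self.content[feed]


@dataclass
class Notifier:
    mirrors: List[Mirror]
    subscribers: Dict[str, Callable[[str], None]] = field(default_factory=dict)

    def subscribe(self, client: str, on_update: Callable[[str], None]) -> None:
        self.subscribers[client] = on_update

    def nearest(self, client: str) -> Mirror:
        return min(self.mirrors, key=lambda m: m.distance[client])

    def publish(self, feed: str, body: str) -> None:
        # Data goes to the mirrors first; only then are clients told to fetch.
        for mirror in self.mirrors:
            mirror.content[feed] = body
        for on_update in self.subscribers.values():
            on_update(feed)


def demo():
    east = Mirror("east", {"alice": 1, "bob": 9})
    west = Mirror("west", {"alice": 8, "bob": 2})
    hub = Notifier([east, west])
    seen = {}
    for client in ("alice", "bob"):
        def on_update(feed, client=client):
            # "xxx has updated": fetch it from the closest mirror.
            seen[client] = hub.nearest(client).fetch(feed)
        hub.subscribe(client, on_update)
    hub.publish("feed.xml", "v1")
    return seen, east.fetches, west.fetches
```

In the demo each mirror serves exactly one fetch per update and nobody polls. A real system would also have to deal with lost notifications and mirrors that lag behind, which this sketch ignores.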
First, it's hard to believe that Sowell - usually such a total idiot in his own supposed areas of expertise - could be so right about something in ours. Second, it's hard to believe that the OP tried to turn this into a "free software is better" message. As has been noted before, "option clutter" is a characteristic trait of much free software, as every disgruntled dev-team member is appeased by adding their favorite feature and every dispute over how some feature should work is "solved" by making it work seven ways and/or adding an option to control it. The people who most need to hear this message are in the free-software world, not the commercial world where there really is someone to put their foot down and impose a coherent vision on the other developers. That doesn't mean all proprietary software is more usable - just that this particular usability problem is not the one that afflicts it the most.
It's one thing to say that SQL and XQuery have their problems, because they do. It's quite another to say that SQL is bad because it doesn't live up to some arbitrary never-achieved (and perhaps unachievable) standard of relational purity that even Codd himself found superfluous. When Pascal does nothing but the latter, and in addition takes a dozen thoroughly unprofessional swipes at Chamberlin for having been involved in both SQL and XQuery, his professional jealousy becomes thick enough to choke on. I wish he would, so we would be spared the incessant ranting of someone whose whole career has been marked by a lot of words and not a single deed to back them up.
As chance would have it, my name happens to appear on a patent for using leases within a cluster filesystem. They're a great mechanism, but no panacea. In any case, I'll drop you a line.
The problems that face a distributed filesystem are well known, and solutions can be found in any good book on distributed (file)systems.
Would be found in any good book on distributed filesystems, you mean. Good solutions are not known for the general wide-area case, and therefore the good books have not been written. Anybody who seriously studies this area - as I do - has to rely on tracking down the relevant papers, often by finding them in the bibliographies of subsequent papers. Sometimes the search takes one rather far afield, into areas like web caching and distributed shared memory, before the full shape of the problem can be appreciated.
Still, the filesystems that we have today either don't implement these solutions, or have buggy/incomplete implementations.
The reason the implementations seem incomplete or buggy is that many of the proposed solutions are themselves incomplete or buggy, failing to account for things like performance or high probabilities of node failure over time. The fact that the filesystems we have today fall short of ideal behavior would be taken by a rational person as evidence that comprehensive solutions are not in fact known.
None of these filesystems allows regular users to access remote filesystems (superuser privileges are required for mounting) like with FTP
No, and they don't cook your dinner for you either, but if that's what you're expecting then you're completely missing the point of what a cluster filesystem is for. Granted, the name "Global File System" is a misnomer, but it has been a misnomer for several years now and if you have anything more than a dilettante's interest in this you should know what GFS really does.
What's so hard about getting this stuff right?
Yeah, everything's easy when you're not the one doing it. Tell me what you do, and I'll tell you how wimpy that is. If you think that maintaining consistency across multiple machines in a cluster without compromising performance is easy, you're a fool. If you think that high availability of any form is easy, then you're an idiot. If you think putting those two together doesn't lead to an exponential increase in complexity and hence difficulty, you're a moron.
If you want a filesystem stub (not really a complete filesystem) that lets you access files stored half-way around the world over a standard protocol, look into one of the many efforts based on WebDAV. If you want a true global filesystem, look into OceanStore so you can appreciate some of the problems that are involved. If you want to be able to change the filesystem namespace without being root, look into Plan 9. Do your own googling. None of those are what GFS is about.
Unfortunately still true. Every novice programmer starts out thinking they're better than anyone else and will be able to get stuff right the first time. Over time they learn that they're wrong. Whether the second version is done as part of the same project (or job) or a different one, whether it's superficially similar or just similar in its internal design, there will be a second version. If you're good, a majority of the lines of code in the second version will be recycled from the first, but it's practically a guarantee that the major conceptual structures will be different. Get used to it. Learn to use it to your advantage. People who continue to deny it spend their careers either doing trivial things or doing more serious ones in a totally half-assed way. If you want to get good at building something besides toys, learn how to build on past experience.
I found Thunderbird's spam filtering to be utterly useless. When I realized that as a result of using it I just had to sort through my junk folder as well as my inboxes, with the former just as likely to contain real mail as any of the latter, I just turned it off. I think my problem was the same as others have reported: a significant percentage of the spam I get is specifically designed to fool Bayesian filters, and as soon as the filters crank up to catch the spam they start catching ham as well. It's an arms race, and Thunderbird's filters lost.
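Here's a toy illustration of that arms race. To be clear, this is not Thunderbird's filter; it's a bare-bones naive Bayes classifier with add-one smoothing that I'm assuming as a stand-in, and the training messages are invented. It shows two things: padding spam with innocent-looking words drags the score down, and once you dutifully mark that padded spam as junk, the innocent words start counting against real mail.

```python
import math
from collections import Counter


class TinyBayes:
    """Toy word-count naive Bayes classifier (illustrative only)."""

    def __init__(self):
        self.words = {"spam": Counter(), "ham": Counter()}
        self.docs = {"spam": 0, "ham": 0}

    def train(self, label: str, text: str) -> None:
        self.docs[label] += 1
        self.words[label].update(text.lower().split())

    def spam_probability(self, text: str) -> float:
        vocab = len(set(self.words["spam"]) | set(self.words["ham"]))
        spam_total = sum(self.words["spam"].values())
        ham_total = sum(self.words["ham"].values())
        # Start from the prior odds, then add each word's log-likelihood ratio.
        log_odds = math.log(self.docs["spam"] / self.docs["ham"])
        for word in text.lower().split():
            p_spam = (self.words["spam"][word] + 1) / (spam_total + vocab)
            p_ham = (self.words["ham"][word] + 1) / (ham_total + vocab)
            log_odds += math.log(p_spam / p_ham)
        return 1 / (1 + math.exp(-log_odds))


def demo():
    f = TinyBayes()
    for _ in range(5):
        f.train("spam", "cheap pills buy now")
        f.train("ham", "meeting agenda project schedule")
    plain = f.spam_probability("cheap pills")
    padded = f.spam_probability("cheap pills meeting agenda project schedule")
    ham_before = f.spam_probability("meeting agenda")
    # The user marks the padded spam as junk, over and over.
    for _ in range(20):
        f.train("spam", "cheap pills meeting agenda project schedule")
    ham_after = f.spam_probability("meeting agenda")
    return plain, padded, ham_before, ham_after
```

In this made-up data set the plain spam scores well above 0.5, the padded spam slips under it, and after retraining a perfectly ordinary "meeting agenda" message crosses over to the spam side.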
Seriously, have any other /.'ers created their own system?
Sure, I did. Had comments, RSS, the whole bit, but I just moved away from that and started using pMachine instead. It was great as a learning exercise, and if that's what you want it to be then more power to you, but after a while when you move onto other totally different kinds of projects it could become a bit of a millstone around your neck. At the very least I would suggest looking at the database schemas and such for some other systems that are out there, both to get ideas and to make sure that a subsequent data migration won't be too difficult.
Could someone please edify me (and consequently the rest of the viewing audience who might not yet have weblogs) why we might find it desirable to use dynamic methods to update and display a plain text journal?
Simple: because it's not just a plain text journal. A weblog system gives you multiple views of your entries - last N, last N in a category, everything in a certain month, RSS/ATOM views, etc. A weblog lets you post when you only have web access and not FTP, which might be the case when you're traveling and you want to send the virtual equivalent of a postcard from a kiosk somewhere (like I did from Cradle Mountain Lodge in Tasmania last year). A weblog lets your readers comment on your posts. Then there's a bunch of stuff I'm not sure I care about, like "trackback" and "pingback" and such, but the point remains that a weblog gives you a lot of functionality that static files don't. Sure, you could cut and paste between those static files, but it would be an error-prone pain in the ass and a big waste of space, and there'd still be some functionality (e.g. commenting) that you'd be missing.
In short, a weblog system doesn't have to have every stupid feature the folks in the so-called "blogosphere" dream up, but it does add value to the people who use it.
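For the "multiple views" part, a sketch of what I mean follows. The Entry type, the function names and the sample data are all hypothetical, not taken from any particular weblog package. The point is that each view is a cheap query over one set of entries instead of another hand-maintained static file.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Entry:
    posted: date
    category: str
    title: str


def last_n(entries: List[Entry], n: int, category: Optional[str] = None) -> List[Entry]:
    """Newest n entries, optionally restricted to one category."""
    chosen = [e for e in entries if category is None or e.category == category]
    return sorted(chosen, key=lambda e: e.posted, reverse=True)[:n]


def in_month(entries: List[Entry], year: int, month: int) -> List[Entry]:
    """Everything posted in a given month, oldest first."""
    chosen = [e for e in entries if (e.posted.year, e.posted.month) == (year, month)]
    return sorted(chosen, key=lambda e: e.posted)


SAMPLE = [
    Entry(date(2004, 1, 5), "tech", "A"),
    Entry(date(2004, 1, 20), "travel", "B"),
    Entry(date(2004, 2, 3), "tech", "C"),
    Entry(date(2004, 2, 10), "tech", "D"),
]
```

With the sample data, last_n(SAMPLE, 2) gives D then C, last_n(SAMPLE, 2, "travel") gives just B, and in_month(SAMPLE, 2004, 1) gives A then B. An RSS view would be one more function over the same list.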
Wouldn't you know it? I just spent much of the weekend converting my site from my own homegrown weblog codebase to pMachine. Here's the new version (with an entry about the change), and the old version for comparison. According to the table, b2evolution and WordPress would be equally good fits, perhaps even slightly better because they support assigning an entry to multiple categories like my old code but unlike pMachine Free, but when I tried them all out at opensourceCMS that really wasn't the case. I strongly recommend that you check out candidates there, because a lot of the small things make a difference. Here are some examples:
What kinds of markup are allowed in posts? In comments? Is it plain HTML, or a stripped-down square-bracketed subset like bbCode, or both, or neither? Which are you comfortable with? How about your users who leave comments? If it's real HTML, how are various cross-site scripting and other exploits prevented?
Are commenters allowed to register so they can have persistent profiles? Are they forced to register? Either/or?
Does the post entry format allow things like saving drafts, posting to the future, setting expiration dates?
Does the system have things like time offsets (between where you are and where your site is hosted)? Are the paths that it uses configurable, so you can make it work with different directory structures? How "tunable" are things in general? This can be a huge headache if you get halfway into your transition and you find something that just won't work properly in your environment without hacking the code.
Do you really like the way the templating system works? You really won't know until you try some customization, so fiddle a bit with the layout. Move stuff around, add links to other parts of your site, etc.
If you're converting from another system, are there automatic conversion tools? How well do they really work? Again, you have to try to see, and not just on opensourceCMS either. If there are no converters, how hard would it be to write one? Does the database schema (and/or file layout) make sense to you? Is it similar conceptually to what you have now? Does it require complex relationships between tables/fields that would be hard to maintain as you suck in your old content? Is there any information in your old content that there's no place for?
These sorts of things, none of which are covered in a mere checklist, really matter when you actually take the plunge. Trying stuff out on opensourceCMS is a great first step, but then you should actually download the real thing and really try to run a test version of your own site on it for at least an hour or so, to see if you can truly tweak it to your liking. Only then will you be able to make a decision that will really satisfy you.
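On the markup question in particular, one common approach (an assumption on my part, not a description of how pMachine or any of the others does it) is to escape everything a commenter types and then re-enable a small square-bracketed subset. The render_comment function and its two-tag whitelist below are hypothetical.

```python
import html
import re

# Hypothetical whitelist: bbCode-style tag -> the HTML tag it turns into.
_ALLOWED = {"b": "strong", "i": "em"}


def render_comment(raw: str) -> str:
    # Escape first, so any real HTML the commenter typed becomes inert text.
    safe = html.escape(raw)
    # Then turn the whitelisted square-bracket tags into real markup.
    for bb, tag in _ALLOWED.items():
        safe = re.sub(
            rf"\[{bb}\](.*?)\[/{bb}\]",
            rf"<{tag}>\1</{tag}>",
            safe,
            flags=re.S,
        )
    return safe
```

A comment containing a script tag comes out as harmless escaped text, while [b]hi[/b] comes out bold. Whether a given package does something like this, or something weaker, is exactly the kind of thing worth checking when you try it out.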
That's an excellent example of treating a problem as a potential opportunity to create something positive. Fleury et al did something to degrade not just their own credibility but that of the forums where the astroturf occurred. They did harm to those forums, and the obvious way to atone would be to do something that creates positive value for those who were harmed. Offer to give them free software or services, write some free articles, give them some inside access to information about product roadmaps or benchmarks, pick up part of the bandwidth tab...whatever. That would be true atonement, in contrast to the empty non-apology that was actually offered.
Does anyone seriously think JBoss hasn't been doing that same sort of thing right here for ages? Or that the reviews you read on Amazon are all on the level? Or that the reviews you'll find when you're looking for a web host are all honest? Come on. Internet astroturf has been rampant for years.
The whole benefit of the system is that someone out there has the next block (or blocks) you'll need, and only they know how to get the next block beyond the block(s) that they carry for you.
Ahhhh, but the trick then would be how to do that in an efficient and fault-resilient way. The idea of giving "opaque" blocks to other nodes is a good one, but it's a very common one and somewhat more of a question than an answer. The questions of how to choose candidates, how to propagate the blocks, how to find them later, how to do all of that efficiently and how to prevent a single failure from acting like the weak link in a chain and bringing the whole thing down - the many possible answers to those questions are where the real inspiration and perspiration lie. Saying that blocks should be distributed is like saying that a boat shouldn't sink.
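Just to show what one of those many possible answers looks like, here's a sketch of rendezvous (highest-random-weight) hashing for picking which nodes hold a block and finding them again later. It's a technique I'm picking for illustration, not something from the proposal being discussed, and the holders function, node names and replica count are all made up.

```python
import hashlib
from typing import List


def holders(block_id: str, nodes: List[str], k: int = 3) -> List[str]:
    """Pick the k nodes that should hold a block, highest hash score first."""
    def score(node: str) -> bytes:
        return hashlib.sha256(f"{node}:{block_id}".encode()).digest()
    # Anyone who knows the node list and block id computes the same answer,
    # so there's no central directory to ask (or to fail).
    return sorted(nodes, key=score, reverse=True)[:k]
```

Losing a node that doesn't hold the block changes nothing, and losing one holder leaves the others in place with the next-best node filling in. That covers choosing and finding; it says nothing about propagation cost, membership changes or malicious nodes, which is rather the point.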
I proposed this solution about 4 years ago to one of the gnome-vfs guys at a Helixcode party in San Francisco "back in the day".
So, did he answer "been there, done that" or "that's dumb"? Or did he just nod politely and suddenly act like he was being hailed from across the room? Only about a thousand people have had variants of the same idea; the two closest would be Farsite or SFS, but there are many others. One thing that's unique to your proposal, though, is the idea of sending every block to every node - creating a system that cannot possibly scale beyond a trivial number of nodes.
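The scaling problem is plain arithmetic. In this back-of-the-envelope sketch the numbers (1000 blocks originated per node, 3 replicas as the alternative) are made up; only the shape of the growth matters.

```python
def blocks_held_per_node(nodes: int, blocks_per_node: int, replicas: int) -> int:
    """Average number of block copies each node ends up storing."""
    total_copies = nodes * blocks_per_node * replicas
    return total_copies // nodes


def compare(nodes: int, blocks_per_node: int = 1000, k: int = 3):
    # "Every block to every node" means the replica count equals the node count.
    everywhere = blocks_held_per_node(nodes, blocks_per_node, replicas=nodes)
    fixed = blocks_held_per_node(nodes, blocks_per_node, replicas=min(k, nodes))
    return everywhere, fixed
```

compare(10) gives (10000, 3000) and compare(1000) gives (1000000, 3000). With a fixed replica count the per-node burden stays flat as the system grows; with every-block-everywhere it grows with the number of nodes, and the total across the system grows with its square.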
There's nothing wrong with blue-sky thinking, but when the sky is already crowded with planes and helicopters and blimps you should take some time to study them before repeating the mistakes their designers made ten years ago. It's also good to get the basics working or at least decently thought out before you start speculating about what extra buzzword-compliant ideas you can throw into versions two through ten. We already have Freenet to show us what can happen when people don't heed either of those lessons.
It would have been nice if they had included smaller companies in their sample. Probably just as well that they didn't, though, because I suspect those would make the US numbers look even worse.
Sure you need a thread pool. Creating a new thread does not only allocate a thread object; it likely also makes a kernel call to get a new thread handle. You pool to avoid the kernel call, not to avoid the memory allocation/deallocation.
If your program uses lots of short-lived threads there's something fundamentally wrong with it that thread pools won't fix. Threads are often used as a crutch, where an event-based model running on one or more long-lived threads (not tied to specific operations) would perform much better. I've written about this extensively on my own website, most notably in my server-design guide.
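A minimal sketch of the event-based alternative, using Python's asyncio. This is not code from the server-design guide; the handle and serve functions are hypothetical and the sleep is a stand-in for non-blocking I/O. A thousand operations get serviced, and every one of them runs on the single long-lived thread that owns the event loop.

```python
import asyncio
import threading
from typing import List, Set


async def handle(request_id: int, thread_ids: Set[int]) -> int:
    thread_ids.add(threading.get_ident())  # record which thread ran us
    await asyncio.sleep(0.01)              # stand-in for waiting on I/O
    return request_id * 2


async def serve(n: int, thread_ids: Set[int]) -> List[int]:
    # All n operations are in flight at once; none gets its own thread.
    return await asyncio.gather(*(handle(i, thread_ids) for i in range(n)))


def demo(n: int = 1000):
    thread_ids: Set[int] = set()
    results = asyncio.run(serve(n, thread_ids))
    return results, thread_ids
```

After demo() the thread_ids set contains exactly one id. On a multiprocessor you'd run a few such loops, one per long-lived thread, but still nothing like one thread per operation.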
I know I'm probably coming in too late for this to be noticed, but I'll give it a go anyway. My suggestion is to upgrade your non-technical skills. People think of software as an antisocial field but, as practiced in the real world, it can be intensely social. I'm not saying you should go out and get an MBA, or that you should ever give up coding, or anything like that. However, if you really want your resume to stand out from all those other people who also have the requisite technical skills, there's no better way than to show some capacity for initiative, leadership, mentoring, etc. Open source can be great for that - not just writing something on your own, but actually coordinating a group of other people on a project. Just participating in such a project in a proactive and constructive way would set you apart from the hundreds of other technically skilled but socially stunted folks that every employer can find.
That's just my two cents, of course, but it's the two cents of a guy who - unlike 90% of those commenting - actually has a decisive role in a lot of hiring decisions.
Well, I'm sorry to hear about your situation, but I believe the conditions created by good vs. bad bosses are really sort of orthogonal to those based on the field in which you work. All else being equal, being a programmer with a bad boss still seems preferable to being on a road crew with a bad boss. You complain about being micromanaged, but even by your own account that was unusual for a computer company. In many other fields it's the norm, and you would have been fired the first time you took a 75-minute lunch. That's not to say your boss wasn't a total jerk, but that's really nothing to do with the industry you're working in.
That, when you stop and think about it, is one of the downsides of being a knowledge worker.
In some ways yes, in some ways no. I know exactly what you're getting at, and I don't disagree, but what you describe as your antidote to the job's sedentary nature is still a choice. That physical activity takes on a very different character when it's mandatory, when it's inflexible wrt activities or conditions, when you do it all day long for years on end, and/or when the pay sucks. It's like the difference between running for fun and running for your life.
The biggest stress isn't the technology; it's the bad-attitude buttheads who should be digging ditches or doing some other job better suited to their talents.
Don't get me started. One of the downsides of the computer industry is that the buttheads read Slashdot and they'd see what I was saying about them. ;-)
That was kind of my point - that they seem cushy but when you get right down to it they have their downsides too. Next time try (a) to understand what someone wrote, and (b) not to make so many assumptions before you post.
Yes, like that. :-)
I think I'll stick with Google, thanks.
Here's a more informative description of some of the technology being used by Konarka. Looks pretty interesting to me.