Assuming that your compiler will fix your performance problems is as bad as assuming that it won't. Inline is a good example because (a) not everything can be inlined, and (b) inlining can create costs. If you use an inline function in many places, it can increase code size and correspondingly reduce cache efficiency. If your compiler can handle the inline just fine but your debugger can't (e.g. it screws up line numbers or breakpoints) your "fix" might waste valuable programmer time. Are inline functions handy? Sure. Should they be anywhere near the first set of things you consider when you need to improve performance? Hell, no. Pick a good top-level structure, profile, and tune low-level algorithms. Using "inline" or "restrict" habitually to improve performance is as bad as unrolling your own loops or adding your own induction variables habitually for the same purpose.
MIPS R12000 system that's sitting on my desk has 8MB of L2 cache.
...and you need it, because the cache/memory speed disparity is much greater on that system. Don't repeat Intel's "bigger numbers are better" mistake; 2MB plus faster memory is a better design tradeoff today than 8MB plus slower memory.
For me it was clearly bots. I could practically watch them crawl the site, finding the page for each post and adding a comment...each time from a different IP address even though they were obviously coordinated. I added a comment password, displayed prominently where humans can see it and paste it into the box but where a bot wouldn't have been programmed to look for it. Problem solved; haven't had a single comment spam since. :)
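A minimal sketch of the comment-password check, assuming the form is handed to the handler as a dict. The field name `comment_password`, the password value and the function name are all made up for illustration; the actual site code isn't shown here.

```python
import hmac

# Hypothetical value: it is displayed in plain text next to the comment form,
# so it is not a secret, just something a generic bot won't know to fill in.
COMMENT_PASSWORD = "not-a-bot"


def accept_comment(form: dict) -> bool:
    """Return True only if the submitted form carries the posted password."""
    supplied = form.get("comment_password", "").strip()
    # compare_digest is overkill for a public password, but it costs nothing.
    return hmac.compare_digest(supplied.encode(), COMMENT_PASSWORD.encode())
```

A bot that just fills in the name, URL and body fields gets rejected; a human who pastes the password gets through. A bot written specifically for the site would defeat it, which is fine as long as nobody bothers.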
On my website 90% of the comment spam was from online poker sites. That added up to hundreds of messages per day that I had to delete, and I know many others had similar experiences. I know I was thinking that they deserve a lesson, and maybe some folks decided to teach them one. While I don't necessarily approve of the method, I fully understand the impulse. Many online gambling sites are run by pricks; I won't shed a tear for them and their self-inflicted troubles any more than I would for the RIAA/MPAA.
Taking a hybrid centralized/distributed protocol and making the centralized parts distributed piecewise is so obvious that any clueful person who ever looked at BitTorrent would have thought of it already. Heck, I was doing it before there even was a BitTorrent, taking a hierarchical caching system and creating a distributed version of the root server. It's nice that someone's really doing it for BitTorrent, but unless they're soliciting developers or have a prototype working I wouldn't call it news.
It's not so much that C++ itself is so bloated etc. but that some of the common C++ programming idioms - full of templates, overloading, deeply nested class hierarchies with virtual methods, etc. - can lead to bloat. It's never good to have what looks like a simple assignment statement turn into thousands or even tens of thousands of instructions, but it's particularly dangerous in the kernel. Templates are particularly bad because they get re-instantiated for every type they're used with and that can very rapidly cause executable sizes to balloon out of control. They're a non-solution to a problem that's more than adequately solved by treating types as first-class objects, or by lower-level methods such as explicit dispatch-table pointers.
Sure, you can tell your developers to use a subset of C++ that doesn't have such problems, but what's the point? Why pay the runtime-support and tools cost and require greater vigilance in code reviews just for some syntactic sugar (which is all that would be left)? I'm not absolutely opposed to using C++ in a kernel where it's already supported, but there are costs that would have to be justified compared to just using C.
C++ was designed to be the language of choice for modern operating systems, meant to replace C.
Maybe to some people "operating system" means anything that's not a GUI, but according to standard definitions it is not what C++ was designed for. Even Stroustrup generally avoids any such claims.
Yeah, that's great, but if the redundant copy or parity is stored physically near the primary data it's likely to be taken out by the same scratch...and if it's not then you have a real potential performance problem seeking back and forth etc. Data reliability for denser media really is an issue, as magnetic tape and disk makers have known for decades.
If it's done as a shell namespace extension it's not quite the same as a true filesystem (some programs still won't work with it) but it's a cool hack just for the heck of it anyway. It reminds me of using Apple LaserWriters as compute servers (back when they had more CPU and memory than the Macs that connected to them) or building an SMP out of 8-bit processors.
There's this thing called "fork and exec" which has been out for a while, which very easily enables an application to scale to N CPUs.
Only if your problem decomposes nicely into pieces that can be forked off and basically forgotten about (i.e. very little communication or data sharing between it and the parent).
Apache for example, will nicely scale to lots of CPUs
There are lots of things that are good about Apache, but performance and scalability are not among them. Just about every other server from Zeus and Roxen to thttpd and Boa does better, mostly by abandoning the very fork-based concurrency model you suggest. Anybody with even one percent of a clue about such things knows that forking profligately is one of the best ways to ruin performance even on an OS that does such things well.
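For contrast with fork-per-connection, here is a rough sketch of the single-process, event-driven style: one loop multiplexes every connection through a readiness selector instead of giving each its own process. This is not how any of the servers named above is actually implemented, and `make_echo_loop`, `add_client` and `run_once` are names invented for the example.

```python
import selectors
import socket


def make_echo_loop():
    """Build a tiny single-process event loop that upper-cases what it reads."""
    sel = selectors.DefaultSelector()

    def add_client(conn: socket.socket) -> None:
        # Non-blocking sockets: the loop never stalls on one slow client.
        conn.setblocking(False)
        sel.register(conn, selectors.EVENT_READ)

    def run_once(timeout: float = 1.0) -> int:
        """Service every connection that is ready; return how many were handled."""
        handled = 0
        for key, _events in sel.select(timeout):
            conn = key.fileobj
            data = conn.recv(4096)
            if data:
                conn.sendall(data.upper())
            else:
                # Empty read means the peer closed; drop the connection.
                sel.unregister(conn)
                conn.close()
            handled += 1
        return handled

    return add_client, run_once
```

The per-connection cost here is a registered socket and whatever state you hang off it, not a whole process, which is the point being made about forking profligately.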
I was struck by something while reading this passage:
Most people who look at the source code for open source software don't explicitly look for security bugs. Instead they likely have in mind a particular piece of functionality that they want to augment
Not only is that sort of developer not looking for security bugs, but they're pretty likely to be just getting their feet wet working on that project and might well introduce a bug. Then, there's a significant possibility that nobody else cares about the feature that one developer added to scratch their own itch, so nobody's going to look at the code that implements it. Yes, there are more eyeballs, but those eyeballs are not evenly distributed. There are certain pieces of code that everybody is looking at, and there are vast tracts of code that practically nobody is looking at - none with an eye toward security. How many Linux drivers have you looked at? I'll bet the majority of the people reading this haven't really looked at any Linux kernel/driver code whatsoever. Have you looked at the code for Apache? Perl/Python/Ruby? MySQL? Gcc? Open-source users outnumber programmers a hundred to one, and each developer has a fairly narrow area that they're either interested in looking at or qualified to look at, so the number of eyeballs on some piece of code implementing an unpopular feature in a popular package is nowhere near what some people seem to think. It might be dozens, it might be one, and quite often it will be zero once the guy who wrote it moved on to something else. That's no better than the almost-always-one you'll get with commercial software, and sometimes it's worse.
I worked at Mango, and the product you mention (called Medley at the time) was kind of cool in certain ways, but it wasn't really scalable in the way it would need to be for this purpose. Performance was barely adequate on a switched 100Mb/s Ethernet LAN, but it would have totally fallen apart on anything with more latency than that; if all of the copies of your data are on one LAN they're effectively in the same place. There were plans under way when I left to make a version (effectively a total rewrite) that could tolerate WAN latencies, but I don't think they ever got there.
Farsite. HiveCache. I even worked on a commercial offering: Mangomind (called Medley at the time). Some of these weren't positioned as backup solutions but, structurally, they're just like what Cringely describes. There have been many others, but I'll let people Google for themselves.
Hey, I didn't say the correlation was strong, or solidly proven - just that it was well studied. Even the paper's weak correlation was better than the grandparent poster's wild guess. One of my pet peeves is people who guess about things that they can find out for themselves in about a minute (or should already know).
The simple answer is in the subject line. If you do something that's too much like work, it will seem like work. Even if what you do is explore ideas that occurred to you in the context of work (e.g. infrastructures/algorithms that were deferred until a future release) it's probably going to seem like work. What you need to do is something completely different. For example, my work involves the confluence of kernel programming, distributed systems, and storage. The important parts are all written in C/C++. So what do I do on my own time? I hack on the code that runs my website (in PHP) or a backup/synchronization tool (in Python) or play around with automatic code rewriting (Python again, though it's manipulating C parse trees). Sometimes there's a bit of overlap, but for the most part the programming I do on my own time has a completely different "flavor" than what I do at work. That, plus a recognition that my personal projects will need to be suspended and resumed as higher priorities (work, family life, etc.) intervene, helps keep me happy with programming both at work and at home.
They are Software Engineers using latest-generation tools and languages, design patterns and best practices, object-oriented techniques and integration technologies like message queues, not to mention web services and remoting. And incidentally, they're still employable.
Your attempt to put a very particular type of programmer with a very particular set of habits and preferences and skills (ask a kernel/embedded programmer about "web services" or "remoting") above Graham's mere hackers is an exact mirror of Graham's own "great hackers are just like me" arrogance. If I may presume to make one comment about what makes a great hacker/engineer/programmer, I'd say that it's someone who recognizes that other people can be great without being exactly the same. In fact, they recognize that people they find intensely annoying might also be great. The people who are truly great welcome the chance to compare and contrast their ideas with those who have different design aesthetics, who use different tools, etc. Where do you think all of those design patterns came from? They came from people who were capable of understanding tradeoffs at that level, presented in cookbook form for those who can do little more than follow instructions.
I used to have a Shuttle SV24 and, even after replacing the fan and hacking the voltage down to 7V, it really wasn't as quiet as I'd hoped. Maybe the newer ones with the heatpipes are better, but I haven't seen any evidence solid enough to balance out the price premium. My second try was a VIA Eden in a Morex 688 case, but the power supply was just so crappy that the system wasn't stable, so now it's in a Chyang Fun e-Note, which is stable but much bigger and louder. *sigh* Most of the components from that SV24 are now sitting in a different system based on an Antec Aria microATX case. It's a little bigger and it's still not exactly silent, but it's definitely a lot closer to my goal of a reasonable balance between capabilities and decibels. The simple fact is that, as near as I can tell, there are no systems out there that are both small and quiet without being crippled. For now you have to pick which two matter, or compromise on all three.
Ahhh, I forgot that you were the idiot who suggested leaving many thousands of TCP connections open all the time. I guess I assumed that was so beneath contempt that it couldn't possibly be what you were talking about. Tell you what, Russ. Why don't you go implement a few TCP stacks, then come back to us? Yes, I've done it. Your hand-wave about doing it out in user space with a database doesn't really solve the problem either. For one thing, vendors will not recompile their web servers to use your library for TCP. Just as importantly, putting it out in user space doesn't make the memory-consumption problem go away; as another poster already pointed out, that problem is not just TCP itself but other context that gets layered on top of each connection. People really do have better things to do with their memory and swap space than use it to store context for idle connections, and better things to do with their development time than solve the management problems that such large numbers involve.
You really do need to get some of those connections off the origin server altogether, which is where hierarchical or mesh approaches come in. As I mentioned in a previous post which you obviously didn't read before posting more ignorant garbage, that can still be a pull model using TCP if you feel you have to - it's just that not everyone is pulling directly from the origin. Take a bow yourself, Russ, for being the person in this thread who is most obviously uninformed, unqualified, and uninterested in listening to what others who overcame those failings a decade ago have to say. You're the George Bush of this thread.
This might come as a blow to your ego, Russ, but it's not safe to assume that everyone has read every single suggestion you've ever made. I have no idea what you're talking about when you say you've suggested a solution to some obvious problems that I never referred to...and I don't care. If you wanted to be more helpful than combative maybe you could provide a link to this perfect solution in the places where you attempt to lambaste people for not knowing it.
Plus, with your system, the server would have to send a billion updates every time any new content appeared.
Only if there were a billion separate subscribers querying the origin server directly, but you raise a good point. In the architecture description for a project at my last job, which was pretty closely related to this problem, I described two fundamental causes of wasted bandwidth. #1 was sending the same data over the same link multiple times when one would have been sufficient - the problem with RSS as it currently exists. #2 was sending data over a link once when zero would have been sufficient (i.e. it's never needed at the far end). I think I also referred to these as Scylla and Charybdis, but maybe that was something else. Anyway, the point is that neither extreme works optimally. What you have to look at - this is a variant of the first rule of optimization as described in Hennessy and Patterson's Computer Architecture: A Quantitative Approach - is the relative frequency of each error. How many wasted requests are made with the current system, vs. how many wasted change notifications would there be in the system I proposed above? Also, how would those wasted requests/notifications be distributed?
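To put rough numbers on the two kinds of waste, here is a back-of-the-envelope sketch. The function names and the model are assumptions for illustration only: it treats each subscriber as needing at most one useful transfer per update, and it ignores how the traffic is distributed across links.

```python
def wasted_poll_transfers(subscribers, polls_per_day, updates_per_day):
    """Error #1: polls that fetch data the subscriber already has."""
    total_polls = subscribers * polls_per_day
    # At most one poll per update can return something new.
    useful = subscribers * min(polls_per_day, updates_per_day)
    return total_polls - useful


def wasted_push_transfers(subscribers, updates_per_day, fraction_read):
    """Error #2: pushed updates that are never looked at on the far end."""
    return round(subscribers * updates_per_day * (1.0 - fraction_read))


# Hypothetical feed: 1000 subscribers polling hourly, 2 updates a day,
# and only half of the pushed updates ever being read.
polling_waste = wasted_poll_transfers(1000, 24, 2)
push_waste = wasted_push_transfers(1000, 2, 0.5)
```

With those made-up inputs polling wastes 22000 transfers a day against 1000 for push. Change the ratio of updates to polls, or the fraction actually read, and the answer moves, which is why the relative frequency of each error is the thing to measure.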
I still contend that the "distributed push" system would be an improvement over the status quo, but the approach I actually used in the aforementioned project might be even better. There, it was primarily a "pull" model, though push was supported as well when a reasonable prediction could be made about a need for data (e.g. to deal with demand as people around the world get to work and start their morning surf). Pulling data didn't just pull it to the original requester, though; it also pulled copies into a series of intermediate caches so that future requests near that first requester wouldn't have to go all the way to the origin. Note that such an approach can use either polling or asynchronous invalidation (or both) for consistency, and still benefit from the distributed caching. I know it works, even with full consistency, because I did it over two years ago and it scaled just fine. The subsequent fate of that project has to do with the incredible short-sightedness of corporate weasels at a certain large storage vendor, and doesn't reflect on the technology at all. Maybe if I had some spare time I'd apply some of those ideas (the ones that aren't being patented) to a better distributed-RSS system.
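The pull-through part of that is easy to sketch. What follows is a toy, not the project's code: the class names `Origin` and `Cache`, the in-memory dicts and the downward-only invalidation are all simplifying assumptions, and it leaves out the predictive push entirely.

```python
class Origin:
    """The origin server; counts how many requests actually reach it."""

    def __init__(self):
        self.data = {}
        self.fetches = 0

    def get(self, key):
        self.fetches += 1
        return self.data[key]


class Cache:
    """An intermediate cache that pulls through its parent on a miss."""

    def __init__(self, parent):
        self.parent = parent
        self.children = []
        self.store = {}
        if isinstance(parent, Cache):
            parent.children.append(self)

    def get(self, key):
        if key not in self.store:
            # A miss populates this cache and every cache above it.
            self.store[key] = self.parent.get(key)
        return self.store[key]

    def invalidate(self, key):
        # Asynchronous invalidation, modelled here as a plain recursive call.
        self.store.pop(key, None)
        for child in self.children:
            child.invalidate(key)
```

Hang two leaf caches off one regional cache: the first leaf's request goes all the way to the origin, the second leaf's request for the same key stops at the regional cache. After an invalidation from the top, the next request refetches once and everyone nearby shares the result again.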
Like Akamai and other similar distributed content providers. That's what they were invented for.
...except that they don't really propagate update notifications. Just data. It's really a cache-consistency problem in disguise, but I didn't expect most slashdotters to grok that.
BitTorrent as currently constituted would not work, because it's necessary to propagate change notification as well as data and BitTorrent doesn't do that. The closest technological fit, amusingly enough, predates P2P as most people know it. It's good old NNTP, which had to deal with the exact same problem over a decade ago and still does so even for very large networks.
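A rough sketch of the idea being borrowed: flood a change notification through a graph of peers, with each peer declining anything whose ID it has already seen. It is loosely modelled on how news servers use Message-IDs to avoid accepting the same article twice; it is not the NNTP protocol itself, and `flood`, `peers` and `history` are names invented for the example.

```python
from collections import deque


def flood(peers, history, start, message_id):
    """Flood one notification from start; return how many transfers happened.

    peers maps a node to the neighbours it exchanges notifications with.
    history maps a node to the set of message IDs it already holds.
    """
    history[start].add(message_id)
    transfers = 0
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in peers[node]:
            if message_id in history[neighbour]:
                continue  # neighbour already has it and declines the offer
            history[neighbour].add(message_id)
            transfers += 1
            queue.append(neighbour)
    return transfers
```

Every peer ends up with the notification after exactly one accepted transfer each, no matter how many redundant links the mesh has, and re-offering the same ID costs nothing beyond the declined offers.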
When are you going to start talking about big iron, and not this piddly two-CPU stuff?
Actually it's a pretty well studied connection.
"What great hackers have in common is they're a lot like me (or at least like I imagine myself to be)"
What an amazing ego.