The vtbl is not ever written to. On a unix system it is part of the ELF or a.out code area. The linker figures out how it looks, and it is not changed at runtime.
If you think about it for a while, you'll realize that there are situations that the linker can't handle, and therefore there must be run-time patch-ups for at least those cases. You really should be more careful about using words like "never".
using a ld.so that it got from FreeBSD, which uses a ld.so borrowed from, or inspired by the Linux version
And I suppose those are the only OSes that matter, eh?
ld.so does similar things for the C code, and it also has minimal OS support (the mprotect(2) call, make the stub tables non-executable+writable, change them, make them executable+read-only)
And do you suppose that mprotect is free? Or might this be one of those hidden costs whose existence you've been denying?
That's an opinion based on whether you view it as a kludge or not.
You got that backwards. I view it as a kludge or not based on the effect it has, instead of assuming it's not a kludge and then trying to deny effects to back up my opinion. IMO if it slows down the system as a whole *or* if it makes code elsewhere significantly more complex to support it, it's a kludge.
The right thing to try would be to convert from JMP JMP to an indirect JMP. If the indirect JMP is faster then you are right for that workload. If the JMP JMP is faster then I'm right for that workload.
Not quite. The whole point here is that it's not sufficient to compile a C++ program and run it and compare the timings. It's also important to factor in the overall performance, maintainability, and other costs of making that program run faster and supporting the hacks that it uses. Remember what I said about hidden costs, or shifting load?
Yeah but the assumption would probably be 16 bytes because that is a really really common number
Yeah, and nobody ever got in any trouble by forgetting the difference between "really really common" and "universal for all time" right?
There is no need to stoop to insulting your debating partner.
When my debating partner is obstinately straying from the rules of debate, I actually do feel they deserve a little slap on the wrist. The crux of this whole debate is your statement (in cid#635):
The hidden costs the other poster was talking about [me, in cid#595] don't exist.
What annoys me is not that the statement was made, but that it wasn't retracted the first time it was refuted. Instead, I've had to put up with your topic changes, buzzword storms, squishy definitions, and all manner of other evasions. Frankly, I don't appreciate the extra work. I wouldn't treat you like an errant debate pupil if you'd stop acting like one.
C++ using the double jump doesn't make these problems any worse
Ahhh, but it does. On an architecture designed around the "i-space modification is rare" assumption, writing to a vtbl *even* at object-creation or class-loading time incurs a substantial overhead in exception handling, VM activity, the aforementioned cross-processor interrupts, etc. This is different from the modifications that must occur at image-load time (including DSO-load time) because those have distinct boundaries and the OS can treat pages differently during that period than afterward. Maybe if parts of the C++ runtime were integrated into the OS loader this could be handled more efficiently, but that's a heinous idea for other reasons.
Similarly, the whole point of the double-jump seems to be to abuse the BTB for performance. I call it abuse because every method pointer that's stuffed into the BTB is one less BTB entry that can be used for *real* branches. Also, the BTB is just a small, very fast special-purpose cache; there's another cache - the L1 - right nearby that could also contain that same information. So you save yourself a cycle on the method dispatch (if repeated) by using the BTB instead of the L1, in return for which you create a nice fat pipeline bubble for someone else when they hit a branch that would have fit in the BTB if not for your shenanigans. That's not a win, it's just shifting the load.
what do you define "false sharing" as?
I sincerely hope you're asking how false sharing applies to this particular situation, not what false sharing is, because if you meant the latter then you should be reading H&P instead of posting here. False sharing is an issue because a single cache line on a modern processor is likely to span multiple vtbl entries. Naive vtbl-patching code that does manual icache invalidation would therefore be likely to go through all that overhead multiple times. Ick. The only alternative would be to have the vtbl-patching code be *deeply* aware of the local machine's cache line size (i.e. not just hidden in some memory-munging library routines). Also ick. That kind of machine-specificity needs a reason, and there just doesn't seem to be much of one so far.
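To put rough numbers on the false-sharing point, here is a minimal sketch. The 64-byte cache line and 8-byte vtbl entry are assumptions chosen for illustration, and `lines_to_invalidate` is a hypothetical helper, not anything from a real runtime. It compares one invalidation per patched entry against one per affected cache line.

```python
CACHE_LINE_BYTES = 64  # assumption: a typical line size, not universal
ENTRY_BYTES = 8        # assumption: one pointer-sized vtbl slot

def lines_to_invalidate(table_base, patched_slots,
                        line_bytes=CACHE_LINE_BYTES, entry_bytes=ENTRY_BYTES):
    """Return the sorted base addresses of the cache lines that hold the patched slots."""
    lines = set()
    for slot in patched_slots:
        addr = table_base + slot * entry_bytes
        lines.add(addr - addr % line_bytes)  # round down to the start of the line
    return sorted(lines)

base = 0x1000
patched = [0, 1, 2, 7, 8]

naive_invalidations = len(patched)             # one invalidation per patched entry
batched = lines_to_invalidate(base, patched)   # one invalidation per affected line
```

With these assumed sizes, slots 0 through 7 share a line, so five naive invalidations collapse to two. Note that the batched version only works because the line size is baked into the patching code, which is exactly the machine-specificity complained about above.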
Care to let me know what the topic is?
The same as it has always been, Sparky: whether double jumps as an alternative to indirect jumps are a reasonable or sucky idea. If you're having trouble making the connections between the issues we're discussing and that basic point, let me know and I'll dumb it down a little more for you.
I'm sure we could have a very interesting discussion about the relative merits of double jumps vs. indirect jumps if you'd cooperate, because you seem to know more than most /.ers about how CPUs work. However, as long as you're going to deny that these systemwide costs exist at all - things like false sharing, extra interprocessor communication in an SMP system to do TLB shootdowns, pollution of the BTB when the regular L1 cache is damn near as good - then that's not going to happen. How disappointing.
You can bitch that the vtbl being immutable is a bad requirement, and prevents C++ from being as flexible as Ruby. But that is a different topic.
No, it's the same topic because it impacts the same solution.
You can argue that there are hidden costs in how the CPU thinks about i-space, but that is a very different argument.
No, it's the same argument because it impacts the same solution. Please stop trying to redefine the topic to suit yourself.
With the sole exception of the x86 all modern CPUs are broken? They all assume that i-space is very seldom altered
"Seldom" is not equal to "never", and we were talking about the assumption that i-space would *never* change because that's the only assumption that would make the proposed solution seem reasonable.
From a system standpoint making self modifying code faster makes everything else slower.
We're not talking about self-modifying code, as much as you seem to be hoping that the taint associated with that phrase will stick to anyone who disagrees with you. We're talking about mutable data in i-space, and about the nasty hack of using double jumps with the intermediate target in i-space to "trick" CPUs and make method dispatch a cycle or two faster without considering the effect of such a hack on the rest of the system.
But you're almost right. Making this particular hack work faster makes the rest of the system slower. That's exactly the point. Congratulations on finally getting it.
If there are few bugs found and fixed, you can be confident you have a good system going out. If there are a lot of bugs found and fixed, I would worry that there are a lot more left undiscovered.
I disagree. If there are few bugs found and fixed, that could be a sign that the testing is inadequate. I once worked on a project where the customer expressed a concern about our bug counts being too low. This seemed counterintuitive at first, until they explained that in their (very broad) experience this more often indicated something about testing than about the product. Sure enough, we stepped up testing, found more bugs, and went on to deliver a much higher quality product than if we had sat around patting ourselves on the back for our low bug counts.
What's more important than the absolute bug count is the *trend* in bugs reported. Every non-trivial project can be expected to exhibit an increasing rate of bug reports as QA ramps up and as components become ready for testing, followed by a plateau, followed - hopefully - by a steady decrease as the bugs get nailed without introducing new ones. If the plateau goes on too long, or if the extrapolated decrease at the tail end does not reach an acceptable level by shipment time, there's a problem. If the rate starts to go back up at that late a stage, there's an even bigger problem: your programmers are getting desperate, and checking in "fixes" that introduce more new bugs. If you don't see a "hump" at all, it probably means your testers aren't doing their jobs (or possibly not getting the help they need from developers).
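The patterns described above can be checked mechanically. The following is only a rough sketch: `assess_trend`, its weekly-count input, the linear extrapolation, and the message strings are all assumptions made for illustration, and a real project would want something less crude.

```python
def assess_trend(weekly_reports, weeks_to_ship, acceptable_rate):
    """Rough health check on a list of weekly bug-report counts."""
    peak = max(weekly_reports)
    peak_week = weekly_reports.index(peak)   # first week at the peak rate
    tail = weekly_reports[peak_week:]        # plateau plus decline

    # No ramp-up before the peak, or a flat line: there is no hump at all.
    if peak_week == 0 or len(set(weekly_reports)) == 1:
        return "no hump: testing may be inadequate"
    # The most recent week is worse than the one before it, after the peak.
    if len(tail) >= 2 and tail[-1] > tail[-2]:
        return "rate rising again: fixes may be introducing bugs"
    if len(tail) < 2:
        return "still ramping up"

    # Linear extrapolation of the average change per week since the peak.
    slope = (tail[-1] - tail[0]) / (len(tail) - 1)
    projected = tail[-1] + slope * weeks_to_ship
    if projected > acceptable_rate:
        return "decline too slow for ship date"
    return "on track"

healthy = assess_trend([2, 5, 9, 14, 14, 13, 10, 7, 4], weeks_to_ship=2, acceptable_rate=2)
stuck = assess_trend([2, 6, 12, 12, 12, 11], weeks_to_ship=2, acceptable_rate=2)
```

A long plateau shows up here as a shallow slope, so it fails the same extrapolation test as a decline that is merely too slow.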
All of these patterns are readily apparent if you know to look for them. I'd say you're right that the purpose of QA is statistical, but I'd say it's a statistical measure of *process* as well as product quality, and if the process is broken you can't know anything about the product.
Severity vs. Priority
As several other posters have pointed out, it's important to make a distinction between severity and priority. I would also say that it helps to define severity as narrowly as possible, to refer only to the effect of a bug when it occurs. I've already seen some people who accept the severity/priority distinction make the mistake of letting other factors (e.g. availability of a workaround) be factored into severity when they really only affect priority. Priority is the "net net" of severity, frequency, relationship to core/extended functionality, availability of a workaround, and possibly even cost to fix.
Factors affecting severity
Even within a limited definition of severity, there are multiple factors to be considered - specifically scope, magnitude, and duration of damage. Scope is a matter of what was affected: the whole system, the program, access to particular features. Magnitude is a matter of how complete the failure is: did the affected component(s) fail totally, did it hiccup and then continue, did it suffer a performance degradation, etc. Lastly, duration is simply a question of transient vs. permanent errors. When you've characterized the bug according to these three criteria it's pretty easy to use a point system to sum it up into a single severity value (though it never hurts to record the details). Likewise, once you know severity you can use a point system incorporating that and other factors into a priority value. Obviously the assignment of point values can be a complex and contentious process, but it only needs to be done once and once it's done the assignment of both priorities and severities becomes so much clearer that it's worth the effort.
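As an illustration of such a point system, here is a minimal sketch. Every point value, category name, and field below is made up for the example; the only thing taken from the text is the structure: severity comes from scope, magnitude, and duration alone, and priority combines severity with the other factors.

```python
from dataclasses import dataclass

# Hypothetical point tables; real values would be argued over once, then fixed.
SCOPE = {"feature": 1, "program": 2, "system": 3}
MAGNITUDE = {"degraded": 1, "hiccup": 2, "total": 3}
DURATION = {"transient": 1, "permanent": 2}

@dataclass
class Bug:
    scope: str
    magnitude: str
    duration: str
    frequency: int            # 1 = rare .. 3 = constant (assumed scale)
    core_functionality: bool
    has_workaround: bool

def severity(bug):
    """Effect of the bug when it occurs, and nothing else."""
    return SCOPE[bug.scope] + MAGNITUDE[bug.magnitude] + DURATION[bug.duration]

def priority(bug):
    """Higher means more urgent. A workaround lowers priority, never severity."""
    points = severity(bug) + bug.frequency
    if bug.core_functionality:
        points += 2
    if bug.has_workaround:
        points -= 2
    return points

rare_crash = Bug("system", "total", "permanent", 1, False, True)
constant_glitch = Bug("feature", "degraded", "transient", 3, True, False)
```

With these made-up weights the rare crash is far more severe (8 vs. 3) but the constant glitch in core functionality ends up with the higher priority (8 vs. 7), which is the kind of outcome the severity/priority split is meant to make visible.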
Other Points
Just some random thoughts about bug classification, in no particular order.
In some systems, one is the highest priority/severity. In other systems it's zero. It helps to be clear which kind of system you're talking about.
Bug severity is specialty-specific. For example, most people consider a host crash to be the most severe kind of error. In data storage, though, hosts can come and go but data loss or corruption are the most severe kinds of problems. It's not hard to think of examples in military or medical applications where other types of bugs are considered most severe.
Saying "it's irresponsible to ship with priority one bugs" without even making a distinction between severity and priority is itself irresponsible. It may in fact be reasonable to ship with a severity one bug if it only affects a certain configuration, occurs very rarely, a workaround exists, it would be very expensive to fix, etc. As always, the risk has to be weighed against the benefits - e.g. to customers who might be waiting anxiously for the new release to fix problems they already have.
If bug A is a "follow-on" to bug B, A cannot have a higher priority than B. This seems so obvious, and yet so many companies seem to miss it.
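The follow-on rule in the last point is easy to check automatically. A minimal sketch, assuming a system where a higher number means higher priority (see the first point above) and using hypothetical bug ids:

```python
def follow_on_violations(priorities, follows):
    """List bugs ranked above the bug they are a follow-on to.

    priorities: bug id -> priority points (assumption: higher = more urgent)
    follows:    bug id -> id of the bug it is a follow-on to
    """
    return sorted(bug for bug, parent in follows.items()
                  if priorities[bug] > priorities[parent])

priorities = {"B": 5, "A": 8, "C": 9}
follows = {"A": "B", "C": "A"}   # A follows B, C follows A
violations = follow_on_violations(priorities, follows)
```

The sketch only reports violations; whether to lower A or raise B is a decision for whoever owns the bug list.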
Exactly. The hidden costs the other poster was talking about don't exist.
You're so full of crap. Yes, those hidden costs do exist and must be paid *even though* the contents of the table are in fact effectively constant. That's the problem. You still have to invalidate the i-cache, you still have to forego VM-level optimizations, etc. because the values *could* change.
Well, it doesn't so much fool the CPU into thinking the location is constant as to actually inform it that the location is constant.
It does no such thing. Any CPU that makes such an assumption about the immutability of i-space could be considered broken. That breakage can be worked around by having the VM system play nasty tricks with making i-space pages read-only etc., but the cost of having to cover for the CPU's deficiencies like that is much greater than the benefit. Try looking at the problem from a *system* standpoint for a change, instead of a myopic "how can a CPU designer avoid work" standpoint.
If you are coding an I/O intensive application, chances are that the scripting language will run about as fast as your tuned C or assembly. Your hard drive or your network card will usually hobble the most carefully tuned C or assembly.
What if you're coding the firmware for that hard drive, or the driver for that network card? Where's your scripting language then? That's why a lot of people use C, and that's the code to which I think the previous author was referring.
It's very hard to write good code in low-level languages.
Hard? Yes. Impossible? No. Many people have to write in C because that's the only language supported in their environment (e.g. a kernel or RTOS). These are often systems where the requirements for reliability, maintainability, etc. are very high, and the quality tends - of necessity - to be correspondingly high. My C is more object-oriented than 90% of the code I've seen in languages designed for OOP, for example, and I'm sure I'm not the only person for whom that's true.
Note that the previous author didn't say that the best code is *always* or even *usually* written in very low-level languages. He just said that it *often* is, and that's true.
Many CPUs (pretty much everything other than the x86 -- and other really old things like the 390) require an (i-)cache invalidation between modifying code and executing that code. The cache invalidation will also invalidate the BTB. So the CPU can feel free to use the BTB to optimize a JMP-JMP sequence, but not to optimize an indirect jump.
That seems like a classic example of not counting all the costs. The double jump might in and of itself be less expensive than an indirect jump, but there are hidden costs involved:
Manual cache invalidation isn't cheap. It requires a lot more interlocking within the MMU than a typical instruction, so you pay a penalty every time you create an object.
Invalidating i-cache may blow away unrelated (but needed) instructions because of false sharing.
The object-creation code is now messier and more system-dependent.
Mixing instruction and data spaces precludes a whole class of VM-system optimizations.
When you do count up all the costs, using double jumps is a tremendously stupid idea. The double jump itself might be faster than an indirect jump, but that's outweighed by the overall negative effect on the system as a whole.
It's as convenient as having an automatic transmission in your car. No need to shift manually. That's one of the reasons why I use those sort of high level languages.
And I drive a manual, because I have yet to find an automatic that shifts in a way I consider reasonable. ;-)
You miss my point. I'm not obsessing about using bignum. I'm obsessing about numbers just being numbers in a high level language. I just want _numbers_
So just use bignums, not automatically-promoting small integers. Your numbers will just be numbers, and you'll never have a problem. Small integers should only be used in a higher-level interpreted language when it's *important* for them to be small numbers, and therefore probably important for them to *stay* small numbers.
I'm not a great programmer, so I'd let the experts do that sort of stuff.
So maybe you should listen to those experts when they say that *if* you took the trouble to specify a particular integer size other than the largest available you probably mean for it to stay that size.
But so far looking at Bugtraq, there aren't that many experts in writing bulletproof stuff in low level languages.
Auto-promoting integers won't help with that. In fact, Ruby won't help with that. If you want to approach that particular problem (secure systems) at all through language design, you'd have to go to a language with at least some of the following features:
Mandatory specification of actual integer ranges, with strict checking.
Total type-safety.
Strict access checking (e.g. capabilities).
Sandboxing.
Enforcement of arbitrary programmer-defined correctness conditions (not just assertions).
Does Ruby fit that requirement? No. Nothing that borrows so heavily from Perl will ever even come close. Among widely-adopted languages, Ada or Java probably come closest, and there are a bunch of academic languages that go even further. Ruby might eliminate a couple of classes of security problems arising from overflows and such, but the same could also be said for most other scripting languages. Python, Perl, or Tcl programs are as secure as Ruby programs, and VB less so only because of usage rather than the language itself.
There are a few FC-AL enclosures, but only their very biggest systems use FC-AL drives, the rest use SCSI for the drives via controllers / convertors. I wonder how long it will take for prices of all-FC systems to drop to "human" levels..?
Firstly, FC-AL is pretty much dead for high-end systems, which all use switched FC nowadays.
Secondly, the very biggest storage systems out there still use SCSI inside, not FC...but it doesn't matter. It doesn't matter at all what's *inside* the box, because the whole point of such a system is to achieve high performance through aggregation of small channels and/or to avoid the channels altogether by using a huge cache. Sure, the drives are SCSI, but there are hundreds of them, on dozens of separate SCSI buses. There's plenty of internal bandwidth to make the drives do all that they're capable of doing, so the question is not "why use SCSI when FC is faster" but rather "why use FC when SCSI is fast enough".
Given that Plain Old SCSI is still doing yeoman's service as a back-end interconnect, and that FC components will probably always be more expensive than SCSI components, there's just no benefit to being "all-FC". Anyone who says otherwise is just putting marketing spin on a questionable engineering decision.
Disclaimer: I work for EMC, but I don't speak for them (nor they for me). The above is all public information, personal opinion, and simple common sense, unrelated to any EMC trade secrets or marketing hype.
What is the point of throwing an exception if a number is too big? Hey Computer Language/Program, I told you to multiply two numbers, so you just do it.
What if the result is bigger than it was ever "supposed to be" according to some program-specific constraint? What if an index or hash or address calculated from the new value causes you to trash memory or return the wrong data? What if it only happens once in a while, and you never knew the offending code had anything to do with those mysterious crashes that just seemed to happen "every once in a while" so that you *thought* it had gone away but you actually shipped with the bug still there? Undisciplined programmers might not like it, but being forced to think about and deal with boundary conditions properly is A Good Thing for software quality.
And if you ask me, counting to infinity is preferable to wraparound.
For you. In your opinion. Not for everyone. That's just the problem: languages that do these sorts of things are making decisions that programmers should be making. Sure, having an integer type that can handle larger sizes but that uses more efficient smaller ones when the values fit might be very convenient, but having that be a default, unchangeable property of the smaller integer types is bad.
If the programmer really wanted a wraparound, then the programmer would be more likely to think of writing it in.
And if the programmer wanted to handle large integer values, they should use a large integer type. The point is that *the programmer should choose*. If you have an exception, you can catch the exception and choose what to do with it, but you can't just punt. If you want to switch to using larger integers, fine. If you want to allow wraparound, fine. If you want to treat it as a fatal error, fine. What's important is that you as the programmer have to make an explicit choice about what to do. Abdication of that choice, expecting the language to take up the slack, is not an option.
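Here is a minimal sketch of what "the programmer should choose" looks like in practice. Python integers never overflow on their own, so the 32-bit limit, the `IntOverflow` exception, and the policy names are all assumptions invented for the example.

```python
class IntOverflow(ArithmeticError):
    """Raised when a result does not fit the declared integer size."""

INT32_MIN, INT32_MAX = -2**31, 2**31 - 1

def checked_mul(a, b):
    """Multiply, but refuse to return a value outside the 32-bit range."""
    result = a * b
    if not INT32_MIN <= result <= INT32_MAX:
        raise IntOverflow(f"{a} * {b} does not fit in 32 bits")
    return result

def wrap32(value):
    """Two's-complement wraparound into the signed 32-bit range."""
    return (value + 2**31) % 2**32 - 2**31

def mul_with_policy(a, b, policy):
    """The caller states up front what an overflow should mean."""
    try:
        return checked_mul(a, b)
    except IntOverflow:
        if policy == "promote":   # switch to a larger integer
            return a * b
        if policy == "wrap":      # explicitly allow wraparound
            return wrap32(a * b)
        raise                     # anything else: treat it as fatal

promoted = mul_with_policy(2**16, 2**16, "promote")
wrapped = mul_with_policy(2**16, 2**16, "wrap")
```

All three outcomes are available, but none of them happens unless somebody asked for it.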
I've written machine code programs where cycle counting was important - all the various paths had to take exactly X cycles. Maybe I'll do that sort of stuff again if I have to, but to do nitty gritty stuff like this for every program is a waste of time.
That's exactly it. If it's such a waste of time worrying about nitty gritty stuff like integer sizes, then why are you obsessing over just using a bignum for something that might hold a large value? It's just not a problem for most people or programs. The int-to-bignum thing is purely a performance hack; if performance is that critical you shouldn't be using an interpreted language anyway, and the difference between multiplying ints or multiplying bignums only matters to the cycle-counters. People who write in Ruby - or Python, or Perl - should be focusing more on program correctness and functional issues than on micro-optimizing interpreted code or expecting the language to do it for them.
You've added 18.8 ns over and above any protocol overhead (usually much worse) and that's at 10 Gb/s!
19 cycles for a 1GHz processor is actually not too bad; good large-system memory interconnects today are in the hundreds of cycles for anything but the very nearest memory, and even that latency can be pretty well hidden in NUMA or MT systems. The serial nature of the interconnect is simply not that big a performance issue in real memory-system design; the simplicity/physical benefits of serial protocols and cabling are much more important.
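The conversion behind that figure is just nanoseconds times clock rate. A tiny sketch, where `latency_in_cycles` is a hypothetical helper and the inputs are the 18.8 ns and 1GHz from the text:

```python
def latency_in_cycles(latency_ns, clock_ghz):
    # 1 GHz is one cycle per nanosecond, so cycles = ns * GHz.
    return latency_ns * clock_ghz

serial_cycles = latency_in_cycles(18.8, 1.0)   # 18.8, i.e. roughly 19 cycles
```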
Plus, you can have fibre channel (not fiber) hard drives right now, from Seagate, IBM, etc., and the big storage guys are heading that way too.
Apple figured this out long ago when they came up with NuBus.
Apple didn't invent NuBus. My flaky memory tells me it was TI, which could be wrong, but it wasn't Apple. Apple merely selected NuBus for the Mac II, from among several alternatives that already existed at the time.
I have to agree with all the people who say that much of the problem has to do with the routing protocols in common use on the Internet. IMO part of that problem is that everyone has gone to link-state protocols; protocols in this family have certain desirable properties wrt loop-freedom and optimality, but slow convergence is a known problem with this approach. Personally, I've always been a distance-vector guy.
All of this came back to me recently as I was reading Ad Hoc Networking by Charles Perkins. It's about protocols intended for use in environments where mobile nodes come and go relatively frequently, where the links go up and down as nodes move relative to one another, and where there's no central authority to keep things organized. A lot of this work has been done in a military context - think of a few hundred tanks connected via radio, rolling across a large and bumpy battlefield. It turns out that distance-vector protocols are making a comeback in this environment because of their faster convergence and lower overhead compared to link-state protocols, and researchers have pretty much nailed the loop-formation and other issues. It also turns out that a lot of the techniques that have been developed for this very demanding environment could be useful in the normal statically-wired Internet, not just in terms of robustness but also in terms of giving power over connectivity back to the people instead of centralizing it in huge corporations.
I strongly recommend that people read this book, to see what's happening on the real cutting edge of routing technology. In particular, anyone working or thinking of working on peer-to-peer systems absolutely must read this book, because it describes the state of the art in solving some connectivity/scalability problems that many P2P folks are just stumbling on for the first time. I've seen many of the "solutions" that are being proposed to these problems in the P2P space; I can only say that P2P will not succeed if such stunning and widespread wilful ignorance of a closely related field persists.
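For anyone who hasn't met the distance-vector family, the core idea fits in a few lines. This is a sketch of the plain textbook scheme (Bellman-Ford style vector exchange), not of any of the ad hoc protocols from the book, and the three-node topology, function names, and synchronous rounds are assumptions for illustration.

```python
INF = float("inf")

def dv_round(tables, links):
    """One synchronous exchange: each node merges its neighbours' vectors.

    tables: node -> {destination: best known cost}
    links:  node -> {neighbour: link cost}
    Returns (new tables, whether anything changed).
    """
    new_tables = {}
    changed = False
    for node, neighbours in links.items():
        vector = dict(tables[node])
        for nbr, link_cost in neighbours.items():
            for dest, cost in tables[nbr].items():
                candidate = link_cost + cost   # cost of reaching dest via nbr
                if candidate < vector.get(dest, INF):
                    vector[dest] = candidate
                    changed = True
        new_tables[node] = vector
    return new_tables, changed

def converge(links):
    """Repeat exchanges until no node learns a better route."""
    tables = {node: {node: 0} for node in links}
    rounds = 0
    changed = True
    while changed:
        tables, changed = dv_round(tables, links)
        if changed:
            rounds += 1
    return tables, rounds

links = {"A": {"B": 1}, "B": {"A": 1, "C": 2}, "C": {"B": 2}}
tables, rounds = converge(links)
```

Each node only ever talks to its direct neighbours and keeps one cost per destination. The sketch deliberately leaves out link failures, which is where the loop-formation issues mentioned above come in.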
Well, everyday on Bugtraq I see tons of examples of programmers who need to be saved from themselves.
You're seriously misinterpreting the significance of what you see on Bugtraq. Those are the results of programmer errors that *weren't even caught*, but we're talking about what should happen after the error is caught. The problem with trying to "save programmers from themselves" is twofold:
Truly saving the programmer would mean guessing correctly what the programmer really meant. Software - written by other stupid programmers - can't do that with any accuracy. More often than not, you just end up exchanging one kind of error (e.g. wraparound to zero) for another (e.g. counting to infinity). Quite often, you'll be worse off than if you weren't trying to be so "clever".
If you cover the programmers' asses, they won't learn to cover their asses themselves. They will persist in sloppy habits, sooner or later they'll encounter a situation where the safety net they've come to rely on isn't there or is faulty (see above), and they'll create still more bugs.
The alternative, which I and a great many others (who've either studied this carefully or learned the hard way) prefer, is not to mask errors but to make them as explicit as possible as soon as possible. Force the *programmer* to make the decision about what should happen in that boundary case. If you can't do it with an error at compile time, do it with an exception at run time. The programmer will have no choice but to provide a *real* fix, and both the software and the programmer will be better for it.
If you're worried about an unhandled exception causing a crash, don't be. My experience in both high-availability systems and in security has taught me that a crash is better than a hang every time, and *way* better than the spooky unreproducible non-linear behavior that often results from software that tries to second-guess programmer intentions.
* Perlisms such as "@" and other syntactic markers make me ill. It's like scope by indentation, but much better...
Other than your personal (anonymous) opinion, what argument can you give for it being better?
* The syntax for iterators is ugly. Try using them...
Been there, done that, not in Ruby but in other languages. I stand by my claim that Ruby's syntax for them is ugly. I would much prefer a statement-like syntax over that gobbledygook.
Ruby has overloading only by different number of arguments, if that matters.
Untrue. It also has operator overloading, which is the most heinous kind.
While we're here, let's consider another of Ruby's warts:
return @songs[i] if key == @songs[i].name
That's straight from the Ruby book (chapter on blocks and iterators) and it's a construct only a Perl weenie could love. Let me give a non-programming equivalent:
To defuse the bomb, cut the red wire...if there's no blue wire.
Good way to get blown up, eh? Really, there's just no excuse for countertemporal counterintuitive crap like that in a language.
As I've said before, Ruby has some good features but also some warts. At least it's not all wart like Perl or Tcl, but there are other languages that also have good features and some warts. Compared to those languages, the case for Ruby's superiority is extremely weak at best. Use it if you like, have fun, be productive. More power to you. Just don't try to act all superior because your language choice is better than someone else's.
The thing is, ".length" is something you always can use. While len(x) only works for a limited set of types.
Again, big deal. It's not actually *useful* for x.length to return a value if x is an integer. The only possibly meaningful value for an integer would be the size of the integer, which is (a) not the same concept as the length of a list/array/hash, and (b) might change anyway due to Ruby's sneaky bignum conversions. In fact, having x.length return a value for an integer could be a source of error. If I expect an integer in Python and apply len() to it accordingly, I get a nice juicy exception to tell me I screwed up. If I do the equivalent in Ruby, I get a value and go blithely on my way to screw things up even worse.
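The Python half of that claim is easy to demonstrate. A minimal sketch, where `describe_length` is a hypothetical wrapper that exists only to make the exception visible:

```python
def describe_length(x):
    """Return len(x), or a description of the error if x has no length."""
    try:
        return len(x)
    except TypeError as exc:
        return f"error: {exc}"

list_result = describe_length([1, 2, 3])   # a list has a length
int_result = describe_length(5)            # an int does not, so len() raises TypeError
```

Without the wrapper, the TypeError would propagate and stop the program at the point of the mistake, which is the behaviour being praised.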
No, thanks. Orthogonality is nice, but sometimes life just doesn't break down into nice orthogonal pieces. It's like the old saying about things being as simple as possible, but no more so.
Personally, I find ruby a lot simpler to use -- but python is always my second choice, and I have to be dragged into perl...
We all have our own tastes and biases. Personally, my few complaints about Ruby are pretty similar to many people's complaints about Python. For example, the #1 complaint about Python is obviously the whitespace thing. I have to admit it's not my favorite feature and I might very well have made the decision differently in GvR's place, but that's the way it is and instead of whining I cope and get on with my work. So maybe it doesn't make sense that I find many of the "cosmetic" or secondary aspects of Ruby so infuriating. For example:
Perlisms such as "@" and other syntactic markers make me ill.
Using ">>" as a comment delimiter just feels wrong.
The syntax for iterators is ugly. While the idea seems cool in the abstract, the more I think about iterators the less I like them. In general, I'd prefer that complex behaviors be encapsulated in functions or objects, not in arbitrarily complex expressions that end up spreading over several lines. If you want that, use LISP.
None of that should bother me, but it does. I also have more substantive complaints with Ruby, just to show that appearance isn't the only thing I care about:
The private/protected/public stuff borrowed from C++/Java is just crap. IMO a language should either have a real data-hiding system or just do without (like - surprise! - Python).
Mixins just aren't the same as multiple inheritance. No, they're not. Really.
Ruby advocates love to justify mixins by pointing out how much trouble you can get into with MI - and then Ruby supports operator overloading, which is the #1 most effective mechanism used by C++ programmers to make a mess of things. I've heard that overloading can be done right, but I've stopped believing the claim. I've never once seen a real-world example, and many coding standards rightly disallow this blighted misfeature's use.
To be fair, Ruby also has its strengths:
Making everything a first-class object, accessed similarly, is at some level The Right Thing To Do.
Virtual attributes are a good thing.
Real GC is a good thing.
It looks like the extension interface does allow subclassing of C types, which would be a nice thing to have in Python.
So, when you add it all up, what do you get? Basically, Yet Another Language. Sure, it has some neat aspects. It also has some warts. Just like Python, pretty much (see my web page for some discussion of Python's warts). As a first script language, Ruby seems very much worth consideration, but I don't see any reason why someone already familiar with a decent language - almost anything but Perl or Tcl - should switch. In particular, for large or long-lived projects, I think Ruby's borrowing from Perl and C++ raises some genuine doubts about maintainability.
I'd rather Ruby (and Python) did the scoping without *any* lexical gyrations.
From a compiler or interpreter writer's point of view "@" is more of a lexical gyration than "self" - which is just an entry in a namespace, not syntactically interesting at all.
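A quick Python sketch of what I mean on the Python side (the class is made up for illustration): "self" is not a keyword, just the conventional name for the first parameter, so any other identifier behaves the same way.

```python
import keyword

class Counter:
    # "this" works exactly as well as "self": the first parameter is
    # only a name bound in the method's local namespace.
    def bump(this, step=1):
        this.count = getattr(this, "count", 0) + step
        return this.count

c = Counter()
print(c.bump())                   # 1
print(c.bump(5))                  # 6
print(keyword.iskeyword("self"))  # False
```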
Part of the problem with Python's object model comes from how it is introduced. If you slog through the books, or the tutorial, it really gets short shrift.
That's simply incorrect. Both the tutorial and the book - I assume you mean the Lutz book - give OO a quite reasonable amount of attention.
And once you get used to the concept of iterators (we are all used to the concept of classes, loops, etc.), iterators are marvelously clear.
Perhaps so, perhaps not. Almost any language construct can be used to obfuscate code, and iterators are no different. Also, while I'm not generally in favor of catering to the lowest common denominator, the fact that iterators are slightly "exotic" with respect to other common languages does have implications for readability.
If you want the length of a string: "Hello".length;
BFHD. Yippee. I just don't find x.length all that superior to len(x). If "length" acts *exactly* like any other method - e.g. accessible by name, can be passed as a bound method, etc. - then maybe there's a certain aesthetic coolness to it, but other than that it's pretty meaningless.
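For comparison, here is a small sketch of the Python side of that criterion. It only shows that len() is backed by an ordinary method that can be fetched by name and passed around; it makes no claim about how Ruby behaves.

```python
s = "Hello"

print(len(s))                    # 5

# The method behind len() is accessible by name and usable as a bound method.
bound = getattr(s, "__len__")
print(bound())                   # 5

# len itself is a first-class function, so it can be passed like any other.
print(list(map(len, ["a", "bb", "ccc"])))   # [1, 2, 3]
```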
You can never get overflow, numbers are automatically converted to the Bignum class if they get too large
That's actually a bad thing, IMO. I'd prefer that the language not try to "save me from myself" by doing such conversions behind my back. Give me an overflow exception, please.
When my debating partner is obstinately straying from the rules of debate, I actually do feel they deserve a little slap on the wrist. The crux of this whole debate is your statement (in cid#635):
What annoys me is not that the statement was made, but that it wasn't retracted the first time it was refuted. Instead, I've had to put up with your topic changes, buzzword storms, squishy definitions, and all manner of other evasions. Frankly, I don't appreciate the extra work. I wouldn't treat you like an errant debate pupil if you'd stop acting like one.
Ahhh, but it does. On an architecture designed around the "i-space modification is rare" assumption, writing to a vtbl *even* at object-creation or class-loading time incurs a substantial overhead in exception handling, VM activity, the aforementioned cross-processor interrupts, etc. This is different from the modifications that must occur at image-load time (including DSO-load time) because those have distinct boundaries and the OS can treat pages differently during that period than afterward. Maybe if parts of the C++ runtime were integrated into the OS loader this could be handled more efficiently, but that's a heinous idea for other reasons.
Similarly, the whole point of the double-jump seems to be to abuse the BTB for performance. I call it abuse because every method pointer that's stuffed into the BTB is one less BTB entry that can be used for *real* branches. Also, the BTB is just a small, very fast special-purpose cache; there's another cache - the L1 - right nearby that could also contain that same information. So you save yourself a cycle on the method dispatch (if repeated) by using the BTB instead of the L1, in return for which you create a nice fat pipeline bubble for someone else when they hit a branch that would have fit in the BTB if not for your shenanigans. That's not a win, it's just shifting the load.
I sincerely hope you're asking how false sharing applies to this particular situation, not what false sharing is, because if you meant the latter then you should be reading H&P instead of posting here. False sharing is an issue because a single cache line on a modern processor is likely to span multiple vtbl entries. Naive vtbl-patching code that does manual icache invalidation would therefore be likely to go through all that overhead multiple times. Ick. The only alternative would be to have the vtbl-patching code be *deeply* aware of the local machine's cache line size (i.e. not just hidden in some memory-munging library routines). Also ick. That kind of machine-specificity needs a reason, and there just doesn't seem to be much of one so far.
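To put rough numbers on that, here's a back-of-the-envelope sketch. The 8-byte entry size, the 64-byte line size, and the assumption that the table starts on a line boundary are all mine, chosen purely for illustration.

```python
def invalidations(patched_entries, entry_size=8, line_size=64):
    """Return (naive, batched) invalidation counts for a set of patched slots.

    naive:   one invalidation per patched entry
    batched: one invalidation per distinct cache line touched
    Assumes the table starts on a cache-line boundary.
    """
    naive = len(patched_entries)
    lines = {(i * entry_size) // line_size for i in patched_entries}
    return naive, len(lines)

# Patching 16 consecutive entries:
print(invalidations(range(16)))                 # (16, 2)
# Same patch on a machine with 16-byte lines:
print(invalidations(range(16), line_size=16))   # (16, 8)
```

The batched count is only reachable if the patching code knows line_size, which is exactly the machine-specificity I'm complaining about.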
The same as it has always been, Sparky: whether double jumps as an alternative to indirect jumps are a reasonable or sucky idea. If you're having trouble making the connections between the issues we're discussing and that basic point, let me know and I'll dumb it down a little more for you.
I'm sure we could have a very interesting discussion about the relative merits of double jumps vs. indirect jumps if you'd cooperate, because you seem to know more than most /.ers about how CPUs work. However, as long as you're going to deny that these systemwide costs exist at all - things like false sharing, extra interprocessor communication in an SMP system to do TLB shootdowns, pollution of the BTB when the regular L1 cache is damn near as good - then that's not going to happen. How disappointing.
You are the weakest link. Goodbye.
No, it's the same topic because it impacts the same solution.
No, it's the same argument because it impacts the same solution. Please stop trying to redefine the topic to suit yourself.
"Seldom" is not equal to "never", and we were talking about the assumption that i-space would *never* change because that's the only assumption that would make the proposed solution seem reasonable.
We're not talking about self-modifying code, as much as you seem to be hoping that the taint associated with that phrase will stick to anyone who disagrees with you. We're talking about mutable data in i-space, and about the nasty hack of using double jumps with the intermediate target in i-space to "trick" CPUs and make method dispatch a cycle or two faster without considering the effect of such a hack on the rest of the system.
But you're almost right. Making this particular hack work faster makes the rest of the system slower. That's exactly the point. Congratulations on finally getting it.
I disagree. If there are few bugs found and fixed, that could be a sign that the testing is inadequate. I once worked on a project where the customer expressed a concern about our bug counts being too low. This seemed counterintuitive at first, until they explained that in their (very broad) experience this more often indicated something about testing than about the product. Sure enough, we stepped up testing, found more bugs, and went on to deliver a much higher quality product than if we had sat around patting ourselves on the back for our low bug counts.
What's more important than the absolute bug count is the *trend* in bugs reported. Every non-trivial project can be expected to exhibit an increasing rate of bug reports as QA ramps up and as components become ready for testing, followed by a plateau, followed - hopefully - by a steady decrease as the bugs get nailed without introducing new ones. If the plateau goes on too long, or if the extrapolated decrease at the tail end does not reach an acceptable level by shipment time, there's a problem. If the rate starts to go back up at that late a stage, there's an even bigger problem: your programmers are getting desperate, and checking in "fixes" that introduce more new bugs. If you don't see a "hump" at all, it probably means your testers aren't doing their jobs (or possibly not getting the help they need from developers).
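Here's a minimal sketch of how those checks might be automated over weekly report counts. The function name, the thresholds, and the crude linear extrapolation are all assumptions of mine, not anything standard.

```python
def assess_bug_trend(weekly, weeks_to_ship, acceptable=5, max_plateau=4, tol=0.1):
    """Return a list of warnings for weekly bug-report counts (needs >= 2 weeks)."""
    peak = max(weekly)
    # No "hump" at all: the whole series stays within tol of its peak.
    if peak - min(weekly) <= tol * peak:
        return ["no hump: testing may be inadequate"]

    warnings = []
    # Weeks within tol of the peak (not necessarily consecutive).
    near_peak = sum(1 for n in weekly if n >= peak * (1 - tol))
    if near_peak > max_plateau:
        warnings.append("plateau is lasting too long")

    slope = weekly[-1] - weekly[-2]
    # Rising again after the peak is well behind us.
    if weekly.index(peak) < len(weekly) - 2 and slope > 0:
        warnings.append("rate rising again late: fixes may be adding bugs")

    # Straight-line extrapolation of the last two weeks to the ship date.
    if weekly[-1] + slope * weeks_to_ship > acceptable:
        warnings.append("tail will not reach an acceptable level by ship date")
    return warnings

print(assess_bug_trend([5, 20, 40, 42, 30, 18, 8], weeks_to_ship=1))  # []
print(assess_bug_trend([10, 10, 11, 10, 10], weeks_to_ship=1))
print(assess_bug_trend([5, 20, 40, 30, 15, 22], weeks_to_ship=2))
```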
All of these patterns are readily apparent if you know to look for them. I'd say you're right that the purpose of QA is statistical, but I'd say it's a statistical measure of *process* as well as product quality, and if the process is broken you can't know anything about the product.
Severity vs. Priority
As several other posters have pointed out, it's important to make a distinction between severity and priority. I would also say that it helps to define severity as narrowly as possible, to refer only to the effect of a bug when it occurs. I've already seen some people who accept the severity/priority distinction make the mistake of letting other factors (e.g. availability of a workaround) be factored into severity when they really only affect priority. Priority is the "net net" of severity, frequency, relationship to core/extended functionality, availability of a workaround, and possibly even cost to fix.
Factors affecting severity
Even within a limited definition of severity, there are multiple factors to be considered - specifically scope, magnitude, and duration of damage. Scope is a matter of what was affected: the whole system, the program, access to particular features. Magnitude is a matter of how complete the failure is: did the affected component(s) fail totally, did it hiccup and then continue, did it suffer a performance degradation, etc. Lastly, duration is simply a question of transient vs. permanent errors. When you've characterized the bug according to these three criteria it's pretty easy to use a point system to sum it up into a single severity value (though it never hurts to record the details). Likewise, once you know severity you can use a point system incorporating that and other factors into a priority value. Obviously the assignment of point values can be a complex and contentious process, but it only needs to be done once and once it's done the assignment of both priorities and severities becomes so much clearer that it's worth the effort.
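A point system along those lines could look something like the sketch below. Every point value, category name, and weight here is a placeholder I made up; the real ones are what your team argues about once.

```python
# Hypothetical point tables; the actual values are a team decision.
SCOPE = {"feature": 1, "program": 2, "system": 3}
MAGNITUDE = {"degraded": 1, "hiccup": 2, "total": 3}
DURATION = {"transient": 1, "permanent": 2}

def severity(scope, magnitude, duration):
    """Severity covers only the effect of the bug when it occurs."""
    return SCOPE[scope] + MAGNITUDE[magnitude] + DURATION[duration]

def priority(sev, frequency, core, has_workaround, fix_cost=0):
    """Priority folds in everything else. frequency: 1 (rare) to 3 (constant)."""
    score = sev + frequency + (2 if core else 0)
    if has_workaround:
        score -= 2
    return score - fix_cost

# A rare total system failure with a workaround...
crash = severity("system", "total", "permanent")        # 8
print(priority(crash, frequency=1, core=False, has_workaround=True))    # 7

# ...can rank below a constant glitch in a core feature with no workaround.
glitch = severity("feature", "hiccup", "transient")     # 4
print(priority(glitch, frequency=3, core=True, has_workaround=False))   # 9
```

Note that the workaround never touches the severity number, only the priority.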
Other Points
Just some random thoughts about bug classification, in no particular order.
You're so full of crap. Yes, those hidden costs do exist and must be paid *even though* the contents of the table are in fact effectively constant. That's the problem. You still have to invalidate the i-cache, you still have to forego VM-level optimizations, etc. because the values *could* change.
It does no such thing. Any CPU that makes such an assumption about the immutability of i-space could be considered broken. That breakage can be worked around by having the VM system play nasty tricks with making i-space pages read-only etc., but the cost of having to cover for the CPU's deficiencies like that is much greater than the benefit. Try looking at the problem from a *system* standpoint for a change, instead of a myopic "how can a CPU designer avoid work" standpoint.
What if you're coding the firmware for that hard drive, or the driver for that network card? Where's your scripting language then? That's why a lot of people use C, and that's the code to which I think the previous author was referring.
Hard? Yes. Impossible? No. Many people have to write in C because that's the only language supported in their environment (e.g. a kernel or RTOS). These are often systems where the requirements for reliability, maintainability, etc. are very high, and the quality tends - of necessity - to be correspondingly high. My C is more object-oriented than 90% of the code I've seen in languages designed for OOP, for example, and I'm sure I'm not the only person for whom that's true.
Note that the previous author didn't say that the best code is *always* or even *usually* written in very low-level languages. He just said that it *often* is, and that's true.
That seems like a classic example of not counting all the costs. The double jump might in and of itself be less expensive than an indirect jump, but there are hidden costs involved:
When you do count up all the costs, using double jumps is a tremendously stupid idea. The double jump itself might be faster than an indirect jump, but that's outweighed by the overall negative effect on the system as a whole.
And I drive a manual, because I have yet to find an automatic that shifts in a way I consider reasonable. ;-)
So just use bignums, not automatically-promoting small integers. Your numbers will just be numbers, and you'll never have a problem. Small integers should only be used in a higher-level interpreted language when it's *important* for them to be small numbers, and therefore probably important for them to *stay* small numbers.
So maybe you should listen to those experts when they say that *if* you took the trouble to specify a particular integer size other than the largest available you probably mean for it to stay that size.
Auto-promoting integers won't help with that. In fact, Ruby won't help with that. If you want to approach that particular problem (secure systems) at all through language design, you'd have to go to a language with at least some of the following features:
Does Ruby fit that requirement? No. Nothing that borrows so heavily from Perl will ever even come close. Among widely-adopted languages, Ada or Java probably come closest, and there are a bunch of academic languages that go even further. Ruby might eliminate a couple of classes of security problems arising from overflows and such, but the same could also be said for most other scripting languages. Python, Perl, or Tcl programs are as secure as Ruby programs, and VB less so only because of usage rather than the language itself.
Yes. I work for EMC.
Firstly, FC-AL is pretty much dead for high-end systems, which all use switched FC nowadays.
Secondly, the very biggest storage systems out there still use SCSI inside, not FC...but it doesn't matter. It doesn't matter at all what's *inside* the box, because the whole point of such a system is to achieve high performance through aggregation of small channels and/or to avoid the channels altogether by using a huge cache. Sure, the drives are SCSI, but there are hundreds of them, on dozens of separate SCSI buses. There's plenty of internal bandwidth to make the drives do all that they're capable of doing, so the question is not "why use SCSI when FC is faster" but rather "why use FC when SCSI is fast enough".
Given that Plain Old SCSI is still doing yeoman's service as a back-end interconnect, and that FC components will probably always be more expensive than SCSI components, there's just no benefit to being "all-FC". Anyone who says otherwise is just putting marketing spin on a questionable engineering decision.
Disclaimer: I work for EMC, but I don't speak for them (nor they for me). The above is all public information, personal opinion, and simple common sense, unrelated to any EMC trade secrets or marketing hype.
What if the result is bigger than it was ever "supposed to be" according to some program-specific constraint? What if an index or hash or address calculated from the new value causes you to trash memory or return the wrong data? What if it only happens once in a while, and you never knew the offending code had anything to do with those mysterious crashes that just seemed to happen "every once in a while" so that you *thought* it had gone away but you actually shipped with the bug still there? Undisciplined programmers might not like it, but being forced to think about and deal with boundary conditions properly is A Good Thing for software quality.
For you. In your opinion. Not for everyone. That's just the problem: languages that do these sorts of things are making decisions that programmers should be making. Sure, having an integer type that can handle larger sizes but that uses more efficient smaller ones when the values fit might be very convenient, but having that be a default, unchangeable property of the smaller integer types is bad.
And if the programmer wanted to handle large integer values, they should use a large integer type. The point is that *the programmer should choose*. If you have an exception, you can catch the exception and choose what to do with it, but you can't just punt. If you want to switch to using larger integers, fine. If you want to allow wraparound, fine. If you want to treat it as a fatal error, fine. What's important is that you as the programmer have to make an explicit choice about what to do. Abdication of that choice, expecting the language to take up the slack, is not an option.
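To make that concrete, here's a sketch of the kind of interface I have in mind. Python's own ints never overflow, so this emulates a 32-bit type; add32, wrap32, and Int32Overflow are names I invented for the example.

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1

class Int32Overflow(ArithmeticError):
    """Raised when a result does not fit in 32 bits; carries the exact value."""
    def __init__(self, exact):
        super().__init__(f"{exact} does not fit in 32 bits")
        self.exact = exact

def add32(a, b):
    exact = a + b
    if not INT32_MIN <= exact <= INT32_MAX:
        raise Int32Overflow(exact)
    return exact

def wrap32(n):
    """Two's-complement wraparound into the 32-bit range."""
    return (n - INT32_MIN) % 2**32 + INT32_MIN

try:
    total = add32(INT32_MAX, 1)
except Int32Overflow as err:
    # The programmer decides, explicitly:
    #   total = err.exact   -> switch to a larger integer
    #   raise               -> treat it as a fatal error
    total = wrap32(err.exact)   # -> allow wraparound

print(total)   # -2147483648
```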
That's exactly it. If it's such a waste of time worrying about nitty gritty stuff like integer sizes, then why are you obsessing over just using a bignum for something that might hold a large value? It's just not a problem for most people or programs. The int-to-bignum thing is purely a performance hack; if performance is that critical you shouldn't be using an interpreted language anyway, and the difference between multiplying ints or multiplying bignums only matters to the cycle-counters. People who write in Ruby - or Python, or Perl - should be focusing more on program correctness and functional issues than on micro-optimizing interpreted code or expecting the language to do it for them.
19 cycles for a 1GHz processor is actually not too bad; good large-system memory interconnects today are in the hundreds of cycles for anything but the very nearest memory, and even that latency can be pretty well hidden in NUMA or MT systems. The serial nature of the interconnect is simply not that big a performance issue in real memory-system design; the simplicity/physical benefits of serial protocols and cabling are much more important.
Heading? No, we're already here.
DMA isn't really a feature of the bus so much as the adapter implementation. Ethernet cards can do DMA, and SCSI cards can do polled I/O.
Apple didn't invent NuBus. My flaky memory tells me it was TI, which could be wrong, but it wasn't Apple. Apple merely selected NuBus for the Mac II, from among several alternatives that already existed at the time.
I have to agree with all the people who say that much of the problem has to do with the routing protocols in common use on the Internet. IMO part of that problem is that everyone has gone to link-state protocols; protocols in this family have certain desirable properties wrt loop-freedom and optimality, but slow convergence is a known problem with this approach. Personally, I've always been a distance-vector guy.
All of this came back to me recently as I was reading Ad Hoc Networking by Charles Perkins. It's about protocols intended for use in environments where mobile nodes come and go relatively frequently, where the links go up and down as nodes move relative to one another, and where there's no central authority to keep things organized. A lot of this work has been done in a military context - think of a few hundred tanks connected via radio, rolling across a large and bumpy battlefield. It turns out that distance-vector protocols are making a comeback in this environment because of their faster convergence and lower overhead compared to link-state protocols, and researchers have pretty much nailed the loop-formation and other issues. It also turns out that a lot of the techniques that have been developed for this very demanding environment could be useful in the normal statically-wired Internet, not just in terms of robustness but also in terms of giving power over connectivity back to the people instead of centralizing it in huge corporations.
I strongly recommend that people read this book, to see what's happening on the real cutting edge of routing technology. In particular, anyone working or thinking of working on peer-to-peer systems absolutely must read this book, because it describes the state of the art in solving some connectivity/scalability problems that many P2P folks are just stumbling on for the first time. I've seen many of the "solutions" that are being proposed to these problems in the P2P space; I can only say that P2P will not succeed if such stunning and widespread wilful ignorance of a closely related field persists.
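For anyone who hasn't seen one, the core of a distance-vector update is tiny. This is a bare textbook-style sketch with names of my own choosing; it has none of the loop-prevention machinery (sequence numbers and the like) that the ad hoc protocols add on top.

```python
def merge_vector(table, neighbor, link_cost, advertised):
    """Fold one neighbor's advertisement into our routing table.

    table:      dest -> (cost, next_hop)
    advertised: dest -> cost as seen by `neighbor`
    Returns True if anything changed (i.e. we should re-advertise).
    """
    changed = False
    for dest, cost in advertised.items():
        candidate = (link_cost + cost, neighbor)
        current = table.get(dest)
        via_same_hop = current is not None and current[1] == neighbor
        # Take a new or cheaper route, or any cost update from the hop
        # we are already routing through.
        if (current is None or candidate[0] < current[0]
                or (via_same_hop and candidate != current)):
            table[dest] = candidate
            changed = True
    return changed

table = {"A": (0, "A")}
merge_vector(table, "B", 1, {"B": 0, "C": 2})
merge_vector(table, "D", 5, {"D": 0, "C": 1})
print(table["C"])   # (3, 'B')
print(table["D"])   # (5, 'D')
```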
You're seriously misinterpreting the significance of what you see on Bugtraq. Those are the results of programmer errors that *weren't even caught*, but we're talking about what should happen after the error is caught. The problem with trying to "save programmers from themselves" is twofold:
The alternative, which I and a great many others (who've either studied this carefully or learned the hard way) prefer, is not to mask errors but to make them as explicit as possible as soon as possible. Force the *programmer* to make the decision about what should happen in that boundary case. If you can't do it with an error at compile time, do it with an exception at run time. The programmer will have no choice but to provide a *real* fix, and both the software and the programmer will be better for it.
If you're worried about an unhandled exception causing a crash, don't be. My experience in both high-availability systems and in security has taught me that a crash is better than a hang every time, and *way* better than the spooky unreproducible non-linear behavior that often results from software that tries to second-guess programmer intentions.
Other than your personal (anonymous) opinion, what argument can you give for it being better?
Been there, done that, not in Ruby but in other languages. I stand by my claim that Ruby's syntax for them is ugly. I would much prefer a statement-like syntax over that gobbledygook.
Untrue. It also has operator overloading, which is the most heinous kind.
While we're here, let's consider another of Ruby's warts:
That's straight from the Ruby book (chapter on blocks and iterators) and it's a construct only a Perl weenie could love. Let me give a non-programming equivalent:
Good way to get blown up, eh? Really, there's just no excuse for countertemporal counterintuitive crap like that in a language.
As I've said before, Ruby has some good features but also some warts. At least it's not all wart like Perl or Tcl, but there are other languages that also have good features and some warts. Compared to those languages, the case for Ruby's superiority is extremely weak at best. Use it if you like, have fun, be productive. More power to you. Just don't try to act all superior because your language choice is better than someone else's.
Again, big deal. It's not actually *useful* for x.length to return a value if x is an integer. The only possibly meaningful value for an integer would be the size of the integer, which is (a) not the same concept as the length of a list/array/hash, and (b) might change anyway due to Ruby's sneaky bignum conversions. In fact, having x.length return a value for an integer could be a source of error. If I expect an integer in Python and apply len() to it accordingly, I get a nice juicy exception to tell me I screwed up. If I do the equivalent in Ruby, I get a value and go blithely on my way to screw things up even worse.
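The Python half of that is easy to demonstrate (describe is just a throwaway function for the example):

```python
def describe(seq):
    return f"{len(seq)} items"

print(describe([1, 2, 3]))   # 3 items

try:
    describe(42)             # an integer where a sequence was expected
except TypeError as err:
    print("caught:", err)    # the mistake surfaces immediately
```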
No, thanks. Orthogonality is nice, but sometimes life just doesn't break down into nice orthogonal pieces. It's like the old saying about things being as simple as possible, but no more so.
We all have our own tastes and biases. Personally, my few complaints about Ruby are pretty similar to many people's complaints about Python. For example, the #1 complaint about Python is obviously the whitespace thing. I have to admit it's not my favorite feature and I might very well have made the decision differently in GvR's place, but that's the way it is and instead of whining I cope and get on with my work. So maybe it doesn't make sense that I find many of the "cosmetic" or secondary aspects of Ruby so infuriating. For example: