Slashdot Mirror


User: p3d0

p3d0's activity in the archive.

Stories
0
Comments
3,023

Comments · 3,023

  1. Re:64bit performance gains... on AMD64 Preview · · Score: 1

    I'm not a VLIW/EPIC expert by a long shot, and I'm not on the IA64 team, but as I understand it, they can generate pretty good code with a few tweaks to the traditional compiler stages, and a wicked instruction scheduler.

  2. Re:64bit performance gains... on AMD64 Preview · · Score: 1

    True. On the other hand, if you replace a 256KB L2 cache with 1 cycle latency with a 1024KB cache with 2 cycle latency, that will help some programs and hurt others. It depends on their working set size; those with working sets between the L1 cache and 256KB will get slower; those with larger working sets will get faster.
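
    The trade-off above can be sketched with a toy model. The L1 size and the memory penalty below are illustrative assumptions, not measurements of any real chip, and the model ignores everything except which level of the hierarchy the working set fits in.

```python
# Toy model: all sizes and latencies here are illustrative assumptions.
L1_SIZE_KB = 64          # assumed L1 data cache size
MEMORY_LATENCY = 100     # assumed penalty, in cycles, for going to DRAM


def miss_cost(working_set_kb, l2_size_kb, l2_latency):
    """Extra cycles paid per access beyond an L1 hit for a given working set."""
    if working_set_kb <= L1_SIZE_KB:
        return 0                 # everything hits in L1; L2 is irrelevant
    if working_set_kb <= l2_size_kb:
        return l2_latency        # served by L2
    return MEMORY_LATENCY        # falls out of L2 into main memory


def compare(working_set_kb):
    """Return (small_fast_l2_cost, big_slow_l2_cost) for one working set."""
    small = miss_cost(working_set_kb, l2_size_kb=256, l2_latency=1)
    big = miss_cost(working_set_kb, l2_size_kb=1024, l2_latency=2)
    return small, big
```

    With these assumed numbers, a 128KB working set pays more with the bigger, slower cache, while a 512KB working set pays far less.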

  3. Re:64bit performance gains... on AMD64 Preview · · Score: 1
    Good points, except:
    The memory overhead from wide pointers looks like the only real concern. I can't see any 64 bit system shipping with so little memory that matters.
    For one thing, presumably people bought that memory because they want to use it. It's not like half their RAM is sitting around doing nothing, and will happily store those big pointers with no price and/or performance penalty.

    But aside from that, this affects the whole memory hierarchy. It increases cache footprint. Even more important, it increases memory bandwidth usage.

    And, to top it all off, we're comparing AMD64 executables against IA32 executables running on the same AMD64 machine. You'll get less data stored at each level of the cache hierarchy compared to IA32 code on the same box, so performance will suffer.

    The beauty of big register files is pointers don't get spilled very often...
    Yes...
    ...making memory bandwidth a non-issue.
    No! First, spills don't need memory bandwidth, because spills read and write near the top of the stack, which is always in the L1 cache anyway. Second, memory bandwidth is almost always an issue on those server machines that needed the 64-bit address space in the first place.
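
    To make the cache-footprint point concrete, here is a rough sketch. The 64-byte cache line is an assumption (line sizes vary by CPU), and the helper names are made up for illustration.

```python
CACHE_LINE_BYTES = 64  # assumed line size; varies by CPU


def pointers_per_line(pointer_bytes, line_bytes=CACHE_LINE_BYTES):
    """How many pointers fit in one cache line."""
    return line_bytes // pointer_bytes


def lines_touched(num_pointers, pointer_bytes, line_bytes=CACHE_LINE_BYTES):
    """Cache lines needed to hold a densely packed array of pointers."""
    total = num_pointers * pointer_bytes
    return -(-total // line_bytes)  # ceiling division
```

    Under that assumption, 1000 packed pointers occupy 63 lines at 4 bytes each and 125 lines at 8 bytes each, so the same data costs roughly twice the cache space and twice the traffic.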
  4. Re:64bit performance gains... on AMD64 Preview · · Score: 2, Interesting
    Pointers inside objects occupy run-time memory from the *HEAP* -- i.e., they don't have any presence in the object file.
    Duh, yeah. What's your point?
    The use of REX to access r8-r15 is the register based alternative to using a SIB byte, and offset for an [ebp+offset] encoding for directly accessing the stack. I.e., paying the cost of an extra prefix byte saves in both execution speed and actual code size versus the spill/fill style or direct stack based alternative.
    Good point about r8-r15. However, the problem with needing REX prefixes for address manipulations is still a pure loss.
    Auto areas that are larger than 256 bytes because they are filled with a bazillion pointers are indicative of more serious program design flaws (that people don't generally have) than the statistical potential of loss from using far offset values from it. This is an extremely marginal case at best.
    Huh? Do you do compiler work? Surely you have seen methods with more than 128 bytes of local variables after inlining has occurred?

    Besides, as compiler writers, we don't have the luxury to tell application developers to "just redesign your code".

    I don't understand your linkage complaint -- the more parameters passed in registers, the fewer that will end up on the stack.
    Forget the linkage complaint, it's bogus. I was thinking of a different parameter-related problem that is specific to the compiler I work on right now. It's not a general problem.
  5. Re:64bit performance gains... on AMD64 Preview · · Score: 1
    No, register renumbering doesn't help when you just don't have enough regs. Renumbering is to eliminate unnecessary dependencies between uses of the same architectural register. For instance, renumbering can turn this:

    load r1
    store r1
    load r1
    store r1

    into this:

    load r1
    store r1
    load r2
    store r2

    Then, all these stores can be reordered internally, which would have been prevented otherwise.

    If you don't have enough architectural registers, then the compiler must insert spills to memory, and only store forwarding and data caches can help with spills. But caches are never as fast as registers, and you still have the instruction cache footprint of that spill code.
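
    The renumbering step can be sketched roughly as follows. This is a toy model, not how any particular CPU implements it: the two-opcode instruction format and the physical register names (p0, p1, ...) are assumptions made for illustration.

```python
from itertools import count


def rename(instructions):
    """Map architectural registers to fresh physical registers.

    `instructions` is a list of (opcode, register) pairs. In this toy ISA a
    "load" writes its register and a "store" reads it. Every write gets a new
    physical register; every read uses the most recent mapping.
    """
    fresh = count()                 # supply of physical register numbers
    mapping = {}                    # architectural name -> physical name
    renamed = []
    for opcode, reg in instructions:
        if opcode == "load":        # a write: allocate a new physical register
            mapping[reg] = f"p{next(fresh)}"
        elif reg not in mapping:    # a read of a register never written here
            mapping[reg] = f"p{next(fresh)}"
        renamed.append((opcode, mapping[reg]))
    return renamed
```

    Feeding it the load/store sequence from the example gives the second load/store pair its own physical register, which removes the false dependency on the first pair. It does nothing for code that needs more live values than there are architectural registers.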

  6. Re:64bit performance gains... on AMD64 Preview · · Score: 1
    Don't be so sure about the cache. Bigger caches are slower. Make sure you know the cache latency before you claim a bigger cache is better.

    Anyway, some Xeons have a 6MB L3 cache, so 1MB isn't a big deal.

  7. Re:64bit performance gains... on AMD64 Preview · · Score: 1
    For programs that fit nicely in a 32 bit address space, perhaps you could designate one of the 64-bit registers as a base pointer, and store all the addresses as offsets? This may be cheaper than you think, since we now have 8 additional registers to play with.
    No, I'm pretty sure that would suck hard. Whatever addressing modes your CPU provides, you just lost one register. For instance, AMD64 provides [base + stride*index + offset] to access arrays. Your scheme could no longer access that in one memory reference; you'd need an extra add instruction.
    On the other hand, for some applications it may be useful and convenient to map large files into memory.
    You should see the alloc stream facility. It has performance at least as good as a whopping big memory-mapped file, but without eating a lot of address space. It's a nice interface; too bad nobody uses it.
  8. Re:64bit performance gains... on AMD64 Preview · · Score: 1
    Or am I missing something?
    Yes, you certainly are. To begin with, you're assuming that RAM gets addressed starting from zero, which it doesn't in some OSes. For instance, Linux seems to put DLLs at around 0x2900000000, and the stack grows down from 0x7fffffffff, both of which are already out of 32-bit addressing.

    But even simpler than that: suppose you only set aside 4 bytes for a certain pointer. Then suppose you do indeed wind up using more than 4GB of ram. The next time you call malloc, you'll get a pointer beyond the 32-bit address range. How do you store that in your 4 byte pointer?

    Look at it this way: in today's 32-bit programs, if you use less than 64KB, do your pointers occupy only 16 bits? I wish that were true, but it's not. Generally speaking, a pointer is a pointer, and a pointer must be large enough to address all of the memory that the program could possibly reach at runtime.

    If you can make your scheme work, you could probably get a best-paper award at some conference some place. Personally I'd love to see something like this work, because it bothers me to use up 8 bytes for each pointer when we know statically that the machine only has 256MB of RAM in it. :-)
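
    A quick sketch of the arithmetic, using the addresses mentioned above. The function names are made up for illustration.

```python
def pointer_bits_needed(highest_address):
    """Minimum number of bits needed to represent `highest_address`."""
    return max(1, highest_address.bit_length())


def fits_in_32_bits(address):
    """True if the address can be stored in a 4-byte pointer."""
    return address < 2**32
```

    By this count, the last byte of a 64KB region needs 16 bits, but an address like 0x2900000000 needs 38 bits no matter how little RAM the program actually uses.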

  9. Re:64bit performance gains... on AMD64 Preview · · Score: 1
    Well, it's hard to have MMX math coexist with x87 math because they use the same register file, and I think it's slow to move data from GPRs to MMXs, so I wouldn't exactly say you can "easily" do it, but yes, I suppose it can be done.

    Anyway, I'm no expert on this because our compiler doesn't use MMXs yet. And regardless, 14 regs is still better than 8. :-)

  10. Re:64bit performance gains... on AMD64 Preview · · Score: 1
    I wonder if it will be possible to use 32 bit pointers within the X86-64 isa?
    In a word, no. If you want the extra regs, you get 64-bit addresses. You could always limit yourself to the low 4GB of memory, though. Then you could just omit the REX prefix from loads and stores of addresses, which would have the effect of zero-extending them while they are in regs, while also making the code a bit smaller!

    Incidentally, they seem to have abandoned the terrible name "x86-64" in favour of the much more sensible "AMD64".

  11. Re:64bit performance gains... on AMD64 Preview · · Score: 4, Informative
    I meant data objects, as in object-oriented programming. Not object files. OO data tends to have a lot of pointers.

    Having said that, object files will be bigger too. I'm not sure where you're getting your 10-15%; have you actually checked? I don't have access to our AMD64 boxes right now so I can't take a look at the object files, but I think the difference could easily be more than that for object-oriented code, for a number of reasons:

    • Probably 2/3 of the instructions in hot code will need a REX prefix, either because they use registers r8-r15, or because they manipulate addresses.
    • Only mov instructions can use an 8-byte immediate. Anything else that needs an 8-byte immediate must load that immediate into a register first with a 10-byte mov instruction, possibly spilling whatever was in that register. We could be talking about 3 extra instructions totalling maybe 18 bytes on an instruction that used to occupy maybe 6 bytes. Class tests in a polymorphic inline cache are particularly affected by this. Also, relocations (i.e., jumps between different DLLs) must be 64 bits because there's no reason to think DLLs will be loaded within a 32-bit offset of each other.
    • Autos that are pointers now occupy twice as much stack space, making your stack frame that much less likely to fit within an 8-bit signed offset (i.e., at most 127 bytes). That means you can't use [esp+12h] addressing to access your locals, but rather [rsp+12345678h], which requires three extra bytes (not to mention the REX prefix). Highly optimized functions often have lots of variables, especially after inlining, and in OO code, lots of the variables are pointers, so this one could hurt.
    • Similarly, the AMD64 linkage convention on Linux has 6 parameters passed in registers (while IA32 has none) which also makes the stack frame bigger. This can be mitigated by using a frame pointer, but if you don't dedicate a register as a frame pointer, then you need to access your parameters with the stack pointer (rsp), and the parameters are always at the largest offsets from rsp. Result: parameters are likely not to be reachable with an 8-bit offset from rsp.
    If I had to estimate off the top of my head, I'd guess code would be more like 25% bigger, while OO data could be as much as 50% bigger. (Remember that each object contains a pointer to its class or vft, and many object fields are pointers.)
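
    The stack-offset point can be sketched like this. It is a simplification: it assumes the locals are all pointers packed upward from the stack pointer, and it only models the choice between the 1-byte and 4-byte displacement encodings.

```python
def displacement_bytes(offset):
    """Bytes used to encode a nonzero stack offset in an x86 memory operand.

    x86 offers a 1-byte signed displacement (-128..127) and a 4-byte one.
    """
    return 1 if -128 <= offset <= 127 else 4


def largest_offset(num_pointer_locals, pointer_bytes):
    """Offset of the last local when locals are packed upward from the stack pointer."""
    return (num_pointer_locals - 1) * pointer_bytes
```

    With 20 pointer locals (a hypothetical count), the last one sits at offset 76 with 4-byte pointers and still fits the 1-byte form; with 8-byte pointers it sits at offset 152 and needs the 4-byte form, which is the three extra bytes mentioned above.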
  12. Re:64bit performance gains... on AMD64 Preview · · Score: 4, Informative
    Nice summary. I would only add a couple of things:
    • 64-bit math on IA32 requires register pairs. With 8 GPRs, one of which is reserved for the stack pointer, that means you can only keep 3 long-longs in registers. On AMD64, even if you dedicate another register to the frame pointer, you can still get 14 long-longs in registers: almost a factor of 5 improvement.
    • The benefits from the memory subsystem will be offset by the fact that objects containing pointers will be twice as big as on IA32. That means objects could have twice the cache footprint and twice the memory bandwidth requirements.
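
    Both bullets come down to simple arithmetic, sketched below. The object layout (one class/vft pointer plus fields, no alignment padding) is a simplifying assumption, and the field counts in the usage note are hypothetical.

```python
def long_longs_in_registers(num_gprs, reserved, regs_per_value):
    """How many 64-bit values fit in the general-purpose registers."""
    return (num_gprs - reserved) // regs_per_value


def object_bytes(pointer_fields, int_fields, pointer_bytes, int_bytes=4):
    """Size of an object with one class/vft pointer plus its fields (no padding)."""
    return (1 + pointer_fields) * pointer_bytes + int_fields * int_bytes
```

    IA32 with the stack pointer reserved gives (8 - 1) // 2 = 3 register pairs, while AMD64 with both stack and frame pointers reserved gives 14. An object made only of pointers doubles in size, while one with 2 pointer fields and 3 int fields grows from 24 to 36 bytes.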
  13. Re:John Lakos on Tools for Analyzing C++ Class Code Generation? · · Score: 1

    I second that. Great book. I don't use templates much so I can't speak to that particular chapter, but his other advice is sound.

  14. Re:Uhhh, perl or python? on Tools for Analyzing C++ Class Code Generation? · · Score: 1

    To demangle, use c++filt.

  15. Re:I can see it now. on 'Storage' to Replace Traditional Filesystems? · · Score: 1

    Huh?

  16. Re:Am I FUD? on Code Generation in Action · · Score: 1

    Yep, you're FUD if you have ever used a compiler.

  17. Re:NOT about compiler code generators on Code Generation in Action · · Score: 2, Insightful

    I do compiler work too, and I think you need to relax a bit. The term "code generator" means "some device that generates code". Just because you misunderstood it at first (as I did) doesn't mean it's wrong.

  18. Re:Was anyone impressed? on A Traveler's Guide To Mars · · Score: 2, Interesting

    Mars was ok, but the most impressive thing I saw was around a year ago when about four planets were all close to each other. Looking out my window, I could mentally connect them and see the ecliptic, and it really gave me a visceral sense of being on a planet travelling with other planets around the sun.

  19. Re:To get boringly technical about it... on Armageddon... in 2014. Almost. · · Score: 1

    Here is the Wikipedia entry. If I'm understanding it correctly, it seems like you might have the meanings of probability and likelihood reversed, but I'm not sure.

  20. Re:To get boringly technical about it... on Armageddon... in 2014. Almost. · · Score: 1
    Hmm, that sounds interesting. My wife might know something about that, having studied that kind of thing.

    Thanks for being persistent. :-)

  21. Re:To get boringly technical about it... on Armageddon... in 2014. Almost. · · Score: 1
    Probability per se is completely a posteriori, and is determined by computing the frequency of occurrence of a given outcome over a set of events... They are computing an a priori most likely a posteriori probability if there was a large number of samples.
    Right. That's what probability is. Otherwise, how can you justify saying that the probability of a fair coin coming up heads is 1/2? What's the difference between this rock and a particular coin toss?

    Whether the set of events considered actually occur, or are merely thought experiments, the probability is still the fraction of successful outcomes.

    If you consider only that toss, then the probability is either 0 or 1, but we don't know which. So what the hell is the point of that? Nobody does that because it's not useful.

    On the other hand, if we consider all tosses of a fair coin, and compute the fraction of those that come up heads, we get a probability of 1/2, which is a much better predictor given what we know about this coin.

    If we then learn more about the toss--for instance, if we learn that the toss has already occurred and it came up tails, or if we learn the precise physical conditions just prior to and during the toss--then the set of events included in the denominator changes, and so does the probability. Certain tosses are no longer similar enough to be included. Likewise, as we take more measurements of this rock, the probabilities change.

    I just can't see how it's useful or meaningful to say that a certain probability "is either 0 or 1". What does that do for you? I have never heard of this "pseudo-probability" concept before now, and I can't see how it is a useful notion.
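
    The "fraction of the events considered" view can be sketched as follows. The reference class of 1000 idealised tosses is a made-up illustration, as are the names.

```python
from fractions import Fraction


def probability(events, is_success):
    """Fraction of the considered events that count as successes."""
    events = list(events)
    successes = sum(1 for e in events if is_success(e))
    return Fraction(successes, len(events))


def is_heads(toss):
    return toss == "H"


# Hypothetical reference class: tosses of a fair coin, idealised as an
# equal number of heads and tails.
all_tosses = ["H", "T"] * 500

# Knowing nothing else, the denominator is every toss.
p_unconditional = probability(all_tosses, is_heads)

# After learning that this toss came up tails, only tails-tosses are still
# similar enough to count, so the denominator shrinks.
tosses_known_tails = [t for t in all_tosses if t == "T"]
p_after_learning = probability(tosses_known_tails, is_heads)
```

    Here p_unconditional is 1/2 and p_after_learning is 0. The coin is the same; only the set of events in the denominator changed.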

  22. Re:geeky dweeb king of spazzes on Armageddon... in 2014. Almost. · · Score: 1

    You've never seen one of these, I guess?

  23. Re:Chances likely to change? on Armageddon... in 2014. Almost. · · Score: 1

    I missed a zero, and you missed this. I'd say that makes us even.

  24. Re:To get boringly technical about it... on Armageddon... in 2014. Almost. · · Score: 2, Insightful
    That's a valid distinction, but I don't think it affects the nature of probability. Show me a reference defining "probability" that makes that distinction, and I will concede the point.

    But I think you are mistaken, and that "probability" can include anything about which one has incomplete information. This "pseudo-probability" you have introduced does not strike me as a useful concept. However, I am prepared to be proven wrong.

  25. Re:Chances likely to change? on Armageddon... in 2014. Almost. · · Score: 1
    Your analogy has merit, but, in this case, the die has just been cast. It has not stopped rolling.
    You're thinking too hard. The point is that there is some info we know, and some we don't know, and probabilities are based on that. Once we learn more info, the probabilities will change. That's all.