Slashdot Mirror


User: tomstdenis

tomstdenis's activity in the archive.

Stories: 0
Comments: 6,870

Comments · 6,870

  1. Re:bullshit on Reverse Multithreading CPUs · · Score: 1

    Yeah I wasn't talking about cache "ways". I meant the standard issue of macro-ops is 4. Just like in AMDland, where we retire a line of [up to] 3 macro-ops at once, Intel likely does that with 4.

    Tom

  2. Re:Brainwashing on Working at Microsoft, the Inside Scoop · · Score: 2, Insightful

    It's called "selling out". Everyone has their price. The trick is to recognize it and work within it.

    He came back spouting the virtues of MSFT because he basically sold his values and convictions [the good kind] for a paycheque and status.

    Tom

  3. Re:bullshit on Reverse Multithreading CPUs · · Score: 1

    Again only partially correct. While it's true you need more address decoder bits and area [e.g. longer wires] the actual data read is a single 64-bit value from a cache bank [bits discarded if smaller]. Both Intel and AMD pipeline their LSUs because they actually have multiple steps of work.

    This is how RaW works for instance... You'd have at least two cycles

    1. present address (write buffers, L1 and L2 pick up request)

    This is important because cache coherency is important. You have to make sure that you're going to read the latest and greatest copy. It may be anywhere in the CPU.

    2. read data (if in write buffer or L1) or stall (if in memory or L2)
    3. either done or ... wait ....
    X. read from L2 or memory

    I don't know the exact design of either off by heart but that's the gist of an LSU's job.

    A larger cache does mean a larger area [longer wires] so that's quite possibly the reason for 3 cycles instead of 2. But fundamentally the LSUs of the K8 and P4 are not the same, so even if the K8 had an 8KB cache it's possible that the delay would still be 3 cycles.
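
    As a rough illustration of the steps above, here is a minimal Python sketch of the lookup order. The cycle counts and the `load_latency` helper are assumptions made up for the example; they are not real K8 or P4 figures.

```python
# Toy model of the load path sketched in this comment.
# Every cycle count below is an illustrative assumption.
ADDRESS_CYCLES = 1          # step 1: present address (write buffers, L1, L2)
L1_READ_CYCLES = 2          # step 2: data returns from write buffer or L1
L2_EXTRA_CYCLES = 17        # assumed extra stall when the line is only in L2
MEMORY_EXTRA_CYCLES = 200   # assumed extra stall when the line is only in DRAM


def load_latency(location: str) -> int:
    """Return the cycles a load takes, given where the newest copy lives."""
    cycles = ADDRESS_CYCLES + L1_READ_CYCLES
    if location in ("write_buffer", "l1"):
        return cycles                        # done after step 2
    if location == "l2":
        return cycles + L2_EXTRA_CYCLES      # stall, then read from L2
    if location == "memory":
        return cycles + MEMORY_EXTRA_CYCLES  # stall, then read from memory
    raise ValueError(f"unknown location: {location}")


for where in ("write_buffer", "l1", "l2", "memory"):
    print(where, load_latency(where))
```

    The point of the model is only that the address phase is paid on every load, hit or miss, which is why a hit still costs more than one cycle.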

    Tom

  4. Re:FCC on FCC Commissioner Wants To Push For DRM · · Score: 1

    They are the ones who "think of our children". :-)

    Largely what they were supposed to do is make sure standards of broadcast technology were adhered to. E.g. licensing spectrum, making sure TV signals are in their respective bands, etc, etc.

    This whole "policing morality" bullshit is not new but it's also a lot different now than say 30 years ago. Nobody would have given two shits about a nipple showing at a Super Bowl in 1978. That it happened in 2004 [or 2005?] is a crying shame and we must fight this injustice!!!

    Tom

  5. Re:-1, flamebait on Linux Snobs, The Real Barriers to Entry · · Score: 1

    Stupid ACs...

    Let me explain it to you in terms you can understand.

    1. There are asshats in every camp
    2. Stopping the show because of said asshats is stupid
    3. Grow up.

    Linux and the OSS scene is just fine despite the asshats who are members. If we were to stop what we were doing to concentrate on fixing the asshats we'd never get shit done.

    There are those who do and there are those who blog about those who do.

    Tom

  6. Re:-1, flamebait on Linux Snobs, The Real Barriers to Entry · · Score: 1

    On behalf of those who are helpful, you're welcome.

    It's all just a cycle of karma. I help you learn a Linux distro and you do something good for someone else as a result and so on.

    Tom

  7. Re:interesting... on Linux Snobs, The Real Barriers to Entry · · Score: 1

    As the author of cryptographic open source [a double whammy] I can attest that "there are many stupid people out there". Even when you write documentation [like say hundreds of Doxygen comments and a 130-page user manual] people still ignore it and ask you anyways.

    I'd say a full 30% of my support emails (of which I can get quite a few at times) are from people who are completely and utterly lazy and don't read any of the documentation. They ask questions that are specifically answered in the text and often I just cite page numbers if I'm tired.

    It's the standard usenet attitude. You'll get a question [say in sci.crypt] such as "Where can I find an implementation of AES?" and the answer is "Go fucking Google for it".

    That's a symptom of the problem. People are just lazy and want everything handed to them on a silver platter. The problem is then you get dictated your platform and how it works and what you get to do with it. Can't figure out OpenOffice? Then get stuck with the uni-platform MS Word and its closed proprietary document format. Can't figure out KDE? Then get stuck with Explorer, etc, etc, etc...

    Learning stuff often involves research. For example, when a new GFX feature runs into a bug, the answer is often on an archived mailing list somewhere. A quick 10 mins of googling usually finds the answer and you're on your way.

    The problem of Linux adoption is multi-faceted.

    1. Yes, many projects lack documentation.
    2. Yes, lots of users just don't read it anyways.
    3. Lots of people are lazy and unwilling to spend the time to learn something.
    4. There are enough OSS zealots and assholes to spoil the party.
    5. Lots of anti-OSS FUD [like this] such as that from MSFT about TCO.
    6. General misunderstandings of OSS like how licensing works, who do you contact for support, etc...

    It's too easy to just point the finger at developers but that's naive and doesn't actually answer the question.

    Tom

  8. Re:-1, flamebait on Linux Snobs, The Real Barriers to Entry · · Score: 1

    I actually picked that URL at random, I didn't know it pointed to anything in particular.

    Not having my own personal website didn't mean I couldn't put a URL in my profile :-)

    Tom

  9. Re:Yup...the utter truth... on Linux Snobs, The Real Barriers to Entry · · Score: 1

    While lack of sufficient documentation is a common problem [even in the commercial side] I don't think it's the main barrier.

    I got into Gentoo Linux [of all the distros] with minimal "Linux" knowledge. I knew the coreutils [e.g. ls, cp, cd] and bash from Cygwin but that was about it. It took me a few tries to get Gentoo going at first but now it's a breeze. I can do an install without referring to the manual; what's more, the install works :-)

    I think the main barrier is people are really apathetic to change or improvement. They're not willing to learn and furthermore learn why things are better in the OSS world. They just assume "Linux is hard" and go on their way.

    I mean the Gentoo install manual explains step for step how to install it. Other distros are even pointy-clicky installed.

    And don't forget that it's a feedback system too. The more users of a project the more support you're likely to see. So if you think, for instance, Firefox is too hard to use and nobody uses it, chances are it won't get support. On the other hand there are millions of users so it gets upgrades and updates often. Other OSS projects are no different really.

    The trick is to not give up when you get the slightest inconvenience. Which is even odder because most people will put up with WinXP ineptitudes but give up on Linux when the first device fails to start on bootup or something.

    Tom

  10. -1, flamebait on Linux Snobs, The Real Barriers to Entry · · Score: 3, Insightful

    There are assholes in every camp. I'm sure I can just as easily find Windows and MacOS snobs [well the latter is a given].

    I've personally helped a half dozen people switch to Gentoo. Not all of us are meanies [though I play one on TV].

    This article is pure flamebait.

    Tom

  11. Re:OT question on OS Virtualization Interview · · Score: 1

    Who supports users? How about the author of the damn tool?

    It's called personal responsibility.

    Unfortunately all too many people want the credit for writing OSS [no matter how shoddy] but don't want the actual work of supporting it. How many OSS projects are known for their stellar documentation and 24 hour turnaround e-mail support?

    Not that the commercial world is any better. I mean who do I write to, to get a behaviour in MS Word changed?

    Tom

  12. Re:bullshit on Reverse Multithreading CPUs · · Score: 1

    Answer: No. It would cause the real L1 to have L0.delay additional cycles of delay. This is also why most CPUs don't have L3s even if you could make them out of DRAM on chip. So unless your hit rate for the L0 was like 99% you'd expect to lose performance.

    In both the Intel and AMD cases the L1 access is pipelined which is why it's multiple cycles. Intel merely has a shorter pipe to the LSU which is why they have [often] 2 cycle caches as opposed to the 3 cycle AMD has.

    Tom

  13. Re:bullshit on Reverse Multithreading CPUs · · Score: 1

    [speaking in general].

    You'll find that most LSUs in modern processors are really their own independent units.

    Think of a processor as a program with a bunch of threads and really efficient IPC [inter-process communication]. The Load-Store Unit [LSU] is just one of many things going on.

    Both Intel and AMD have hardware prefetchers which examine memory usage and make fetches to system memory to bring stuff in [L1 or L2 depending on the design].

    Tom

  14. Re:bullshit on Reverse Multithreading CPUs · · Score: 1

    Snooping between cores is fast but not super efficient. It can also send out snoops to the HT bus in MP systems if neither core owns the cache line. And as far as I know [from public info] you have to hit the SRQ before a read from one core can see something written by the other core. The latency to the L2 is ~20 cycles or so on its own. etc, etc, etc....

    As for the other comments, the ALU is already wide enough. You're right about the SSE side. At best FPU opcodes are 2 [of 4] EX cycles giving a latency of 2 cycles. That's partly because of the scheduler though as it looks for things in steps of 2 cycles.

    Tom

  15. Re:OT question on OS Virtualization Interview · · Score: 1

    They don't call it OpenNetscape now do they?

    Tom

  16. Re:Load balancing might be interesting on Reverse Multithreading CPUs · · Score: 1

    You're missing the point.

    Any task you can sufficiently isolate to different cores is a task you can thread. Otherwise, if there is a lot of interdependence, the concept won't work, especially if the pieces work in the same memory space. Keeping the caches sync'ed between cores would basically kill any benefit you think you can get.
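
    To make "sufficiently isolate" concrete, here is a minimal Python sketch in which each worker owns its own chunk of the data and shares nothing with the others. The `checksum` and `parallel_checksums` names are made up for the example, and the threads only illustrate the partitioning: in standard CPython the GIL keeps CPU-bound threads from actually running in parallel.

```python
from concurrent.futures import ThreadPoolExecutor


def checksum(chunk):
    """Work on one chunk only; no state is shared with other workers."""
    total = 0
    for value in chunk:
        total = (total + value * value) % 65521
    return total


def parallel_checksums(data, workers=2):
    """Split data into independent chunks and hand one to each worker."""
    size = max(1, (len(data) + workers - 1) // workers)
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(checksum, chunks))


print(parallel_checksums(list(range(10))))
```

    If `checksum` had to read values another worker was still writing, the chunks would no longer be independent and the split would stop paying off, which is the interdependence case described above.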

    Unlike say the Intel designs the AMD "dual cores" are really two distinct independent cores with their own caches living in their own worlds.

    So why would AMD spend money on researching a concept which is basically doomed to failure?

    Tom

  17. Re:OT question on OS Virtualization Interview · · Score: 1

    Well if it's the closed project it's opened up.

    If it's a clean-room implementation then it's not strictly based on it.

    Call it something else like Vzeeforefree!

    Dunno just annoyed at people abusing the OSS blanket for publicity.

    Tom

  18. Re:OT question on OS Virtualization Interview · · Score: 2, Insightful

    Bosses don't care if it's open source. They care about:

    1. How much does it cost to license?
    2. How much does it cost to set up?
    3. What does it solve any better than what we already have?

    Tom

  19. OT question on OS Virtualization Interview · · Score: 1, Insightful

    What's with "open" in the name of all these projects? Is anyone really impressed by that anymore?

    Tom

  20. Re:This is Like RAID for CPU's on Reverse Multithreading CPUs · · Score: 1

    The problem is that CPUs are very independent once instructions get into the decoder window. The only way to stop it is to raise an exception or interrupt (e.g. APIC signal).

    So just because you may have 4 cores in your box [say dual-core 2P] doesn't mean all of the cores can act as one logically to the OS in a meaningful and efficient manner.

    The striping analogy would be to dispatch instructions in round-robin fashion to all the processors. The problem with that is that the architectural state has to be shared. Keeping that in sync with current cores would kill any sort of performance gain you might hope to obtain.
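
    A small Python sketch of that striping idea, counting how often an instruction would need a register last written on a different core. The five-instruction `PROGRAM` and the `cross_core_reads` helper are invented for illustration and do not model any real decoder or scheduler.

```python
# Each instruction is (destination register, source registers).
# This tiny program is a made-up dependency chain, not real x86 code.
PROGRAM = [
    ("eax", ()),
    ("ebx", ("eax",)),
    ("ecx", ("eax", "ebx")),
    ("eax", ("ecx",)),
    ("edx", ("eax", "ebx")),
]


def cross_core_reads(program, cores):
    """Count source reads whose value was last written on another core."""
    last_writer_core = {}
    crossings = 0
    for index, (dest, sources) in enumerate(program):
        core = index % cores                  # round-robin "striping"
        for src in sources:
            if src in last_writer_core and last_writer_core[src] != core:
                crossings += 1                # value must come from elsewhere
        last_writer_core[dest] = core
    return crossings


for n in (1, 2, 4):
    print(n, "core(s):", cross_core_reads(PROGRAM, n), "cross-core reads")
```

    With one core nothing crosses; as soon as the same stream is striped over more cores, most register reads have to be fetched from another core, and each of those is the slow inter-core transfer.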

    Tom

  21. Re:bullshit on Reverse Multithreading CPUs · · Score: 1

    Um actually you're wrong. The Core [64-bit stuff coming out] processors have a 4-way instruction window which is 1 larger than AMD already. That means they can issue up to 4 macro-ops per cycle. So processors are already using more pipes.

    There are THREE FPU pipes. Therefore it is possible to add an adder [or vice versa] to the multiplier then have the decoder be aware of this and feed stuff into either pipe. So technically you don't have to change the ICU at all to support more FPU resources.

    As for the ALU performance I never said make it wider. They're vastly underutilized as it is. L1 cache stalls account for quite a bit of cycles even when there is a hit.

    As for threading... that's an OS issue. Doing anything on the level the CPU will recognize is not feasible. You simply cannot extract architectural state fast enough. The best way to use two cores is with SMP aware software.

    I haven't heard of any AMD projects to merge cores like this and in fact the emphasis has always been on SMP and NUMA aware development practices.

    Tom

  22. Re:Load balancing might be interesting on Reverse Multithreading CPUs · · Score: 1

    The problem is once an instruction gets down to the core of the ... er core ... it's hard to get it to another core.

    So you can only load balance at the process/thread level.

    Tom

  23. Re:bullshit on Reverse Multithreading CPUs · · Score: 1

    "armchair"... whatever. I'd say I know a bit more about the K8 design than your average slashdotter.

    The point is as it stands now the K8 cannot, repeat cannot, get a register from one core to another FASTER THAN THE L1 CACHE WORKS.

    Now that we got that out of the way... realize that ...

    IPC OF 99% OF ALL CODE is less than 1 in most cases and why is that? Aside from register contention there is the three cycle latency of the L1. So it's very trivial to stall an entire execution unit.

    So AMD would see little benefit from tying the ALUs on core 1 (which can only access the registers local to it) to core 0 since they would just go unused most of the time.

    The only possible benefit is the FPU of the second core but even then it's pushing it. Getting data from one core to the other is really slow.

    AMD would benefit more from just adding another FPU adder or multiplier [or both] to a single core than by adding high speed super-wide busses between cores (which in terms of processors are "far away").

    Tom

  24. Re:bullshit on Reverse Multithreading CPUs · · Score: 3, Informative

    For those not in the know... reading a register from core 1 and loading it in core 0 would work like this

    1. core 1 issues a store to memory [dozens if not hundreds of cycles]
    2. core 0 issues a read, the XBAR realises it owns the address and the SRQ picks up the read
    3. core 0 has now read a register from core 1

    It would be so horribly slow that accessing the L1 data cache as a place to spill would be faster.

    The IPC of most applications is less than three and often around one. So more ALU pipes is not what K8 needs. It needs more access to the L1 data cache. Currently it can handle two 64-bit reads or one 64-bit store per cycle. It takes three cycles from issue to fetched.

    Most stalls are because of [in order of frequency]

    1. Cache hit latency
    2. Cache miss latency
    3. Decoder stalls (e.g. unaligned reads or instructions which spill over a 16-byte boundary)
    4. Vectorpath instruction decoding
    5. Branch misprediction

    AMD making the L1 cache 2 cycle instead of 3 cycle would immediately yield a nice bonus in performance. Unfortunately it's probably not feasible with the current LSU. That is, you can get up to 33% faster in L1 intense code with that change.
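
    The 33% figure reads as the best case of (3 - 2) / 3, i.e. code whose run time is entirely L1 hit latency. A minimal Python sketch of that arithmetic follows; the `time_saved` helper and the idea of a single "L1-bound fraction" are simplifying assumptions for illustration, not a measurement.

```python
def time_saved(l1_bound_fraction, old_cycles=3, new_cycles=2):
    """Fraction of run time saved when only the L1-bound part speeds up."""
    if not 0.0 <= l1_bound_fraction <= 1.0:
        raise ValueError("l1_bound_fraction must be between 0 and 1")
    return l1_bound_fraction * (1 - new_cycles / old_cycles)


for fraction in (1.0, 0.5, 0.1):
    print(f"{fraction:.0%} L1-bound -> {time_saved(fraction):.1%} time saved")
```

    Code that spends only part of its time waiting on L1 hits sees a proportionally smaller gain, hence "up to".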

    But compared to "pairing" a core, die space is better used improving the LSU, adding more pipes to the FPU, etc.

    Tom

  25. bullshit on Reverse Multithreading CPUs · · Score: 1

    The bus between the two cores is FAR TOO SLOW for this sort of operation. Moving [say] EAX from core 0 to core 1 would take hundreds of cycles.

    So if the theory is to take the three ALU pipes from core 1 and pretend they're part of core 0... it wouldn't work efficiently. Also what instruction set would this run? I mean how do we address registers on the second core?

    AMD would get more bang for buck by doing other improvements such as adding more FPU pipes, adding a 2nd multiplier to the integer side, increasing L1 bandwidth, etc.

    This story is pure and utter bullshit.

    Tom