Slashdot Mirror


64-bit x86 Computing Reaches 10th Anniversary

illiteratehack writes "10 years ago AMD released its first Opteron processor, the first 64-bit x86 processor. The firm's 64-bit 'extensions' allowed the chip to run existing 32-bit x86 code in a bid to avoid the problems faced by Intel's Itanium processor. However AMD suffered from a lack of native 64-bit software support, with Microsoft's Windows XP 64-bit edition severely hampering its adoption in the workstation market." But it worked out in the end.

332 comments

  1. Whatever! PowerPC been doing 64-bit by Anonymous Coward · · Score: 0

    For over twenty years.

    1. Re:Whatever! PowerPC been doing 64-bit by Anonymous Coward · · Score: 0

      For over twenty years.

      I guess that's The New Math: 2013 - 2003.

      http://en.wikipedia.org/wiki/PowerPC#64-bit_PowerPC

    2. Re:Whatever! PowerPC been doing 64-bit by larry+bagina · · Score: 5, Funny

      MIPS and Alpha ask power pc to get off their lawn.

      --
      Do you even lift?

      These aren't the 'roids you're looking for.

    3. Re:Whatever! PowerPC been doing 64-bit by Anonymous Coward · · Score: 0

      Yes, new math: Seventeen years then. PowerPC 620 development processor for RS64

    4. Re:Whatever! PowerPC been doing 64-bit by Anonymous Coward · · Score: 0

      IBM's POWER architecture has always been a 64-bit architecture, yet the first implementation used only the lower 32-bit portion because little software needed 64-bit processing: why waste die space implementing the complete architecture when 99.99% of the software was 32-bit? And POWER is over twenty years old. The 64-bit PowerPC 620 was used in developer workstations to migrate towards a 64-bit code base, the RS64 series moved more of the code base across, and POWER4 was the first implementation in silicon of the complete 64-bit POWER architecture; code written for the PowerPC 620 and RS64 could be recompiled and optimised to take advantage of the features in the POWER4. POWER != PowerPC

    5. Re:Whatever! PowerPC been doing 64-bit by ArchieBunker · · Score: 1

      SPARC would like a word with you as well. When the Ultra workstations first hit the market, 32 bit software actually ran slower under 64 bit Solaris.

      --
      Only the State obtains its revenue by coercion. - Murray Rothbard
    6. Re:Whatever! PowerPC been doing 64-bit by Anonymous Coward · · Score: 0

      SPARC Version 9 -- 1993 64 bit

      https://en.wikipedia.org/wiki/SPARC

    7. Re:Whatever! PowerPC been doing 64-bit by Guy+Harris · · Score: 2

      POWER != PowerPC

      Both POWER (all-caps) and PowerPC refer both to instruction set architectures and brand names used on processors that implemented them.

      The PowerPC ISA took the POWER ISA, added some stuff such as general-register-based multiply and divide instructions, and removed a few instructions (and didn't add in the ones used in the POWER2 processor).

      POWER3 was a 64-bit processor that implemented the union of 64-bit PowerPC and POWER; I don't know whether any subsequent POWERn processors implemented the POWER ISA-only instructions or just the current version of the PowerPC/Power (not all-caps) ISA.

    8. Re:Whatever! PowerPC been doing 64-bit by hairyfeet · · Score: 1

      And how many people actually owned a SPARC, POWER, or Itanic for that matter? Then it really doesn't matter does it as we can play this game all damned day with chips.

      What matters is that AMD not only brought 64bit to the masses, thus paving the way for having large amounts of RAM without bad hacks like PAE or RAMdisks, but also paved the way for making large RAM sticks for the masses, which is why so many of us have oodles of RAM. Hell, even my netbook has 8GB of RAM, which before 64bit went mainstream was unheard of outside the enterprise.
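The RAM ceilings behind this comment are plain powers of two. A minimal sketch, assuming the commonly cited address widths (32 bits for classic x86, 36 bits for PAE physical addresses, 48 bits for the virtual addresses of early x86-64 parts); the function names are made up for illustration and the widths are not taken from the thread:

```python
def addressable_bytes(address_bits):
    """Bytes reachable with byte-granular addresses of the given width."""
    return 2 ** address_bits


def human_readable(num_bytes):
    """Format a byte count using binary units (KiB, MiB, ...)."""
    units = ["B", "KiB", "MiB", "GiB", "TiB", "PiB"]
    value = num_bytes
    for unit in units:
        # Stop at the first unit that keeps the number below 1024,
        # or at the largest unit we know about.
        if value < 1024 or unit == units[-1]:
            return f"{value} {unit}"
        value //= 1024


if __name__ == "__main__":
    # Address widths are assumptions stated in the lead-in.
    cases = [
        ("32-bit x86", 32),
        ("PAE physical", 36),
        ("x86-64 virtual (48-bit)", 48),
    ]
    for label, bits in cases:
        print(f"{label:26} {human_readable(addressable_bytes(bits))}")
```

Under those assumptions the 32-bit case comes out at 4 GiB, which is the ceiling the comment is talking about, and PAE only raises the physical limit, not the size of a single process's address space.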

      So let's hear it for AMD, if it weren't for them you'd be stuck on the Itanic.

      --
      ACs don't waste your time replying, your posts are never seen by me.
    9. Re:Whatever! PowerPC been doing 64-bit by Guy+Harris · · Score: 1

      And how many people actually owned a SPARC, POWER, or Itanic for that matter?

      Well, some of the masses might have had G5 iMacs (PowerPC 970, 64-bit), but, yes, it took AMD to bring 64-bit to most of the masses.

      At least one comment claims that the original title of the article was "64-bit Computing Reaches 10th Anniversary", which, if true, means the article came out with a bogus headline (there's more to "Computing" than stuff that runs on a mainstream desktop or laptop machine, and DEC OSF/1 came out in 1993, so it's been at least 20 years); if the original comment was posted before that, I can see his complaint (and the complaint of the person who pointed out that the MIPS R4000 came out before the first 64-bit PowerPC processor). Complaining about "64-bit x86 Computing Reaches 10th Anniversary" neglecting other 64-bit architectures, however, is silly.

    10. Re:Whatever! PowerPC been doing 64-bit by TheRaven64 · · Score: 1

      All the patents required to implement MIPS IV (64-bit) have expired. The ones on SPARCv9 expire this year. Alpha expired last year. There's a reason for caring about the 20-year mark: it makes implementing the architecture a lot safer. We have a research processor that implements the MIPS IV instruction set for this exact reason: we may accidentally infringe some patents, but it is definitely possible to work around them by implementing things however the R4K did.

      --
      I am TheRaven on Soylent News
    11. Re:Whatever! PowerPC been doing 64-bit by LizardKing · · Score: 1

      And PA-RISC hits the 20th anniversary for the 64 bit version in three years' time. Was always intrigued by that architecture, but only got to play very briefly with an HP9000 server.

    12. Re:Whatever! PowerPC been doing 64-bit by Anonymous Coward · · Score: 0

      Alpha was around at least as early as '92 doing 64 bit.

      I saw UltraSparc in '95-96. It was faster than HP PA-RISC workstations so we switched. Desktops got 128MB or 256MB.

      We got a 3000 with 1GB of RAM in '96-97 IIRC. RAM was 1/2 the cost. We ran Solaris 2.6 which could handle the extra RAM. 2.5.1 could not. Larger jobs ran on the 3000 to save $$ on RAM.

    13. Re:Whatever! PowerPC been doing 64-bit by Anonymous Coward · · Score: 0

      Seeing as the PowerPC 970 ("G5") was the first 64-bit PowerPC (not POWER, though) processor, even the Opteron came out before the first 64-bit PowerPC. The PPC970 was first made available for purchase in June 2003 in the Power Mac G5. (The chip itself was announced in October 2002, but that isn't what I would call a release.) By that timeline, the Opteron beat it to market by about 2 months. (All of this is from Wikipedia, of course. So some grains of salt are in order.)

    14. Re:Whatever! PowerPC been doing 64-bit by unixisc · · Score: 1

      Opteron ought to be compared to POWER, not PowerPC, given the target markets. Also, in the Windows world itself, you had 64-bit MIPS and Alpha based workstations that ran NT - too bad it was badly supported by Microsoft and never caught on - long before AMD extended the x86 instruction set to 64-bit. While that was imaginative, it was tragic, as it ensured that the ultimate CISC to RISC migration that everyone - including Intel - hoped for, never happened.

    15. Re:Whatever! PowerPC been doing 64-bit by unixisc · · Score: 1

      So anybody could implement any of these CPUs independently of the companies that started them, and not worry about any patent violations? By MIPS IV, you mean the R8000, right, or is it R5000? I'm curious about whether Oracle would then come out w/ a SPARC v10 in that case, although it's not that there are others aside from them who would be interested in such a CPU.

      I'd love to see some of these CPUs, such as the MIPS, get resurrected and used in newer IPv6 routers and other networking gear. With more open hardware, it would be easier to manage.

    16. Re:Whatever! PowerPC been doing 64-bit by Guy+Harris · · Score: 1

      Seeing as the PowerPC 970 ("G5") was the first 64-bit PowerPC (not POWER, though) processor, even the Opteron came out before the first 64-bit PowerPC.

      As I said in an earlier post in this thread, "Both POWER (all-caps) and PowerPC refer both to instruction set architectures and brand names used on processors that implemented them."

      If "the first 64-bit PowerPC" refers to PowerPC-the-brand, yes, the first one (other than the 620, which wasn't made in large quantities) was the 970.

      If it refers to PowerPC-the-instruction-set, the first one was the POWER3 - it implemented the full PowerPC instruction set (as well as the POWER2 version of the POWER instruction set).

    17. Re:Whatever! PowerPC been doing 64-bit by Guy+Harris · · Score: 1

      Opteron ought to be compared to POWER, not PowerPC, given the target markets.

      Presumably referring to, as per the distinction I drew in "Both POWER (all-caps) and PowerPC refer both to instruction set architectures and brand names used on processors that implemented them.", POWER and PowerPC the brand names used on processors, not POWER and PowerPC the instruction set architectures.

      Also, in the Windows world itself, you had 64-bit MIPS and Alpha based workstations that ran NT...

      ...which was a 32-bit OS, so it didn't provide 64-bit computing on those 64-bit processors.

    18. Re:Whatever! PowerPC been doing 64-bit by hairyfeet · · Score: 1

      But WinNT was 32bit so we are right back to the hair splitting again.

      What MATTERS is why you and I can go out and buy 4GB and even 8GB of RAM on a single stick without having to take out a loan. That was all AMD. Before that there really wasn't a point in large RAM sticks because the market was too niche, but with XP X64 and Athlon X2 and Pentium D you finally had a way for anybody to run more than 4GB of RAM and thus RAM sizes exploded.

      If it weren't for Win 7 I'd still be running XP X64 BTW, that was a truly great OS and if I wanted to set up a system that needed every cycle for the program like Folding I'd probably go XP X64 over Win 7, it was insanely low resource which made programs just fly on it. Hell I was running it on a Pentium D805 which was a shitty chip and it was just zippy, damned good workstation OS.

      --
      ACs don't waste your time replying, your posts are never seen by me.
    19. Re:Whatever! PowerPC been doing 64-bit by unixisc · · Score: 1

      Yeah, that (NT being 32-bit on Alpha & MIPS) was another pity - Microsoft could have had then on those platforms what it has today on the x64, and had the OS readily 64-bit much earlier than it eventually did. And yes, I was referring to the processors, rather than instruction sets: POWER was used in the RS/6000 line of workstations from IBM, while PowerPC was used in Macs. Similarly, Opterons were used in servers, whereas Athlons were used in PCs. Therefore, PowerPC:POWER::Athlon:Opteron

    20. Re:Whatever! PowerPC been doing 64-bit by TheRaven64 · · Score: 1

      So anybody could implement any of these CPUs independently of the companies that started them, and not worry about any patent violations?

      Well, not quite. It's definitely possible to implement a CPU that is compatible with a 20-year-old CPU without infringing any patents, because any patents that were used in the original have now expired. That doesn't mean that the implementation will be patent free, however. For example, our branch predictor is cleverer than any shipping CPU 20 years ago, but I've not done a patent search so I don't know if it's covered by more recent patents. I intentionally avoided some techniques I know to be patented, but some of the older ones might have been patented around 1996-2000ish. That said, if we have to change the branch predictor to avoid patents then it's not a big deal - it doesn't change the ISA (and our CPU runs in an FPGA, so deploying a new version takes a few hours).

      By MIPS IV, you mean the R8000, right, or is it R5000?

      R8000 was the first MIPS IV chip, yes.

      I'm curious about whether Oracle would then come out w/ a SPARC v10 in that case, although it's not that there are others aside from them who would be interested in such a CPU.

      Well, Sun did something similar with the UltraSPARC spec. SPARCv9 doesn't specify the privileged mode instruction set, and this used to vary. The UltraSPARC spec that they released along with the T1 specified this and was intended to provide a stable interface for operating systems. Some parts of that may have been difficult to implement without trampling on patents. There are also likely to be issues with things like SIMD extensions.

      I'd love to see some of these CPUs, such as the MIPS, get resurrected and used in newer IPv6 routers and other networking gear.

      All Juniper routers run a tweaked FreeBSD on 64-bit MIPS and a lot of low-end routers use 32-bit MIPS chips.

      With more open hardware, it would be easier to manage

      The problem is always competing with companies that have large volumes. Intel is basically a process generation ahead of everyone else, and they have economies of scale that only the big ARM SoC vendors can match. There isn't really a business case for using a reimplementation of an old CPU architecture when you can get a Cortex A15 that will outperform it for less money. The only reason you'd want to is so you can add some custom instruction set extensions that massively speed up your workload, and even then they'd need to give an order of magnitude speedup for it to be a net win. In Juniper's case, they go with in-order MIPS cores with large SMP and SMT support, so that they can do a lot of relatively simple packet processing tasks in parallel. There isn't much branching or floating point in packet filtering (it's mostly just integer arithmetic and a large amount of data shuffling and a lot of data dependencies that completely kill performance on superscalar or out-of-order chips), so general-purpose CPUs (and GPUs) are optimised in all of the wrong places. Two simple in-order cores for them can be faster than one complex out-of-order superscalar core.

      --
      I am TheRaven on Soylent News
    21. Re:Whatever! PowerPC been doing 64-bit by unixisc · · Score: 1

      I thought that MIPS, like the Alpha, was particularly strong in floating point. Are the versions of the CPUs you're describing integer-only CPUs, the ones that are used in routers? Also, for router like applications, I'd imagine that such CPUs are really more IO intensive than anything else, and it's the data transfer instructions there that would need the most optimization?

    22. Re:Whatever! PowerPC been doing 64-bit by TheRaven64 · · Score: 1

      Being strong on floating point is an aspect of the implementation more than the ISA. SGI's MIPS chips devoted a lot of designer effort and silicon to floating point, because that's what their customers wanted. In MIPS, however, there is a generic coprocessor interface supporting 4 coprocessors. CP0 is the system management coprocessor, which does all of the things like TLB management. CP1 is traditionally the FPU, and CP3 is sometimes the SIMD unit. CP2 is usually some manufacturer-specific extension. Cavium's Octeons, for example, put some network processing acceleration functions into CP2, but I don't think they implement CP1, or if they do it's likely a single floating point pipeline shared between cores. With a multithreaded CPU and a well-designed memory controller, you can have enough threads blocking on reads that you can handle one read and one write every cycle and completely saturate the bus, which is exactly what you want for network processing.
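The coprocessor split described above shows up directly in the instruction encoding. A rough sketch, assuming the classic MIPS layout in which the major opcode sits in bits 31..26 and COP0 through COP3 occupy four consecutive opcodes starting at 0b010000. Later ISA revisions reuse the COP3 slot, and coprocessor load/store instructions have their own opcodes, so this is an illustration rather than a full decoder; the function name is hypothetical:

```python
COP_BASE_OPCODE = 0b010000  # COP0; COP1..COP3 follow consecutively (classic encoding)


def coprocessor_number(instruction):
    """Return 0-3 if the 32-bit word uses a COPz major opcode, else None."""
    opcode = (instruction >> 26) & 0x3F  # bits 31..26 hold the major opcode
    if COP_BASE_OPCODE <= opcode <= COP_BASE_OPCODE + 3:
        return opcode - COP_BASE_OPCODE
    return None


if __name__ == "__main__":
    samples = {
        "mfc0 $t0, $12": 0x40086000,         # system control coprocessor (CP0)
        "add.s $f0, $f1, $f2": 0x46020800,   # FPU (CP1)
        "addu $v0, $a0, $a1": 0x00851021,    # ordinary integer instruction
    }
    for text, word in samples.items():
        print(f"{text:22} -> CP{coprocessor_number(word)}"
              if coprocessor_number(word) is not None
              else f"{text:22} -> not a coprocessor instruction")
```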

      --
      I am TheRaven on Soylent News
  2. Let us give thanks.... by cold+fjord · · Score: 5, Funny

    ...for being delivered from Itanium and 32bit x86.

    --
    much of left-wing thought is a kind of playing with fire by people who don't even know that fire is hot - George Orwell
    1. Re:Let us give thanks.... by muon-catalyzed · · Score: 5, Interesting

      The people at AMD who did this, unquestionably AMD's biggest achievement to date, should be rehired and given executive positions.

    2. Re:Let us give thanks.... by Cyclon · · Score: 5, Informative
    3. Re:Let us give thanks.... by Anonymous Coward · · Score: 1

      Mod this up, AMD just re-hired Raja Koduri, and a famous programmer from Intel (I can't remember his name). I believe their new HSA architecture is going to be big.

    4. Re:Let us give thanks.... by MachineShedFred · · Score: 1

      Opteron was nice when it shipped, except that there was no software that would really use it to its fullest extent. Chicken and the egg, and all that.

      However, its biggest achievement was putting a stake through the heart of Itanium, guaranteeing that the only thing that came out of that Intel debacle was EFI.

      --
      Slashdot still doesn’t support Unicode after it was added to the HTML standard in 1997.
    5. Re:Let us give thanks.... by operagost · · Score: 1

      As a fan of OpenVMS, I'm happy to say Itanium isn't dead yet. But every day without an announcement that it's being migrated to x86-64 makes me more nervous.

      --

      Gamingmuseum.com: Give your 3D accelerator a rest.
    6. Re:Let us give thanks.... by VGPowerlord · · Score: 1

      The people at AMD who did this, unquestionably AMD's biggest achievement to date, should be rehired and given executive positions.

      Their second biggest was outdoing Intel with the first Athlon chips. As I recall, AMD had the performance advantage over Intel for several years starting in 1999, possibly even until Intel released the Core 2 in 2006.

      --
      GLaDOS for President 2016! "Well here we are again. It's always such a pleasure." -- GLaDOS, 2011
    7. Re:Let us give thanks.... by Anonymous Coward · · Score: 0

      Itanium? I thought it was Itanic.

  3. Twice as big as it needs to be? by Anonymous Coward · · Score: 0, Funny

    Does 64 bits really mean that every program is twice as big as it needs to be? Every time I hear about an innovation that requires things to be bigger, I question the necessity.

    1. Re:Twice as big as it needs to be? by houstonbofh · · Score: 1

      I call it garage syndrome... The stuff you have expands to fill your garage, no matter the size. It does not need to be bigger, but programmers can get lazy too. Not to mention fitting in the latest new shiny.

    2. Re:Twice as big as it needs to be? by leenks · · Score: 1

      No.

    3. Re:Twice as big as it needs to be? by Anonymous Coward · · Score: 0

      No. No, it does not mean that at all.

    4. Re:Twice as big as it needs to be? by realityimpaired · · Score: 1

      Does 64 bits really mean that every program is twice as big as it needs to be? Every time I hear about an innovation that requires things to be bigger, I question the necessity.

      Nope. Doesn't mean that at all.

      Maintaining backwards compatibility with 32-bit means that you have to compile it twice, and include both sets of binaries. Actual compiled code that doesn't bother with backwards compatibility isn't significantly larger than 32-bit code.

    5. Re:Twice as big as it needs to be? by Anonymous Coward · · Score: 1

      This!

      For all the people who insist "Oh, but I need those big access and my umpteen gajillion giggerbytes of memory", well... no. No you don't, except for the rare computational simulation. What happened is that programmers who write the fancy software that you use decided that everything is easier if they allocate 100 times as much memory as they actually need. Computing has become 99% overhead and laziness and 1% actually doing something useful.

      It's like seeing an empty 12 lane highway and deciding that you'd better widen your car until it fills 11 of them, because otherwise they'd be all wasted lanes. The car still travels the same speed and gets the same small payload to its destination, of course.

    6. Re:Twice as big as it needs to be? by vistapwns · · Score: 1

      Depends on how you want to look at it, and who you feel like being cynical against. Easing the job of programmers is a good thing: if they can use 10x more RAM and not have to write code to juggle memory as much, they have eliminated a potential source of bugs and a time sink that is probably hard to maintain as well. Memory is cheap, I got 16GB for $90, and though programs are larger, maybe unnecessarily so, nothing comes close to exhausting my memory. It seems like a much better method than defining some arbitrary limit, stopping all progress, and telling programmers to 'stop being lazy sobs'.

      --
      "...I think the Microsoft hatred is a disease." - Linus Torvalds
    7. Re:Twice as big as it needs to be? by FreonTrip · · Score: 1

      Nah - your primitives are doubled in size, which realistically represents something closer to a 25-33% size increase on average. But between the abilities to manipulate MUCH larger quantities of data at once and addressing >3.5 GB RAM, it's an easy choice unless you absolutely need 16-bit support.

    8. Re:Twice as big as it needs to be? by Anonymous Coward · · Score: 0

      Sillies, everybody knows that 64-bit is 4,294,967,296 times bigger than 32-bit software.

    9. Re:Twice as big as it needs to be? by FreonTrip · · Score: 1

      Look, I like bashing straw man lazy programmers as much as anybody. But in scientific computing in the year 2013 - say, where you need to store 50 cubic miles of subsurface 4-dimensional seismic reflector data for 3D visualization and modeling density change over time - you run into the limits of 4 gigabytes very quickly. Never mind large-scale simulations run in TOUGH2-MP... Don't paint with such a large brush. People may piss memory on stuff that ran in less RAM back in 1996, but we're not there any more, and adventurous, relevant, and efficient uses of RAM really do exist.

    10. Re:Twice as big as it needs to be? by Anonymous Coward · · Score: 0

      Yes, because all those general home and office users buying commodity computers with 64-bit CPUs and OSs are using them to "store 50 cubic miles of subsurface 4-dimensional seismic reflector data for 3D visualization and modeling density change over time".

    11. Re:Twice as big as it needs to be? by Tarlus · · Score: 1

      Only if it's a fat binary, but thankfully these never needed to catch on with the x86 to x86-64 transition.

      --
      /* No Comment */
    12. Re:Twice as big as it needs to be? by haruchai · · Score: 1

      Most of those same home users might get by with 512 - 1GB RAM and a 1$10 AGP video card; but with millions having multigigabyte machines with vector processor GPUs, the potential for cheap, powerful distributed processing is enormous - if you can convince them to give up a few hours of CPU time occasionally.

      Otherwise, well, it's probably just a waste of electricity, although PCs have been pretty darned efficient in the last few years.

      --
      Pain is merely failure leaving the body
    13. Re:Twice as big as it needs to be? by jemmyw · · Score: 1

      but his ability to do so might be hampered if the hardware wasn't in general use.

    14. Re:Twice as big as it needs to be? by exomondo · · Score: 1

      Nah - your primitives are doubled in size, which realistically represents something closer to a 25-33% size increase on average.

      What makes you think your 'primitives' are doubled in size?

    15. Re:Twice as big as it needs to be? by Anonymous Coward · · Score: 0

      The fact that they are, maybe? ints become twice as big, so do pointers. The rest is usually padded to 32/64-bit anyway, so it's usually pretty irrelevant. Run the same installation of Debian, one on 32-bit and one on 64-bit, install a few network services, and see how much memory the 64-bit one takes over the other.

    16. Re:Twice as big as it needs to be? by Anonymous Coward · · Score: 0

      uh, it's about addressing.

    17. Re:Twice as big as it needs to be? by Anonymous Coward · · Score: 0

      My 'primitives' are tripled in size, given a proper stimulation. They all point and crave to the nearest available memory location at that moment. My cache then feels like it could tear any moment and my busses fill with activity.

    18. Re:Twice as big as it needs to be? by FreonTrip · · Score: 1

      Sure, and heaven forbid any of those "users buying commodity computers with 64-bit CPUs and OSes" could ever use their hardware to begin tinkering with high-resolution video editing, or programming, or playing with the full capabilities of the hardware with which Moore's Law has blessed them. The fact that I do real work that can use more than 4 GB RAM doesn't mean the average user of whom you clearly think very little is incapable of doing so.

    19. Re:Twice as big as it needs to be? by BitZtream · · Score: 1

      It's not cheap you jackass, you're just passing the bill off to someone else.

      On top of that, it's incredibly shitty for the environment.

      --
      Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager
    20. Re:Twice as big as it needs to be? by exomondo · · Score: 1

      The fact that they are, maybe? ints become twice as big

      No they don't, the size of an int is entirely compiler dependent.

    21. Re:Twice as big as it needs to be? by haruchai · · Score: 1

      By your blinkered "thinking", all research that doesn't produce instant results is wasted.

      --
      Pain is merely failure leaving the body
    22. Re:Twice as big as it needs to be? by Guy+Harris · · Score: 1

      The fact that they are, maybe? ints become twice as big,

      If you're talking about C-language ints, on very few 64-bit platforms are they 64-bit. Most UN*Xes are LP64, not ILP64, and Windows is LLP64 (they didn't even make long 64-bit, unlike most UN*Xes).
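The data-model names used here (LP64, ILP64, LLP64) simply encode which of int, long and pointer are 64 bits wide. A small sketch of that mapping, using Python's `ctypes` to read the sizes of the C ABI the interpreter was built for; the helper names are invented for illustration:

```python
import ctypes

# (sizeof int, sizeof long, sizeof pointer) in bytes -> conventional model name
DATA_MODELS = {
    (4, 4, 4): "ILP32",  # typical 32-bit platforms
    (4, 8, 8): "LP64",   # most 64-bit UN*Xes
    (4, 4, 8): "LLP64",  # 64-bit Windows: only long long and pointers are 64-bit
    (8, 8, 8): "ILP64",  # rare: int is 64-bit as well
}


def data_model(int_size, long_size, pointer_size):
    """Name the C data model for the given sizes, or 'unknown'."""
    return DATA_MODELS.get((int_size, long_size, pointer_size), "unknown")


def current_data_model():
    """Data model of the C ABI this Python interpreter was built against."""
    return data_model(
        ctypes.sizeof(ctypes.c_int),
        ctypes.sizeof(ctypes.c_long),
        ctypes.sizeof(ctypes.c_void_p),
    )


if __name__ == "__main__":
    print(current_data_model())
```

On a 64-bit Linux build this would be expected to report LP64, and a 64-bit Windows build LLP64, in line with the comment.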

      so do pointers.

    23. Re:Twice as big as it needs to be? by Anonymous Coward · · Score: 0

      I know, I was giving an example from the real world on that platform that people of Slashdot are usually interested in, Linux.

    24. Re:Twice as big as it needs to be? by Guy+Harris · · Score: 1

      Only if it's a fat binary, but thankfully these never needed to catch on with the x86 to x86-64 transition.

      ...although they did anyway, in OS X (even though the vast majority of Macs had x86-64 processors).

    25. Re:Twice as big as it needs to be? by Anonymous Coward · · Score: 0

      The fact that they are, maybe?

      the size of primitive types has absolutely nothing to do with the underlying platform architecture. regardless of whether you are on a 32 or 64 bit machine in java an integer is always 32 bits, in clang an integer is 32 bits, in visual c++ an integer is 32 bits and in gcc an integer is 32 bits. in c++ on a 16bit machine you will often find that an integer is 16 bits but that is only the minimum required to represent an integer according to the standard.

      The rest is usually padded to 32/64-bit anyway, so it's usually pretty irrelevant.

      who exactly does that? using 8 bytes when you only need 4 is just stupid.

    26. Re:Twice as big as it needs to be? by exomondo · · Score: 1

      I know

      If you know then why did you say ints become twice as big when they don't?

      I was giving an example from the real world on that platform that people of Slashdot are usually interested in, Linux.

      That example doesn't in any way illustrate that primitive types would be twice as big, pointers yes, but not primitive types.

    27. Re:Twice as big as it needs to be? by Anonymous Coward · · Score: 0

      Doesn't seem to be much. What is your actual configuration and memory differences?

    28. Re:Twice as big as it needs to be? by Anonymous Coward · · Score: 0

      Some primitives (such as pointers) are doubled in size.

    29. Re:Twice as big as it needs to be? by cusco · · Score: 1

      I suppose you consider running virtual machines to be a "rare computational situation". Try running a couple of VMs under your 32-bit OS, and you'll change your tune pretty quickly.

      --
      "Think about how stupid the average person is. Now, realise that half of them are dumber than that." - George Carlin
    30. Re:Twice as big as it needs to be? by SEE · · Score: 4, Informative

      it's an easy choice unless you absolutely need 16-bit support.

      The annoying thing being that an x86-64 processor in long mode can, in fact, run 16-bit protected mode code (like essentially all actual Windows 3.x programs) with the same compatibility sub-mode that runs 32-bit code. It's merely that Microsoft decided they didn't want to bother supporting it.

      That this can be done is easy enough to prove; take a Win16 app and run it in WINE on 64-bit Linux.

    31. Re:Twice as big as it needs to be? by dbIII · · Score: 1

      No, but they made it a hell of a lot cheaper for those of us that do. Now there's machines with 64 real cores and 128GB of memory for less than $10k. While GPUs are really nice for some stuff they can't handle much memory, so real CPUs still have a place.

    32. Re:Twice as big as it needs to be? by fnj · · Score: 2

      You could both, I don't know, ACTUALLY FIND OUT the answer and present it. The truth is somewhere in between. Some sizes are the same, and some are larger. The first column is 32 bit gcc in current Arch; the second is 64 bit gcc in RHEL 6.4; both with default options.

      type                 32-bit  64-bit
      sizeof (char)        1       1
      sizeof (short)       2       2
      sizeof (int)         4       4
      sizeof (long)        4       8
      sizeof (long long)   8       8
      sizeof (void *)      4       8
      sizeof (size_t)      4       8
      sizeof (float)       4       4
      sizeof (double)      8       8
      sizeof (long double) 12      16

      Given that there are quite a few long's, size_t's and pointers in typical C code, the 64 bit code is indeed substantially larger.
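Anyone who wants to reproduce one column of that table without a C compiler to hand can ask Python's `ctypes` module, which reports the sizes used by the C ABI the interpreter was built for. This is only a sketch: it measures the local platform, not the Arch and RHEL setups quoted above, and the helper name is made up:

```python
import ctypes

# C type name -> ctypes equivalent, in the same order as the table above.
C_TYPES = [
    ("char", ctypes.c_char),
    ("short", ctypes.c_short),
    ("int", ctypes.c_int),
    ("long", ctypes.c_long),
    ("long long", ctypes.c_longlong),
    ("void *", ctypes.c_void_p),
    ("size_t", ctypes.c_size_t),
    ("float", ctypes.c_float),
    ("double", ctypes.c_double),
    ("long double", ctypes.c_longdouble),
]


def c_type_sizes():
    """Return {C type name: size in bytes} for the running interpreter's ABI."""
    return {name: ctypes.sizeof(ctype) for name, ctype in C_TYPES}


if __name__ == "__main__":
    for name, size in c_type_sizes().items():
        print(f"sizeof ({name})".ljust(21), size)
```

Running it under a 32-bit and a 64-bit interpreter on the same machine gives a like-for-like comparison of the two columns.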

    33. Re:Twice as big as it needs to be? by Anonymous Coward · · Score: 0

      Memory expands far more than the program size. And certain things are easier if the default word size is 64 bits. With 32bit ints, there aren't enough to go round: if each human alive had to be given a unique number, and the rest eliminated, you'd want the evil empire to be using 64bit ints, not 32bit ones ;-).
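The "not enough to go round" point is easy to check. A minimal sketch, assuming a world population of roughly 7.1 billion for 2013 (an approximate figure supplied here for illustration, not taken from the comment):

```python
WORLD_POPULATION_2013 = 7_100_000_000  # rough estimate, an assumption


def distinct_values(bits):
    """Number of distinct values an unsigned integer of this width can hold."""
    return 2 ** bits


def enough_ids(bits, population):
    """True if every member of the population can get a unique number."""
    return distinct_values(bits) >= population


if __name__ == "__main__":
    for bits in (32, 64):
        print(bits, distinct_values(bits), enough_ids(bits, WORLD_POPULATION_2013))
```

A 32-bit counter tops out at 4,294,967,296 values, so it runs short; a 64-bit one does not.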

    34. Re:Twice as big as it needs to be? by exomondo · · Score: 1

      You could both, I don't know, ACTUALLY FIND OUT the answer and present it.

      I did, the answer is that it is dependent on the implementation, not the machine architecture. Or are you going to tell me that those are the size of those primitive types on 32bit and 64bit architecture? Because they aren't, they are just the values defined by the implementation you used.

      The truth is somewhere in between.

      No, the truth is exactly as I said, that - as your post demonstrates - primitives are not doubled in size on 64bit architecture, and the reason why is because the decision on the size of primitives is not governed by the underlying architecture.
      If you look at the C standard for example you will find those values are not defined by the standard, the C99 standard only defines a minimum precision, the actual size is up to the implementation - which again is nothing to do with the underlying architecture.
      Did ints become twice as big? No. Could they? Of course. Why? Because it is defined by the implementation, not the machine architecture, it's all right there in the specification.
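The "minimum precision" point can be made concrete. C99 guarantees only lower bounds on the integer widths (at least 8 bits for char, 16 for short and int, 32 for long, 64 for long long), and an implementation is free to exceed them. The sketch below compares those minimums with what the local C ABI actually uses, via `ctypes`; it assumes 8-bit bytes, and the names are illustrative:

```python
import ctypes

BITS_PER_BYTE = 8  # assumption: CHAR_BIT == 8 on this platform

# Minimum widths in bits guaranteed by C99 for the standard integer types.
C99_MIN_BITS = {
    "char": 8,
    "short": 16,
    "int": 16,
    "long": 32,
    "long long": 64,
}

CTYPES_FOR = {
    "char": ctypes.c_char,
    "short": ctypes.c_short,
    "int": ctypes.c_int,
    "long": ctypes.c_long,
    "long long": ctypes.c_longlong,
}


def widths_vs_minimums():
    """Return {type: (actual_bits, minimum_bits)} for the local C ABI."""
    return {
        name: (ctypes.sizeof(CTYPES_FOR[name]) * BITS_PER_BYTE, minimum)
        for name, minimum in C99_MIN_BITS.items()
    }


if __name__ == "__main__":
    for name, (actual, minimum) in widths_vs_minimums().items():
        print(f"{name:10} actual {actual:3} bits, standard minimum {minimum} bits")
```

On common platforms int comes out at 32 bits in both 32-bit and 64-bit builds, above the 16-bit minimum, which is the implementation's choice rather than something the hardware dictates.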

    35. Re:Twice as big as it needs to be? by Anonymous Coward · · Score: 0

      the question isn't what values a particular compiler for a particular language gives you on a particular platform. it's: do primitive types double in size on 64 bit? the answer is that the size of primitive types is unrelated to the system word size, outside of things that are directly impacted by it (like size_t in c); it's up to the spec (whether that is c, java, .net or whatever). dude, i can have 64 bit integers on 32 bit if i want!

    36. Re:Twice as big as it needs to be? by Anonymous Coward · · Score: 0

      Notice how I wrote 32/64?? Jeez.

    37. Re:Twice as big as it needs to be? by serviscope_minor · · Score: 1

      This!

      No, not this.

      It's just a thinly disguised "the youth of today" argument.

      Just stop and think for a minute about how easy it is to eat up 2G.

      It's one 40000x40000 image. In other words juuust a bit larger than the maximum practical X11 framebuffer size. Plenty of sources spit out images larger than that now.

      It's about 1000 1080p images, which is about 30 seconds of uncompressed video. I'm sure if you're editing streams together, it won't matter having a total of 30 seconds of video cached. Won't be annoying at all.
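
      A quick sketch of that arithmetic. The bytes-per-pixel value is not stated in the post, so it is a parameter here; 1 byte per pixel is the assumption that lands closest to the figures quoted, and the 30 fps frame rate is likewise assumed.

```python
GIB = 2 ** 30


def image_bytes(width, height, bytes_per_pixel):
    """Size in bytes of one uncompressed image."""
    return width * height * bytes_per_pixel


def frames_that_fit(budget_bytes, width, height, bytes_per_pixel):
    """How many whole uncompressed frames fit in a memory budget."""
    return budget_bytes // image_bytes(width, height, bytes_per_pixel)


if __name__ == "__main__":
    big = image_bytes(40_000, 40_000, 1)
    print(f"40000x40000 at 1 byte/pixel: {big / GIB:.2f} GiB")
    frames = frames_that_fit(2 * GIB, 1920, 1080, 1)
    print(f"1080p frames in 2 GiB at 1 byte/pixel: {frames}")
    print(f"seconds of video at 30 fps: {frames / 30:.1f}")
```

      At 1 byte per pixel the big image is about 1.5 GiB and roughly 1035 frames of 1080p fit in 2 GiB. At 3 bytes per pixel the frame count drops to about 345, so the real picture is if anything worse than the post suggests.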

      Now go and apply those to games. If you want to avoid slow loading from disk, you need to cache all those assets.

      And as for "all programmers now are crap" it's just rot. The Mels of this world have only ever been 0.01%. Back when C was in vogue, compilers were bad and produced slow code, few people knew how to optimize by hand and memory leaks were rampant.

      You know what? I like having lots of memory and a fast CPU. It means several things. When I just need a one-off thing done, I can do it in an appropriate language. When I want speed, I can write C++ and it will go blindingly fast. And it's simpler too since I don't have to muck around with complex caching or overlay schemes or any of that crap.

      Or do you believe that 640k really is enough for anyone?

      --
      SJW n. One who posts facts.
    38. Re:Twice as big as it needs to be? by fnj · · Score: 1

      +0, pedantic. Everybody knows all that. Still, the FACT is that with the C compiler used for most open source compiling, 64 bit code is bigger in size, because some of the variables are bigger and none of them are smaller. Are or are not long and pointer both twice as big in 64 bit? Never mind "well it doesn't have to be" and "it's just a choice" and "has nothing to do with number of bits in the CPU".

    39. Re:Twice as big as it needs to be? by TheRaven64 · · Score: 1

      Windows is LLP64 (they didn't even make long 64-bit, unlike most UN*Xes)

      And this caused a lot of pain because a huge number of programmers believed (some still do) that the C standard guaranteed that sizeof(long) >= sizeof(void*) and so used long instead of intptr_t. They did this because a lot of their headers used packed structures for things like file headers and used a type that was typedef'd to long, and it was easier for them than fixing all of their headers.
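
      To make the naming concrete, here is a minimal sketch that classifies a platform by its int/long/pointer sizes and checks whether a long is wide enough to hold a pointer. The function names are made up for illustration; the model names (ILP32, LP64, LLP64, ILP64) are the conventional ones.

```python
import ctypes

# (sizeof int, sizeof long, sizeof pointer) in bytes -> conventional model name.
DATA_MODELS = {
    (4, 4, 4): "ILP32",
    (4, 4, 8): "LLP64",  # 64-bit Windows
    (4, 8, 8): "LP64",   # most 64-bit UN*Xes
    (8, 8, 8): "ILP64",
}


def data_model(int_bytes, long_bytes, pointer_bytes):
    """Name the data model for the given sizes, or 'other' if unrecognised."""
    return DATA_MODELS.get((int_bytes, long_bytes, pointer_bytes), "other")


def long_can_hold_pointer(long_bytes, pointer_bytes):
    """The assumption that breaks on LLP64: sizeof(long) >= sizeof(void *)."""
    return long_bytes >= pointer_bytes


def native_data_model():
    """Data model of the platform this Python build was compiled for."""
    return data_model(
        ctypes.sizeof(ctypes.c_int),
        ctypes.sizeof(ctypes.c_long),
        ctypes.sizeof(ctypes.c_void_p),
    )


if __name__ == "__main__":
    print(native_data_model())
    print(long_can_hold_pointer(4, 8))  # the LLP64 case
```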

      --
      I am TheRaven on Soylent News
    40. Re:Twice as big as it needs to be? by Dr_Barnowl · · Score: 1

      who exactly does that? using 8 bytes when you only need 4 is just stupid.

      It's more complicated than that.

      CPUs move memory around in register-sized chunks ("words"). Therefore a CPU operating in 64-bit mode moves memory around in 64-bit sized words.

      You can gain some ground by packing smaller variables together, but there will be some slack for things that don't fit into the chunk size. And it's more efficient to access memory aligned to word boundaries.

      You may as well say "why use 8 bits when you only need one" - most databases store boolean values as a whole byte, because it's a total pain in the arse to write a single bit then offset the rest of the row by one bit to save 7 bits of space. If you have multiple boolean fields (up to 8 per byte), they get packed together, because it's much cheaper to shift a single byte to the left than it is to shift the rest of the row.

      So the answer is, everyone does that, because their compiler takes care of it for them.
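
      Both effects can be seen in a small sketch using Python's standard struct module: alignment padding when a 1-byte field sits next to an 8-byte one, and several booleans packed into a single byte. The helper names are hypothetical, and the exact padded size depends on the platform's alignment rules.

```python
import struct

# "b" is a 1-byte signed char, "q" is an 8-byte long long.
NATIVE_ALIGNED = "@bq"  # native sizes with the platform's alignment padding
PACKED = "=bq"          # standard sizes, no padding between fields


def layout_sizes():
    """Return (aligned size, packed size) in bytes for a char + long long pair."""
    return struct.calcsize(NATIVE_ALIGNED), struct.calcsize(PACKED)


def pack_flags(flags):
    """Pack up to 8 booleans into one byte, first flag in bit 0."""
    if len(flags) > 8:
        raise ValueError("at most 8 flags fit in one byte")
    byte = 0
    for position, flag in enumerate(flags):
        if flag:
            byte |= 1 << position
    return byte


def unpack_flags(byte, count):
    """Recover the first `count` booleans from a packed byte."""
    return [bool((byte >> position) & 1) for position in range(count)]


if __name__ == "__main__":
    aligned, packed = layout_sizes()
    print(f"aligned: {aligned} bytes, packed: {packed} bytes")
    print(pack_flags([True, False, True]))
```

      The packed layout is always 9 bytes; the aligned one is larger because the 8-byte field is pushed out to an aligned offset, which is the slack described above.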

    41. Re:Twice as big as it needs to be? by exomondo · · Score: 1

      Everybody knows all that.

      Obviously not, I wouldn't have even been having the discussion otherwise; the reply was directed squarely at the fact that primitives - specifically integers - do not double in size on 64bit architecture. Go back and read the thread, had you done that in the first place you would know that.

      Still, the FACT is that with the C compiler used for most open source compiling, 64 bit code is bigger in size

      I never disputed that, again read the thread before you reply so you know the context of the discussion before you interject with an irrelevant comment.

      Are or are not long and pointer both twice as big in 64 bit?

      They are in that instance, which is fine and was never in dispute, why do you think that was in dispute?

      Never mind "well it doesn't have to be" and "it's just a choice" and "has nothing to do with number of bits in the CPU".

      That is the topic of the discussion and had you bothered to read you would know that, read here.

    42. Re:Twice as big as it needs to be? by arth1 · · Score: 1

      Also, address offsets and immediates are larger when 64-bit.

      Mitigating this somewhat are new 64-bit instructions which can do in a single operation what you might need several operations to do on a 32-bit system. Adding two 64 bit integers, for example, is a single add instead of one add and one add with carry. With multiplication or division of 64-bit values, you're talking big savings.
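
      A sketch of what "one add and one add with carry" means, emulated in Python with explicit 32-bit halves. This only models the arithmetic; it says nothing about real instruction timings, and the function names are made up.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def add64_with_32bit_ops(a, b):
    """Add two unsigned 64-bit values using only 32-bit-wide additions."""
    a_lo, a_hi = a & MASK32, (a >> 32) & MASK32
    b_lo, b_hi = b & MASK32, (b >> 32) & MASK32

    # First operation: add the low halves and note whether it overflowed.
    lo_sum = a_lo + b_lo
    carry = lo_sum >> 32
    lo = lo_sum & MASK32

    # Second operation: add the high halves plus the carry from the low add.
    hi = (a_hi + b_hi + carry) & MASK32
    return (hi << 32) | lo


def add64_native(a, b):
    """What a single 64-bit add does: one operation, wrapping at 2**64."""
    return (a + b) & MASK64


if __name__ == "__main__":
    print(hex(add64_with_32bit_ops(0xFFFFFFFF, 1)))
```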

    43. Re:Twice as big as it needs to be? by NJRoadfan · · Score: 1

      Did WINE ever support Real Mode Win16 applications? There were so few Windows 1.x/2.x apps out there that I highly doubt it. Some apps written for Windows 2.1 and up seem to be dual mode, supporting both Real and Standard (286) Mode of Windows.

    44. Re:Twice as big as it needs to be? by FreonTrip · · Score: 1

      I didn't know that! Some time soon I'm gonna have to get some very old games up and running in Linux... Thank you!

    45. Re:Twice as big as it needs to be? by Blaskowicz · · Score: 1

      With BOINC and "@Home" projects, who knows : home users will eventually get there.

    46. Re:Twice as big as it needs to be? by Anonymous Coward · · Score: 0

      The topic of the discussion isn't irrelevant pedantic bullshit, it's why 64-bit programs usually take more memory, and the reason is that many of the primitives are twice as big. Now stop changing the topic just to please your masturbatory needs and acting like we give a fuck.

    47. Re:Twice as big as it needs to be? by Shirley+Marquez · · Score: 1

      64-bit x86-64 code only suffers a slight size penalty in real-world applications; I've seen it range anywhere from zero to 10%. The longer pointers cost you but there aren't that many of them in typical code, and you get some code size back by gaining access to more registers.

      This led to an interesting phenomenon in the early days of x86-64: programs recompiled for 64-bit architectures typically had a 20% speed advantage on Athlon 64 systems but no advantage at all or a slight slowdown on Pentium 4 systems. The AMD systems were execution-unit bound, and doing fewer but larger instructions was a win. The P4 was instruction-fetch bound (the design's memory bandwidth for instruction fetch was lacking) and so the fact that the programs were bigger hurt the P4's performance. AMD was also helped by the fact that the code optimization in compilers at the time was tuned for AMD processors as they had gotten to market earlier.

    48. Re:Twice as big as it needs to be? by Shirley+Marquez · · Score: 1

      Sure, I'll try that. Under my 32-bit Linux OS. Works just fine, since 32-bit Linux has access to more than 4GB RAM, it just can't give all of it to one process. It wouldn't be a notable success on Windows, where the 32-bit version has 4GB of address space, period.

      That said, I'd still choose to run the 64-bit version of Linux in that scenario.

    49. Re:Twice as big as it needs to be? by Anonymous Coward · · Score: 0

      Do you mean 16-bit real mode? There was also "unreal" mode that allowed running 16-bit code under memory protection, but that was an undocumented and unsupported glitch.

    50. Re:Twice as big as it needs to be? by oreiasecaman · · Score: 1
      --
      This is a UDP joke, I don't care if you get it or not...
    51. Re:Twice as big as it needs to be? by Kyusaku+Natsume · · Score: 1

      I have yet to see a PC that doesn't feel slow rendering a full HD movie shot.

      --
      Mexico: 100% conservative's America now!
    52. Re:Twice as big as it needs to be? by Anonymous Coward · · Score: 0

      If you know then why did you say ints become twice as big when they don't?

      Because I figured this discussion would interest people who know that there are many integers types in C/C++ (including long int), not just pedantic twats.

    53. Re:Twice as big as it needs to be? by Anonymous Coward · · Score: 0

      CPUs move memory around in register-sized chunks ("words"). Therefore a CPU operating in 64-bit mode moves memory around in 64-bit sized words.

      That is an incredibly naive statement. 32-bit CPUs began using 64-bit busses about 20 years ago, and it wasn't just for giggles. Such CPUs usually moved memory around in cache-line sized chunks, which were much larger than register-sized chunks. Larger than the bus width, even. (Typical mid-90s CPUs used a cache line size of 32 bytes, so they needed bursts of four words on a 64-bit bus to move one line.)

      Dial the clock back another 15 years or so. The 68000 was hot shit. It featured 32-bit registers and instructions with an internal hardware implementation that was actually 16 bits wide, with a 16-bit external data bus, and no caches. So the 68000 actually moved data around in half-register-sized chunks.

      Register size tells you very little about external or even internal datapath widths. It suggests what they might be but it isn't an absolute guide by any means.
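
      The two examples above reduce to one division, sketched here with a hypothetical helper: payload width over bus width, rounded up.

```python
def transfers_needed(payload_bits, bus_bits):
    """Bus transfers needed to move a payload of this width, rounding up."""
    return -(-payload_bits // bus_bits)


if __name__ == "__main__":
    # Mid-90s CPU: 32-byte cache line over a 64-bit bus.
    print(transfers_needed(32 * 8, 64))
    # 68000: 32-bit register over a 16-bit external data bus.
    print(transfers_needed(32, 16))
```

      The first call gives the burst of four mentioned above and the second gives the two half-register transfers of the 68000.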

    54. Re:Twice as big as it needs to be? by Anonymous Coward · · Score: 0

      I was running bind9, ntp, ssh, rsyslog, exim4; after reloading it all, the 64-bit version took significantly more memory than the other on a 512MB VPS. Obviously the effect is less significant the more total memory you have.

    55. Re:Twice as big as it needs to be? by SEE · · Score: 1

      No. The x86-64 does not support 16-bit real mode code when in long mode, whether directly or through a virtual 8086. Nor does it support unreal mode.

      However, the x86 instruction set didn't directly jump from the 8086 to the 80386; the 16-bit 286 supported a 16-bit protected mode. And in long mode, an x86-64 processor can execute such code.

  4. Re:Did it really work? by cheater512 · · Score: 1

    Erm that 'smart memory management' (PAE) has a nice big performance hit. Somewhat bigger than a 3% slowdown.

    Also 64 bit can handle bigger numbers (over 4.3 billion) an awful lot faster than 32bit can. It doesn't help with small numbers but for the bigger ones 32bit processes them rather inefficiently.

  5. Re:Did it really work? by Grashnak · · Score: 5, Insightful

    My 32 GB of RAM, absolutely essential for my work, laughs at your "memory management" bullshit.

    --
    Life needs more saving throws.
  6. Re:Did it really work? by sribe · · Score: 3, Funny

    If it's such a success, why does 64-bit software generally only run marginally faster than its 32-bit build? 64-bit binaries are larger and might run 103% at the speed of 32-bit if you're lucky.

    Sure, it helps with the 4GB memory space limit, but so can smart memory management and other approaches.

    I could see it being useful for super-computing things, but in general, there still just doesn't seem to be a point.

    Wow, just wow. Do you actually work in the software field???

  7. 64 bit x86 worked out, but not for AMD by iggymanz · · Score: 2

    AMD may have helped create the x86-64 market, but now it's getting killed by it. soon Intel will be the only major player. ARM market is AMD's only hope.

    1. Re:64 bit x86 worked out, but not for AMD by sayfawa · · Score: 4, Informative

      The next console generation disagrees. Sony and MS are both using AMD.

      --
      Free the Quark 3 from asymptotic confinement! Bring your charm! Don't get down! All colours and flavours welcome!
    2. Re:64 bit x86 worked out, but not for AMD by Anonymous Coward · · Score: 1

      I am not sure why you would believe this, as AMD's APUs are far above what Intel can do in the mobile space, as well as being in all 3 new consoles coming out this generation. The HSA alliance also stands to benefit AMD much more than Intel, as Intel doesn't have an immediate interest in it.

    3. Re:64 bit x86 worked out, but not for AMD by KiloByte · · Score: 1

      AMD smokes Intel in performance/price for most stuff that can be parallelized. It's only single thread performance where Intel wins.

      --
      The creatures outside looked from Alt-Right to Antifa; but already it was impossible to say which was which.
    4. Re:64 bit x86 worked out, but not for AMD by edxwelch · · Score: 1

      Yeah, just abandon your bread and butter x86 business because ARM is the new darling with "analysts". That's not gone too well for Nvidia. So far their Tegra business has lost hundreds of millions each year since it began 5 years ago.

    5. Re:64 bit x86 worked out, but not for AMD by Kjella · · Score: 1

      AMD smokes Intel in performance/price for most stuff that can be parallelized. It's only single thread performance where Intel wins.

      On CPU prices alone, yes... but they're also struggling on performance/watt which translates into performance/$ both in power supply and cooling, which is a fair bit of the cost if you're running big, massively parallel jobs that engage all the cores over long periods of time. Anandtech simply summarized it like this:

      Power consumption is also a big negative for Vishera. The CPU draws considerably more power under load compared to Ivy Bridge, or even Sandy Bridge for that matter.

      Every dollar AMD loses on the power bill is of course another dollar Intel can charge extra for a more efficient processor.

      --
      Live today, because you never know what tomorrow brings
    6. Re:64 bit x86 worked out, but not for AMD by Blaskowicz · · Score: 1

      There's motherboard cost too: you can run an i7 on the cheapest motherboard.
      But with an FX 8350 or lower, using the cheap 760G boards leads you to trouble because the VRM circuitry can't handle above 95 watts. Many buyers unknowingly make that mistake and so end up with a great FX CPU underclocked at 800MHz, or it works fast but throttles down and stutters when you do something demanding enough with it.

    7. Re:64 bit x86 worked out, but not for AMD by WD · · Score: 1

      My 16-cores-per-processor servers question your statement. I don't think any other vendor beats AMD on the core density aspect.

    8. Re:64 bit x86 worked out, but not for AMD by tlhIngan · · Score: 4, Insightful

      AMD may have helped create the x86-64 market, but now it's getting killed by it. soon Intel will be the only major player. ARM market is AMD's only hope.

      Intel won't let AMD die. In fact, AMD is right where Intel wants them to be - big enough to ward off government regulators, small enough to not be a huge pain in the rear. Intel and other large companies are scared of government regulation and monopoly declaration, and we do know that Intel has committed enough sins that if the regulators look hard enough, they can make a case to break up Intel. Including separating the ASIC design and foundry parts (and we know Intel has a LOT of foundry capacity). And I'm sure Intel's shareholders would rather give up some revenue to ward off the much bigger hit that would happen when the government regulators step in.

      It's entirely possible that Intel has a bunch of "AMD rescue" plans - ranging from simple "let's just buy up all of AMD's CPUs and bury them" to more elaborate schemes. Of course, Intel cannot directly fund AMD. Perhaps Intel could give AMD some patents in an emergency.

      Heck, you could argue that Intel told Sony and Microsoft to buy AMD chips - it gives AMD a nice steady income for the next few years. Intel could've used their extensive fab capacity to make custom chips for the consoles (much more easily than AMD can), but you can bet an opportunity like this to help prevent AMD from keeling over was just perfect.

      And no, this isn't unusual in the business world. What you see as competitors can have all sorts of incestuous relationships amongst themselves - it's not unknown to have competitors to buy parts from each other. And you can bet Apple, Google, Microsoft, Samsung and others are far more chummy to each other than patent lawsuits or settlements will imply. There's enough back room deals and arrangements that really hide the interdependence on each other they all have.

    9. Re:64 bit x86 worked out, but not for AMD by gl4ss · · Score: 1

      AMD may have helped create the x86-64 market, but now it's getting killed by it. soon Intel will be the only major player. ARM market is AMD's only hope.

      not this shit again.

      amd doesn't own any plants, so how would licensing an arm design and having it contract manufactured save them ?? how the hell?? what would be the amd business and research in that situation?? who the fuck would buy them??

      --
      world was created 5 seconds before this post as it is.
    10. Re:64 bit x86 worked out, but not for AMD by FithisUX · · Score: 1

      It still has very good products. But in my own opinion they should have switched to MIPS64. In any case better incorporation of latest x64 ISA updates by Intel, investment on Coreboot and by default incorporation of IOMMU (ukernel, virtualization friendly) into their products could revive them. Their APU dream is forward looking and they still have a lot to offer. However I still believe a switch to MIPS64 could have given them better chances to differentiate. The Win32 monopoly is the problem but there are other OSes that could make a difference (see Linux, Haiku, BSDs and why not OSX).

    11. Re:64 bit x86 worked out, but not for AMD by aztracker1 · · Score: 1

      Not sure about that.. I always went for 780+ though... I will say that the lower end AMD boards are significantly lower in price than anything that can run an i7.. I recently went from a first gen i7 to an 8350, and am pretty happy.. I need parallel ability more than single-core performance, and it's far better than the i3/i5 options in the same price range for my needs. It's not *that* much faster than my i7 was, and if it weren't for stability issues, I wouldn't have upgraded.

      In the value segment, and in multi-threaded workstations AMD is pretty competitive. I've gone both ways though.. I just wish AMD had more to offer.. I think the pricing currently is not progressing well at all because of the lack of competition on the high end.

      --
      Michael J. Ryan - tracker1.info
    12. Re:64 bit x86 worked out, but not for AMD by Anonymous Coward · · Score: 0

      xbox 360 has sold 70 million units to date - 6 year history. The current pc market, even in its current low sales rate thanks in part to phones and tablets, moves that many units in a quarter. While getting the xbox and ps4 contracts is a win for amd, it's not enough to save them. Especially when you consider the fact that most money in processors is made either with high end offerings (that amd is no longer a major player in) or in volume, where intel is cleaning up. Not looking good for amd, unless they can get a win elsewhere.

    13. Re:64 bit x86 worked out, but not for AMD by Shirley+Marquez · · Score: 1

      100% of the console business compares favorably in volume to the 10% of the PC business that AMD has now. It's a considerable increase in volume for them, whereas for Intel it would be a drop in the bucket. And AMD's APUs were made to order for the console business, giving the console makers a single chip solution. Intel's integrated graphics are inadequate for game consoles; they would have had to partner with NVidia on a two-chip setup.

      The console contracts aren't going to help AMD regain a place in the heart of computing enthusiasts. Nothing the company does may ever do that. But it will help them get up their production volumes for parts intended for mainstream PCs and laptops, and meanwhile make a bit of money; that may be enough to keep AMD alive as an Avis to Intel's Hertz.

  8. Re:Yeah, I guess it's 10 years by LocalH · · Score: 1, Insightful

    Those were x86-based? The title was "64-bit x86 Computing Reaches 10th Anniversary", not "64-bit Computing Reaches 10th Anniversary".

    --
    FC Closer
  9. How soon till we get 128-bit? by Anonymous Coward · · Score: 0

    We need 200 versions of Windows.

    1. Re:How soon till we get 128-bit? by lightknight · · Score: 1

      Hmm. Depends. The global economy is a bit too unstable to make much progress for now, and people are still getting used to the 64-bit changeover.

      We have multiple cores, but the software kits haven't evolved enough yet to take full advantage of them, or so I'm told.

      Personally, I think the next big leap should be optical processing.

      --
      I am John Hurt.
    2. Re:How soon till we get 128-bit? by petermgreen · · Score: 3, Informative

      A long time.

      We don't even have true 64-bit x86-64 processors yet. While programmers are told to* treat pointers as 64-bit, in the current implementation (referred to as a "48-bit implementation") there are only 47 usable bits for user-mode pointers**. That is enough to map 128 terabytes to one process; afaict the most RAM you can currently get in a PC-architecture machine is 2 terabytes.

      If we assume the largest available memory size doubles every 1.5 years and we want to be able to map all the memory to one process then we have 9 years until the current implementation is used up and another 24 years after that before a "full 64-bit" (with one bit used to distinguish between kernel and user mode) implementation is used up.

      * Of course just because programmers are told to do something doesn't mean they will http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=642750
      ** A 48th bit is used to differentiate kernel and user addresses. The number is then sign-extended to produce a 64-bit number.
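
      The numbers in this post can be checked with a short sketch. It treats "terabytes" as binary TiB and takes the 1.5-year doubling period straight from the post; the function names are hypothetical. The last helper shows the sign-extension rule from the second footnote: the bits above the implemented width must all copy the top implemented bit.

```python
import math

TIB = 2 ** 40


def user_space_tib(user_bits):
    """Size of a user-mode address space with this many usable bits, in TiB."""
    return 2 ** user_bits // TIB


def years_until_exhausted(current_bytes, user_bits, years_per_doubling=1.5):
    """Years until memory that doubles periodically fills the address space."""
    doublings = math.log2(2 ** user_bits / current_bytes)
    return doublings * years_per_doubling


def is_canonical(address, implemented_bits=48):
    """True if the upper bits of a 64-bit address are a sign extension."""
    upper = address >> (implemented_bits - 1)
    all_ones = (1 << (64 - implemented_bits + 1)) - 1
    return upper == 0 or upper == all_ones


if __name__ == "__main__":
    print(user_space_tib(47))
    print(years_until_exhausted(2 * TIB, 47))
    print(years_until_exhausted(2 ** 47, 63))
    print(is_canonical(0x0000800000000000))
```

      From 2 TiB to 2^47 bytes is 6 doublings (9 years), and from 2^47 to 2^63 is another 16 doublings (24 years), which matches the estimate above.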

      --
      note: i'm known as plugwash most places but i screwd up registering that here somehow in the past and now can't register
    3. Re:How soon till we get 128-bit? by Shirley+Marquez · · Score: 1

      Nothing new here really. Some early 32-bit processors couldn't actually address 4GB of memory, and it was a long time before anybody produced a motherboard for a 32-bit processor that could hold 4GB. (It never happened for the 386 or 486 although those CPUs had 32 address lines.) No Alpha CPU ever had 64 real address lines, and I doubt that any SPARC or Power CPU has to date.

  10. "worked out" by girlintraining · · Score: 1, Insightful

    But it worked out in the end.

    Yes, mostly due to the fact that we needed a way to get past the 4GB memory limitation, and not because we gave a damn about whether the processor was native x64 or not. AMD has had some great ideas, but they've almost always shorted themselves on the implementation, leaving the field wide open for Intel to come in with a better offering and take the lion's share of the profit.

    --
    #fuckbeta #iamslashdot #dicemustdie
    1. Re:"worked out" by aliquis · · Score: 1

      I would say it "works out" at about three times the speed of IPv6. Or whatever.

      Still Steam on Linux is a 32 bit application. My bank encryption plug-in is 32 bit.

      Why I have to be stuck with such issues now is beyond me. The latter plug-in sucks donkey balls regardless, and there's an open-source implementation which does about the same work and builds cleanly on 64 bit machines too, so that's that. For whatever reason it doesn't work with my bank though (but it does with most others).

    2. Re:"worked out" by Kjella · · Score: 1

      AMD has had some great ideas, but they've almost always shorted themselves on the implementation, leaving the field wide open for Intel to come in with a better offering and take the lion's share of the profit.

      Well AMD can't magically just "be big" and even when they were kicking Intel's ass fab capacity means you can't take over this market overnight. Intel could afford to gamble on things like Pentium IV and Itanium, while still working on entirely different lines like Pentium III-M that was the basis for the Core processors and Atom which has denied AMD much revenue on the low end. That is the sort of thing AMD never could afford to do, they had to design a jack-of-all-trades and hope that through Intel's ineptness it turned out a king-of-all-trades. And all other things being equal, Intel poured money into process development, which AMD couldn't keep up with even with massive external investment through the GloFo spin-off.

      In short, while it's easy to say AMD should have done better it's hard to say how exactly, they were always dodging punches but sooner or later Intel would score a KO punch like the Intel Core processors, AMD has been dazed and confused ever since. They've been running around the ring bouncing off the ropes but you could tell for each round they've taken more and more of a beating. And right now the market they've been betting on is being eaten up by tablets, look at the revenue figures for AMD. One thing is that they're not profiting, but they're not selling. Revenue way down, inventories going up again despite how much Slashdot praises their APUs.

      --
      Live today, because you never know what tomorrow brings
    3. Re:"worked out" by Dawn+Keyhotie · · Score: 5, Insightful

      WRONG on many levels. Yes, we had to get past the 4GB memory limitation, but there had been, and still were at the time, several other true 64-bit microprocessors around when AMD introduced the Opteron: Alpha, UltraSPARC, MIPS, PowerPC, and yes even IA-64. (not to mention IBM POWER and zSeries.) But they all had the fatal flaw of NOT being compatible with the Intel 32-bit x86 processors and off-the-shelf Windows software. Only Opteron had that, and that compatibility was so critical that Intel was grudgingly forced to adopt the x86-64 instruction set.

      So, you may say, why didn't AMD take the IT world by storm? Because: 1) AMD was not Intel, and never could/would be; 2) Intel was paying manufacturers NOT to offer ANY AMD based systems with marketing kickback agreements; 3) Intel would punish any manufacturer who did offer AMD systems with exorbitant price hikes on the Intel parts they did sell; 4) All this was taking place during the Bush years of federal laissez-faire non-enforcement policy, giving Intel free rein on those practices; 5) Prejudice against AMD in the IT industry was widespread, and still is; 6) few people saw or acknowledged the need for a flat 64-bit address space; 7) those that did have the need for 64-bit software were forced to spend exorbitant amounts of money for RISC workstations, which motivated them to look down their nose at commodity PCs, even if they were 64-bit; 8) Chicken-and-Egg syndrome (no volume 64-bit hardware, thus no volume 64-bit software, thus no need for volume 64-bit hardware).

      So AMD did not "short themselves on implementation". Their architecture was state of the art, and kicked both 32-bit Pentium and non-compatible IA-64 in the nuts. They had all of today's advanced hardware features years before Intel: x86-64 architecture; Hyper-transport to replace the front-side bus bottleneck and enable point-to-point CPU links; and on-board memory controllers. AMD was not able to block Intel from poaching their features because of the pre-existing patent cross-licensing agreements. And anti-monopoly enforcement was practically non-existent at the time (and not much better today).

      Of course, none of this is meant to imply that AMD was not partially or even mostly responsible for their troubles. They were (and still are) horrible at executing their own roadmaps. They were (and still are) horrible at marketing to consumers. They were (and still are) horrible at manufacturer relations. They were (and still are) unable to make a sane strategic decision if their life depended on it. They were (and still are) perceived as the el-cheapo Intel-knockoff copycat instead of pioneering leaders in their field.

      So yeah, AMD is a hot mess, but there is plenty of blame to go around.

      --
      "The only good windmill is a tilted windmill."
    4. Re:"worked out" by Anonymous Coward · · Score: 0

      There are plenty of good reasons *not* to use 64-bit code - smaller code size, small data requirements (< 2GB), and ease of maintenance. Blame the OS for not making 32-bit runtime a seamless mode.

    5. Re:"worked out" by Randle_Revar · · Score: 1

      Use Debian testing (about to become Stable), and use multiarch. Never worry about 32bit vs 64bit again

    6. Re:"worked out" by deburg · · Score: 1

      So yeah, AMD is a hot mess

      Exactly, back in 2006-2009 when manufacturers started coming out with low cost AMD Desktops and laptops, most fried or had to get their coolers replaced (sadly most people could only afford to do this for desktops) after a year. Heck, it was a lot worse in my country where the ambient temp is 35 degrees Celsius and 100 per cent humidity.

    7. Re:"worked out" by Anonymous Coward · · Score: 0

      4GB memory limitation? How about PAE?

    8. Re:"worked out" by Nutria · · Score: 1

      How about PAE?

      There's still a 2GB limit on the size of a single process.

      Not much of a problem if your server runs multiple 1.5GB programs, but it prevents one from opening huge video and image projects, and I've seen Firefox die from OOM when lots of tabs have loaded Web 2.0 sites chock full of JavaScript.
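
      The address-space arithmetic behind this exchange, as a rough sketch. It assumes PAE's usual 36-bit physical addresses and a kernel that reserves 2 GiB of each 32-bit virtual address space; the reserved amount varies by OS and configuration, so it is a parameter, and the function names are made up.

```python
GIB = 2 ** 30


def addressable_gib(address_bits):
    """Memory reachable with this many address bits, in GiB."""
    return 2 ** address_bits // GIB


def user_space_gib(virtual_bits=32, kernel_reserved_gib=2):
    """Virtual address space left for one process after the kernel's share.

    kernel_reserved_gib=2 is an assumption matching the 2GB figure quoted
    above; other splits (for example 3 GiB user / 1 GiB kernel) exist.
    """
    return addressable_gib(virtual_bits) - kernel_reserved_gib


if __name__ == "__main__":
    print(addressable_gib(32))  # plain 32-bit physical addressing
    print(addressable_gib(36))  # PAE physical addressing
    print(user_space_gib())     # what a single process can still see
```

      PAE widens what the machine can hold, not what one process can map, which is why the per-process ceiling stays put.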

      --
      "I don't know, therefore Aliens" Wafflebox1
    9. Re:"worked out" by Anonymous Coward · · Score: 0

      So, you may say, why didn't AMD take the IT world by storm?

      Er, for a while they kinda did? There was a long period where Opteron was so superior to anything Intel was selling that AMD's gains in IT were quite rapid. They went from next to nothing to owning somewhere around 20%-30% of the x86 server market. (Now they're back down below 5% last I saw.)

      5) Prejudice against AMD in the IT industry was widespread, and still is;

      If you really remember your history, you'll know why. AMD first took a crack at the x86 server market with 32-bit "K7" Athlons, not 64-bit Opterons. K7s were pretty good CPUs for their time, but AMD shot itself in the foot with horrible server/workstation K7 chipsets. At the start of the Opteron era, AMD had to overcome that bad first impression.

      6) few people saw or acknowledged the need for a flat 64-bit address space; 7) those that did have the need for 64-bit software were forced to spend exorbitant amounts of money for RISC workstations, which motivated them to look down their nose at commodity PCs, even if they were 64-bit; 8) Chicken-and-Egg syndrome (no volume 64-bit hardware, thus no volume 64-bit software, thus no need for volume 64-bit hardware).

      What color is your sky?! The combination of Opterons and 64-bit Linux triggered an enormous migration of scientific and engineering applications away from RISC workstations. I work in one of the many industries which saw this conversion happen. When I started at this company several years ago, the standard engineering workstation was a dual Opteron 285, just beginning to be phased out for Xeon workstations. I wasn't here for it, but I'm told the Opterons completely displaced Sun SPARC hardware.

      Also, AMD completely sidestepped C-and-E syndrome by making Opteron/Athlon 64 such a good 32-bit x86 CPU that it was well worth buying even if you didn't have 64-bit software yet.

  11. Re:Did it really work? by iggymanz · · Score: 1

    do you? for average PC applications (browsing the web, e-mail, office documents) 64 bit gives no advantage. for the above-average applications (multimedia creation/editing, CADD, running multiple VMs) it's very helpful.

  12. Re:Did it really work? by Anonymous Coward · · Score: 0

    Bitch, the cluster I work with has over 700G RAM in each node.

  13. Lithium by Anonymous Coward · · Score: 0

    Dude you really NEED a new Lithium prescription! Really!

    And counseling!

    And ..Oh for Christs Sakes! You NEED to be institutionalized!

    They got TV, Video games - ANY one you want! Really!

    And ...

    Goddamn it Sam!

    Guys ...

    The parent is with me at Bellevue.

    His name is Daniel. He's a Time Lord - that's why you can't get to him and why he's always first post.

    Yes, he took me back to 2021 and we Trolled President Hillary Clinton and vp Jeb Bush. We're on the run from the Secret Service for that. i wish he wouldn't do shit like this!!

    No ...really BUUY GOLD and MORE Facebook!!!

    FUCK - they're shooting at us.... go tot go!

    *Hurrrrr ...huurrrr ....hurrrr....hurrr wahhhhhhh .... ping*

  14. Only a matter of time... by Anonymous Coward · · Score: 0

    I can't wait for 128 bit !!!

    1. Re:Only a matter of time... by unixisc · · Score: 1

      Transmeta's (remember them?) Crusoe processor was internally a 128-bit VLIW CPU. Their Efficeon was internally a 256-bit CPU. So if one could salvage those and access the native instruction set, one has something to work with. Not to mention it being a low power consumption CPU.

  15. Re:Did it really work? by vistapwns · · Score: 3, Interesting

    Heard x64 was barely faster than 32-bit, wrote this program to find duplicate files on Windows: http://poshcode.org/3377 - it's at least twice as fast in x64 than 32-bit. Naturally it won't apply to everything, but for certain things x64 is really good.

    --
    "...I think the Microsoft hatred is a disease." - Linus Torvalds
  16. Re:Did it really work? by Anonymous Coward · · Score: 1, Informative

    Depends on how it's coded, for example: 64 bit MAME runs around 30% faster than the 32 bit version: http://www.mameui.info/Bench.htm

  17. Re:Did it really work? by Anonymous Coward · · Score: 0

    Yes, but I work with embedded software, where the goals are to make it work with the smallest/cheapest hardware footprint possible. I know this differs vastly from the goal of desktop developers to use as much electricity as possible and code for bragging rights about specs and CPU usage.

  18. Re:Did it really work? by cbhacking · · Score: 4, Informative

    Most programs still don't need to work with numbers larger than 4 billion on a regular basis, so native 32-bit ints are just as fast as native 64-bit ones.
    Most programs still don't need to map more than 2GB (not 4GB; in fact not even quite 2GB) at once, so there's no pressing need for 64-bit pointers.

    Software does take advantage of the fact that you can fit twice as many 32-bit values into the standard x86 registers if the registers are 64 bits wide, in the same way that you can stuff two 16-bit ints into EAX on a 32-bit system if you want to. However, the performance gains from doing so end up in conflict with the reduced cache efficiency of larger binaries (bigger instructions) and possibly larger (less well-packed) data, resulting in more frequent cache misses. That's why the perf gains are typically very modest, although it really depends on the application.
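
For anyone who hasn't seen the packing trick, here is a minimal sketch of the idea in Python rather than assembly. The helper names `pack16` and `unpack16` are made up for illustration, and a plain Python integer stands in for a 32-bit register such as EAX.

```python
MASK16 = 0xFFFF  # keeps only the low 16 bits of a value

def pack16(hi, lo):
    """Pack two unsigned 16-bit values into one 32-bit word."""
    return ((hi & MASK16) << 16) | (lo & MASK16)

def unpack16(word):
    """Split a 32-bit word back into its (hi, lo) unsigned 16-bit halves."""
    return (word >> 16) & MASK16, word & MASK16

word = pack16(0x1234, 0xABCD)
print(hex(word))        # both values now travel in a single word
print(unpack16(word))   # and can be recovered with a shift and a mask
```

The same shift-and-mask pattern scales up to two 32-bit values in one 64-bit register; the extra shifts and masks are part of why the gain is not free.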

    Where 64-bit does become really valuable is working with very, very large amounts of sequential data (want to allocate a 10GB array? Can't do that on x86, no way no how). That's hardly a typical requirement right now (although I wrote a program a few weeks ago that needed to do it). However, it's getting closer. Additionally, while clever memory mapping can allow a 32-bit process to access over 4GB of RAM (just not all at the same time), there is a (small) performance impact associated with the need to be constantly re-mapping that memory.

    The other area where 64-bit really helps is with security, specifically exploit mitigation. High-entropy ASLR in recent versions of Windows and some other OSes randomly places 64-bit aware executables and their various data regions across their entire 64-bit address space. This not only makes it completely impossible to correctly guess the address of any given bit of code in memory, it also makes spraying (heap spray, JIT spray, etc.) attacks completely infeasible; to cover even a tenth of a percent of the address space, you'd need to spray 16 million gigabytes of data. That's not only quite impractical at modern CPU speeds (even on a blazingly fast CPU and done in parallel, it would take a week or more), it also is far more memory (physical or virtual) than any modern computer will be able to allocate.
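
A quick back-of-the-envelope check of the spray figure, as a sketch only. It assumes the full 2^64 byte address space is in play, as the paragraph above does; real operating systems expose less than that to user mode, so treat the result as an upper bound.

```python
GIB = 2 ** 30

def spray_bytes(address_bits, one_in):
    """Bytes needed to cover 1/one_in of an address space of the given width."""
    return (2 ** address_bits) // one_in

# A tenth of a percent (1 in 1000) of a 64-bit address space, in GiB.
spray_gib_64 = spray_bytes(64, 1000) // GIB
# The same fraction of a 32-bit address space, in bytes, for comparison.
spray_bytes_32 = spray_bytes(32, 1000)

print(spray_gib_64)    # on the order of 17 million GiB
print(spray_bytes_32)  # only a few megabytes
```

That lands in the same ballpark as the 16 million gigabytes quoted above, while the 32-bit equivalent is a few megabytes, which is why spraying is practical there.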

    --
    There's no place I could be, since I've found Serenity...
  19. Re:Did it really work? by cbhacking · · Score: 1

    I've seen Firefox run into the 2GB user-mode address space / process limit many times... Chrome and (recent) IE don't have this problem due to per-tab processes, but Firefox definitely hits it when you use as many tabs as I do.

    --
    There's no place I could be, since I've found Serenity...
  20. Re:Slashdot refuses to respond to abuse... apk by Anonymous Coward · · Score: 0

    I vote roman_mir for best current Slashdot troll.

  21. WinXP x64 by Anonymous Coward · · Score: 0

    Was Win XP x64 ever widely used? Nowhere I worked ever formally supported it. The driver support was poor, as Vista x64 was adopted more widely before the advent of 7.

    In fact, I think we quickly migrated the 2 people (out of 20k+) we found using it.

    1. Re:WinXP x64 by Anonymous Coward · · Score: 1

      Define "widely used", in engineering circles plenty people ran XP64... roughly the same crowd that ran win2k before.
      It was also misnamed, it's Windows 2003 x64 workstation.
      And that gets us to driver support... use 2k3 x64 drivers. They work (surprise, it's the bloody same kernel).

    2. Re:WinXP x64 by petermgreen · · Score: 1

      In my experience most hardware that works with other versions of x64 windows works fine on XP x64. The only two exceptions I ran into was the data translation DT9816 (which worked with some APIs but not others, go figure) and the NI mydaq (for which the software refused to install at all). Remember from a driver point of view XP x64 is basically the same as server 2003 x64 so all the core hardware that is used in both clients and servers is well supported.

      As for adoption I know of a few dedicated simulation/number crunching boxes at university running it but I don't know anyone else who uses it as the OS on their main office desktop.

      --
      note: i'm known as plugwash most places but i screwd up registering that here somehow in the past and now can't register
    3. Re:WinXP x64 by Anonymous Coward · · Score: 0

      And that gets us to driver support... use 2k3 x64 drivers. They work (surprise, it's the bloody same kernel).

      That didn't help my brother get his onboard Creative sound card working... (many consumer hardware devices don't have server OS drivers)

    4. Re:WinXP x64 by bonehead · · Score: 1

      I did some contract work for a large international company about a year ago. For reasons I still cannot fathom, the company was still standardized on WinXP. (If things have gone according to plan, they have upgraded to Win7 by now).

      One of the tasks I was assigned was setting up workstations for some engineers who were feeling the pain of the 3.5 GB memory limit, and getting all their software to run.

      So here we are, 9 or so years since the release of XP64, and even with all updates applied it was still not a usable OS. Never mind the challenges of getting legacy 32-bit stuff to run, I got that handled. The deal breaker was that the only available driver that would even work with the very common nic couldn't seem to get it to run over 1 mbit.... Also, IE (a corporate requirement) really loved to crash for no discernable reason. In the end, it turned out that the only way to even come close to a usable system was to run 32-bit software, and it had to be set to "Run as Administrator". (Which, of course, defeats the entire purpose of running a 64-bit OS)

      Maybe there's some magician out there that could have got it running well. I'm primarily a Linux guy, the last time I qualified as a Windows guru we still had "Program Manager" and hadn't yet heard of the Start button. But I do know a few things, including how to make use of Google, and I did a LOT of research on how to solve these nagging, idiotic problems. The conclusion I was left with is that 64-bit XP is simply an unfinished product. They slapped it together in a rush, and moved on to working on Vista before they could be troubled with fixing any of the broken shit.

      That's my story and I'm sticking to it.

    5. Re:WinXP x64 by bonehead · · Score: 1

      Oh, yeah, that driver that managed to get a gigabit nic to run at a megabit, that was a Vista driver which explicitly did NOT support XP. And out of the 30 or so I downloaded and tried, it was the only one that gave even a sniff of network connectivity....

    6. Re:WinXP x64 by Shirley+Marquez · · Score: 1

      64-bit XP never got any acceptance in consumer PCs and rightly so, because the drivers needed for many consumer devices were never written. It wasn't just sound cards as another comment implied, it was also all the other stuff that people connect to their PCs. Good luck getting that cheap printer or scanner, TV tuner, etc. to work on the 64-bit version.

      64-bit XP did get used in engineering settings. At the time I told people to avoid it unless they had a specific need for its support for large applications.

      The story changed with Vista. You needed new drivers anyway and most devices got both 32 and 64-bit driver support. (The fact that you had to offer both to get permission to use the "Designed for Windows" logo didn't hurt.) But the 64-bit version didn't really go mainstream until Windows 7 came along; by then ordinary desktop users were buying PCs with enough memory to need it.

  22. Next up, IPv6! by Anonymous Coward · · Score: 0

    I can't wait for the IPv6 version of this article in 2060.

  23. Re:Did it really work? by Anonymous Coward · · Score: 1

    Most programs still don't need to work with numbers larger than 4 billion on a regular basis, so native 32-bit ints are just as fast as native 64-bit ones.
    Most programs still don't need to map more than 2GB (not 4GB; in fact not even quite 2GB) at once, so there's no pressing need for 64-bit pointers.

    Most programs don't need more than one floating point pipeline.
    Most programs don't need lots of cache.

    The pros outweigh the cons.

  24. Re:Did it really work? by loufoque · · Score: 1

    Not if by node you mean NUMA node.

  25. Re:Did it really work? by Anonymous Coward · · Score: 0

    Bitch, the cluster I work with has over 700G RAM in each node.

    Sure, but can you handle that much RAM or is it mostly hanging idle.

  26. Re:Did it really work? by sribe · · Score: 4, Funny

    do you? for average PC applications (browsing the web, e-mail, office documents) 64 bit gives no advantage. for the above-average applications (multimedia creation/editing, CADD, running multiple VMs) it's very helpful.

    1) Yes, I do.

    2) You are so wrong that it's actually funny.

  27. Re:Did it really work? by Anonymous Coward · · Score: 0

    Like other commenters, I disagree with you for the most part, but I have to add this:

    If it's such a success, why do so many people still run 32-bit OS's, even on 64-bit CPU's? Why isn't there a 64-bit version of Chrome? Why is there even a 32-bit version of anything?

  28. Re:Did it really work? by Anonymous Coward · · Score: 0

    That makes no sense as your application "should" be mostly I/O bound.
    How did you test this speed difference?

  29. Re:Slashdot refuses to respond to abuse... apk by Anonymous Coward · · Score: 0

    Hello roman_mir.

  30. Re:Did it really work? by Anonymous Coward · · Score: 0

    64-bit binaries are larger and might run 103% at the speed of 32-bit if you're lucky.

    Maybe there is a lot of software written in C that uses int or unsigned when it should have typedef'd a size appropriate for its needs. Programmers need to be mindful of when scaling to the machine is important, and when narrowing to the application is important. I pity the fool who uses an array of unsigned to store values that will never exceed 255.
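
To put a number on that, here is a rough sketch using Python's standard `array` module as a stand-in for C arrays. Type code `'I'` maps to C `unsigned int` and `'B'` to `unsigned char`; the exact size of `'I'` depends on the platform (typically 4 bytes), and the sample data is made up.

```python
from array import array

# 1000 sample values that never exceed 255.
values = [i % 256 for i in range(1000)]

wide = array('I', values)    # one C unsigned int per element
narrow = array('B', values)  # one C unsigned char per element

wide_bytes = wide.itemsize * len(wide)
narrow_bytes = narrow.itemsize * len(narrow)

print(wide_bytes, narrow_bytes)  # same data, very different footprint
```

Both arrays hold identical values; only the storage per element differs, which is the waste being complained about.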

  31. It should not have been called XP... by Anonymous Coward · · Score: 2, Interesting

    XP x64, Microsoft's ginger step-son of an OS. Ignored and dropped like a hot potato as soon as they could.

    You couldn't get drivers for half the stuff, even MS didn't provide their own software and lots of 'free for home, pay for commercial' stuff would detect it as 2003 Server and refuse to run/install.

    Somewhat of a shame really as it wasn't a bad OS.

  32. An Extra Bit of Register by Relic+of+the+Future · · Score: 5, Insightful
    When AMD gave a presentation to my processor design course (not coincidentally about 10 years ago) one of the presenters said that one of the most surprising speed-ups for 64-bit code came from just having 16 real general purpose registers to work with. Even though register renaming lets you smooth over them, it meant all those extra load and store ops (that RR would identify as waste and work around) now didn't need to be in the code at all. It turned out to be rather non-trivial for one of their test apps.

    So those 32 extra bits of memory addressing are nice. But don't forget about that 1 extra bit for identifying registers!

    --
    Those who fail to understand communication protocols, are doomed to repeat them over port 80.
    1. Re:An Extra Bit of Register by Darinbob · · Score: 4, Informative

      And this is something people who've worked on RISC chips have known for ages. The x86 system architecture is essentially stuck in the early 80s. The 386 was just a simple extension on top of 286 model, nothing really fundamentally changed, you still had limited number of registers each with at least one specialized purpose. Maybe MMX and similar stuff fixed that but you couldn't rely on everyone's PC to have the instruction set you compiled it for.

      Intel was stuck supporting a very popular CPU with an instruction set that they knew was outdated, and they even tried having replacements for it that failed to gain acceptance. The reason this Opteron caught on was because it was backwards compatible with x86, not because it was the first thing to try to break out of the mold. And 386 was designed to be compatible with 286, which was designed to be compatible with 8086, which was designed to be compatible with 8085, which is compatible with 8080, which is compatible with 8008, which is compatible with 4004, which was the first commercially available microprocessor... (and all of those retain the original accumulator A register)

    2. Re:An Extra Bit of Register by UnknownSoldier · · Score: 1

      > one of the presenters said that one of the most surprising speed-ups for 64-bit code came from just having 16 real general purpose registers to work with.

      Yeah, this has been known for ages. The technical term is called "register spill"* in compiler land.

      * See: http://en.wikipedia.org/wiki/Register_allocation

      i.e. A compiler tries to optimize register usage by trying to reuse temporaries and minimize load/stores since memory is extremely SLOW compared to registers/L1/L2/L3.

      Here's a practical example. Let's say your entire code base at run time has a maximum function depth of around 8 calls. If the average number of registers used (passed in, passed out) is 3, a CPU with 32 registers might be able to have the compiler pass all function arguments in registers instead of the slow stack. If your CPU only has 8 registers you will be spilling the majority of time seriously impeding performance.
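
The arithmetic in that example can be sketched as below. This is a deliberately crude model, not how a real register allocator works: it assumes every active call keeps its values live at once and ignores reserved registers (stack pointer and so on) and caller/callee-saved conventions. The function name `spilled_values` is made up.

```python
def spilled_values(call_depth, regs_per_call, available_regs):
    """Count live values that do not fit in registers and must go to memory."""
    live = call_depth * regs_per_call
    return max(0, live - available_regs)

# 8 nested calls, 3 register values each = 24 live values.
for regs in (8, 16, 32):
    print(regs, "registers ->", spilled_values(8, 3, regs), "values spilled")
```

Under those assumptions, 32 registers hold everything, 16 registers spill a third of the values, and 8 registers spill two thirds.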

      Other optimizations involve using Huffman encoding for the opcodes. i.e. You can sort ALL the assembly instructions based on usage and THEN assign the shortest opcodes to the most frequently used ones. Related is 'code density'. Initially 64-bit code wasted vastly more memory.

      Efficient Code Density Through Look-up Table Compression
      http://www.date-conference.com/proceedings/PAPERS/2007/DATE07/PDFFILES/05.7_2.PDF

    3. Re:An Extra Bit of Register by Anonymous Coward · · Score: 0

      Of course, with more registers comes more interconnects to move stuff around the chip. Adding 8 more GP registers is wise, possibly another 16 in a future revamp, but diminishing returns kicks in pretty hard at that point, and there are better ways forward than adding more registers.

    4. Re:An Extra Bit of Register by mickwd · · Score: 1

      I'm very surprised someone from AMD would say this, given that they used to produce the AMD29000, which used to be rather popular in some niche areas. This used register windows, with 192 registers in total. Nice chip, back in the day.

      The Wikipedia article also says that parts of the 29050 design were used as the basis for the K5 x86-compatible chips.

    5. Re:An Extra Bit of Register by Anonymous Coward · · Score: 0

      It should be noted that Intel's CPUs became a whole bunch faster even in 32-bit mode with the Core2 generation due to a thing called load/store reordering. That's a thing that lets the CPU re-order load instructions before potentially-conflicting stores, and restart them in store order if a conflict is found. It takes care of most of the x86 stack spills, and (somewhat more importantly for mainstream application code) makes the store chain in a typical x86 ABI's function call sequence almost a non-issue.

      Of course it's still substantially better to have 8 more registers and an ABI where parameters are passed in registers, but Intel did make up for a lot of Netburst's braindamagedness right there.

      AMD's chips didn't gain load/store reordering until the Phenom generation, with the first native quad-core processors; another AMD first.

    6. Re:An Extra Bit of Register by nogginthenog · · Score: 1

      It's not only RISC chips, even the humble 68000 had 8 x 32bit general purpose data registers and 8 x 32bit address registers (although 1 was the stack pointer).

    7. Re:An Extra Bit of Register by AmiMoJo · · Score: 1

      Itanium failed because it was crap, not because of compatibility. The whole architecture was designed not to do all the clever out of order execution, branch prediction and other performance enhancing stuff that x86 chips were doing. Instead they hoped that the compiler would be able to understand the architecture and optimize for it, making such things unnecessary.

      Unfortunately for Intel the compiler was never up to scratch and performance was held back by the fact that the only silicon-level optimization they could do was bumping the clock speed. The architecture had to be more or less fixed so that it didn't break the compiler's optimization code.

      --
      const int one = 65536; (Silvermoon, Texture.cs)
      SJW, n: "Someone I don't like, and by the way I'm a fuckwit" - AC
    8. Re:An Extra Bit of Register by arth1 · · Score: 1

      Here's a practical example. Let's say your entire code base at run time has a maximum function depth of around 8 calls. If the average number of registers used (passed in, passed out) is 3, a CPU with 32 registers might be able to have the compiler pass all function arguments in registers instead of the slow stack. If your CPU only has 8 registers you will be spilling the majority of time seriously impeding performance.

      This is not universally true - if using too many registers, the registers have to be saved and restored, both at function call time, and when context switching. This can be expensive too.

      Moving the code around so it can use fewer registers by being able to re-use them is often a good low-level optimization, even if you have registers you haven't used yet. Even using swaps to use the upper/lower parts of a register separately might be a gain compared to saving/restoring to memory to be able to use another register.

      And the language using a sensible ratio of scratch/return registers to saved registers also helps. Few function calls have more than 4 parameters, so more than 4 scratch registers is likely not valuable, and 2-3 might be optimal for most programs.

    9. Re:An Extra Bit of Register by Anonymous Coward · · Score: 0

      Well duh. Contrary to popular belief, even L1 cache access isn't free. If you e.g. keep u/v coordinates in stack and then make a texture fetch, the processor cannot treat those as independent accesses like it can with registers.

    10. Re:An Extra Bit of Register by Anonymous Coward · · Score: 0

      > Yeah, this has been known for ages. The technical term is called "register spill"* in compiler land.

      And the microarchitecture answer to spills is called "register renaming", not more identifiable registers.

      Even a boring PIII has something like 40 registers, a P4 has 128. The hardware does behind-the-scenes swapping to defer stores and eliminate loads and to prevent register contention during speculative execution. So it's at least _somewhat_ surprising that the additional registers make as much difference as they do.

    11. Re:An Extra Bit of Register by operagost · · Score: 1

      They added hyperthreading, integrated memory controllers, and larger caches. And the architecture doesn't need to be "fixed" any more than x86-64 needs to be fixed, because compilers are software! The software is coded to support newer processors. If anything, IA64 is more versatile, since the compiler does more and the CPU has to do less.

      --

      Gamingmuseum.com: Give your 3D accelerator a rest.
    12. Re:An Extra Bit of Register by NJRoadfan · · Score: 1

      I actually have a device that uses the 29000, a Laserwriter upgrade board. Runs PhoenixPage (postscript level 2 clone). Most early RISC chips found a home in printers instead of general-use computers.

    13. Re:An Extra Bit of Register by Anonymous Coward · · Score: 0

      Surprising, really? Even the smartest microarchitecture can't just optimize out all the memory accesses generated by spills.

    14. Re:An Extra Bit of Register by unixisc · · Score: 1

      Intel also tried out the i960 and the i860 before they teamed up w/ HP on the Itanium

    15. Re:An Extra Bit of Register by Darinbob · · Score: 1

      Not just Itanium, Intel has had other alternative architectures through the years. Such as i860.

    16. Re:An Extra Bit of Register by Anonymous Coward · · Score: 0

      And this is something people who've worked on RISC chips have known for ages. The x86 system architecture is essentially stuck in the early 80s.

      No, it's not. You don't get to talk about one of the key ways in which it's not stuck and then turn around and say it's stuck! That's a complete logic fail.

      The 386 was just a simple extension on top of 286 model, nothing really fundamentally changed, you still had limited number of registers each with at least one specialized purpose.

      Complete bullshit. The 386 made huge changes to the 286 machine model. I'm not even an x86 asm programmer and I know that.

      Maybe MMX and similar stuff fixed that but you couldn't rely on everyone's PC to have the instruction set you compiled it for.

      Wuh?! MMX added vector instructions, and reused the x87 FP register stack rather than adding new registers, i.e. it didn't change x86 registers at all really. And there is no possible way to extend an ISA without encountering that problem. And nobody cares about that particular one anymore because in 2013 nobody uses pre-MMX CPUs in personal computers. In fact, MMX itself is rapidly becoming obsolete and should soon begin to disappear. (It's been replaced by SSE1+SSE2, particularly for x86-64 code.)

      And 386 was designed to be compatible with 286, which was designed to be compatible with 8086, which was designed to be compatible with 8085, which is compatible with 8080, which is compatible with 8008, which is compatible with 4004, which was the first commercially available microprocessor... (and all of those retain the original accumulator A register)

      The 8086 is not, in fact, "compatible with 8085". Or 8080. Or 8008. Or 4004. The one shred of truth in that idea is that the 8086 instruction set was designed to make mechanical source code translation from 8080 to 8086 easy. But it's not binary compatible with the 8085 or 8080, for reasons including "it wasn't actually an accumulator machine anymore".

      It's especially absurd to mention the 4004. The 4004 was co-designed with the Japanese company Busicom for calculators, hence the 4-bit BCD-oriented architecture. (1970s calculator chips tended to do decimal math in BCD -- binary/decimal conversion is hard and binary math can cause issues for humans expecting rounding to behave like base-10 rounding.) The 8008 was co-designed with a different business partner which wanted to make a terminal; hence it's an 8-bit machine suitable for processing character streams. A key thing to note: the 8008 instruction set was mostly designed by Intel's partner, though Intel ended up with the rights. As far as I know there is very little resemblance between 4004 and 8008 code.

      It wasn't until the 80186 that Intel began doing true (that is, binary) compatibility.

      (Yes, there was an 8086/8088 successor before the 286! Not well known today because it was essentially an 8086 with some integrated peripherals, and it was never used in PCs. The instruction set was compatible, but the peripherals weren't IBM PC compatible.)

    17. Re:An Extra Bit of Register by Anonymous Coward · · Score: 0

      And the microarchitecture answer to spills is called "register renaming", not more identifiable registers.

      Er, no it's not. Renaming is a dynamic transformation of the instruction stream, eliminating certain hazards by reassigning register names. But it can only change register names, not fundamentally change the instructions. Spills are part of the instruction stream the processor sees, store and load pairs (or stack push/pop pairs) inserted by compilers (or assembly programmers for that matter) when they run out of named (architectural) registers. They're real memory references, and the processor must execute them for correctness' sake. The semantic information which might let the CPU completely eliminate them is long gone.

      The reason for having tons more rename registers is more about write-after-read hazards. Say a loop contains something like this in its body:

      mul r1, r2
      load r2, (someAddress) // done with old value of r2, get the next

      Without renames, assuming a fairly deep pipeline (which is always the case these days), the processor must stall as soon as it attempts to dispatch the load instruction in the second loop iteration. It has to wait for the previous iteration's multiply to finish, or at least to get past the operand read stage. With renaming, the number of times the loop can be unrolled internal to the processor is limited by how many renames are available. Each iteration will need new names for lots of registers, such as r2 in my example.

      Modern x86 CPUs support somewhere around 100 instructions in flight, so they potentially need a lot of renames to avoid ever stalling because all the rename registers are in use.
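
A toy model of the renaming step described above, as a sketch only. It assumes `mul r1, r2` is the two-operand form (r1 = r1 * r2), hands out physical registers from an unlimited pool, and never reclaims them; real hardware has a finite pool and a free list. The names `rename` and `p0`, `p1`, ... are invented for illustration.

```python
from itertools import count

def rename(instructions, arch_regs=("r1", "r2")):
    """Give every architectural register write a fresh physical register."""
    fresh = count()
    mapping = {reg: f"p{next(fresh)}" for reg in arch_regs}
    renamed = []
    for op, dest, srcs in instructions:
        # Sources are looked up first, so they see the names from older writes.
        phys_srcs = tuple(mapping[s] for s in srcs)
        # The write then gets a brand-new name, leaving the old value intact.
        mapping[dest] = f"p{next(fresh)}"
        renamed.append((op, mapping[dest], phys_srcs))
    return renamed

loop_body = [
    ("mul", "r1", ("r1", "r2")),  # r1 = r1 * r2
    ("load", "r2", ()),           # r2 = next value from memory
]

renamed = rename(loop_body * 2)   # two loop iterations back to back
for instr in renamed:
    print(instr)
```

After renaming, each load writes a different physical register than the one the multiply before it reads, so the load no longer has to wait for that multiply. Each extra iteration in flight consumes more physical names, which is the reason for the large rename pools mentioned above.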

      So it's at least _somewhat_ surprising that the additional registers make as much difference as they do.

      No, not really. Spills can be accelerated with special logic to optimize stack operations (many x86 processors have this), but it's even better to eliminate them altogether by having more registers for the compiler to work with.

    18. Re:An Extra Bit of Register by Anonymous Coward · · Score: 0

      AMD's chips didn't gain load/store reordering until the Phenom generation, with the first native quad-core processors; another AMD first.

      It's really, really silly to talk about that as an "AMD first". Phenom was a 65nm quad core that shipped in 2007. Just to pick one example of prior art off the top of my head, Sun's UltraSPARC T1 shipped in 2005. "Native" octal-core in a 90nm TI process. I'm sure there's plenty more examples. x86 is not the entire universe...

      AMD was merely the first to ship a quadcore single die x86 -- and that should be regarded as a choice about what to build in a very complicated engineering tradeoff landscape, not a technical feat. (From the chip design point of view, it's actually harder to design and validate and manufacture a multi-die CPU, not easier!)

      As events proved, in that generation Intel's investment in 2-die MCM tech proved to be the better approach for building 4-core 65nm CPUs than AMD's 1-die decision. The performance tradeoff wasn't too bad, and it was much more economical for Intel to make quadcore CPUs. (Huge dies are a big problem for chip manufacturing economics, and the original Phenom was quite large for a consumer chip.)

    19. Re:An Extra Bit of Register by UnknownSoldier · · Score: 1

      Yeah the example has been greatly simplified as I didn't discuss locality (Basic Blocks), nor compiler tricks such as the one you mention stuffing values in to "partial" registers, nor compiler-assisted register renaming (hints the compiler can give to the CPU), thread optimization, etc. as they are extremely hardware, compiler, and application specific. It is amazing how exceedingly complex this gets so fast.

      I really wish we could have cheap CPUs with 256 x 64-bit accessible registers, where

      * register 0 has been hardwired to 0x00000000_00000000 = zero
      * register 1 has been hardwired to 0x00000000_00000001 = one
      * register 2 has been hardwired to 0x00000000_80000000 = signed 32-bit mask
      * register 3 has been hardwired to 0x80000000_00000000 = signed 64-bit mask
      * register 253 has been hardwired to 0x7F7F7F7F_7F7F7F7F
      * register 254 has been hardwired to 0x80808080_80808080
      * register 255 has been hardwired to 0xFFFFFFFF_FFFFFFFF = negative 1 = not zero
      * and every 4 registers could be aliased in vector form: i.e. V0 refers to < R0, R1, R2, R3 > and V1 = < R4, R5, R6, R7 > , etc.

      Sadly it looks like x86_64 / AMD64 is here to stay.

    20. Re:An Extra Bit of Register by Anonymous Coward · · Score: 0

      There are a lot of problems with your idea. The core of it isn't super bad -- there are many RISC CPUs which dedicate one register to always read as the constant zero (usually r0, of course). But it's not that great an idea to have 256 registers (even if seven are hardwired to constants). That has two very unpleasant consequences: register specifier fields in instructions must be 8 bits wide, and the register file has to be really large. There's also problems with your constant selections.

      Large specifiers are bad because instructions need to be packed as tightly as possible. Classic 32-register RISCs with fixed size 32-bit instruction words tend to have just barely enough space in their instruction formats for three register specifiers, which take up 15 bits total. Going to 256 registers would blow that up to 24 bits, which doesn't leave a lot of space left over in the instruction for other things (opcodes, etc.). The instruction word will have to grow beyond 32 bits, which is pretty ugly for code density.
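
The bit budget above works out as follows. This is a small sketch assuming a fixed 32-bit instruction word and three register operands per instruction, as in the paragraph; the function name `specifier_bits` is made up.

```python
def specifier_bits(num_registers, operands=3):
    """Bits needed to name `operands` registers out of `num_registers`."""
    bits_per_register = (num_registers - 1).bit_length()
    return operands * bits_per_register

WORD_BITS = 32  # classic fixed-size RISC instruction word

for regs in (32, 256):
    used = specifier_bits(regs)
    print(regs, "registers:", used, "specifier bits,", WORD_BITS - used, "bits left")
```

With 32 registers that leaves 17 bits for opcodes and everything else; with 256 registers only 8 bits remain.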

      The next problem is a side effect of making the register file large. The universal rule is that larger memories are slower. Either they take more cycles to access, or you have to lengthen the cycle time (slow down the clock). That applies to caches: L1 caches are small because they have to be relatively fast. But the register file is also a form of memory, just one addressed by the register specifier fields in instructions rather than memory addresses. And it's vital for it to have single cycle access times. It also needs to support multiple single-cycle accesses simultaneously, which makes things even harder (more access ports tend to slow a memory down).

      Finally... constants are actually a well-studied problem in computer architecture. It's beneficial to include special features to help represent them in the instruction set, but there are good and bad ways of doing that. Zero is by far the most frequently used constant, but after zero it quickly begins to be a questionable thing to devote precious register addresses to constants. Most of the focus has been on how to represent constants in immediate fields in instructions.

      The typical result of such studies is that the most frequently used constants are 0, small powers of 2 up to 2^4 or 2^5 or so, and a few other small integers. If you cover these, you've got probably 95% or more of the constants used in typical programs. It probably doesn't make sense to worry about representing 64-bit long values, other than ones which can be sign-extended from small values. It may look cool to you, but it's really important to be sure you're optimizing the right things. The resources here are limited; burning them on seldom-used features (or constants) is not a good idea.
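
      To illustrate what "sign-extended from small values" means, here is a minimal sketch. The 16-bit field width in the usage lines is an assumption chosen for the example, not something taken from a specific instruction set:

```python
def sign_extend(value, field_bits, register_bits=64):
    """Widen a field_bits-wide two's-complement immediate to the register width."""
    sign_bit = 1 << (field_bits - 1)
    value &= (1 << field_bits) - 1           # keep only the immediate field
    extended = (value ^ sign_bit) - sign_bit  # negative if the field's top bit was set
    return extended & ((1 << register_bits) - 1)

print(hex(sign_extend(0x0010, 16)))  # small positive constant stays small
print(hex(sign_extend(0xFFFF, 16)))  # all-ones field becomes a 64-bit -1
```

      A short immediate field can therefore produce both small integers and values like all-ones without spending a register on them.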

    21. Re:An Extra Bit of Register by bonehead · · Score: 1

      Actually, I had a PC running an 80186.

      OK, true enough, as far as I know there were never any 186 PC motherboards made, but there was an upgrade card. The card had the CPU and some supporting electronics, plugged into an ISA slot and then had an adapter on a ribbon cable that plugged into the original CPU socket in place of the 8086 CPU.

      There was a slight performance improvement, but not much. But I got the card secondhand for cheap, so wasn't expecting much anyway.

    22. Re:An Extra Bit of Register by UnknownSoldier · · Score: 1

      > for three register specifiers, which take up 15 bits total. Going to 256 registers would blow that up to 24 bits
      I appreciate your feedback.

      > And it's vital for it to have single cycle access times.
      I'm not sure why you are asserting that it MUST be single cycle. CPUs in the '80s were not single cycle. Even in today's CPUs there are many instructions that operate on registers that take multiple clock cycles. Would it be nice? Hell yeah. But is it mandatory? Since existing CPUs don't "break" I must conclude "no".

      Your analysis is good however there are other options that still may make this feasible:

      1. CPUs can be designed for specifying 1, 2, or 3 registers. I wasn't saying one was forced to use a tri-opcode format. But it certainly is _convenient_. :-)

      2. Additionally there is a way to specify 3 registers, keep 32-bit code density and still gain 31-bit precision for "Load Immediate". :-)

      Reserve the high bit of each opcode to mean:
      0 = Immediate
      1 = Opcodes

      Anytime the Fetch & Decode sees an immediate value it knows that it is supposed to be paired with the previous opcode (or vice versa.) The point is, one doesn't need to stall the pipeline simply because you interleave 32-bit and "effective" 64-bit opcodes.

      i.e. Given the pseudo-code:
            r3 = r1 / r2;
            r4 = r1 % r3;

      The data stream could be encoded as:

      1...xxxx 32-bit opcode LOAD R1
      0...abcd 32-bit immediate r1
      1...yyyy 32-bit opcode LOAD R2
      0...bbbb 32-bit immediate r2
      1...zzzz 32-bit opcode R3 = R1 / R2
      1...zzzz 32-bit opcode R4 = R1 % R3
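
      A minimal sketch of how a decoder might pair the tagged words. The opcode numbers in the usage line are hypothetical, and a real decoder would also check whether a given opcode actually takes an immediate:

```python
TAG_BIT = 1 << 31           # 1 = opcode word, 0 = immediate word
PAYLOAD_MASK = TAG_BIT - 1  # the remaining 31 bits

def decode(words):
    """Pair each immediate word (tag 0) with the opcode word (tag 1) before it."""
    instructions = []
    for word in words:
        if word & TAG_BIT:
            instructions.append({"opcode": word & PAYLOAD_MASK, "immediate": None})
        elif not instructions or instructions[-1]["immediate"] is not None:
            raise ValueError("immediate word without a preceding opcode")
        else:
            instructions[-1]["immediate"] = word & PAYLOAD_MASK
    return instructions

# Two loads with immediates, followed by two register-only operations.
stream = [TAG_BIT | 1, 1000, TAG_BIT | 2, 7, TAG_BIT | 3, TAG_BIT | 4]
decoded = decode(stream)
```

      Because the tag sits in every word, the decoder can classify a word without knowing anything about its neighbours.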

      > constants are actually a well-studied problem in computer architecture. It's beneficial to include special features to help represent them in the instruction set, but there are good and bad ways of doing that

      Actually I'd be very interested in that! Have any links or specific terms to search for?

      Thanks for the interesting feedback AC

  33. Re:Did it really work? by Anonymous Coward · · Score: 0

    The speed improvement depends entirely on whether or not you are taking advantage of the features provided.
    Remember that not all things in your address space are "memory".

    Games that need to access obscene amounts of data quickly and randomly, for example, are able to map gigs of data into their address space and let the OS deal with caching. This in particular has led to massive speedups for me personally in software I develop.
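
    A rough sketch of the idea in Python, using the standard `mmap` module. The helper name and the tiny stand-in data file are made up for the example; a real game would map a far larger asset file:

```python
import mmap
import os
import tempfile

def read_slice(path, offset, length):
    """Map a file read-only and copy out one slice; the OS pages data in on demand."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
            return view[offset:offset + length]

# Usage: build a small stand-in data file, then read from the middle of it.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"A" * 4096 + b"asset" + b"B" * 4096)
    demo_path = tmp.name
chunk = read_slice(demo_path, 4096, 5)
os.remove(demo_path)
```

    The mapping consumes address space rather than RAM, which is why a 64-bit address space makes it practical to map files of many gigabytes.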

    More address space == better, for many reasons.
    32-bit address space, and weak PAE workarounds were just a bigger form of the i386 (real-mode) days with EMS extensions. Crap.

  34. Re:Slashdot refuses to respond to abuse... apk by Anonymous Coward · · Score: 0

    I respectfully submit ShanghaiBill for your consideration.

  35. with 32 bit on some system you get like 2.5-3.7gb by Joe_Dragon · · Score: 1

    with 32 bit on some system you get like 2.5-3.7gb useable ram. and yes video ram eats from the 4gb pool.

  36. Re:Did it really work? by Anonymous Coward · · Score: 0

    can you handle that much RAM or is it mostly hanging idle

    God made hand size proportional to RAM for a reason.

  37. Because it's coded/compiled crappily by Anonymous Coward · · Score: 1

    A 32-bit x86 app has access to 8 32-bit "general purpose" registers - they ain't really all general purpose because three of them are the stack pointer, frame pointer, and program counter.

    A 64-bit x86 app has access to 16 64-bit "general purpose" registers. Optimize away the use of the frame pointer (if you can), and your app goes from 5 32-bit registers to 14 64-bit registers.

    Of course, when you wrote your app you didn't do stupid brain-dead shit like "gee, size_t is really an unsigned int, so I'll use that to hold this pointer value", now did you?

    1. Re:Because it's coded/compiled crappily by Guy+Harris · · Score: 1

      A 32-bit x86 app has access to 8 32-bit "general purpose" registers - they ain't really all general purpose because three of them are the stack pointer, frame pointer, and program counter.

      You appear to have confused x86 with the PDP-11; the program counter ("instruction pointer" in x86land) is not one of the GPRs.

      As for the frame pointer, GCC, for example, has a -fomit-frame-pointer flag that generates code that doesn't use EBP as a frame pointer, so it's available as a GPR. That might make debugging more difficult. If you're not just overlapping the data and stack segments, references through EBP implicitly go to the stack segment, so you'd have to use a segment-override prefix if it has a pointer to a location in a segment other than the stack segment, but if you're just overlapping them, that's not an issue. If you're using the ENTER or LEAVE instructions, EBP is implicitly the frame pointer; I don't know whether any current compilers bother with them.

  38. x32 ABI by Chirs · · Score: 5, Informative

    And for those that want the best of both worlds, there is the x32 ABI, which uses all the good stuff from x86-64 (more registers, better floating-point performance, faster position-independent code shared libraries, function parameters passed via registers, faster syscall instruction... ) while using 32-bit pointers and thus avoiding the overhead of 64-bit pointers.

    They're working on porting Linux to the new ABI...kernel and compiler support is there, not sure about all the userspace stuff.

    1. Re:x32 ABI by w_dragon · · Score: 1

      You mean all the good stuff except the ability to access more than 4GB of RAM.

    2. Re:x32 ABI by KiloByte · · Score: 3, Informative

      kernel and compiler support is there, not sure about all the userspace stuff.

      Just debootstrap it from Daniel Schepler's repository. Most of the work has since moved to official second-class repositories (AKA debian-ports), but because of the freeze, you want both. So after debootstrapping, echo "deb http://ftp.debian-ports.org/debian unstable main" >>/etc/apt/sources.list and you're set.

      --
      The creatures outside looked from Alt-Right to Antifa; but already it was impossible to say which was which.
    3. Re:x32 ABI by KiloByte · · Score: 2

      except the ability to access more than 4GB of RAM

      3GB typically. That limit applies only per process, and it's pretty rare for a typical user to have a single process that big.

      Then, you have netbooks and/or vserver hosting where the entire [virtual] machine doesn't have that much physical memory.

      x32 is also noticeably faster: over i386 for anything that wants registers, over amd64 for anything with more pointers than CPU's cache. Benchmarks vary wildly, but figures around 7% faster than amd64 are typical.

      --
      The creatures outside looked from Alt-Right to Antifa; but already it was impossible to say which was which.
    4. Re:x32 ABI by petermgreen · · Score: 1

      3GB typically.

      AIUI an x86 process running on an x64 linux kernel gets damn near 4GB of usable virtual address space. I presume the same applies to x32 processes running on that same kernel.

      On a 32-bit kernel as you say 3GB is typical. There were "4G/4G" patches at one stage to increase this but afaict they never made it into mainline.

      --
      note: i'm known as plugwash most places but i screwd up registering that here somehow in the past and now can't register
    5. Re:x32 ABI by BitZtream · · Score: 1

      A 32 bit app can't possibly have access to a full 4GB of ram. Doing that prevents you from having any way to interface and pass data to and from the kernel. That 1GB of RAM at the top of the address space was where your kernel pretended to sit, so apps could talk back to the kernel and read data from the kernel.

      Unless you remove the kernel interface, you can't remove the address ranges used by the kernel.

      You can make them smaller, but there are limits, certainly can't go below page sizes.

      --
      Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager
    6. Re:x32 ABI by Anonymous Coward · · Score: 0

      3GB typically. That limit applies only per process, and it's pretty rare for a typical user to have a single process that big.

      It depends on how you think of users. If you only think about desktops, then sure it's rare. If you consider each customer of Amazon.com to be a user, then it's pretty common. It was one of the reasons for the move to a service-oriented architecture and eventually Obidos deprecation.

    7. Re:x32 ABI by Anonymous Coward · · Score: 1

      That's not true. x86 supports call gates that can mode switch. The interrupt to notify the kernel of the syscall switches address spaces. Some parameters fit in registers. The others must be temporarily mapped and copied out. This is all done via the GDT.

  39. Re:Did it really work? by Anonymous Coward · · Score: 0

    If it's such a success, why does 64-bit software generally only run marginally faster than its 32-bit build? 64-bit binaries are larger and might run 103% at the speed of 32-bit if you're lucky.

    Sure, it helps with the 4GB memory space limit, but so can smart memory management and other approaches.

    I could see it being useful for super-computing things, but in general, there still just doesn't seem to be a point.

    Wow, just wow. Do you actually work in the software field???

    methinks he works in the middle school education field, and not as an instructor

  40. Re:with 32 bit on some system you get like 2.5-3.7 by Anonymous Coward · · Score: 1

    with 32 bit on some system you get like 2.5-3.7gb useable ram. and yes video ram eats from the 4gb pool.

    I'm not sure if you're agreeing with the parent or disagreeing. Per-process you get less than 2 GB, for the system as a whole you get somewhat less than 4GB (depending on how much the system has mapped to something else). PAE can hack around the 4GB system limit somewhat.

  41. Re:Slashdot refuses to respond to abuse... apk by Anonymous Coward · · Score: 1

    He IS too stupid to be real.

  42. Re:640k.... by Anonymous Coward · · Score: 2

    Uh, no. He never said anything like that. But hey, don't let the facts stop you... just keep repeating that retarded meme.

  43. Re:Did it really work? by MightyYar · · Score: 2

    The program is written in C#. Only MS knows what is going on there.

    --
    W..w..W - Willy Waterloo washes Warren Wiggins who is washing Waldo Woo.
  44. Re:Did it really work? by vswee · · Score: 2

    I just want all of you to know that you're making this thread very hard for me to score cause I'm not sure who's exactly right and who isn't. that is all. you may carry on now.

  45. Re:Did it really work? by gatkinso · · Score: 1

    In my day, a Beowulf cluster had 128MB ram per node.

    Uphill. Both ways. In the snow.

    --
    I am very small, utmostly microscopic.
  46. Re:Did it really work? by the+eric+conspiracy · · Score: 1

    Snow?

    My first computer required that you toggle in the boot loader binary code from front panel switches!

    That has to be the modern equivalent of hand crank started horseless carriages.

  47. Not just for the extra memory. by Ecuador · · Score: 4, Interesting

    In our algorithms lab there were programs that would gain more than 2x when compiled for 64 bit.
    A more "real-world" example is when I started in 2005 at my current company. The engineers had 6-month old P4s @ 3.2 or 3.4GHz, running 32bit linux. For a project they used VisualStudio on VMWare and it took over a minute to compile the project. The company allowed engineers to choose their hardware, so I built an Athlon 64 @ 2.2 or 2.4GHz and I had it run 64bit SuSE. I remember the shock and awe from the first time I tried to compile the project under VMWare - a little more than 10 secs - the engineer next to me had his jaw drop. Of course most of the engineers immediately requested to switch to 64bit machines. I am not sure why it made such a difference in that application - perhaps the 16 general purpose registers come in really handy in this scenario? Of course it didn't help that the P4 was slower in everything (funny how at the time very few reviews really clarified this), but not order of magnitude slower...

    --
    Violence is the last refuge of the incompetent. Polar Scope Align for iOS
    1. Re:Not just for the extra memory. by mewsenews · · Score: 1

      This may not have been a 32/64 bit difference but VMWare taking advantage of AMD-V extensions in the athlon 64 processor:

      http://en.wikipedia.org/wiki/X86_virtualization

    2. Re:Not just for the extra memory. by Ecuador · · Score: 1

      This was in 2005, a year before the first AMD-V CPU.

      --
      Violence is the last refuge of the incompetent. Polar Scope Align for iOS
    3. Re:Not just for the extra memory. by Dr_Barnowl · · Score: 1

      It might also be because of L2 cache sizes, or bus speeds.

      P4 Northwood had an L2 cache of 512KB

      Athlon 64 had an L2 cache of 1MB

      Most of the text-processing jobs I run (XML, XSLT, HTML Tidy, Regex) get a really big boost out of having a larger cache. The jump in performance from a Core 2 Duo to a 2 core i5 is very noticeable, for parts that run at similar clock speeds.

    4. Re:Not just for the extra memory. by Anonymous Coward · · Score: 0

      Probably not, given that such CPUs were released a year after the time stated in the post.

    5. Re:Not just for the extra memory. by wagnerrp · · Score: 1

      Or the fact that those high clockrate Prescotts (I don't think they ever made a 3.4GHz Northwood) had shit for branch prediction on their extremely long pipeline, and stalls were merciless.

    6. Re:Not just for the extra memory. by Anonymous Coward · · Score: 0

      AMD-V support was introduced in socket AM2 processors, which were released in mid-2006, so if his time frame is correct it would not have been virtualization extensions providing the additional performance.

    7. Re:Not just for the extra memory. by Anonymous Coward · · Score: 0

      The P4's had some _very_ slow paths. Virtualisation at the time definitely tickled that.

      Heck, even syscalls and locks are slow on the P4.

    8. Re:Not just for the extra memory. by eabrek · · Score: 1

      3.4 was the last one.

  48. Re:Did it really work? by Osgeld · · Score: 1

    I never knew it was supposed to be faster

  49. Yes by Frankie70 · · Score: 1

    That's why I switched to using 1 bit microprocessors. My programs are really small now. I just wrote a database which I can fit in my pocket.

  50. Re:Did it really work? by tyrione · · Score: 1

    do you? for average PC applications (browsing the web, e-mail, office documents) 64 bit gives no advantage. for the above-average applications (multimedia creation/editing, CADD, running multiple VMs, ) it's very helpful.

    On Debian Linux I can peg a stupid Zynga game in Flash past 3GB of RAM. For Multimedia Creation/Editing you bet your sweet ass 64 Bits matters. Then again Linux doesn't have shit like GCD and quality OpenCL built in the OS with app suites that can leverage both and welcome 32/64 GB of RAM with open arms. Quality drivers, quality OpenCL/OpenGL etc., are coming with all the hard work at LLVM/Clang, Mesa and more. When that shit lands you better believe 64 bit matters and any heavy engineering/scientific computing, to Blender Modeling/Rendering damn well loves it. So does GIMP.

  51. Re:Slashdot refuses to respond to abuse... apk by Anonymous Coward · · Score: 0

    Kristopeit > ShanghaiBill

    The man created dozens upon dozens of accounts due to his numerous bannings, and to this day people still debate whether he was real or a script.

    Top. Shelf. Troll.

  52. Re:Did it really work? by Burning1 · · Score: 4, Interesting

    I think if you understand how truly horrifying PAE is, you would have no doubt at all that 64 bit platforms were the way to go. There's a lot of memory management cruft in the Linux kernel that x86_64 eliminates.

    x86_64 also slipped in a few much needed enhancements to the ia32 architecture, including some extra general purpose registers.

    http://en.wikipedia.org/wiki/X86-64

  53. Re:Did it really work? by h2oboi89 · · Score: 1

    I am constantly hitting the 3.3 GB RAM cap on my 32 bit machine at work just having the applications I need to do my job open at the same time. The fact that the hard drive is fully encrypted also makes using it for swap space extremely expensive. I would kill someone for a 64 bit machine at times just for the increased RAM space.

  54. Re:Slashdot refuses to respond to abuse... apk by Anonymous Coward · · Score: 1

    Yeah, he's trolling in real life. This is the email I sent him to refute his garbage. No response. Imagine that.

    Are you really serious about having 650 thousand lines in your hosts file? I can't imagine why you'd need that many. It also has a crippling effect on one's computer.

    To test this, I created a sample copy of a hosts file with that many entries, using the "0" shorthand for IP address and a randomized hostname of average 32 characters. Total size of this file is 22855 kilobytes, and after an hour the DNS cache had only loaded a third of it in. This is primarily due to the choice of algorithm used by the DNS cache service - it wasn't designed for tens of thousands of hosts file entries to be stored, so uses a rather inefficient method of growing the space used to store that involves copying huge swathes of data around for each new entry. It also blocked any name lookups while loading the file.

    So instead of this, I tried with only 65k entries, and made three copies of this file. Each had an identical list of hostnames, but used "0", "0.0.0.0" and "127.0.0.1" respectively. The DNS cache now took 1 minute 55 seconds to load each one; the choice of IP address style didn't make any difference to the loading time as the bulk of the processing was in inserting new entries as described in the paragraph above. Name resolution was at normal speed after that, though. Searching in-cache - even for such a large set of data - added no discernible penalty.

    I decided to try with the DNS cache disabled. This isn't a good idea, as it forces uncached name resolution to be done for every single lookup. This is indeed what it did, and the original 650,000 entry hosts files added around 3 seconds onto every single name lookup, the amalgamated effect of which slowed general Internet access down considerably. Unlike the DNS cache loading, this time there was a slight difference in loading times between the different hosts files - this was expected, as it was reading the entire file each time so that became the bottleneck.

    Finally, to address your last question: every IPv4 address is stored in the cache using the same size of four bytes. e.g. both "0" and "0.0.0.0" become 00 00 00 00, both "127.1" and "127.0.0.1" become 7F 00 00 01, and so on. This is consistent with the binary format used in the sockets API.
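
    The shorthand expansion can be sketched as follows. This is a hand-written illustration of the classic inet_aton-style rules (the last part fills all remaining bytes), handling decimal parts only; it is not the code Windows' DNS cache actually uses:

```python
def parse_ipv4_shorthand(text):
    """Parse dotted forms (a, a.b, a.b.c, a.b.c.d) into 4 bytes, inet_aton style."""
    parts = [int(p) for p in text.split(".")]
    if not 1 <= len(parts) <= 4:
        raise ValueError("expected 1 to 4 dotted parts")
    *leading, last = parts
    tail_bytes = 4 - len(leading)  # the final part fills all remaining bytes
    if any(not 0 <= p <= 255 for p in leading) or not 0 <= last < 256 ** tail_bytes:
        raise ValueError("part out of range")
    return bytes(leading) + last.to_bytes(tail_bytes, "big")

for text in ("0", "0.0.0.0", "127.1", "127.0.0.1"):
    print(text, "->", parse_ipv4_shorthand(text).hex(" "))
```

    Whatever the textual form, the stored result is always four bytes, which is why the shorthand saves file size but not cache memory.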

    In conclusion, using the hosts file to store tens of thousands of entries has a negative effect on the performance of Windows' name resolution. You should really consider another option to filter all those hostnames.

  55. Re:Yeah, I guess it's 10 years by Anonymous Coward · · Score: 0

    Ahem, there was no x86 in the title when it was posted.

  56. Re:Did it really work? by Blaskowicz · · Score: 1

    An awful lot of people run 10 year old computers, and also an awful lot of people run XP on computers that could handle 7 64bit or linux 64bits. So you'd better have a 32bit version of your program (Google Chrome, Google Earth, Firefox, whatever).
    Though, it ought to be easier to have a fully 64bit system (a linux distro without Wine might do it, if you're careful to not install 32bit software and if Chromium and/or Firefox are 64bit there). But the only benefit is not storing and running duplicate 32bit libraries.

  57. Re:Did it really work? by Anonymous Coward · · Score: 3, Informative

    Software does take advantage of the fact that you can fit twice as many 32-bit values into the standard x86 registers if the registers are 64 bits wide, in the same way that you can stuff two 16-bit ints into EAX on a 32-bit system if you want to. However, the performance gains from doing so end up in conflict with the reduced cache coherency of larger binaries (bigger instructions) and possibly larger (less well-packed) data, resulting in more frequent cache misses. That's why the perf gains are typically very modest, although it really depends on the application.

    You're arguing on the correct side, but what you wrote here is badly flawed. Packing multiple 32-bit values into a 64-bit register is near worthless; what is valuable is that amd64 gives you twice as many general-purpose registers (that also happen to be 64 bits wide). A far bigger gain for 64-bit on x86 was the addition of full relative addressing. Jumps and calls could already be relative in 32-bit mode, but data references had to use absolute addresses; in 64-bit mode software can address data relative to the program counter as well. This helps a great deal with libraries, since instead of needing large relocation tables, they simply use relative references that are valid no matter what address the library is loaded at. On most processors, using 64-bit mode loses performance due to having to shuffle more data around; x86 is about the only one that gains performance.

  58. He forgot to use a hash table by raymorris · · Score: 2

    If the OP compares each file with every file, that would be CPU bound. With a well chosen hash table it shouldn't be.
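
    A minimal sketch of the hash-table approach, assuming the job is finding identical files. The original post's task is not shown here, so the function name and the use of SHA-256 are illustrative choices only:

```python
import hashlib
from collections import defaultdict

def group_duplicates(blobs):
    """Group names by content digest: one hash per item instead of comparing every pair."""
    by_digest = defaultdict(list)
    for name, data in blobs.items():
        by_digest[hashlib.sha256(data).hexdigest()].append(name)
    return [sorted(names) for names in by_digest.values() if len(names) > 1]

duplicates = group_duplicates({"a.txt": b"same", "b.txt": b"other", "c.txt": b"same"})
```

    Each item is hashed once, so the work grows linearly with the number of files rather than with the number of pairs.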

  59. Re:Did it really work? by Anonymous Coward · · Score: 0

    If it's such a success, why does 64-bit software generally only run marginally faster than its 32-bit build?

    32? Heck, I'm still using 16. 32 and 64 are just a market gimmick to keep you buying new hardware.

  60. Re:Slashdot refuses to respond to abuse... apk by Anonymous Coward · · Score: 0

    so you have been properly trolled. i am appoint.

  61. Nobody's said 64 bit Linux 4 years before Windows? by raymorris · · Score: 4, Interesting

    Is this still Slashdot? Nobody mentioned that Linux supported x86 64 in 2001, before it was even released, while Windows was stuck at 32 bit for another four years.

  62. Re:Did it really work? by BitZtream · · Score: 4, Informative

    PAE is more or less old school segmentation. You can't say 'it has a 3% slow down' because it has 0 slowdown if that particular page is already in memory, and if not ... it has the same 'slowdown' as an other paging operation plus a fixed number of cycles. So if you're dealing with tiny amounts of 'more than 2/3gb' then the overhead is a lot higher than if you're mapping out 2GB on every window change. PAE is just another form of paging. It is slower, but you're making numbers up from nothingness.

    The integer math performance of the processor has nothing to do with it being 64 bit. Most (All now?) x86-64 processors internally will process 2 32 bit numbers in the same span as a 64 bit number if properly optimized by sending the 32 bit values through together. 64 bit code using less than the OS max for 32 bit code is actually slower than 32 bit code due to the increased pointer sizes wasting the processors registers filling them with 0s.

    You really have no idea how processors work. While nothing you said is illogical, it is still in fact wrong on every count. Under the hood, processors don't work anything like they do on the surface.

    Other processors also do other weird things. I have an 8 bit CPU that can handle 32 bit numbers in a single clock cycle, exactly like it does 8 bit numbers ... and the neat thing ... it can do 2 16 bit numbers in a single clock cycle! Why? Because the processor as I see it from a software developer's perspective isn't anything like the actual hardware doing the work. Processors have translation units in front of them to provide you with one look while allowing themselves to rewire the backend in all sorts of different ways.

    --
    Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager
  63. Re:Did it really work? by Just+Some+Guy · · Score: 1

    Most programs still don't need to map more than 2GB (not 4GB; in fact not even quite 2GB) at once, so there's no pressing need for 64-bit pointers.

    Perhaps not most, but a whole awful lot of programs want more than that. I'd say the mean for "large" apps on my laptop is 1.5GB, and the resident size distribution is (to my eye) more or less gaussian. That means that few apps want more than 2GB today, but if the average app grew by 33%, about half of them would be over the 31-bit size limit.
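
    The "about half" estimate can be checked with a quick sketch. The 0.3 GB standard deviation is an assumption, since the comment gives only the mean:

```python
from statistics import NormalDist

def fraction_over(limit_gb, mean_gb, sigma_gb):
    """Fraction of a normal distribution of process sizes that exceeds limit_gb."""
    return 1.0 - NormalDist(mean_gb, sigma_gb).cdf(limit_gb)

SIGMA = 0.3  # assumed spread in GB; the comment gives only the mean
today = fraction_over(2.0, 1.5, SIGMA)
after_growth = fraction_over(2.0, 1.5 * 1.33, SIGMA)
print(f"over 2GB today: {today:.1%}, after 33% growth: {after_growth:.1%}")
```

    Growing the mean by 33% puts it almost exactly on the 2GB line, so roughly half of the distribution lands above it whatever spread is assumed.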

    --
    Dewey, what part of this looks like authorities should be involved?
  64. Re:Did it really work? by AbrasiveCat · · Score: 1

    Snow?

    My first computer required that you toggle in the boot loader binary code from front panel switches!

    That has to be the modern equivalent of hand crank started horseless carriages.

    Takes me back to loading those Interdata model 3s with the front buttons so we could load the paper tape. Then we could watch the registers with lights on the front as our code executed. Ah glad those days are over.

  65. Re:Did it really work? by BitZtream · · Score: 1

    Then you did something wrong.

    There is no logical reason that an x86-64 processor in 64 bit mode would perform faster than 32 bit mode unless you are memory constrained. Raw operations are not inherently faster in 64 bit mode than they are in 32 bit mode.

    If you are not exceeding 32 bit memory limits, your 64 bit version SHOULD be a tiny little bit slower than the 32 bit version.

    Let me guess, you ran it in 32 bit mode, then ran it again immediately after in 64 bit mode ... and then ignored the disk cache completely?

    --
    Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager
  66. Re:Did it really work? by BitZtream · · Score: 1

    Really? PAE is bad? Have you just learned to completely ignore segmentation unless its named PAE?

    Segmentation on x86 is utter tripe as well, but PAE is nothing but a spec on top of the other mess of bullshit known as segmentation.

    --
    Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager
  67. Re:Did it really work? by cheater512 · · Score: 2

    My first point was that PAE does have an overhead somewhat larger than the 3% the parent mentioned.
    And that overhead increases with the amount of ram you have. Sure 32gig of ram has very little overhead with PAE. That is of course unless you actually use the 32gig of ram and then it will be constantly swapping memory pages around.
    Yes I know most people don't use that much RAM. My point is still valid.

    Also my 2nd point was that 64bit processors handle big numbers faster, not small numbers slower.
    Yes different architectures can behave differently. We are talking about x86 though.
    Fact: 64bit x86 will process a 64bit number faster than 32bit x86.

  68. Re:Now if we could just get rid of IPv4 by TheGavster · · Score: 0

    Practically speaking, PAE (the NAT of memory addressing) is sufficient for the vast majority of users (and the specialized applications requiring single huge memory spaces are moving to specialized compute nodes). The one desktop use case where an application would require >4GB of memory was a browser with a ton of tabs, but browsers have started moving each tab into its own process and reaping a security gain to boot. I'd much rather the adoption had been the other way around, with IPv6 becoming commonplace and x64 languishing. Processes on my computer should be using the OS's IPC architecture anyway: nodes on the Internet have bigger benefits from being full hosts.

    --
    "Because Science" is one step from "Because old book". Try "Because of my experiment testing my falsifiable assertion".
  69. Re:Did it really work? by TheGavster · · Score: 1

    My experience with moving applications to 64-bit that didn't need the massive single memory space was that I started paging a lot more, since they were allocating words twice as wide (and while I could address every molecule in the computer separately, the same number of them were still memory). Physical memories have since expanded to compensate, but I'd like to see some statistics on the entropy of the upper 32 bits of the average QWORD.

    --
    "Because Science" is one step from "Because old book". Try "Because of my experiment testing my falsifiable assertion".
  70. Re:Nobody's said 64 bit Linux 4 years before Windo by Anonymous Coward · · Score: 0

    Maybe no one felt like sucking the Linux dick today.

  71. Re:Did it really work? by Guy+Harris · · Score: 3, Informative

    but PAE is nothing but a spec on top of the other mess of bullshit known as segmentation.

    Actually, no, it's a mode that changes the page table format to allow larger physical addresses in page table entries. Nothing to do with segmentation.

  72. Re:Did it really work? by Alioth · · Score: 2

    Notwithstanding all of that, amd64 also has more registers, so there's less having to move stuff to and from memory and you can make most function calls by passing parameters in registers instead of on the stack. amd64 provides a worthwhile increase in performance just due to having twice as many general purpose registers (actually, more than twice as many because there's only really 4 proper general purpose registers on 32 bit x86 - amd64 adds 8 more registers).

  73. Re:Did it really work? by Guy+Harris · · Score: 1

    64-bit binaries are larger and might run 103% at the speed of 32-bit if you're lucky.

    Maybe there is a lot of software written in C that uses int or unsigned when it should have typedef'd a size appropriate for its needs.

    Software that's written in C, in all of the environments I know of for x86, has 32-bit ints (signed or unsigned) whether compiled 32-bit or 64-bit, so you're presumably not saying that those programs suddenly get 64-bit ints when compiled 64-bit. They will get 64-bit longs on UN*X (but not on Windows), and will get 64-bit pointers in either case.
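
    The size rules described above are usually given data-model names. A minimal Python sketch, assuming the conventional labels ILP32 (32-bit x86), LP64 (64-bit UN*X) and LLP64 (64-bit Windows); the `DATA_MODELS` table and `types_that_grow` helper are illustrative names, not anything from the comment:

```python
# Sizes in bytes of common C types under the usual x86 data models.
# Assumption: ILP32 = 32-bit x86, LP64 = 64-bit UN*X, LLP64 = 64-bit Windows.
DATA_MODELS = {
    "ILP32": {"int": 4, "long": 4, "pointer": 4},
    "LP64": {"int": 4, "long": 8, "pointer": 8},
    "LLP64": {"int": 4, "long": 4, "pointer": 8},
}


def types_that_grow(old_model, new_model):
    """Return the C types whose size differs between two data models."""
    old = DATA_MODELS[old_model]
    new = DATA_MODELS[new_model]
    return sorted(name for name in old if old[name] != new[name])
```

    Under these assumptions, `int` never appears in the result: only `long` and pointers change on UN*X, and only pointers change on Windows.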

  74. Re:Did it really work? by Alioth · · Score: 4, Informative

    x64 has twice as many registers. That alone means less having to move stuff in and out of memory, so that will improve the speed when compared to 32 bit applications. 32 bit x86 has only 4 truly general purpose registers. x64 adds another 8 64 bit registers.

  75. Re:Did it really work? by Guy+Harris · · Score: 1

    PAE is more or less old school segmentation.

    PAE isn't segmentation at all. It's a mode that changes the page table entry format to support more physical memory. Maybe you're thinking of something else as being "PAE", but Intel's (and AMD's and...) idea of PAE is a Physical Address Extension.
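
    A rough Python sketch of why a wider page table entry means more physical memory. It assumes 4 KiB pages and the 36-bit physical address width of the original PAE implementations; neither figure is stated in the comment, and the helper names are hypothetical:

```python
PAGE_OFFSET_BITS = 12  # assumption: 4 KiB pages


def max_physical_bytes(physical_address_bits):
    """Bytes of physical memory addressable with the given address width."""
    return 1 << physical_address_bits


def frame_number_bits(physical_address_bits):
    """Bits a page table entry needs to name one physical page frame."""
    return physical_address_bits - PAGE_OFFSET_BITS


GIB = 1 << 30
classic_limit_gib = max_physical_bytes(32) // GIB  # 32-bit entries, no PAE
pae_limit_gib = max_physical_bytes(36) // GIB      # assumption: 36-bit PAE
```

    The virtual address seen by each process stays 32 bits wide in both cases; only the frame number stored in the entry grows, which is why no segmentation is involved.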

  76. Re:Did it really work? by cusco · · Score: 1

    I frequently have to run a VM on my laptop. When I had 32-bit Win7 running with 4 gb of RAM it was painfully slow, 8-10 minutes between boot and getting a login. After reinstalling with 64-bit Win7, exact same hardware, same VM, boot time went down to 3-4 minutes. With 8 gb of RAM boot time was around 2 minutes, not even enough time to go get a cup of coffee.

    --
    "Think about how stupid the average person is. Now, realise that half of them are dumber than that." - George Carlin
  77. Re:Did it really work? by metrix007 · · Score: 2

    Why would a 64 bit program be slower when modern processors are optimized for 64bit programs?

    --
    If you ignore ACs because they are anonymous - you're an idiot.
  78. Re:Did it really work? by Anonymous Coward · · Score: 0

    Does Windows actually have any middle ground that would allow an application to directly do its own segmented memory management, kind of like 64k realmode on steroids? Say, asking Windows for 9 gigs... 1 handled normally, and used for the program's "normal" runtime objects, variables, data structures, etc... and 8 that are stacked into the same 1-gigabyte address space and virtually bank-switched upon request of the app itself? Or some smaller segment size that happens to better-fit the data being banked? Just to give one example, this would pretty much solve the "Photoshop Problem" (and most video-decompression GOP-buffering problems, too). You could even resurrect some tricks from the old realmode toolbox, like arranging the banks so you can sequentially fetch the layers from the same pointer offset, and just toggle the segment pointer until you're ready to move on to the next chunk of byteplanes.

    Complicated? Of course... but sometimes, you HAVE to touch the bare metal if you want to push the boundaries and redefine what a given piece of hardware can do ;-)

    http://www.youtube.com/watch?v=PRrXi411ESA

    http://www.youtube.com/watch?v=w6Ge6G9sT9E

    Or, put another way... when Future Crew created Second Reality ( http://www.youtube.com/watch?v=XtCW-axRJV8 ) ~20 years ago, it wasn't written in an object-oriented language that ran under a VM.

  79. On Slashdot? by raymorris · · Score: 1

    No Linux fanboys on Slashdot today? That's hard to believe. That's about as likely as not having anyone who just can't admit, after all these years of fail, that their "team", Microsoft, sucks horribly. So bad that not only is Apple spanking them in sales, but a bunch of greasy, toenail-fungus-eating hippies programming in their spare time kicked the crap out of Microsoft for servers, embedded systems, and just about anything that's not attached to a 19" CRT.

    1. Re:On Slashdot? by Anonymous Coward · · Score: 0
    2. Re:On Slashdot? by Anonymous Coward · · Score: 0

      But, when you think about it laterally, is x64 really a good thing for society? I was an early adopter and had the Opteron but now that we are no longer limited to 4GB of RAM, have we not just paved the way even more for big brother?

      With a 4GB limitation, the distributed super computing model would have had to thrive. Hadoop etc. instead of the rise of centralized, corruptible cloud poo.

  80. Has it been that long by Anonymous Coward · · Score: 0

    and yet, I only have a handful of 64 bit programs on this Windows machine.

  81. Re:Did it really work? by lightknight · · Score: 1

    32 GB Ram High-Five! Seriously, anytime Asus is feeling poor, they can release a Crosshair motherboard that takes 64 GB or perhaps 128 GB of RAM.

    I am not through upgrading until I can virtualize the speed and location of every particle in the universe. Then I'm going to see what exactly this Time dimension actually looks like from a different angle. Maybe. I have a few other ideas, but I probably won't be allowed near a computer this powerful if I announce them all at once. =^_^=

    --
    I am John Hurt.
  82. Re:Did it really work? by lightknight · · Score: 1

    He's never written anything that's tested the limits of computing...

    Meanwhile, I need only load up my badly coded evolutionary program to see my machine scream at the ~12 GB hit to the RAM. I say badly coded because I have found a few tricks to help get some additional memory savings out of it...also on topic, the aggression level was kind of low, so I imagine future tests might break the 32 GB barrier easily. Currently thinking of giving it a SSD for virtual memory...

    --
    I am John Hurt.
  83. Re:Did it really work? by lightknight · · Score: 1

    Thank you. The people spouting nonsense about 32-bit programming, and how they can't understand why 64-bit computing would be faster (in the x86 world) drive me loony...it's like they missed an entire year's worth of classes where we went over, in detail, the various changes, and why it's faster...and they have the gall to ask for your notebook the night before the final. I mean, it's impressive, that kind of blindness, but they aren't getting the notebook without a pimp slap to go with it (extra baby powder).

    It's kind of like watching the functional programming people slowly reinvent OOP...makes me scream inside. "Dude, we've figured out a new way to organize our methods / fields so that it's easier to keep them straight in our heads..." "Please God, let it not be OOP." "*talks for a bit*" "Damn it."

    --
    I am John Hurt.
  84. Re:Did it really work? by lightknight · · Score: 1

    Most programs don't need a GUI...but they tend to function better with one. Most computers don't need a SSD...but they tend to run faster with one, and users tend to agree that you can have your SSD back when you pry it from their cold dead fingers.

    You don't have to fly First Class, you're getting there at the same time as the people in Business or Economy class...but it's a lot nicer.

    --
    I am John Hurt.
  85. Re:Did it really work? by Anonymous Coward · · Score: 0

    I don't.

  86. Re:Did it really work? by zbobet2012 · · Score: 3, Insightful

    It sounds like you were just talking to a very bad functional programmer. You also have the order completely backwards. ANSI Common Lisp was the first standardized OO language. But more importantly most "OO" concepts come from functional languages to start with.

    Design patterns for the most part are actually adaptations of pre-existing functional concepts. For example Chain of Responsibility is really just a slightly simplified monad (input must equal output). The first Iterator pattern was (map fn list). Flyweight is a simplified form of Memoization.
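
    Two of those correspondences can be sketched in a few lines of Python. This is only an illustration under assumed names (`square`, `glyph`): `map` plays the Iterator role and `functools.lru_cache` provides the memoization that a Flyweight factory would otherwise hand-roll.

```python
from functools import lru_cache


def square(x):
    """A pure function: the same input always gives the same output."""
    return x * x


# Iterator pattern as (map fn list): visit each element without exposing
# how the underlying collection is stored.
squares = list(map(square, [1, 2, 3]))


@lru_cache(maxsize=None)
def glyph(char):
    """Flyweight via memoization: one shared object per distinct character."""
    return {"char": char}
```

    Calling `glyph("a")` twice hands back the same cached object, which is the sharing a Flyweight factory exists to provide.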

    Packages and namespaces also first appeared in functional languages. Encapsulation via lexical closures has been around since Scheme was invented in the 70's. Lambda functions? Those little gems, making their way into every OOP language, were invented with Lisp.

    You have missed the entire point though if you think OOP is about organizing your programs or something. OOP is largely about encapsulating moving parts into logical pieces. Functional code is largely about minimizing or removing "state" (aka moving parts) from your code. E.g. an input to a function should always give the same output. These concepts are not incompatible at all.

  87. Re:Did it really work? by Chrisq · · Score: 1

    PAE is more or less old school segmentation.

    PAE isn't segmentation at all.

    I suppose it depends on how you look at it. If you view the page directory as a bank select then it is a sort of segmentation.

  88. Re:Did it really work? by TheRaven64 · · Score: 1
    The 64-bit address space doesn't give much advantage in typical desktop applications. Even my web browser with a silly number of tabs open at the moment is using

    PC-relative addressing makes position-independent code significantly faster. This is useful for shared libraries, but also for position-independent executables which, in combination with address space randomisation, add some security.

    SSE is guaranteed to exist. This alone accounts for most of the speedup, because compiling for x87 is really hard (crazy hybrid of a stack- and a register-based architecture), so generating SSE ops for floating point, even if you're only doing scalar arithmetic, is a lot more efficient.

    More GPRs. x86-32 code ends up with a lot of stack spills because it only has a tiny number of general-purpose registers. x86-64 has 16, which makes it a lot easier to work with.

    64-bit registers. On x86-32, 64-bit arithmetic is painful, because you need two registers for each of the operands, and you only have 6 registers to use (two of which must be used for the destination in a lot of ops). On x86-64, it's a lot easier to do sequences of 64-bit arithmetic without spills.
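
    To make the last point concrete, here is a small Python sketch of a 64-bit addition carried out with 32-bit halves, roughly what an add followed by an add-with-carry does on x86-32. The function name and the masking approach are illustrative, not taken from any compiler's output:

```python
MASK32 = 0xFFFFFFFF


def add64_with_32bit_halves(a, b):
    """Add two 64-bit values using only 32-bit adds plus a carry bit."""
    low = (a & MASK32) + (b & MASK32)   # low-half add
    carry = low >> 32                   # carry out of the low half
    high = ((a >> 32) + (b >> 32) + carry) & MASK32  # high-half add with carry
    return (high << 32) | (low & MASK32)
```

    Each operand occupies two 32-bit values here, which mirrors the register pressure described above; with 64-bit registers the same operation is a single add.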

    --
    I am TheRaven on Soylent News
  89. Re:Did it really work? by TheRaven64 · · Score: 1

    There is no logical reason that an x86-64 processor in 64 bit mode would perform faster than 32 bit mode unless you are memory constrained

    Or you benefit from more registers. Or you benefit from vastly more 64-bit registers. Or you're doing floating point and benefit from the compiler being able to assume SSE is present and never use x87 arithmetic. Or you're using shared libraries so benefit from faster position-independent code. But, apart from that, no logical reason at all...

    --
    I am TheRaven on Soylent News
  90. Re:Slashdot refuses to respond to abuse... apk by DrXym · · Score: 1
    Personally I pray that Slashdot will recognize the wisdom of tweaking their existing "Read the rest of this comment..." logic so it kicks in a lot sooner for 0 and -1 posts. e.g. a 0 post might show 10 lines, a -1 post only shows 2 lines with the link there to show the remainder.

    Then this raving nutcase / troll can post his mind spool as much as he likes but its impact will be minimal.

  91. Well that's good and bad by Sycraft-fu · · Score: 1

    So good in that AMD got the contract. It is money, no question about it, and the console market is not small. Better (for them) they should have it than IBM or someone.

    So how is it bad? Low, low margins.

    Consoles are very cost driven devices. Often sold at a loss initially, and then little to no profit later. The reason is they want to pack as much hardware as they can in for as cheap as they can. Well the other side of that is they lean on suppliers, hard, to offer very low prices. They don't give their suppliers a lot of profit. They don't force them to take a loss or anything (the suppliers wouldn't agree) but it is just this side of it.

    So selling 50 million units for consoles is way less profitable than selling 50 million units for laptops, desktops, servers, that kind of thing.

    Hence while it is better than having no sales at all, it is not as good as taking a bigger slice of the computer market.

    1. Re:Well that's good and bad by rwise2112 · · Score: 1

      and the console market is not small

      It seems to be smaller than you imagine. From a recent article at Ars about AMD: "Game consoles are a relative drop in the bucket compared even to the dwindling PC market (Microsoft has sold a little over 70 million Xbox 360s since 2005, and the PC market can generally meet or beat that number in a single quarter)"

      --

      "For every expert, there is an equal and opposite expert"
    2. Re:Well that's good and bad by Sycraft-fu · · Score: 1

      Depends on what you mean. Small compared to PCs? Yes. The PC market is huge though. Despite all the crying about it dying, it is still a massive market particularly when you consider we are talking desktops, laptops, and servers. Those datacenters are not filled with smartphones.

      So it is much bigger than consoles. However the console market isn't trivial. The 360 has sold something like 76 million units, the PS3 like 70 million. So assuming the next gen consoles do similar, that's around 140 million units for AMD. Nothing to sneeze at.

      However it is low margin. So like I said, not the greatest thing in the world. An increase in PC marketshare would be much better for AMD. The console contracts aren't worthless, but aren't going to be that lucrative.

    3. Re:Well that's good and bad by Shirley+Marquez · · Score: 1

      AMD is good at serving the low-margin part of the CPU business. They've been doing it with x86 in PCs for years now, so doing it in the console business isn't a big change for them.

    4. Re:Well that's good and bad by Anonymous Coward · · Score: 0

      AMD is good at serving the low-margin part of the CPU business. They've been doing it with x86 in PCs for years now, so doing it in the console business isn't a big change for them.

      That's a business model which has frequently left AMD with negative profit margins.

      And... it's not so much that they're good at it, it's just where they've been forced to operate. They enjoyed a short window of beating Intel technically, starting a couple years before the first AMD64 CPUs. During that time they quite happily attacked the high margin x86 market. But since about 2006 (when Intel introduced Core 2 Duo) they've been left in Intel's dust, and Intel's been using this technical advantage to squeeze AMD back out of all the high margin segments.

      That's not good for AMD, because it's an industry where it really pays to be the big gorilla, even if you're just trying to do cheap. (Perhaps especially when you're doing cheap.) There are major economies of scale in silicon manufacturing -- Intel's cost to make a chip of any given size is substantially less. And Intel gets to amortize R&D across more chips, permitting them to spend a lot more in absolute terms. They use that both to maintain a lead in process tech (usually 1 or more nodes ahead of the whole industry in recent years), and to do a lot more CPU microarchitecture work than AMD can afford to. So: if you specify a given performance level which both Intel and AMD can meet, and look at the chips in question, Intel's will invariably be much smaller (and usually lower power) than AMD's. Even on the low end. And size determines cost in chips.

      Which means that Intel can make a profit at price points which are break-even or loss-makers for AMD. If you're thinking that's a very uncomfortable place for AMD to be, you're right. But it's the only place they can be.

      In that light, doing what it took to get this generation's console design wins makes a lot of sense. It gets them some safe, mildly profitable revenue streams which Intel can't directly undercut. But it's not going to shower AMD in money either.

  92. Re:Did it really work? by LizardKing · · Score: 1

    Simula 67 was standardised in 1968. ANSI Common Lisp dates from 1984, and the OO implementation it includes (CLOS) was a relatively recent development at the time. CLOS is also a hack, although Lisp bores try and pretend it isn't by claiming the omissions make it "more powerful".

  93. Re:Did it really work? by Dr_Barnowl · · Score: 1

    I agree that 64-bit machines are somewhat niche, but I work in that niche.

    If you do anything serious with Java, on Windows, because of the memory layout and the insistence of the HotSpot VM on being allocated contiguous stretches of address space, you're limited to about 1.2GB of heap space. When you have a domain that has object counts in the 3 - 5 million region, that fills up rapidly. This is for a big graph of objects and the queries for them involve lots of graph traversal. The code in question can do set queries in about 0.5s that an RDBMS takes over 5 mins to do, so there's a real value to caching all the objects on the heap.

    Yes, I could use another language that doesn't have a stupid VM and have ample overhead in 4GB, although this data set will grow (even if it's not "social network" level of growth). But with working code in Java, it's much cheaper and easier to throw a 64-bit OS and another stick of RAM at it.

    A shame that my employer is still tragically stuck in the 90s and thinks 32 bits should be enough for anyone..

  94. Re:Did it really work? by Dr_Barnowl · · Score: 1

    He's de-duping files with SHA512, from the listing.

    That will get a major boost on 64-bit machines just because of the increased word width. I imagine the hashing step is what is consuming most of the CPU time, and making the code CPU bound instead of I/O bound.
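
    The listing being discussed is not reproduced here, so the following is only a hypothetical sketch of de-duplication by SHA-512 digest using Python's `hashlib`. SHA-512 is defined over 64-bit words, which is why a 64-bit build tends to help; `group_duplicates` and the in-memory `blobs` mapping are assumptions made for illustration:

```python
import hashlib
from collections import defaultdict


def group_duplicates(blobs):
    """Map SHA-512 hex digest -> names whose contents are byte-identical.

    `blobs` is a dict of name -> bytes; only digests shared by more than
    one name are returned.
    """
    by_digest = defaultdict(list)
    for name, data in blobs.items():
        by_digest[hashlib.sha512(data).hexdigest()].append(name)
    return {digest: names for digest, names in by_digest.items() if len(names) > 1}
```

    A real tool would stream file contents in chunks through `hashlib.sha512().update()` rather than hold whole files in memory.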

  95. Re:Did it really work? by dreamchaser · · Score: 1

    64 bitness was never about performance. It was always about larger address space. The fact that in some cases there is a performance increase is just a bonus.

  96. Re:Did it really work? by Anonymous Coward · · Score: 0

    You had snow??? All we had was a steady rain of comets and asteroids.

  97. Re:Did it really work? by goose-incarnated · · Score: 1

    It's kind of like watching the functional programming people slowly reinvent OOP...makes me scream inside. "Dude, we've figured out a new way to organize our methods / fields so that it's easier to keep them straight in our heads..." "Please God, let it not be OOP." "*talks for a bit*" "Damn it."

    I find it quite funny when the OO-crowd goes off like this :-)

    (In case no one else clues you in: you've got it backward - Functional came first, and gave the world OO. OO now constantly reinvents everything that lisp had, under the guise of "new and improved")

    --
    I'm a minority race. Save your vitriol for white people.
  98. Re:Did it really work? by Anonymous Coward · · Score: 0

    If you're heavily multitasking perhaps not so much, but individual applications still only have a 32bits address space (some of which is reserved). That means even if you have 64GB of RAM your video editing program that would love some larger buffers is still limited to 3-4GB. So most of the RAM above 4GB will rarely be used, not a very efficient use of it.

  99. Coursera 64-bit course by F.+Lynx+Pardinus · · Score: 2

    For anyone interested in learning more about x86-64, Coursera, in conjunction with UWashington, just started a "Hardware/Software Interface" course that focuses on 64-bit processors.

  100. Re:Did it really work? by gatkinso · · Score: 1

    You had it good. Those clay tablets were a bitch to load.

    --
    I am very small, utmostly microscopic.
  101. Re:Did it really work? by Anonymous Coward · · Score: 0

    This is simply incorrect about PAE. Why isn't it modded into oblivion.

  102. Re:Did it really work? by petermgreen · · Score: 1

    Seriously, anytime Asus is feeling poor, they can release a Crosshair motherboard that takes 64 GB or perhaps 128 GB of RAM.

    How much RAM we can put in our desktops is not really up to motherboard manufacturers like ASUS, it's up to the CPU and RAM manufacturers.

    Current Intel mainstream desktop CPUs support four DIMMs and current high end desktop CPUs support eight DIMMs. Afaict the largest DIMM of desktop memory* currently available is 8GB. So the current limit is 32GB for mainstream desktop and 64GB for high end desktop. I believe that the high end desktop stuff theoretically supports 128GB but no one makes the DIMMs needed to do it yet.
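
    The arithmetic behind those limits, as a trivial Python sketch; the slot counts and module sizes are the ones quoted in the comment, and `max_ram_gb` is just an illustrative helper:

```python
def max_ram_gb(dimm_slots, gb_per_dimm):
    """Maximum installable RAM given a slot count and the largest module size."""
    return dimm_slots * gb_per_dimm


mainstream_gb = max_ram_gb(4, 8)      # four slots, 8GB modules
high_end_gb = max_ram_gb(8, 8)        # eight slots, 8GB modules
theoretical_gb = max_ram_gb(8, 16)    # eight slots, if 16GB modules existed
```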

    Workstation/server platforms can take a lot more than that both through supporting more DIMMs and through supporting types of DIMM that come in higher capacities. I've seen systems that claim support for up to 2TB of ram.

    * DDR3, unregistered non-ecc.

    --
    note: i'm known as plugwash most places but i screwd up registering that here somehow in the past and now can't register
  103. Re:Did it really work? by petermgreen · · Score: 1

    Every 64-bit platform i'm aware of still has a 32-bit int. There may be some software that will waste memory on unix like systems when it uses "long" (which is typically 64-bit on 64-bit unix like systems) where a 32-bit value is fine but I doubt that is significant in the grand scheme of things.

    The code itself is usually slightly bigger on x86-64 than on x86 which is probably what the GP was referring to but in the grand scheme of things code is usually pretty small and the greater efficiencies for position independent code offset this by reducing the chance of multiple copies of the same code being loaded at once due to load time relocations.

    The real problem is pointer heavy code. If a program uses data structures that are mostly made up of pointers (or integers that could potentially contain a typecasted pointer and therefore need to be pointer-sized) then those data structures will nearly double in size on x64.
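
    A small Python sketch of that effect, assuming natural alignment (each field aligned to its own size, struct padded to its largest field), which is typical of x86 ABIs but is an assumption here. The binary tree node layout is hypothetical:

```python
def struct_size(field_sizes):
    """Size in bytes of a C-like struct whose fields are naturally aligned."""
    offset = 0
    for size in field_sizes:
        offset = (offset + size - 1) // size * size  # align field to its size
        offset += size
    alignment = max(field_sizes)
    return (offset + alignment - 1) // alignment * alignment  # tail padding


def tree_node_size(pointer_size):
    """Hypothetical node: left pointer, right pointer, 32-bit key."""
    return struct_size([pointer_size, pointer_size, 4])
```

    With 4-byte pointers this node comes to 12 bytes and with 8-byte pointers to 24, since tail padding rounds the 20 bytes of fields up to the pointer alignment.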

    --
    note: i'm known as plugwash most places but i screwd up registering that here somehow in the past and now can't register
  104. Re:Did it really work? by sribe · · Score: 1

    If you do anything serious with Java, on Windows, because of the memory layout and the insistence of the HotSpot VM on being allocated contiguous stretches of address space, you're limited to about 1.2GB of heap space. When you have a domain that has object counts in the 3 - 5 million region, that fills up rapidly. This is for a big graph of objects and the queries for them involve lots of graph traversal. The code in question can do set queries in about 0.5s that an RDBMS takes over 5 mins to do, so there's a real value to caching all the objects on the heap.

    Yes, I could use another language that doesn't have a stupid VM and have ample overhead in 4GB...

    Actually, I think you're mistaken about how much more heap space you could get outside of Java. On Windows you can tune the amount of your address space taken by the kernel down and thus, possibly, start with as much as 3GB for user space available to your app. On Linux, you can't even do that and are stuck with 2GB to start. Load your libraries and runtime for *any* language and tool set, and you're not going to have all that much more than 1.2GB. Sure, maybe 1.5GB and on Windows maybe even a bit more. But until you go to 64-bit, you're stuck starting out with 1-2GB.

  105. Re:Did it really work? by sribe · · Score: 1

    He's never written anything that's tested the limits of computing...

    And he's making invalid assumptions about what ordinary users might need. It's not that 32-bit is enough because most users don't need anything that requires 64-bit, it's that most users have never been offered things that require 64-bit, because they didn't have it. You know what needs more heap than you'll get in 32-bit to work well? According to IBM researchers: voice recognition. Yep, their research finds that being able to keep around 2GB+ enables huge, qualitative improvements. In my own work, I've hit the limit trying to do something simpler: provide really good auto-complete suggestions based on the individual user's corpus of work. Then of course there's all sorts of other searching algorithms, there's a huge difference in usability when you can provide a user a "browsable" interface that responds literally at the speed of thought vs requiring the user to compose the entire query and then wait a few seconds.

    Meanwhile, I need only load up my badly coded evolutionary program to see my machine scream at the ~12 GB hit to the RAM. I say badly coded because I have found a few tricks to help get some additional memory savings out of it...also on topic, the aggression level was kind of low, so I imagine future tests might break the 32 GB barrier easily. Currently thinking of giving it a SSD for virtual memory...

    So is 64-bit with 64GB in your computer not an option for you?

  106. Re:Did it really work? by MachineShedFred · · Score: 1

    But 64 is twice as big as 32, thus it must be twice as good!

    --
    Slashdot still doesn't support Unicode after it was added to the HTML standard in 1997.
  107. time_t by Anonymous Coward · · Score: 0

    Don't forget the obvious benefit of a 64-bit time_t data type when 2038 comes along.
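
    For reference, a short Python sketch of where a signed 32-bit seconds counter runs out. It assumes `time_t` counts seconds since 1970-01-01 UTC, as on POSIX systems; the actual width of `time_t` depends on the platform ABI rather than on the CPU alone, and the helper names are illustrative:

```python
from datetime import datetime, timezone

SECONDS_PER_YEAR = 365 * 24 * 3600  # rough figure, ignores leap years


def last_32bit_time_t():
    """Latest UTC instant a signed 32-bit seconds-since-1970 counter can hold."""
    return datetime.fromtimestamp(2**31 - 1, tz=timezone.utc)


def years_representable(bits):
    """Approximate years after 1970 covered by a signed counter of `bits` bits."""
    return (2 ** (bits - 1) - 1) // SECONDS_PER_YEAR
```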

  108. Re:Did it really work? by wagnerrp · · Score: 1

    Current Intel mainstream desktop CPUs support four DIMMs and current high end desktop CPUs support eight DIMMs

    Not exactly. Memory controllers support ranks, not DIMMs. One rank is one fully populated bus width. Standard DDR memory controllers are 64-bits wide, and memory modules are typically 8-bits, meaning you have eight modules to a rank. The memory controllers on desktop CPUs typically support two ranks per channel at full speed, and four ranks at reduced speed, so two ranks per double-sided DIMM, and two DIMMs per channel. On the other hand, if you get high density quad-rank DIMMs, then you can only add one per channel.

  109. Re:Did it really work? by UnknownSoldier · · Score: 1

    > Afaict the largest DIMM of desktop memory* currently available is 8GB.
    > * DDR3, unregistered non-ecc.

    Depends how you define "desktop memory"

    16 GB, and 32 GB sticks are "available" in extremely limited supplies

    $360 Kingston 16GB 240-Pin DDR3 SDRAM DDR3 1333 Desktop Memory Model KVR13LR9D4L/16
    http://www.newegg.com/Product/Product.aspx?Item=N82E16820239525

    $1400 HP 627814-B21 32GB DDR3 SDRAM Memory Module
    http://www.newegg.com/Product/Product.aspx?Item=N82E16820326202

    Not sure if this counts as desktop memory ... (technically NewEgg lists it as Server Memory)
    $1400 IBM 32GB DDR3 ECC Registered DDR3 1066 (PC3 8500)
    http://www.newegg.com/Product/Product.aspx?Item=N82E16820135081

    > I've seen systems that claim support for up to 2TB of ram.

    The HP ProLiant servers support up to 2 TB with 64 DIMM slots. Only $10K for the mobo, the RAM will only cost you $90K :-)
    http://h10010.www1.hp.com/wwpc/us/en/sm/WF04a/15351-15351-3328412-241644-3328422.html?dnr=1

    But yeah, looks like we have to wait another 10 - 20 years before we start seeing "normal" desktop motherboards support more than 128 GB. The 4 or 8 DIMM sockets will be "good enough" for a long time.

  110. Re:Did it really work? by wagnerrp · · Score: 1

    you're limited to about 1.2GB of heap space

    That always pissed me off when trying to load large datasets. I remember buying a pricey, brand new dual-core Opteron and 2GB of memory back in 2005, so I could work on some things at home, and having to reboot into Linux to actually make use of it. Even on XP64, 32-bit applications still fell under the same restriction.

  111. Re:Did it really work? by wagnerrp · · Score: 1

    Actually, the x87 FPU has always been 80-bit precision, even on old 32-bit processors. There was no significant improvement in floating point performance between K7 and K8, besides clock rate. 32-bit versus 64-bit only holds relevance for integer math and pointers.

  112. Re:Did it really work? by wagnerrp · · Score: 1

    Thank you. The people spouting nonsense about 32-bit programming, and how they can't understand why 64-bit computing would be faster (in the x86 world) drive me loony...

    To be fair, increased register space is completely independent of 32-bit versus 64-bit processors. It was a much needed architectural improvement that just happened to coincide with the transition. The only direct computational improvement of a 64-bit CPU is when doing 64-bit integer math.

  113. Re:Did it really work? by wagnerrp · · Score: 1

    Because various things suddenly take up twice as much memory, and thus require more memory bandwidth to operate at the same speed. In reality, the performance hit is slight, and more than accounted for by the increased register space available to applications properly compiled for x86-64.

  114. Re:Did it really work? by nukenerd · · Score: 1

    Beyond me why they are having this argument at all. Is it even possible to buy a 32-bit PC any more? I have had 64-bit PCs for the last 5 years, running 64-bit software. I don't "need" 64-bit over 32-bit and I am sure that some of what I do, like editing, could be on 8-bit. It's not for any performance gain, 64-bit is just the current standard as far as I am concerned.

    Yet there are millions of people out there running 32-bit OS's on 64 bit PCs - why?

  115. Re:with 32 bit on some system you get like 2.5-3.7 by wagnerrp · · Score: 1

    Actually, you get however much of that memory is split off to userspace. The default on Windows is a 2GB/2GB split. Linux defaults to a 3GB/1GB split, offering more available to the application. In both cases, that is a user-configurable option.
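
    The split arithmetic, as a minimal Python sketch. It assumes a flat 32-bit virtual address space and takes the kernel share in GiB as a parameter; `user_space_bytes` is an illustrative name:

```python
GIB = 2**30


def user_space_bytes(kernel_gib, address_bits=32):
    """Virtual address space left to a process after the kernel's share."""
    return 2**address_bits - kernel_gib * GIB


windows_default_gib = user_space_bytes(2) // GIB  # 2GB user / 2GB kernel
linux_default_gib = user_space_bytes(1) // GIB    # 3GB user / 1GB kernel
```

    This is an upper bound: libraries, stacks and fragmentation all eat into what a single contiguous heap can actually get.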

  116. Re:Did it really work? by wagnerrp · · Score: 1

    There is no "31-bit size limit". It's a 32-bit computer, able to access 32-bits, or 4GB, of memory.

  117. Re:Did it really work? by wagnerrp · · Score: 1

    That 3.3GB cap is likely because you have 512MB towards your video memory, and another couple hundred MB consumed elsewhere. All accessible memory gets lumped into the same 4GB cap. Why not make a separate swap partition independent of your encrypted system partition?

  118. Re:Did it really work? by TheRaven64 · · Score: 1

    I'm not sure what in my post you think you're referring to. My point about SSE is that all x86-64 chips support it, only some x86-32 chips do. It is part of the basic ISA, not an extension. This means that ABIs use it (for example, the SysV ABI uses SSE registers for parameter passing and value returning). And, because it's always there, the compiler can use it. It is vastly easier to generate code for a machine that has 16 registers, any of which can be used as operands for any instruction, than one where you have 8 registers and most operations can only use 2 and a lot have the side effect of moving all values up or down one register. You often end up with a lot of spills in x87 calculations because register allocation and instruction selection are really hard.

    --
    I am TheRaven on Soylent News
  119. Re:Did it really work? by petermgreen · · Score: 1

    Despite the fact that two of those modules are listed in the "desktop memory" section they are all listed as being "registered". Afaict at least in the intel world "registered" memory can only be used with "server" platforms.

    --
    note: i'm known as plugwash most places but i screwd up registering that here somehow in the past and now can't register
  120. Re:Nobody's said 64 bit Linux 4 years before Windo by squiggleslash · · Score: 1

    I'm confused. If the first chip supporting x86 64 didn't come out until ten years ago, how did Linux support it two years prior?

    --
    You are not alone. This is not normal. None of this is normal.
  121. Re:Slashdot refuses to respond to abuse... apk by Anonymous Coward · · Score: 0

    Dammit, that's two days in a row that a very "interesting" post has gotten my attention enough for me to click the Parent link and it's been this stupid spam post that wasn't useful or current when it first started being posted several years ago.

    Just stop replying to this crap! Sure, sure, their parents won't piss in their skull if you don't make the effort, but damn I'm tired of having to scroll past all of that.

  122. you want to celebrate... this? by Anonymous Coward · · Score: 0

    This might be the 10th anniversary of 64 bit x86 chips, but it's been 22 years since MIPS released their R4000 chip, and 21 years since DEC released the first Alpha chip. Both are superior architectures to that lowly AMD Athlon 64. But... neither ran the "standard" x86 instruction set.

    Yet another example of mediocrity beating out a superior technology through better marketing. And the customers -- that would be us -- paid the price by waiting nearly 25 years for the inferior tech. to finally catch the frell up. And this, you want us to celebrate? Um.... no.

  123. Re:Did it really work? by Anonymous Coward · · Score: 0

    PAE is more or less old school segmentation. You can't say 'it has a 3% slow down' because it has 0 slowdown if that particular page is already in memory, and if not ... it has the same 'slowdown' as any other paging operation plus a fixed number of cycles. [..]

    You really have no idea how processors work.

    Actually, any x86-64 processor in long mode uses PAE with an extra level in the hierarchy. See for yourself:
    http://en.wikipedia.org/wiki/Physical_Address_Extension

    I don't think you can make an argument that PAE is slower than x86-64 when it is in fact used by x86-64 in long mode.
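
    To make the "extra level" concrete, here is a minimal sketch that splits a virtual address into page-table indices for the three layouts. The field widths are the standard ones for 4KB pages; the names `LAYOUTS` and `split_address` are invented for illustration, and the sketch ignores large pages and the sign-extension rule for 64-bit addresses.

```python
# Sketch: split a virtual address into page-table indices plus page offset.
# Field widths are listed from the top-level table down to the page offset.
LAYOUTS = {
    "x86 (no PAE)": [10, 10, 12],           # directory, table, offset
    "x86 PAE": [2, 9, 9, 12],               # PDPT, directory, table, offset
    "x86-64 long mode": [9, 9, 9, 9, 12],   # PML4, PDPT, directory, table, offset
}


def split_address(vaddr, widths):
    """Return the fields of vaddr, most significant field first."""
    fields = []
    for width in reversed(widths):
        fields.append(vaddr & ((1 << width) - 1))  # take the low `width` bits
        vaddr >>= width
    return list(reversed(fields))


for name, widths in LAYOUTS.items():
    print(name, sum(widths), "bits:", split_address(0xC0101234, widths))
```

    The long mode layout is the PAE layout with the PDPT index widened from 2 to 9 bits and one more 9-bit level stacked on top, which is the sense in which long mode "uses PAE".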

  124. Re:Did it really work? by Anonymous Coward · · Score: 0

    I think if you understand how truly horrifying PAE is, you would have no doubt at all that 64 bit platforms were the way to go. There's a lot of memory management cruft in the Linux kernel that x86_64 eliminates.

    x86_64 also slipped in a few much needed enhancements to the ia32 architecture, including some extra general purpose registers.

    http://en.wikipedia.org/wiki/X86-64

    You do realize that in x86-64, PAE is actually the addressing scheme used, right? See for yourself:
    http://en.wikipedia.org/wiki/Physical_Address_Extension

    I don't think you can argue that x86-64 is somehow superior to PAE when it in fact always uses PAE addressing in long mode.

  125. Re:Did it really work? by Blaskowicz · · Score: 1

    Only 256MB are mapped to the vid card even if you have more vid memory (possibly 256MB are reserved no matter what even with a 128MB or 64MB one?). Only having multiple video cards will make you waste more memory.

  126. Re:Slashdot refuses to respond to abuse... apk by ImprovOmega · · Score: 1

    I suppose it's the slashdot equivalent of being Rick-Rolled.

  127. Re:Did it really work? by jellomizer · · Score: 1

    Microsoft dropped the ball on 64bit. Linux did too; however, because most Linux tools are open source they can just be recompiled, so it isn't as big of an issue.
    However, compared to the Solaris and Apple transitions from 32bit to 64bit, the PC transition is very sloppy.

    We have Windows 32bit and 64bit. You would expect that if you have a 64bit computer, getting the 64bit OS would be the best choice. Not really: there are too many 32bit apps out there (not most, but a lot of them) that just will not work, and if you need 32bit and 64bit apps to talk to each other you get more issues.
    I thought .NET would have helped resolve the issue 10 years ago. Why else would we have a development platform that compiles to run as slow as Java but only works on Windows? I figured it was meant to ease the transition to 64bit systems. No, .NET doesn't even do that too well.

    Sure the old 16bit apps for Windows 3.1 have finally died, I can get over that, but if you have Office 2007 and Office 2010 apps installed on the same system, you can get into trouble with some other tools that integrate with them.

    Working with Solaris during this transition a few years earlier, it was seamless: apps worked as designed and we weren't fighting 32bit vs 64bit. Apple too made it transparent. But Microsoft really dropped the ball. They could have allowed the move to 64bit to happen much earlier, but they were too busy fighting Linux and Apple and Google to make this migration easier.

    --
    If something is so important that you feel the need to post it on the internet... It probably isn't that important.
  128. Re:Slashdot refuses to respond to abuse... apk by Anonymous Coward · · Score: 0

    I don't know what's more disturbing. That you think someone will bother spending more than a couple of seconds even glancing over that drivel. Or that I did.

    I mean what the fucking hell is that crap? Oh well..

  129. Re:Did it really work? by vistapwns · · Score: 1

    "Let me guess, you ran it in 32 bit mode, then ran it again immediately after in 64 bit mode ... and then ignored the disk cache completely?"

    Nope, I did dozens of runs for each, ignoring the first result, which was obviously disk I/O bound (because it was much longer). As others have explained, and as I said, some code benefits greatly from x64.

    --
    "...I think the Microsoft hatred is a disease." - Linus Torvalds
  130. CPU had to be test booted BEFORE public release by raymorris · · Score: 1

    AMD wouldn't launch a chip they had never booted, so it couldn't really be publicly released BEFORE it had any OS support. So, you need an OS before you release.

    In fact, they really wanted to test the instruction set and design before spending big bucks fabbing the chips. Linux had already been 64 bit for five years on Alpha, so 64 bit Linux was well proven. So they created an emulator for the new instruction set and Linux was ported, running on the emulator. Therefore, Linux supported x86_64 before the processor physically existed.

    1. Re:CPU had to be test booted BEFORE public release by squiggleslash · · Score: 1

      Thanks! That makes sense.

      --
      You are not alone. This is not normal. None of this is normal.
  131. Re:Did it really work? by swillden · · Score: 1

    On Windows you can tune the amount of your address space taken by the kernel down

    Link? I'm interested to see how that's done.

    On Linux, you can't even do that and are stuck with 2GB to start.

    Huh? The default 32-bit Linux memory split is 3/1; 3 GiB for userspace, 1 GiB for kernel space. If you compile a custom kernel you can configure this differently.

    Did you perhaps swap "Windows" and "Linux" in your comment?

    --
    Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
  132. A new VAX by unixisc · · Score: 1

    I think a good project would be to port OpenVMS to a still surviving RISC such as MIPS, POWER or even SPARC. Both MIPS & POWER have open consortiums, so making a 'VAX' platform based on either, and then porting OVMS to it would elongate its lifetime for those who must have it. Even better would be to re-implement the Alpha 21364 architecture - the netlists and whatever - to today's lithographies w/o changing a thing about frequencies, or anything else. You'll get a lot cooler CPU which runs the same OVMS software w/o any changes, and can then just extend the lives of existing Alphaservers indefinitely. Certainly better than nervously following HP and Itanium to goodness knows where.

  133. Re:Slashdot refuses to respond to abuse... apk by drakaan · · Score: 1

    Love that idea. Send it here: mailto:feedback@slashdot.org.

    --
    "Murphy was an optimist" - O'Toole's commentary on Murphy's Law
  134. Re:Did it really work? by Guy+Harris · · Score: 1

    I suppose it depends on how you look at it. If you view the page directory as a bank select then it is a sort of segmentation.

    If you view the page directory as a bank select then any form of paging with more than two levels of page table is a sort of segmentation, including x86 paging without PAE (all the way back to the 80386), and the form of paging on just about every modern processor.

  135. Re:Did it really work? by Just+Some+Guy · · Score: 1

    ...typically using a signed int for pointers so that you don't have to have separate code paths for adding and subtracting from an address. The host itself can address 4GB but the OS may not let individual processes go over 2GB. This is the case in Windows x86, for example.

    --
    Dewey, what part of this looks like authorities should be involved?
  136. Re:Did it really work? by sribe · · Score: 1

    Disclaimer: I'm a Mac guy working in OS X (which is different yet), so I'm only generally familiar with Windows & Linux VM details. Anyway:

    On Windows you can tune the amount of your address space taken by the kernel down

    Link? I'm interested to see how that's done.

    It's a /3GB switch in boot.ini. Took me a while to find any decent info about it (because in the Windows world there seem to be a bunch of hosers with blogs who don't know the difference between kernel address space and the page file, and Google doesn't know that their posts are tripe). The default is an even split of 2GB each to kernel and user space; this switch makes it 1GB to kernel and 3GB to user space.

    On Linux, you can't even do that and are stuck with 2GB to start.

    Huh? The default 32-bit Linux memory split is 3/1; 3 GiB for userspace, 1 GiB for kernel space. If you compile a custom kernel you can configure this differently.

    Did you perhaps swap "Windows" and "Linux" in your comment?

    No, I was simply mistaken about the default split on Linux. (And my comment about "can't do that" referred to user-accessible configuration, not building your own kernel...)
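
    For reference, a small sketch of the splits discussed in this subthread. The labels just restate what was said above (2/2 by default on Windows, 3/1 with /3GB, 3/1 by default on Linux); the `split` helper is made up for illustration.

```python
# Sketch: user/kernel split of a 32-bit virtual address space.
GIB = 1 << 30
ADDRESS_SPACE = 1 << 32  # 4GiB of virtual addresses per process


def split(user_gib):
    """Return (user_bytes, kernel_bytes, first_kernel_address)."""
    user = user_gib * GIB
    return user, ADDRESS_SPACE - user, hex(user)


# Labels reflect the defaults described in this thread.
for label, user_gib in [("Windows default", 2),
                        ("Windows with /3GB", 3),
                        ("Linux default", 3)]:
    user, kernel, boundary = split(user_gib)
    print(f"{label}: user {user // GIB}GiB, kernel {kernel // GIB}GiB, "
          f"kernel starts at {boundary}")
```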

  137. Re:Did it really work? by Burning1 · · Score: 1

    Yes, this is a bit of an oversight on my part. I really should have been discussing the way PAE is implemented on ia32 processors. I had a little bit of difficulty finding information about it online, I'll have to consult my architecture books at home, and will expand on the original post. Here's a bit of the info I could find about PAE weirdness on IA32.

    https://www.kernel.org/doc/gorman/html/understand/understand005.html#sec: High Memory

    The key quote:

    PAE allows a processor to address up to 64GiB in theory but, in practice, processes in Linux still cannot access that much RAM as the virtual address space is still only 4GiB. This has led to some disappointment from users who have tried to malloc() all their RAM with one process.

    Secondly, PAE does not allow the kernel itself to have this much RAM available. The struct page used to describe each page frame still requires 44 bytes and this uses kernel virtual address space in ZONE_NORMAL. That means that to describe 1GiB of memory, approximately 11MiB of kernel memory is required. Thus, with 16GiB, 176MiB of memory is consumed, putting significant pressure on ZONE_NORMAL. This does not sound too bad until other structures are taken into account which use ZONE_NORMAL. Even very small structures such as Page Table Entries (PTEs) require about 16MiB in the worst case. This makes 16GiB about the practical limit for available physical memory Linux on an x86. If more memory needs to be accessed, the advice given is simple and straightforward, buy a 64 bit machine.
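
    The numbers in that quote are easy to check. A minimal sketch, assuming 4KiB page frames and the 44-byte struct page from the quote (the function name is mine, and the 64GiB row is my extrapolation to the PAE maximum, not something the doc states):

```python
# Sketch: kernel memory needed for struct page bookkeeping.
PAGE_SIZE = 4096          # bytes per page frame on x86 (assumed 4KiB pages)
STRUCT_PAGE_BYTES = 44    # per-frame descriptor size quoted above
MIB = 1 << 20
GIB = 1 << 30


def struct_page_overhead_mib(physical_gib):
    """Return MiB of struct page entries needed to describe physical_gib of RAM."""
    frames = physical_gib * GIB // PAGE_SIZE
    return frames * STRUCT_PAGE_BYTES / MIB


for gib in (1, 16, 64):
    print(f"{gib}GiB RAM -> {struct_page_overhead_mib(gib):.0f}MiB of struct page")
```

    At the full 64GiB that PAE can address, the descriptors alone would need about 704MiB, all of which has to fit in ZONE_NORMAL.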

  138. Re:Did it really work? by wagnerrp · · Score: 1

    That is NOT the case for Windows x86. That is the case for the default configuration of Windows x86. It could be modified to the users' preference using a boot flag.

  139. Re:with 32 bit on some system you get like 2.5-3.7 by cbhacking · · Score: 1

    How does 32-bit Linux handle video cards with a gig or more of VRAM? Honest curiosity; the last time I ran 32-bit Linux I think my video card had only 256MB. On Windows, I think it would just fail to load the video driver, although I haven't checked (been a long time since I ran 32-bit Windows on raw hardware too).

    --
    There's no place I could be, since I've found Serenity...
  140. Re:Did it really work? by cbhacking · · Score: 1

    Aha, thanks for the clarification. I know a few assembly languages, including x86, but had never really read up on the x64 extensions. Doubling the register count *and* doubling the width is definitely a huge improvement, as is being able to do relative jumps with large offsets.

    --
    There's no place I could be, since I've found Serenity...
  141. Jeremiah Cornelius: Grow up by Anonymous Coward · · Score: 0

    You're embarrassing yourself Jeremiah Cornelius http://slashdot.org/comments.pl?sid=3581857&cid=43276741 since you posted that using your registered username by mistake (instead of your usual anonymous coward submissions by the 100's the past 2-3 months now on slashdot) giving away it's you spamming this forum almost constantly, just as you have in the post I just replied to.

    1. Re:Jeremiah Cornelius: Grow up by Anonymous Coward · · Score: 0

      Hello Paul.

      p.s. What do you make of this?

    2. Re:Jeremiah Cornelius: Grow up by Anonymous Coward · · Score: 0

      Forty Two Tenfold has Jeremiah Cornelius = "friend" in his account on /. = obvious he's a sockpuppet account you, Jeremiah Cornelius, use. Yes we know it's you Jeremiah Cornelius that I am replying to now, you off topic troll. Jeremiah Cornelius, you blew it with your 100's of anonymous coward trollings by accidentally submitting one of them by your registered username here Jeremiah Cornelius http://slashdot.org/comments.pl?sid=3581857&cid=43276741 and you, Jeremiah Cornelius failed badly giving yourself away troll. Doing sock puppets along with that? No big trick to the likes of you, clearly caught in the act in that link above..

    3. Re:Jeremiah Cornelius: Grow up by Anonymous Coward · · Score: 0

      Hello Paul. Good to see you're still monitoring week-old threads.

    4. Re:Jeremiah Cornelius: Grow up by Anonymous Coward · · Score: 0

      Speak for yourself Mr. AC troll hypocrite pot calling a kettle black Jeremiah Cornelius (the sockpuppet master himself).

  142. Re:Did it really work? by Anonymous Coward · · Score: 0

    You need to learn a bit more logic I think. There are logically many reasons, but most are not obvious.

  143. Re:Did it really work? by swillden · · Score: 1

    No, I was simply mistaken about the default split on Linux

    Ah, okay.

    And my comment about "can't do that" referred to user-accessible configuration, not building your own kernel...

    On Linux, building your own kernel is user-accessible configuration :-)

    Seriously, on Debian/Ubuntu, it takes a maximum of three commands (including installing all of the required tools), and it may be doable without touching the command line at all. It definitely doesn't require editing any files. It's more time-consuming (due to the time required to download tools and build) but may be easier than finding and modifying boot.ini.

    --
    Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
  144. Re:Slashdot refuses to respond to abuse... apk by DocHoncho · · Score: 1

    Thing is, I'm positive there were multiple copycats, and with the content Kristopeit usually wrote it was trivial to duplicate. In a sense it made the whole Kristopeit troll even more epic because not only would he (or it?) be writing troll posts but copycats too and it just spiraled into madness.

    With APK it's almost impossible for a normal, rational individual to accurately duplicate the unique style of a genuine APK rant. That long copy pasta that's been making the rounds is a pretty good approximation though.

    --
    Celebrity worship is a poor substitute for Deity worship and costs more to boot.
  145. Re:Did it really work? by Anonymous Coward · · Score: 0

    Dude. It's not 1993 any more. Being too impressed with a bunch of screensaver effects is definitely a 1993 thing. And nobody sane wants to manually bankswitch memory in and out. It's not going to speed anything up, it's just extra pain, for no good reason given that today's processors are 64-bit.

    Speaking of which, the "scene" had an awful lot of people working on 68K platforms (Atari ST, Amiga). The 68000 had no such thing as bank switching, it had a simple linear 32-bit address space. This was not a problem for democoders. In fact, they loved it. Freed them up to focus on the things which mattered.

    For that same reason, it was pretty common for advanced DOS games (and probably demos too) to use a DOS extender (either one of the standard ones, or homegrown) so they could enjoy 32-bit mode with a 32-bit flat address space.

    (Also... what video decompressor uses even 1 gigabyte of RAM?! It's amusing that you think this is a great example of a problem for which Bank Switching Is The Answer.)

  146. Re:Did it really work? by Burning1 · · Score: 1

    I need to offer you credit; you are right. The issue isn't really PAE, it's how the kernel manages memory on 32 bit x86 architectures with more than 1GB of memory installed. PAE simply exacerbates the problem. Here's an explanation of the complaint:

    On ia_32 systems, the kernel splits memory into 3 zones; DMA, NORMAL, and HIGHMEM.

    ZONE_DMA is the first 16MB of memory, and is generally avoided unless needed (due to lack of available higher memory, or for DMA mappings.) The kernel tries to reserve this address range for devices that use DMA mapping.

    ZONE_NORMAL is an address space that is directly accessible to the kernel, and extends from 16MB to 896MB. Kernel data structures are stored in this space, including the kernel page tables. Memory mappings start to consume a lot of memory in ZONE_NORMAL, and thus PAE on ia_32 with a lot of installed memory can cause out of memory issues, even when there is a lot of available physical memory. User data can be allocated into ZONE_NORMAL, but is preferred to be placed in ZONE_HIGHMEM to free ZONE_NORMAL for kernel data structures.

    ZONE_HIGHMEM is memory above the 896MB barrier. This address range is not directly accessible to the kernel. In order for the kernel to access anything in this zone, a temporary map must be made into ZONE_NORMAL. These mappings consume pages of ZONE_NORMAL, and suffer a performance hit. User space processes can access these pages directly (handled by the virtual memory manager system, of course.)

    Generally, memory will be allocated to ZONE_HIGHMEM, ZONE_NORMAL, or finally ZONE_DMA in that order of preference.

    The x86_64 architecture eliminates the need for ZONE_HIGHMEM. ZONE_NORMAL extends all the way from 16MB to the end of physical memory. This approach simplifies memory management, improves performance, and is generally more flexible.
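
    A toy sketch of that layout, using only the 16MB and 896MB boundaries described above. The function name and the `is_64bit` flag are invented for illustration, and this follows the simplified picture in this post; real kernels have further zones (ZONE_DMA32 on x86_64, for one) that are ignored here.

```python
# Sketch: which zone a physical address falls into, per the description above.
MB = 1 << 20
DMA_END = 16 * MB       # top of ZONE_DMA
NORMAL_END = 896 * MB   # top of ZONE_NORMAL on 32-bit x86


def zone_for(phys_addr, is_64bit=False):
    """Classify a physical address into the zone model described in the post."""
    if phys_addr < DMA_END:
        return "ZONE_DMA"
    if is_64bit or phys_addr < NORMAL_END:
        return "ZONE_NORMAL"   # on x86_64 this runs to the end of memory
    return "ZONE_HIGHMEM"      # 32-bit only: needs a temporary kernel mapping


for addr_mb in (8, 512, 2048):
    addr = addr_mb * MB
    print(f"{addr_mb}MB: 32-bit {zone_for(addr)}, 64-bit {zone_for(addr, is_64bit=True)}")
```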

    You're correct that there was a major issue with my original post... My memory of the kernel architecture had garbled HIGHMEM with PAE, and I was thinking that PAE required mapping pages above 4GB into lower memory. This would of course cause a huge performance penalty for any process consuming memory above 4GB. I deserve downmods for the technical inaccuracy.

    Here's a very brief summary of the problems with HIGHMEM:
    http://linux-mm.org/HighMemory

    Here's a bunch of links used to refresh my memory:

    http://www.makelinux.net/ldd3/chp-15-sect-1
    https://www.kernel.org/doc/gorman/html/understand/understand005.html
    http://unix.stackexchange.com/questions/5143/zone-normal-and-its-association-with-kernel-user-pages

  147. Re:Did it really work? by Anonymous Coward · · Score: 0

    Different AC here. You're reading far too much into that kernel.org doc. PAE is not weird at all. An ia32 CPU with PAE supports page tables which map 32-bit virtual addresses to 36-bit physical addresses. That is, the physical address in a page table entry (PTE) is 36 bits wide, instead of the original 32. Simple.
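
    In numbers, as a trivial sketch (the helper name is made up):

```python
# Sketch: addressable memory as a function of address width.
GIB = 1 << 30


def addressable_gib(bits):
    """Return how many GiB an address of the given width can reach."""
    return (1 << bits) // GIB


print("32-bit virtual addresses:       ", addressable_gib(32), "GiB per address space")
print("32-bit physical addresses:      ", addressable_gib(32), "GiB of RAM")
print("36-bit physical addresses (PAE):", addressable_gib(36), "GiB of RAM")
```

    The virtual side stays at 32 bits, which is why a single process still cannot see more than 4GiB even when the machine holds far more.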

    What the kernel.org doc is talking about is practical limits in ia32 Linux, not problems with PAE. In order to permit fast user/kernelspace transitions, ia32 Linux defaults to a 3/1 split. This means the currently running user process and the kernel share page tables. The user process lives in the lower 3GB of virtual address space, the kernel (and memory mapped IO devices etc.) live in the upper 1GB, hence 3/1. That's key to understanding that doc you quoted. It doesn't say so outright, because to a kernel hacker it's like breathing, but kernel address space on ia32 is a precious resource. Apparently at about 16GB physical RAM the kernel data structures required to track more RAM simply become too large.

    Other operating systems have alternate design choices. For example, OS X uses a 4/4 split on ia32. User processes get 4G address spaces with no kernel stuff mapped, and the kernel gets its own 4G address space with no user stuff mapped. There's a performance penalty (the system has to flip between user and kernel page tables every time the user process makes a system call), but it's also better at supporting large memory configurations on ia32+PAE because the kernel's address space layout is much less cramped.

    It's certainly possible to replumb ia32 Linux to make more than 16GB work. But as the kernel.org doc implies, the kernel community's response to this idea boils down to "here's a nickel, kid, buy yourself a real computer". Which is not actually unreasonable today, given the ubiquity of 64-bit x86, the 64-bit Linux kernel, and 64-bit Linux application software.

  148. Re:Did it really work? by Burning1 · · Score: 1

    Yep, you're right. I corrected myself in another post.

  149. Re:Slashdot refuses to respond to abuse... apk by DrXym · · Score: 1

    Thanks for the suggestion, I did just that.

  150. Re:Slashdot refuses to respond to abuse... apk by drakaan · · Score: 1

    If they implement that, I'm going to have to figure out a way to give you a virtual hug.

    --
    "Murphy was an optimist" - O'Toole's commentary on Murphy's Law
  151. Hello, Paul. by Anonymous Coward · · Score: 0

    You fail it, Paul. Your skill is not enough.