Slashdot Mirror


BigTux Shows Linux Scales To 64-Way

An anonymous reader writes "HP has been demonstrating a Superdome server running the Stream and HPL benchmarks, which shows that the standard 2.6 Linux kernel scales to 64 processors. Compiling the kernel didn't scale quite so well, but that was because it involves intermittent serial processing by a single processor. The article also notes that HP's customers are increasingly using Linux for enterprise applications, and getting more interested in using it on the desktop..."

247 comments

  1. 64 Itanics! by Anonymous Coward · · Score: 0

    That's what, 640.000?

    1. Re:64 Itanics! by cicadia · · Score: 4, Funny
      That's what, 640.000?

      That should be enough for anybody :)

      --
      Living better through chemicals
    2. Re:64 Itanics! by crummynz · · Score: 0

      Hahaha... no mod points to give but that's funny :) I'm sure other people will get the joke...

      --
      ~ Crummy
    3. Re:64 Itanics! by Anonymous Coward · · Score: 0

      haha, Mr. 818547 UID. We all get the joke, it's just not funny anymore. Most of us have underwear older than you. Please come back when you've graduated grade 12.

    4. Re:64 Itanics! by Anonymous Coward · · Score: 1, Funny

      Time to throw out your old undies, mister Gates.

      Oh, and don't be so bitter about that 640 comment. Everyone makes mistakes. I know you deny saying it but you are a liar.

    5. Re:64 Itanics! by Anonymous Coward · · Score: 0

      Just keep posting lame comments like this one, I'm sure you'll earn mod points in no time.

  2. So this time.. by puntloos · · Score: 4, Funny
    The age-old Slashdot question should read:

    Does it run Linux well?

    1. Re:So this time.. by ikewillis · · Score: 5, Interesting
      This is the real question which is oft ignored. There is far too great an emphasis on being able to manage n CPUs rather than on how effectively kernel services operate on n CPUs.

      The answers have to do with fine-grained locking of kernel services: contention between processors is spread across many separate locks, in the hope that fewer of them will be held at any given time. The alternative is designing interfaces that don't require locking of kernel structures at all.
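
      A minimal user-space sketch of the fine-grained idea, in Python rather than kernel C. `StripedCounter`, `hammer`, and the "inode-N" keys are made up for illustration and are not taken from the Linux source. Each key hashes to one of several locks, so two threads only contend when they happen to land on the same stripe:

```python
import threading


class StripedCounter:
    """Per-key counters guarded by several locks instead of one global lock."""

    def __init__(self, num_stripes=16):
        # One lock and one table per stripe (hypothetical layout).
        self._locks = [threading.Lock() for _ in range(num_stripes)]
        self._tables = [{} for _ in range(num_stripes)]

    def _stripe(self, key):
        # Map a key to the stripe that owns it.
        return hash(key) % len(self._locks)

    def increment(self, key, amount=1):
        i = self._stripe(key)
        with self._locks[i]:  # only this stripe is blocked
            table = self._tables[i]
            table[key] = table.get(key, 0) + amount

    def get(self, key):
        i = self._stripe(key)
        with self._locks[i]:
            return self._tables[i].get(key, 0)


def hammer(counter, keys, rounds):
    """Worker: bump every key `rounds` times."""
    for _ in range(rounds):
        for key in keys:
            counter.increment(key)


counter = StripedCounter(num_stripes=8)
keys = [f"inode-{n}" for n in range(32)]
threads = [
    threading.Thread(target=hammer, args=(counter, keys, 1000))
    for _ in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter.get("inode-0"))  # 4 threads x 1000 rounds
```

      With `num_stripes=1` this degenerates into a single big lock, which is roughly the situation fine-grained locking is meant to avoid.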

      At any rate, Amazon successfully powers their backend database with Linux/IA64 running on HP servers. YMMV, but if it's good for what most would consider the preeminent online merchant, it's probably good enough for you too.

    2. Re:So this time.. by Trejkaz · · Score: 3, Funny

      I'd rather know if it can run Longhorn...

      --
      Karma: It's all a bunch of tree-huggin' hippy crap!
    3. Re:So this time.. by pchan- · · Score: 1

      It's about time Linux was ported to the Commodore-64. They said it couldn't handle all 64 Kilobytes of memory, but this shows otherwise.

    4. Re:So this time.. by Anonymous Coward · · Score: 0

      Dude

      Don't put that in your .sig please. My girl almost walked in on me. Although I may just remember that for when I need it :) :D

    5. Re:So this time.. by tuxter · · Score: 1

      By the time Longhorn is out, it'll be able to run 128 ;)

    6. Re:So this time.. by aichpvee · · Score: 0

      You mean your mom walked in on you.

      --
      The Farewell Tour II
    7. Re:So this time.. by Anonymous Coward · · Score: 0

      How can I see his .sig? Do I need to be logged in?

    8. Re:So this time.. by Decaff · · Score: 2, Interesting

      This is the real question which is oft ignored. There is far too great an emphasis on being able to manage n CPUs rather than on how effectively kernel services operate on n CPUs.

      Absolutely. This is why we should be wary of claims that have been made (and posted on Slashdot recently) that Linux 'scales to 512 or 1024 processors' (as in some SGI machines). This size of machine is only effective for very specialised software. A report that the kernel scales well to 64 processors is far more believable, and is a sign of the increasing quality of Linux.

    9. Re:So this time.. by jedidiah · · Score: 1

      That is a truism.

      Specialized software is required to take advantage
      of any NUMA architecture. The code has to be tuned
      to take advantage of it. This is true even if you're
      only talking about 12 cpus.

      A claim that Linux scales well to 64 cpus is no
      more or less believable than a claim that it scales
      to 1024. Both represent a scale of problem that few
      ever deal with.

      --
      A Pirate and a Puritan look the same on a balance sheet.
    10. Re:So this time.. by Decaff · · Score: 1

      A claim that Linux scales well to 64 cpus is no
      more or less believable than a claim that it scales
      to 1024.


      Surely the lesser number is more believable!

      Both represent a scale of problem that few ever deal with.

      Large businesses like large numbers of processors. Some software scales naturally to this kind of setup: web and application servers, for example.

    11. Re:So this time.. by jedidiah · · Score: 1

      Web and java appservers are the IT equivalent of a renderfarm. They have no place in this discussion.

      --
      A Pirate and a Puritan look the same on a balance sheet.
    12. Re:So this time.. by Decaff · · Score: 1

      Web and java appservers are the IT equivalent of a renderfarm. They have no place in this discussion.

      I disagree. First, the post I replied to said, about high numbers of processors that..."Both represent a scale of problem that few ever deal with." Well, web and appservers are very common!

      Secondly, these types of application frequently make a lot of use of kernel services. They can require networking and/or disk activity to make use of databases or other forms of information storage, or to connect to other services in the company.

    13. Re:So this time.. by jedidiah · · Score: 1

      Webservers and java servers don't share any common resources. Much like a renderfarm, you can split the problem onto as many single or dual cpu systems as you like with no scaling penalty.

      The database would be their only shared component and the only part of the system that has any place in this discussion. It's doing all the heavy lifting and concurrency management.

      --
      A Pirate and a Puritan look the same on a balance sheet.
    14. Re:So this time.. by Decaff · · Score: 1

      Webservers and java servers don't share any common resources. Much like a renderfarm, you can split the problem onto as many single or dual cpu systems as you like with no scaling penalty.

      This isn't true. Well-written server-side Java applications make a lot of use of shared resources (without this, it would not be practical to run hundreds of threads in a few GB of memory). Most of the information needed to present web pages or provide web services is placed in either application or session context, and shared between threads. This shared information is not read-only - there are processes such as user and session tracking and cookie management for example.

    15. Re:So this time.. by jedidiah · · Score: 1

      If the shared context is read only, then ZERO concurrency management issues arise and my original comments stand.

      Serious SMP applications have to worry about every thread altering data that any other thread might need to access. This is what separates Oracle from a glorified renderfarm.

      At most, a Java application server will mark part of its shared cache as dirty.

      --
      A Pirate and a Puritan look the same on a balance sheet.
    16. Re:So this time.. by Anonymous Coward · · Score: 0

      My mom is 1000 km away so no.

    17. Re:So this time.. by Decaff · · Score: 1

      If the shared context is read only, then ZERO concurrency management issues arise and my original comments stand.

      As I said, the shared contexts of most Java server applications are not read only (see below).

      Serious SMP applications have to worry about every thread altering data that any other thread might need to access. This is what separates Oracle from a glorified renderfarm.

      Any thread on a web service can alter data. Even a simple commercial shopping application will need to maintain a consistent centralised record of the results of requests. Click 'add to basket', then log on on another client machine and a different request thread can pick up what you have stored. It is also common practice for such applications to log each and every page request. This means that each thread will communicate with a single logging process at some point.

  3. Pardon my ignorance, but... by wizard_of_wor · · Score: 3, Interesting

    What parallel-computing activity doesn't involve intermittent activity by a single processor? You have to spawn the parallel job somehow, and typically that starts as a single process. Is the implication here that compiling is pipelined, but linking is a single-CPU job?

    --
    If you mod me down, I shall become more powerful than you can possibly imagine.
    1. Re:Pardon my ignorance, but... by drmarcj · · Score: 2, Interesting

      I could imagine an SMP job where you immediately spawn N new processes, each of which computes a certain subset of a given dataset. Assuming you never collected the results at the end (say, you just write out the results to files on disk for later analysis), you would technically never need inter-process communication, thus no serial processing by a single "master" process. But yes, you're right. You almost never do this in parallel processing, and in that sense the post is misleading in assuming there is anything but a theoretical possibility of no overhead in an SMP.

    2. Re:Pardon my ignorance, but... by Anonymous Coward · · Score: 0

      Well most of the linking is parallel, but the final big link is serial.

      And most if not all parallel computing has some serial sections. The point is, in this case it was a function of the workload and not the Linux kernel.

    3. Re:Pardon my ignorance, but... by bluGill · · Score: 2, Interesting

      As a simple question you are correct that every parallel computing job has some single processing parts. Those who study parallel systems spend most of their time looking for ways to make sure that all processors are in use. Often an algorithm that is less than optimal for single processor systems can use more processors, so a choice needs to be made.

      The other major issue is communication time. An algorithm that depends on all the CPUs talking all the time may appear fast on paper, but it will be slower than the single processor version!

      In short, you came really close to the point, while missing it.

    4. Re:Pardon my ignorance, but... by wizard_of_wor · · Score: 1

      Well... the "point" I was making was that there aren't a whole lot of parallel jobs that don't have some intermittent serial aspect to them, but the article summary seemed to treat this as if it's an exceptional case. drmarcj's reply to my post explains in better detail.

      --
      If you mod me down, I shall become more powerful than you can possibly imagine.
    5. Re:Pardon my ignorance, but... by rgmoore · · Score: 3, Insightful

      That type of processing is frequently called "embarrassingly parallel", and it's far more common than you seem to think. I think that 3D rendering and web serving that doesn't require writing to a database can all be handled this way. There are also many categories of scientific data processing (think SETI@home) that work this way. The real reason that this kind of SMP isn't interesting is because it's so easy that you don't need fancy hardware like 64-way servers to take advantage of it. It can be farmed out to clusters of cheap PCs, or even distributed over the network to volunteers.
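
      A rough Python sketch of that pattern, with made-up names (`chunk`, `work`, `run`) and a toy sum-of-squares job standing in for a render or a data-crunching task. The input is cut into independent slices and no slice needs anything from another. Threads are used here only to keep the sketch portable; for CPU-bound work one would presumably pass `ProcessPoolExecutor` as `executor_cls`, or ship the slices to separate machines:

```python
from concurrent.futures import ThreadPoolExecutor


def chunk(data, n):
    """Split data into n contiguous slices of near-equal size."""
    size, extra = divmod(len(data), n)
    slices, start = [], 0
    for i in range(n):
        end = start + size + (1 if i < extra else 0)
        slices.append(data[start:end])
        start = end
    return slices


def work(piece):
    """Hypothetical per-slice job: depends on nothing outside its slice."""
    return sum(x * x for x in piece)


def run(data, workers=4, executor_cls=ThreadPoolExecutor):
    """Hand each slice to its own worker; workers never talk to each other."""
    with executor_cls(max_workers=workers) as pool:
        return list(pool.map(work, chunk(data, workers)))


partials = run(list(range(1000)))
# The only "serial" step left is adding up the partial results.
print(sum(partials) == sum(x * x for x in range(1000)))  # True
```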

      --

      There's no point in questioning authority if you aren't going to listen to the answers.

    6. Re:Pardon my ignorance, but... by norton_I · · Score: 1

      In many applications the length of the serial part (the part of the program during which only 1 CPU may be executing) grows slower than the parallel part with the problem size. Thus, for large problems, the serial part becomes a small (but not usually insignificant) part of the computation.

      Note that the relative size of the serial part of a program is independent of the ease of parallelization. The size of the serial part determines how many processors can be committed to the task before reaching diminishing returns. Compiling tends to have relatively long serial sections compared to many applications. The ease of parallelization has to do with the degree of synchronization required by the parallel part, and determines how closely connected the CPUs must be for the gain from adding CPUs to exceed the increased synchronization overhead.

      Compiling code is embarrassingly parallel (it can be distributed over almost any network) but has a large serial section (linking). Therefore, it easily scales even on a slow network, but tapers off once linking starts to take over. Other applications fall in all four quadrants of this plane.

    8. Re:Pardon my ignorance, but... by harrkev · · Score: 1

      Computer science people have a name for everything. Some tasks are easily parallelizable, and others are not.

      There is something called "Amdahl's Law" which is all about this topic. Basically, you can predict the percent speedup based on how parallelizable the task itself is. This law, of course, assumes that the OS and multitasking are completely overhead-free, so it is sort of an upper bound on how fast something can go. But it is still interesting.

      THIS is simply the first result from a Google search for "Amdahl Law".
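
      For anyone who wants to play with the numbers, here is a small Python sketch of the usual textbook form of the law, speedup = 1 / (serial + parallel / N). The function name and the sample fractions are only illustrative:

```python
def amdahl_speedup(parallel_fraction, n_cpus):
    """Upper bound on speedup when only part of the work can run in parallel."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_cpus)


# Tabulate a few illustrative cases: 50%, 90% and 99% parallel work.
for p in (0.50, 0.90, 0.99):
    row = [round(amdahl_speedup(p, n), 1) for n in (2, 8, 64)]
    print(f"parallel={p:.2f}  speedup on 2/8/64 CPUs: {row}")
```

      As N grows, the result approaches 1 / serial_fraction, which is why even a small serial part ends up dominating on a big box.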

      --
      "-1 Troll" is the apparently the same as "-1 I disagree with you."
  4. Geez... by byronne · · Score: 4, Funny

    I haven't had a 64-way since college.

    And you?

    --
    "Look, Smithers! I'm Davy Crockett!"
    1. Re:Geez... by drmarcj · · Score: 1

      Gee, I didn't know you could fit that many CS students into a VW Bug.

    2. Re:Geez... by atriusofbricia · · Score: 1

      Yes, I see two of them everyday at work.

      --
      I was raised on the command line, bitch

      "Nemo me impune lacesset"

    3. Re:Geez... by crunk · · Score: 1

      I had 64 1-ways. Does that count?

      --
      It's the battle of the minds, and everyone's unarmed.
    4. Re:Geez... by ReeprFlame · · Score: 1

      Naa. Gotta think like a cluster. All at the same time means less time and less work for you. Even though 64 1's may be more fun! lol

    5. Re:Geez... by nacturation · · Score: 4, Funny

      Most slashdotters are still working on upgrading to a 2-way.

      --
      Want to improve your Karma? Instead of "Post Anonymously", try the "Post Humously" option.
    6. Re:Geez... by shoemakc · · Score: 1


      I ::::must:::: be a geek, because my ::::first:::: thought was "SMP?"

      -Chris

      Most slashdotters are still working on upgrading to a 2-way.
      --
      --an unbreakable toy is useful for breaking other toys--
    7. Re:Geez... by Anonymous Coward · · Score: 0

      2-way? Is that both hands at once?

    8. Re:Geez... by Anonymous Coward · · Score: 0

      ...and many are still trying to figure out the one way too! :)

    9. Re:Geez... by simcop2387 · · Score: 1

      yea but the way i understand it, in order to do that you need these fantastic new herbal supplements from canada, they're really cheap and will help you get up to a 3way even! only $699.95! they're called SCOre MORE

    10. Re:Geez... by Anonymous Coward · · Score: 0

      True but they can sure overclock their single unit. I hear most slashdotters can expect to spawn a process to completion in just a few minutes...

    11. Re:Geez... by Anonymous Coward · · Score: 0

      Left and right hands.

    12. Re:Geez... by Anonymous Coward · · Score: 0

      Asymmetrical, preferably. A heterogeneous environment, so to speak.

    13. Re:Geez... by Anonymous Coward · · Score: 0

      > Most slashdotters are still working on upgrading to a 2-way.

      Amen, brother!

      God bless ./ and the Holy Basement!

    14. Re:Geez... by Anonymous Coward · · Score: 0

      I hear most slashdotters can expect to spawn a process to completion in just a few minutes...

      The real problem is all the memory leaks.

  5. Hrmm by Nailer · · Score: 4, Interesting

    SGI
    Unisys
    Fujitsu
    HP

    It looks like there might actually be a competitive marketplace for scalable multiprocessor Linux systems real soon now (if not already).

    1. Re:Hrmm by Anonymous Coward · · Score: 0

      I remember seeing a screenshot several years ago of Linux booted on a 64-way Sun E10K. HP's SuperDome is in good company.

    2. Re:Hrmm by Anonymous Coward · · Score: 0

      Pffth. Wait till you see it boot on a 64-way IBM POWER5.

    3. Re:Hrmm by Anonymous Coward · · Score: 1, Insightful

      *yawn* Why stay up that late when it booted on a 512p Altix this morning?

    4. Re:Hrmm by SunFan · · Score: 1

      ... ...

      Where's Microsoft?!?

      (snicker)

      --
      -- Microsoft is the most expensive commodity operating system and office suite vendor in the marketplace.
    5. Re:Hrmm by superpulpsicle · · Score: 1

      This kind of article bugs me. Fact being SGI is supposedly way ahead of linux in this department with 1024 processors. Linux at 64 shouldn't be that big a fuss.

    6. Re:Hrmm by Anonymous Coward · · Score: 0

      The same HP box ships with Windows, funny guy.

    7. Re:Hrmm by chthon · · Score: 3, Interesting

      This is about an unmodified 2.6 kernel.

      I have the articles at home (Linux Journal) about the SGI systems. First they do measurements on their systems, and then patch the bottlenecks in the kernel.

      I don't think these patches can easily be put into a standard kernel.

    8. Re:Hrmm by Nailer · · Score: 1

      On some of the Unisys boxes.

    9. Re:Hrmm by Anonymous Coward · · Score: 0

      Funny Uni "we have the way out" sys switched to Linux after discovering Microsoft's trash couldn't hack the pace.

    10. Re:Hrmm by AhBeeDoi · · Score: 1

      Consulting with their attorneys.

    11. Re:Hrmm by joib · · Score: 1

      Actually, most of the work SGI did was against the 2.4 kernel (in fact, Altix systems still ship with an SGI-patched 2.4 kernel). The vast majority of these patches were included in the 2.6 kernel, which is probably the major reason that the standard 2.6 kernel scales so well up to at least 64 CPUs, as this test showed.

    12. Re:Hrmm by dfiguero · · Score: 1

      From the article:

      "The 2.6 kernel is NUMA aware," said Cabaniols. Some patching was necessary, he said, but "all patches developed for the BigTux project are going into the mainstream Linux kernel and are included in standard distributions."

      --
      My penguin ate my sig
    13. Re:Hrmm by Handpaper · · Score: 1
      "The 2.6 kernel is NUMA aware," said Cabaniols

      Bah. I'll be impressed when it's self aware.

  6. For those wondering by Anonymous Coward · · Score: 0

    "serial processing" is most probably the linking step... "intermittent" probably means that they incrementally link groups of .o files, etc.

  7. Re:A little factoid for you by Anonymous Coward · · Score: 0

    Zero points for trying.

  8. Wow... by ZiZ · · Score: 0
    My kernel only goes up to 11.

    But seriously, this is pretty cool - though I think the best thing about multi-processor systems past two or four is really the ability to run virtualized servers with two or four dedicated CPUs each inside an uber-CPU'd system.

    --
    This flies in the face of science.
    1. Re:Wow... by Bingo+Foo · · Score: 1, Insightful
      It says something about how provincial the IT world has become when someone says "the best thing about multi-processor systems past two or four is really the ability to run virtualized servers with two or four dedicated CPUs each inside an uber-CPU'd system."

      There's a reason you pay so much more per CPU for an SMP or NUMA system, and it ain't for network services.

      --
      taken! (by Davidleeroth) Thanks Bingo Foo!
  9. I work on a SuperDome by atriusofbricia · · Score: 1

    I work on a SuperDome and would love to see it running Linux. HP-UX is such a pain!!!

    --
    I was raised on the command line, bitch

    "Nemo me impune lacesset"

    1. Re:I work on a SuperDome by SunFan · · Score: 2, Funny


      You must have high ceilings in your office.

      --
      -- Microsoft is the most expensive commodity operating system and office suite vendor in the marketplace.
    2. Re:I work on a SuperDome by johannesg · · Score: 1

      Nah, he is responsible for keeping this thing clean...

  10. Re:A little factoid for you by elmegil · · Score: 1

    I would hardly consider a 26x speed up for a 64x processor multiple scaling "well". They can wave their hands and claim it's because part of their compile is single threaded, but until they demonstrate a real world app (or even standard benchmark; a kernel compile doesn't really qualify there either) that scales better than 26x, I'm not very impressed.
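
    For scale, a back-of-the-envelope check in Python. It assumes the compile follows Amdahl's law with no other overhead, which the article does not claim, and the function name is made up. Solving speedup = 1 / (s + (1 - s) / N) for the serial share s gives:

```python
def implied_serial_fraction(speedup, n_cpus):
    """Serial share s that would produce `speedup` on `n_cpus`,
    assuming pure Amdahl behaviour: speedup = 1 / (s + (1 - s) / n_cpus)."""
    return (n_cpus / speedup - 1.0) / (n_cpus - 1.0)


s = implied_serial_fraction(26.0, 64)
print(f"implied serial share: {s:.1%}")  # implied serial share: 2.3%
```

    Under that assumption, a serial share of only about 2.3% is enough to hold 64 CPUs to 26x.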

    --
    7 November 2006: The day Americans realized corruption and incompetence weren't addressing 11 September 2001
  11. Re:A little factoid for you by AngryElmo · · Score: 1

    Did you not RTFA? The compile test showed non-linear results, but other benchmarks did prove (near) linear.

  12. Re:MOD PARENT UP by Anonymous Coward · · Score: 0

    Quite informative and typical of what I've seen in a few other cases. Poorly "equipped" IT admin makes "dumb" but well-meaning proposal to switch to Linux/OSS. In some cases, he/she's kicked out on his/her butt (as in this case). Other times, the shop switches with disastrous results.

  13. excuse my ignorance by g0dsp33d · · Score: 2, Informative

    I know Linux is pretty good from a security sense (compared to Windows, at least), and I'm not surprised to find it operates on exotic setups, but are there that many programs out there that support such a setup? Or ones that will actually benefit from this many processors? Or is the point of this system to develop custom business software for their use? Or is it for a data server of some sort that can benefit from multiple cores answering requests?

    --
    lol: You see no door there!
    1. Re:excuse my ignorance by Anonymous Coward · · Score: 3, Insightful

      are there that many programs out there that support such a setup?

      As they say, if you have to ask, you don't need it.

      The point for stuff like this isn't the number of programs that will support it, it's that you already have *one* program that not only supports it, but requires it.

      Think weather modeling. It's a specialized application that requires massive CPU horsepower - and it's written specifically for the task at hand. This isn't something you'd pick up at Best Buy, or download from Freshmeat - it's a custom app that requires massive amounts of horsepower to do a specific task.

    2. Re:excuse my ignorance by AstroDrabb · · Score: 4, Informative
      There are still many uses for this many processors. Think of a monster DB. It is much easier to have more processors on your DB than to have many small systems and have to worry about syncing the data.

      Think about virtualization. I would love to have a 64-way system and break that up into 32 2-way systems or 16 4-way systems. It would make system management much easier. And with software, you can instantly assign more processors in a virtualized system to a server that was being hit hard. So your 4-way DB can turn into an 8-way or 16-way DB in an instant. Once the load is gone, you set it back to a 4-way DB.

      I personally still prefer to load balance many smaller servers to save costs. However, this could be an excellent option for some enterprises. I know where I work we have some big Sun boxes and we just add processors as we need. However, that has proven to be rather expensive and virtualizing could help save some big costs.

      --
      If Tyranny and Oppression come to this land,
      it will be in the guise of fighting a foreign enemy. -James Madison
    3. Re:excuse my ignorance by Anonymous Coward · · Score: 0

      Oracle, WebSphere, etc.

    4. Re:excuse my ignorance by Anonymous Coward · · Score: 0

      Programs that create separate processes to solve the problem in parallel are what will benefit the most, while programs that run entirely in a linear fashion will hardly benefit.

      make -j 64
      distcc
      a busy multiuser system....

    5. Re:excuse my ignorance by jd · · Score: 4, Interesting
      A 64-way system may or may not be useful. It depends on the speed of the interconnects, and the way it handles bus locking. (On a 64-way system, any given CPU can only have control of a given resource 1/64th of the time. Unless this is handled extremely well, this is Bad News.)


      In general, people use clusters of single or dual-processor systems, because many problems demand lots of hauling of data but relatively little communication between processors. For example, ray-tracing involves a lot of processor churning, but the only I/O is getting the information in at the start, and the image out at the end.


      Databases are OK for this, so long as the data is relatively static (so you can do a lot of caching on the separate nodes and don't have to access a central disk much).


      A 64-way superscalar system, though, is another thing altogether. Here, we're talking about some complex synchronization issues, but also the ability to handle much faster inter-processor I/O. Two processors can "talk" to each other much more efficiently than two ethernet devices. Far fewer layers to go through, for a start.


      Not a lot of problems need that kind of performance. The ability to throw small amounts of data around extremely fast would most likely be used by a company looking at fluid dynamics (say, a car or aircraft manufacturer) because of the sheer number of calculations needed, or by someone who needed the answer NOW (fly-by-wire systems, for example, where any delay could result in a nice crater in the ground).


      The problem is, most manufacturers out there already have plenty of computing power, and the only fly-by-wire systems that would need this much computing power would need military-grade or space-grade electronics, and there simply aren't any superscalar electronics at that kind of level. At least, not that the NSA is admitting to.


      So, sure, there are people who could use such a system, but I cannot imagine many of them are in the market.

      --
      It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
    6. Re:excuse my ignorance by Anonymous Coward · · Score: 0

      What about a bunch of small systems all mounting the database from a common NAS, all connected with low latency interconnect?

    7. Re:excuse my ignorance by burns210 · · Score: 1

      What is the advantage of a 64-way processor box in place of an HPCC of a few (dozen?) single or dual processor boxes that are cheaper?

      Why would I want a 16-way processor in place of 8 dual processor boxes with a gigabit backbone network to them?

    8. Re:excuse my ignorance by Anonymous Coward · · Score: 1, Informative

      Coordinating row-lock updates across many machines would be slow and kind of a pain. It's way faster, easier, and more reliable to handle that in a single system image than across several.

    9. Re:excuse my ignorance by AstroDrabb · · Score: 4, Insightful
      Why would I want a 16-way processor in place of 8 dual processor boxes with a gigabit backbone network to them?
      It all depends on what you are doing. Where I work we replaced a few bigger boxes with a bunch of smaller/cheaper boxes behind a load balancer for web apps. However, when it came to DB performance, the bigger boxes were much better. Well, at least to a point. Our 8-way DB was much better than our four 2-way DBs. The cost wasn't much more, so an 8-way worked well.

      I do agree, that "big iron" is losing the power it once had. Especially when one can cluster a bunch of much cheaper 2-way boxes.

      --
      If Tyranny and Oppression come to this land,
      it will be in the guise of fighting a foreign enemy. -James Madison
    10. Re:excuse my ignorance by Sir+Nimrod · · Score: 4, Interesting

      Take this with a grain of salt, because I was part of the group that developed the chipset for the first Superdome systems (PA-RISC). I'm probably a little biased.

      A 64-way Superdome system is spread across sixteen plug-in system boards. (Imagine two refrigerators next to each other; it really is that big.) A partition is made up of one or more system boards. Within a partition, each processor has all of the installed memory in its address space. The chipset handled the details of getting cache blocks back and forth among the system boards.

      That's a huge amount of memory to have by direct access. Access is pretty fast, too.

      Still, they were doubtless pretty expensive. HP-UX didn't allow for on-the-fly changes to partitions, but the chipset supports it. (The OS always lagged a bit behind. We built a chip to allow going above 64-way, but the OS just couldn't support it. A moral victory.) Perhaps Linux could get that support in place a little more quickly....

      --
      The United States of America: We mean well.
    11. Re:excuse my ignorance by PornMaster · · Score: 3, Informative

      There are plenty of them on the market, and as the price comes down, there will be even more.

      To whom do you think HP has been selling the SuperDome line? And to whom has Sun been selling the E10/12/15K?

      One of the benefits of using a huge multiprocessor Sun box, though, besides the massive numbers of CPUs you can have in a single frame running under a single system image is the ability to dynamically reconfigure resources, like a few other posters have touched on.

      Imagine this... you have a box with 64 CPUs and 128GB of RAM. During the day, you have developers who are working with 16 CPUs and 32GB of RAM, working on the next generation of the database you'll be running for your business. A development domain.

      You have another domain of 16 CPUs and 32GB as a test domain. Like when stuff goes out to beta, you run tests on the stuff you've pushed out from your development copy to see if it's ready for prime-time.

      You have a third domain of 32 CPUs and 64GB in which you run production. It's a bit oversized for your needs for the work throughout the day, but it's capable of handling peak loads without slowing down.

      Then, you have a nightly database job that runs recalculating numbers for all the accounts, dumping data out to be sent to a reporting server somewhere, batch data loads coming in that need to be integrated into your database. Plus you have to keep servicing minimal amounts of requests from users throughout the night, but hey, nobody's really on between 10PM and 4AM.

      Wouldn't it be nice to drop the dev and test databases down to maybe 4CPUs if they're still running minimal tasks, and throw 56CPUs and 112GB of RAM at your nightly batch jobs? They get what's almost the run of the machine... until you're done with the batch jobs. Then you shrink production back to half the machine, and boost up the test and dev to a quarter each... so everyone's happy when the day starts.
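
      A rough sketch of that day/night reshuffle, just to check that the numbers add up. This is not how Sun's dynamic reconfiguration is actually driven; the names are made up, and the 8GB of night-time RAM for dev and test is an assumption (the post only gives the 112GB figure for the batch domain).

```python
# Hypothetical model of the day/night split described above. The layout
# names and the 8GB night-time RAM for dev/test are illustrative assumptions.
TOTAL_CPUS = 64
TOTAL_RAM_GB = 128

# domain name -> (CPUs, RAM in GB)
DAY = {"dev": (16, 32), "test": (16, 32), "prod": (32, 64)}
NIGHT = {"dev": (4, 8), "test": (4, 8), "prod": (56, 112)}


def check_layout(layout):
    """Return (cpus, ram_gb) in use, raising if the frame is oversubscribed."""
    cpus = sum(c for c, _ in layout.values())
    ram = sum(r for _, r in layout.values())
    if cpus > TOTAL_CPUS or ram > TOTAL_RAM_GB:
        raise ValueError(f"oversubscribed: {cpus} CPUs, {ram}GB")
    return cpus, ram


def cpu_moves(before, after):
    """CPU delta per domain when switching layouts (positive = gains CPUs)."""
    return {name: after[name][0] - before[name][0] for name in before}
```

      Both layouts use the whole frame, and going from day to night moves 12 CPUs out of each of dev and test and 24 into production.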

    12. Re:excuse my ignorance by Anonymous Coward · · Score: 0

      Are you advertising for Oracle RAC? :-)
      Check out the Linux benchmarks.

    13. Re:excuse my ignorance by jd · · Score: 2, Informative
      An SSI cluster that supported roles for defining the distribution of tasks would probably be more cost-effective. You'd also need Distributed Shared Memory, though, and distribution of threads as well as processes.


      Having the entire engine on one multi-way motherboard wouldn't really gain you much, because none of the work you described needs tight interconnects.

      --
      It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
    14. Re:excuse my ignorance by Anonymous Coward · · Score: 0

      you can calculate pi, Really really fast.

    15. Re:excuse my ignorance by ppanon · · Score: 2, Informative

      You'd also need Distributed Shared Memory, though, and distribution of threads as well as processes.

      Right, and that's exactly the situation where a single honking box is going to kick on any kind of cluster that's connected more loosely than what you get with a high-cpu count multiprocessor box.

      It all depends on how much interdependence there is on memory access between threads/processes (i.e. how well you can partition your data set to match your cluster topology). Often, it's a lot cheaper for a company to buy a $200,000 box than to throw three top-level programmers at rewriting the problem for a cluster (assuming they have the source code and it's a problem that can be tackled that way). Of course, companies like Microsoft or Oracle or IBM can sell enough copies to make that development worthwhile, but because the market is fairly small, they don't get their usual 90% profit margin, even when they charge many times what they charge for the non-clustered versions. The only reason some of them do it is competition for bragging rights on benchmarks like TPC.

      Having the entire engine on one multi-way motherboard wouldn't really gain you much, because none of the work you described needs tight interconnects.
      Say what? Databases don't need tight interconnects in a dynamic scaling environment? Are you planning on repartitioning that 180GB data set each day yourself or were you planning on hiring a handful of university interns to do it?

      --
      Laissez lire, et laissez danser; ces deux amusements ne feront jamais de mal au monde. - Voltaire
    16. Re:excuse my ignorance by afabbro · · Score: 4, Insightful
      It would make system management much easier.

      I prefer to say "might" make systems management much easier. The problem with the One Big Box is the same whether it's Sun, HP, Linux, etc.:

      • Something bad happens to the One Or Two Critical Components. If you know of any open systems box that has no single point of failure, I'd sure like to see it. If you want one big box without a single point of failure, you buy a mainframe. Every open systems big box I'm aware of has at least one or ten SPOFs...and I've had the backplane go out on more than one Sun E10K. At that point, you don't lose just one system, you lose everything if you've consolidated to one 64-way box.
      • It's time to do some hardware maintenance. Good luck coordinating that with 32 different user groups. "Ah, but we can do everything hot with this big box." Always sounds good on paper. I've always run into things for each of them that required a power-off maintenance.
      • Or perhaps it's not even maintenance...it's just something weird. I had a Big Box once where a power supply made a popping noise and emitted a small puff of smoke. It burned out. Not a big deal in the end - it could be replaced hot - but it was a nervous couple of hours. Versus a cluster where you'd fail over to the spare (yes, I know you could cluster your Two Big Boxes, but we start getting into financial justifications).
      • ISVs say things like "You want to run XYZ 1.0 on your 64-way box? That's a tier 9 platform and that will be $100,000, thank you." "But I'm only using it on one 2-way partition!" "You might dynamically reconfigure it after we sell you the license and our software isn't that smart, so it's $100K or no deal. And then you can use it on all your partitions!" "But I don't need it on all of them!" You'd be amazed how many prominent software companies tier based on the overall box and don't support virtual partitions, etc. from a licensing perspective. And you're guaranteed to have a user who needs one of their products.
      • Department B bought SAN gizmo X and your big box is exotic enough that there is no driver for it. They really want SAN gizmo X, so they go off and buy a new 4-way box for themselves. Or they want to run SuSE and SuSE doesn't support your box. Or everyone wants his own gig-E or two and you don't have 128 ports out the back. Etcetera - there are lots of scenarios where you can't get the technical architecture brainiacs to think ahead or you can't get the vendors' stars to line up and you wind up with people who don't want to be on the big box...and pretty soon the data center is proliferating again.

      Etcetera...of course, there are just as many if not more problems with the "we'll just build a giant cluster of 64 boxes and scale across it!" approach...I'll rant on that some other day.

      It's all trade-offs. And no matter which way you go, you'll discover some truly ugly hidden costs that never seem to show up in those vendor white papers. And none of it works exactly the way it should or you'd like it to. But I'm not jaded or anything ;)

      --
      Advice: on VPS providers
    17. Re:excuse my ignorance by Bill+Currie · · Score: 1
      vising and lighting quake (or any other fps) maps would be one application close to many /.ers' hearts. Of course, the tools themselves have to support threading (qfvis and qflight from quakeforge do), but that's just a minor detail :)

      Instead of taking a day to compile a complex map, it could be done in the time it takes to brew a jug of coffee :)

      --

      Bill - aka taniwha
      --
      Leave others their otherness. -- Aratak

    18. Re:excuse my ignorance by Anonymous Coward · · Score: 0

      You unix guys* are so cute.

      Sheesh, OS/400 has done this sort of thing since, at least, 1990.

      *yeah, I'm one as well.

    19. Re:excuse my ignorance by g0dsp33d · · Score: 1

      Now if you could only apply that processing power to running the game(s). muhahaha I dub it the gib-o-nator!

      --
      lol: You see no door there!
    20. Re:excuse my ignorance by SunFan · · Score: 2, Funny


      The answer, of course, is to have a hot spare E10K! Doesn't everyone have an extra one lying around?

      --
      -- Microsoft is the most expensive commodity operating system and office suite vendor in the marketplace.
    21. Re:excuse my ignorance by jd · · Score: 2, Informative
      No, databases don't need tight interconnects, unless the data is changing rapidly, relative to the number of queries.


      I'd personally expect to see a system where common views of the data were cached locally, where the "authoritative" database was accessed via a SAN rather than the processor network, and where interprocessor communication was practically nil. There's not a whole lot that different threads would need to send to each other.


      The whole point of SANs, "local busses" and other such technologies is to take all the heavy work off the lines that need to be highly responsive. It's generally better to have several specialized networks than one network that over-generalizes and is therefore not as good at any specific thing.

      --
      It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
    22. Re:excuse my ignorance by jedidiah · · Score: 1

      Actually, it seems to be much easier to manage global locking between a bunch of smaller databases than trying to scale a single database to an absurd size. Database applications by their nature are either highly parallel or they will choke your RDBMS.

      A database server of size 1/n is a much more common, well understood and debugged problem than a database server of size n.

      --
      A Pirate and a Puritan look the same on a balance sheet.
    23. Re:excuse my ignorance by jedidiah · · Score: 1

      The key with the database problem is how tightly coupled the changes would be. If the data can be effectively partitioned at runtime, then it should cluster quite well. Now if the data can't be partitioned at runtime then you are going to have scalability problems with a single system image anyway.

      Everyone is going to be blocking on the data that everyone else wants.
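
      To make the partitioning point concrete, here is a minimal sketch, assuming keys are spread over nodes by a stable hash. CRC32 and the function names are picked purely for illustration and are not what any particular RDBMS does. A transaction whose keys all land on one node needs no cross-node locking; one that spans nodes does.

```python
import zlib


def node_for(key: str, n_nodes: int) -> int:
    """Map a key to a node with a stable hash (CRC32 is an illustrative choice)."""
    return zlib.crc32(key.encode("utf-8")) % n_nodes


def nodes_touched(keys, n_nodes: int) -> set:
    """Set of nodes that a transaction over these keys would have to visit."""
    return {node_for(k, n_nodes) for k in keys}


def needs_cross_node_locking(keys, n_nodes: int) -> bool:
    """True when the keys span more than one node, so locks must be coordinated."""
    return len(nodes_touched(keys, n_nodes)) > 1
```

      If most transactions come back False here, the workload partitions well. If most come back True, every node ends up waiting on the others, which is the blocking case above.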

      --
      A Pirate and a Puritan look the same on a balance sheet.
    24. Re:excuse my ignorance by Cajal · · Score: 2, Informative

      Just remember that almost no open-source databases use parallelized algorithms. PostgreSQL, Firebird and MySQL certainly don't. OpenIngres is the only one I know of with a parallel query engine. By this I mean the ability of a single query to use multiple processors (say, for handling a complex join and a large sort). The only way PG, FB and MySQL can use multiple CPUs is if you have multiple queries running. But for OLAP-style workloads, you won't see much benefit from SMP.

    25. Re:excuse my ignorance by Anonymous Coward · · Score: 0

      Superdomes use a non-blocking crossbar, so your assumption is faulty.

    26. Re:excuse my ignorance by Anonymous Coward · · Score: 0

      Ah, another ex-con on slashdot.

    27. Re:excuse my ignorance by RandomIO · · Score: 1

      In a Superdome, you can create an N-Par, which is a hardware partition. This partition gets its own hardware resources. These resources (including processors) are physically isolated from other N-Pars. You can power down, upgrade, remove, hot swap, etc... without affecting the other N-Pars. If you do V-Par (Virtual Partitions) then you run into the sharing of resources problem.

      Each N-Par can run its own OS (HP-UX, Linux, Windows or soon OpenVMS). These can all be run simultaneously.

    28. Re:excuse my ignorance by jd · · Score: 1

      Agreed, completely. That's actually one reason why supercomputer experts get paid serious money - it's not easy to partition data effectively.

      --
      It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
  14. Re:A little factoid for you by AstroDrabb · · Score: 5, Informative
    A little factoid for you
    Where did you get your facts from? You are way off champ or should I say troll?

    While FreeBSD is a great OS/kernel, it doesn't scale as well as Linux, end of story.

    Until you start talking about double that amount of procs, which is what Windows Server does these days
    Huh? What smoke are you craking? Here is the comparison of MS's latest and greatest Windows 2003 server editions
    Web Edition supports up to 2-way.
    Standard Edition supports up to 4-way.
    Enterprise Edition supports up to 8-way.
    Datacenter Edition supports up to 64-way.
    So, umm where is this double of what Linux supports? Plain vanilla Linux 2.6 can do 64-way no problem. Actually, SGI has had single image 128-way Linux systems out for a while. They should have 256-way, single image Linux systems out soon. That is more than MS can even touch. Maybe do some research before you just shoot off FUD.
    --
    If Tyranny and Oppression come to this land,
    it will be in the guise of fighting a foreign enemy. -James Madison
  15. Re:A little factoid for you by Anonymous Coward · · Score: 0

    What's more they estimated the parallel element of the compile workload had scaled linearly.

  16. Re:A little factoid for you by Cheeze · · Score: 2, Insightful

    Hey, at least they tried. How many news articles have you read that compare Linux kernel compiles on a 64 processor machine? Probably only one.

    it took 19 minutes to compile with a single cpu, and 26x faster for the 64 processor machine. Does that equate to about 43 seconds for a kernel compile? It'd probably take longer than that just to untar/unbzip2 the source, since that would be running on only 2 cpus (one process for tar, one for bzip2).
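
    Back-of-the-envelope check of that figure, taking the 19 minutes and 26x at face value (the function name is just for illustration):

```python
def parallel_build_seconds(serial_minutes: float, speedup: float) -> float:
    """Wall-clock seconds for the build once the given speedup is applied."""
    return serial_minutes * 60.0 / speedup


# 19 minutes at a 26x speedup: 1140 s / 26, a little under 44 seconds.
seconds = parallel_build_seconds(19, 26)
```

    So closer to 44 seconds than 43, but the same ballpark.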

    --
    Why read the article when I can just make up a snap judgement?
  17. Re:A little factoid for you by tetromino · · Score: 3, Informative

    If it can scale to 16 procs well, it will scale to 64 procs well.
    Until you start talking about double that amount of procs, which is what Windows Server does these days


    Wrong. Windows Server 2003 supports a maximum of only 64 processors, and I believe it was significantly tested only on 32-way and smaller machines.

  18. Threads vs. Processes by Dancin_Santa · · Score: 5, Insightful

    Looking at the literature, Linux and Unix in general seem to be designed to keep processes as lightweight as possible. OTOH, Windows processes are a little heavier and take longer to start up.

    Then, OTOH, Windows threads are very lightweight compared to the equivalent thread model in Linux. Benchmarks have shown that in multi-process setups, Unix is heavily favored, but in multi-threaded setups Windows comes out on top.

    When it comes to multi-processors, is there a theoretical advantage to using processes vs threads? Leaving out the Windows vs Linux debate for a second, how would an OS that implemented very efficient threads compare to one that implemented very efficient processes?

    Would there be a difference?

    1. Re:Threads vs. Processes by Anonymous Coward · · Score: 0

      Linux thread creation and scheduling is significantly improved in 2.6 vs. 2.4 kernels, such that it is now more biased performance-wise to executing many threads per process rather than many processes.

      I'm not sure where you heard about Windows having better threading. On multiprocessor systems, Linux has a much more scalable scheduler than Windows, and the benchmarks show it.

    2. Re:Threads vs. Processes by Anonymous Coward · · Score: 2, Informative

      Nice thing about processes is that they do not share memory. As such, the processes will be localized, as would all the memory access. OTOH, if you had just ONE big process loaded with nothing but threads, you would likely find the memory backplane going into high gear as data would be moved around a bit.
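
      A small illustration of the sharing difference, assuming a POSIX system (os.fork is not available on Windows). It says nothing about NUMA locality or performance; it only shows that a thread's write is visible to its creator while a forked child's write is not.

```python
import os
import threading


def thread_write_is_shared() -> int:
    """A thread mutates the same list object its creator holds."""
    data = [0]

    def bump():
        data[0] += 1

    t = threading.Thread(target=bump)
    t.start()
    t.join()
    return data[0]  # the creator sees the thread's write


def forked_write_is_private():
    """A forked child gets its own copy; its write never reaches the parent."""
    data = [0]
    pid = os.fork()
    if pid == 0:
        data[0] += 1
        os._exit(data[0])  # child reports its private value via exit status
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status), data[0]  # (child's value, parent's value)
```

      The thread version returns 1. The fork version returns (1, 0): the child saw its own increment, the parent's copy never changed.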

    3. Re:Threads vs. Processes by IHateSlashDot · · Score: 1
      First of all, Windows does not have very efficient threads. OK, compared to Linux they might be good, but then compared to Linux my old VIC20 had efficient threads.

      But to answer your question, you only have to look at Solaris. It can easily linearly scale a single process/multithreaded application up to 64 processors. I've done it. I would have tested it further but we only had a 64-way machine.

      Of course Solaris can trivially scale a multiprocess application to 64 processors. I can't understand an operating system that would have problems with that. If it is a big deal that Linux can do this then it's pretty sad.

      Linux is nowhere close to scaling its threads up to 64 processors. It can barely do well at only 4 processors in a multithreaded app.

    4. Re:Threads vs. Processes by Anonymous Coward · · Score: 0

      Sorry boss, SGI have recently been posting results on lkml of the performance of a multithreaded app scaling up to 512 CPUs.

      Solaris shmolaris.

    5. Re:Threads vs. Processes by AstroDrabb · · Score: 3, Informative
      First of all, Windows does not have very efficient threads. OK, compared to Linux they might be good
      Linux is no where close to scaling its threads up to 64 processors.
      Dude, what crack are you smoking? Have you used any _recent_ Linux thread? LinuxThreads is an implementation of the Posix 1003.1c thread package.
      Unlike other implementations of Posix threads for Linux, LinuxThreads provides kernel-level threads: threads are created with the new clone() system call and all scheduling is done in the kernel.

      The main strength of this approach is that it can take full advantage of multiprocessors. It also results in a simpler, more robust thread library, especially w.r.t. blocking system calls.

      Oh, and if you think the latest implementation of Linux threads is slower, especially slower than MS Windows, you are an idiot. Here are some tests from IBM. Current Linux threads were spawning at more than 10,000 PER SECOND while MS Windows was spawning barely 6,000. Linux Thread performance, scroll down to the "pretty" graphs. Oh, and these numbers are higher than Solaris. Linux threads and Linux processes spawn _MUCH_ faster than the best MS has to offer and faster than Solaris.
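
      For anyone who wants to try a spawn-rate measurement on their own box, here is a crude sketch. It is not the IBM benchmark; it goes through the Python interpreter, so the absolute numbers will be far lower than a C pthreads test and are not comparable with the figures quoted above.

```python
import threading
import time


def spawn_rate(n_threads: int = 1000) -> float:
    """Create and join n_threads do-nothing threads; return threads per second."""

    def noop():
        pass

    start = time.perf_counter()
    for _ in range(n_threads):
        t = threading.Thread(target=noop)
        t.start()
        t.join()
    elapsed = time.perf_counter() - start
    # Guard against a zero reading from a coarse clock.
    return n_threads / elapsed if elapsed > 0 else float("inf")
```

      Run it a few times and compare kernels or thread libraries on the same hardware; a single reading on its own means very little.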

      --
      If Tyranny and Oppression come to this land,
      it will be in the guise of fighting a foreign enemy. -James Madison
    6. Re:Threads vs. Processes by neuro88 · · Score: 1, Informative

      LinuxThreads is the old implementation still used in vanilla 2.4. It wasn't entirely POSIX compliant I believe (but very close).

      IBM worked on their own threading implementation for linux (NGPT) that was 2 times the speed of LinuxThreads. Then NPTL was developed which was 4 times the speed of IBM's implementation.

      I believe the link you provided has the benchmarks for IBM's implementation (but not sure, I merely skimmed through).

      Anyway, here are some good links on NPTL and NGPT:
      http://kerneltrap.org/node/429?PHPSESSID=d1755784e3d90d637b774f233d5b8f42
      http://kerneltrap.org/node/422

    7. Re:Threads vs. Processes by tetromino · · Score: 5, Informative

      Have you used any _recent_ Linux thread? LinuxThreads is an implementation of the Posix 1003.1c thread package.

      Dude, get with the times, LinuxThreads are obsolete. Kernel 2.6 / glibc 2.3 use NPTL, which launches new threads four times faster than LinuxThreads, allows you to have more than 8192 threads per process, doesn't require you to have lots of manager threads that don't do anything useful, delivers signals to threads as opposed to processes, and is actually more-or-less POSIX compliant.

      I've been using NPTL on my workstation for 12 months, and I haven't looked back (except when early versions of Mono were incompatible with NPTL). You talk about "any _recent_ Linux thread" - but it looks like you are using a Debian Woody...

    8. Re:Threads vs. Processes by Anonymous Coward · · Score: 0

      Single-threaded processes are easier to debug. That's one of the reasons why Unix was originally designed to be made of so many different, small, component programs: it's easier to change/fix/improve them and still have a fully functional system at the end of the day.

    9. Re:Threads vs. Processes by AstroDrabb · · Score: 1

      I know NPTL is much better. I have been using it on my Linux boxes for a while now. The GP was talking about how MS Windows threads were "faster" than Linux threads. That is why I posted a link to the _older_ Linux threads showing how much faster they were. And since NPTL, there is no competition.

      --
      If Tyranny and Oppression come to this land,
      it will be in the guise of fighting a foreign enemy. -James Madison
    10. Re:Threads vs. Processes by harrkev · · Score: 1
      IBM worked on their own threading implementation for linux (NGPT) that was 2 times the speed of LinuxThreads.
      So why hasn't this been rolled back into the official kernel? Does it totally suck for uni-processor systems?

      So far, it seems that IBM has been willing to share their candy with OSS people.
      --
      "-1 Troll" is the apparently the same as "-1 I disagree with you."
    11. Re:Threads vs. Processes by neuro88 · · Score: 1

      I'm sorry I wasn't more clear. NGPT has not been rolled into the official kernel, because NPTL, which is 4 times faster than NGPT (and thus 8 times faster than the old LinuxThreads), has been used instead.

      NPTL has been found in Linux 2.6 since 2.6.0 (in fact, I believe it was in 2.5 for a while too). You also need a somewhat recent release of glibc.

      I brought up NGPT because I think it was the implementation used in the IBM benchmarks found in the link of the parent of this thread. So I thought it might be useful to point out that the threading implementation that did find its way into the mainline kernel is quite a bit faster than the IBM implementation used in the benchmark.

  19. HP by ReeprFlame · · Score: 1

    I like the way HP is handling their software distribution by offering Linux as a solution along with AMD processors. Dell attempts it but only for servers. HP I believe does the same, but at least they seem like they care more. And that is what matters if we are going to be pushing Linux into enterprises AND the home... Hooray for HP!

  20. Re:A little factoid for you by Anonymous Coward · · Score: 0

    Minor correction. SGI have 512-way single system image Linux computers out there *now*. They are aiming to soon go to 1024 and 2048 way.

  21. Re:HP - more to come... by Anonymous Coward · · Score: 0

    Check out AMDZone and the Inq.

  22. As a matter of fact,.... by Anonymous Coward · · Score: 0

    MS now recommends just this kind of a system for a desktop when doing Longhorn :).

  23. Re:The "Tux" Got Me Fired, Guys by Anonymous Coward · · Score: 0

    wouldn't it have been better to suggest something more commercial like xandros or linspire using evolution for the groupware ?

  24. Re:A little factoid for you by AstroDrabb · · Score: 4, Informative
    This is where reading TFM would help.
    In the STREAM benchmark, memory bandwidth rose from 5GB/s with one 'cell' of four processors, to 10GB/s with two cells, and continued to double until all 64 processors -- or all 16 cells -- were switched on to provide 80GB/s of bandwidth.

    The HPL benchmark, which is used to measure performance when solving large linear equations, produced similar results, rising from 18 gigaflops with one cell of four processors to 277 gigaflops with all 16 cells, or 64 processors, running.
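
    Turning those quoted figures into scaling efficiency is simple arithmetic (measured speedup divided by the increase in cells). The helper name below is made up for illustration:

```python
def scaling_efficiency(base: float, scaled: float, factor: int) -> float:
    """Fraction of ideal linear scaling achieved when resources grow by `factor`."""
    return (scaled / base) / factor


# Figures quoted from the article: 1 cell (4 CPUs) versus 16 cells (64 CPUs).
stream_eff = scaling_efficiency(5.0, 80.0, 16)   # GB/s: exactly linear
hpl_eff = scaling_efficiency(18.0, 277.0, 16)    # gigaflops: about 96% of linear
```

    STREAM comes out at 100% of linear and HPL at roughly 96%.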

    --
    If Tyranny and Oppression come to this land,
    it will be in the guise of fighting a foreign enemy. -James Madison
  25. Re:A little factoid for you by MBCook · · Score: 4, Insightful
    This is almost troll, as far as I'm concerned.

    First of all, a 26x speedup is GOOD. That said, if you are trying to use a cluster of 64 Itanium 2 processors to compile things, you're an idiot. IIRC, the long pipeline and VLIW, highly scheduled, architecture of the Itanium 2 make it bad at compiling. You could get that performance with cheaper Athlon 64s or Xeons. Not only that, but compiling one thing will ALWAYS be partly serial. Now if they were to compile multiple things (say 3 kernels, or the kernel, X, and KDE) at the same time, they should see closer to that 64x speedup. It's all about how much you can make parallel.
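
    The "partly serial" point can be put in numbers with Amdahl's law. This is a sketch under the usual assumption that the job splits into a fixed serial fraction plus a perfectly parallel remainder, which the article does not claim; the function names are illustrative.

```python
def amdahl_speedup(serial_fraction: float, n_procs: int) -> float:
    """Amdahl's law: speedup on n_procs given the fraction that stays serial."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_procs)


def serial_fraction_from_speedup(speedup: float, n_procs: int) -> float:
    """Invert Amdahl's law: the serial fraction implied by a measured speedup."""
    return (n_procs / speedup - 1.0) / (n_procs - 1.0)


# A 26x speedup on 64 processors implies roughly 2.3% of the work is serial.
s = serial_fraction_from_speedup(26, 64)
# With that serial fraction, no number of processors gets past about 43x.
ceiling = 1.0 / s
```

    So under this model only about 2.3% of the compile has to be serial to cap a 64-way box at 26x, and piling on more CPUs would top out around 43x.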

    Which is something else. If you were to give that same thing a better application, it WOULD give you near 64x performance. If you used it to batch convert WAVs to MP3s, or RAW images to JPEGs, or MPEG4 to DivX, or even just raytrace images (all things where no part is dependent on another part so they are highly parallelizable), things will go great. In the article, they give the example of some bandwidth benchmark where the bandwidth scales almost perfectly with the number of processors they throw at it.

    PS: Interesting fact I saw the other day. The human brain can only do about 200 operations per second, which is why computers are much faster at math. But the brain can do MILLIONS of things at once. So while it may only be able to process the image from our eyes at 200 "operations" per second, it does that for the millions of little bits of information all at once, which is why people are so good at visual things, pattern matching, chess, etc. Just FYI.

    --
    Comment forecast: Bits of genius surrounded by a sea of mediocrity.
  26. Re:A little factoid for you by pantherace · · Score: 2, Informative

    NASA's Columbia cluster ^ 512-way SGI machines running Linux (actually 20 of them...) Not to mention "Columbia's record results were achieved running the LINPACK benchmark on 8,192 of the NASA supercomputer's 10,240 processors. Columbia also achieved an 88 percent efficiency rating on the LINPACK benchmark, the highest efficiency rating ever attained in a LINPACK test on large systems." from http://www.sgi.com/company_info/newsroom/press_releases/2004/october/worlds_fastest.html

  27. Grrr...64 processors? by Rick+Zeman · · Score: 0, Offtopic

    The hell with that--I just want a wireless driver for my Dell (Broadcom) PCI card. :-(

    1. Re:Grrr...64 processors? by slavemowgli · · Score: 1

      Write one.

      --
      quidquid latine dictum sit altum videtur.
    2. Re:Grrr...64 processors? by dicepackage · · Score: 1

      If you use ndiswrapper (http://ndiswrapper.sourceforge.net) you can use your Windows drivers with linux. Just be aware that it isn't as easy as getting other Wireless cards working in linux. Just make sure to follow the instructions step by step and you shouldn't have any problems.

    3. Re:Grrr...64 processors? by DeepHurtn! · · Score: 1

      There's not too much anyone can do as long as Broadcom keeps on acting like jerks. I'm suffering from the same problem as you on my laptop, but it's important to recognise that it's purely Broadcom's fault.

  28. Re:A little factoid for you by afidel · · Score: 3, Interesting

    Correct, AFAIK the biggest Windows 2003 Datacenter installs are on Unisys ES7000s and those only support 32-way Windows partitions. The box can hold 64 Xeons so I would say that Unisys isn't comfortable with the scalability of Windows to the full system size, otherwise they'd be shouting it from the rooftops.

    --
    There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
  29. welll isnt that sweet by Exter-C · · Score: 1

    That's great to know that the kernel can handle 64-way machines.. Especially since I just ordered one from my local pc store in bits to build myself..

    Really the key will be when the system scales to 128 processors and beyond.

    1. Re:welll isnt that sweet by Corbie · · Score: 1

      Actually, from the article it seems we are talking about 64 processor systems, not single 64-bit chips. Bit of a difference.

  30. Re:MOD PARENT UP by Anonymous Coward · · Score: 0

    Ah, shut up and get back to work, Bill!

  31. Wow by Anonymous Coward · · Score: 5, Funny

    Never mind Linux for a moment, I'm just amazed that 64 Itanium 2's have actually been sold...

    1. Re:Wow by SunFan · · Score: 1


      No, it turns out they were running an emulation of 64 Itanium CPUs on the surplus HP Itanium workstation they pulled out of the storage closet.

      --
      -- Microsoft is the most expensive commodity operating system and office suite vendor in the marketplace.
    2. Re:Wow by Anonymous Coward · · Score: 0

      big deal - my c=64 was like all decked out with neon and 3" above the ground

  32. Re:A little factoid for you by Anonymous Coward · · Score: 0

    Wrong! wrong! you (and the other guy who replied) are wrong.

    Just take a look at some of the public benchmarks. Just take a look at what some folks are currently running in production. Do a little work and see what's publicly referenced out there. Windows 2003 scales quite nicely up to 64 processors (and 512 GB). Worst case is 1.7x from 32p to 64p. Some apps 1.8x, 1.9x. Depends also on whether you're interleaving memory across the backplane or whether (if you can) you know enough about tweaking your app (and the hardware allows it) to run your memory locally. Soon to be 1 TB RAM and there's been a box in Redmond that's been running 128p now for some time (here's a hint, kiddies, it's not a Unisys). BTW, how many procs can a Unisys support in a single domain?

  33. Interesting. by jd · · Score: 2, Insightful
    The problem is that most resources (memory, the bus, disks, etc) can only be used by one CPU at a time. So, for problems which are resource-intensive, you're generally better to cluster than to use SMP, so that each processor has its own bus, memory, etc.


    To be efficient, the processors would need gigantic caches, to keep the load on the rest of the system down. Either that, or you COULD run the CPUs out of step over a bus that is 64 times faster than normal. I'd hate to be the person designing such a system, though.


    Now, this system could be of extreme interest in the supercomputer world. One of the biggest complaints about clustering is the poor interconnects. This would seem to get round that problem. A Blue Gene-style cluster where each node is a 64-way SMP board, and you're running a few thousand nodes, would likely be an order of magnitude faster than anything currently on the supercomputer charts.


    On the other hand, do we need to know what the weather is not going to be, ten times as often?

    --
    It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
    1. Re:Interesting. by Anonymous Coward · · Score: 5, Informative

      The problem is that most resources (memory, the bus, disks, etc) can only be used by one CPU at a time. So, for problems which are resource-intensive, you're generally better to cluster than to use SMP, so that each processor has its own bus, memory, etc.

      No, you have a misconception. On these REAL big iron systems, each CPU (or each few CPUs) does have its own busses, memory, and io busses.

      So in that regard it is as good as a cluster, but then add the fact that they have a global, cache coherent shared memory and interconnects that shame any cluster.

      The only advantage of a cluster is cost. Actually redundancy plays a role too, although less so with proper servers, as they have redundancy built in, and you can partition off the system to run multiple operating systems too.

      To be efficient, the processors would need gigantic caches, to keep the load on the rest of the system down. Either that, or you COULD run the CPUs out of step over a bus that is 64 times faster than normal. I'd hate to be the person designing such a system, though.

      Now, this system could be of extreme interest in the supercomputer world. One of the biggest complaints about clustering is the poor interconnects. This would seem to get round that problem. A Blue Gene-style cluster where each node is a 64-way SMP board, and you're running a few thousand nodes, would likely be an order of magnitude faster than anything currently on the supercomputer charts.


      Not really. Check the world's second fastest supercomputer. It is a cluster of 20 512-way IA64 systems running Linux.

    2. Re:Interesting. by kesuki · · Score: 1

      20 512-way IA64
      So what you're saying is that this 64-way linux setup is behind the game ;) that this article is only news in that YAN computer company is offering a barely there mass CPU enterprise linux server ;)

    3. Re:Interesting. by jd · · Score: 4, Insightful
      Global shared-memory can be done on OpenMOSIX, using the Migshm extension, which provides you with Distributed Shared Memory.


      The Altix uses 4-way CPU "bricks", along with networking and memory bricks, which you can then use to assemble a system. Yes, resources are visible globally, and it is a LOT faster than a PoP (pile-of-pcs) cluster using ethernet, but it is still a cluster of 4-way nodes.


      It also doesn't avoid the main point, which is that any given resource can only be used by one CPU at a time. If processor A on brick B is passing data along wire C, then wire C cannot be handling traffic for any other processor at the same time. That resource is claimed, for that time.


      When you are talking a massive cluster of hundreds or thousands of CPU bricks, it becomes very hard to efficiently schedule the use of resources. That's one reason such systems often have an implementation of SCP, the Scheduled Communications Protocol, where you reserve networking resources in advance. That way, it becomes possible to improve the efficiency. Otherwise, you run the risk of gridlock, which is just as real a problem in computing as it is on the streets.

      --
      It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
    4. Re:Interesting. by Anonymous Coward · · Score: 0

      Are you a stock holder in SGI?

    5. Re:Interesting. by Anonymous Coward · · Score: 0

      Are you a stock holder in sucking my anus?

    6. Re:Interesting. by Anonymous Coward · · Score: 2, Interesting

      The problem is that most resources (memory, the bus, disks, etc) can only be used by one CPU at a time. So, for problems which are resource-intensive, you're generally better to cluster than to use SMP, so that each processor has its own bus, memory, etc.

      Are you a cluster salesman by chance?

      A "big iron" system like one of these has exactly the same CPU-memory ratio as any cluster box - they are COMMODITY CPUs, you put 2-4 of them per bus in these big systems just as you put 2-4 of them on a bus in each box of a cluster. And each of these buses has a chunk of memory located off that bus right next to those CPUs, and an interface to IO as well. So your implication that clusters are somehow "faster" because nothing is shared is ludicrous - one of these big boxes can do exactly the same thing.

      The difference between a cluster and a big iron setup like these is "What happens when I need to get to memory/other CPUs/disk that is not local to the CPU?"

      And that's where clusters suck. While a big, single-image system can have a processor on its own bus with its own memory and disk just as well as a cluster can, when a cluster needs to get at non-local stuff, it has to spend micro to milliseconds pushing those transactions through a few network layers out onto a slow physical net where they then have to be readdressed once they arrive at the remote system and accepted and interpreted by that operating system. In one of these big systems, remote resources look exactly like local resources, except for access time, which instead of taking micro or milliseconds, takes nanoseconds.
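
      To put rough numbers on that argument, here is a minimal sketch of the average access time when some fraction of accesses are non-local. The latency figures are assumptions chosen only to match the orders of magnitude mentioned above (nanoseconds versus micro to milliseconds); they are not measurements of any real machine, and the function name is hypothetical.

```python
def average_access_ns(local_ns: float, remote_ns: float, remote_fraction: float) -> float:
    """Mean latency when a given fraction of accesses go to non-local resources."""
    if not 0.0 <= remote_fraction <= 1.0:
        raise ValueError("remote_fraction must be between 0 and 1")
    return (1.0 - remote_fraction) * local_ns + remote_fraction * remote_ns


# Illustrative, assumed latencies (not measurements of any real system).
LOCAL_NS = 100.0               # access to memory on the local bus
NUMA_REMOTE_NS = 1_000.0       # remote access over a cache-coherent interconnect
CLUSTER_REMOTE_NS = 100_000.0  # remote access through a network stack (100 microseconds)

for fraction in (0.01, 0.1, 0.5):
    numa = average_access_ns(LOCAL_NS, NUMA_REMOTE_NS, fraction)
    cluster = average_access_ns(LOCAL_NS, CLUSTER_REMOTE_NS, fraction)
    print(f"{fraction:.0%} remote: NUMA {numa:.0f} ns, cluster {cluster:.0f} ns")
```

      Under these assumed figures, even a small share of remote accesses dominates the cluster's average, which is the point being made about non-local traffic.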

      And this isn't new either; supercomputers have been doing this since the '80s. How you figure multiple CPUs running separate OSes over Ethernet is faster than multiple CPUs running under the same OS on a NUMA architecture is beyond me.

    7. Re:Interesting. by Anonymous Coward · · Score: 1, Funny


      I'd like to be, but the waiting list is too long.

    8. Re:Interesting. by Anonymous Coward · · Score: 2, Informative

      Global shared-memory can be done on OpenMOSIX, using the Migshm extension, which provides you with Distributed Shared Memory.

      There is a world of difference between emulating it with the operating system / programming environment, and having hardware cache coherent global shared memory.

      The Altix uses 4-way CPU "bricks", along with networking and memory bricks, which you can then use to assemble a system. Yes, resources are visible globally, and it is a LOT faster than a PoP (pile-of-pcs) cluster using ethernet, but it is still a cluster of 4-way nodes.

      No it is not. The big difference is that it isn't just "networking" them anymore than 2 CPUs on a SMP motherboard are networked. It is a specialty interconnect with higher bandwidth and lower latency than you'll find in anything you think of as a network. It also directly carries the cache directory protocol on the wire rather than TCP packets.

      It is not a cluster. If you think it is then you either don't know what a cluster is or you don't know what an Altix is.

      It also doesn't avoid the main point, which is that any given resource can only be used by one CPU at a time. If processor A on brick B is passing data along wire C, then wire C cannot be handling traffic for any other processor at the same time. That resource is claimed, for that time.

      I'll repeat it for you for the 100th time. This does not get any better in a cluster. In fact, it gets *much* worse because the latency and bandwidth on the interconnect is so much worse.

      Why do you think people pay so much money for one when they could get 1000 cheap P4's and cluster them? Do you seriously think you know more about the subject than the people making and buying these things? (Hint: you don't)

    9. Re:Interesting. by ptbarnett · · Score: 2, Interesting
      The problem is that most resources (memory, the bus, disks, etc) can only be used by one CPU at a time. So, for problems which are resource-intensive, you're generally better to cluster than to use SMP, so that each processor has its own bus, memory, etc.

      If you were to read more about Superdome, you would find that each set of 2 or 4 processors has its own memory and PCI I/O bus, comprising what is called a "cell".

      The memory and I/O devices in a cell are accessible to all the other cells via an interconnect. The speed, latency, and bandwidth vary based on how "distant" the destination cell is from the source cell, but it is still much faster than most clusters.

    10. Re:Interesting. by jd · · Score: 1
      Why do you think people pay so much money for one when they could get 1000 cheap P4's and cluster them?


      You mean, the way most people do? I don't know that there are accurate figures out there, but I'd be willing to bet that there are more MOSIX, OpenMOSIX, OpenSSI and Beowulf clusters in production enterprise environments than there are Origins, Altixes and Crays.

      --
      It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
    11. Re:Interesting. by Anonymous Coward · · Score: 0

      No you stupid blathering moron, I'm talking about the ones who do buy the Altixes. Why do you think they don't buy clusters?

      You're so slow.

  34. That should please Orion Multisystems by toby · · Score: 1
    ...who are selling very affordable, low power 12 CPU desktop & 96 CPU deskside clusters.

    Imagine a Beo-- oh wait...

    --
    you had me at #!
  35. Maybe you should... by dickeya · · Score: 0

    ... use 10 faster processors.

  36. Re:A little factoid for you by ikekrull · · Score: 1

    Yeah but how fast does the 128-processor beast from Redmond compile the Linux kernel?

    --
    I gots ta ding a ding dang my dang a long ling long
  37. Mandatory... by Anonymous Coward · · Score: 0

    Imagine a Beowulf cluster of those!

    1. Re:Mandatory... by Tablizer · · Score: 3, Funny

      Mandatory... Imagine a Beowulf cluster of those!

      Two mod points if you can work a good goatse or overlord joke into this topic. Although, the thought of a 64-way goatse overlord gives me the jeebies.

    2. Re:Mandatory... by tuxter · · Score: 1

      I for one welcome our new 64 way penguin overlords.

      There, done.

    3. Re:Mandatory... by Anonymous Coward · · Score: 0

      Just a thought, but wouldn't a 64-way
      goatse be a "cluster fsck"?

    4. Re:Mandatory... by Bastiaan · · Score: 1

      Your comment got 'Score: 3, Funny'
      Wow, this is self-fulfilling! You should have said four mod points instead of two.

  38. Re:A little factoid for you by Anonymous Coward · · Score: 0

    Ahh lick my balls. I'll believe you when I see the benchmark results.

    OOooh, Redmond has a crappy 128-way now, do they? Pity Linux has been running on 512-ways IN PRODUCTION for the past year, and SGI are looking up to 1024 and 2048 way.

    Windows is a heap of shit.

  39. Re:A little factoid for you by bovinewasteproduct · · Score: 1

    While FreeBSD is a great OS/kernel, it doesn't scale as well as Linux, end of story.

    Well I hope the article is wrong concerning how long it took to compile that kernel using a single processor Itanium 2...19 min?

    Thu Jan 13 03:22:14 MST 2005
    Thu Jan 13 03:41:52 MST 2005
    make buildworld = 19.75 min

    Tue Jan 18 21:32:08 MST 2005
    Tue Jan 18 21:35:54 MST 2005
    make buildkernel = 4 min

    That is 25 min to completely rebuild FreeBSD 4.11 from source.

    This is a P4 2.55 (no HTT) with only 1GB RAM and PATA disks running FreeBSD 4.11. It was running an X server, acting as a NAT router for my internal network, DNS server, web server and general purpose workstation (including SetiAtHome active).

    It took my 6.0-Current (Sempron 2400+ 512MB/PATA) box 12 min for the kernel with ALL the debugging (aka WITNESS/INVARIANTS/DEBUGGERS) stuff in the compiling kernel.

    Even a single processor Itanium 2 should have blown EITHER of my two boxes away.

    Maybe they should concentrate on getting good performance from a single processor (which is way more common) before adding more CPUs (walk before running???).

    BWP

  40. Re:A little factoid for you by AstroDrabb · · Score: 1

    Are they 512-way single image systems? If so, that is pretty impressive!

    --
    If Tyranny and Oppression come to this land,
    it will be in the guise of fighting a foreign enemy. -James Madison
  41. Interesting. Almost exactly a year ago... by gnunick · · Score: 3, Interesting
    IBM packs 64 Xeons into a single server (Jan 15, 2004)
    "[CTO of IBM's xSeries server group Tom Bradicich] acknowledges that there are challenges in producing such a large system -- including building support into Windows and Linux, neither of which are suited for 64-processor systems today"

    Looks like someone was up to those challenges, eh? 64-processor support *and* 64-bit support. Awesome news.

    --
    I have no special gift, I am only passionately curious. --Albert Einstein
  42. Re:A little factoid for you by Anonymous Coward · · Score: 1, Informative

    Yes. From the link:

    Brooks and his team instead pointed to Kalpana, an Intel® Itanium® 2-based, 512-processor SGI® Altix® 3000 system in use at NASA Ames since November 2003 and named to honor Kalpana Chawla, a NASA scientist lost in the Columbia accident.. In less than six months, Taft says, the Kalpana system - the first 512-processor Linux® system ever to operate under a single Linux kernel - had revolutionized the rate of scientific discovery at NASA for a number of disciplines. On NASA's previous supercomputers, simulations showing five years worth of changes in ocean temperatures and sea levels were taking 12 months to model. But on the SGI® Altix® system, scientists could simulate decades of ocean circulation in just days, while producing simulations in greater detail than ever before. And the time required to assess flight characteristics of an aircraft design, which involves thousands of complex calculations, dropped from years to a single day. "That kind of leap is incredible," says Taft. "What took a year on the best computing technology previously available, we could now accomplish in days on the Altix system."

  43. Will we ever see by stratjakt · · Score: 3, Interesting

    Smaller, say 4 or 8 way NUMA boards, that are within the means of the average geek?

    I'm not talking about mere mortal SMP systems, I want all the crazy memory partitioning and whatnot.

    --
    I don't need no instructions to know how to rock!!!!
    1. Re:Will we ever see by SunFan · · Score: 2, Interesting


      8-way multicore chips will be available within a year. Not exactly NUMA, but they'll probably have other nuances to keep you entertained.

      --
      -- Microsoft is the most expensive commodity operating system and office suite vendor in the marketplace.
    2. Re:Will we ever see by joib · · Score: 1

      AFAIK, all multi-cpu AMD Opteron systems are NUMA. Each processor connects to its own memory (since the processors sport on-die memory controllers), access to memory attached to other processors is via the hypertransport link.

    3. Re:Will we ever see by iabervon · · Score: 1

      This came up on the lkml recently. Cheap multi-Opteron systems are often set up to have a single bank of memory rather than two, and are actually uniform. More expensive ones have the two banks, although you can generally make them uniform with a BIOS setting (which is occasionally the default).
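
      For anyone wanting to check whether their own box is presented to the kernel as NUMA or uniform, here is a minimal sketch. It assumes a current Linux kernel that exposes per-node `cpulist` files under `/sys/devices/system/node`; the helper name `numa_nodes` is hypothetical, and a single node (or an empty result) suggests the memory is being treated as uniform.

```python
import re
from pathlib import Path


def numa_nodes(sysfs_root: str = "/sys/devices/system/node") -> dict:
    """Map NUMA node number -> CPU list string, as reported by sysfs.

    Returns an empty dict if the directory is missing (for example on a
    non-Linux system or a kernel built without NUMA support).
    """
    root = Path(sysfs_root)
    nodes = {}
    if not root.is_dir():
        return nodes
    for entry in root.iterdir():
        match = re.fullmatch(r"node(\d+)", entry.name)
        cpulist = entry / "cpulist"
        # Only directories named node<N> that carry a cpulist file count.
        if match and cpulist.is_file():
            nodes[int(match.group(1))] = cpulist.read_text().strip()
    return dict(sorted(nodes.items()))
```

      Calling `numa_nodes()` with no arguments reads the live system; passing another directory is only useful for testing.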

  44. Re:A little factoid for you by AstroDrabb · · Score: 1
    Even a single processor Itanium 2 should have blown EITHER of my two boxes away.
    Umm, no. The Itanium sucks at these kinds of tasks due to a long pipe line. Read this post for more info.

    Comparing the Itanium compile times to anything is just stupid. I can compile my Linux kernel on a P4 or AMD _much_ faster than 19 minutes.

    --
    If Tyranny and Oppression come to this land,
    it will be in the guise of fighting a foreign enemy. -James Madison
  45. Enterprise funds you by PhYrE2k2 · · Score: 1

    You seem to forget that the enterprise users who fund development on big machines are usually the ones supporting the entire projects you use.
    Between the kernel, your latest DBMS, etc., lots of companies contribute the dollars (or the man hours) to these projects.

    Nonetheless- write one :)

    --

    when you see the word 'Linux', drink!
  46. Re:A little factoid for you by jd · · Score: 5, Insightful
    If it can scale to 16 procs well, it will scale to 64 procs well.


    Someone wasn't awake when their Comp Sci class covered Amdahl's Law. Or the Dining Philosophers Problem. Or vector processing. Or networking. Or the parallelization problem. Or...


    Actually, the troll can be made to serve a useful purpose, because there are probably a lot of people who read Slashdot who didn't do Comp Sci.


    Part of the problem with parallelization is that not all problems can be divided up that way. If one man takes 60 seconds to dig a posthole, how long would it take 60 men to dig a single posthole? Answer - 60 seconds. Exactly the same amount of time is spent, because only one person can be digging the posthole at a time. Having more people doesn't help.
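
    The posthole is the extreme case of Amdahl's Law, where the whole job is serial. A minimal sketch of the general formula follows, assuming the workload can be described by a single serial fraction (a simplification; the function name is hypothetical):

```python
def amdahl_speedup(serial_fraction: float, processors: int) -> float:
    """Upper bound on speedup when `serial_fraction` of the work cannot be parallelized."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    parallel_fraction = 1.0 - serial_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


# The posthole: entirely serial, so 60 diggers are no faster than one.
print(amdahl_speedup(1.0, 60))
# A job that is 5% serial (an assumed figure) on 16 and on 64 processors.
print(amdahl_speedup(0.05, 16), amdahl_speedup(0.05, 64))
```

    The second call illustrates why scaling well to 16 processors does not guarantee scaling well to 64: the serial part takes a growing share of the runtime as processors are added.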


    Another part of the problem is sharing resources. Let's say you have some computer memory that can respond to a read operation in one clock cycle. Let's also say that the computer program never reads data from memory. (Very unlikely.) The first processor fetches an instruction (which is a read operation) and then executes it. The second processor can't do anything while the first one is reading, so has to wait until it has finished with that part, before it can do a read of its own.


    If the instruction takes 1 clock cycle to execute, then the first processor will be ready after the second one has performed its fetch. In which case, you will be running the memory flat-out with just 2 processors. Any more than that, and the system will actually slow down, because the processors will have to wait.


    Likewise, if the average time to run an instruction is N clock cycles, you will (on average) be able to have N+1 processors, before the memory is maxed out.
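
    The N+1 rule can be sketched as a toy model. It assumes exactly what is described above: every instruction costs one cycle of shared memory for the fetch plus N cycles of execution, with no caches and perfect interleaving. In this simplified model throughput plateaus once the memory is saturated; extra processors just wait. The function names are hypothetical.

```python
def saturating_processors(exec_cycles: int) -> int:
    """Processors needed to keep a one-cycle-per-fetch memory busy every cycle."""
    if exec_cycles < 0:
        raise ValueError("exec_cycles must be non-negative")
    # Each processor uses memory for 1 cycle out of every (1 + exec_cycles).
    return exec_cycles + 1


def instructions_per_cycle(processors: int, exec_cycles: int) -> float:
    """Aggregate throughput, capped by the memory serving one fetch per cycle."""
    if processors < 0:
        raise ValueError("processors must be non-negative")
    per_processor = 1.0 / (1 + exec_cycles)
    return min(processors * per_processor, 1.0)


for cpus in (1, 2, 4, 8):
    print(cpus, instructions_per_cycle(cpus, exec_cycles=1))
```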


    In practice, processors run about an order of magnitude faster than RAM, which is why modern systems have lots of L1 and L2 cache (and sometimes L3), pipelining, etc. These are all tricks to try and access the somewhat slower main memory as little as possible.


    Also in practice, programmers try to avoid "expensive" (in terms of clock cycles) operations because you can generally get the same results faster by other means. (That's why RISC technology became popular - make the fast operations faster, rather than adding stuff that people will try to avoid.)


    In consequence, sharing resources is a very difficult problem. It is not the only problem that many-way systems face, though. If you have N processors, there are N! possible ways for those processors to communicate. In this case, it would be 64! (64x63x62x...x2x1), which is a horribly large number. You couldn't have one link per pathway, for example, which means you've got to share links, which means you've got to have some damn good scheduling and routing mechanisms. Even then, with limited resources, you can only have so many processors talking at a time, before you are overwhelmed. Which means that "chatty" problems will involve a lot of processors spending a lot of time simply waiting for their turn to chat.
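
    For a sense of scale, here is a small sketch comparing two ways of counting. The factorial counts orderings of the processors; the number of dedicated point-to-point links needed so that every pair can talk directly is N(N-1)/2, which is smaller but still impractical to wire for large N. Both function names are hypothetical.

```python
import math


def orderings(processors: int) -> int:
    """Number of distinct orderings of the processors (N factorial)."""
    return math.factorial(processors)


def pairwise_links(processors: int) -> int:
    """Dedicated links needed so every pair of processors has its own wire."""
    if processors < 0:
        raise ValueError("processors must be non-negative")
    return processors * (processors - 1) // 2


print(pairwise_links(64))      # links for a full 64-way point-to-point mesh
print(len(str(orderings(64))))  # number of decimal digits in 64 factorial
```

    Either way the conclusion is the same: links have to be shared, so scheduling and routing matter.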


    (This goes back to why people generally build clusters, rather than many-way SMP systems, and why high-end clusters use the fastest networking technology on the planet. Clustering is easy. Getting the communication speeds up is the problem. Getting communication speeds to the point of being useful for scientific applications is a very complex, expensive problem. Which is the main reason Mr. Cray charged more than Mr. Dell for his computers - and why people would pay it.)

    --
    It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
  47. Re:A little factoid for you by Anonymous Coward · · Score: 0

    Ahh but compiling code for ia64 is pretty resource intensive, especially with gcc.

    Don't worry about Linux performance though, back in the day when the IBM guys were getting excited about Linux scalability, they got their kernel compiles down to under 4 seconds on a 32 way POWER4. I wonder what the 64-way POWER5s could do it in. (probably not much better, as most of that few seconds was the final link phase).

  48. Re:A little factoid for you by Anonymous Coward · · Score: 0
    Part of the problem with parallelization is that not all problems can be divided up that way. If one man takes 60 seconds to dig a posthole, how long would it take 60 men to dig a single posthole? Answer - 60 seconds. Exactly the same amount of time is spent, because only one person can be digging the posthole at a time. Having more people doesn't help.

    but then why does it require 6 computer scientists to change a light bulb?

  49. The SGI Altix is scaling to 256 cpus... by Thaidog · · Score: 1

    Using Itanium 2 CPUs just like the Superdomes... how is this new news?

    --

    ||| I still can't believe Parkay's not butter.

    1. Re:The SGI Altix is scaling to 256 cpus... by stratjakt · · Score: 5, Informative

      This is an unmodified stock 2.6 kernel (well it's patched with stuff that's in distros, and will be in the next kernel). Out of the box, it detected the NUMA set up, memory partitions, the whole bit.

      The SGI boxes are nothing like the stock kernel.

      --
      I don't need no instructions to know how to rock!!!!
    2. Re:The SGI Altix is scaling to 256 cpus... by Anonymous Coward · · Score: 0

      No, because they are still using the 2.4 kernel in their production machines. Yep, even the 512 CPU systems. Of course they are heavily modified with scalability features from 2.6 (starting with the O(1) scheduler).

      SGI's upcoming 2.6 kernel patches have very little in the way of performance and scalability work though. Mostly it has performance monitoring and job management utilities.

    3. Re:The SGI Altix is scaling to 256 cpus... by Anonymous Coward · · Score: 0

      It's true that SGI's 2.4-based vendor kernel is significantly divergent from stock 2.4 -- much of the functionality needed for a large NUMA system simply wasn't present in stock 2.4.

      Stock 2.6 runs swimmingly on Altix, however.

    4. Re:The SGI Altix is scaling to 256 cpus... by Anonymous Coward · · Score: 1, Informative

      SGI has a Linux machine that scales to 512 processors... Right now, in real-world environments. Being used, being bought.

      The point of the article is that this is the STANDARD Linux kernel. The same exact thing that you can download from www.kernel.org

      Not a hacked setup designed specifically to work with particular hardware, like your special Altix, AIX, or hacked-up Linux 2.4 series versions.

      This is proof that a kernel you can use to run an embedded platform on a 100MHz Pentium is scalable enough to run a 64-bit, 64-processor classic Unix Big Iron machine.

      Just a couple of years ago there were organizations that would have had to pay hundreds of thousands of dollars in software development, licensing fees, and support costs to be able to do the same thing.

      Hell, with the OpenSSI clustering technology I have 3 PCs in my basement running Debian as a single-image cluster (one unified root file system, failover capabilities, network load balancing capabilities, one unified /dev/, processes load balancing from machine to machine, etc.).

      All of it rocks and is free and Free. Try doing this with Windows or Altix and you'd be broke before you finished.

      Right now the Linux development model is creating free software that rivals, and in some cases even surpasses, its closed-source rivals. The only thing that is "better" is when you take an AIX or a Solaris setup and specifically design it to be used with a specific machine. However, that is increasingly impractical, and it partially explains why the future of Solaris looks bleak, why IBM is switching focus from AIX to Linux, and why other traditional Unix companies are beginning to abandon their proprietary OSes.

      It's going to take a few more years and probably a 3.0 Linux kernel to complete the transformation of Unix back to its original open source roots (think AT&T giving source code away with the OS, and the original BSD project), but it's happening.

    5. Re:The SGI Altix is scaling to 256 cpus... by lisaparratt · · Score: 1

      Who do you think contributed the support to the 2.6 kernel?

    6. Re:The SGI Altix is scaling to 256 cpus... by nagora · · Score: 1
      Who do you think contributed the support to the 2.6 kernel?

      SCO?

      Joke, joke! *exit pursued by villagers with burning torches*

      TWW

      --
      "Encyclopedia" is to "Wikipedia" what "Library" is to "Some people at a bus stop"
    7. Re:The SGI Altix is scaling to 256 cpus... by Anonymous Coward · · Score: 0

      Red Hat and SuSE did a lot, as well as kernel developers just in their own time (Andrew Morton, Ingo Molnar, David Miller, Linus, to name a few).

      IBM did quite a lot.

      Intel did some.

      SGI didn't do much raw scalability work in 2.6. They did the code for their sn2 (altix) architecture, and quite a bit of ia64 work in general. They did some on NUMA memory allocation policies IIRC.

      But none of these companies suddenly decided to start using Linux when it was crap and magically made it good. They didn't say "hey this Linux is really crap, I bet we could spend a lot of money and make it better". They said "hey this Linux is pretty good, it looks like the only way out of Microsoft slavery. Let's do it".

      Or do you seriously want us to believe Intel, AMD, IBM, SGI, Sony, Toshiba, NEC, Fujitsu, Dell, etc. are all much stupider than you?

  50. Re:A little factoid for you by SunFan · · Score: 2, Funny


    They did try Windows Server 2003 on a 64-way machine, but the kernel got scared and hid under the disk controller.

    --
    -- Microsoft is the most expensive commodity operating system and office suite vendor in the marketplace.
  51. Re:A little factoid for you by bovinewasteproduct · · Score: 1

    Umm, no. The Itanium sucks at these kinds of tasks due to a long pipe line. Read this post for more info.

    But a 5 times speed increase for me running a machine with a load with ATA disks?

    If the pipeline clears/stalls are that bad (even with their massive L3 cache of 1.5MB to 9MB), it looks like the Itaniums are really only good for number crunching and not much else.

    BWP

  52. Re:A little factoid for you by bob+beta · · Score: 1

    It's a violation of the Microsoft License to compile the Linux kernel on Microsoft Visual C++.

  53. Re:A little factoid for you by Saint+Stephen · · Score: 1

    I was going to comment on this "continued to double" . It doesn't "continue to double." That would be 5*2^16, or exponential growth.

    This is 5*16. They said it wrong.

  54. Re:A little factoid for you by SunFan · · Score: 1


    Mmmm...a 128 CPU spam zombie...

    --
    -- Microsoft is the most expensive commodity operating system and office suite vendor in the marketplace.
  55. Re:A little factoid for you by jd · · Score: 1

    Because it's a hardware problem.

    --
    It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
  56. Re:A little factoid for you by AstroDrabb · · Score: 1
    It looks like the Itaniums are really only good for number crunching
    That is what the Itanium was made for. However, that is also why they failed. Xeon and now AMD Opterons give just about the same amount of "number crunching" at much less cost. Not many people wanted Itaniums. I think SGI is the only company that really made any decent Itanium servers and they were mostly for scientific processing. Real work gets done by regular AMD and Intel processors.
    --
    If Tyranny and Oppression come to this land,
    it will be in the guise of fighting a foreign enemy. -James Madison
  57. Re:A little factoid for you by Anonymous Coward · · Score: 0
    First of all, a 26x speedup is GOOD. [going from 1 to 64 processors]
    In other words: 40% efficiency.

    I guess the kernel compile system could use some work if this is a really important benchmark of system performance.
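
    The 40% figure is just speedup divided by processor count. A minimal sketch follows, which also estimates the serial fraction implied by that speedup using the Karp-Flatt metric. That estimate assumes the shortfall is explained entirely by serial work, which is a simplification; the function names are hypothetical.

```python
def parallel_efficiency(speedup: float, processors: int) -> float:
    """Fraction of ideal linear speedup actually achieved."""
    if processors < 1:
        raise ValueError("processors must be at least 1")
    return speedup / processors


def karp_flatt_serial_fraction(speedup: float, processors: int) -> float:
    """Experimentally determined serial fraction (Karp-Flatt metric)."""
    if processors < 2:
        raise ValueError("need at least 2 processors")
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return (1.0 / speedup - 1.0 / processors) / (1.0 - 1.0 / processors)


print(f"efficiency: {parallel_efficiency(26, 64):.1%}")
print(f"implied serial fraction: {karp_flatt_serial_fraction(26, 64):.1%}")
```

    By this estimate only a few percent of the compile needs to be serial to hold a 64-processor run to a 26x speedup.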
  58. Re:A little factoid for you by Anonymous Coward · · Score: 0

    MPEG4 to DiVx
    I don't think that part is really CPU intensive

  59. But wait...we hate Itanium! by ajp · · Score: 1

    I'm so confused. Itanium bad. Linux kernel scalability good. Help!
    ---
    Posted as me for the negative karma whoring.

    1. Re:But wait...we hate Itanium! by Anonymous Coward · · Score: 0

      You have a misconception.

      Very few people on Slashdot think that the Itanium 2 is a bad chip. The problem is that nobody wants it, and Intel is bleeding money because of it. It's a shitty business decision.

      Strictly as a chip, the Itanium kicks ass.

  60. The specs by jd · · Score: 1
    The second-fastest supercomputer is an Altix. The Altix uses the same "brick" structure as the Origin, where you bolt together pre-fabricated computing blocks. Essentially a cluster, rather than a N-way SMP system. The specs for the bricks say that the processor bricks are 16-way. A tad shy of the 512-way that AC boasted. :)


    True, you can build very large clusters from these bricks, but the bricks themselves don't scale beyond a relatively small number of CPUs.

    --
    It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
    1. Re:The specs by Anonymous Coward · · Score: 0

      Excuse me, moron.

      http://www.sgi.com/products/servers/altix/

      That link tells you SGI sell 256-way single system image systems (they are NOT clusters) to the general public.

      http://www.sgi.com/features/2004/oct/columbia/

      That one tells you SGI builds and runs 512-way single system image systems for some customers.

      I quote: "Kalpana system - the first 512-processor Linux® system ever to operate under a single Linux kernel"

      That's right, five hundred and twelve.

    2. Re:The specs by jd · · Score: 1

      512 processors, divided into 16-way bricks. Your point?

      --
      It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
    3. Re:The specs by Refried+Beans · · Score: 1

      First, we call it 512p. "p" as in Processor. X-way is an IBM term. We know they aren't SMP systems; they are NUMA systems. SGI's NUMA isn't Non-Uniform, it's Near-Uniform.

      Now, if you ever log into one of our Altix or Origin systems, you'll see all 512p available to you when you cat /proc/cpuinfo. You'll see them all in the boot messages. You can't address all of the nodes as separate systems because they aren't separate systems. It's ONE system with 512 processors and direct access to ALL of the memory.
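
      As a rough illustration of that check, here is a minimal sketch that counts processors from the text of /proc/cpuinfo. It assumes the common layout where each logical CPU gets a line whose key is exactly `processor` (as on x86 and ia64 Linux); other architectures format the file differently, and the function name is hypothetical.

```python
def count_processors(cpuinfo_text: str) -> int:
    """Count 'processor : N' entries in the contents of /proc/cpuinfo."""
    count = 0
    for line in cpuinfo_text.splitlines():
        key, sep, _value = line.partition(":")
        # Match the key exactly so lines such as 'model name' are ignored.
        if sep and key.strip() == "processor":
            count += 1
    return count
```

      On a Linux box this would be called as `count_processors(open("/proc/cpuinfo").read())`; on a single-system-image machine the count covers every processor the one kernel manages.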

    4. Re:The specs by Anonymous Coward · · Score: 0

      32 16-way bricks? Running a single Linux Kernel? You're nothing but a clueless troll.

    5. Re:The specs by jd · · Score: 1
      The University of Manchester in Britain was building such architectures in the 1970s, so I'm quite familiar with them. I'm glad you state quite clearly that they are NOT SMP, because that clearly refutes 99.9% of all the contradictions to my posts. Since you imply that you are actually from SGI, I'll take it that my assertion is therefore correct.


      Yes, NUMA does mean that you have direct access to all memory and all other resources. It's an ingenious design which handles the inevitable skew you get from having a physically large array. I'm not doubting or contradicting any of that.


      However, I believe my assertion still stands that the low-level microscale behaviour within a single brick cannot be identical to the higher-level macroscale behaviour of a network, however fast or sophisticated. The nature of the problems is different, so the nature of the solutions must also be different.


      If, at some level, there is a distinction, then even if everything is visible from all points, there is a fundamental difference between "local" and "global" connections, even if the OS can abstract out such distinctions.

      --
      It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
    6. Re:The specs by Anonymous Coward · · Score: 0

      Nice try at a recovery. But don't forget this tangent started with your claim:

      The Altix uses the same "brick" structure as the Origin, where you bolt together pre-fabricated computing blocks. Essentially a cluster, rather than a N-way SMP system.

      The Altix is NOT a cluster. Full stop.

      Depending on who you talk to, there are varying defined criteria that a cluster satisfies. The Altix satisfies NONE of these. It does not use commodity hardware, it does not use a general purpose network or network protocol, it does have cache coherent globally addressable memory.

    7. Re:The specs by Anonymous Coward · · Score: 0

      Just one thing I'd like to point out: the world's 2nd fastest supercomputer is a cluster... of 20 512-way Altix ;) They're calling it a 'super cluster', but a single Altix system will perform much more like an n-way SMP...

  61. Read my lips by Chatz · · Score: 5, Interesting

    Linux scaling to 512 processors:
    http://www.sgi.com/features/2004/oct/columbia/

    The story should be HP has finally caught up to where SGI were 2 years ago.

    --
    There is folly and foolishness on the one side, and daring and calculation on the other. - Admiral Pellew, Hornblower
    1. Re:Read my lips by hackstraw · · Score: 2, Interesting

      I've heard through the grapevine that the mods to the linux kernel have stability issues.

      I am someone who might be in the market for a SGI Altix or XD1, but a very parallel broken box does not scale that well in my opinion.

  62. Re:i know this is OT, but I can't find it anywhere by Anonymous Coward · · Score: 0

    PF isn't really any better than iptables as far as I know. Lots of (openbsd) people like the syntax better... but if you can't handle iptables syntax, you shouldn't be administering a complex firewall in the first place.

  63. Ready to tackle the future by theendlessnow · · Score: 1

    Support for today's problems and the future DRM problems of tomorrow.

  64. -1 Wrong by Anonymous Coward · · Score: 1

    It also doesn't avoid the main point, which is that any given resource can only be used by one CPU at a time. If processor A on brick B is passing data along wire C, then wire C cannot be handling traffic for any other processor at the same time. That resource is claimed, for that time.

    While it's true that you can only send one signal down a wire at "a time" (absent weird frequency stuff, although the wires are bidirectional, so you can really send two signals), "a time" in these systems is on the order of nanoseconds. So while only one CPU can use a wire in any given nanosecond, hundreds of CPUs can use the same wire within the same millisecond, which is close enough to "at the same time" to work as "at the same time", so you can have multiple streams of traffic using the same physical connection.
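
    To put rough numbers on that, here is a back-of-the-envelope sketch. The 1 ns slot length and the even split between CPUs are assumptions for illustration, not figures from any real interconnect, and `slots_per_cpu` is a made-up helper name.

```python
# Rough arithmetic for time-sharing a single link.
# Assumption (illustrative only): each transfer holds the wire for exactly 1 ns
# and the link is divided evenly among all CPUs.
SLOT_NS = 1
WINDOW_NS = 1_000_000  # one millisecond, expressed in nanoseconds


def slots_per_cpu(num_cpus, window_ns=WINDOW_NS, slot_ns=SLOT_NS):
    """Return how many whole slots each CPU gets within the window."""
    total_slots = window_ns // slot_ns
    return total_slots // num_cpus


if __name__ == "__main__":
    for cpus in (1, 64, 512):
        print(cpus, "CPUs ->", slots_per_cpu(cpus), "slots each per millisecond")
```

    Even with 512 CPUs taking turns, each one still gets well over a thousand turns per millisecond under these assumptions, which is the sense in which the sharing looks simultaneous.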

    The only resource a CPU locks on is an exclusively owned (writable) cache line. CPUs share access to I/O space, and share access to cache lines that are read-only. CPUs can talk to "local" memory (on the same node) or memory on a node on the opposite side of the system, in an identical manner except for access latency (i.e. the address for a particular piece of memory is the same no matter which CPU is addressing it).

  65. You're an idiot. by Anonymous Coward · · Score: 0

    How does how many CPUs are in a brick have anything to do with whether it's an N-Way SMP system? A brick is just a physical box. The interconnect that connects the processors together extends over multiple bricks. The bricks just provide modularization - you could put all 64 CPUs in one brick if you wanted to, but the only difference would be cosmetic (additional pieces of metal between boards).

    Do you really think anyone is building single boxes with 512 processors in them? These things come in *RACKS*.

    1. Re:You're an idiot. by jd · · Score: 1
      I would absolutely hate to be your Comp Sci professor, because I hate failing people. But I'd give you bad marks for a lousy attitude, for a start, and worse marks for not knowing the difference between SIMD, MISD and MIMD.


      Or maybe it's because I was building transputer arrays before you learned what a computer was. (For that matter, I've probably been programming clusters for longer than many Slashdotters have been alive.)


      Am I arrogant because of that? Maybe, but I'd much rather express what I know. Do I have an elitist attitude that looks down with contempt on those whose "contribution" to a discussion is to slam others? Especially behind the cloak of anonymity? Oh, definitely.


      ACs have a place. They are very useful for expressing things where it would be inconvenient or harmful for them to be associated with the speaker. Using them to troll is somewhere between pitiful and pathetic.

      --
      It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
  66. +0.5 by jd · · Score: 1
    True, to an extent. Yes, you control the wire for a nanosecond, and therefore many processors can share that one wire. In theory. In practice, as I've said elsewhere, that means running all the processors out of step. That's a hellish problem in design, so most designs don't do that. They test to see if the line is busy, and if it isn't, they use it.


    To maximize resources to the absolute limit, you'd need a completely asynchronous computer. Such computers exist, sure, but they're usually very specialized and I know of none that are superscalar.


    I'm not sure of the state-of-the-art for massively parallel asynchronous CPUs, but my guess is that they're nowhere near the same level as more traditional synchronous designs.

    --
    It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
  67. 11?!? by A+nonymous+Coward · · Score: 2, Interesting

    My kernel only goes up to 11.

    How'd you get a three processor system? Is it a quad board, discounted heavily because one socket was broken? That'd be neat, where'd you get it?

    1. Re:11?!? by pmjordan · · Score: 1

      A 2-socket motherboard, one single-core CPU, one dual-core CPU, perhaps? According to AMD, that kind of stuff is going to be possible with the upcoming multi-core Opterons.

      ~phil

    2. Re:11?!? by Anonymous Coward · · Score: 0

      Spinal Tap...

  68. Re:Interesting. Almost exactly a year ago... by Anonymous Coward · · Score: 0

    Way to quote a fucking article from a year ago, douchenozzle...

    Where do you think all this NUMA awareness came from? Sequent Engineers, that's where. Where do you think they are now?

  69. Question by BriniestMark · · Score: 1
    Just out of curiosity, what kind of applications are enterprises running with the multiprocessor monstrosities? What applications benefit from this kind of thing?

    I was under the impression that enterprise applications were normally limited by the speed of the hard-drive and RAM, applications like webserving and database management.

    --
    You see that brine there? That's my brine.
    1. Re:Question by Belfy · · Score: 1

      Databases are often run on these systems, often with multiple independent links to a large high-performance disk array.

      ______________
      Do YOU have a database measured in TeraBytes in YOUR basement? Yeah, me either. Not yet.

    2. Re:Question by BriniestMark · · Score: 1

      There are seriously disk arrays so fast that it takes 64 processors or more to handle the data? I take it that ain't no ATA bus we're talking about.

      --
      You see that brine there? That's my brine.
  70. 19 minutes??? by Anonymous Coward · · Score: 0

    a kernel compile using a single Itanium 2 processor took about 19 minutes

    And a kernel compile on a four way PPro 200 MHz took about two and a half minutes. Ok, that was a 2.2 kernel, where they probably used a 2.6 kernel, so that may account for a bit of the extra time, but still, 19 minutes? No wonder they need a 64-way box to make Itaniums do anything serious.

  71. Re:A little factoid for you by pointwood · · Score: 1

    Just a little note: A version of bzip does exist that scales linearly on SMP machines - you can find it here.

  72. It's a joke by Anonymous Coward · · Score: 0

    I take it you've never seen This Is Spinal Tap.

  73. Re:i know this is OT, but I can't find it anywhere by randallpowell · · Score: 1

    Get Firestarter. It's a GUI for iptables. Best thing to do is figure out which ports need to be blocked and write a bash script so iptables can block those, allow others, etc. Instant firewall, assuming you won't change it much (home use).

  74. Slashdot hype and RTFA by Donny+Smith · · Score: 1

    The fucking news and the fucking article itself are misleading.

    > A 64-way system may or may not be useful. It depends on the speed of the interconnects, and the way it handles bus locking.

    Of course it IS useful. It is great for database consolidation (especially for SQL Server which practically doesn't scale horizontally), for example, as upgrades can be done in minutes and the whole goddamned thing is as stable as an Intel box can be.
    And in case you missed what the FA said, they did NOT run an OS on 64 CPUs (that's why it's bullshit and misleading) but they partitioned those 64 CPUs into 16 four-way servers. But hey - this is Slashdot and any Linux related hype is welcome....

    > So, sure, there are people who could use such a system, but I cannot imagine many of them are in the market.

    Sorry, pal, but HP sold $1b of such boxes in 2004. Manufacturing, telcos, utilities and many other users need "boxen" like these. I think they're slightly more suitable for Windows because of the way it can "add" (allocate, actually) processors to Exchange and SQL Server systems.

    1. Re:Slashdot hype and RTFA by jd · · Score: 1

      I don't quite see what the speed of interconnects or bus-locking has to do with "Linux hype", Linux in general, or any other OS. Either it has the speed or it doesn't. If it doesn't, the architecture won't scale. (Amdahl's Law.)

      --
      It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
    2. Re:Slashdot hype and RTFA by Anonymous Coward · · Score: 0

      If the architecture scales and the operating system doesn't, the system doesn't scale. The system does scale therefore the operating system does.

    3. Re:Slashdot hype and RTFA by Donny+Smith · · Score: 1

      > I don't quite see what the speed of interconnects or bus-locking has to do with "Linux hype", Linux in general, or any other OS.

      It doesn't - that's the reason I was pissed off by the fact that suddenly "Linux scales" (while other OS, presumably, perform like shit on the same hardware).

      And the article was misleading in the sense that it indicated how Linux scales well on 64-way SMP systems (perhaps it indeed does but in this case the box was partitioned). Bunch of bullshit.

    4. Re:Slashdot hype and RTFA by Anonymous Coward · · Score: 0

      It doesn't "suddenly" scale. It has scaled for a long time. SGI report good scalability of unpatched 2.6 kernels on their 512 CPU systems.

      The point is there are still people like you who don't believe this, which is probably why they're making this effort.

      Of course, in your case it doesn't matter, since you aren't going to have anything to do with such purchasing decisions anyway.

  75. Parallelizing a C compilation by peterpi · · Score: 1
    Compiling the kernel didn't scale quite so well, but that was because it involves intermittent serial processing by a single processor.

    Really? I would have thought that the compilation of loads and loads of .c files is exactly the sort of thing that could be shared among processors. It certainly has been on projects that I've worked on.

    make -j (num of processors) ?
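
    For what it's worth, the "intermittent serial processing" part is what Amdahl's Law is about: even if every .c file compiles in parallel, the steps that must run on one CPU put a ceiling on the speedup. A minimal sketch follows. The 5% serial fraction is a made-up number for illustration, not a measurement of any kernel build, and `amdahl_speedup` is just a name picked here.

```python
# Amdahl's Law: speedup of a job on N CPUs when a fixed fraction of the
# work cannot be parallelized.
def amdahl_speedup(serial_fraction, num_cpus):
    """Return the ideal speedup for the given serial fraction and CPU count."""
    parallel_fraction = 1.0 - serial_fraction
    return 1.0 / (serial_fraction + parallel_fraction / num_cpus)


if __name__ == "__main__":
    # Hypothetical build where 5% of the wall-clock work is serial.
    for cpus in (1, 4, 16, 64):
        print(cpus, "CPUs ->", round(amdahl_speedup(0.05, cpus), 2), "x")
```

    With that assumed 5% serial share, 64 CPUs give only around a 15x speedup, and no number of CPUs can push it past 20x. So a build can be spread with make -j and still scale worse than a benchmark that is almost entirely parallel.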

  76. Re:A little factoid for you by Anonymous Coward · · Score: 0

    The interconnects needed are not 64! (64x63x...x2x1).

    They are only 63+...+1 = (63*64)/2 = 32*63 = 2016.
    The first must connect to the other 63, but the second has to connect to only 62 (it is already connected to the first), the third to only 61, and so on...
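
    A quick way to check that arithmetic, as a small sketch (the function name `full_mesh_links` is just chosen here for illustration):

```python
import math


def full_mesh_links(n):
    """Number of point-to-point links needed to connect every pair of n nodes."""
    return n * (n - 1) // 2


if __name__ == "__main__":
    n = 64
    print("pairwise links:", full_mesh_links(n))
    # Same count via the running sum 63 + 62 + ... + 1.
    print("running sum:   ", sum(range(1, n)))
    # 64 factorial, for comparison with the figure being disputed.
    print("digits in 64!: ", len(str(math.factorial(n))))
```

    The pairwise count is the number of ways to pick 2 nodes out of 64, which grows with the square of the node count rather than factorially.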

  77. Re:A little factoid for you by Anonymous Coward · · Score: 0

    In consequence, sharing resources is a very difficult problem. It is not the only problem that many-way systems face, though. If you have N processors, there are !N possible ways for those processors to communicate. In this case, it would be !64 (64x63x62x...x2x1), which is a horribly large number. You couldn't have one link per pathway, for example, which means you've got to share links

    Ever heard of a crossbar switch, Einstein?

    A 64 node system could be connected using 64 links, and a 64x64 crossbar switch. There is no benefit to anything higher.

    Your figure of 64 factorial (it is 64!, by the way, not !64) is ridiculous, and is nothing more than a wild guess.

  78. Re:A little factoid for you by elmegil · · Score: 1

    Not a single one of which was a standard benchmark, which leads me to believe that they were manufactured to be linear. Woo hoo.

    --
    7 November 2006: The day Americans realized corruption and incompetence weren't addressing 11 September 2001
  79. Explaining jokes by A+nonymous+Coward · · Score: 1

    Good gosh, slashdot is really going to pieces. Two people explain Spinal Tap to me, another comes up with a possibly real, possibly tongue-in-cheek answer, and, worst of all, someone mods me up as "insightful". What do I have to do, add footnotes and explanations?

    I guess the two Spinal Tap explainers never heard the joke about only 10 people in the world. No, I'm not going to explain that.

    This is pathetic. Insightful, jeez. Now watch someone mod this as flamebait or funny.

  80. it runs on a c-64 now? by Anonymous Coward · · Score: 0

    Did I read that correctly, they've got Linux working way good on a c-64?? ;)

  81. Re:A little factoid for you by Ninja+Programmer · · Score: 1

    STREAM is not a real world app. It's just a massively parallel vector copy/sum/dot-product. Every SMP kernel in the world (even the older Linuxes) should be able to scale STREAM perfectly well.

    The HPL results are more impressive, but keep in mind that linear equation solving code has advanced quite considerably (i.e., it tends to behave a lot like STREAM) to the point that it's not very limited by kernel behavior.

  82. YOU ARE A MORON! by Anonymous Coward · · Score: 0

    Do you know what an ethernet switch is? And why it's better than a hub? You're assuming that the resource management on these systems works like a hub. It doesn't - it works like a switch. The *ONLY* place that a CPU shares resources with other CPUs is on the processor bus - 2-4 CPUs share that, *EXACTLY* the same as in a cluster of Xeons or any other dual-CPU box. Once you get past the processor bus, everything is buffered. The CPU sends out whatever data request it has and off it goes. The interconnect takes care of making sure the wires are used appropriately, the CPU doesn't have to worry about it.

    Now, yes, it's possible that a CPU needs something from memory or IO and it has to wait for it to come back, but EXACTLY the same thing would happen in a CPU in a cluster as well.

    You very simply have absolutely no clue what you're talking about - a node in one of these huge systems functions pretty much identically to one box in a cluster - it is architecturally the same. You don't add processors by sticking them on the same processor bus, you add processors by adding more nodes, each with their own memory and IO, and having a REALLY FAST interconnect between them, and an OS where everything is one system image.

    Cluster: Many distinct computers networked together. Supercomputer: Many distinct nodes networked into one computer.

  83. Re:A little factoid for you by Anonymous Coward · · Score: 0

    the long pipeline

    You call a 7-stage pipeline long ? You don't know anything about Itanium, do you ?

  84. You are not informed enough to be insulting by jd · · Score: 1
    Supercomputer: Many distinct nodes networked into one computer.


    Like I've said, I've used transputers. Let me know when you find a distinct node in an array. You can't? Oh dear.


    Seymour Cray, for many years, resisted multi-processor computers. Most of his designs were monolithic, on the grounds that a good design doesn't need to be MP. I guess that means that his designs weren't supercomputers, then. No? They were? Oh.


    I guess that the only conclusion is the one in the Princess Bride - "You keep using that word. I do not think it means what you think it means".

    --
    It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
    1. Re:You are not informed enough to be insulting by Anonymous Coward · · Score: 0

      I don't necessarily agree with the parent poster, but times have changed, old boy. A supercomputer is no longer a monolithic system and hasn't been for probably a decade.

      That said, some circles consider a cluster to be a supercomputer too.

      Notice how Cray's new supercomputers are big clusters of Opterons?

  85. Developers by phorm · · Score: 1

    I find it interesting how well developed this is. I mean, how many linux coders actually have access to such hardware for testing/development purposes? Many of the larger projects can have a huge base of devs from within the userbase supplying patches/fixes/upgrades. I'm guessing that the userbase for the system described isn't very high (much less so for those able to muck with running kernels on such).

    Or perhaps most of it just scales up very nicely from smaller systems?

  86. Re:The "Tux" Got Me Fired, Guys by Anonymous Coward · · Score: 0

    "Well, now I'm unemployed just like you all"

    +1 insightful