Slashdot Mirror


Mini-ITX Clustering

NormalVisual writes "Add this cluster to the list of fun stuff you can do with those tiny little Mini-ITX motherboards. I especially like the bit about the peak 200W power dissipation. Look Ma, no fans!! You may now begin with the obligatory Beowulf comments...."

34 of 348 comments

  1. Floating point performance by October_30th · · Score: 5, Interesting
    I thought about this some time ago.

    I decided against a mini-ITX cluster because the floating point performance (why else would you build a cluster?) of VIA CPUs is just abysmal.

    Is there any reason why there are no P4 or AMD mini-ITX mobos around?

    --
    The owls are not what they seem
    1. Re:Floating point performance by wed128 · · Score: 3, Interesting

      I would imagine they run too hot for such a small form factor... this is just a guess, so treat it as such.

    2. Re:Floating point performance by Anonymous Coward · · Score: 1, Interesting

      The power consumption of the desktop P4/AMD chips would kind of defeat the purpose of building one from these.

    3. Re:Floating point performance by 0x1337 · · Score: 2, Interesting

      The reason you don't see any Mini-ITX mobos built around the Athlon is power consumption. I recently built a mini-ATX computer around a T-Bird (1 GHz; I should have picked something less of an oven), and the mini-ATX power supply crapped out on me, making me buy a REAL ATX power supply. Gah, I still can't find a 300-watt mini-ATX supply.

      Btw, you're wrong - there ARE P4-based mini-ITX mobos.

    4. Re:Floating point performance by October_30th · · Score: 2, Interesting
      Sounds excellent.

      In fact, a Pentium M platform would be a perfect choice as long as the mobile Athlon mobos are impossible to find.

      Does anyone have a link?

      --
      The owls are not what they seem
    5. Re:Floating point performance by a20vertigo · · Score: 2, Interesting

      By Sandra's floating-point benchmark, the 1 GHz VIA Ezra CPU couldn't match my old AMD K6/2-550... and that wasn't even that fast of a chip! Or that hot of one, either.

      --
      No matter where you go, there you are; even before you arrive.
    6. Re:Floating point performance by steveha · · Score: 5, Interesting

      the floating point performance [...] of VIA CPUs is just abysmal.

      Older C3 cores run the FPU at half the clock rate. If you get the fanless 600 MHz EPIA motherboard, the FPU will be running at 300 MHz.

      The newer, Nehemiah core C3 chips run the FPU at full clock speed. Any C3 newer than Nehemiah should run the FPU at full speed.

      He used the VIA EPIA V8000A motherboard with an Eden core CPU. From what I found on Google (here), the Eden core does run the FPU at full clock speed.

      In any event, he said the cluster has more processing power than a four-P4 SMP system, while taking less electricity to run. And it will be quieter and more reliable. I'd like to see actual benchmarks, but it seems like it makes enough sense.

      I read about a cluster of PocketPCs, and that didn't make practical sense. It was just a fun project.

      steveha

      --
      lf(1): it's like ls(1) but sorts filenames by extension, tersely
    7. Re:Floating point performance by dabadab · · Score: 3, Interesting

      There's one thing that makes VIA CPUs very interesting performance-wise: the xcrypt instruction. Using it, the VIA CPUs just beat - and beat badly - anything else in certain tasks.

      Check out Theo de Raadt's little benchmark:
      http://marc.theaimsgroup.com/?l=openbsd-misc&m=107577297024182&w=2

      --
      Real life is overrated.
    8. Re:Floating point performance by mangu · · Score: 4, Interesting
      The floating point is just a convenience. Almost any algorithm can be modified to work with fixed point precision -- and without loss of performance.


      But at a significantly higher development and debugging cost. Why go for integer adaptation, if a P4 can do four FP operations in one clock, using SSE2? I have tested my 2.4 GHz P4 at 6 gigaflops, in a practical application doing matrix inversion. The theoretical maximum for my machine would be 9.6 Gflops. If you RTFA, you'll see they mention 3.6 Gflops performance for their cluster, about 60% of my single-processor system. I see no point at all in building that cluster.
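
A rough check of the figures in the comment above, as a minimal C sketch. It takes the poster's numbers as stated (2.4 GHz clock, four FP operations per clock, 6 Gflops measured, 3.6 Gflops quoted for the cluster); the function names are made up for illustration.

```c
#include <stdio.h>

/* Theoretical peak in Gflops: clock rate (GHz) times FP operations per clock.
 * Both inputs are figures quoted in the thread, not measurements made here. */
double peak_gflops(double clock_ghz, int flops_per_clock)
{
    return clock_ghz * flops_per_clock;
}

/* Fraction of one figure relative to another, e.g. measured vs. peak. */
double ratio(double part, double whole)
{
    return part / whole;
}

/* Prints the comparison using the numbers from the comment. */
void print_comparison(void)
{
    double peak = peak_gflops(2.4, 4); /* 2.4 GHz P4, 4 FP ops per clock */
    double measured = 6.0;             /* poster's measured Gflops */
    double cluster = 3.6;              /* cluster figure quoted from the article */

    printf("P4 theoretical peak:   %.1f Gflops\n", peak);
    printf("P4 measured / peak:    %.1f%%\n", 100.0 * ratio(measured, peak));
    printf("cluster / P4 measured: %.1f%%\n", 100.0 * ratio(cluster, measured));
}
```

On these numbers the measured 6 Gflops is 62.5% of the 9.6 Gflops peak, and the cluster's 3.6 Gflops is 60% of the measured single-P4 figure, which matches the comment.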

    9. Re:Floating point performance by Anonymous Coward · · Score: 1, Interesting

      The theoretical maximum for my machine would be 9.6 Gflops.

      I'm interested. Do you have to use assembly to get this, or can you plunk down some C code that reaches this?

    10. Re:Floating point performance by mi · · Score: 2, Interesting

      Mars is not made any closer to Earth by the revelation, that Alpha Centauri is really far...

      How about scientific computing?

      This is why you might need the FP performance. I was answering a totally different question -- what would you do without the good floating point performance.

      The other stuff shows your lack of knowledge of other disciplines by the fact that you think these are computationally expensive tasks.

      Thank you, thank you.

      Would you, please, demonstrate, how I can rebuild a project of 3000+ files, modified by 100+ developers (ccache helps, but still)? Or compress a 32 GB database dump? Granted, these tasks are nothing compared with, say, protein folding, but they are computationally expensive still.

      --
      In Soviet Washington the swamp drains you.
    11. Re:Floating point performance by QuantumFTL · · Score: 2, Interesting
      Ugh. I was hoping to avoid a long discourse on the theory of computational arithmetic; however, you leave me no choice.

      Point by point:

      You did not get it. You are looking at the bit (and byte, and word) as a number. I suggest you look at it as a unit of information. With 64 bits you can only have 2^64 distinct possibilities. If you choose to treat them as numbers -- fine, you only have 2^64 distinct numbers.

      Okay, FYI, I do remember first-year discrete math, I have experience with the inner details of computer architectures, I understand two's complement representation, the IEEE floating-point format, how ALUs work for integer operations, etc. I have read Shannon's information theory paper, etc. What you are saying doesn't change the reality of the situation though, because you are looking at numbers without regard for the staggering dynamic range in real-life scientific computations.

      You may split the 64 bits to use some of them to represent the mantissa and some as the exponent. Or -- use all of them to represent the integer number of the smallest units in your application's domain.

      Yes and both approaches have very different merits. For most numerically intensive programs, there are very specific requirements for numerical precision. All numerical programs are necessarily approximations to infinite precision arithmetic, and the following constraints are the norm:
      • Additions/Subtractions of similar magnitudes must be accurate
      • Multiplications/Divisions of very different magnitudes must be accurate.
      • Square roots, trigonometric operations, and exponentiation must be accurate.

      Now if you examine fixed point arithmetic, you will find that the first criterion is met, however the second and third are not met nearly as well as they are in FP of the same number of bits.

      The nature of scientific operations is such that it makes sense to think about numerical calculations in terms of significant digits, due to the nature of error propagation in numerical arithmetic.

      The second method, actually, gives you better precision in a controllable (by you) fashion. If the difference between the smallest and the biggest quantity of those minimal units in your application exceeds 64 orders of binary magnitude, then 64 bits is not enough for you -- regardless of whether you use floating or fixed point. You either lose precision (FP) or overflow (int).

      It most certainly does not give better precision for the same number of bits used. Think of FP as a lossy compression algorithm. It allows the use of orders of magnitude fewer bits because it alters the density distribution of the representable numbers to meet the above specifications.

      Also you should note that many applications do not have a "basic unit". For instance, what is the "basic unit" of length? What if your "basic unit" of length is of a radically different exponent than your "basic unit" of energy or "basic unit" of time? In physics applications these basic units are near infinitesimal... we're talking 10^-51 or smaller! Then add to that that astrophysics simulations tend to work on scales that are 10 orders of magnitude greater than 1, and you're talking about a dynamic range that is clearly ridiculous!

      The reason to use FP may be because it is more convenient to think in terms of standard units, rather than the minimal units of the application (its precision). Also, many CPUs have special features allowing to do FP computations really quickly. But it is possible to go without them.

      The problem here is that you are thinking in terms of absolute error. In most cases it is the *relative* error that is important, not the absolute. Because of the exponential notation, relative error is minimized for any given number of bits used to represent numbers.

      Another issue that you fail to mention is that integer overflow is ridiculously easy to run into when using several multiplications in a row.
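
The relative-error and overflow points above can be illustrated with a small C sketch. It assumes a signed Q16.16 fixed-point format (16 integer bits, 16 fraction bits), which is one common choice and not something either poster specified, and compares it with a 32-bit float, so both use the same number of bits. All names are hypothetical.

```c
#include <stdint.h>

#define FRAC_BITS 16
#define ONE (1 << FRAC_BITS)    /* 1.0 in Q16.16 */

typedef int32_t q16_16;         /* hypothetical fixed-point type */

/* Convert to Q16.16 with round-to-nearest; assumes |x| < 32768. */
q16_16 to_fixed(double x)
{
    return (q16_16)(x * ONE + (x >= 0 ? 0.5 : -0.5));
}

double from_fixed(q16_16 q)
{
    return (double)q / ONE;
}

static double abs_d(double d)
{
    return d < 0 ? -d : d;
}

/* Relative error after a round trip through Q16.16. */
double fixed_rel_error(double x)
{
    return abs_d(from_fixed(to_fixed(x)) - x) / abs_d(x);
}

/* Relative error after a round trip through a 32-bit float. */
double float_rel_error(double x)
{
    return abs_d((double)(float)x - x) / abs_d(x);
}

/* Returns 1 if the product a * b does not fit in Q16.16, else 0. */
int fixed_mul_overflows(q16_16 a, q16_16 b)
{
    int64_t wide = (int64_t)a * (int64_t)b / ONE;
    return wide > INT32_MAX || wide < INT32_MIN;
}

/* Multiply two Q16.16 values; check fixed_mul_overflows() first. */
q16_16 fixed_mul(q16_16 a, q16_16 b)
{
    return (q16_16)((int64_t)a * (int64_t)b / ONE);
}
```

With these definitions, 0.0001 round-trips through Q16.16 with a relative error of several percent, while the 32-bit float keeps it below one part in a million. A single multiplication of 300.0 by 300.0 already overflows the fixed-point range.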

    12. Re:Floating point performance by mangu · · Score: 2, Interesting
      some matrices can be inverted with nothing more than a transpose


      No, I tested it with a random matrix, as in

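      #include <stdio.h>
      #include <stdlib.h>
      #include <sys/time.h>

      #define N 1000 /* matrix dimension; assumed here, the post does not give it */

      /* Fortran BLAS SGEMM: C = alfa * op(A) * op(B) + beta * C */
      extern void sgemm_(const char *transa, const char *transb, const int *m,
                         const int *n, const int *k, const float *alpha,
                         const float *a, const int *lda, const float *b,
                         const int *ldb, const float *beta, float *c,
                         const int *ldc);

      void matmuls(void)
      {
          int i, c1 = N;
          char opa = 'N', opb = 'T';
          float alfa = 1.0f, beta = 0.0f, *AT, *BT, *CT;
          struct timeval t0, t1;
          double secs;

          AT = calloc(N * N, sizeof(float));
          BT = calloc(N * N, sizeof(float));
          CT = calloc(N * N, sizeof(float));
          if (!AT || !BT || !CT)
              exit(EXIT_FAILURE);

          /* Fill A and B with random values in [0, 1]. */
          for (i = 0; i < N * N; i++) {
              AT[i] = (float)rand() / (float)RAND_MAX;
              BT[i] = (float)rand() / (float)RAND_MAX;
          }

          /* Time only the SGEMM call. */
          gettimeofday(&t0, NULL);
          sgemm_(&opa, &opb, &c1, &c1, &c1, &alfa, AT, &c1, BT, &c1, &beta, CT, &c1);
          gettimeofday(&t1, NULL);

          /* An N x N multiply costs about 2 * N^3 floating-point operations. */
          secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6;
          printf("%.3f s, %.2f Gflops\n", secs, 2.0 * N * N * N / secs / 1e9);

          free(AT);
          free(BT);
          free(CT);
      }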
    13. Re:Floating point performance by mi · · Score: 2, Interesting
      Think of FP as a lossy compression algorithm. It allows the use of orders of magnitude fewer bits because it alters the density distribution of the representable numbers to meet the above specifications.

      This makes sense. With integers the density is uniform, which is an impediment in some cases, but a help in others. [Any attempt to quantify the number of cases in each group is silly and will reveal nothing but the attempter's personal bias. With my bias, I'll insist you are underestimating the number of cases where such a uniform distribution of density is useful and desirable.]

      However, unless you carefully choose the basic unit, you don't have control over the precision distribution. If most of your computations involve quantities on the far edges of the range (you claimed 20 orders of (decimal?) magnitude) -- you are less precise than you may realize and the (careful) use of integers may improve your results.

      Also you should note that many applications do not have a "basic unit".

      Of course, they all have basic units! Usually, it will depend on the application's desired precision.

      For instance, what is the "basic unit" of length?

      Depends on the application. In yours, it is, probably, some fraction of light year.

      What if your "basic unit" of length is of a radically different exponent than your "basic unit" of energy or "basic unit" of time?

      Who cares? Even if my program operates internally on units as horrible as, say, "pounds per square inch" (a.k.a. PSI) -- so be it. If 4.5 Newtons is the basic unit of force I want and 0.025 meter is as precise as I want the length to be -- fine.

      I'll leave the problem of how many bits it takes to achieve multiplication of four numbers of values from 0 to 10^30, base unit 1.

      Wait, we started with 20 orders of magnitude. Is it 30 now? Fine, 128-bit integers (long long) will be able to store that. But I don't believe tasks with such wide-ranging amounts of the same thing are commonplace (my bias?). Whether you are using floating or fixed point, you are not going to do this easily -- you'll risk losing precision dramatically, or overflowing. Whichever it is, it is probably better to consider modifying the algorithm.

      Also, the modern processors (at least -- Intel's) "cheat". Their FPU's internal precision is 80 bit by default (64 significant bits) -- if I'm reading ``icc -help'' output correctly. So they can "promote" the numbers to higher precision when applying precision-losing operations. So, floating point might win. :-)

      In most cases it is the *relative* error that is important, not the absolute.

      Very valid point. However, sometimes (often?), by carefully picking the basic unit and the number of bits, it is possible to avoid all computational imprecision, simply by having more bits left at your disposal, whereas the blind use of floating point will mask it and further compound all other sources of error (measurements, estimates, &c.).

      And I don't even want to think about fixed point division by numbers very close to 0.

      You don't need to think about it, because 1 is as close as you can get to zero with integers...

      it requires that the basic unit be no larger than the smallest representable number in the floating point system

      No, it just has to match the smallest quantity reasonably needed by the application -- something it'll never need a half of. And I urge you to pick such units carefully even if you stick with floating point, because otherwise even the smallest number in the floating point system might not be small enough at some point, and you will waste a few teracents of taxpayers' money :-)

      That being said, I don't think, anyone else reads this, but us. It is hard to justify continuing the thread. Thanks for your input!

      --
      In Soviet Washington the swamp drains you.
  2. Seriously, though... by Short Circuit · · Score: 5, Interesting

    All things considered, what's the cost-per-tflop of that sort of system? These guys don't require as much cooling, space, or whatever else you care to think about.

    Has anyone tried stuffing several into a single 1U chassis? For a sort of cluster of clusters?

    1. Re:Seriously, though... by drinkypoo · · Score: 4, Interesting

      You could get (maybe) 2-4 boards into a deep 1U box. It would be better to use a ~6U box and put lots of them on their sides. You could make a 12" deep 6U with probably 18 or so of these things in it, without having to have cables coming out the front AND back of each box.

      --
      "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
    2. Re:Seriously, though... by bhtooefr · · Score: 3, Interesting

      A SIX U? No way do you need that much. As long as you're careful, a 4U gives you PLENTY of space. Giving 3" per board, you can put 6 boards wide. Allowing .5" between boards front to back, you need at least 14" deep. So, a 14" deep 4U will fit 12 of these. Make it 5U if you'd feel more comfortable that way, as these puppies don't put out much heat when there's just one, but when there's 12? Cool 12 like you would a single P4.
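
The packing arithmetic above can be checked with a short C sketch. The board edge (6.7 in, i.e. 170 mm) and the rack unit height (1.75 in) are the standard Mini-ITX and rack figures; the 3 in slot per board, 0.5 in gap, and 14 in depth are the poster's. The 18 in usable width is an assumption implied by "6 boards wide", and a real chassis interior may be narrower. Function names are made up for illustration.

```c
#define RACK_UNIT_IN 1.75 /* height of 1U in inches */
#define MINI_ITX_IN  6.7  /* Mini-ITX board edge in inches (170 mm) */

/* Outer height of a chassis of the given number of rack units. */
double chassis_height_in(int units)
{
    return units * RACK_UNIT_IN;
}

/* 1 if a board standing on its edge fits within the chassis height. */
int board_fits_on_edge(int units)
{
    return chassis_height_in(units) >= MINI_ITX_IN;
}

/* Boards side by side, each given slot_in inches of width. */
int boards_across(double width_in, double slot_in)
{
    return (int)(width_in / slot_in);
}

/* Rows front to back: n rows need n * board + (n - 1) * gap inches. */
int rows_deep(double depth_in, double gap_in)
{
    int n = (int)((depth_in + gap_in) / (MINI_ITX_IN + gap_in));
    return n > 0 ? n : 0;
}

/* Total boards for a chassis of the given usable width and depth. */
int boards_per_chassis(double width_in, double slot_in,
                       double depth_in, double gap_in)
{
    return boards_across(width_in, slot_in) * rows_deep(depth_in, gap_in);
}
```

With the poster's figures, boards_per_chassis(18.0, 3.0, 14.0, 0.5) gives 12, and a 4U chassis (7 in) is just tall enough for a board on its edge while a 3U (5.25 in) is not.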

  3. shuttle by trmj · · Score: 2, Interesting

    My favorite use for those mini-itx boards is making a nice shuttle xpc. Cheap, fast gaming computers that are quite portable as well.

    The only problem I've found so far is they only come with nvidia onboard graphics, but that's what the agp slot is for.

    --
    Work sucked, until it became unemployment, when it became slightly more tolerable. -Tet
  4. This with Chess by SamiousHaze · · Score: 3, Interesting

    You know I seriously wonder if this would be a viable option for Computer chess programs (http://www.chessbase.com/newsdetail.asp?newsid=25 ). It certainly is getting cheap to get massive hardware processing power.

  5. I built a fanless ITX system... by Kenja · · Score: 1, Interesting

    I built one of these; it cost me six times as much for one third the power. Unless you NEED a quiet system, don't bother.

    --

    "Have you ever thought about just turning off the TV, sitting down with your kids, and hitting them?"
  6. Cool stuff ... by Lazy Jones · · Score: 4, Interesting
    This rocks - we were considering something similar for our clustering-R&D needs (for trying out new network file systems, failover solutions etc.), but we decided to go with plain P4 barebones instead. They can be stacked nicely, are relatively quiet and the fast CPUs with HT come in handy when you want good latencies at CPU-intensive tasks (dynamic websites etc.).

    Here's a picture of our first 4 boxes. The USB stick seen sticking out from one of the boxes is bootable and an excellent replacement for floppy disks...

    --
    "I love my job, but I hate talking to people like you" (Freddie Mercury)
  7. FLASH... by Short Circuit · · Score: 2, Interesting

    Ouch...He's using flash as the HD for the computing nodes. Hope they're set to be mounted read-only.

    Maybe he should consider PXE instead.

    1. Re:FLASH... by technomancerX · · Score: 5, Interesting
      "He's using flash as the HD for the computing nodes"

      Actually, he's not. IBM Micro Drives are not CF, they just have a CF form factor/interface to be compatible with hand held devices. They are hard drives.

      --
      .technomancer
  8. Whilst not clustering... by Alioth · · Score: 4, Interesting

    Whilst not clustering, a good use for these low power systems would be for web hosts or budget dedicated servers. I'm sure a server room full of these would require much less air conditioning (and power) than the typical servers. Many people require dedicated servers for security (they are the only one on the box) and don't require fast FPU performance.

  9. Re:Inexpensive for testing purposes, by addaon · · Score: 5, Interesting

    I agree, but that's actually a very interesting use. It also lets you play around with network topologies, and interconnects, and such. And of course, these boards do have one PCI slot, as well as the standard assortment of serial and parallel, so the hardware people can have fun too. For real number crunching? Not a chance. For doing a $2000 prototype, in 15 nodes, of a $50000 50-node cluster? I can't really think of a more flexible, more convenient, or more affordable option. For doing a $1000, 6-node flexible network simulator, purely for education? Also more than worth it, with few other options around.

    --

    I've had this sig for three days.
  10. Sounds Fun by RAMMS+EIN · · Score: 5, Interesting

    I have been thinking about this lately. I get disgusted by the fans everywhere (especially since the one in my laptop makes an awful amount of noise sometimes and still doesn't prevent the beast from overheating and shutting down). Aside from being noisy, computers have way more CPU power than I need, and cost more than I am willing to spend. And they suck up a lot of power. (Some might add that they take a lot of space.)

    I think all of these could be solved at once. What if someone built a low-power, low-noise, and low-cost computer, good enough for running light office applications? I don't mean OpenOffice, but rather lightweight programs that implement the functionality people use _without_ the bloat. My 486 handles email just fine and the WYSIWYG word processors were once satisfied with a first-generation Pentium (and even these were already bloated).

    Current PDAs have more than enough processing power to handle those tasks, and I've noticed that companies like gumstix build and sell devices almost like what I have in mind (the gumstix don't seem to have display connectors, though). Hey, these machines could actually be portable and have a really decent battery life (more than a full working day); that would be a killer!

    Am I just daydreaming here or are others with me? Maybe you know of devices that do this job? Someone recommended Sharp's Zaurus, which is excellent, but still rather more expensive than what I have in mind.

    --
    Please correct me if I got my facts wrong.
  11. Re:Inexpensive for testing purposes, by slashbofh · · Score: 2, Interesting

    ... but that's about all it'll be useful for. A Nehemiah CPU is really weedy by today's standards, even the 1GHz one is about the same as a 600 MHz P3. So, he's got 12 of them, which is probably less CPU power than an average dual P4 motherboard...

    Why is it that most people think that one 4 GHz system is just as fast as two 2 GHz systems? This is the fallacy that never fails to irritate me. The fact is that for a lot of things, the number of machines matters. It's a pipeline, and a CPU can only do one thing at a time. For many applications, having multiple slower CPUs will give you faster response time than a single fast CPU. Of course, most people here don't get that, so give it up when trying to talk to the PHB about it.

  12. Re:Imagine.. by Mr. Bling Bling · · Score: 2, Interesting

    Yo what up?

    Would it be possible to set up a clusta of these in a stretch Escalade? If so, how much would it cost and can I get some iced out (real diamonds, not no zircon encrusted shiat) 1U or smaller cases for the nodes in the clusta? Anybody willing to set something up for me. I gotst cash fo it.

    --
    You can't touch my shizzles you biznitches! However, you can kiss my bling bling watch, biotch!
  13. A beowulf cluster of FreeBSD machines? by Anonymous Coward · · Score: 2, Interesting

    I had no idea all this stuff ran on FreeBSD, but apparently it does. A bit of googling turned up an article on a pretty decent-sized cluster running FreeBSD at the Aerospace Corporation, and other clusters running FreeBSD too. What with Mac OS X being used widely for clusters, and FreeBSD, it sounds like Linux is no longer the only name in the game. So question: do people consider FreeBSD or OS X clusters also to be Beowulf clusters, or is there some other name?

  14. Re:Inexpensive for testing purposes, by Space cowboy · · Score: 2, Interesting

    I don't think they equate the same. I said CPU power, not measured performance. I remember sitting on a UK working group panel debating the Block-Synchronous Parallel computing strategy for highly-parallel systems. I was only there because it was good for my CV :-) But I did learn a reasonable amount, all those years ago...

    That said, the only time a cluster of servers will do better than a fast single node is when the task divides well over the cluster. Great for clustered webservers, even distributed databases (in fact most server processes), but pretty damn useless if you're trying to do interactive work, or calculate something which *doesn't* divide well. Anything with time-dependent processing (i.e. you need the results of the last step to calculate the current one) will run only as fast as your fastest node, minus some for overhead...

    This doesn't dispute your point of course, but I think the sense of how you said it over-stated the case for the usefulness of the system.
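
One common way to put numbers on "divides well" is Amdahl's law. The comment does not name it; the sketch below swaps it in purely as an illustration. The node count and per-node speed used in the note after the code are hypothetical, and communication overhead is ignored.

```c
/* Amdahl's law: speedup of a job on `nodes` equal nodes when only
 * `parallel_fraction` (0.0 to 1.0) of the work can be spread out. */
double amdahl_speedup(double parallel_fraction, int nodes)
{
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / nodes);
}

/* Cluster throughput relative to one fast machine of speed 1.0, where each
 * cluster node runs at `node_speed` times the speed of that fast machine. */
double cluster_vs_single(double parallel_fraction, int nodes, double node_speed)
{
    return node_speed * amdahl_speedup(parallel_fraction, nodes);
}
```

For example, twelve nodes that are each a quarter as fast as the single big machine come out three times faster on a perfectly parallel job, cluster_vs_single(1.0, 12, 0.25), but less than half as fast when only half of the work can be spread out, cluster_vs_single(0.5, 12, 0.25).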

    Simon

    --
    Physicists get Hadrons!
  15. HA-Cluster on Mini-itx boards by DeBaas · · Score: 2, Interesting

    Not as impressive as their performance cluster, but perhaps interesting as well: we built a High Availability cluster more than a year ago based on mini-itx boards: HA-cluster
    It was used for demonstration, but the mini-itx machines are still used quite a bit for testing etc.

    --
    ---
  16. Why this particular set of software / booting? by merlin_jim · · Score: 4, Interesting

    I mean, those IBM 340 MB microdrives aren't really that cheap... you can get full size hard drives for the same price...

    I've always wondered; why not PXE boot something like this? Set your node controller to also do DHCP and you're set.

    While you're at it, use the CL version for the controller which has two network cards and build a NATTING firewall into the node controller too. Then you have a plug-in appliance that doesn't interfere with your network topology at all. PXE boot it and the motherboards will only need RAM.

    The board he used is available for $99 with proc. A stick of 256 is probably around $20.

    The best price froogle would give me on the drives he's using is $60, and they're prone to wear and tear.

    Add in the $10 CF-IDE adapter and the drive is 60% of the cost of the motherboard itself...

    Hell if you don't want the network bogged down with a bunch of PXE booting nodes all the time, just get cheap CD drives and put dyne:bolic on it, which does automagic clustering...

    Personally, if I were to do it, I'd set dynebolic to PXE boot, get a huge stack of motherboards and RAM, and do it that way. Then adding/changing nodes is relatively simple... IIRC, they're even factory set to try PXE booting if no IDE devices are found...

    The only other change I would make would be to ditch the 16-port switch... move to 4-ports, connect those to a 4-port with gigabit uplink, and connect that to a gigabit switch. Of course at this point I'm talking about really scaling the cluster up, to a few hundred nodes or so. At that point I'd stop using a mini-ITX board for my node controller and go with a motherboard with a bit more juice behind it, dual procs, RAID 0/1, the whole shebang...

    Now if only I had a couple grand burning a hole in my pocket... speaking of which:

    motherboard: $100
    RAM: $20
    DC-DC converter: $30
    CF adapter: $10
    Microdrive: $60

    Total: $220
    Total PXE booter: $150
    Savings: 30%

    So, not counting the costs of cabinets, power rectifier/UPS, wiring, network gear, and labor, you can cut the per-node cost by about 30% (which buys roughly 45% more nodes for the same money), just by setting up PXE boot...
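
The per-node totals above can be reproduced with a few lines of C. The part prices are the poster's; the $3,300 budget in the note after the code is a made-up figure for illustration, as are the function names.

```c
/* Per-node part prices in US dollars, as listed in the comment. */
enum {
    MOTHERBOARD = 100,
    RAM = 20,
    DCDC_CONVERTER = 30,
    CF_ADAPTER = 10,
    MICRODRIVE = 60
};

/* Node that boots from a Microdrive on a CF-IDE adapter. */
int disk_node_cost(void)
{
    return MOTHERBOARD + RAM + DCDC_CONVERTER + CF_ADAPTER + MICRODRIVE;
}

/* Diskless node that boots over the network via PXE. */
int pxe_node_cost(void)
{
    return MOTHERBOARD + RAM + DCDC_CONVERTER;
}

/* Fraction of the full price saved by the reduced configuration. */
double savings_fraction(int full, int reduced)
{
    return (double)(full - reduced) / full;
}

/* Whole nodes that a budget buys at a given per-node price. */
int nodes_for_budget(int budget, int per_node)
{
    return budget / per_node;
}
```

The totals come to $220 and $150 per node, a saving of just under 32%. A hypothetical $3,300 budget buys 15 disk-equipped nodes or 22 PXE-booted ones, so the gain in node count is larger than the percentage saved per node.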

    --
    I am disrespectful to dirt! Can you see that I am serious?!
  17. Re:Inexpensive for testing purposes, by EvilTwinSkippy · · Score: 2, Interesting
    Samba file server.

    Samba throws open a hell of a lot of threads. (At least on my network of 200 people.) A cluster with each node possessing an external network port would be able to split the threads across dedicated processors. Not too useful for me, but if someone was trying to serve a few thousand clients at a time, that would be useful.

    TMYK

    --
    "Learning is not compulsory... neither is survival."
    --Dr.W.Edwards Deming
  18. Power. by Absurd Being · · Score: 2, Interesting

    Your P4 uses what, >300W? This cluster has a peak load of 200W. Plus you can do more varieties of hardware interfacing at once. That's a reason to build this cluster, if you don't find clustering things because you can to be a good enough reason.

    --
    Karma: Excellent^(-t/Tau), Tau=Wittiness/Trollishness