Ask Slashdot: Best Bang-for-the-Buck HPC Solution?
An anonymous reader writes: We are looking into procuring a FEA/CFD machine for our small company. While I know workstations well, the multi-socket rack cluster solutions are foreign to me. On one end of the spectrum, there are companies like HP and Cray that offer impressive setups for millions of dollars (out of our league). On the other end, there are quad-socket mobos from Supermicro and Intel, for 8-18 core CPUs that cost thousands of dollars apiece.
Where do we go from here? Is it even reasonable to order $50k worth of components and put together our own high-performance, reasonably-priced blade cluster? Or is this folly, best left to experts? Who are these experts if we need them?
And what is the better choice here? 16-core Opterons at 2.6 GHz, 8-core Xeons at 3.4 GHz? Are power and thermals limiting factors here? (A full rack cupboard would consume something like 25 kW, it seems?) There seems to be precious little straightforward information about this on the net.
Why not start by looking at what S/W you plan to run, and then see what advice is available from its vendor (and from other users) as to what H/W they would recommend?
You mention you are interested in CFD. Intel Phi processors have been known to do well here: http://www.cfd-online.com/Foru... . In that linked story, a single Intel Phi processor beats a 1024 core cluster. Moreover, Thinkmate is literally giving away Intel Phi processors: http://www.thinkmate.com/syste... . But not all workloads fit the Phi, so you really need to do some benchmarking before you buy.
Disclosure: I have worked for Penguin Computing in the past, though I currently have only a customer relationship with them (we use their Penguin on Demand service). I strongly recommend you talk to a few of the HPC vendors out there about your needs and get a few quotes. Obviously Penguin is one I recommend, I'm not sure who else is still in the business, I think at least one of the major ones I've gotten a quote from in the past went under. Just do a little googling. They are probably familiar with your applications and can get you a turnkey solution that's well-suited for your application.
Error 404 - Sig Not Found
You haven't said anything about your application. Do you run it continuously? Sporadically? Will the machine be sitting idle much of the time? Do you have the staff to support it? What about networking and storage? Do you have the ability to rapidly move and store data? The actual computing is only part of the story.
It may make more sense to rent the time, given the lower storage and maintenance costs, than to actually buy and maintain the infrastructure.
putting the 'B' in LGBTQ+
These will offer far better performance than the Opteron solution.
Can you compile your own application? If so, use the Intel compilers, and make sure you compile targeting the Haswell instruction set (-O3 -xHost -march=core-avx2 -mtune=core-avx2 if I recall correctly): the full AVX2 Haswell instruction set is rather more powerful for your app than the predecessor "AVX" SandyBridge/IvyBridge instruction set, which is far more powerful than the previous Nehalem/Westmere SSE4.2 instruction-set, which is somewhat more powerful than a simple "-O3". If you can't compile on your own, try to make sure the vendor's executables target AVX2; the right compile-flags will double your performance over "-O3"...
"My opinions are my own, and I've got *lots* of them!"
Unless you need to transfer A LOT of data from your cluster, Amazon AWS will probably be cheaper than dedicated hardware. Especially if you can use spot instances (that are 5-10 times cheaper than the regular Amazon EC2 instances).
"...straightforward information about this on the net." Because it's not straightforward. If you have enough grasp of your requirements to understand the apps you want to use and you are using commercial CAE / CFD codes, your ISVs should be able to give you some guidance about what typical customers are running (how many cores over how many nodes configured with how much ram and storage with what kind of cluster interconnect and MPI message passing etc) for workloads similar in size to yours. If you're actually considering writing your own, please reconsider unless you have some very particular requirements - but if you do, you'll already have a really good idea of what level of parallelization your cluster architecture requires.
There are plenty of costs beyond the actual computer, including power, power conditioning, battery backup, heat removal, etc... that make up most of the cost.
If you still decide to build your own hardware, then pay close attention to
1. Compatibility with your chosen software, i.e. the best system in the world is worthless if it does not run the software that you want. You may be building your own software, then you will still need to consider OS, compiler, libraries, etc
2. Ability of the operating system to provide enough resources to your software. In the 'good old days' Windows only provided a limited amount of RAM to processes; even in today's world Windows systems swap aggressively and may not give you the RAM performance that you may see in the Enterprise *nixes
3. Internal bus structure of the system you choose, The biggest growth in PC hardware has been the internal bus width and speed. Look around, but for cost's sake, you will probably be using a variety of PCIe from Intel. You will probably also see better integration with the PCIe bus with Intel chips. If you are using GPU accelerators, that is a whole 'nother kettle of fish that will affect your other decisions above and below
4. Methods provided for disk access. Fibre Channel used to be king, but times have changed, with iSCSI making inroads, and local disk architecture provides the greatest bang for the buck, with SATA starting to edge out SCSI. If you go the SAN or iSAN routes, it will have additional costs for rackspace, power and cooling.
5. Disk system that you choose, most people would suggest butt-loads of local SSD, after RAM, solid state drives will probably be your highest costs
Just my two bits, plus I completely ignored tape system vs spinning-disk hard drives for backup, which would add more rack space, power supply and cooling to anything that you try and put together. Try and put together a realistic estimate for purchasing and supporting your hardware for a couple of years and compare it to cloud cost for similar resources
Wherever You Go, There You Are
i did this before, on a very small scale, for GBP 1,000 about 10 years ago. sales teams kept offering me 2ghz dual-core machines at GBP 300 each and i had to tell them this:
"look, i have a budget of 1,000 GBP. you're offering me a 2ghz system for 300. so i can only buy 3 machines, right? so that's a total of 6 ghz of computing power. on the other hand, if i buy this GBP 125 machine which has only a 1ghz processor, i can get 8 of those, which gives a total of 8 ghz of computing power. so _why_ would i want FASTER?"
so i bought qty 8 of motherboard, CPU, 128mb RAM, low-cost case containing a PSU already, and accidentally included a 3com network card because i didn't realise that the built-in ethernet on the motherboard could do PXE boot..... but still, all-in that was 125 GBP and each one took 15 minutes to assemble so it was no big deal. got myself 8ghz of raw computing power, which was the best that i could get for the money that i had.
and that's the question that you have to ask yourself. what's the highest performance / price metric that can be achieved?
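A minimal sketch of that comparison in Python, using only the figures quoted above (a GBP 1,000 budget, GBP 300 machines counted at 2 GHz, GBP 125 machines at 1 GHz). The function name and the crude "summed GHz" metric are illustrative assumptions, and the metric only means anything for tasks that split cleanly across machines.

```python
def aggregate_ghz(budget, unit_price, unit_ghz):
    """Return (machines_bought, total_ghz) for a fixed budget.

    Crude metric: clock speed summed over machines, as in the post above.
    It ignores core counts, memory and interconnect.
    """
    machines = int(budget // unit_price)  # whole machines only
    return machines, machines * unit_ghz


# Figures taken from the post; labels are made up for readability.
for label, price, ghz in [("fast box", 300, 2.0), ("cheap box", 125, 1.0)]:
    n, total = aggregate_ghz(1000, price, ghz)
    print(f"{label}: {n} machines, {total:.0f} GHz total")
```

For a real purchase you would swap the GHz column for whatever you actually measured on your own workload (jobs per hour, GFLOPs, and so on).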
the highly specific problem that i was endeavouring to parallelise was a very small memory footprint non-I/O-bound task: running the NIST.gov Statistical Test Suite. i booted all 8 machines off of my laptop, over PXE boot with an NFS read-only root filesystem. had to wait 30 seconds between each because my 800mhz P3 laptop with 256mb of RAM reaaallly couldn't cope with 8 machines hammering it... not over a 100mbit/sec link, anyway.
once started, i wrote a script that ssh'd into each and left them running the STS for a day at a time. very little actual data was generated: a report.
but the issue that you're solving may involve huge amounts of disk I/O, it may involve huge amounts of inter-connectivity (inter-dependence between the parallel tasks). you may even have to use a GPU (OpenCL) if it's that computationally expensive... ... and that's where anyone's advice really ends, because unless you know exactly what it is you need to do - in real, concrete terms of I/O per second, GFLOPs/sec, GMACs/sec, inter-communication/sec, you really can't and shouldn't even remotely consider spending any money.
so please consider writing a spreadsheet, based on the performance/price metric, extending it to the domain(s) that you're interested in optimising. then the answer about what to buy should be fairly self-evident.
oh and don't forget to include the power budget (and cooling) because i think it will shock the hell out of you. remember you need to include the maximum specs, not the "average" or "scenario design power".
Your options really depend on what sort of 'high performance' you have in mind. When it comes to performance per core, Xeons typically crush Opterons; but the pricing reflects that, especially if you need the 4-8 socket support and RAS features. If what you need is large amounts of RAM with the lowest possible spending on the system around it, Opterons have tepid performance per core; but are likely to be the cheapest option that still supports ECC, more than one socket, buffered DIMMs, and any other niceties you wouldn't get from just desktop. If your application is one that can be made to fit, GPUs are enormously powerful for the range of things that they are capable of doing well.
They also depend on how big you need your system to be and how tightly coupled it has to be. If your intended application handles its own network-level parallelization and doesn't depend on very low latency, blessed are you. Price per core skyrockets if you go above 2 sockets, and GbE is effectively free (at least in the sense that you pretty much can't buy a system or motherboard that doesn't come with at least 2 NICs by default, often more) and relatively cheap to switch. If you need lower latency, this will hurt more and you are looking at Myrinet or InfiniBand. If your application needs a cluster that presents a single system image, especially one that also has genuinely low latency, you probably need to fortify your checkbook and consult an expert. You can get systems with more than 8 sockets and the appropriate custom interconnect; but you won't like paying for one.
Unless you value your time at surprisingly low rates; you probably won't want to build your own systems from parts; but depending on how tightly coupled you need, this may be something that you need to purchase as a system or something you construct from multiple computers you purchase.
Can you use either hardware you have or AWS (or one of their similar competitors) to better characterize what your application actually needs?
So you're using workstations now. My first order of business would be to figure out how you'd work together on a server or cluster. Does your software and workflow actually support that, or will it just be like a super high end workstation? Once you've got that done, you can start working on what it is your workload actually needs: how many nodes, CPU, RAM, network and so on. In general, if your software scales well, more and less powerful nodes will do the job cheaper. Quad-socket systems are expensive and should really just be used if you need >28 cores in a single machine.
Live today, because you never know what tomorrow brings
Take a look at Limulus systems from Basement Supercomputing: http://www.basement-supercompu...
These are fast low power/noise/heat systems with a fully installed open source HPC software stack.
HPC for Primates. Read Cluster Monkey
I've used supermicro in the past. Very cost effective if you like to assemble your own systems. I have no financial relationship. This is just an unsolicited testimonial. http://www.supermicro.com/prod...
- Things are the way they are because they're coded that way -
Because Soulskill recommends it. Fourteen thousand Raspberry Pi, in fact.
Get free satoshi (Bitcoin) and Dogecoins
There is another factor to consider. If you ever license software that is priced using a per-core model (for example LSF), you will find a great advantage in going with the Intel solution.
The real "Libtards" are the Libertarians!
Really... it depends. What software are you using (starccm+? openfoam? custom + mpi?)? Planning on InfiniBand interconnect (if so QDR? FDR? ... DDR I guess?)? 10GbE? What about storage for the cluster file system? Lustre? NFS? How much sustained IOPs are you expecting to need? How much RAM per Core / or overall RAM is required for your application?
... yeah, we'd be throwing that back for more info. There isn't enough to make any sort of determination. Can _guess_ ... but that's all it'd be is a guess.
Without more info it's impossible to give a good answer.
-J
PS. Not advertising, but I do actually work for a company that sells CPU time on HPC clusters. And with what you have given
The problem is that you don't know what you're looking for so you're not asking the right questions.
- Power is a factor. You mention 25KW. Wrong units. You should look for KVA. You'll never know what the wattage is until you know the power factor (PF) and you won't know that until you populate the device with spindles and fans (which have a different PF than CPUs, GPUs, PSUs,) and then run it under load and measure.
- 25KVA is a medium rack. 35-50KVA is a dense rack. How many racks you choose to have is up to you, but the "25" number is not a good random one to shoot for. If you search for "30KVA" and "High density rack" you'll get an idea of what servers do populate such things.
- You won't be running anything of this magnitude at your deskside, unless you are in Alaska or Siberia and have no other source of heat. Also most businesses don't like running 4 30A 3-phase 208VAC to employees' desksides. Just sayin'... And again, if you're not Alaska or Siberia with an open door and window, you won't move enough air through your office to cool that beast. (Air mass is directly related to cooling, and unless you're doing dielectric-immersion cooling, the sheer amount of air requires massive fans and lots of space.)
- Two other responses said "See what your software vendor says." Software is abstracted by compilers. The real question is "how much CPU, GPU, DISK, or other IO does it do" and plan for that. That will also change the PF and the KW and the heat load.
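To put rough numbers on the KVA/KW distinction above, here is a small Python sketch. It relies on two standard relations: real power (kW) is apparent power (kVA) times the power factor, and a balanced three-phase circuit carries sqrt(3) x line voltage x line current. The 0.9 power factor is an assumed value purely for illustration; as noted above, you only find the real one by measuring under load. The function names are mine.

```python
import math


def real_power_kw(apparent_kva, power_factor):
    """Real power (kW) = apparent power (kVA) x power factor."""
    if not 0.0 < power_factor <= 1.0:
        raise ValueError("power factor must be in (0, 1]")
    return apparent_kva * power_factor


def three_phase_kva(line_voltage, line_current):
    """Apparent power of a balanced three-phase circuit, in kVA.

    Uses S = sqrt(3) * V_line * I_line.
    """
    return math.sqrt(3) * line_voltage * line_current / 1000.0


# A 25 kVA rack at an ASSUMED power factor of 0.9.
print(real_power_kw(25.0, 0.9))

# One 30 A, 208 V three-phase circuit, as mentioned above, at full rating.
print(round(three_phase_kva(208, 30), 1))
```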
There's a reason nobody builds deskside compute servers with today's technology. Density, power, and cooling.
Keywords to google: KVA PF KW, high density rack server, PUE (PUE, power usage effectiveness, is the ratio of total facility power to IT equipment power; it is applied to an entire data center, cooling included, and is a separate measure from PF.)
Other places to look: look up abstracts for talks at Data Center World.
"Or is this folly, best left to experts? "
Since you don't seem to be or have experts nor money, it's obviously folly.
I regularly work with large corps and governments building large HPC, mainframe replacements, large clusters and you appear to be falling into the same trap a lot of them do. As others have said, it isn't the hardware. Anyone that tells you what is best based on your summary doesn't have a clue, as it is all about the SOFTWARE. I recently watched an organisation spend the best part of a million dollars on high core count machines only to then find the app doesn't perform well in parallel and in fact scales much better on high clocked low core count machines that would have cost them less than a quarter of the price. Memory, core count, machine architecture, core clock speed are all essential items that can only truly be determined by good application profiling; absent that you are just pissing money in the wind. PS: Don't build these sorts of machines yourselves. The amount of integration testing and trouble you can run up against with these types of configs costs you more than you can save by not going with a recognised vendor.
Save your money and use it to move somewhere without Fag Marriage.
You can marry a cigarette where you live?
Got two Ivy Bridge dual-socket 12-core Xeon boxes a couple of years ago. I called up Red Barn. They helped me figure out what hardware would give me more bang for my buck (two dual-socket Ivy Bridge blades got me more cores than one Sandy Bridge with four sockets), built it up for me, installed the OS, and delivered it. Smooth as butter. IIRC, the whole deal cost me around $24000, for one compute/server node and one compute node. For $50K, if prices have scaled similarly to Haswell Xeons by now, you'd probably get that and another three compute nodes. (Also, you'd probably get more cores -- IIRC, at one point we were expecting like 15-core Haswell Xeons to come out, but I haven't kept track.)
You need to look into the problem at hand more closely! The software plays a very important role. Perhaps it can benefit more from a GPU cluster rather than a CPU cluster? Can it benefit from the instruction set of the latest Xeons or will the older (and now cheaper) generation suffice? CFD simulations are quite memory-hungry, so 3 GB per core is pretty standard. Also, you need to make sure that the cores can talk to the RAM efficiently, so definitely pick a CPU with 4 memory channels. After 6 cores per cpu or so the communication between the cores and the RAM becomes the bottleneck, so don't stack too many cores on the chip. Dual socket motherboards and CPU combos are also pretty cost effective. Also, users tend to suck the performance advantage of such a machine quite rapidly, so you should also plan for the future.
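As a back-of-the-envelope check on those rules of thumb, a tiny Python sketch. The 3 GB per core, 4 memory channels, roughly 6 cores per CPU and dual sockets come from the paragraph above; the function names are made up for illustration, and your own solver may want quite different numbers.

```python
def node_ram_gb(sockets, cores_per_socket, gb_per_core=3):
    """RAM to fit in one node using the GB-per-core rule of thumb."""
    return sockets * cores_per_socket * gb_per_core


def cores_per_memory_channel(cores_per_socket, channels_per_socket=4):
    """How many cores end up sharing each memory channel on one CPU."""
    return cores_per_socket / channels_per_socket


# Dual-socket node with 6-core CPUs, as suggested above.
print(node_ram_gb(2, 6), "GB per node")
print(cores_per_memory_channel(6), "cores per memory channel")
```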
Was just reading about a 25 GPU cluster for brute forcing passwords. You can use them for supercomputing too. You could probably homebrew one with used equipment and save some cash. Anyway, here is some inspiration: http://arstechnica.com/securit...
Chewbacon
The Bible is like Wikipedia: written by a bunch of people and verifiable by questionable sources.
The cloud is the last place you want to do CFD.
They will build the system for you. In fact, they require that they build the system for you in order for you to get ANY warranty service.
It's worth it. For just a bit over $100K you can have a 192 thread 1TB RAM 12GPU quad-cluster loaded with 24 SSDs.
Still waiting on Serviscope_minor to wake up to fucking reality and realize that Jessica Price isn't going to fuck him.
You have asked "What is the best car I could buy? Also, should I build it myself or get one from the showroom?"
As many other posts here suggest, the first question is kind of meaningless without knowing what you want to do with said car. Is it for trips around town? To carry 7 kids? A lean mean street-fightin' machine?
As for the second question, if your budget is $50k, then I suggest neither. You cannot (should not try to) build a general-purpose HPC solution and its infrastructure for that kind of money. If your use-case is not heavily dependent on high-bandwidth data transfer then definitely consider AWS/Google compute/Azure. If you have a very specialized use-case, perhaps a single compute job that was trivially parallelizable with little or no I/O, you could probably put something together for $50k and run it under a desk. But general-purpose HPC is not just a bunch of server units. A high-speed switch between your compute nodes alone could cost that much. A very basic chassis from Dell suitable as a compute node costs around $5k. Stuff it full of memory, 2 or 4 xeons, GPUs if you need them, fast local scratch disk, redundant 10GbE network connects... again, you're looking well north of $25k per unit. Not to mention, as others have, you need a climate-controlled room with abundant, reliable and redundant power to put the thing in.
You might look at http://limulus.basement-superc... for a concrete example of the sort of system you are talking about. Also, http://www.beowulf.org/ is the home of the community of people who build compute clusters from any old hardware and run open source software on it.
If you're pinching the dollars, there is little better you can do than save money and build it yourself (given that there is no extra labor cost incurred for doing so). Many shops will also assemble a system for ~$250-500 and another $250-1500 for a 3-5 year warranty (depending on the cost of the system etc). You could also go with IBM/HP/Oracle and lease the bunch. It's really up to you what type of support contract you require (generally you won't need it). You could calculate the Amazon/"shared hosting" aka cloud route but I figured out (in my instance) that unless you need the thing less than ~30% of the time in a year, you are better off doing it yourself.
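The ~30% figure is from my own numbers; a hedged Python sketch of how you might work out your own break-even is below. Every dollar amount in it is a placeholder, not a quote from any provider, and the function name is invented for the example. The yearly cost of owning should include power, cooling and support, not just the hardware spread over its lifetime.

```python
HOURS_PER_YEAR = 24 * 365


def break_even_utilisation(own_cost_per_year, cloud_cost_per_hour):
    """Fraction of the year above which owning is cheaper than renting.

    Below this utilisation, renting equivalent capacity by the hour costs
    less than the yearly cost of owning the machine.
    """
    return own_cost_per_year / (cloud_cost_per_hour * HOURS_PER_YEAR)


# PLACEHOLDER figures: $20,000/year all-in to own, $7.50/hour to rent.
fraction = break_even_utilisation(20_000, 7.50)
print(f"owning wins above {fraction:.0%} utilisation")
```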
I am currently running several SuperMicro GPU workstations (4U with 4 GPUs each) and several more SuperMicro 1U/2U servers (storage). They are run-of-the-mill and really nothing anyone can improve upon, especially cost (and I've had several salesmen concede that point, most recently Nexenta). Of course if you don't know anything about it, it might be worth having a support/management contract.
Also a rack drawing 25kW peak does not mean you require 25kW of power. If you run at more than 70% of your peak power as rated by the power supplies, you are running dangerously close to the limits. I run a half rack of storage and a GPU server and it draws a consistent 20-30A@230V (that's 4-6kW) although its peak power based on power supply alone is closer to 15kW. The power supplies are really intended for the worst-case scenarios (8x 3.5" HDD, 100% CPU usage, 100% RAM slots filled, 4 5.25" devices, 4 GPU and several other add-ons).
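A small Python sketch of that headroom check, using the figures from my own rack (20-30A at 230V, about 15kW of rated supply) and the 70% rule of thumb. Strictly, amps times volts gives apparent power (VA), so the real wattage is a bit lower depending on power factor. The function names are just for illustration.

```python
def apparent_power_kva(amps, volts):
    """Apparent power in kVA from a measured current and supply voltage."""
    return amps * volts / 1000.0


def within_headroom(draw_kw, rated_peak_kw, limit=0.70):
    """True if the measured draw stays at or below `limit` of rated peak."""
    return draw_kw / rated_peak_kw <= limit


# Measured range quoted above.
for amps in (20, 30):
    print(amps, "A ->", apparent_power_kva(amps, 230), "kVA")

# Roughly 6 kW of draw against about 15 kW of rated supply.
print(within_headroom(6.0, 15.0))
```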
As far as Xeon/Opteron, you would really have to benchmark them both against your application. For some loads one outperforms the other and although AMD is generally cheaper per performance unit, Intel may allow you to get more out of a single machine.
Custom electronics and digital signage for your business: www.evcircuits.com
Find a consultant and talk to him about precisely what software you want to run, so he can work out as future-proof a solution as possible for you.
There is no way I can recommend hardware without this basic information. Do you want to run a Quake server? Video streaming? Virtualisation? HD encoding? CGI rendering? Studio or OB relay broadcasting? Teleconferencing? CRM? Wiki/bulletin board/IRC/SL? Multi-point data aggregation? Physics simulations? All have specific and wildly different hardware requirements. Hell, I couldn't recommend a processor architecture at this juncture.
Political debates have me rolling my eyes so much I think I got optical whiplash. I should sue. - Foamy The Squirrel
Or a bundle of sticks.
Never answer an anonymous letter. - Yogi Berra
If you're considering GPU look through any bitcoin mining forum. Setting up a reliable GPU farm is a nice tech challenge if you really want to grow your own. A few years ago I ran out of power at 50 GPUs, they're hogs. Heat is a whole other problem, as the previous poster made note of. For a business use you may want to just use AWS or GPU appliances.
Time to bump an AC for a good solution.
At 100 to 150 US Dollars the Parallella provides 18 cores per Raspberry Pi-sized board.
Run them with the deadhead distro and use a small Linux PC for the head.
Speed is impressive for the cost.
http://www.siliconmechanics.co...
They take all that commodity hardware and figured out how to make nice high-density systems for you. Stop trying to figure out how to do it yourself. They've done the hard work and (in my personal opinion) have absolutely outstanding support (before and after the sale). I'm just a happy customer.
"The great thing about multitasking is that several things can go wrong at once." -me
You see, if you (the one who posted the question) were a numerical mathematician or a computational physicist and looking for adequate performance in a research setting at rock-bottom cost, I'd say: have a look at GPUs (see e.g. here http://www.nvidia.com/object/c... ) and e.g. the Navier Stokes solver from Stanford U. (see here: http://mc.stanford.edu/cgi-bin... ). For such applications, hardware based on GPUs tends to give a much better price/performance ratio than rigs based on CPUs. But they're a lot harder to use as well. So find a suitable solver and see what hardware makes it shine.
But you probably aren't, or you'd have known that already (or looked it up in the literature or figured it out yourself).
Err ... if you are (as far as I can make out) just a computing guy who doesn't know Navier from Stokes but wants to put a "FEA/CFD rig" together, I think you're simply not the right person to do that. Yes, you're probably capable of putting a few PCBs together that can run a generic Matlab-based solver, and will then find that it gives you an abysmal price-performance ratio on your particular workload.
And why? Well, the problem with "supercomputers" is: they're much more powerful than a general purpose computer only on very *specific* problems. Change the problem and watch the performance change as well.
So you've got to tune your hardware to your problem in order to get really good price-performance. And your "problem" is your solver. You need to choose that as well. And for that you need to understand a bit about what a solver does relative to the problem you really want to solve. And it sounds as if you haven't a clue. Sorry.
The alternative is to let other people (consultants, vendors) do the thinking, and buy a custom solution. That will work, and will give you reasonable (but not great) price-performance ratios at a reasonable price level. But make darned sure that your hardware-software combination is a good fit.
My suggestion: talk to your team member who knows what the formulas look like, what solver you're going to be using, whether low-accuracy is acceptable (e.g. if the objective is to obtain a graphical solution) or whether high accuracy is a must (e.g. for engineering purposes).
If low accuracy is acceptable but you want the best speed, think GPU's. If not ... determine what the particularities of your problem instances are, what your solver grid will look like, and what type of computing resources you'd need for that and how much.
Simply saying "a CFD problem" isn't nearly specific enough. And getting to a hardware configuration that has truly good price/performance levels is something for a specialist (or a team effort).
99% of the time, there is no real usable improvement between last year's and this year's model (i.e. maybe in the 10% range, if at all). But you can get last year's version for half (or less) of its introduction price, which is usually way lower than the current version's price.
Depends on a lot of things - if you have something massively parallel to do then core per $ is going to matter and speed is nowhere near as important. For other things speed is going to matter so you'll probably end up with a mix.
Memory is bound to be something that will drive the design. Do you want a LOT of shared memory (which means a few huge machines) or can you parcel it out to nodes on a much cheaper cluster? A network is NEVER fast enough for some things when the alternative is a big pool of memory.
If you are running commercial software your setup is going to be dictated by their licence conditions more than effectiveness at running their software - I went from an effective cluster to a small number of individual 64 core machines, with a slower speed, due to the licence changing to a per host model. A cluster full of the fastest eight core Xeons could get stuff done in half the time of some 64 core machines but at eight times the licence cost the difference in price with some software could pay for a few 64 core machines per year.
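A rough Python sketch of why the licence model ended up driving the hardware choice. The fee is a made-up placeholder and the two models are simplified; real licence terms are usually messier, so treat this as an illustration only.

```python
def licence_cost(hosts, cores_per_host, fee, model):
    """Yearly licence cost under a simplified per-host or per-core model."""
    if model == "per_host":
        return hosts * fee
    if model == "per_core":
        return hosts * cores_per_host * fee
    raise ValueError(f"unknown licence model: {model}")


FEE = 10_000  # PLACEHOLDER fee per host per year, not a real price

# The same 64 cores, packaged two ways.
cluster = licence_cost(hosts=8, cores_per_host=8, fee=FEE, model="per_host")
big_box = licence_cost(hosts=1, cores_per_host=64, fee=FEE, model="per_host")
print(cluster // big_box, "times the licence cost for the 8-node cluster")
```

Under a per-core model the two layouts would cost the same, which is why a change of licence terms can flip the whole design.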
GPUs are nice if your problem can actually use them, sadly they still don't have the memory for a lot of things.
A good analogy of what is going on here is if the question was "why is the sky blue" and you have answered "dust" while the other has mentioned rayleigh scattering and a variety of other factors.
While "get some servers" is correct it's not exactly a useful answer is it? The above poster is right, for some stuff you want speed and for others you want as many cores as you can afford and don't give a shit about the speed, and without knowing what the submitter's software runs best on the choice is not clear.
...leave this HPC shit to people that know how to cover their ass.
You know, this HPC stuff really does not require the gay. That is optional.
"So long and thanks for all the fish."
In pretty much every HPC cluster I've seen or been personally involved with (mostly oil/seismic processing or crash simulations), the type of CPU is only one of the cost drivers!
Typically you end up spending about as much on fast interconnects as you do on motherboards/cpus/ram etc. The main exception to this rule is when you have an embarrassingly parallelizable workload, with small memory footprint and no need for cross-system communication, i.e. like a Monte Carlo simulation or password cracking.
For oil we used the largest single-image NUMA/SMP machine we could get at the time, this machine did the initial gridding of the problem space, then a standard cluster of 1K dual-cpu motherboards (i.e. 2K cpus) took over and did the main part of the actual processing.
There are exceptions though, like if you are doing Linear Programming type optimization which can be really hard to parallelize, or if you are using very expensive SW:
When you pay more for the SW than for the HW it is running on, then it makes sense to use bleeding-edge (gamer type) cpus.
Terje
"almost all programming can be viewed as an exercise in caching"
You may need a (screamin) front-end machine that splits up the work and hands it off to multiple multi-core machines. These multi-core machines may only be available at lower clock-speeds.
Don't just "look at your application". Look at what parts of your application are subject to parallelism and what parts must stay single threaded. You may need a special single-thread machine that can keep the other ones fed.
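One common way to put a number on this is Amdahl's law (my framing, not something specific to any one package): if only a fraction p of the runtime can be parallelised, the best speedup on n workers is 1 / ((1 - p) + p / n). A minimal Python sketch follows; the 0.90 parallel fraction is an assumption for illustration, and you would need to profile your own application to get a real value.

```python
def amdahl_speedup(parallel_fraction, workers):
    """Ideal speedup when only `parallel_fraction` of the runtime scales."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


# 0.90 is an ASSUMED parallel fraction, not a measurement.
for cores in (8, 16, 64):
    print(cores, "cores ->", round(amdahl_speedup(0.90, cores), 2), "x")
```

The serial part sets a hard ceiling no matter how many cores you add, which is the argument for a fast single-thread machine feeding the rest.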
My company decided it wanted a new FEA machine. They decided to stay with the existing software company, so I called up the company and explained the situation and asked for the department that provided pre-sales support, specifically hardware recommendations. Turns out they had a strong bench of people ready to help with that and detailed Known Good configurations for each major hardware company. We simply looked at the software licensing costs, the hardware costs and how long our average scenario would take under various software/hardware configs and sized it to handle the existing number of jobs plus expected growth.
We decided what we could live with in terms of how long an average job would take (we decided we could live with 24 hours as an average). We then decided what sort of tradeoffs we could make in terms of hardware (an up front sunk cost amortized over many years) versus the annual software license fees. A little more spent on hardware up front meant we could save on software licensing costs by taking a step down in numbers of processors permitted. We then presented this decision to their presales people to get it vetted and asked for suggestions. In our case we took the suggestion that we archive saved results to our enterprise grade disk array and put the money into Raid 0 SSD's to speed up the overall job time. As always, RAM was the cheapest upgrade so we maxed that out.
Everyone signed off, we took the specs to a local system builder with a good reputation, told them no changes of ANY sort to any component, negotiated a price that included acceptance testing to ensure compliance and made sure they had enough of a profit margin so as to discourage shortcuts. They delivered, we tested, it gave expected performance results, we accepted it and paid them. That system is installed today and delivers the results it was designed for.
I would suggest a similar course for you.
1. Decide on the software first. Make darn sure it will do what you want.
2. Decide on how fast jobs need to be finished and how many per week/month/year, to prevent over-speccing.
3. Call presales support to get hardware recommendations.
4. Make the decision on hardware cost versus software licensing cost versus number of jobs to be done.
5. Do your homework! Understand what you are speccing, talk to others, particularly customers.
6. Take your newfound knowledge back to presales a few times to make sure you did not miss any improvements and you truly understand what you are doing; you are betting your job on this.
7. Find a builder, local or the hardware manufacturer, negotiate terms. Make sure you leave a decent profit margin to avoid the temptation to skimp.
8. Test, test, test. Confirm all configuration decisions with presales support.
9. Pay 'em and install the machine.
10. Don't forget to follow up to ensure it continues to work as designed and that procedures are being followed. In my case, we checked that all runs were backed up on our enterprise disk array.
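For steps 2 and 4, the hardware-versus-licensing tradeoff can be roughed out in a few lines. This is only a sketch under stated assumptions: one job runs at a time, hardware is amortized evenly, and every name and dollar figure below is hypothetical, not from any vendor quote.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

HOURS_PER_YEAR = 8760


@dataclass(frozen=True)
class Config:
    name: str
    hardware_cost: float        # one-time purchase price
    annual_license_cost: float  # yearly software license fee
    hours_per_job: float        # average job time on this config


def annual_cost(cfg: Config, amortization_years: float) -> float:
    """Yearly cost: hardware spread evenly over its life, plus licenses."""
    return cfg.hardware_cost / amortization_years + cfg.annual_license_cost


def jobs_per_year(cfg: Config, utilization: float) -> float:
    """Throughput assuming jobs run back to back, one at a time."""
    return HOURS_PER_YEAR * utilization / cfg.hours_per_job


def cheapest_adequate(
    configs: Sequence[Config],
    jobs_needed: float,
    max_hours_per_job: float,
    amortization_years: float = 5.0,
    utilization: float = 0.8,
) -> Optional[Config]:
    """Cheapest config that meets both the turnaround and throughput targets."""
    adequate = [
        c for c in configs
        if c.hours_per_job <= max_hours_per_job
        and jobs_per_year(c, utilization) >= jobs_needed
    ]
    return min(adequate, key=lambda c: annual_cost(c, amortization_years), default=None)


if __name__ == "__main__":
    # Made-up numbers purely to show the shape of the comparison.
    options = [
        Config("fewer cores, cheap box", 30_000, 40_000, 30.0),
        Config("fewer cores, fast box", 50_000, 25_000, 20.0),
        Config("more cores, mid box", 40_000, 40_000, 12.0),
    ]
    print(cheapest_adequate(options, jobs_needed=300, max_hours_per_job=24.0))
```

Feed it the job times presales gives you for each Known Good configuration, and include expected growth in `jobs_needed`.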
There is some truth in this, but it will really hurt you in ongoing support and any upgrade paths. That said, most HPC systems should be run in a siloed environment and replaced rather than upgraded piecemeal.
Cheap storage VM.
Yeah, this way he can burn through his $50k in a month or two.
Cheap storage VM.
You could do a lot worse than to buy one of these:
http://www.microway.com/produc...
Microway have been supplying high quality, high performance systems for decades and
they should have figured out how to do it right by now. If I had the spare money, it's what
I would choose.
For your CFD software, you might consider the open source OpenFOAM system.
Be careful with the memory subsystem: select DIMMs that will run at the maximum
rate of the system. Quad-rank DIMMs typically run slower. This may mean you can't
use the full 1 TB of DRAM that this system can address.
I was once tasked with the same scenario when I was an Engineering IT Manager for an aerospace startup. I would first ask for the definition of "Bang for the Buck". Are we referring to max TFLOPS / hardware cost, HPC utilization / total cost of ownership, value to business (time to market, minimizing prototyping and tooling costs, material optimization) / total cost of ownership, or something else entirely? The best bang for the buck is usually to use a managed cloud HPC provider, but based on your post it sounds like you really want to build and maintain your own HPC. Given the ITAR nature of our business and the lack of ITAR cloud vendors, we had to build ours. Below was our process; feel free to modify as needed.
As someone previously stated, the first key consideration is the workload. Explicit vs. implicit solutions and different FEA and CFD solutions scale very differently, and most will plateau on certain interconnects. There is a very large difference in architecture between running NASA FUN3D, Ansys CFD, LS-DYNA and Siemens Nastran, and the architecture will change depending on the requirement(s) and performance goal.
Once you know the target solutions, next consider the interconnect requirement. Some codes do not scale well across multiple nodes, eliminating the need, but some codes demand extremely low latency (e.g. InfiniBand). Those codes are completely inefficient without microsecond RDMA capability.
Next, evaluate GPU compute compatibility. Be careful, as some vendors have only partially implemented GPU compute and use it only under certain circumstances.
When evaluating CPU choice, we always used performance/watt as the benchmark. Check spec.org for normalized performance comparisons and divide by TDP.
Memory per system is a function of solution memory size * number of concurrent jobs needed / number of nodes * ~1.25.
Supermicro is by far the cheapest solution if you are going to integrate yourself.
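The two rules of thumb above (performance per watt for CPU choice, and the memory-per-system formula) are easy to turn into a quick calculator. A rough sketch, not a sizing tool: the function names are mine, the ~1.25 headroom factor is the one quoted above, and the CPU labels, scores and TDPs in the example are placeholders, not real spec.org results.

```python
def memory_per_node_gb(
    solution_gb: float,
    concurrent_jobs: int,
    nodes: int,
    headroom: float = 1.25,
) -> float:
    """solution memory size * concurrent jobs / number of nodes * ~1.25."""
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    return solution_gb * concurrent_jobs / nodes * headroom


def perf_per_watt(benchmark_score: float, tdp_watts: float) -> float:
    """Normalized benchmark score divided by the CPU's TDP."""
    if tdp_watts <= 0:
        raise ValueError("tdp_watts must be positive")
    return benchmark_score / tdp_watts


if __name__ == "__main__":
    # Example: 64 GB per solve, 4 solves at once, spread over 2 nodes.
    print(f"RAM per node: {memory_per_node_gb(64, 4, 2):.0f} GB")

    # Placeholder scores and TDPs; substitute published numbers for real parts.
    candidates = {
        "cpu_a (hypothetical)": (300.0, 150.0),
        "cpu_b (hypothetical)": (260.0, 115.0),
    }
    ranked = sorted(
        candidates.items(),
        key=lambda item: perf_per_watt(*item[1]),
        reverse=True,
    )
    for name, (score, tdp) in ranked:
        print(f"{name}: {perf_per_watt(score, tdp):.2f} score/W")
```

Round the memory figure up to the next DIMM configuration that still runs at full memory speed.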
Linux is the standard operating system, with most RHEL derivatives supported by most software vendors.
Don't forget to consider storage. Most HPC systems generate many TBs of information and, during a job, need high-bandwidth storage access, usually shared. Phase 1 for us was to build a storage server with 10 TB of SSDs and share the volume to all nodes with NFS. We dreamed of DDN but could not afford it.
Connect everything together, then determine the MPI stack and any interconnect RDMA requirements (OFED, etc.). Install the OS, configure a workload manager such as SLURM, write your submission scripts and start testing. Plan for a very long testing cycle if you integrate yourself. Experts are hard to find and can be expensive. Good luck, and I hope you have a team of rock-star Linux gurus.
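For the "write your submission scripts" step, a small generator keeps the scripts consistent across users. A minimal sketch, assuming a stock SLURM install where `srun` launches the job; the `render_sbatch` helper, the job name and the `./my_solver` command are all hypothetical, and your solver vendor may require a different launch line.

```python
def render_sbatch(
    job_name: str,
    nodes: int,
    tasks_per_node: int,
    walltime: str,
    command: str,
) -> str:
    """Return the text of a basic SLURM batch script.

    Uses only common sbatch directives: --job-name, --nodes,
    --ntasks-per-node and --time (HH:MM:SS).
    """
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --nodes={nodes}",
        f"#SBATCH --ntasks-per-node={tasks_per_node}",
        f"#SBATCH --time={walltime}",
        "",
        f"srun {command}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical solver invocation; replace with your vendor's command line.
    print(render_sbatch("cfd_case", 4, 16, "24:00:00", "./my_solver case.in"))
```

Write the result to a file and hand it to `sbatch`; tune node and task counts against your own scaling tests rather than guessing.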