Ask Slashdot: Best Bang-for-the-Buck HPC Solution?

Look for other users of the S/W for advice by peterjt · 2015-07-18 09:13 · Score: 5, Insightful

Why not start with looking at what S/W you plan to run, and then see what advice is available from them (and from other users) as to what H/W they would recommend.

Re:Look for other users of the S/W for advice by JamesTRexx · 2015-07-18 09:22 · Score: 5, Insightful

Precisely this. Do not look at the hardware for hardware's sake, look at the needs to run the software as best as you can. Does it benefit from parallelism? Throw tons of Opteron cores at it. Does it benefit from speed? Get the fastest Intels. Can it do everything in RAM? Stuff the servers with it, etc. etc.. Also, if it is built to scale, start with one or two servers, then see what kind of load it causes and base the next nodes you add on that data. You might even want to consider starting off with a virtual environment for portability to other hardware or cloud providers.
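To put a rough number on "does it benefit from parallelism", here is a minimal sketch using Amdahl's law. The parallel fractions below are made-up illustrations, not measurements of any FEA/CFD package; you would need to profile your own software to get a real figure.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Upper bound on speedup predicted by Amdahl's law."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Hypothetical parallel fractions; measure your own application instead.
    for fraction in (0.50, 0.90, 0.99):
        for cores in (8, 64):
            bound = amdahl_speedup(fraction, cores)
            print(f"parallel={fraction:.2f} cores={cores:3d} speedup<={bound:.1f}x")
```

If the serial part is large, extra cores buy very little and faster individual cores are the better spend, which is the trade-off described above.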

--
home
Re:Look for other users of the S/W for advice by Anonymous Coward · 2015-07-18 09:42 · Score: 1

Sure you do business with CDW+G, talk to your rep, have him put some consultants on the phone with you. Over your head? Opt in for professional services...
not a CDWG employee, but have used their consulting arm in the past with good results.
Re:Look for other users of the S/W for advice by Tough+Love · 2015-07-18 10:08 · Score: 0

Seems like mainly a way of avoiding the real question. It's pretty obvious what software the OP wants: PC server stuff. Any ideas, or did you just intend to hijack the thread?

--
When all you have is a hammer, every problem starts to look like a thumb.
Re:Look for other users of the S/W for advice by Anonymous Coward · 2015-07-18 10:16 · Score: 1

Spoken like someone who has no idea what the fuck they are talking about. Building for the solution is the right thing to do since the OP was so vague. They need to figure that out before even looking at the hardware. They don't know what they are doing and shouldn't be handling this project at all. If anything, find someone who does and pay them before fucking up a purchase and looking more the moron than they already do by having to come to Ask Slashdot for advice.
Re:Look for other users of the S/W for advice by rwa2 · 2015-07-18 10:30 · Score: 1

Yes, look at software requirements first. FEA and CFD software can be extremely hardware specific. Can they make use of powerful GPGPUs? Most server chassis will have great CPU/RAM but crap in the way of PCIe slots and especially GPU power plugs. What OS will the SW need to run? HP doesn't even certify "consumer grade" OSes on much of their rackmount lineup, and if you use Windows Server 20XX you often can't get the latest certified GPU drivers on the Server OSes, so you may well lose product support one way or the other. Ask me how I know.
Where are these servers/workstations going to be located? Servers are NOISY and belong in a climate-controlled server room, and then you'll need some sort of remote-access mechanism to them. Depending on latency and distance requirements, that can get pretty expensive.
If these are just headless number cruncher units, by all means absolutely use AWS (they also have some sort of CUDA farm if your software can leverage GPGPU). Then you can scale out the wazoo and pay only for what you need when you use it. Do your development work on your own mini-cluster (could be just a bunch of VMs in a workstation) if you want to keep standing operational costs down, but then farm out all of the big jobs to AWS and automatically shut those VMs down after they're done doing their thing. HPC clusters are a lot of work to design and keep running (something somewhere is always breaking once you get up past a dozen nodes or so). Unless what you're doing is classified, I doubt it's worthwhile getting into operating your own server farm, especially if you don't have one already.
Re:Look for other users of the S/W for advice by Anonymous Coward · 2015-07-18 10:37 · Score: 4, Interesting

this this this!!!!
For example the work I do with an HPC would need a monster DB able to handle millions of inserts a day. Which needs bottom rack Intel video chips but monster data interconnects (think 40gb per sec and up). But someone doing oil topographical analysis or making a movie may want top of the line NVIDIA Quadro cards and lots of memory and minimal disk space.
A HPC runs the gamut of what is out there.
For about 50k I am sure you could build something from HP or Cisco that is in the 100-200 cpu range. But what are you going to do with it? What sorts of network interconnects are you looking at, and what sort of storage do you need? If you need say 500k sustained IOPs per second 50k will not cut it (start thinking in the 400k to 1 million range).
That just gets you the hardware. Do you need a particular bit of software? What is that going to cost and ongoing cost? For example something like splunk can cost several hundred k per month in the right environment.
Without the specs of what you are doing I would be randomly guessing what you need.
My advice? Start with a prototype of bottom rung 'crap' 'costco special' hardware. Work your way up and decide what you need. Perhaps hire someone who knows how to plug this all together. Having done this a few times it can be a challenge just to manage 5000 bits of hardware all showing up one day and getting it all put together. Finding a location and power sources can actually be a challenge. Depending on how big it is you may not be able to plug it into your building's mains. I suggest a high level design then work your way down to lower designs. But most of all HIRE SOMEONE who knows this stuff. There are thousands of people out there that need a job that can do EXACTLY this sort of thing.
Re:Look for other users of the S/W for advice by Anonymous Coward · 2015-07-18 11:41 · Score: 2, Insightful

I will third this. I will also state that I was directly involved in building a home grown cluster that was highly ranked in the Top500 List a little over a decade ago.
You MUST begin with needs analysis and that goes WAY beyond just looking at research domains, in this case FEA and CFD. You have to know what software you want to run. You must also research and find out if there are alternatives to what software you currently run (or are initially planning to run) that may have modern competitors that run more efficiently.
I will also note that FEA and CFD have different resource needs, and therefore different hardware configurations that would be optimally suited for those tasks (I think someone else in this thread has already brought this point up below), so if you do want to run both types of software packages on the same machine you will be making some compromises on efficiencies and configuration to do that. Most of the FEA code that our researchers ran was run on single-system image, shared memory machines (SGI), not an MPI-based, distributed memory cluster where the CFD and MD/QD folks get their best bang for the buck. I don't know how much that has changed in the last few years, but I would imagine, not much.
I will keep an eye on this thread over the next couple of days. If the OP wishes to contact me I'd be happy to help them work on this challenge. We can figure out how to connect if I get a reply to this post.
Re:Look for other users of the S/W for advice by ShanghaiBill · 2015-07-18 12:30 · Score: 1

Why not start with looking at what S/W you plan to run, and then see what advice is available from them (and from other users) as to what H/W they would recommend.
Bingo. The correct HW depends entirely on the SW. Depending on the SW, the GPU likely matters more than the CPU.
You should also consider just renting the HW on demand from AWS. Unless you are going to run your rig 24/7 (you won't), it is likely that AWS will be cheaper, and you won't be stuck with outdated HW in a few years. If you need results quickly, just spin up additional instances.
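The "unless you run 24/7" point can be checked with simple arithmetic. This is a sketch only: the purchase price echoes the 50k budget mentioned in this thread, but the running cost, lifetime, and hourly cloud rate are hypothetical placeholders, not real AWS prices.

```python
HOURS_PER_YEAR = 24 * 365


def break_even_utilization(purchase_cost: float,
                           yearly_running_cost: float,
                           lifetime_years: float,
                           cloud_rate_per_hour: float) -> float:
    """Fraction of all hours the owned machine must be busy to match cloud cost.

    A result above 1.0 means renting is cheaper even at 100% utilization.
    """
    if lifetime_years <= 0 or cloud_rate_per_hour <= 0:
        raise ValueError("lifetime_years and cloud_rate_per_hour must be positive")
    owned_total = purchase_cost + yearly_running_cost * lifetime_years
    cloud_cost_if_always_on = cloud_rate_per_hour * HOURS_PER_YEAR * lifetime_years
    return owned_total / cloud_cost_if_always_on


if __name__ == "__main__":
    # All inputs except the 50k budget are assumptions for illustration.
    share = break_even_utilization(
        purchase_cost=50_000,        # budget figure from the thread
        yearly_running_cost=5_000,   # hypothetical power, cooling, admin time
        lifetime_years=3,            # hypothetical replacement cycle
        cloud_rate_per_hour=5.0,     # hypothetical rate for equivalent capacity
    )
    print(f"Owning pays off above {share:.0%} utilization")
```

Plug in real quotes and the actual instance pricing for your region before drawing any conclusion; data transfer and storage charges are not modeled here.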
Re:Look for other users of the S/W for advice by sumdumass · 2015-07-18 12:40 · Score: 2

Just wanted to add, don't stop at the recommendations the software vendor suggests.
I had a client who decided to go with the hardware recommendations provided by the software vendor against my objections. Six months after we were up and running, the software which was the entire point of the ordeal released an update that slowed everything down enormously. Turns out, their "recommended" hardware specs were slightly better than their minimum specs on the new version of the software and the server had also been purposed to do a few other minor things that ran in conjunction with the software. You might as well say it was the minimum.
So by stopping at the recommendations of the software vendor, they were presented with a setback no one was really thinking about. They could either roll back the software version negating the support and upgrade purchase plan, suffer the slow speeds and hope the vendor doesn't slow it any more, or replace good hardware that they really didn't have a use for outside of the specific software. They eventually let me completely overkill a server to replace it.
So unless the software vendor says there is a limit, reasonably increase the power and memory of their recommendations for future proofing. Just keep in mind you will want to eventually replace the hardware anyways else risk suffering down time from the inevitable failures.
Re: Look for other users of the S/W for advice by Anonymous Coward · 2015-07-18 12:59 · Score: 0

Also look into Starcluster for easily running multi-node HPC clusters in AWS.
Re:Look for other users of the S/W for advice by mlts · 2015-07-18 13:47 · Score: 1

I will add another voice into this list in agreement. The problem is that what is needed is so vague.
There is just no way to recommend hardware. Do you need a lord-king-God-Almighty interconnect backbone switch between all nodes so they can push 40 gigs/sec between each other? A blade/enclosure is a must. Do you need I/O performance above all else, or CPU performance? It might be cheaper to buy a ton of 1U ProLiant G7s with HBAs[1] and 10GigE cards.
Oracle RAC? Again, need a hefty SAN connection, perhaps with a beefy HBA that has up to a terabyte of temporary storage which can help deal with heavy I/O for the active/active needs.
What is your SAN topology like, or do you even have a SAN, as Windows Server 2016 Storage Spaces Direct is being readied as a SAN alternative, provided the links between each of the machines is fast. (The ideal would be InfiniBand... but for the fastest speed, it may wind up being 10GigE.)
Then comes software. Throwing ESXi on the entire cluster will make life a lot easier than spinning up, updating, and wiping bare metal OS installs. However, virtualization does come at a slight performance price and a hefty licensing price.
If I had to recommend hardware for the OP's project, there is no way I can even point in a usable direction, and one mistake can be disastrous when it comes to time/money.
[1]: HBA as in fiber channel or Ethernet CNA. Whatever your heart desires. If you have fiber channel switches, even old 8G fiber channel will handle more than most operating systems can chug out.
Re: Look for other users of the S/W for advice by kenh · 2015-07-18 14:25 · Score: 1

Seems like mainly a way of avoiding the real question. It's pretty obvious what software the OP wants: PC server stuff. Any ideas, or did you just intend to hijack the thread?
What? You don't build a High Performance Computer (HPC) to run 'PC Server Stuff'...
But let's say he does, he wants to build a monster PC Server to run 'PC Server Stuff', wouldn't a file server be different from a VM Host, a database server different from a compute server? And what determines how you build up the server? The software you choose to run on it.

--
Ken
Re:Look for other users of the S/W for advice by Anonymous Coward · 2015-07-18 19:45 · Score: 0

Hello, OP here. You seem to know what you speak of, so I would appreciate communicating with you.
I made a "faceless" Email account because I really don't want to invite the spam lunatics to my regular one: bulvankonto[at]gmail[dot]com
Re:Look for other users of the S/W for advice by KGIII · 2015-07-19 00:34 · Score: 2

I made use of their services quite some years ago. This is a seconding for them. They became our go-to for hardware and hardware recommendations even while we were mostly a Sun shop. The reps were knowledgeable and polite. The service was top-notch. The after-sale support was surprisingly good. I have been out of the loop for about eight years as I am now retired but I keep my ear to the ground a little bit and have not heard anything that would make me inclined to believe they have changed.

--
"So long and thanks for all the fish."
Re:Look for other users of the S/W for advice by Thumper_SVX · 2015-07-19 02:39 · Score: 1

I wish you hadn't posted AC, and I wish I had mod points!
This is the right answer. The workloads aren't clear because we don't know what OP is trying to accomplish with this setup. Is he building an HPC cluster to do engineering analysis, or is he building it because he has convinced his management that it's cool? If the latter... well, he'll be looking for a new job after he builds it and it does nothing to help his customer (his employer).
Start with the application. What are its workload characteristics? What kind of backend does it need?
Then look at the backend. What are ITS workload characteristics.
Only then should you look at hardware. Answers to the first two will answer what hardware you need, but even then there are a lot of moving parts to take into account. How many nodes? How much CPU per node? Highly parallel or high speed? How much memory per node? Storage infrastructure? Output type?
This is a terrible AskSlashdot question because it requires an intimate knowledge of the workload being proposed. I am a consultant for a living so I do exactly this process above every single day. Any answers will literally just be spitballing because the information we have available is so vague... and any actual answers are guaranteed virtually useless and a quick way for OP to lose his job.
Re:Look for other users of the S/W for advice by postbigbang · 2015-07-19 06:57 · Score: 1

You're right.
On one plane of the graph is CPU family. The next one is speed and cache management. IO periodicity considerations are another vector. Which freaking OS, and if it's scaled up/out or is static through its lifecycle.
Containers? One fat app? What does what talk to what, via hypervisor, container-hosting, or linear OS? How much network, how often, and with what concurrency to which apps/VMs/containers, etc? Quiet or aperiodic duty cycle? Transaction processing? Must be highly parallel/available? Talks to what, when, with what legacy infrastructure?
Is this a software question looking for hardware, or a systems question looking for efficiency, budget constraint, or just sexy buzzwords?

--
---- Teach Peace. It's Cheaper Than War.
Re: Look for other users of the S/W for advice by Tough+Love · 2015-07-19 06:59 · Score: 1

PC server stuff = OS that runs on a PC. Apps that run on a PC. Middleware that runs on a PC. Sad that I had to spell it out.

--
When all you have is a hammer, every problem starts to look like a thumb.
Re:Look for other users of the S/W for advice by Tough+Love · 2015-07-19 07:07 · Score: 1

I wish you hadn't posted AC, and I wish I had mod points!
This is the right answer.

No it isn't. You conveniently forgot that OP clearly stated "FEA/CFD", narrowing down the application area and available solutions more than sufficiently. Perhaps you focused on jumping on the loudmouth wagon instead of actually considering the question that was asked?
Could have been a great thread if not for you guys.

--
When all you have is a hammer, every problem starts to look like a thumb.
Re:Look for other users of the S/W for advice by Tough+Love · 2015-07-19 07:16 · Score: 1

Is this a software question looking for hardware, or a systems question looking for efficiency, budget constraint, or just sexy buzzwords?
The first two and not the third, quite obviously. OS is easy: Linux. I vote -1 for containers. One fat app sounds like a bad idea, except certain hot spots, but that's not what the OP asked, is it?
You're getting warm with the speed and cache management. Now add a cost axis and you're addressing the original question. I hope.

--
When all you have is a hammer, every problem starts to look like a thumb.
Re:Look for other users of the S/W for advice by postbigbang · 2015-07-19 07:29 · Score: 1

There's really insufficient info lent to what the app is. Canned? Scale? Lifecycle? Only HPC. Connects to what. Needs what IO.
There are fat apps that may be more than sufficient for what's needed without VM/container/walls overhead of any kind.
Many variables are unstated and someone asking for the behavioral characteristics of processor families that Intel makes vs big hulking hardware generic platforms.
I can see wanting to use scale up/out ideas, but this is far too nebulous to call this a nail, as in the kind which you can hit with a hammer. It might be a hammer, looking for a nail, instead.

--
---- Teach Peace. It's Cheaper Than War.
Re:Look for other users of the S/W for advice by Tough+Love · 2015-07-19 18:21 · Score: 1

Wrong. The OP stated fluid dynamics. Read again please.

--
When all you have is a hammer, every problem starts to look like a thumb.
Re:Look for other users of the S/W for advice by Gazzonyx · 2015-07-19 22:16 · Score: 1

Agreed. I'd like to see the look on their face when they realize they have enough power for racks of servers but not nearly enough cooling. We've actually got a staging area at work where all of the power comes into the building, but it's only got enough cooling for a couple of racks (and not dense racks, at that) full time and then a couple of racks meant for short tests. Running everything full for any lengthy amount of time probably wouldn't fly even in the Minnesota winter with the receiving garage door open. To be fair, the room wasn't created for racks of full time servers on purpose.

--
If I mod you up, it doesn't necessarily mean I agree with what you've said, sorry.
Re:Look for other users of the S/W for advice by nikkipolya · 2015-07-20 00:01 · Score: 1

You conveniently forgot that OP clearly stated "FEA/CFD", narrowing down the application area and available solutions more than sufficiently
I worked for a major CAD/CAM/FEA software maker, and they have only recently added distributed computing capability to their software; it is still very limited in its capabilities. They still do not use GPU capabilities beyond rendering. Meanwhile, I know from my erstwhile colleagues who are now working for another major FEA/CFD maker that their software has had cluster computing capabilities for many years now and utilizes GPUs too. So I agree that it's important to understand the capabilities/requirements of the S/W before you procure the hardware.
Re:Look for other users of the S/W for advice by Anonymous Coward · 2015-07-20 03:08 · Score: 0

they also will want to consider power usage which will eliminate amd fairly quickly.
also i'm not too keen on only assessing current needs, as i would also consider possible longer term needs partially based on budget and how current needs affect that etc
diy maybe if you feel that you have the internal competency and time for those so competent to handle it. i'd also factor in maintenance etc. time for them as i presume that if you order elsewhere that you'd likely at least get a minimal support contract in the deal.
really a lot for an outsider to say much other than pure generalities imnho
Re:Look for other users of the S/W for advice by Anonymous Coward · 2015-07-21 01:02 · Score: 0

" If you need say 500k sustained IOPs per second 50k will not cut it"
err, there are some new generation *stupidly fast* ssd drives http://www.tomsitpro.com/articles/mangstor-mx6300-enterprise-nvme-ssd,2-936.html#fragment-1
($12k for 2.4T of storage, 900k read iops sustained; 3.7/2.4GB/second sustained read/write) - a reasonable workstation, dual/quad cpu, max out the ram (128+gb?) and that could be quite a good workhorse for a reasonable set of HPC use cases.
Re:Look for other users of the S/W for advice by LWATCDR · 2015-07-22 08:53 · Score: 1

Exactly. You have a specific task and probably specific software for that task. If the software supports CUDA then you might want to spend money on Tesla cards over CPUs. Does it use Open CL? Then you might want to look at AMD GPU compute cards.
Do you need a large memory space?
Do you need a lot of threads or just a few really fast ones.
If you have 50k for the system then I suggest you spend a little of it on someone that really knows this subject.
It may make more sense to just use Amazon EC2.

--
See my blog http://ilovecookes.blogspot.com/ for light hearted technical information.
Re:Look for other users of the S/W for advice by Anonymous Coward · 2015-07-30 07:29 · Score: 0

Sorry for the delayed response. I had some unexpected travel. Please check that email address as I have sent info to you.
Cheers.

Supercomputers are very workload specific by mdtiemann · 2015-07-18 09:15 · Score: 2

You mention you are interested in CFD. Intel Phi processors have been known to do well here: http://www.cfd-online.com/Foru... . In that linked story, a single Intel Phi processor beats a 1024 core cluster. Moreover, Thinkmate is literally giving away Intel Phi processors: http://www.thinkmate.com/syste... . But not all workloads fit the Phi, so you really need to do some benchmarking before you buy.

Re:Supercomputers are very workload specific by thedanyes · 2015-07-18 20:06 · Score: 0

The site you linked requires registration to actually see the posted benchmark score. Can you give a summary?
Re:Supercomputers are very workload specific by mdtiemann · 2015-07-19 00:08 · Score: 1

Maybe I am wrong, but I will try to compare results. There is some data
http://www.hector.ac.uk/cse/di... and from topic starter
Xeon Phi for 50 time steps
grid size - 90^3 - 175^3
best time - 200s - 1500 s
Hectors 4 core of AMD 2.8GHz dual-core Opteron 5 time steps
grid size - 100^3 - 200^3
time - 795s - 8800 s
Hectors 1024 core of AMD 2.8GHz dual-core Opteron 40 time steps
grid size - 200^3
time - 1490 s
So, a single Xeon Phi card for OpenFOAM is comparable with a 1024 core cluster (for this benchmark)
Re:Supercomputers are very workload specific by Cytotoxic · 2015-07-19 00:41 · Score: 1

I don't work in this area, so I wouldn't know..... but are the different grid sizes significantly different? I would assume that going from 175^3 to 200^3 could be a major jump - the sort of thing that imposes big costs for handling exponentially increasing amounts of data.
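For what it's worth, the cell count of a cubic grid grows with the cube of its side, which is steep but not exponential. A quick sketch that normalizes the figures quoted above to seconds per cell per time step follows. It assumes the upper end of each quoted range pairs the larger grid with the longer time, which the post does not state explicitly, and it ignores everything else that differs between the runs.

```python
def cells(side: int) -> int:
    """Number of cells in a cubic grid with the given side length."""
    return side ** 3


def seconds_per_cell_step(total_seconds: float, side: int, time_steps: int) -> float:
    """Wall time divided by (cells x time steps), a crude normalized cost."""
    return total_seconds / (cells(side) * time_steps)


# Figures quoted in the parent post; the pairing of range ends is an assumption.
grid_ratio = cells(200) / cells(175)                 # about 1.49x more cells
phi = seconds_per_cell_step(1500, 175, 50)           # single Xeon Phi
cluster = seconds_per_cell_step(1490, 200, 40)       # 1024-core Opteron run

if __name__ == "__main__":
    print(f"200^3 has {grid_ratio:.2f}x the cells of 175^3")
    print(f"Xeon Phi: {phi:.2e} s per cell per step")
    print(f"Cluster:  {cluster:.2e} s per cell per step")
```

On these assumptions the two land within the same order of magnitude, so "comparable for this benchmark" seems a fair reading, though a like-for-like run on the same grid would be the real test.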

Get some quotes by hawkeyeMI · 2015-07-18 09:18 · Score: 2

Disclosure: I have worked for Penguin Computing in the past, though I currently have only a customer relationship with them (we use their Penguin on Demand service). I strongly recommend you talk to a few of the HPC vendors out there about your needs and get a few quotes. Obviously Penguin is one I recommend, I'm not sure who else is still in the business, I think at least one of the major ones I've gotten a quote from in the past went under. Just do a little googling. They are probably familiar with your applications and can get you a turnkey solution that's well-suited for your application.

--
Error 404 - Sig Not Found

Re:Get some quotes by Tough+Love · 2015-07-18 10:13 · Score: 1

And what OP wants most probably would be as commodity as possible. What about cheapo metal shelving with minitowers carrying dual socket server boards with mid range 8 core machines? Density low enough to cool with built in fans and ambient air.

--
When all you have is a hammer, every problem starts to look like a thumb.
Re:Get some quotes by hawkeyeMI · 2015-07-18 10:30 · Score: 1

Plenty of HPC folks out there selling rebranded supermicro gear, including Penguin, with a variety of cluster management systems on them, open source and proprietary. That's pretty commodity.

--
Error 404 - Sig Not Found

Just rent it by Anonymous Coward · 2015-07-18 09:22 · Score: 0

This is the sort of situation cloud computing is made for, you just don't have to try and answer these questions any more. Sign up for an AWS account, fire up the computers you need for the specific job you are running right now, then turn them off when you are done and pay a few dollars each time, be a hero. You can pick from a variety of different types of computers depending on the needs of your specific job on the day: compute optimised (18 cores), RAM optimised (244GB), IO optimised, you just don't have to worry about investing tens of thousands of dollars up front and getting it wrong any more.

Re:Just rent it by Anonymous Coward · 2015-07-18 09:54 · Score: 0

That might work if there is no need for low latency communication with the cluster nodes and the used software package allows it.
Re:Just rent it by wezelboy · 2015-07-18 12:22 · Score: 1

The cloud is the last place you want to do CFD.
Re: Just rent it by Anonymous Coward · 2015-07-18 12:43 · Score: 0

I'd rather do it in the cloud than in your parents' basement.
Re: Just rent it by Anonymous Coward · 2015-07-18 15:13 · Score: 0

You just advocated Cloud if you think about it.

Why not rent the time? by plopez · 2015-07-18 09:24 · Score: 3, Informative

You haven't said anything about your application. Do you run it continuously? Sporadically? Will the machine be sitting idle much of the time? Do you have the staff to support it? What about networking and storage? Do you have the ability to rapidly move and store data as the actual computing is only part of the story.
It may make sense to rent the time due to lower storage and maint. costs than to actually buy and maintain the infrastructure.

--
putting the 'B' in LGBTQ+

Haswell-EP Xeons by coats · 2015-07-18 09:24 · Score: 2

I would go with Haswell-EP Xeons -- probably 2697v3 (14 cores @ 2.6-3.6): a two-socket motherboard gives you 28 physical cores per board, for prices in the $12K range. Just one of these is quite a powerful system. If you can get by with a 2-node system, then 10GE interconnect is good enough (AND MUCH CHEAPER); for more nodes, you will need Infiniband (since 10GE does not scale well). The 4-node/IB cluster will be on the order of $60K, and will offer more performance than a $160K solution of a couple of years ago.

These will offer far better performance than the Opteron solution.

Can you compile your own application? If so, use the Intel compilers, and make sure you compile targeting the Haswell instruction set (-O3 -xHost, or -march=core-avx2 -mtune=core-avx2, if I recall correctly): the full AVX2 Haswell instruction set is rather more powerful for your app than the predecessor "AVX" SandyBridge/IvyBridge instruction set, which is far more powerful than the previous Nehalem/Westmere SSE4.2 instruction-set, which is somewhat more powerful than a simple "-O3". If you can't compile on your own, try to make sure the vendor's executables target AVX2; the right compile-flags will double your performance over "-O3"...
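Before settling on compile flags it is worth confirming that the nodes actually advertise AVX2. A minimal sketch for Linux follows; it assumes an x86 machine where /proc/cpuinfo has a "flags" line, and the helper names are made up for this example.

```python
def cpu_flags(cpuinfo_text: str) -> set:
    """Return the feature flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def supports_avx2(cpuinfo_text: str) -> bool:
    """True if the CPU flag list contains 'avx2'."""
    return "avx2" in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as handle:
            text = handle.read()
    except OSError:
        text = ""  # not Linux, or the file is unreadable
    print("AVX2 available:", supports_avx2(text))
```

A binary built for AVX2 will typically crash with an illegal-instruction error on older CPUs, so in a mixed cluster check every node type, not just the head node.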

--
"My opinions are my own, and I've got *lots* of them!"

Re:Haswell-EP Xeons by Tough+Love · 2015-07-18 10:23 · Score: 1

...If you can get by with a 2-node system, then 10GE interconnect is good enough (AND MUCH CHEAPER); for more nodes, you will need Infiniband (since 10GE does not scale well)...
Useful commentary for the most part, but this bit is just wrong. Nobody needs Infiniband. If you think you need Infiniband then get some RDMAoE instead. Save yourself some money and some grief.

--
When all you have is a hammer, every problem starts to look like a thumb.
Re:Haswell-EP Xeons by Anonymous Coward · 2015-07-18 11:26 · Score: 0

Sorry to say, but your post exhibits a lot of ignorance; no, nothing you said was strictly incorrect. But this question shouldn't be answered with hardware, it should be answered with, "Stop talking about hardware, tell me about your use cases and application". Making any hardware recommendation without that basic information, you are just shooting blindly.
Re:Haswell-EP Xeons by Anonymous Coward · 2015-07-18 12:01 · Score: 0

Untrue, *most* people don't need infiniband. If you have about 200 nodes or more and a lot of node-to-node communication, ethernet switching fabric starts falling down on performance for most configurations (there are some places that have allowed IB like fabric with ethernet, but using openflow controller implementations that they aren't really sharing their secret sauce with the world right now).
Mellanox has had the industry by the balls for some time and gouge accordingly. With Intel getting back into the game of HPC fabric in a serious way, hopefully that changes to a more competitive pricing scheme.
Re:Haswell-EP Xeons by jlehtira · 2015-07-18 19:37 · Score: 1

The original poster did say it's for FEA/CFD. You may not know what this means, but that doesn't mean there's no basic information. Incidentally, the suggestion for 28 core Xeon nodes is exactly what we got for weather prediction this year (and CFD would be similar - need for huge number crunching power, with lots of communication between threads every timestep). The only obvious alternative would be a GPU-based solution, but to my knowledge most existing FEA & CFD codes don't use GPUs.
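One way to see why "lots of communication every timestep" pushes you toward a fast interconnect is to compare the data each rank must exchange with the work it does. This sketch assumes a cubic grid split into equal cubic subdomains with a one-cell-thick halo on all six faces; real solvers decompose differently, so treat the numbers as a trend, not a prediction.

```python
def halo_to_interior_ratio(grid_side: int, ranks_per_axis: int) -> float:
    """Halo cells exchanged per interior cell computed, for one cubic subdomain.

    Assumes a one-cell halo on six faces (an interior subdomain).
    """
    if ranks_per_axis < 1 or grid_side < ranks_per_axis:
        raise ValueError("need 1 <= ranks_per_axis <= grid_side")
    sub_side = grid_side / ranks_per_axis
    interior_cells = sub_side ** 3
    halo_cells = 6 * sub_side ** 2
    return halo_cells / interior_cells


if __name__ == "__main__":
    # 200^3 matches the larger grid mentioned elsewhere in this discussion.
    for per_axis in (2, 4, 10):
        ratio = halo_to_interior_ratio(200, per_axis)
        print(f"{per_axis ** 3:5d} ranks: {ratio:.2f} halo cells per interior cell")
```

For a fixed problem size, the ratio climbs as you add ranks, so communication cost takes a growing share of each timestep, and interconnect latency and bandwidth start to matter more than raw core count.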
Re:Haswell-EP Xeons by coats · 2015-07-18 23:43 · Score: 1

You've never run CFD, have you? ...and don't know its message-patterns, either. That's what the OP said he was doing (though he didn't say whether he was running someone else's "canned" code, or was compiling his own). Unless it's specifically compiled for Haswell (which is unlikely, but sad), the "canned" code will not take proper advantage of the Haswell AVX2 instruction-set.

--
"My opinions are my own, and I've got *lots* of them!"
Re:Haswell-EP Xeons by Anonymous Coward · 2015-07-19 12:10 · Score: 0

I do know what it means. However, FEA/CFD software varies greatly, and depending on the size of the datasets and actual business requirements the actual hardware specifications will differ massively.

Maybe owning the hardware is not the best solution by Anonymous Coward · 2015-07-18 09:28 · Score: 0

I am wondering if you have considered using Amazon EC2 servers instead of owning your own hardware. There are GPU instances as well as high-performance CPU instances and you only have to pay for the time you actually are running, plus you don't have to maintain the equipment and you can upgrade as technology improves. Unless you are running full out 24x7, buying hardware means having to pay for the peak load all the time.

Amazon AWS by Cyberax · 2015-07-18 09:30 · Score: 4, Interesting

Unless you need to transfer A LOT of data from your cluster, Amazon AWS will probably be cheaper than dedicated hardware. Especially if you can use spot instances (that are 5-10 times cheaper than the regular Amazon EC2 instances).

Re:Amazon AWS by Anonymous Coward · 2015-07-18 09:56 · Score: 0

there you go.... http://www.computer.org/csdl/mags/co/2009/04/mco2009040035-abs.html
Re:Amazon AWS by kimanaw · 2015-07-18 10:05 · Score: 2

This. AWS has a GPU tier (kinda pricey, but probably cheaper than standing up an equivalent on your own). I'm guessing your FEA/CFD will probably need GPUs. $50K will rent a lot of GPU time. Not sure how available the spot instances for them are.
otoh, if you're looking to use regular CPUs, Azure has an infiniband tier that may be a better interconnect for HPC purposes than AWS's 10 Gbps VPC's.

--
007: "Who are you?"
Pussy: "My name is Pussy Galore."
007: "I must be dreaming..."
Re:Amazon AWS by Sebo · 2015-07-18 10:58 · Score: 1

As suggested, take a look at http://aws.amazon.com/hpc/getting-started/. If nothing else, it gives you an option that's faster to implement, reduces CAPEX, and doesn't leave you with physical infrastructure that is immediately depreciating and becoming "obsolete".
Re:Amazon AWS by Anonymous Coward · 2015-07-20 00:48 · Score: 0

This is pretty much your answer. It sounds like you don't really know how your software is going to work (a prerequisite to doing any HPC project) so pop on a cloud provider and figure it out. Once you know your software performance characteristics and your service cost structure, then you can do a legitimate comparison to buying an optimized solution.
The term "optimization" is also critical to understand. Buying "fast hardware" is generally a good idea, but the performance trade-space (especially on a small budget like yours) is crucial to understand, and it can swing the economics wildly. An optimized $50k solution could be 100x faster than a non-optimized solution, so you need to know what parts you can cheap-out on, and which ones are worth the price.
Re:Amazon AWS by Anonymous Coward · 2015-07-20 06:51 · Score: 0

My SWAG advice would be, get your feet wet with AWS. You might discover that you need more than AWS can provide but that can come later. It seems premature to commit to a hardware configuration unless you can start to get a decent application profile.

" There seems to be precious little..." by Glasswire · 2015-07-18 09:31 · Score: 1

"...straightforward information about this on the net." Because it's not straightforward. If you have enough grasp on your requirements to understand the apps you want to use and you are using commercial CAE / CFD codes, your ISVs should be able to give you some guidance about what typical customers are running (how many cores over how many nodes configured with how much RAM and storage with what kind of cluster interconnect and MPI message passing etc) for workloads similar in size to yours. If you're actually considering writing your own, please reconsider unless you have some very particular requirements - but if you do, you'll already have a really good idea of what level of parallelization your cluster architecture requires.

See what you can do with leasing or cloud by garyisabusyguy · 2015-07-18 09:32 · Score: 2

There are plenty of costs beyond the actual computer, including power, power conditioning, battery backup, heat removal, etc... that make up most of the cost.

If you still decide to build your own hardware, then pay close attention to

1. Compatibility with your chosen software, i.e. the best system in the world is worthless if it does not run the software that you want. If you are building your own software, you will still need to consider OS, compiler, libraries, etc

2. Ability of the operating system to provide enough resources to your software. In the 'good old days' Windows only provided a limited amount of RAM to processes; even in today's world Windows systems swap aggressively and may not give you the RAM performance that you may see in the Enterprise *nixes

3. Internal bus structure of the system you choose, The biggest growth in PC hardware has been the internal bus width and speed. Look around, but for cost's sake, you will probably be using a variety of PCIe from Intel. You will probably also see better integration with the PCIe bus with Intel chips. If you are using GPU accelerators, that is a whole 'nother kettle of fish that will affect your other decisions above and below

4. Methods provided for disk access. It used to be that Fibre Channel was king, but times have changed, with iSCSI making inroads, and local disk architecture provides the greatest bang for the buck, with SATA starting to edge out SCSI. If you go the SAN or iSAN routes, it will have additional costs for rackspace, power and cooling.

5. Disk system that you choose, most people would suggest butt-loads of local SSD, after RAM, solid state drives will probably be your highest costs

Just my two bits, plus I completely ignored tape system vs spinning-disk hard drives for backup, which would add more rack space, power supply and cooling to anything that you try and put together. Try and put together a realistic estimate for purchasing and supporting your hardware for a couple of years and compare it to cloud cost for similar resources

--
Wherever You Go, There You Are

Re:POOPING IS A VITAL VIRTUE by Anonymous Coward · 2015-07-18 09:34 · Score: 0

But is the opposite of a sin still a sin?

small cluster: performance/price metric by lkcl · 2015-07-18 09:35 · Score: 5, Interesting

i did this before, on a very small scale, for GBP 1,000 about 10 years ago. sales teams kept offering me 2ghz dual-core machines at GBP 300 each and i had to tell them this:

"look, i have a budget of 1,000 GBP. you're offering me a 2ghz system for 300. so i can only buy 3 machines, right? so that's a total of 6 ghz of computing power. on the other hand, if i buy this GBP 125 machine which has only a 1ghz processor, i can get 8 of those, which gives a total of 8 ghz of computing power. so _why_ would i want FASTER?"

so i bought qty 8 of motherboard, CPU, 128mb RAM, low-cost case containing a PSU already, and accidentally included a 3com network card because i didn't realise that the built-in ethernet on the motherboard could do PXE boot..... but still, all-in that was 125 GBP and each one took 15 minutes to assemble so it was no big deal. got myself 8ghz of raw computing power, which was the best that i could get for the money that i had.

and that's the question that you have to ask yourself. what's the highest performance / price metric that can be achieved?

the highly specific problem that i was endeavouring to parallelise was a very small memory footprint non-I/O-bound task: running the NIST.gov Statistical Test Suite. i booted all 8 machines off of my laptop, over PXE boot with an NFS read-only root filesystem. had to wait 30 seconds between each because my 800mhz P3 laptop with 256mb of RAM reaaallly couldn't cope with 8 machines hammering it... not over a 100mbit/sec link, anyway.

once started, i wrote a script that ssh'd into each and left them running the STS for a day at a time. very little actual data was generated: a report.

but the issue that you're solving may involve huge amounts of disk I/O, it may involve huge amounts of inter-connectivity (inter-dependence between the parallel tasks). you may even have to use a GPU (OpenCL) if it's that computationally expensive... ... and that's where anyone's advice really ends, because unless you know exactly what it is you need to do - in real, concrete terms of I/O per second, GFLOPs/sec, GMACs/sec, inter-communication/sec, you really can't and shouldn't even remotely consider spending any money.

so please consider writing a spreadsheet, based on the performance/price metric, extending it to the domain(s) that you're interested in optimising. then the answer about what to buy should be fairly self-evident.
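
the spreadsheet idea above can be sketched in a few lines. this reuses only the two offers and the GBP 1,000 budget quoted in the comment; summed clock speed is the same crude proxy the comment uses, and the function name is hypothetical. for a real workload you would swap in a measured figure (GFLOPs, I/O per second, jobs per day).

```python
def total_ghz(budget_gbp, price_per_machine_gbp, ghz_per_machine):
    """How many whole machines the budget buys, and their summed clock speed."""
    machines = budget_gbp // price_per_machine_gbp
    return machines, machines * ghz_per_machine


# (price in GBP, clock in GHz) for the two offers mentioned in the comment
offers = {
    "2 GHz machine at GBP 300": (300, 2.0),
    "1 GHz machine at GBP 125": (125, 1.0),
}

for name, (price, ghz) in offers.items():
    count, ghz_sum = total_ghz(1000, price, ghz)
    spent = count * price
    print(f"{name}: {count} machines, {ghz_sum:.0f} GHz total, "
          f"{ghz_sum / spent * 1000:.1f} GHz per GBP 1,000 spent")
```

adding columns for power draw, RAM and interconnect cost per node turns this into the fuller metric the comment describes.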

oh and don't forget to include the power budget (and cooling) because i think it will shock the hell out of you. remember you need to include the maximum specs, not the "average" or "scenario design power".

Phi ... by Anonymous Coward · 2015-07-18 09:35 · Score: 0

A long time ago there was the Intel answer to the GPGPU - Larrabee. But 3 months before roll-out management required a rewrite of 1/3 of the code base, and surprisingly enough the performance benchmarks were not met, so the group was disbanded and "re-org-ed" into servers. The "LRB 2" was renamed "Knights Corner" and was released as Phi.

It is a cluster on a single piece of silicon. They claim it is a "Xeon" but that disguises the nature of the silicon. It is warp-drive.

So look at the Tianhe-2. The Intel guys say "our Xeons" (not Phi) made it fast. They really didn't. What made it fast were the gpgpu's onboard. The makers of the supercomputer say "Yay Nvidia" and oh, yeah there are other vendors parts.

IMO - there are boxes that allow several Phi's in the same system. I would start there. Several of those in a microserver are going to be quite dense.

Place to start: software vendors by Anonymous Coward · 2015-07-18 09:36 · Score: 0

You should leverage the CAE software vendors. They usually run benchmarks on various hardware, and work with hardware partners, to help show the performance for various configurations.

One thing to keep in mind: CFD s/w generally scales very well, but FEA doesn't always scale as well (e.g., less than proportional improvement with increased numbers of cores). You may find one setup works well for CFD, but not so well for FEA, or you may want to limit number of cores for FEA.
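
One common way to reason about "less than proportional improvement" is Amdahl's law, which the comment does not name; it is brought in here as an assumption. The parallel fractions below are invented for illustration and are not measurements of any CFD or FEA package.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Amdahl's law: ideal speedup when only part of the run time parallelizes."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Hypothetical fractions: a solver that parallelizes almost entirely
# versus one with a larger serial portion.
for label, fraction in [("scales well (99% parallel)", 0.99),
                        ("scales poorly (90% parallel)", 0.90)]:
    row = ", ".join(f"{n} cores: {amdahl_speedup(fraction, n):.1f}x"
                    for n in (4, 16, 64))
    print(f"{label}: {row}")
```

Under this model the poorly scaling case flattens out quickly, which is one reason to cap the core count for such a code and spend the money elsewhere. Measuring the real scaling curve of your own application is still the only reliable input.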

You are going to need more detail. by fuzzyfuzzyfungus · 2015-07-18 09:37 · Score: 1

Your options really depend on what sort of 'high performance' you have in mind. When it comes to performance per core, Xeons typically crush Opterons; but the pricing reflects that, especially if you need the 4-8 socket support and RAS features. If what you need is large amounts of RAM with the lowest possible spending on the system around it, Opterons have tepid performance per core; but are likely to be the cheapest option that still supports ECC, more than one socket, buffered DIMMs, and any other niceties you wouldn't get from just desktop. If your application is one that can be made to fit, GPUs are enormously powerful for the range of things that they are capable of doing well.

They also depend on how big you need your system to be and how tightly coupled it has to be. If your intended application handles its own network-level parallelization and doesn't depend on very low latency, blessed are you. Price per core skyrockets if you go above 2 sockets, and GbE is effectively free (at least in the sense that you pretty much can't buy a system or motherboard that doesn't come with at least 2 NICs by default, often more) and relatively cheap to switch. If you need lower latency, this will hurt more and you are looking at Myrinet or InfiniBand. If your application needs a cluster that presents a single system image, especially one that also has genuinely low latency, you probably need to fortify your checkbook and consult an expert. You can get systems with more than 8 sockets and the appropriate custom interconnect; but you won't like paying for one.

Unless you value your time at surprisingly low rates; you probably won't want to build your own systems from parts; but depending on how tightly coupled you need, this may be something that you need to purchase as a system or something you construct from multiple computers you purchase.

Can you use either hardware you have or AWS (or one of their similar competitors) to better characterize what your application actually needs?

Experience with 40 node cluster by Anonymous Coward · 2015-07-18 09:41 · Score: 0

So some guys at our math department got their hands on money for a cluster (back in 2008). They bought 40+ vanilla PCs without HDDs and were going to "install Linux" on them. Some of these guys were Debian and Gentoo freaks. Fast forward two months where four-five people continuously work on this project (all the while doing zilch research) and the cluster works... sorta. It never really worked properly: vanilla hardware is not made for continuous operation at 100% and high temps without "local" cooling. Fans had to be replaced continuously, dust was a problem. Stability was not too great either.

So do not roll your own would be my advice!

Re:Experience with 40 node cluster by Anonymous Coward · 2015-07-18 10:35 · Score: 0

times are a changing http://www.rocksclusters.org

Operating System by Anonymous Coward · 2015-07-18 09:42 · Score: 0

Though you're asking about hardware, it might be good to mention this as a software recommendation: Rocks cluster OS. It's a breeze to install on the head node, and the compute nodes install themselves over PXE.

What can the software you will use do? by Kjella · 2015-07-18 09:44 · Score: 1

So you're using workstations now, my first order of business would be to figure out how you'd work together on a server or cluster. Does your software and workflow actually support that, or will it just be like a super high end workstation? Once you've got that done, you can start working on what it is your workload actually needs. How many nodes, CPU, RAM, network and so on. In general if your software scales well, more and less powerful nodes will do the job cheaper. Quad-socket systems are expensive and should really just be used if you need >28 cores in a single machine.

--
Live today, because you never know what tomorrow brings

Don't ask Slashdot! by Anonymous Coward · 2015-07-18 09:46 · Score: 0

The vast number of people in a place like this don't really know your field. As a CFD programmer... $50k will barely get you started in the rack-mounted world. Get the nicest workstation you can, forget about the rest unless you have 6 or 7 figures to play with. Your funds are within a deadzone... less is clearly workstation area, more is clearly big cluster area. In between... meh. You're bound to waste money on stuff that seems nice but is useless -- I'm speaking from experience on that. You can talk to "HPC companies", but they'll encourage you to waste your money. Get a decent workstation, get someone used to running the codes on it, then ask that person if they want more, tell them how much money they have to play with, and listen to them.

GPUs by Anonymous Coward · 2015-07-18 09:46 · Score: 0

Spend the money on rewriting/purchasing software that can run efficiently on GPUs (there are plenty of PDE solver algorithms that can, both for solid & fluid dynamics) and spend the rest on a bunch of Tesla boards.

RE: Best Bang-for-the-Buck HPC Solution? by Anonymous Coward · 2015-07-18 09:57 · Score: 0

i know of a new company that does this type of thing, but they do not advertise their services to the general public yet. p.a,r.a,l.l,e.l,a.w,a.r,e$dot%com. they have a contact email address on their website. they focus on massively parallel hpc stuff. i suppose that includes cfd.

Re:Maybe owning the hardware is not the best solut by Anonymous Coward · 2015-07-18 10:01 · Score: 0

and for you too: http://www.computer.org/csdl/mags/co/2009/04/mco2009040035-abs.html

Real HPC next to your desk by deadline · 2015-07-18 10:09 · Score: 1

Take a look at Limulus systems from Basement Supercomputing: http://www.basement-supercompu...

These are fast low power/noise/heat systems with a fully installed open source HPC software stack.

--
HPC for Primates. Read Cluster Monkey

Re:Real HPC next to your desk by Anonymous Coward · 2015-07-18 16:11 · Score: 0

I had never heard of them before you posted. I took a look... I'm not all that impressed. It doesn't take much to buy a couple low end systems with the fastest processors and max the memory and throw Cloudera or Hortonworks on it. Their biggest system is $14k for four 3.2 GHz w/ 32GB. I can get the same amount of horsepower from Dell for about $5k + another couple thousand in SSDs. And only connected with gigabit.
I'm sure it is great for some university professor who has to piss away grant money but don't kid yourself about it being an "HPC" machine.

Good DIY Option by Zorlon · 2015-07-18 10:10 · Score: 1

I've used supermicro in the past. Very cost effective if you like to assemble your own systems. I have no financial relationship. This is just an unsolicited testimonial. http://www.supermicro.com/prod...

--
- Things are the way they are because they're coded that way -

Re:Good DIY Option by Anonymous Coward · 2015-07-18 17:12 · Score: 0

I've used supermicro in the past. Very cost effective if you like to assemble your own systems.
I have no financial relationship. This is just an unsolicited testimonial.
http://www.supermicro.com/prod...

I've used SM in the past too. They are shit. - Slashdot. fair and balanced.

Raspberry Pi cluster by ArcadeMan · 2015-07-18 10:15 · Score: 1

Because Soulskill recommends it. Fourteen thousand Raspberry Pi, in fact.

--
Get free satoshi (Bitcoin) and Dogecoins

It all comes down to the application... by Anonymous Coward · 2015-07-18 10:17 · Score: 0

The hardware for a HPC solution is defined by the application. Whilst it is possible to create a general HPC solution just by clustering, sometimes that will not be very useful. For example, in the stock market HPC solutions may involve quick protocol exchanges (High Frequency trading), thus CPUs can add latency by requiring data to come off the PCI bus. A better solution is to design custom networking cards with CPUs, GPU, or even FPGA or ASICs built directly on them and have the data pushed to the card.

If your application parallelizes well, then cheap low-end chips may be better. You may find that 200MHz with 1KB of cache will be enough to perform a unit of work and that you can buy a thousand chips like that for the price of a single high end multi-core CPU.

The best approach is to attempt to identify what a 'unit of work' means to you. In HFT, a unit of work will be creating a trade request, or analyzing a market position, etc. In a scientific scenario, it may be the comparison of data to a signature of a particle. Once you have identified all your 'units of work', it's a matter of crunching numbers using different architecture types to maximize throughput. Each architecture will have a dollar cost; select the one that you can afford.

It's not really rocket science, but the devil can be in the details. Obscure things at the hardware level, such as seldom used OP codes, or optimized registers, obscure bugs, etc., can change a picture dramatically. So, when developing such a solution, knowing the hardware intimately and being able to cut through all the marketing bull to be able to do a side-by-side comparison on equal terms is critical. Given the amount of hardware in the market, that's a very tall order and there are a limited number of tech people with such skill.

Core considerations by whoever57 · 2015-07-18 10:22 · Score: 1

And what is the better choice here? 16-core Opterons at 2.6 GHz, 8-core Xeons at 3.4 GHz? Are power and thermals limiting factors here? (A full rack cupboard would consume something like 25 kW, it seems?) There seems to be precious little straightforward information about this on the net.

There is another factor to consider. If you ever license software that is priced using a per-core model (for example LSF), you will find a great advantage in going with the Intel solution.

--
The real "Libtards" are the Libertarians!

Re:Core considerations by Cassini2 · 2015-07-19 04:02 · Score: 1

Per core or Per CPU software pricing can dominate the cost calculation. We have a CFD application, and we were considering boosting the hardware. One look at the software costs discouraged us.
A costly complication is that 3-D CFD (or FEA), is an O(n^3) problem. Doubling the mesh density means 2^3=8 times the CPU time. An increase of 10 times in the mesh density requires 10^3=1000 times the CPU time. If you are pushing the extreme, small changes in the mesh density have significant cost impacts.
It makes me wonder how many research groups are paying full-cost for the commercial CFD packages. Many universities, some quasi-government labs, and many small startups will not have the money for the full-price commercial packages.
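
The mesh arithmetic above can be written out directly. This sketch takes the O(n^3) rule exactly as stated in the comment; real solvers may scale worse (for example if the time step also shrinks with the mesh), so treat it as a lower-bound illustration. The function names are hypothetical.

```python
def relative_cpu_time(density_factor, dimensions=3):
    """CPU time multiplier when mesh density per axis grows by density_factor."""
    return density_factor ** dimensions


def affordable_density_factor(cpu_budget_factor, dimensions=3):
    """Inverse question: how much finer a mesh a given CPU budget increase buys."""
    return cpu_budget_factor ** (1.0 / dimensions)


for factor in (2, 10):
    print(f"{factor}x mesh density -> {relative_cpu_time(factor)}x CPU time")

# Doubling the hardware only buys about a 26% finer mesh per axis in 3-D.
print(f"2x CPU budget -> {affordable_density_factor(2):.2f}x mesh density")
```

The inverse function is the one that matters for budgeting: large hardware increases buy surprisingly small gains in resolution.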

What CFD software, Interconnect, Storage.... by jason.stover · 2015-07-18 10:31 · Score: 1

Really... it depends. What software are you using (STAR-CCM+? OpenFOAM? custom + MPI?)? Planning on an InfiniBand interconnect (if so, QDR? FDR? ... DDR, I guess?)? 10GbE? What about storage for the cluster file system? Lustre? NFS? How many sustained IOPS are you expecting to need? How much RAM per core, or overall RAM, is required for your application? Without more info it's impossible to give a good answer.

-J

PS. Not advertising, but I do actually work for a company that sells CPU time on HPC clusters. And with what you have given ... yeah, we'd be throwing that back for more info. There isn't enough to make any sort of determination. Can _guess_ ... but that's all it'd be: a guess.

*LOTS* of info on the net by gavron · 2015-07-18 10:41 · Score: 2

The problem is that you don't know what you're looking for so you're not asking the right questions.

- Power is a factor. You mention 25KW. Wrong units. You should look for KVA. You'll never know what the wattage is until you know the power factor (PF) and you won't know that until you populate the device with spindles and fans (which have a different PF than CPUs, GPUs, PSUs,) and then run it under load and measure.
- 25KVA is a medium rack. 35-50KVA is a dense rack. How many racks you choose to have is up to you, but the "25" number is not a good random one to shoot for. If you search for "30KVA" and "High density rack" you'll get an idea of what servers do populate such things.
- You won't be running anything of this magnitude at your deskside, unless you are in Alaska or Siberia and have no other source of heat. Also most businesses don't like running 4 30A 3-phase 208VAC to employees' desksides. Just sayin'... And again, if you're not in Alaska or Siberia with an open door and window, you won't move enough air through your office to cool that beast. (Air mass is directly related to cooling, and unless you're doing dielectric-immersion cooling, the sheer amount of air requires massive fans and lots of space.)
- Two other responses said "See what your software vendor says." Software is abstracted by compilers. The real question is "how much CPU, GPU, DISK, or other IO does it do" and plan for that. That will also change the PF and the KW and the heat load.

There's a reason nobody builds deskside compute servers with today's technology. Density, power, and cooling.

Keywords to google: KVA PF KW, high density rack server, PUE (PUE, power usage effectiveness, is unrelated to PF: it is total facility power divided by IT equipment power, and is applied to an entire data center, which includes cooling.)
Other places to look: look up abstracts for talks at Data Center World.
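
The relationship between the units discussed above is simply real power (kW) = apparent power (kVA) x power factor. A minimal sketch follows; the 0.9 power factor is an assumed example value, since the comment's point is that you only learn the real one by measuring under load. The function name is hypothetical.

```python
def real_power_kw(apparent_power_kva, power_factor):
    """Real power drawn (and turned into heat) for a given kVA rating."""
    if not 0.0 < power_factor <= 1.0:
        raise ValueError("power factor must be in (0, 1]")
    return apparent_power_kva * power_factor


# 25 kVA is the "medium rack" figure from the comment; 0.9 is an assumption.
print(f"{real_power_kw(25, 0.9):.1f} kW")
```

Circuits and UPS capacity are sized against the kVA figure, while the kW figure is what the cooling has to remove.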

Re:*LOTS* of info on the net by Anonymous Coward · 2015-07-18 11:50 · Score: 0

> There's a reason nobody builds deskside compute servers with today's technology.
Eduardo Snowdenus leaked the NSA has push-cart mounted, water cooled supercomputers that can be requested by the deskside, in case a crypto-cracking analyst requires more power.
Re:*LOTS* of info on the net by Anonymous Coward · 2015-07-18 16:20 · Score: 0

35-50 KVA as a "dense" rack? If by dense you mean cheap commercial stuff bolted together and all air cooled then I guess so. Open Compute is pushing the upper end of that limit and guys like Cray, SGI and HP are well beyond for semi custom systems.
Re:*LOTS* of info on the net by Thumper_SVX · 2015-07-19 02:52 · Score: 1

There's a reason nobody builds deskside compute servers with today's technology. Density, power, and cooling.
And the fact that a deskside system is highly unlikely to be utilized 100% of the time... probably more like 10% of the time. In that case it's more cost-effective to farm it out to a bigger cluster in a server room, or run it in some AWS/Azure nodes for the time it needs and then shut it down.
The fact that there are many more high performance computing resources available relatively cheaply is as good a reason as any not to do deskside compute on a large scale.
Re:*LOTS* of info on the net by pnutjam · 2015-07-20 03:12 · Score: 1

I currently have three HPC clusters I manage. Those things definitely move some air. If I open the mesh rack door, without holding it, the air flow will slam it into me pretty hard. It's just a light door, but it's full of holes and the air still pushes it hard.

--
Cheap storage VM.

Easy by nospam007 · 2015-07-18 10:53 · Score: 1

"Or is this folly, best left to experts? "

Since you don't seem to be or have experts nor money, it's obviously folly.

project destined for disaster by bloodhawk · 2015-07-18 11:02 · Score: 1

I regularly work with large corporations and governments building large HPC, mainframe replacements, large clusters and you appear to be falling into the same trap a lot of them do. As others have said, it isn't the hardware; anyone who tells you what is best based on your summary doesn't have a clue, as it is all about the SOFTWARE. I recently watched an organisation spend the best part of a million dollars on high core count machines only to then find the app doesn't perform well in parallel and in fact scales much better on high clocked low core count machines that would have cost them less than a quarter of the price. Memory, core count, machine architecture, core clock speed are all essential items that can only truly be determined by good application profiling; absent that you are just pissing money in the wind. PS: Don't build these sorts of machines yourselves, the amount of integration testing and trouble you can run up against with these types of configs costs you more than you can save by not going with a recognised vendor.

Re:project destined for disaster by pnutjam · 2015-07-20 03:15 · Score: 1

There are so many components to manage, working with individual vendors, who will just blame each other, is a nightmare. You definitely need to have a comprehensive vendor who has the clout to manage this. HPC clusters that are used, require constant attention.

--
Cheap storage VM.

Re:I advise against it by lenart · 2015-07-18 11:17 · Score: 3, Funny

Save your money and use it to move somewhere without Fag Marriage.

You can marry a cigarette where you live?

Finer detailed bang for the buck? by Anonymous Coward · 2015-07-18 11:21 · Score: 0

CFD/FEA is full of repetitive "simple" math.
- CPU's suck at repetitive "simple" math.
- Computation rate is I/O limited. Feeding data takes time.

Cluster communications are constrained by network speed.
"Bang for buck" = cost per calculation/speed.
CPU from 1 "step" back = 1/3 the cost of current cpu, 80% of performance.
Memory from 1 "step" back = 3/4 the cost of current memory, 80% of performance.
Coherent hardware architecture = optimized OS images, minimized kernel size, easy part swaps, etc.
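
Taking the "1 step back" ratios above at face value, the bang-for-buck gain works out as relative performance divided by relative cost. The ratios are the comment's own rules of thumb, not measured prices, and the function name is hypothetical.

```python
def perf_per_cost(relative_performance, relative_cost):
    """Performance per unit of money, relative to the current-generation part."""
    return relative_performance / relative_cost


cpu_gain = perf_per_cost(0.80, 1 / 3)   # 80% of the speed at 1/3 of the cost
mem_gain = perf_per_cost(0.80, 3 / 4)   # 80% of the speed at 3/4 of the cost

print(f"Previous-gen CPU:    {cpu_gain:.2f}x performance per dollar")
print(f"Previous-gen memory: {mem_gain:.2f}x performance per dollar")
```

On these numbers the CPU is where stepping back pays off; the memory saving is marginal.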

If I was building another "local" cluster: (my last build in parentheses)
- Use the most stable and reasonably affordable multi-core supporting motherboard from a year ago. (Asrock)
- The most powerful cpu available for it at the time it was designed. (6 core AMD)
- As much memory as reasonable (16 Gigs of DDR3, but motherboard can handle 32 gigs)
- Heatsinks and cooling solutions for 2x TDP (Didn't trust water, went with Coolermaster and Thermaltake)
- A pair of reasonably large OpenCL/CUDA capable video cards (Nvidia) in each motherboard. Compile your stuff to use the GPUs (openfoam/opencascade)
- The fastest (and least CPU load) network card available (Intel Pro's)
- The fastest network switch/hubs available that support the network cards (Linksys and Alcatel)
- Use an OS that actually supports HPC and massively parallel computation. (Linux, duh.)
- "Live" data [Analytical data] storage should be on some form of SSD or DDR drive. (Seagate and 1 gig virtual drive in system memory)
- "Dead" data [OS, completed calculation data, etc] on rotating platters - (HP)

If you are building a HPC, build for stability, ease of management, and cost. A "small" or "government contract" HPC can go with the big, hot, and expensive iron - because there's no real budget involved. If you want something that works and doesn't require its weight in gold to pay for it, you need to take a step back (literally and figuratively) from the "bleeding edge" and start looking for parts/pieces at the "starting to get cut" level of performance. It'll save you about $2000 per box in your cluster.

Re:Finer detailed bang for the buck? by pnutjam · 2015-07-20 03:16 · Score: 1

There is some truth in this, but it will really hurt you in ongoing support and any upgrade paths. Although, most HPCs should be run in a silo'ed environment and replaced rather than upgraded piecemeal.

--
Cheap storage VM.

Red Barn by Theovon · 2015-07-18 11:29 · Score: 1

Got two Ivy Bridge dual-socket 12-core Xeon boxes a couple of years ago. I called up Red Barn. They helped me figure out what hardware would give me more bang for my buck (two dual-socket Ivy Bridge blades got me more cores than one Sandy Bridge with four sockets), built it up for me, installed the OS, and delivered it. Smooth as butter. IIRC, the whole deal cost me around $24000, for one compute/server node and one compute node. For $50K, if prices have scaled similarly to Haswell Xeons by now, you'd probably get that and another three compute nodes. (Also, you'd probably get more cores -- IIRC, at one point we were expecting like 15-core Haswell Xeons to come out, but I haven't kept track.)

You need to gather more info by excelsior_gr · 2015-07-18 11:47 · Score: 1

You need to look into the problem at hand more closely! The software plays a very important role. Perhaps it can benefit more from a GPU cluster rather than a CPU cluster? Can it benefit from the instruction set of the latest Xeons or will the older (and now cheaper) generation suffice? CFD simulations are quite memory-hungry, so 3 GB per core is pretty standard. Also, you need to make sure that the cores can talk to the RAM efficiently, so definitely pick a CPU with 4 memory channels. After 6 cores per CPU or so the communication between the cores and the RAM becomes the bottleneck, so don't stack too many cores on the chip. Dual socket motherboards and CPU combos are also pretty cost effective. Also, users tend to use up the performance advantage of such a machine quite rapidly, so you should also plan for the future.
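
The sizing rules of thumb above (3 GB per core, 4 memory channels, about 6 cores per CPU, dual socket) translate into simple per-node arithmetic. This is a sketch of that arithmetic only; the function names are hypothetical, and the 12-core comparison is an invented example.

```python
def node_memory_gb(sockets, cores_per_socket, gb_per_core=3):
    """RAM to fit in one node under the GB-per-core rule of thumb."""
    return sockets * cores_per_socket * gb_per_core


def cores_per_memory_channel(cores_per_socket, channels_per_socket=4):
    """How many cores share each memory channel; lower means less contention."""
    return cores_per_socket / channels_per_socket


# Dual-socket node with 6 cores per CPU, as suggested in the comment.
print(node_memory_gb(2, 6), "GB per node")
print(cores_per_memory_channel(6), "cores per channel with 6-core CPUs")
print(cores_per_memory_channel(12), "cores per channel with 12-core CPUs")
```

Doubling the cores per socket doubles the contention per channel, which is the bottleneck the comment warns about.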

Speak with vendors by Anonymous Coward · 2015-07-18 11:56 · Score: 0

I work for Lenovo's HPC group. We deal with these sorts of things all the time, and that amount of budget can get you far, but the specifics depend a lot (Xeon or accelerator centric compute, whether a high speed fabric is useful or not, etc.).

Right now it seems *usually* the default answer for this sort of thing is dual socket E5 systems, sometimes with nothing, sometimes with nVidia, sometimes with Phi.

You should speak with vendors to get quotes and guidance. Obviously I would be happier if you sought out Lenovo or a business partner (http://shop.lenovo.com/us/en/systems/solutions/hpc/#tab-hpc_industry_solutions), but you should get multiple bidders (HP, Dell, and Cray are all possibilities, and if in Europe, you could add Bull to the list of competent vendors). If you mention your specific country, I might be able to suggest a local company that you could talk to pretty easily (though I only know Lenovo business partners, others may mention other relevant companies).

GPUs? by Chewbacon · 2015-07-18 12:02 · Score: 1

Was just reading about a 25 GPU cluster for brute forcing passwords. You can use them for supercomputing too. You could probably homebrew one with used equipment and save some cash. Anyway, here is some inspiration: http://arstechnica.com/securit...

--
Chewbacon
The Bible is like Wikipedia: written by a bunch of people and verifiable by questionable sources.

If you go Supermicro by Khyber · 2015-07-18 12:28 · Score: 1

They will build the system for you. In fact, they require that they build the system for you in order for you to get ANY warranty service.

It's worth it. For just a bit over $100K you can have a 192 thread 1TB RAM 12GPU quad-cluster loaded with 24 SSDs.

--
Still waiting on Serviscope_minor to wake up to fucking reality and realize that Jessica Price isn't going to fuck him.

Re:If you go Supermicro by Khyber · 2015-07-18 12:31 · Score: 1

For reference: A system from them I had made.
http://i.imgur.com/d4gPjNM.png
And actually that's just over $91K
So even less than what I was saying initially.

--
Still waiting on Serviscope_minor to wake up to fucking reality and realize that Jessica Price isn't going to fuck him.
Re:If you go Supermicro by Khyber · 2015-07-18 12:32 · Score: 1

Also, go with Xeons. AMD has been lagging behind and HBM isn't going to help close the gap very much.

--
Still waiting on Serviscope_minor to wake up to fucking reality and realize that Jessica Price isn't going to fuck him.

Obligatory car analogy by wheelbarrio · 2015-07-18 12:38 · Score: 1

You have asked "What is the best car I could buy? Also, should I build it myself or get one from the showroom?"

As many other posts here suggest, the first question is kind of meaningless without knowing what you want to do with said car. Is it for trips around town? To carry 7 kids? A lean mean street-fightin' machine?

As for the second question, if your budget is $50k, then I suggest neither. You cannot (should not try to) build a general-purpose HPC solution and its infrastructure for that kind of money. If your use-case is not heavily dependent on high-bandwidth data transfer then definitely consider AWS/Google compute/Azure. If you have a very specialized use-case, perhaps a single compute job that was trivially parallelizable with little or no I/O, you could probably put something together for $50k and run it under a desk. But general-purpose HPC is not just a bunch of server units. A high-speed switch between your compute nodes alone could cost that much. A very basic chassis from Dell suitable as a compute node costs around $5k. Stuff it full of memory, 2 or 4 xeons, GPUs if you need them, fast local scratch disk, redundant 10GB network connects... again, you're looking well north of $25k per unit. Not to mention, as others have, you need a climate-controlled room with abundant, reliable and redundant power to put the thing in.

Re: Obligatory car analogy by Anonymous Coward · 2015-07-18 20:00 · Score: 0

If your budget is $50k, buy a small ready made thing from a tier 2 vendor. In the US typical names include Aspen, Atipa, Microway, Penguin. This work is their spot - turnkey solutions - show up with something and it works. It is only quick and easy when you have done this before. They will also know what to recommend hardware-wise.
Yes, they will charge a premium but you will spend many days of your time making this work otherwise. That costs.
The cost of getting it wrong will be in the output. If you build it wrong - or buy the wrong system - you could easily leave 25-50% of the performance of the system untapped. The wrong servers, bad network configuration, bad filesystem set up, etc. If you are using commercial CFD software, then you are not only paying for wasted CPU cycles but for software that can't run at its best.
That wasted percentage could be spent on getting the right system.
NB. Most commercial CFD codes charge per core - they also get less efficient at higher core counts - so fewer, more powerful cores is generally the right thing to do. Intel Xeon is dominant for a reason.

Useful links by LStewart2 · 2015-07-18 12:46 · Score: 1

You might look at http://limulus.basement-superc... for a concrete example of the sort of system you are talking about. Also, http://www.beowulf.org/ is the home of the community of people who build compute clusters from any old hardware and run open source software on it.

Rent, not buy at first by Anonymous Coward · 2015-07-18 12:53 · Score: 0

Ansys sells HPC licenses that support Amazon cloud and maybe Azure and Google too. From what I understand, your local machine just runs the license server and user interface. It connects to the HPC on the cloud. Images of the OS with the software preinstalled are provisioned for your account. Unless you are writing your own FEA, this might get you what you want for a lower initial cost. If the usage justifies it, then you can think of buying one.

A typical FEA developer at StarCCM, Ansys, Abaqus or a similar company nowadays uses a 40-processor, 128 GB machine with a 0.5 TB solid state disk and a couple of 2 TB SCSI disks. Usually two 28 inch 1920x1200 displays. The QA/benchmark/marketing teams typically have access to a 128-node or 256-node cluster, each with about 4GB memory. Some GPU clusters with 16 or 32 graphics cards are also in use.

For smaller companies it is better to taste HPC on rent before committing to buy/build a cluster in house.

Disclaimer: I work for one of these companies, as a developer. So posting anonymously.

Re:Rent, not buy at first by pnutjam · 2015-07-20 03:21 · Score: 1

Yeah, this way he can burn through his $50k in a month or two.

--
Cheap storage VM.

Parallella board cluster from Adapteva by Anonymous Coward · 2015-07-18 13:29 · Score: 0

Depending on the scope would the Parallella board cluster from Adapteva meet the requirement? See their site http://www.adapteva.com/parallella-board/ for more info.

Re:Parallella board cluster from Adapteva by AlabamaCajun · 2015-07-18 16:37 · Score: 1

Time to bump an AC for a good solution.
At 100 to 150 US Dollars the parallella provides 18 cores per Raspberry pi sized board.
Run them with the deadhead distro and use a small Linux PC for the head.
Speed is impressive for the cost.

SuperMicro is fine by guruevi · 2015-07-18 13:33 · Score: 1

If you're pinching the dollars, there is little better you can do than save money and build it yourself (given that there is no extra labor cost incurred for doing so). Many shops will also assemble a system for ~$250-500 and another $250-1500 for a 3-5 year warranty (depending on the cost of the system etc). You could also go with IBM/HP/Oracle and lease the bunch. It's really up to you what type of support contract you require (generally you won't need it). You could calculate the Amazon/"shared hosting" aka cloud route but I figured out (in my instance) that unless you need the thing less than ~30% of the time in a year, you are better off doing it yourself.
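
For anyone who wants to redo that rent-vs-own sum with their own numbers, here is a minimal sketch. Every figure in it (purchase price, lifetime, running cost, hourly cloud rate) is a made-up placeholder, not a quote from any provider, and the function name is just for illustration.

```python
HOURS_PER_YEAR = 365 * 24

def breakeven_utilisation(purchase_cost, lifetime_years,
                          running_cost_per_year, cloud_rate_per_hour):
    """Fraction of the year above which owning is cheaper than renting."""
    # Yearly cost of owning: amortised purchase plus power/admin/etc.
    owned_per_year = purchase_cost / lifetime_years + running_cost_per_year
    # Yearly cost of renting equivalent capacity around the clock.
    cloud_per_year_at_full_use = cloud_rate_per_hour * HOURS_PER_YEAR
    return owned_per_year / cloud_per_year_at_full_use

# All figures below are hypothetical placeholders.
u = breakeven_utilisation(50_000, 3, 5_000, 8.0)
print(f"Owning wins above roughly {u:.0%} utilisation")
```

With these placeholder inputs the break-even lands at about 31%, in the same ballpark as the ~30% above, but it moves a lot with the hourly rate you plug in.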

I am currently running several SuperMicro GPU workstations (4U with 4 GPU's each) and several more SuperMicro 1U/2U servers (storage). They are run-of-the-mill and really nothing anyone can improve upon, especially cost (and I've had several salesmen concede that point, most recently Nexenta). Of course, if you don't know anything about it, it might be worth having a support/management contract.

Also a rack drawing 25kW peak does not mean you require 25kW of power. If you run at more than 70% of your peak power as rated by the power supplies, you are running dangerously close to the limits. I run a half rack of storage and a GPU server and it draws a consistent 20-30A@230V (that's roughly 4.6-6.9kW), although its peak power based on power supply alone is closer to 15kW. The power supplies are really intended for the worst-case scenarios (8x 3.5" HDD, 100% CPU usage, 100% RAM slots filled, 4 5.25" devices, 4 GPU and several other add-ons).
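
The arithmetic behind those figures, as a rough sketch. It ignores power factor (so strictly it gives kVA rather than kW), and treats the 70% figure as a rule of thumb rather than any electrical code.

```python
def draw_kw(amps, volts):
    """Approximate draw in kW from measured current and voltage (power factor ignored)."""
    return amps * volts / 1000.0

def sustained_limit_kw(psu_rated_kw, headroom=0.70):
    """Largest continuous load that stays under the given share of PSU rating."""
    return psu_rated_kw * headroom

low = draw_kw(20, 230)            # measured draw, lower end
high = draw_kw(30, 230)           # measured draw, upper end
limit = sustained_limit_kw(15.0)  # 70% of the ~15kW nameplate total

print(f"measured: {low:.1f}-{high:.1f} kW, comfortable ceiling: {limit:.1f} kW")
```

So the measured draw sits well under 70% of the nameplate total, which is the point: size the circuit from measured or realistic load plus headroom, not from the sum of the PSU labels.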

As far as Xeon/Opteron, you would really have to benchmark them both against your application. For some loads one outperforms the other and although AMD is generally cheaper per performance unit, Intel may allow you to get more out of a single machine.

--
Custom electronics and digital signage for your business: www.evcircuits.com

Parallel benchmark website for FE by Anonymous Coward · 2015-07-18 13:48 · Score: 0

If you want to see how a commercial parallel FE code scales with different HPC configurations, go to www.topcrunch.org. It reports benchmarks of different hardware vendors, CPUs, interconnects etc for the commercial FE code LS-DYNA (www.lstc.com), which is used heavily for simulating car crashes for meeting automobile safety requirements. Their typical user has between 100 to 1000 cores, and the benchmarks reflect it. The site allows some basic searching and sorting, nothing terribly fancy but you'll learn a lot in an hour.

As for commercial CFD codes: if they use block regular meshes, then they will scale almost perfectly with the number of processors. If they use an unstructured mesh, the scaling will eventually plateau like FE codes with their unstructured meshes.

Yet another voice of experience by ihtoit · 2015-07-18 14:53 · Score: 1

Find a consultant and talk to him about precisely what software you want to run, so he can work out as future-proof a solution as possible for you.

There is no way I can recommend hardware without this basic information. Do you want to run a Quake server? Video streaming? Virtualisation? HD encoding? CGI rendering? Studio or OB relay broadcasting? Teleconferencing? CRM? Wiki/bulletin board/IRC/SL? Multi-point data aggregation? Physics simulations? All have specific and wildly different hardware requirements. Hell, I couldn't recommend a processor architecture at this juncture.

--
Political debates have me rolling my eyes so much I think I got optical whiplash. I should sue. - Foamy The Squirrel

Re:Yet another voice of experience by Anonymous Coward · 2015-07-18 20:32 · Score: 0

The *first* line of the summary says the OP is doing FEA/CFD.
RTFS.
Re:Yet another voice of experience by ihtoit · 2015-07-19 05:47 · Score: 1

what, finite element analysis and computational fluid dynamics? Finite difference methods, finite element methods, finite volume methods, polynomial fitting, spectral methods, boundary element methods, iterated function systems...? Which? All? They each have different requirements, and you're talking about Big Iron to deal with them all - not something you're ordering from Dell using their web configurator.

--
Political debates have me rolling my eyes so much I think I got optical whiplash. I should sue. - Foamy The Squirrel

Re:I advise against it by funwithBSD · 2015-07-18 15:08 · Score: 1

Or a bundle of sticks.

--
Never answer an anonymous letter. - Yogi Berra

GPU by witherstaff · 2015-07-18 15:36 · Score: 1

If you're considering GPU look through any bitcoin mining forum. Setting up a reliable GPU farm is a nice tech challenge if you really want to grow your own. A few years ago I ran out of power at 50 GPUs, they're hogs. Heat is a whole other problem as the previous poster made note of. For a business use you may want to just use AWS or GPU appliances.

Re:I advise against it by Anonymous Coward · 2015-07-18 17:06 · Score: 0

man, that sounds so gay! Listen article submitter, I suggest you stop reading now, it's not going to get any better. Go it to buy an arduino or something, and leave this HPC shit to people that know how to cover there ass.

Silicon Mechanics! by Ydna · 2015-07-18 17:14 · Score: 1

http://www.siliconmechanics.co...

They take all that commodity hardware and figured out how to make nice high-density systems for you. Stop trying to figure out how to do it yourself. They've done the hard work and (in my personal opinion) have absolutely outstanding support (before and after the sale). I'm just a happy customer.

--

"The great thing about multitasking is that several things can go wrong at once." -me

Re:Silicon Mechanics! by iggymanz · 2015-07-18 18:04 · Score: 1

indeed the smart solution is to pay for services, buying anything will just mean solution will be totally obsolete in 18 months
Re:Silicon Mechanics! by Anonymous Coward · 2015-07-18 20:01 · Score: 0

what's wrong with obsolete? Services are obsolete after you pay the bill for that month. If you don't pay next month, you have no services.
Better to debate cost for requirements. It's always possible to pay more for more. Of course, if you don't have certain knowledge or skills, you pay for that, or learn it. There are many more requirements to think about, other than "can I run the software I want". If you can't think of the requirements, then you either need to accept "discovering them" on the fly, or pay for someone who will walk you through the requirements you may be assuming.

Bittersweet by Anonymous Coward · 2015-07-18 20:33 · Score: 0

This is the kind of asshole who's replacing seasoned sysadmins these days. Glad I moved on from IT and somewhat amused that companies are now reaping the reward of their fresh, cheap, easily-exploited NOC monkeys, who have to resort to panicked online pleas to help them out of the holes they've dug for themselves.

Good luck. You and your employer deserve each other.

Well ... it depends ... by golodh · 2015-07-18 21:19 · Score: 1

Sorry, but it really does. The right answer depends on you and your application. And previous posts to this effect are right: start with the software that solves your problem.

You see, if you (the one who posted the question) were a numerical mathematician or a computational physicist and looking for adequate performance in a research setting at rock-bottom cost, I'd say: have a look at GPU's (see e.g. here http://www.nvidia.com/object/c... ) and e.g. the Navier Stokes solver from Stanford U. (see here: http://mc.stanford.edu/cgi-bin... ). For such applications, hardware based on GPU's tends to give a much better price/performance ratio than rigs based on CPU's. But they're a lot harder to use as well. So find a suitable solver and see what hardware makes it shine.

But you probably aren't, or you'd have known that already (or looked it up in the literature or figured it out yourself).

Err ... if you are (as far as I can make out) just a computing guy who doesn't know Navier from Stokes but wants to put a "FEA/CFD rig" together, I think you're simply not the right person to do that. Yes, you're probably capable of putting a few PCB's together that can run a generic Matlab-based solver, and will then find that it gives you an abysmal price-performance ratio on your particular workload.

And why? Well, the problem with "supercomputers" is: they're much more powerful than a general purpose computer only on very *specific* problems. Change the problem and watch the performance change as well.

So you've got to tune your hardware to your problem in order to get really good price-performance. And your "problem" is your solver. You need to choose that as well. And for that you need to understand a bit about what a solver does relative to the problem you really want to solve. And it sounds as if you haven't a clue. Sorry.

The alternative is to let other people (consultants, vendors) do the thinking, and buy a custom solution. That will work, and will give you reasonable (but not great) price-performance ratios at a reasonable price level. But make darned sure that your hardware-software combination is a good fit.

My suggestion: talk to your team member who knows what the formulas look like, what solver you're going to be using, whether low-accuracy is acceptable (e.g. if the objective is to obtain a graphical solution) or whether high accuracy is a must (e.g. for engineering purposes).

If low accuracy is acceptable but you want the best speed, think GPU's. If not ... determine what the particularities of your problem instances are, what your solver grid will look like, and what type of computing resources you'd need for that and how much.

Simply saying "a CFD problem" isn't nearly specific enough. And getting to a hardware configuration that has truly good price/performance levels is something for a specialist (or a team effort).

Buy last year's stuff by YoungManKlaus · 2015-07-18 21:37 · Score: 1

99% of the time, there is no real usable improvement between last year's and this year's model (i.e. maybe in the 10% range if at all). But you can get last year's version for half (or less) of the introduction price, which usually is way lower than the current version's price.

Efficient software by Anonymous Coward · 2015-07-18 22:09 · Score: 0

There can easily be a factor of 10,000 difference between vector processing programs. So a craplication on Windows may very well need a cluster, while a better program may do the same on a laptop machine with Fedora.

Depends on a lot of things by dbIII · 2015-07-18 23:11 · Score: 1

Depends on a lot of things - if you have something massively parallel to do then core per $ is going to matter and speed is nowhere near as important. For other things speed is going to matter so you'll probably end up with a mix.
Memory is bound to be something that will drive the design. Do you want a LOT of shared memory (which means a few huge machines) or can you parcel it out to nodes on a much cheaper cluster? A network is NEVER fast enough for some things when the alternative is a big pool of memory.
If you are running commercial software your setup is going to be dictated by their licence conditions more than effectiveness at running their software - I went from an effective cluster to a small number of individual 64 core machines, with a slower speed, due to the licence changing to a per host model. A cluster full of the fastest eight core Xeons could get stuff done in half the time of some 64 core machines but at eight times the licence cost the difference in price with some software could pay for a few 64 core machines per year.
GPUs are nice if your problem can actually use them, sadly they still don't have the memory for a lot of things.
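
To put numbers on the per-host licence point, here is a small sketch. The host counts mirror the example above (eight small hosts vs one 64 core box), but every price in it is invented for illustration, so plug in your own quotes.

```python
def yearly_cost(hosts, licence_per_host_per_year, hardware_per_host, amortise_years):
    """Total yearly cost when the licence is charged per host."""
    hardware_per_year = hardware_per_host / amortise_years
    return hosts * (licence_per_host_per_year + hardware_per_year)

# Hypothetical prices: $20k/host/year licence, $6k small host, $25k big host,
# hardware written off over 3 years.
cluster = yearly_cost(hosts=8, licence_per_host_per_year=20_000,
                      hardware_per_host=6_000, amortise_years=3)
big_box = yearly_cost(hosts=1, licence_per_host_per_year=20_000,
                      hardware_per_host=25_000, amortise_years=3)

print(f"cluster: ${cluster:,.0f}/yr, single big host: ${big_box:,.0f}/yr")
print(f"difference: ${cluster - big_box:,.0f}/yr")
```

With placeholder prices like these the licence term swamps the hardware term, so the faster cluster only wins if halving the run time is worth that yearly difference to you.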

Ah - the "why is the sky blue" question by dbIII · 2015-07-18 23:26 · Score: 1

Seems like mainly a way of avoiding the real question.

A good analogy of what is going on here is if the question was "why is the sky blue" and you have answered "dust" while the other has mentioned rayleigh scattering and a variety of other factors.

While "get some servers" is correct it's not exactly a useful answer is it? The above poster is right, for some stuff you want speed and for others you want as many cores as you can afford and don't give a shit about the speed, and without knowing what the submitter's software runs best on the choice is not clear.

Re:Ah - the "why is the sky blue" question by Tough+Love · 2015-07-19 07:12 · Score: 1

A good analogy of what is going on here is if the question was "why is the sky blue" and you have answered "dust" while the other has mentioned rayleigh scattering and a variety of other factors.
Your analogy fell down and can't get up. The OP asked about cost effective hardware for fluid dynamics. You wrote some poetry that doesn't even rhyme.

--
When all you have is a hammer, every problem starts to look like a thumb.
Re:Ah - the "why is the sky blue" question by dbIII · 2015-07-19 21:52 · Score: 1

With respect, the analogy was a polite way to point out that your "get some servers" was missing the point of the question entirely, which was about what type of "servers", and without knowing what sort of methods are being used to solve the fluid dynamics problems then it's hard to work out what sort of machines - eg. a few fast cores, lots of machines with fast cores or lots of slow cores in machines with a lot of memory that it can get to a lot more quickly than lots of machines with fast cores. Simple enough yet?
As an aside, the one and only time I ever saw an analog computer in use was to run a fluid dynamics simulation of the experimental rig next to it.
Re:Ah - the "why is the sky blue" question by Tough+Love · 2015-07-20 05:19 · Score: 1

..the analogy was a polite way to point out that your "get some servers" was missing the point of the question entirely, which was about what type of "servers"...
You don't need to tell anybody what the question was, it was plainly stated, e.g., "Is it even reasonable to order $50k worth of components and put together our own high-performance, reasonably-priced blade cluster?"
Maybe try answering that instead of twisting more. Not sure why you're putting in so much energy trying to find reasons to be irrelevant. For example: "there's no such thing as a commodity blade."

--
When all you have is a hammer, every problem starts to look like a thumb.
Re:Ah - the "why is the sky blue" question by dbIII · 2015-07-20 12:20 · Score: 1

Perhaps you should try reading the part of my post starting with "without knowing what sort of methods are being used to solve the fluid dynamics problems" and you'll get some idea about what everyone else is discussing and why "just use a computer" is not an answer of any value.
Re:Ah - the "why is the sky blue" question by dbIII · 2015-07-20 12:22 · Score: 1

For example: "there's no such thing as a commodity blade."
Who are you quoting and what does it have to do with my post or even the topic?

Re:I advise against it by KGIII · 2015-07-18 23:53 · Score: 1

...leave this HPC shit to people that know how to cover there ass.

You know, this HPC stuff really does not require the gay. That is optional.

--
"So long and thanks for all the fish."

It is getting impossible to read /. by Anonymous Coward · 2015-07-19 00:13 · Score: 0

The page keeps jumping back to the top of the thread.

Might be that some sort of auto page refresh is brain dead.

Did it 3^H4 times while typing this.

here is what I do by Anonymous Coward · 2015-07-19 00:23 · Score: 0

I have quad socket (supermicro/tyan) motherboards with 16 core opterons on them (64 cores total) and 128 - 256 GB of ram.
If you just need to run within one node (fast networking not needed), this is probably the most cost effective way to do it. AMD's are much cheaper than intel. My openmp code (which is not far from CFD) scales fairly well to all 64 cores. AMD is reportedly also coming up with new CPUs this year or early next year. For good openmp performance, I found that the following will give the best performance with gcc:
OMP_SCHEDULE auto
OMP_PROC_BIND TRUE
OMP_WAIT_POLICY ACTIVE
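
If you launch runs from a script, one way to apply those three settings is sketched below. The solver path is a placeholder and `openmp_env`/`run_solver` are names made up for this example. Keep in mind that OMP_SCHEDULE only affects loops declared with schedule(runtime), so whether it helps depends on how the code was written.

```python
import os
import subprocess

def openmp_env(base=None):
    """Copy of the environment with the three OpenMP settings listed above."""
    env = dict(os.environ if base is None else base)
    env.update({
        "OMP_SCHEDULE": "auto",       # only used by schedule(runtime) loops
        "OMP_PROC_BIND": "TRUE",      # pin threads so they do not migrate
        "OMP_WAIT_POLICY": "ACTIVE",  # idle threads spin instead of sleeping
    })
    return env

def run_solver(cmd):
    """Run an OpenMP binary with the tuned environment; cmd is a list of args."""
    return subprocess.run(cmd, env=openmp_env(), check=True)

if __name__ == "__main__":
    # "./my_solver" would be a placeholder for your own binary, e.g.
    # run_solver(["./my_solver", "case.inp"])
    for key in ("OMP_SCHEDULE", "OMP_PROC_BIND", "OMP_WAIT_POLICY"):
        print(key, openmp_env()[key])
```

Benchmark with and without each setting on your own code; what is best on a 64 core Opteron box is not guaranteed to be best elsewhere.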

Tyan boards are fairly good but their customer support sucks! I didn't have any issues with supermicro, so I don't know anything about their support (probably equally bad).

But, as others have said, will depend somewhat on your application.

What about the interconnect? by Terje+Mathisen · 2015-07-19 02:10 · Score: 1

In pretty much every HPC cluster I've seen or been personally involved with (mostly oil/seismic processing or crash simulations), the type of CPU is only one of the cost drivers!

Typically you end up spending about as much on fast interconnects as you do on motherboards/cpus/ram etc. The main exception to this rule is when you have an embarrassingly parallelizable workload, with small memory footprint and no need for cross-system communication, i.e. like a Monte Carlo simulation or password cracking.

For oil we used the largest single-image NUMA/SMP machine we could get at the time, this machine did the initial gridding of the problem space, then a standard cluster of 1K dual-cpu motherboards (i.e. 2K cpus) took over and did the main part of the actual processing.

There are exceptions though, like if you are doing Linear Programming type optimization which can be really hard to parallelize, or if you are using very expensive SW:

When you pay more for the SW than for the HW it is running on, then it makes sense to use bleeding-edge (gamer type) cpus.

Terje

--
"almost all programming can be viewed as an exercise in caching"

Just went through same thing... by Anonymous Coward · 2015-07-19 02:13 · Score: 0

Just went through the same thing for a cluster used for FEA (mechanical explicit code, around 40,000 euros).
I did not find a lot of infos either.
In the past, there were a lot of results on topcrunch.org (enough to select something in a few minutes) but it is now very limited for recent hardware.
I found some posts left and right (e.g. google Comparing Haswell Processor Models for HPC Applications from Dell) but nothing comprehensive.

The software vendor could not give me straightforward guidelines.
All hardware vendors I talked to pushed the fastest 2697v3 on 8 cores (on which they probably make great margin) on the basis of the higher frequency and memory bandwidth.

Sure. On paper it's great but for the budget, you can't buy much of it and they had no benchmarks to really quantify the benefits.
If the application scales well across nodes and licensing costs are reasonable, getting more and cheaper processors can make more sense (it depends on who pays for the electricity and AC). In the end, one vendor ran some benchmarks and I went with more 2640v3 because our licenses are virtually free (academics).
But if we had to pay for commercial licenses, I would have gone with the 2697v3.

So I don't think you will get a general answer as this really depends on your situation...
Try to get some benchmarks for similar applications and play with the numbers (licensing, electricity, performance).

Just one comment on assembling yourself. If you factor the time to search how to do it, install, configure, check performance, configure the queue etc, it can be quite expensive if you don't have help from someone doing it on a regular basis. I did one install in the past and I regret it.
I think it is more cost effective to pay someone that does it regularly and get some service for the cluster (just my 2 cents).
Look for companies selling ready-to-use clusters; they will be able to advise you.

Good luck

Hello, OP here. by Anonymous Coward · 2015-07-19 03:10 · Score: 0

Thank you all for your valuable input! *)

I'm actually a mechanical design engineer usually tasked with doing FE analyses, using various different suites such as ANSYS. Not a system administrator.
I currently work for a small (10 ppl) company that plans a fairly high-end product, and we find that our current workstations are starting to become too weak for our needs. We operate at a small town at the edge of civilization, in Europe.

So it is not just a simple matter of lifting the phone and asking some expert to drop by and tell us what to do. It would surprise me if there are more than a dozen individuals in a 100 km radius that know any of this stuff, and all of them are likely employed by industries and not available for us. Thus, I ask whether it is sane to put together an inexpensive rack of our own, and where to start in such case.

Most of you advise that it depends on the software. Thing is, we have not decided on one single software package to run yet; we will likely use a few different ones. None of which are in the slightest CUDA/GPGPU-friendly, I'm afraid. And I don't really expect to get much usable information calling up LSTC and asking what to run their software on. I expect to get the regular answer "You preferably use a vastly overpriced and underequipped but certified rig from this VERY short list of vendors that we did a deal with about this very type of thing." At least that is the usual case when you speak of CAD workstations.

It does sound interesting turning to the cloud, it may not be allowed though since our clients may have strong opinions on the IP security aspect. I'll look into it.

Some of you posted links to various companies that offer this type of products. Even though they in particular are not available on our market, that is valuable information because it shows the approximate configurations and cost levels we can expect.

If we roll out this investment, we will certainly need a skilled administrator to coddle the rig. Rest assured that we do not expect my humble self to be able to set up this system. I'm mainly perusing the options available.

Friendly, /Some Guy

* - Except the gay haters. Take a long walk off a short pier.

Re: Hello, OP here. by Anonymous Coward · 2015-07-19 07:51 · Score: 0

Hi OP - even in the sparsest areas of Europe, you can probably get something from a national tier 2, or perhaps one that serves more of Europe like Clustervision. I don't know all the vendors serving every country, but maybe you could call up your nearest university HPC group (often centralized service) or their CFD group and ask for recommended suppliers - they might be flattered to be asked. People like ANSYS would also be able to advise - it is in their interests to have good well installed systems running their software, it cuts down the support tickets.

Spend money on optimization instead by Anonymous Coward · 2015-07-19 03:29 · Score: 0

I work as a performance consultant and in _all_ cases I worked on, hardware costs could be reduced by 50%-95% after software or process optimization. Before you spend tens of thousands of dollars on hardware, hire someone to review your hardware, software, setup and process. It's a trivial example but a real one: I had a client who was going to buy a cluster of four web servers and two MySQL servers (in master/slave setup), but it turned out that I/O was the bottleneck and all he needed was a $200 SSD drive. Also this: http://www.commitstrip.com/en/2015/07/08/true-story-fixing-a-self-ddos/

The machines may need to be heterogeneous by Marrow · 2015-07-19 04:41 · Score: 1

You may need a (screamin) front-end machine that splits up the work and hands it off to multiple multi-core machines. These multi-core machines may only be available at lower clock-speeds.

Don't just "look at your application"; look at which parts of your application are subject to parallelism and which parts must stay single threaded. You may need a special single-thread machine that can keep the other ones fed.
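
The usual back-of-envelope for this is Amdahl's law (my addition, not something the vendors will quote you). A minimal sketch, where the 95% parallel fraction is purely an assumed example value:

```python
def amdahl_speedup(parallel_fraction, cores):
    """Ideal speedup when only part of the run time can be parallelised."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)

# Assumed example: 95% of the run time parallelises perfectly.
for cores in (8, 64, 512):
    print(cores, "cores ->", round(amdahl_speedup(0.95, cores), 2), "x")
```

Even with 95% parallel code the speedup can never exceed 20x no matter how many cores you buy, which is why a fast machine for the serial part can matter more than another rack of slow cores.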

ClusterVision by Anonymous Coward · 2015-07-19 08:21 · Score: 0

I work for this company:
http://clustervision.com/
and we do custom designs of HPC clusters based on your needs.
Feel free to contact us!

Here is what I did... by Taz1672 · 2015-07-19 14:06 · Score: 2

My company decided it wanted a new FEA machine. They decided to stay with the existing software company, so I called up the company and explained the situation and asked for the department that provided pre-sales support, specifically hardware recommendations. Turns out they had a strong bench of people ready to help with that and detailed Known Good configurations for each major hardware company. We simply looked at the software licensing costs, the hardware costs and how long our average scenario would take under various software/hardware configs and sized it to handle the existing number of jobs plus expected growth.

We decided what we could live with in terms of how long an average job would take (we decided we could live with 24 hours as an average). We then decided what sort of tradeoffs we could make in terms of hardware (an up front sunk cost amortized over many years) versus the annual software license fees. A little more spent on hardware up front meant we could save on software licensing costs by taking a step down in numbers of processors permitted. We then presented this decision to their presales people to get it vetted and asked for suggestions. In our case we took the suggestion that we archive saved results to our enterprise grade disk array and put the money into Raid 0 SSD's to speed up the overall job time. As always, RAM was the cheapest upgrade so we maxed that out.

Everyone signed off, we took the specs to a local system builder with a good reputation, told them no changes of ANY sort to any component, negotiated a price that included acceptance testing to ensure compliance and made sure they had enough of a profit margin so as to discourage shortcuts. They delivered, we tested, it gave expected performance results, we accepted it and paid them. That system is installed today and delivers the results it was designed for.

I would suggest a similar course for you.
1. Decide on the software first. Make darn sure it will do what you want.
2. Decide on how fast jobs need to be finished and how many per week/month/year to prevent over-speccing
3. Call presales support to get hardware recommendations
4. Make the decision on hardware cost versus software licensing cost versus number of jobs to be done
5. Do your homework! Understand what you are speccing, talk to others, particularly customers.
6. Take your new found knowledge back to presales a few times to make sure you did not miss any improvements and you truly understand what you are doing, you are betting your job on this.
7. Find a builder, local or the hardware manufacturer, negotiate terms. Make sure you leave a decent profit margin to avoid the temptation to skimp.
8. Test, test, test. Confirm all configuration decisions with presales support.
9. Pay 'em and install the machine.
10. Don't forget to follow up to ensure it continues to work as designed and that procedures are being followed. In my case, we checked that all runs were backed up on our enterprise disk array.
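
Step 2 can be turned into a quick capacity check like the sketch below. It assumes each machine runs one job at a time and that you aim to keep machines about 80% busy; both are assumptions for illustration, as are the job counts.

```python
import math

def machines_needed(jobs_per_week, hours_per_job, growth=0.0, target_utilisation=0.8):
    """Machines required if each machine runs one job at a time."""
    demand_hours = jobs_per_week * (1 + growth) * hours_per_job
    usable_hours_per_machine = 7 * 24 * target_utilisation
    return math.ceil(demand_hours / usable_hours_per_machine)

# Hypothetical workloads: 24 hour average job, 25% expected growth.
print(machines_needed(jobs_per_week=4, hours_per_job=24, growth=0.25))
print(machines_needed(jobs_per_week=10, hours_per_job=24, growth=0.25))
```

If the answer jumps from one machine to three when the job count goes up, that is the moment to revisit the hardware vs licence tradeoff in step 4 rather than just buying more boxes.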

Re:Here is what I did... by Taz1672 · 2015-07-19 14:13 · Score: 1

Forgot to mention that if you use the software vendor's presales support team and one of their Known Good configurations, support will be a LOT easier to get from them.

nVidia / Phi by Anonymous Coward · 2015-07-19 17:24 · Score: 0

If your code is easily parallelized, you might want to consider GPUs. Certain types of apps (Black-Scholes) perform several orders of magnitude faster on a GPU than on a CPU. And the dang things are CHEAP in terms of BIPS/$.
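
As a rough illustration of what "easily parallelized" means here, below is a plain-Python sketch of the Black-Scholes price of a European call. It is not GPU code and makes no performance claim; the function names and the sample contracts are placeholders. The point is only that each option is priced from its own inputs with no dependence on any other option, which is the property that lets this kind of workload spread across many GPU threads.

```python
import math


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black_scholes_call(spot, strike, rate, vol, years):
    """Black-Scholes price of a European call option (no dividends)."""
    sqrt_t = math.sqrt(years)
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol * vol) * years) / (vol * sqrt_t)
    d2 = d1 - vol * sqrt_t
    return spot * norm_cdf(d1) - strike * math.exp(-rate * years) * norm_cdf(d2)


if __name__ == "__main__":
    # Hypothetical contracts: (spot, strike, rate, volatility, years to expiry).
    contracts = [
        (100.0, 100.0, 0.05, 0.20, 1.0),
        (100.0, 110.0, 0.05, 0.25, 0.5),
        (50.0, 45.0, 0.03, 0.30, 2.0),
    ]
    # Each price depends only on its own row, so the loop body could run
    # for every contract at the same time.
    prices = [black_scholes_call(*c) for c in contracts]
    for contract, price in zip(contracts, prices):
        print(contract, round(price, 4))
```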

Depends on work load by Anonymous Coward · 2015-07-19 23:12 · Score: 0

If your workload is bounded by memory bandwidth, having more cores is not going to help.
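
One common way to reason about this is the roofline model, which is brought in here as an illustration and is not something the comment above names. Attainable throughput is taken as the smaller of peak compute and memory bandwidth times the code's arithmetic intensity. The per-core rate, bandwidth, and intensity below are made-up numbers, and the sketch assumes bandwidth does not grow with core count.

```python
def attainable_gflops(peak_gflops, mem_bandwidth_gbs, flops_per_byte):
    """Roofline bound: the lower of the compute ceiling and the memory ceiling."""
    return min(peak_gflops, mem_bandwidth_gbs * flops_per_byte)


# Hypothetical machine and code characteristics, for illustration only.
PER_CORE_GFLOPS = 20.0     # peak compute per core
MEM_BANDWIDTH_GBS = 100.0  # memory bandwidth shared by all cores
FLOPS_PER_BYTE = 0.25      # arithmetic intensity of the code

if __name__ == "__main__":
    for cores in (1, 2, 8, 16):
        peak = cores * PER_CORE_GFLOPS
        bound = attainable_gflops(peak, MEM_BANDWIDTH_GBS, FLOPS_PER_BYTE)
        print(f"{cores:2d} cores: peak {peak:6.1f}, attainable {bound:5.1f} GFLOP/s")
```

In this toy setup the memory ceiling is 25 GFLOP/s, so going from 2 to 16 cores raises the peak but leaves the attainable figure unchanged.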

I'm selling my own church guys, I work for Ciara. by Anonymous Coward · 2015-07-20 07:09 · Score: 0

hxxp://ciaratech.com/business-unit-1-en.html

We used to build for Cray, we have a bunch of models. Take a look on the site.
Might also want to speak with our sales rep, we custom build for customers.

Good luck, I hope we can help. Check us out.

Microway have been doing this a long time by Tesseractic · 2015-07-21 14:13 · Score: 1

You could do a lot worse than to buy one of these:

http://www.microway.com/produc...

Microway have been supplying high quality, high performance systems for decades and
they should have figured out how to do it right by now. If I had the spare money, it's what
I would choose.

For your CFD software, you might consider the open source OpenFOAM system.

Be careful with the memory subsystem to select DIMMs that will run at the maximum
rate of the system. Quad-rank DIMMs typically run slower. This may mean you can't
use the full 1TB of DRAM that this system can address.

My Experience by psgodwin · 2015-07-21 15:34 · Score: 1

I was once tasked with the same scenario when I was an Engineering IT Manager for an aerospace startup. I would first ask for the definition of "Bang for the Buck". Are we referring to max TFLOPS / hardware cost, HPC utilization / total cost of ownership, value to the business (time to market, minimizing prototyping and tooling costs, material optimization) / total cost of ownership, or something else entirely? The best bang for the buck is usually to use a managed cloud HPC provider, but based on your post it sounds like you really want to build and maintain your own HPC. Given the ITAR nature of our business and the lack of ITAR cloud vendors, we had to build ours. Below was our process; feel free to modify it as needed.

As someone previously stated, the first key consideration is the workload. Explicit vs implicit solutions and different FEA and CFD solutions scale very differently, and most will plateau on certain interconnects. There is a very large difference in architecture between running NASA Fun3D, Ansys CFD, LSDyna and Siemens Nastran, and the architecture will change depending on the requirement(s) and performance goal.

Once you know the target solutions, next consider the interconnect requirement. Some codes do not scale well across multiple nodes, eliminating the need, but some codes demand extremely low latency (e.g. InfiniBand). Those codes are completely inefficient without microsecond RDMA capability.

Next, evaluate GPU compute compatibility. Be careful, as some vendors have only partially implemented GPU compute and it is only used under certain circumstances.

When evaluating CPU choice, we always used performance/watt as the benchmark. Check spec.org for normalized performance comparisons and divide by TDP. Memory per system is a function of solution memory size * number of concurrent jobs needed / number of nodes * ~1.25.

Supermicro is by far the cheapest solution if you are going to integrate yourself. Linux is the standard operating system, with most RHEL derivatives supported by most software vendors.

Don't forget to consider storage. Most HPC systems generate many TBs of information and, during a job, need high-bandwidth storage access, usually shared. Phase 1 for us was to build a storage server with 10TB of SSDs and share the volume to all nodes with NFS. We dreamed of DDN but could not afford it.

Connect everything together, then determine the MPI stack and any interconnect RDMA requirements (OFED, etc.). Install the OS, configure a workload manager such as SLURM, write your submission scripts and start testing. Plan for a very long testing cycle if you integrate yourself. Experts are hard to find and can be expensive. Good luck, and I hope you have a team of rock star Linux gurus.
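
The two sizing rules of thumb in this comment (benchmark score divided by TDP, and solution memory size * concurrent jobs / nodes * ~1.25) can be written down as a short sketch. The function names, CPU labels, scores, and job sizes are hypothetical placeholders; only the two formulas come from the comment, and reading the 1.25 factor as headroom is an assumption.

```python
def perf_per_watt(benchmark_score, tdp_watts):
    """Normalized benchmark score divided by the CPU's TDP."""
    return benchmark_score / tdp_watts


def memory_per_node_gb(solution_mem_gb, concurrent_jobs, nodes, headroom=1.25):
    """Solution memory size * concurrent jobs / nodes, with ~25% headroom."""
    return solution_mem_gb * concurrent_jobs / nodes * headroom


if __name__ == "__main__":
    # Hypothetical candidate CPUs: (label, benchmark score, TDP in watts).
    cpus = [("cpu_a", 1200.0, 150.0), ("cpu_b", 1000.0, 105.0)]
    best = max(cpus, key=lambda c: perf_per_watt(c[1], c[2]))
    print("best performance/watt:", best[0])

    # Hypothetical workload: 200 GB per solution, 4 concurrent jobs, 8 nodes.
    print("memory per node (GB):", memory_per_node_gb(200.0, 4, 8))
```

In practice the result would be rounded up to the next DIMM configuration the chosen board actually supports.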
