How Far Can Large Commercial Applications Scale?

It's the network! by Boone^ · 2006-04-18 12:55 · Score: 1

Unfortunately, once you use up the "local" processors you're forced to branch off to other attached machines. If the app is embarrassingly parallel, you only need 2 tin cans and a string. However, if the application relies on low latency, high bandwidth connections between processors, you're going to get greatly diminished returns by using clusters connected by 10G Ethernet or Myrinet.

Re:It's the network! by multimediavt · 2006-04-18 16:21 · Score: 3, Informative

I'm gonna go ahead and disagree with you there. The network alone is not to blame. Also, keep in mind that the latencies of most 10GigE implementations and Myrinet are radically different, especially once you get above the hardware and protocol levels. They are getting better, Force10's new 10GigE switches being good examples, but they're not that close when you put something like MPI and then a poorly implemented (algorithm-wise) application on top of that. Another thing to keep in mind is that there are other interconnect technologies like Infiniband and Quadrics that may give you better performance.

The real scaling issues (in a lot of cases) are within the application itself. Some applications scale really well. I'll use scientific codes as examples. For instance, we've gotten LAMMPS (a molecular dynamics code) to scale very well across our 1024 node, 2048 processor cluster. It is capable of using the entire system to process jobs; all 2048 processors with an Infiniband interconnect and MVAPICH. However, applications like AMBER, another molecular dynamics code, don't scale at all well beyond 256 processors on our system. It's not a fault of the hardware, the network, or the message passing interface in a lot of cases. It's simply that the algorithm used in the code just doesn't scale well beyond a certain point. The code just isn't optimized well, or it just won't scale, period. There are other code bases that are being used by our researchers that do well in an SMP, shared-memory architecture, but simply won't run at all in a distributed memory, cluster architecture. Some because they require a large memory footprint, others simply because the problem the code needs to solve cannot be decomposed and spread across nodes in a cluster. As far as performance goes, we've actually seen some codes, like the quadrature code (APREC) run by David Bailey of LBL, actually achieve super-linear gains. He ran a series of jobs in his quest to do the largest one-dimensional quadrature calculation (which he achieved and published at SC04) starting with one processor and scaling to 512 nodes (1024 processors). At the 16, 64, and 256 processor range, his code actually got 17.66, 69.79, and 270.17 times speed up over a single processor, respectively. Now this is not typical behavior. Typically, you don't get this kind of speed up (usually you do see significantly lower efficiency; in the range of 15 to 20 percent in a lot of cases), and his code did fall off to 919.22 times speed up for 1024 processors.
My point is, the application itself has as much impact on performance as the architecture it is being run on. And, don't forget compiler differences, but this could go on for days.
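
To put the speedup figures quoted above in perspective, here is a small sketch that turns them into parallel efficiency (speedup divided by processor count). The numbers are the ones from the post; the function and variable names are just illustrative.

```python
# Speedup figures quoted in the post (processors -> speedup over one processor).
speedups = {16: 17.66, 64: 69.79, 256: 270.17, 1024: 919.22}

def parallel_efficiency(procs, speedup):
    """Efficiency = speedup / processor count; above 1.0 means super-linear."""
    return speedup / procs

efficiencies = {p: parallel_efficiency(p, s) for p, s in speedups.items()}

for procs, eff in efficiencies.items():
    label = "super-linear" if eff > 1.0 else "sub-linear"
    print(f"{procs:5d} processors: efficiency {eff:.3f} ({label})")
```

Anything above 1.0 is the super-linear region; the 1024-processor run is where the efficiency drops back below 1.0.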

I would strongly urge the original poster to talk to the vendors that develop the software you use and simply ask them if the reason they don't make a cluster version of the software is due to economic reasons, or simply because the application just won't work in that architecture. Remember, computing is a right-tool-for-the-right-job arena. There's no single platform that will do everything for everybody.

It all depends on the applications... by georgewilliamherbert · 2006-04-18 12:57 · Score: 3, Insightful

I've run Oracle on 32 processor Sun E10Ks with reasonably linear speedup from few-processor performance, back in the Solaris 7 days.

I've run (now obsolete) ATG Dynamo on the same, with similar results.

I've run Apache (1.3.x) on the same, with similar results.

I've seen applications which stopped scaling well at much less than that.

"Large business applications" isn't specific enough.

Enterprise by Procyon101 · 2006-04-18 13:00 · Score: 4, Funny

It depends on how much enterprise you have in them. Enterprise is expensive, but when added liberally you can scale to huge amounts.

I like to add a couple hundred enterprise myself.

Re:Enterprise by MBCook · 2006-04-18 13:18 · Score: 2, Funny

Luckily there are lots of examples of Enterprise quality out there. The Daily WTF has lots of great stuff. Here are two recent examples.

--
Comment forecast: Bits of genius surrounded by a sea of mediocrity.

scale by hashing by pikine · 2006-04-18 13:00 · Score: 2, Insightful

I'm still studying computer science with little practical experience, but you can divide certain aspects of your application by hashing---you hash datasets or queries. This distributes the workload across a cluster of computers. However, implementing hashing requires you to make intrusive changes to your code, and maybe most companies aren't willing to do so. Hashing generally has to be implemented from the very beginning, which requires foresight. Google is the one company that does it well.
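
A minimal sketch of what hashing a dataset across a cluster might look like, assuming a fixed number of back-end nodes and string keys. The shard count, the `user...` keys and the `shard_for` helper are all hypothetical; this is not how any particular company does it.

```python
import hashlib

def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index using a stable hash.

    Python's built-in hash() is salted per process, so a cryptographic
    digest is used here purely because it is stable across machines."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

# Hypothetical user ids spread across 4 hypothetical back-end nodes.
NUM_SHARDS = 4
users = [f"user{i}" for i in range(1000)]
buckets = {s: [] for s in range(NUM_SHARDS)}
for u in users:
    buckets[shard_for(u, NUM_SHARDS)].append(u)

for shard, members in buckets.items():
    print(f"shard {shard}: {len(members)} users")
```

Every front end that needs a user's data computes the same shard index, so no central lookup is required. The catch is that changing `NUM_SHARDS` later remaps most keys, which is one reason this is hard to retrofit.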

--
I once had a signature.

Re:scale by hashing by Anonymous Coward · 2006-04-18 13:09 · Score: 1, Funny

Keep studying.
Re:scale by hashing by HotNeedleOfInquiry · 2006-04-18 13:36 · Score: 1

Thanks for the coffee spew.

--
"Eve of Destruction", it's not just for old hippies anymore...
Re:scale by hashing by dgatwood · 2006-04-18 13:50 · Score: 3, Informative

There are many ways to divide up a set of queries. It all depends on what the application is, how much data sharing is needed, etc.
One way to divide the data is per-user or per-group. Divide data according to its owner so that each user account is hosted on a given machine and has first-class access to his/her own data and his/her group's data, but second-class (network-based) access to everyone else's data.
Another way, as you mention, is to do hashing based on some well-defined key, but for this to be useful requires that the front end be thoroughly abstracted from the back end so that multiple front ends share multiple back end stores. Otherwise, you are probably just moving the bottleneck around. It also requires that this key be known in advance, which means that it doesn't generally work well if, for example, you need to do a join on two tables and one of those tables is scattered across multiple machines. The only way that it would work for such use would be if either the key being used for the join is the hashed key or if each machine has a table index that spans multiple machines' content, at which point, you are going to have cache coherency problems.
Which brings us to a fairly nice compromise solution: a replicated database with each of the outer-ring database servers being read-only caches with some sort of built-in cache consistency protocol, and the central database accepting write queries from clients, but with all the read queries directed to the outer ring. Makes for seriously scalable database access.
This, of course, assumes that the app in question is a front-end for a database. If you're doing some other sort of application, then all bets are off. Give us more information.
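
A toy, in-memory sketch of the replicated layout described above: writes go to one central store, reads rotate across read-only caches. The `ReplicatedStore` class is hypothetical, and the synchronous push to every cache is a simplifying assumption; real database products use their own replication and consistency protocols.

```python
import itertools

class ReplicatedStore:
    """Toy model: one central writable store, N read-only caches in the outer ring."""

    def __init__(self, num_replicas: int):
        self.primary = {}
        self.replicas = [dict() for _ in range(num_replicas)]
        self._next_replica = itertools.cycle(range(num_replicas))
        self.reads_per_replica = [0] * num_replicas

    def write(self, key, value):
        # All writes go to the central database...
        self.primary[key] = value
        # ...and are pushed to every cache (assumed synchronous for simplicity).
        for replica in self.replicas:
            replica[key] = value

    def read(self, key):
        # Reads never touch the primary; they rotate across the outer ring.
        idx = next(self._next_replica)
        self.reads_per_replica[idx] += 1
        return self.replicas[idx].get(key)

store = ReplicatedStore(num_replicas=3)
store.write("order:42", "shipped")
results = [store.read("order:42") for _ in range(6)]
print(results, store.reads_per_replica)
```

Read capacity grows with the number of caches, while write capacity stays bounded by the single central store, so this shape fits read-heavy workloads best.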

--
Check out my sci-fi/humor trilogy at PatriotsBooks.
Re:scale by hashing by theonetruekeebler · 2006-04-19 02:13 · Score: 1

Hashing is a good way to split data up, but now that it's spread across N nodes, you will have a hell of a time joining it back up again. Your reporting queries are now a nightmare of joins and unions.
Say you have a bunch of products, each sold by a different department of your business. So split your data based on a hash on the department, to keep all of a department's information together. Later, a business decision consolidates two departments, or splits a department, or moves a whole slew of products from one department to another. You now have to address each of these issues.
Not to mention keeping certain tables synchronized across all nodes, such as application-wide lookups.
And also not to mention that you now have N, rather than 1, tasks to perform with each update to the system or the application. Scripting will help this, but if one of your instances gets out of sync, or you succumb to the temptation to customize this one node...
It all gets very ugly very quickly. A number of database products support both clustering and replication. Much of this happens behind the scenes: You set it up when you design the schemas, and the application is none the wiser.

--
This is not my sandwich.

It may seem offtopic.... by riprjak · 2006-04-18 13:05 · Score: 2, Interesting

...but have you considered trying to contact the EVE-Online developers at CCP.

Their game is little more than a MASSIVE database application supporting tens of thousands of simultaneous users... They have lag issues but, on the whole, seem to be scaling bloody well.

Re:It may seem offtopic.... by Anonymous Coward · 2006-04-18 13:09 · Score: 2, Funny

Or better yet, call Blizzard and ask for tips about scalability and reliability. Then do the opposite.
Re:It may seem offtopic.... by Anonymous Coward · 2006-04-18 17:13 · Score: 0

I don't know if that works.... If you have to get 20 things right for it not to suck, but missing any one makes it really suck, it could be that Blizzard has as many as 19 things right. So if you do the opposite, you could be doing as little as 1 thing right.

Bring me that drawing board again.
Re:It may seem offtopic.... by AtlanticGiraffe · 2006-04-18 22:02 · Score: 1

They're running something like 20-40 dual-cpu windows boxes, one thread per cpu, scheduled using "tasklets" in Stackless Python. Additionally, they have a central database machine with a huge external hard drive enclosure. It doesn't contain hard drives though, it contains RAM.
Re:It may seem offtopic.... by Jaruzel · 2006-04-18 23:25 · Score: 1

Wow. Do you have a link to back that info up ? As an indie MMORPG developer, I'm quite interested in stuff like this.

Cheers,

-Jar.

--
Together, We Can Make Slashdot Better. I Do NOT Mod ACs. - Check Me Out
Re:It may seem offtopic.... by ahodgson · 2006-04-19 10:08 · Score: 1

I'm pretty sure the Eve guys bought one of these: http://www.superssd.com/products/ramsan-400/
Re:It may seem offtopic.... by mute47 · 2006-04-19 10:43 · Score: 1

Solid state disks and other servers

--
Don't mind me, I'm just carping the diem...

yes by larry+bagina · 2006-04-18 13:14 · Score: 2, Interesting

I did some freelance work a few years back for a client. They were converting some custom in-house applications from a 64 processor Cray Superserver 6400 to a cluster-based approach. I can't comment on what they were doing, but they needed all the RAM and cycles they could get ahold of.

Anyhow, they started out on a 4-way machine and had scaled up to the 64-way without many code changes. If it had been cost effective, they would have kept on scaling upwards.

--
Do you even lift?

These aren't the 'roids you're looking for.

Answers: by Anonymous Coward · 2006-04-18 13:17 · Score: 0

Answers:

Yes. 16 processors, 48 gigs of ram. Scales well.

Are there any... real questions? / burns

Vague question... Vague answers by subreality · 2006-04-18 13:21 · Score: 4, Insightful

Different problems in computer science scale differently. You haven't given us enough data to really know what problem you're solving, so you're really not going to get a reasonable answer.

I work for a company that has a large commercial application. We knew we needed to scale our data set and processing power to be huge, so we made sure from the start that the heavy lifting could be divided into little chunks, and thrown to the cluster. For our purposes, back end scalability is basically linear. When we need more, we just bring another rack of little 1U critters online. There are a few theoretical bottlenecks, but we'll never see them before we have our own nuclear power plant to run the data centers.

For other applications we use, there is *no* scalability. The algorithm has to be single threaded. It doesn't matter if I run it on a cluster, or a machine bristling with CPUs. So we basically buy the data center equivalent of a gaming PC: The fastest processor and memory that fits our budget.

So there are the ends of the spectrum. Your scalability will be somewhere between zero and infinity, depending on the problem at hand.

Re:Vague question... Vague answers by Mr+Z · 2006-04-18 17:27 · Score: 2, Informative

Some problems are like the "baby" problem. It takes nine months to make a baby, no matter how many couples are assigned to the problem. BUT, if the task is to make 1000 babies, you can still do that in 9 months—if you can find 1000 couples. But, if you only need one, you're stuck. It's a parallelism granularity problem.

Other times you get stymied by serial bottlenecks in an application. Sometimes you can gain fractional benefit from additional compute resources by allowing various CPUs in the cluster to compute redundant results in lieu of waiting for intermediate results from prior computation. For example, one problem I was working on recently had this problem. It was an optimization problem that built up "new answers" from "previous answers" in an attempt to find the shortest sequence of operations to meet some criterion. The kernel operation iterated over pairs of previous results, combined them, and determined if the new result was unique. (There are more details. I'll keep this brief.)

At its heart, the algorithm was effectively a breadth-first-search shortest path algorithm, where the edges in the graph are described algorithmically, not discretely. The issue is, when doing such a search, where the exits from a particular state (node) aren't known explicitly a priori, you can't mark the discovered states as "visited" to cull the traversal through the state space without serializing everything. In fact, in this particular problem, I was converting the algorithmic description of the edge connections into an explicit description.

The trick to parallelizing here is to periodically divide your work queue of "nodes to visit" among your compute nodes, and then merge the results back, knowing that you will have many redundant "node visits." You can filter these out with some other structure. In this case, my total state bitmap was 512MB--easily held in one node--so the merge process looks like a "Hey, have I seen this? Nope? Pass it on." Even the merge can be performed hierarchically, to eliminate redundancies in stages. At each level of the hierarchy, you can subdivide the state space you're merging to gain parallelism that way.

So, sometimes there ARE ways to speed up serial computations, but at the expense of computing redundant intermediate results.
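
A rough sketch of the divide-and-merge idea, under heavy assumptions: a toy state space of 1000 integers with computed edges stands in for the real problem, threads stand in for cluster nodes, and the merge is a single serial pass rather than a hierarchy. All names here (`successors`, `expand_chunk`, `parallel_bfs`) are made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

MODULUS = 1000  # size of the toy state space (hypothetical)

def successors(state):
    """Edges are computed, not stored: a stand-in for an 'algorithmic' graph."""
    return [(state + 1) % MODULUS, (state * 2) % MODULUS]

def expand_chunk(chunk):
    """One worker expands its share of the frontier without consulting anyone.
    It may emit states another worker also emits; that redundancy is accepted."""
    out = []
    for state in chunk:
        out.extend(successors(state))
    return out

def parallel_bfs(start, num_workers=4):
    depth = {start: 0}   # doubles as the global "have I seen this?" filter
    frontier = [start]
    redundant = 0        # candidates thrown away during the merge
    level = 0
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        while frontier:
            level += 1
            # Divide the work queue among the workers.
            chunks = [frontier[i::num_workers] for i in range(num_workers)]
            next_frontier = []
            # Merge step (serial here): drop duplicates and already-seen states.
            for candidates in pool.map(expand_chunk, chunks):
                for state in candidates:
                    if state in depth:
                        redundant += 1
                    else:
                        depth[state] = level
                        next_frontier.append(state)
            frontier = next_frontier
    return depth, redundant

depth, redundant = parallel_bfs(0)
print(len(depth), "states reached;", redundant, "redundant visits filtered")
```

Because each level is merged before the next one starts, the recorded depths are still shortest-path lengths. The price is the `redundant` count: work done by one worker that the merge discards.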
--Joe

--
Program Intellivision!
Re:Vague question... Vague answers by Mr+Z · 2006-04-18 17:29 · Score: 1

In fact, in this particular problem, I was converting the algorithmic description of the edge connections into an explicit description.

Poorly worded. I meant: "In fact, the entire goal of this exercise was to convert the algorithmic description of the edge connections into an explicit form." (The end goal was to generate a database that expanded the compact-but-slow form into a fast-but-eats-my-hard-disk form.)
--Joe

--
Program Intellivision!

Much much farther than... by Anonymous Coward · 2006-04-18 13:33 · Score: 0

FOSS shit!

Posting Anon to save my ass by Anonymous Coward · 2006-04-18 13:40 · Score: 1, Interesting

I can certainly declare that applications scale nicely to at least 128 processors that I have personally seen. That is if the application is designed and implemented to allow that.

That is just the database server which handles approx 40,000+ user sessions at one time.

Of course in front of that you have your liberal sprinkling of app server and database proxy servers and whatnot, amounting to about 100 other separate systems.

As others have noted, you need lots of enterprise, which costs money.

The flip side is there is only one core environment to maintain, which reduces staff and administration requirements.*

Going to a single system environment eases the pains of disaster recovery, hot standby and failover by a large margin. But that one large environment (Server Chassis(s), CPUs, Memory, SAN, etc) will almost always cost more than many smaller systems of the same aggregate capacity. Meaning it costs a _lot_ more to have a spare hot failover in the datacenter and another system located in a disaster recovery site. Plus you will need to have a copy for Dev/Test, maybe two.

So you never end up with just one big system. Following the enterprise ruleset, *you'll need at least 5 if not more.

The moral of the story: Once you pass the 8-cpu mark, things get really, really, really expensive in a hurry. Your best bet is to find a way to load balance a large app across many smaller servers. If it is at all possible.

Must be made to scale by Akiba · 2006-04-18 13:59 · Score: 1

An application that was written in a "serial" way will not scale by throwing more CPUs at it after the first few. Those applications are better served by a very fast CPU rather than several CPUs. If you are trying to scale an application that much, the application itself must be built with scaling in mind to allow parallelism. In that case, how much you can scale it depends on how much parallelism exists in the nature of the problem you are solving. Typically you stop getting good speedup after adding lots of CPUs with a given input data set but if you increase the size of the problem (or number of users or whatnot) then you can keep scaling forever.
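
The two regimes described here are usually formalized as Amdahl's law (fixed problem size) and Gustafson's law (problem size grows with the machine). The post doesn't name them, so treat this as one common way to put numbers on the argument; the 95% parallel fraction is a made-up example value.

```python
def amdahl_speedup(parallel_fraction, procs):
    """Fixed problem size: the serial part caps the speedup at 1 / (1 - p)."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / procs)

def gustafson_speedup(parallel_fraction, procs):
    """Scaled problem size: the parallel part grows with the processor count.
    Here the fraction is measured on the scaled (parallel) run."""
    return (1.0 - parallel_fraction) + parallel_fraction * procs

P = 0.95  # hypothetical: 95% of the work can run in parallel
for n in (1, 8, 64, 1024):
    print(f"{n:5d} CPUs: fixed-size {amdahl_speedup(P, n):7.2f}x, "
          f"scaled-size {gustafson_speedup(P, n):8.2f}x")
```

With a fixed data set the speedup flattens out just below 20x no matter how many CPUs are added, while the scaled-problem figure keeps growing roughly in proportion to the CPU count.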

Re:Must be made to scale by Anonymous Coward · 2006-04-18 14:07 · Score: 0

Unless of course the "large application" is a webapp consisting of 50000 individual little cgi's that don't parallelize at all individually, but on a 486-66 take less than a second to complete, in which case a billion processors would make this thing support a billion users.

Now, the database behind it may not scale to a billion users, but the question's too poorly worded to tell just what the hell it's asking.

How big a machine can you buy? by loony · 2006-04-18 14:34 · Score: 1

I work in a department that creates provision software for one of the large telcos... As you said, the problem is usually application... OS, DB and such are usually no issue anymore but unfortunately "Enterprise" in the US seems to mean disorganized mess with completely incompetent management - management that would rather keep pointless dates that have no meaning in the real world than do things right.

We have a few well designed apps and there the answer is pretty much "How big a machine can you buy" - scalability isn't an issue. As I said before, Oracle, DB2 and Informix all scale well enough these days to span 64 or more cpus without issues _IF_ you have your table design right. OS tuning is fairly simple these days. DB tuning takes a little more skill but that's not too hard either. Getting the right hardware set up is next to impossible because of processes. So, no matter what direction you look, it all ends up on the development team's shoulders.

If you asked a little more specifically, I'm sure you would get better answers...

Peter.

Re:How big a machine can you buy? by Anonymous Coward · 2006-04-18 17:18 · Score: 0

I work in a department that creates provision software for one of the large telcos...unfortunately "Enterprise" in the US seems to mean disorganized mess with completely incompetent management...

Working in support for one of the large telcos I'll say if what you mean by "incompetent management" is that the vendor designing and building the application just doesn't get it, I'll agree.

As far as the original poster's question you have to look at the users. And I don't mean the back-end report-generating dickweeds who have lots of pull. I mean the majority (in sheer numbers) of users who'll be using the application.

If you can make the real users feel the application is quick and responsive to them (first) and reliable from their point of view (second) then you've succeeded. Anything else and either you've failed or your salesdroids oversold (<sarcasm>which, I can assure you, never happens</sarcasm>).

Very little to go by ... by kbahey · 2006-04-18 15:05 · Score: 3, Interesting

Your description is very little to go about suggesting solutions ...

You have to tell us many many specific things before we can suggest specific solutions. All we know is that the application runs on a 32 CPU system, and has 64 GB. This is all about the hardware. The application is a "large commercial application", and there is "contention within the application or the operating system". We do not even know what the hardware is, nor what operating system it is.

Anyways, here are some generic suggestions from past experience, most of it on UNIX systems, many with Oracle, and most with commercial non-web systems.

- Is the application CPU bound, memory bound, or I/O bound? If you do not know then you have to find out first, then attack the area of

- Is the application transactional in nature or batch? Is it an operational system, or a decision support type of application?

- Does the application use a database (probably does)? Is the database on the same box that runs the application? If so moving the database to a separate box with a fast connection (FDDI or Gigabit Ethernet) may help things.

- Does the application use queues or message passing? Do these queues fill up at certain peak hours causing slow downs?

- Can you benchmark/load test the application on a similar box? If you have transaction generation/injection tools, then you can simulate the real load and then run tools for profiling, performance and the like in real time (e.g. sar, vmstat, top, ....etc. if you are on a *NIX type of system).

Performance tuning is an iterative process that is more of an art than a science. Start with the 80/20 rule, and get the low hanging fruit (attack the easiest and most obvious area that would gain you some performance, then move to the next area, ...etc until you hit the diminishing returns areas).

--
2bits.com, Inc: Drupal, WordPress, and LAMP performance tuning.

My experience with Solaris/Oracle by brokeninside · 2006-04-18 15:10 · Score: 3, Interesting

One place I used to work had a system that scaled up to well over 20 Sun boxes each with 10 more CPUs. It all depends on having the design right. For example, if you have a batch job, you architect the job to follow a master/worker paradigm where a master process doles out chunks of work to worker processes that may or may not be running on the same machine (think SETI@Home). Not every job can be redesigned to do this, but it's a fairly easy way to do a large number of different tasks. Further, there's no reason that this design couldn't be used by Linux/PostgreSQL or some other Free Software stack rather than Solaris/Oracle. There are also other paradigms. Perhaps you should do a search on scholarly comp sci papers instead of asking /.. The problem of scaling is not exactly new. Quite a few papers have been written on various ways to solve the problem depending on what sort of computational tasks you have to accomplish.
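
A bare-bones sketch of the master/worker shape, assuming a single machine where threads and in-process queues stand in for worker processes on other boxes. The `worker` and `run_batch` names and the sum-of-squares "job" are placeholders, not anything from the original system.

```python
import queue
import threading

def worker(tasks, results):
    """Pull chunks until the master sends the None sentinel."""
    while True:
        chunk = tasks.get()
        if chunk is None:
            break
        results.put(sum(x * x for x in chunk))  # stand-in for real batch work

def run_batch(data, num_workers=4, chunk_size=100):
    tasks, results = queue.Queue(), queue.Queue()
    threads = [threading.Thread(target=worker, args=(tasks, results))
               for _ in range(num_workers)]
    for t in threads:
        t.start()
    # The master doles out chunks of work...
    num_chunks = 0
    for i in range(0, len(data), chunk_size):
        tasks.put(data[i:i + chunk_size])
        num_chunks += 1
    # ...then sends one stop sentinel per worker and waits for them to finish.
    for _ in threads:
        tasks.put(None)
    for t in threads:
        t.join()
    # Combine the partial results, one per chunk.
    return sum(results.get() for _ in range(num_chunks))

total = run_batch(list(range(1000)))
print(total)
```

Workers pull chunks as they become free, so a slow chunk does not hold up the others. Swapping the in-process queue for a network queue is what lets the workers live on different machines.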

Not far enough. by Onan · 2006-04-18 15:11 · Score: 2, Interesting

Do you mean to ask how far things can scale "vertically", by buying progressively bigger individual machines? That's an easy one: never far enough.

Even if you can magically get a single system that's big enough for your needs forever, you'll still pay orders of magnitude too much money for it, and get no added reliability through redundancy.

Any application that requires a solitary, unique, big server is just definitionally broken. It needs to be redesigned to allow it to be spread over an arbitrary number of small systems in geographically diverse locations. For reliability, your serving infrastructure needs to be at least n+1 at every layer to allow for planned maintenance, unexpected failures, and site-destroying disasters. And for scale, it needs to allow you to continue to plug in more batches of cheap little machines and get more throughput.

Re:Not far enough. by Richard+Steiner · 2006-04-19 07:30 · Score: 1

Any application that requires a solitary, unique, big server is just definitionally broken.

Why? Centralization is often the best solution for many reasons (performance, security, legal issues, recoverability, reliability can all be factors depending on the nature of the system).

Only an extremist advocates one type of computing solution for all problems. :-)

Disclaimer: my background is medium-scale airline online-transaction applications where monolithic systems (read: mainframes) still tend to work very well.

--
Mainframe/UNIX Bit Twiddler and long time Windows/Linux Hobbyist.
The Theorem Theorem: If If, Then Then.

Really... by ratboy666 · 2006-04-18 15:43 · Score: 1

You don't give anywhere nearly enough information.

I do SUN PS gigs, so if it's SUN hardware, I can help out (just contact SUN). Ask for "PACP" (Performance Analysis and Capacity Planning). I helped design the service. Also, google "adrian cockcroft". Or http://www.cs.washington.edu/homes/lazowska/qsp/

Or IBM or HP: they have equivalent services.

You can also get any number of other people to help: try datacenterworks.com, or treklogic.com (off the top of my head).

Yes, the problem falls directly into my domain, but the service isn't free. I need to eat, too.

Ratboy.

--
Just another "Cubible(sic) Joe" 2 17 3061

Experience to share by gus+goose · 2006-04-18 16:31 · Score: 1

Yup, I currently develop software in that scale... I am doing "volume testing" right now so I have two "sandboxes" to work in. 1 16xDual-core Solaris machine with Oracle database shared on same hardware, and 1 48core IBM (SMT core - like Pentium HT - looks like 96 CPU) p595 which is partitioned .... I have 8 cores for "my" DB2, 16 cores for "me", and someone else plays with the rest....

For our application these machines are over-spec'd. While our app has many components in many languages, (COBOL, C, Java, Perl), I am responsible for some of the Java parts.... and... well, scalability in the primary Java component is linear to a point, but then there is some other bottleneck.... ... in our case, the database (or, more accurately, the disks supplying data to the database ... actually, it really is the Fiber-channel SAN infrastructure limiting me to two 2Gigabit connections to the disks).

So, we are linear in both environments AIX+db2 and Solaris+Oracle to about 10 to 12 CPU's, but then start hitting data starvation at the database end.

I imagine that every "enterprise" encounters a limit somewhere...

gus.

Now, if only I could get another 2 HBAs and one more DS4800's, then I could probably scale through to about 20 CPU's but that would cost 200K.

--
.. if only.

Re:Experience to share by Anonymous Coward · 2006-04-18 17:32 · Score: 0

You sound proud.

Like you have some serious hot shit going on.

I assure you, you are just on the foothills of "Enterprise".

You are only slightly beyond all the Steve "blow" Jobs fanbois who are impressed that their soon-to-be-put-out-to-pasture G5's can handle, *gasp* 8 or even 16 GB of RAM.

Catch me! I'm going to swoon!
Re:Experience to share by charlesnw · 2006-04-19 08:27 · Score: 1

You must be using EMC :)

--
Charles Wyble System Engineer

How well does "Enterprise" scale? by Philip+K+Dickhead · 2006-04-18 16:35 · Score: 1

Lessee, Kid. How much you spending?

--
"Speaking the Truth in times of universal deceit is a revolutionary act." -- George Orwell

Problems scale too by Hyperhaplo · 2006-04-18 18:08 · Score: 1, Interesting

We had a small (OS/390) Dev box that was upgraded recently. One thing we noticed was that one application (SAS Websrv) was taking 30% CPU at some times. When we upgraded the box (and moved to ZOS) this was much more noticeable. (please don't ask why it wasn't noticed before the upgrade). Funnily enough.. no one really noticed on the older machine, but we noticed pretty quickly on the new one.

The moral of the story is:
You're not just scaling up your efficiency / work load. You are also scaling up the other variables as well.

If you think it's fun watching a little OS/390 LPAR flogging itself silly (max 30% for the Websrv task), just wait until you see the look on the Performance Team's faces when that 30% finally gets noticed :-) **

** FYI: A task hogging 30% of an LPAR all of the time after the application has crashed has quite a significant effect on your budget if left unchecked!

--
You have a sick, twisted mind. Please subscribe me to your newsletter.

Re:Problems scale too by Anonymous Coward · 2006-04-19 03:07 · Score: 0

(please don't ask why it wasn't noticed before the upgrade).

Why wasn't it noticed before the upgrade?
Re:Problems scale too by Hyperhaplo · 2006-04-24 01:28 · Score: 1

I'm glad you asked! (I didn't have time to go into it earlier)

We (the developers) did not notice because the websrv process only went to 100% capacity (30% of the CPU) when the SAS websrv crashed - which is not often. Also, this process was 'low priority' and was constantly trumped by most other jobs. A situation where during the day programmer compiles take a lot of the CPU (plus online systems), and at night.. no one noticed 30% of the CPU not being available.

On the old machine it did not *cost* too much for this process to be running wild. However.. when we upgraded the machine (and got slugged by software vendors for increases in *their* cut) the amount of CPU (Read: money) that this application was taking was finally noticed. Around this time we actually got a hold of a few people to come together in a 'performance management team' to look into these problems.

This highlights another point: When you scale your server up (in terms of CPU's, ram, etc) some software houses will demand more from you to run their software. See Oracle and SQL Server for good examples here. While most software can only scale so far you will find that software vendors will say anything to get you to increase your licences.

Thanks for the troll! I forgot that I was going to get back to this comment :)

And now, since no one's going to read this post anyway since this topic is long gone.. let's continue..

Since the Performance Management guys joined our team I've seen a lot more of the 'other side' of programming than I ever knew existed. We outsource to IBM for everything not directly related to what we need to program with (they supply the desktops, upgrade the utilities, maintain the mainframe and midrange boxes, etc etc etc. We program applications). There's a whole world of tweaking and tinkering that goes on around the corner from us programmers. The question originally asked here was 'how does your application scale and what does it use'. This is the question that our Performance guys ask every day.. however they don't say 'let's add another 8 CPU's!'.. they are more likely to say 'which program can be fixed so that it is more efficient. What can we tweak to make better use of what we have got now?'. It's these guys that management go to before they upgrade a machine. This post is dedicated to Tony and The Guys - Thanks for the education. It's because of these guys that I am no longer a programmer and I am enjoying life in Operations.

--
You have a sick, twisted mind. Please subscribe me to your newsletter.

Reservation industry? by Lando · 2006-04-18 19:43 · Score: 1

http://www-306.ibm.com/software/htp/tpf/

--
/* TODO: Spawn child process, interest child in technology, have child write a new sig */

100 users at a time tops.... by Anonymous Coward · 2006-04-18 21:59 · Score: 0

...if you work for my company. We are forced to use this handy little app called IMS that they need to reboot every 6 hours or so. I mean why would you want to be able to have consistent performance with the application that contains all our client data? It makes things way more exciting when you are in a meeting and you do a query on the spot because you have a client asking you for info and it takes 40 minutes to run a simple query to the SQL server.

baby problem by way2trivial · 2006-04-18 22:19 · Score: 1

1000 babies in 9 months does not require 2000 mouths

technically you need 1001 people to produce 1000 babies in 9 months, not subtracting for multiple births.

it's all in how you look at it.

--
every day http://en.wikipedia.org/wiki/Special:Random

Re:baby problem by Jaruzel · 2006-04-18 23:36 · Score: 1

Tweaking this pedantic thread slightly longer ;) , technically you only need 1000 people to produce 1000 babies. If I recall correctly, a few months ago a baby was born of two mothers, that is to say, the boffins had created a viable embryo from the DNA of two women.

-Jar.

--
Together, We Can Make Slashdot Better. I Do NOT Mod ACs. - Check Me Out

Re:baby problem by mshurpik · 2006-04-19 00:31 · Score: 1

It's a good point that you can do it with only 1001 people but it would take longer than 9 months. Probably more like 15.

Re:baby problem by Mr+Z · 2006-04-19 03:44 · Score: 1

FWIW, I said "couples." Couples, in the specific context of making babies, tend to be pairings of males and females. The same male could be paired with two different females, giving two couples, but 3 people. As someone else pointed out, though, you do need enough sperm to go around, and the time to make all these couplings. :-)

Those pedantics aside, I wanted to avoid the sometimes-called-misogynistic formulation that's somewhat more common. And, well, multiple births in this analogy are like data dependent cache misses--for certain cases, you run noticeably faster than average, but you can't really count on it when planning capacity.
--Joe

--
Program Intellivision!

EE Java by Anonymous Coward · 2006-04-18 22:56 · Score: 0

I thought these were the kinds of problems Enterprise Java was created to solve. You get distributed transactions, automatic load balancing and all that stuff "for free" and can concentrate on the business logic.

I talked to a programmer at OMX group; they use Java to do the stock exchange systems for, among others, the Australian, Italian, Swiss and Singapore stock exchanges.

eBay is holding a seminar at this year's JavaOne called "The eBay matrix: Designing an enterprise object and development model that scales across the globe".

But to answer your question, no, I haven't worked with it myself. :-)

180 MILLION by way2trivial · 2006-04-19 00:47 · Score: 1

http://www2.oakland.edu/biology/lindemann/spermfacts.htm

--
every day http://en.wikipedia.org/wiki/Special:Random

Re:180 MILLION by mshurpik · 2006-04-19 09:04 · Score: 1

Sigh. Well if you're going to bring technology into the picture then 1001 seems like less of a miracle.

Enterprise vs. Voyager by Sindri · 2006-04-19 01:19 · Score: 1

Voyager systems however are a lot faster than Enterprise but don't scale as well. :oP

--
Sindri Traustason.

Re:Enterprise vs. Voyager by bill_mcgonigle · 2006-04-19 14:03 · Score: 1

But try as they have, nobody's managed to actually like one.

--
My God, it's Full of Source!
OUTSIDE_IP=$(dig +short my.ip @outsideip.net)

Large Project on Server Cluster by psalm33 · 2006-04-19 02:18 · Score: 2, Insightful

My company has developed a large software project on a server cluster for the backend. Our server-side architecture is (in theory) scalable as large as we want to go. We use BEA Tuxedo to assign different applications to different servers, and all the databases are available via a SAN. The Unix servers we use are currently configured with 4 to 8 CPUs each, and 8 to 16 GB memory. The server cluster is currently configured between 2 and 10 servers for our current deployments, though we could scale larger simply by rearranging the Tuxedo configuration files if we needed to.

Now, some server-side apps in our system are architected to scale very well, and for some we have had to spend the last few months tweaking the code as we grow with our current customer's deployment. In general though, our system tends towards lots of specific apps running simultaneously to handle individual tasks, rather than a small number of large, monolithic apps. I think it is very much a matter of making sure you have large system scalability in mind from the beginning, and not starting small and then realizing "Oh no! We never realized we'd have to handle THIS much traffic!" Our project is a perfect example of learning that lesson over and over as we've had to tweak or rewrite pieces of it as we add more and more clients to our customers' deployment. It can be done, but depending on how you've written your apps, it may not be easy.

Short-sighted project management by CarrotLord · 2006-04-19 03:06 · Score: 2, Insightful

In my experience, the custom applications I deal with seem to be built with not just incorrect assumptions regarding load, but *no* assumptions regarding load. When I first fired up one particular application in a production environment, we were seeing 6000 incoming messages per second. I asked the lead developer what we should be expecting to see. He had no idea.

This is caused by short sighted project management, which translates into short sighted programming. The necessary questions about throughput aren't asked, because it all works fine on the developers' PC with a test load. In our case, we eventually got the application running OK, but changes that have been made since have not taken into account anything to do with I/O, so the fact that our CPU usage is not maxing out seems to indicate to the development team that we are not bound by the server performance, and hence have not reached any scalability thresholds.

Obviously this is madness. If one was to investigate the scalability of this application properly, one should be looking at where I/O happens, where interprocess communication happens, where object creation and destruction happens, and so on... There is no other way to scale an application -- you have to define what the "load" is, find what happens when you increase it, work out where any bottleneck is, and how parallelisable this bottleneck is. Anything less is no more than buzzwords.
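To put a number on "how parallelisable this bottleneck is", here is a rough sketch using Amdahl's law, which is my choice of model and not something the application's developers used. The function names and the 90% parallel fraction are made up for illustration; you would have to measure the real fraction under a defined load.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Upper bound on speedup when only part of the work can run in parallel.

    parallel_fraction is a hypothetical, measured share of the work (0..1)
    that parallelises; the remainder is the serial bottleneck.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


def speedup_table(parallel_fraction, worker_counts):
    """Return (workers, speedup bound) pairs for a range of worker counts."""
    return [(n, amdahl_speedup(parallel_fraction, n)) for n in worker_counts]


if __name__ == "__main__":
    # Assumed figure: 90% of the work parallelises, 10% is serial (e.g. I/O).
    for n, bound in speedup_table(0.90, [1, 2, 4, 8, 16, 64]):
        print(f"{n:3d} workers -> at most {bound:.2f}x")
```

With a 10% serial share the bound never reaches 10x no matter how many CPUs you add, which is why idle CPU tells you nothing about whether you have hit a scalability threshold.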

--
Quidquid latine dictum sit, altum videtur.

My experience by lonesome+phreak · 2006-04-19 13:37 · Score: 1

At my current position we deal in heavy datasets, full virtual-reality simulations, and a multiple shard world comprising up to 8,000,000 simultaneous online players. We're running into some scalability issues as well...our subscriber base is outgrowing our ability to correctly run the FIVR system.

Sometimes, it's just time to look for another job because you're way out of your league when people ask vague questions!

--
Maybe we DID take the blue pill. You wouldn't remember anyway.
