How Far Can Large Commercial Applications Scale?
clusteroid81 asks: "I've been working with customers who run large commercial applications on big iron (16-32 symmetric multi-processor systems - 64GB or more memory ). There are always numerous other front-end servers involved, but the application on the back end server is often difficult to spread across multiple systems or clusters due to the application architecture. Scaling is done by increasing memory and processor counts. As things progress, the bottleneck is usually contention within the application or operating system. Are there folks here on Slashdot who work with large single system commercial applications? What kind of processor counts and memory do the applications have and how well do they scale?"
Your description is very little to go about suggesting solutions ...
....etc. if you are on a *NIX type of system).
...etc until you hit the diminishing returns areas).
You have to tell us many many specific things before we can suggest specific solutions. All we know is that the application runs on a 32 cPU system, and has 64 GB. This is all about the hardware. The application is a "large commercial application", and there is "contention within the application or the operating system". We do not even know what the hardware is, nor what operating system it is.
Anyways, here are some generic suggestions form past experience, most of it on UNIX systems, many with Oracle, and most with commerical non-web systems.
- Is the application CPU bound, memory bound, or I/O bound? If you do not know then you have to find out first, then attack the area of
- Is the application transactional in nature or batch? Is it an operational system, or a decision support type of application?
- Does the application use a database (probably does)? Is the database on the same box that runs the application? If so moving the database to a separate box with a fast connection (FDDI or Gigabit Ethernet) may help things.
- Does the application uses queues or message passing? Do these queues fill up at certain peak hours causing slow downs?
- Can you benchmark/load test the application on a similar box? If you have transaction generation/injection tools, then you can simulate the real load and then run tools for profiling, performance and the like in real time (e.g. sar, vmstat, top,
Performance tuning is an iterative process that is more of an art than a science. Start with the 80/20 rule, and get the low hanging fruit (attack the easiest and most obvious area that would gain you some performance, then move to the next area,
2bits.com, Inc: Drupal, WordPress, and LAMP performance tuning.
One place I used to work had a system that scaled up to well over 20 Sun boxes each with 10 more CPUs. It all depends on having the design right. For example, if you have a batch job, you architect the job to follow a master/worker paradigm where a master process doles out chunks of works to worker processes that may or may not be running on the same machine (think SETI@Home). Not every job can be redesigned to to this, but it it's a fairly easy way to do a large number of different tasks. Further, there's no reason that this design couldn't be used by Linux/PostgreSQL or some other Free Software stack rather than Solaris/Oracle. There are also other paradigms. Perhaps you should do a search on scholarly comp sci papers instead of asking /.. The problem of scaling is not exactly new. Quite a few papers have been written on various way to solve the problem depending on what sort of computational tasks you have to accomplish.