Slashdot Mirror


Learning High-Availability Server-Side Development?

fmoidu writes "I am a developer for a mid-size company, and I work primarily on internal applications. The users of our apps are business professionals who are forced to use them, so they are more tolerant of access times being a second or two slower than they could be. Our apps' total potential user base is about 60,000 people, although we normally experience only 60-90 concurrent users during peak usage. The type of work being done is generally straightforward reads or updates that typically hit two or three DB tables per transaction. So this isn't a complicated site and the usage is pretty low. The types of problems we address are typically related to maintainability and dealing with fickle users. From what I have read in industry papers and from conversations with friends, the apps I have worked on just don't address scaling issues. Our maximum load during typical usage is far below the maximum potential load of the system, so we never spend time considering what would happen when there is an extreme load on the system. What papers or projects are available for an engineer who wants to learn to work in a high-availability environment but isn't in one?"

207 comments

  1. 2 words by andr0meda · · Score: 2, Informative
    --
    With great power comes great electricity bills.
    1. Re:2 words by teknopurge · · Score: 3, Interesting
      I just finished reading that paper and was left with the impression that I had just wasted 10 minutes. I could not find a single insightful part of their algorithm - and in fact can enumerate several 'prior art' occurrences from my CPSC 102 class during my undergrad - all were lab assignments.

      I did, however, find this sentence disturbing:

      However, given that there is only a single master, its failure is unlikely; therefore our current implementation aborts the MapReduce computation if the master fails.

      Huh? So, because there is only one master it is unlikely to fail? This job takes hours to run. This is similar to saying that if you have one web server, it is unlikely to fail. I can't help but think this is a logical fallacy. I don't care how simple or complicated a job is - a single-point-of-failure is a single-point-of-failure.
    2. Re:2 words by teknopurge · · Score: 2, Funny

      I can feel the grammar Nazi's stalking me even now...

    3. Re:2 words by stonecypher · · Score: 0

      People who think map reduce is the same thing as scalability have no idea what scalability is.

      --
      StoneCypher is Full of BS
    4. Re:2 words by fimbulvetr · · Score: 1

      While the condescending attitude might make you feel better about yourself, it seems that they took this "lab assignment" and honed it into a system to make themselves a few bucks.

      Oh, and they also use it all the time on one of the world's largest data warehouses.

    5. Re:2 words by andr0meda · · Score: 2, Insightful


      Well, it's easy to say something isn't 'A', and then not spend a word on what IS 'A'.

      If I'm so wrong on scalability maybe you can explain it here to me. Thanks.

      --
      With great power comes great electricity bills.
    6. Re:2 words by nschubach · · Score: 1

      They may just accept you as their personal love slave and overlook your mistakes.

      --
      Every time I start to have faith in humanity, I ruin it by driving to work between 7 and 8 am.
    7. Re:2 words by nschubach · · Score: 1

      You're too hung up on 'A' and 'B' is getting lonely.

      --
      Every time I start to have faith in humanity, I ruin it by driving to work between 7 and 8 am.
    8. Re:2 words by PitaBred · · Score: 2, Informative

      A single point of failure is better than multiple points of failure, though, where any one failing would stop things dead in the water (think how a RAID0 array is less reliable than a single drive by itself, statistically). I'd hope that anyone working at Google would realize that, and would mean that, rather than the meaning you took from it :)

      But who knows... you could be right. I'm just playing devil's advocate.

    9. Re:2 words by greedyturtle · · Score: 1

      The Grammar Nazi is secretly turned on by those with horrible grammar. Which is, of course, all the more reason to write correctly.

    10. Re:2 words by Anonymous Coward · · Score: 0

      A single point of failure is better than multiple points of failure, though, where any one failing would stop things dead in the water (think how a RAID0 array is less reliable than a single drive by itself, statistically).

      Huh?

      Statistically the odds are the same for either drive failing as for a single drive failing. This is not a combined event, the variables are independent.

    11. Re:2 words by stonecypher · · Score: 5, Informative

      This is a bit like saying "auto mechanics is a matter of turning a wrench," and when someone points out to them that there's a lot more to it, saying "well then maybe you could teach me to be a mechanic in your reply." If scalability was an issue simple enough to explain in a slashdot post, people wouldn't have trouble with it. Scalability isn't a problem; it's a family of problems. Suggesting that a single algorithm from a single library magically waves away the issues involved in a heavily parallel server is simply an exposition that you aren't aware what goes into scalable servers.

      The Macy's Door Problem is a great example of a Scalability 101 problem that map_reduce has no way to address. In the early 30s, when most department stores were making big, flashy front entrances to their stores with big glass walls and paths for 12 groups of people at a time, doormen, signage, the whole lot, Macy's elected to take a different approach. They set up a small door with a sign above it. The idea was simple: if there was just the one door, it would be a hassle to get in and out of the store; thus, it would always look like there was a crowd struggling to get in - as if the store was just so popular that they couldn't keep up with customer foot traffic. The idea worked famously well.

      In server design, we use that as a metaphor for near-redline usage. There's a problem that's common in naïve server design, where the server will perform just fine right up to 99%. Then, there'll be a tiny usage spike, and it'll hit 101% very briefly. However, the act of queueing and dequeueing withheld users is more expensive than processing a user, meaning that even though the usage drops back to 99%, by the time those 2% overqueue have been processed, a new 3% overqueue has formed, and performance progressively drops through the floor on a load the application ought to be able to handle. I should point out that Apache has this problem, and that until six years ago, so did the Linux TCP stack. It's a much more common scalability flaw than most people expect.
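
      A toy simulation can make that feedback loop concrete. Everything in it is an assumption chosen for illustration: the `simulate` function, a capacity of 100 work units per tick, a serve cost of 1 unit, and an extra 1.5 units to queue and later dequeue a request that could not be served on arrival. It is a sketch of the effect, not a model of Apache or any real server.

```python
def simulate(arrivals, capacity=100.0, serve_cost=1.0, queue_cost=0.0):
    """Return the backlog after each tick for a single toy server.

    Requests served on arrival cost `serve_cost` work units. Requests that
    had to wait cost `serve_cost + queue_cost`, modelling the overhead of
    queueing and dequeueing them. All numbers are illustrative assumptions.
    """
    backlog = 0.0
    history = []
    queued_cost = serve_cost + queue_cost
    for arriving in arrivals:
        budget = capacity
        # Drain previously queued requests first (oldest work first).
        served_old = min(backlog, budget / queued_cost)
        budget -= served_old * queued_cost
        backlog -= served_old
        # Spend whatever capacity is left on the requests arriving now.
        served_new = min(arriving, budget / serve_cost)
        backlog += arriving - served_new
        history.append(backlog)
    return history


# 99% load, one tick at 101%, then back to 99%.
load = [99.0] * 5 + [101.0] + [99.0] * 50

cheap_queue = simulate(load, queue_cost=0.0)
costly_queue = simulate(load, queue_cost=1.5)
print("final backlog, free queueing:  ", cheap_queue[-1])
print("final backlog, costly queueing:", costly_queue[-1])
```

      With `queue_cost=0.0` the one-tick spike drains on the next tick. With `queue_cost=1.5` the same spike leaves a backlog that keeps growing even though arrivals are back at 99% of capacity.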

      Now, that's just one issue in scalability; there are dozens of others. However, map_reduce has literally nothing to say to that problem. Do I need to rattle off others too, or maybe is that good enough? I mean, we have the quadratic growth of client interconnections (Metcalfe's Law, which is easily solved with a hub process); we have making sure that processing workloads is linear growth (that is, O(1) per item as opposed to O(lg n) or worse), which means no std::map, no std::set, no std::list, only pre-sized std::vector and very careful use of hash tables; we have packet fragmentation throttling; we have making sure that you process all clients in order, to prevent response-time clustering (like when you load an apache site and it sits there for five seconds, so you hit reload and it comes up instantly); all sorts of stuff. Most scalability issues are hard to explain, but maybe that brief list will give you the idea that scalability is a whole lot bigger of an issue than some silly little google library.

      Talk to someone who's tried to write an IRC server. Those things hit lots of scalability problems very early on. That community knows the basics very, very well.

      --
      StoneCypher is Full of BS
    12. Re:2 words by Brian+Gordon · · Score: 1

      If you have a 200GB file spread across 10 hard drives in a RAID0, then there's 10 points of failure. Any one of the ten hard drives could fail and you've just lost your file. On the other hand, if you have your file taking up the whole of a single hard drive, then the only way you'll lose that file is if that particular drive fails. If a drive fails you stand to lose more data if they're on a RAID0 compared with a bunch of independent HDDs.

    13. Re:2 words by Anonymous Coward · · Score: 0

      ircd programmers > google

      lol

    14. Re:2 words by Anonymous Coward · · Score: 0

      If the odds of a single drive failing are 1 in 1000, the odds of that drive failing in a RAID are also 1 in 1000.

      This is a common statistical fallacy; increasing the number of trials does not increase the probability of an outcome unless you're doing combination. In the case of the RAID, the odds of any drive failing are the same as a lone drive failing.

      In this case, you lose your file 1 in 1000 times regardless of the number of points of failure (because one drive failing kills the RAID0).

    15. Re:2 words by WhiplashII · · Score: 2, Insightful

      I'm afraid you are wrong. Here is how to describe the situation in statistical terms:

      Probability of failure with a single drive: 1 in 1000

      Probability of failure with ten drives: equals the probability of drive 1 failing or drive 2 failing or drive 3 failing or drive 4 failing or drive 5 failing or drive 6 failing or drive 7 failing or drive 8 failing or drive 9 failing or drive 10 failing.

      The easier way to solve that equation is to reverse it - it equals the probability of drive 1 not failing and drive 2 not failing and drive 3 not failing and drive 4 not failing and drive 5 not failing and drive 6 not failing and drive 7 not failing and drive 8 not failing and drive 9 not failing and drive 10 not failing. (Think about it, in order for the RAID 0 to stay up, all the drives have to be good - calculate that probability).

      Mathematically, this is 0.999^10, or 0.99. So the failure rate fell from 1/1000 to about 1/100, just as you would expect.
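
      The arithmetic can be checked in a few lines. This is a minimal sketch that assumes the drives fail independently and uses the 1-in-1000 figure from this thread; `array_failure_probability` is a name made up for the example.

```python
def array_failure_probability(per_drive_failure: float, drives: int) -> float:
    """Probability that at least one of `drives` independent drives fails.

    For RAID 0 that is the probability of losing the array, because a
    single failed drive takes the whole stripe set with it.
    """
    survive_all = (1.0 - per_drive_failure) ** drives
    return 1.0 - survive_all


single = array_failure_probability(0.001, 1)
striped = array_failure_probability(0.001, 10)
print(f"1 drive:   {single:.5f}")
print(f"10 drives: {striped:.5f}")
```

      The ten-drive figure comes out just under 0.01, which is roughly ten times the single-drive risk.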

      --
      while (sig==sig) sig=!sig;
    16. Re:2 words by uhlume · · Score: 4, Funny

      You can feel the grammar Nazi's what stalking you?

      --
      SIERRA TANGO FOXTROT UNIFORM
    17. Re:2 words by Pollardito · · Score: 1

      you're using the term "multiple points of failure" to describe a situation which is probably better described as "multiple single-points of failure"

    18. Re:2 words by andr0meda · · Score: 1

      Thanks for going a bit deeper into it, I actually learned a new thing!

      Of course you're a bit bitten by the fact that someone shouts "2 words" on such a broad topic, but I think I did take one example that speaks for itself. In case anyone wonders, yes, it is not the true one answer to every question / issue in the realm of scalability.

      MapReduce is one algorithm and it covers only one possible strategic way to attack [a] problem. But most importantly it's an (interesting) way to take a problem, think about it in rather unconventional ways and solve it alike. You asserted that it says nothing about the technical network architectures and database tiers, or their issues that can arise, but it *does* say something about how to organise and scale the back-end without fundamentally changing the architecture, which is conceptually a strong idea if you have to deal with scalability. I'm quite sure that, in order to implement MapReduce effectively, you will no doubt come across some of the issues that you mentioned. After all, all data driven systems need to communicate with the clients and pump data in and out as efficiently as possible.

      Next time I promise I will write more words ;)

      --
      With great power comes great electricity bills.
    19. Re:2 words by Zwack · · Score: 1

      But RAID0 isn't intended to be reliable... It's intended to be FAST. RAID 1 is intended to be reliable...
      RAID 0+1 is intended to be fast and reliable.

      A single master is a single point of failure. However if that server failing doesn't cause running issues then they can ignore that single point of failure as it doesn't matter to production. I would imagine that the code on the Master is well tested by now (and may be very simple anyway) which just means that they now have to worry about hardware failures...

      Z.

      --
      -- Under/Overrated is meta-moderation, and therefore is Redundant.
    20. Re:2 words by Hemogoblin · · Score: 2, Insightful

      1/100 is larger than 1/1000. Therefore, it rose from 1/1000 to 1/100. That is, 10 single points of failure is more risky than 1 point. I believe that was what you were arguing anyway, so no worries.

    21. Re:2 words by Anonymous Coward · · Score: 0

      Okay, I'm big enough to admit when I'm wrong... or rather, I wasn't "wrong," I was...ahem...misinformed.

      Oh well, at least I can say I learned something new today. How about you? :-P

    22. Re:2 words by Anonymous Coward · · Score: 0

      im retarded?

      self obsessed?

      thinkim cool?

      thisisnt new?

      stolit fromlisp?

      web weenie?

      what ever?

    23. Re:2 words by Anonymous Coward · · Score: 0

      Yeah. By this logic, if I understand the theory of special relativity better after reading a textbook on the subject than after reading Einstein's private letters on unrelated subjects, then the textbook author must be a better physicist than Einstein, because they wrote a book which explained the subject better than something which Einstein wrote(nevermind that what Einstein wrote wasn't actually meant to explain the subject).

      The MapReduce algorithm is meant to solve a specific problem, not to solve all scalability problems, and certainly not to teach you about all scalability issues.

    24. Re:2 words by James+Youngman · · Score: 1

      MapReduce? There is hardly a paper on Google's Labs site which explains a technology less suited for the kind of high-availability server side applications that the poster is asking about!

  2. Give it up! by Anonymous Coward · · Score: 0

    "Give it up, give it up," said he, and turned away with a great sweep, like someone who wants to be alone with his laughter.

    1. Re:Give it up! by homey+of+my+owney · · Score: 1

      And if you don't give up, duck

  3. Well... by Stanistani · · Score: 3, Informative

    You could start by reading this book for a practical approach:

    Zawodny is pretty good...

    1. Re:Well... by tanguyr · · Score: 3, Informative

      I also recommend the book Building Scalable Web Sites, also from O'Reilly. Loads of good ideas on clustering, performance monitoring, even some ideas on scaling the development process itself. Scalability and high availability are not the same thing, but much of the material covered in this book is relevant to both. /t

      --
      #!/usr/bin/english
    2. Re:Well... by tholomyes · · Score: 1

      I just picked "Building Scalable Web Sites" up four or five weeks ago and I'll second that recommendation; the book is really well written and actually a fairly quick read, a rarity even among O'Reilly books. It covers a lot of ground comprehensively, and is organized in a way that makes sense.

      --
      When did the future switch from being a promise to a threat? -C. Palahniuk
    3. Re:Well... by bjk002 · · Score: 1

      I would recommend this book

      Great read to help understand good (and bad) design principles.

      --
      Opinion:=TMyOpinion.Create(Me);
    4. Re:Well... by porkThreeWays · · Score: 1

      Zawodny? Tay Zawodny?

      --
      If an officer ever threatens to taze you, say you have a pacemaker.
    5. Re:Well... by Anonymous Coward · · Score: 0

      I found "Release It!" better than either "Building Scalable Web Sites" or "Scalable Internet Architectures". The latter two were written by system administrators who work on Perl/PHP systems. While quite informative, they aren't useful from a programmer's point-of-view in an enterprise-style platform (e.g. message bus, Java, etc). "Release It!" focused more on how to scale version 1.0 and recognizes that the developers will be woken up at 5am repeatedly every week until the architectural (code and networking) issues are resolved. A much more pragmatic book.

  4. scalability by Anonymous Coward · · Score: 0

    so you are having scalability issues because of a poor design - "a second or two slower" - wow how do you get away with that poor performance? anything over 1.5 seconds and I get major complaints here.

    How does high availability come into it? and high availability isn't exactly difficult you just need budget

    1. Re:scalability by romango · · Score: 1

      The load you describe is incredibly low. You must have very large DB tables with improper indexes to get that poor of a response.

  5. look at SaaS development by CBravo · · Score: 1

    Generally, SaaS (software as a service) providers have to scale their apps. The development issues they have are more or less solved. Look it up on Google... ('saas scalability problem').

    --
    nosig today
  6. Here goes... by Panaflex · · Score: 2, Informative

    Ok, first up:
    1. Check all your SQL and run it through whatever profiler you have. Move things into views or functions if possible.
    2. CHECK YOUR INDEXES!! If you have SQL statements running that slow, the likely cause is not having proper indexes for the statements. Either make an index or change your SQL.
    3. Consider using caching. For whatever platform you're on there's bound to be decent caching.
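
    As a rough illustration of point 2, here is a sketch using SQLite through Python's standard `sqlite3` module. The `orders` table, its columns and the index name are invented for the example, and other databases expose their plans through different tools, so treat it as a way to see the idea rather than a recipe for any particular platform.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 500, i * 1.5) for i in range(5000)],
)

QUERY = "SELECT * FROM orders WHERE customer_id = ?"


def query_plan(connection, sql, params):
    """Return SQLite's plan description for a statement as one string."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)


plan_before = query_plan(conn, QUERY, (42,))  # full table scan
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = query_plan(conn, QUERY, (42,))   # lookup through the new index

print("before:", plan_before)
print("after: ", plan_after)
```

    The exact wording of the plan text varies between SQLite versions, but the first plan reports a scan of the table and the second names the index.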

    That's just the beginning... but the likely cause of most of your problems. We could go on for a month about optimizing.. but in the end if you just stuck with what you have and checked your design for bottlenecks you could get by just fine.

    --
    I said no... but I missed and it came out yes.
    1. Re:Here goes... by bugg_tb · · Score: 1

      I don't think the poster was referring to his system and how to optimize it, I think he's interested in learning HA/Failover techniques that don't yet need to be implemented in his database.
      I do vaguely recall an article a while back about MySpace, IIRC, and how they had so much trouble expanding as soon as the boom took off. It wasn't very practical but gave a nice insight into how a large load on servers can cause interesting challenges.
      Tom

    2. Re:Here goes... by rdavidson3 · · Score: 1, Informative

      Just appending to this list:
      4. Get your servers clustered and this will help with server load (not really necessary at this time for what you need, but will position you for the future) and redundancy if the server dies. If this is not possible, look at "warm" backups. But then again ask the business side what is their expectation when a problem happens, and then plan for it.
      5. For performance tuning look at the execution plans on the SQL
      6. Use transactions whenever possible (BEGIN TRANSACTION / COMMIT / ROLLBACK).
      7. If you see deadlocks on tables, try using table hints (NOLOCK) on SELECT statements.
      8. Get an experienced DBA to peer-review your setup and code if necessary.
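
      For point 6, a minimal sketch of the commit-or-rollback pattern, written against SQLite with Python's standard `sqlite3` module rather than the T-SQL syntax above. The `accounts` table and the `transfer` helper are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
with conn:
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
    )


def transfer(connection, source, target, amount):
    """Move `amount` between accounts; either both updates apply or neither."""
    try:
        # `with connection` commits if the block succeeds and rolls back
        # if it raises, so a failed debit also undoes the earlier credit.
        with connection:
            connection.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, target),
            )
            connection.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, source),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(connection):
    return dict(connection.execute("SELECT name, balance FROM accounts"))


ok = transfer(conn, "alice", "bob", 30)       # succeeds
failed = transfer(conn, "alice", "bob", 500)  # violates the CHECK, rolled back
print(ok, failed, balances(conn))
```

      The second transfer credits bob before the debit fails, and the rollback removes that credit again, which is the point of wrapping both statements in one transaction.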

      And a change to 1. You can use stored procs as well as views and functions. But moving the SQL code into views / functions will bring performance gains from the server by having the code already compiled and creating a saved execution plan.

    3. Re:Here goes... by Panaflex · · Score: 1

      Understood - it's just that oftentimes people overlook the basic issues. He could get by without having to spend much time & effort - just a little basic knowledge of databases could go a long way. He mentioned 1-2 second queries and I suspected he's just missing the basics.

      --
      I said no... but I missed and it came out yes.
    4. Re:Here goes... by Panaflex · · Score: 1

      Thanks, all good advice.

      --
      I said no... but I missed and it came out yes.
    5. Re:Here goes... by jbplou · · Score: 1

      Using parameterized SQL will likely cause the execution plan to be saved as well, depending on the RDBMS platform, if stored procedures are not desired.

    6. Re:Here goes... by rdavidson3 · · Score: 0

      True, as long as you do this:

      SELECT * FROM table1 WHERE id = @id

      But, if the SQL is dynamic:

      SELECT * FROM table1 WHERE id = 1
      SELECT * FROM table1 WHERE id = 2
      SELECT * FROM table1 WHERE id = 3
      SELECT * FROM table1 WHERE id = 4

      Then the server has to create a new plan every time. I have seen this all too often on the client side code.
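
      In client code the difference looks roughly like this. The sketch uses Python's standard `sqlite3` module, which takes `?` placeholders instead of `@id`, and a made-up `label` column on `table1`. Whether and how plans are cached depends on the database, so the example only shows the shape of the two approaches.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER PRIMARY KEY, label TEXT)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?)", [(i, f"row {i}") for i in range(1, 5)]
)

PARAMETERIZED = "SELECT label FROM table1 WHERE id = ?"


def fetch_parameterized(ids):
    """One SQL text for every call; only the bound value changes."""
    return [conn.execute(PARAMETERIZED, (i,)).fetchone()[0] for i in ids]


def dynamic_statements(ids):
    """Anti-pattern, built here only for comparison: one SQL text per id."""
    return [f"SELECT label FROM table1 WHERE id = {int(i)}" for i in ids]


ids = [1, 2, 3, 4]
labels = fetch_parameterized(ids)
distinct_dynamic_texts = len(set(dynamic_statements(ids)))

print(labels)
print("statement texts sent, parameterized:", 1)
print("statement texts sent, dynamic:      ", distinct_dynamic_texts)
```

      Binding values instead of pasting them into the string also avoids SQL injection, which is a separate reason to prefer the first form.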

  7. Mastering Ajax with JSON on the server side by IndioMan · · Score: 0, Offtopic

    As discussed in the previous article in this series, JSON is a useful format for Ajax applications because it allows you to convert between JavaScript objects and string values quickly. In this final article of the series, you'll learn how to handle data sent to a server in the JSON format and how to reply to scripts using the same format.

  8. High availability!=high performance by dominux · · Score: 5, Insightful

    Start by being clear about what you want to achieve. If it is HA then you want to look at clustering, failover, network topology, DR plans etc. If it is HP then look for the bottlenecks in the process, don't waste time shaving nanoseconds off something that wasn't bothering anyone. At infrastructure level you might think about caching some stuff, or putting a reverse proxy in front of a cluster of responding servers. In general disk reads are expensive but easily cached, disk writes are very expensive and normally you don't want to cache them, at least not for very long. Network bandwidth may be fast or slow, latency might be an issue if you have a chatty application.
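
    To show what "easily cached" means for reads, here is a small in-process sketch using `functools.lru_cache` from the Python standard library. `read_from_disk` is a stand-in invented for the example. A real deployment would more likely cache at the proxy or in a shared cache, and would need a rule for invalidating entries after writes.

```python
from functools import lru_cache

disk_reads = 0


def read_from_disk(key):
    """Stand-in for an expensive read; counts how often it is really hit."""
    global disk_reads
    disk_reads += 1
    return f"value-for-{key}"


@lru_cache(maxsize=1024)
def cached_read(key):
    # Only the first call for a given key reaches read_from_disk.
    return read_from_disk(key)


for _ in range(1000):
    cached_read("config")

print("calls:", 1000, "actual disk reads:", disk_reads)
print(cached_read.cache_info())
```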

    1. Re:High availability!=high performance by TheRaven64 · · Score: 1

      Exactly right. I'd like to add that if you want to write really scalable code then use an asynchronous approach as much as possible. Some programming languages and toolkits make this easy, some make it hard, but it's possible in any. If your database server is slow responding to your application server, make sure your app server can do useful work while it's waiting. The same is true of communication between parts of the server.
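
      A minimal sketch of that idea with Python's `asyncio`. The three "queries" are simulated with `asyncio.sleep`, and the names and 0.2 second delays are invented for the example. The point is only that the waits overlap instead of adding up.

```python
import asyncio
import time


async def query_database(name, delay):
    """Stand-in for a slow database round trip."""
    await asyncio.sleep(delay)
    return f"{name}: done"


async def handle_request():
    # Start all three lookups, then wait for them together. While one is
    # waiting on I/O the event loop is free to make progress on the others.
    return await asyncio.gather(
        query_database("orders", 0.2),
        query_database("customers", 0.2),
        query_database("inventory", 0.2),
    )


start = time.perf_counter()
results = asyncio.run(handle_request())
elapsed = time.perf_counter() - start

print(results)
print(f"elapsed: {elapsed:.2f}s (three sequential 0.2s waits would take about 0.6s)")
```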

      I'd thoroughly recommend that you learn Erlang, if you haven't already. The language is almost certainly not suited to the kind of task described (it might be, but it's unlikely), however the programming style it encourages can be applied to languages that are. Learning Erlang helps you write scalable code in any language, just as learning Smalltalk helps you write good OO code, irrespective of whether you actually use either language in production.

      --
      I am TheRaven on Soylent News
    2. Re:High availability!=high performance by leuk_he · · Score: 1

      Agreed.

      High performance=short response times. In your case you can think about caching more and tuning the system and database access. Maybe you can make the application more scalable, but once you move the database to a different server than the application you first get some extra (network) overhead instead of performance, especially in low load situations. And more iron/servers means more money.

      High availability is about 24x7 operation and no single point of failure. One method for this is clustering (more application web servers, more replicated databases).

      You want to learn about eXtreme load? Simulate it in your testing environment. Testing is important but not simple.
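
      A very small load generator can be sketched with the Python standard library. `handle_request` is a hypothetical stand-in for a call into the application under test, and the 90 workers simply mirror the peak concurrency mentioned in the question. A real test would drive the deployed system over the network, usually with a dedicated tool.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def handle_request(request_id):
    """Hypothetical stand-in for one request to the application under test."""
    time.sleep(0.01)
    return request_id


def timed_call(request_id):
    # Measures time spent inside the handler, not time waiting for a worker.
    start = time.perf_counter()
    handle_request(request_id)
    return time.perf_counter() - start


def run_load(concurrent_users, total_requests):
    """Fire `total_requests` calls from `concurrent_users` worker threads."""
    with ThreadPoolExecutor(max_workers=concurrent_users) as pool:
        latencies = sorted(pool.map(timed_call, range(total_requests)))
    return {
        "requests": len(latencies),
        "median": statistics.median(latencies),
        "p95": latencies[int(0.95 * (len(latencies) - 1))],
        "max": latencies[-1],
    }


report = run_load(concurrent_users=90, total_requests=900)
print(report)
```

      Raising `concurrent_users` step by step and watching where the median and 95th percentile start to climb gives a first picture of where the system saturates.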

    3. Re:High availability!=high performance by demi · · Score: 1

      Luckily, we now have a good Erlang book in print (again): Programming Erlang by Joe Armstrong. Learn it, live it, love it.

      The language is almost certainly not suited to the kind of task described (it might be, but it's unlikely)

      I disagree. Erlang is perfectly good for general programming tasks and particularly well-suited for the sorts of demands placed on public web applications (which is sort of the undercurrent of the requester's question, I think). And while it's true that messaging, lightweight parallelism, supervision trees, exit trapping and so forth are principles that might could be applied to other programming languages: the question would be, "why?" It's so much easier in Erlang. And Erlang isn't just a programming language, it's also a technology for building concurrent distributed non-stop systems. It's "batteries included" and comes with lots of useful libraries, including a web development kit and so forth. It is also easy to integrate with external libraries (indeed, this was Erlang's primary problem domain as it was developed) such as those for Java and C, and unlike other languages does so in a safe way (where a crash in the library won't crash the program).

      --
      demi
    4. Re:High availability!=high performance by TheRaven64 · · Score: 1
      The reason I wouldn't recommend Erlang for this kind of task is that its string manipulation sucks. Native strings are ASCII (WTF) and are stored in an incredibly inefficient way. You end up having to write your own code mapping between unicode and binaries (unless this has made it into OTP while I wasn't looking). Even then, the string manipulation syntax is a pain.

      Calling C code from Erlang is not a task for the faint-hearted either. You have to write a port driver to do it safely, which is a lot of glue code, or use the IDL tools (which are completely undocumented).

      --
      I am TheRaven on Soylent News
    5. Re:High availability!=high performance by fifedrum · · Score: 3, Insightful

      clustering, word up (if we're allowed to use old catch phrases like that)

      disk reads and writes are the least of our troubles when we scaled much more than a small enterprise level of data. The sheer number of moving parts in our environment (not just physical parts, but bits flowing too) killed productivity and we wound up with the complete inability to cache anything.

      There's simply too much data flowing back and forth to make caching pay for itself and too often will a hard drive fail requiring even more background noise.

      Then God forbid you introduce a load-balancer into the equation because even if they are redundant, they aren't infallible, and they're still a network choke point which you will choke, guaranteed, before too long.

      With horizontally scaled architectures, pretty soon you wind up with thousands/hundreds of disk drives (failures on a daily basis) and hundreds of power supplies, CPU coolers, sticks of RAM etc. Successfully scaling horizontally means overbuilding the living shit out of everything involved to the point where you can handle a significant percentage of your environment going offline, and THEN being able to rebuild the missing parts with hot spares without interrupting your processing. On top of it, your servers have to be stateless with users tied to nothing but the storage backend which is so insanely overbuilt as to never lose a bit of data.

      Or you pay through the nose for a real HA/HPC environment like a big iron box with a PB of DASD and 10 engineers who sleep with the box every night.

      Anything less than that is a DR waiting to happen or, as we like to call it, compromise.

      The problem with most architects building "scalable" environments is that they don't build scalable environments; they build affordable environments.

    6. Re:High availability!=high performance by demi · · Score: 1

      True (sort of), but that's a specific weakness for a non-specific application description. It's true that for a lot of people, all data looks like strings that you parse, extract, and throw regexes at until you're blue in the face, but as a general programming approach that's more an artifact of Perl and shell scripting habits than anything else. I would like to see a standard and high-performance bstring implementation, though (and yes, I did start writing my own unicode/bstring thing :) ). Here's another place where a larger community could help.

      C calling is a matter of taste, I guess. I didn't find doing it from Erlang any more difficult or complicated than writing XS modules or any similar tasks. The model of a port driver is much safer and has a lot to recommend it over being forced to link in external libraries (though you can do that in Erlang, too, if you really want).

      There are definitely powerful pieces of Erlang that are insufficiently documented, I hope this increasingly becomes the focus of the community (documentation and how-tos). IIRC I did see a good how-to on writing C drivers.

      Since we're criticizing, one valid criticism I've heard is that Erlang's metaprogramming system is clunky (which it is). I think someone was working on a general macro facility for Erlang, though.

      --
      demi
    7. Re:High availability!=high performance by TheRaven64 · · Score: 1

      I think someone was working on a general macro facility for Erlang, though.

      A few people have done something towards this. The required facilities are all there via (barely-documented) parse transforms. I started writing some code to generate parse transforms from macro definitions, but got bored.
      --
      I am TheRaven on Soylent News
  9. One Of Many Possible Futures by rubicante · · Score: 1

    I think the word scalable gets people into trouble when they program for a future that will never arrive. Instead focus on building elegant applications -- they are easy to maintain, and you know you'll be doing a lot of that.

    1. Re:One Of Many Possible Futures by try_anything · · Score: 1

      Hear, hear. I'm sure there are interesting issues in the OP's workplace that he can work on here, now, and in an indubitably real environment. He should start attacking the issues his coworkers struggle with. Does the app deploy smoothly? How much administration does it require? Can the interface be improved to save users time and brainpower? Can you decrease the training time of new employees or the number of technical support requests? Find a real, quantifiable problem, try to create a real solution, and see what happens. You'll learn so much more than if you try to create and test an enterprise-level scaling solution on your home network. With fake problems, you can go through the motions, but you'll never know if you're doing it right. Real problems will bite back if your solution is half-baked, and I'm sure your company has plenty of real problems. If there aren't any urgent problems that are being neglected, then look for a new job NOW before the next wave of cost-cutting!

  10. I don't code for it directly by Applekid · · Score: 3, Interesting

    Our in-house applications don't get built around performance at all (personally I find it disappointing but I don't write the rules... yet). We generally scale outwards: replicated databases, load distribution systems, etc.

    Many of the code guidelines we have established are to aid in this. Use transactions, don't lock tables, use stored procedures and views for anything complicated, things like that.

    I guess my answer is that we delegate it to the server group or the dba group and let them deal with it. I guess this means the admins there are pretty good at what they're doing. :)

    --
    More Twoson than Cupertino
    1. Re:I don't code for it directly by gatesvp · · Score: 1

      I'm in a small shop and we do this too. Truth is, we don't even have a real DBA, but a few of us know SQL Server really well. The reason we actually do it this way is cost. On small projects, Dev time is really expensive, server resources are not. If you can support 30 more clients with one $5k server, then it's simply not worth Dev time to stress over performance.

      Truth is, if performance is becoming an issue, then the project should be generating enough revenue to justify the Dev time spent on performance tuning. As Devs, we'd like to build a highly-performant system every time, but as business owners, dev time is the greatest expense and needs to be kept in check. Really, using SPs and Views and building good logic into your Data Access / Business Object / Entities / Whatever layer(s) is likely the best way to keep up performance for small to mid-sized solutions.

      Classic example, we just implemented some "partitioned table" in SQL Server 2005. We didn't use the actual SQL Server feature, b/c we're not running the Enterprise version, so we did the poor-man's version and agreed that we'll use the real version if the loads increase. If the loads increase to this point, then we'll be able to both afford the Enterprise version and pay for a couple weeks of Dev time to re-optimize the appropriate code.

      It's a very business-oriented approach to say "Hey, if we ever get that big, we'll have the money to make the system perform better". It may rankle the developer in us, but it's important to be able to identify wasted energy.

    2. Re:I don't code for it directly by yfarren · · Score: 1

      How do you not lock a table, if you are using a transaction? Don't transactions implicitly lock tables?

    3. Re:I don't code for it directly by Applekid · · Score: 1

      Not sure how exactly it works, not being a big DB guy myself, but if I were forced to implement a database application without locking tables I'd probably do it by insulating the transactions from each other using shadowing and just settling things down at COMMIT, sort of how CPUs can issue instructions that play with the registers out-of-order.
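
      One way to read that "shadow the changes and settle at COMMIT" idea is optimistic concurrency control. The sketch below is a toy in Python, not how any particular database does it; Store, Transaction and ConflictError are made-up names. Each transaction buffers its writes privately, remembers the version of everything it read, and only at commit checks that nothing it read has changed.

```python
import threading


class ConflictError(Exception):
    """Raised at commit when a value read by the transaction has changed."""


class Store:
    def __init__(self):
        self._data = {}                 # key -> (version, value)
        self._lock = threading.Lock()   # held only for the short commit step

    def begin(self):
        return Transaction(self)


class Transaction:
    def __init__(self, store):
        self._store = store
        self._read_versions = {}        # key -> version seen on first read
        self._writes = {}               # private "shadow" copy of pending writes

    def read(self, key):
        if key in self._writes:
            return self._writes[key]
        version, value = self._store._data.get(key, (0, None))
        self._read_versions.setdefault(key, version)
        return value

    def write(self, key, value):
        self._writes[key] = value       # nothing is visible to others yet

    def commit(self):
        with self._store._lock:
            # Validate: everything we read must still be at the same version.
            for key, seen in self._read_versions.items():
                current, _ = self._store._data.get(key, (0, None))
                if current != seen:
                    raise ConflictError(key)
            # Apply: publish the shadowed writes and bump their versions.
            for key, value in self._writes.items():
                current, _ = self._store._data.get(key, (0, None))
                self._store._data[key] = (current + 1, value)


store = Store()
setup_txn = store.begin()
setup_txn.write("balance", 100)
setup_txn.commit()

t1, t2 = store.begin(), store.begin()
t1.write("balance", t1.read("balance") - 10)
t2.write("balance", t2.read("balance") - 20)
t1.commit()                             # first committer wins
try:
    t2.commit()
    conflict = False
except ConflictError:
    conflict = True                     # t2 would have to retry
```

      No lock is held while the transactions do their work; the loser just retries.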

      --
      More Twoson than Cupertino
  11. Watch videos/presentations. by Nibbler999 · · Score: 3, Informative

    There are some good presentations on the web about how YouTube, Digg, Google, etc. handle their scaling issues. Here's an example: http://video.google.com/videoplay?docid=-6304964351441328559

  12. get a just in case server by ILuvRamen · · Score: 0

    I've heard of companies who offer server networks for websites and corporate server backups in case of a massive flood of traffic. Basically it's just about free cuz you rarely use it but if your website shows up on The Daily Show and you get 1 million visitors, they sense that and host it from the backup on 50 of their servers at once until traffic dies down and bill you for it later.
    Same with a corporate network. A bunch of people have to get in their last minute stuff on the last day of the quarter or whatever and your server is going nuts with the traffic so they're there to save you. Just take a day or so and write a "switch" sort of program on your server(s) that detects tons of traffic and contacts the emergency offsite servers that the company has your apps and DBs just sitting on and you use multiple servers of theirs until the traffic dies down. There is a little bit of a higher fee for corporate services but it's still really cheap. It's like a rented server but 99.999% of the time, they don't need to allocate any bandwidth at all to it so it's like 25x cheaper than renting a dozen actual, full time servers.

    --
    Google's Super Secret Search Algorithm: SELECT @search_results FROM internet WHERE @search_results = 'good'
  13. 2 Cents by twoplustwo · · Score: 1

    Well, HA typically has to do with availability, not performance. However, if you add redundant equipment, e.g. another column, you can improve both performance and availability. So, DB scaling issues can be resolved by adding memory and CPUs. Applications can be scaled vertically by adding memory, CPUs, etc., or by cloning: adding a redundant column of equipment, e.g. web servers.

  14. check these out... by BillAtHRST · · Score: 4, Informative

    These are both decent starting points. Please report back if you find something good -- I'd be very interested.
    http://highscalability.com/
    http://www.allthingsdistributed.com/

  15. Not sure what you want to test. by funwithBSD · · Score: 3, Insightful

    Stress testing? Use LoadRunner or some other tool to simulate users.

    If you are using Java on Tomcat, BEA, or WebSphere, use a product like PerformaSure to see a call tree of where your Java program is spending its time. Sorts out how long each SQL takes too, and shows you what you actually sent. If you have external data sources, like SiteMinder, it will show that too.

    If you mean "What happens if we lose a bit of hardware" simulate the whole thing on VMware on a single machine and kill/suspend VMs to see how it reacts.

    Most importantly, MAKE SURE YOU MODEL WHAT YOU ARE TESTING. If you are not testing a scaled up version of what users actually do, you have a bad test.
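
    To make the "model what users actually do" point concrete, here is a rough Python sketch of a load driver. It is not LoadRunner and it does not hit a real server: fake_request, the 80/20 read/update mix, and the sleep times are all assumptions you would replace with measured numbers and real calls.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor

# Assumed workload mix; in practice derive this from production logs.
ACTION_MIX = {"read": 0.8, "update": 0.2}


def fake_request(action):
    """Stand-in for a real HTTP or DB call; returns (action, latency in s)."""
    start = time.perf_counter()
    time.sleep(0.001 if action == "read" else 0.003)
    return action, time.perf_counter() - start


def simulate_user(user_id, requests_per_user, seed):
    """One simulated user issuing requests according to the workload mix."""
    rng = random.Random(seed + user_id)
    actions = rng.choices(
        list(ACTION_MIX), weights=list(ACTION_MIX.values()), k=requests_per_user
    )
    return [fake_request(action) for action in actions]


def run_load_test(users, requests_per_user, seed=0):
    """Run all simulated users concurrently; return (request count, p95 latency)."""
    with ThreadPoolExecutor(max_workers=users) as pool:
        futures = [
            pool.submit(simulate_user, user, requests_per_user, seed)
            for user in range(users)
        ]
        results = [item for future in futures for item in future.result()]
    latencies = sorted(latency for _, latency in results)
    p95 = latencies[int(0.95 * (len(latencies) - 1))]
    return len(results), p95


total_requests, p95_latency = run_load_test(users=20, requests_per_user=5)
```

    Scale users up in steps and watch where the p95 starts to climb; that knee is more informative than any single run.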

    --
    Never answer an anonymous letter. - Yogi Berra
    1. Re:Not sure what you want to test. by Anonymous Coward · · Score: 0

      Use LoadRunner

      I would, but I'm currently suspended in the air with 10 robots on my tail.

  16. Another option by stonecypher · · Score: 1

    You'll find that Erlang doesn't even blink at those volumes, and that Erlang's entire reason to exist is scalability/reliability. Granted, it's a little severe to pick up a new language, but the benefits are enormous, and it's one of those boons you can't really understand until you've learned it. It is, however, worth noting that transactions on an MNesia database in the multiple gigabytes are typically faster than PHP just invoking MySQL in the first place, let alone doing any work with it.

    Erlang is difficult to learn from what's on the web; consider starting with Joe's book.

    --
    StoneCypher is Full of BS
    1. Re:Another option by dawsdesign · · Score: 1

      Erlang is indeed the perfect candidate for your issues. http://erlang.org/

  17. Similar Question - Interviews by Greenisus · · Score: 1, Offtopic

    I have a question sort of along the same line. I interviewed for a position at a very large internet company, and one of their primary concerns was very high performance and scalability. I went through the phone interviews and then the in-person interviews, and I actually did quite well, and was even told that I did quite well. However, in the end, I was told that while I did well, they would have liked to see more experience with very large web applications (I've worked at smaller companies). So, how do I go about learning something I think I already know, and from your experience, was that not the real reason I was not accepted?
     
    Sorry this is a bit off-topic; I've just been dying to ask the slashdot community and this seems to be the most appropriate forum for the question.

    1. Re:Similar Question - Interviews by Anonymous Coward · · Score: 1, Informative

      There is a good chance that this was the reason you were not accepted. I work at a very similar firm to the one you describe, one anyone here would recognize, and we do reject people for the reason you mentioned. Basically, the problem is that unless someone does have experience with very large scale applications, we find that they have a pretty steep learning curve ahead of them. While many candidates think that they know how to build a scalable app, what worked for them on an application that has 100k users totally breaks down when there are 100 million users.

      It's very difficult to get that kind of knowledge/experience without having actually done it before. The way I got it was by being hired into a project which was a rewrite of a very large scale system, and I got hired right as that project was starting. This was a great way to make it up the learning curve without too much pain, because I got to hear from the team directly about what decisions they thought were wrong about the previous design, and got to participate in the design discussions for the next generation. The team was very experienced at this and the choices they made (and mistakes/bumps along the way) taught me a lot about how to build such an application.

  18. Languages, Libraries, Abstraction, Audience by DonRoberto · · Score: 2, Interesting

    From working on both academic and enterprise software designed specifically to scale, these are four things I've noticed are incredibly important to scalability:

    Languages - I recently saw a multi-million dollar product fail because of performance problems. A large part of it was that they wanted to build performance-critical enterprise server software, but wrote it mostly in a language that emphasized abstraction over performance, and was designed for portability, not performance. The language, of course, was Java. Before I get flamed about Java, the issue was not Java itself and alone, but part of it was indeed using a language not specifically designed for a key project objective: performance. The abstraction, I would argue, did the project worse than all the other performance issues associated with bytecode however. Relevant books on this subject are everywhere.

    Libraries - Using other people's code (e.g. search software, DB apps, etc.) will always introduce scalability weaknesses and performance costs in expected and unexpected places. Haphazardly choosing what software to get in bed with can come back to bite you later. It is an occupational hazard, and each database product and framework and even hardware configuration has its own pitfalls. Many IT books on enterprise performance or even whitepapers and academic papers can provide more information.

    Abstraction - There is no free lunch. When you make things easier to code, you typically incur some performance penalty somewhere. In C++, Java, and most other high level languages, the sheer notion of modularity and abstraction eventually add so much hidden knowledge and code that developers either lose track of what subtle costs everything is incurring, or are suddenly put in a position where they can't go back and rewrite everything. Sometimes it is better to write a clean, low-level API and limit the abstraction eyecandy or it will come back to bite you. On the other hand, sometimes a poor low-level API is worse than a cleanly abstracted high-level API. In practice, few complex and performance-oriented systems are architected in very high level languages however. I have seen few books on this subject, and it is pure software engineering. Design patterns might help, however.

    Audience - Both clientele and developer audiences make a big difference. Give an idiot a hammer with no instructions... and you get the point. Make sure your developers know what they're doing and what priorities are, and also design your interfaces and manuals in such a way as to keep scalability in mind. Why have a script perform a hundred macro operations when a well-designed API could provide better performance with a single call? This entails both HCI and project development experience.

    Wish I could suggest more books, but there's just too many.

    1. Re:Languages, Libraries, Abstraction, Audience by EastCoastSurfer · · Score: 4, Insightful

      Language - Doesn't matter much if you know how to design a scalable system. Some languages like Erlang force you into a more scalable design, but even then it's still easy to mess up. Unless this multi-million dollar project you're talking about was an embedded system I would bet language used was the smallest reason for bad performance. Although it is fun to bash java whenever the chance.

      Libraries - Bingo, let's throw out nice blocks of tested and working code b/c it's always better to write it yourself. You pretty much have to use libraries to get things done anymore. And are you suggesting someone should write their own DB software when building a web app? Um, yeah, see if that web app ever gets done.

      Abstractions - While most are leaky at some point, abstractions make it easier for you to focus on the architecture (which is what you should be focusing on anyways when building scalable systems).

      I see these types of arguments all the time and they rarely make sense. It's like arguing about C vs. Java over 1ms running time difference when if you changed your algorithm you could make seconds of difference or if you changed your architecture you would make minutes of difference...

    2. Re:Languages, Libraries, Abstraction, Audience by Reverend528 · · Score: 1

      a language that emphasized abstraction over performance... The language, of course, was Java.

      When has abstraction ever been a strength of java? It has one fucking abstraction, and there are programmers out there who say that sun didn't even get that one right.

    3. Re:Languages, Libraries, Abstraction, Audience by plsander · · Score: 1

      This is where profiling and "throw the first one away" come into play.

      Use the OO and abstraction language and tools to design the application. Then look at the performance data and optimize those sections that are called frequently.

      Abstraction is great for design, maintenance, and upgrade/redesign (your application's requirements never stay static, right?)

    4. Re:Languages, Libraries, Abstraction, Audience by marcosdumay · · Score: 1

      Or the GP was completely wrong... or maybe he has just tighter resources than you.

      All things he said are useful to improve performance, and can lead to errors that will decrease said performance if you are not careful enough. Of course, if your performance hits are due to gross architectural errors, you shouldn't even think of looking into them.

    5. Re:Languages, Libraries, Abstraction, Audience by Anonymous Coward · · Score: 0

      Although it is fun to bash java whenever the chance. ...

      Oh go fuck yourself you hyper-sensitive little crybaby. Jeeze Louise you Java fanboys are such pussies. GP didn't say "Java sucks", he said it was designed for portability, not performance. If performance happens to be a higher priority than portability for a given project, then pick a language that aligns with your priorities. Common sense. But you were so busy getting yourself offended, you didn't understand this or anything else GP wrote. S/he didn't say don't use libraries, just warned that sometimes they have unexpected performance and scalability costs. S/he didn't say don't use abstractions, just warned that using layers upon layers of abstractions (to an extreme that is not untypical of Java projects, I might add) can also exact a performance penalty.

      Peel the lips of your personal identity off the cornhole of the Java religion for a sec, it's not absolutely perfect for everything, everywhere, everytime. And that's okay. Java won't melt away as a result and take with it your sole reason for being and force you to find another sensation to base your entire view of the meaning of life on. It'll survive, and so will you. After you've somewhat gotten over all the awful heresies around you, you're then in a position to try prying open your eyes a little wider and notice that besides the 1% on each end of the black-to-white scale, there's a whole other 98% in the middle that you're missing. The problem with being an extremist is that it makes you perceive everyone else as one, so you end up not actually comprehending what's going on around you. Tie your knees down so that they can't jerk your brain off to kookoo land, and try reading it again. The advice given was merely just to stop and think about things and consider things, and not just reach for what you've always used and do it in the way you've always done or heard or read that it's done. For you in particular, stop sucking on the Java dick for long enough to think about things. I swear you people use it as a baby pacifier, and just sit there sucking away in that Java trance, cocooned from the real world and having to bother with thinking about real world issues.

    6. Re:Languages, Libraries, Abstraction, Audience by EastCoastSurfer · · Score: 2, Insightful

      Haha...actually I do dislike java and try to avoid it when I can. What I was really ripping on originally is how poor program performance is always looked at as a language issue first. Picking the right language for the job is of course important, but program design, algorithms used, and overall architecture will always be much bigger factors in a program's eventual performance.

    7. Re:Languages, Libraries, Abstraction, Audience by try_anything · · Score: 1

      A large part of it was that they wanted to build performance-critical enterprise server software, but wrote it mostly in a language that emphasized abstraction over performance, and was designed for portability, not performance. The language, of course, was Java.

      On the large scale, performance is as much about reliability, portability, design, and refactorability as it is about benchmarks.

      Reliability means you're free to spend more time thinking about design and rewriting code. Checked exceptions are a huge boon for making sure modules fail when they should and don't fail when they shouldn't. (Well, it's not exact, but it's a hell of a lot better than C and C++, where modules are typically heavily biased towards "don't fail at all, no matter what happens" or "die at the first hiccup to avoid corrupting data.") Time spent debugging core dumps and de-corrupting databases is time that could have been spent on performance testing and redesign.

      Portability means you can drop a huge Sun box into your previously Linux-only server room and not worry about subtle bugs caused by the differences between Posix functionality in Linux and Solaris.

      Design and refactorability are two sides of the same coin. It's easy to design APIs in any language that simply cannot be implemented efficiently. For an example of a poorly designed Java API that hurts performance for everyone who uses it, see this Google tech talk at 36:39. Note that the title of the slide is, "Effects of API Design Decisions on Performance are Real and Permanent." He's talking about externally published APIs, but internal APIs can end up being permanent, too, if other developers can't afford to let you muck with APIs or frameworks.

      So, what determines when you're stuck with your mistakes and when you can fix them? First, it's important to discover mistakes early, so anything that helps you cobble a working system together quickly is a good thing. This isn't Java's strong suit unless you're comparing it to C or C++. Second, the less work and less risk is involved in adapting existing code to the changes, the more likely you'll be able to fix things. Java is an absolute miracle in this regard because of the tools and practices. Some languages have been around far longer than Java, and some languages are probably better designed for tool support than Java, but the combination of language simplicity and massive popularity has given Java an unbeatable array of tools. If you want unit testing, continuous integration, refactoring tools, mock objects, hands-off deployment (for automatic integration testing), and so forth, you can go to Barnes and Noble and pick up, um, approximately three moderately sized and priced paperbacks. Then it's a few months to set up and learn the tools. (It only takes a few weeks, really, since any reasonably experienced Java developer is already familiar with JUnit and refactoring IDEs.) With any other language you'd have to roll most of that yourself. Unless you're willing to make that huge commitment to building infrastructure, you have to be really, really picky about making large-scale changes, so most of the time you either stick with the first idea you implemented or suffer delays.

      (Now, lots of languages *could* support tools well. A language's user base and culture count for a lot as well. Just post a question on comp.lang.lisp and ask about continuous integration or some-such, and some guy will reply about half an hour later with fifty lines of Common Lisp that does exactly what you asked for. Hey, that's great, but it takes years to get from that point to the point of Joe Average picking up a paperback that tells him how to download and configure a complete solution with a web interface, email alerts, and SCM integration. Lispers don't particularly care whether t

  19. HA is not load balancing by PlatinumRiver · · Score: 1

    This is a broad topic, but I would say begin by identifying your single points of failure. You can then research setting up HA solutions for each of those resources. Also, understand the difference between high-availability and load balancing. Just because your database is fault-tolerant, it does not necessarily mean it can scale to cope with increased traffic.

    Draw a high level map of your application and all the server/network resources it uses. Take each one of those components and analyze them for load balancing and fault tolerance. Any single component failure should not affect the overall uptime of the application. Part of a high-availability system is having proper monitoring and notification tools in place. It takes a lot to make a high availability environment work and some of it is not engineering related, but business process related. If your servers are in a data center and a database server goes down, yet your notification system sends an email to a database developer who works 9am to 5pm (maybe on vacation) alerting him/her of the issue... You can see how this can lead to problems. Proper health checks, escalation paths, etc. are all part of making your system work.
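
    A back-of-the-envelope version of that component map, as a Python sketch. The inventory, the instance counts, and the 99% per-instance figure are invented for illustration, and the arithmetic assumes instances fail independently, which real hardware in one rack often does not.

```python
# Hypothetical inventory: component name -> number of redundant instances.
components = {
    "load_balancer": {"instances": 2},
    "web_server": {"instances": 3},
    "app_server": {"instances": 2},
    "database": {"instances": 1},
    "smtp_relay": {"instances": 1},
}


def single_points_of_failure(inventory):
    """Components where losing one instance takes the whole tier down."""
    return sorted(
        name for name, info in inventory.items() if info["instances"] < 2
    )


def component_availability(instance_availability, instances):
    """A tier is up if at least one of its (independent) instances is up."""
    return 1.0 - (1.0 - instance_availability) ** instances


def system_availability(inventory, instance_availability):
    """The application needs every tier, so tier availabilities multiply."""
    total = 1.0
    for info in inventory.values():
        total *= component_availability(instance_availability, info["instances"])
    return total


spofs = single_points_of_failure(components)
overall = system_availability(components, instance_availability=0.99)
```

    With these made-up numbers the two unreplicated tiers dominate: the overall figure lands just under 0.99 * 0.99 no matter how many web servers you add.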

    My $0.02.

  20. Erlang is very cool, CouchDb uses it by dominux · · Score: 1

    http://www.couchdb.com/ is a distributed replicable non-relational database written in Erlang. It is a very clever system and I was impressed with the language choice of the developer.

  21. Use your strengths by Tablizer · · Score: 1

    Often a narrow set of query/report types is the most common and consumes the most resources. Perhaps consider making a nightly customized copy of a view(s) via batch tech that fits that frequent need well and put it on a separate server. This will not only speed up the common need, but also the other queries since their server load is lightened. In general, also make sure your indexing is designed well. In other words, put indexes where they are needed but don't put unnecessary ones. Study the usage needs carefully.
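
    Roughly what such a nightly batch job could look like, sketched in Python with SQLite. The sales table, the summary columns, and the function name are invented for illustration, and for brevity the summary is rebuilt in the same database rather than shipped to a separate reporting server.

```python
import sqlite3


def refresh_sales_summary(conn):
    """Rebuild the precomputed report table from the live table."""
    with conn:  # commit on success, roll back on error
        conn.execute("DROP TABLE IF EXISTS sales_summary")
        conn.execute(
            "CREATE TABLE sales_summary AS "
            "SELECT region, COUNT(*) AS order_count, SUM(amount) AS total_amount "
            "FROM sales GROUP BY region"
        )
        # Index only what the frequent report actually filters on.
        conn.execute("CREATE INDEX idx_summary_region ON sales_summary (region)")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [("east", 10.0), ("east", 20.0), ("west", 5.0)],
)

refresh_sales_summary(conn)  # in production, run from a nightly scheduler

# The common report now reads the small summary table, not the live one.
east_row = conn.execute(
    "SELECT order_count, total_amount FROM sales_summary WHERE region = 'east'"
).fetchone()
```

    The trade-off is staleness: the report is at most one batch cycle behind the live data.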

  22. Slightly off topic by Gazzonyx · · Score: 2, Insightful

    I keep hearing about Erlang being the next greatest thing since sliced bread... unfortunately, I don't have time to look into it too much. Could someone give me an 'elevator' pitch on what makes it so great for threading? Is it encapsulation based objects, a thread base class, or what? How does it handle cache coherency on SMP?

    --

    If I mod you up, it doesn't necessarily mean I agree with what you've said, sorry.

    1. Re:Slightly off topic by Lord+Grey · · Score: 1

      Erlang's Wikipedia entry is longer than an elevator pitch, but has some decent information. Erlang's primary site is here.

      --
      // Beyond Here Lie Dragons
    2. Re:Slightly off topic by stonecypher · · Score: 3, Informative

      It's not something you can cram into an elevator pitch; erlang is an entirely different approach to parallelism. If you know how mozart-oz, smalltalk or twisted python work, you've got the basics.

      Basically, processes are primitives, there's no shared memory, communication is through message passing, fault tolerance is ridiculously simple to put together, it's soft realtime, and since it was originally designed for network stuff, not only is network stuff trivially simple to write, but the syntax (once you get used to it) is basically a godsend. Throw pattern matching a la Prolog on top of that, dust with massive soft-realtime scalability which makes a joke of well-thought-of major applications (that YAWS vs Apache image comes to mind,) a soft-realtime clustered database and processes with 300 bytes of overhead and no CPU overhead when inactive (literally none,) and you have a language with such a tremendously different set of tools that any attempt to explain it without the listener actually trying the language is doomed to fall flat on its face.

      In Erlang, you can run millions of processes concurrently without problems. (Linux is proud of tens of thousands, and rightfully so.) Having extra processes that are essentially free has a radical impact on design; things like work loops are no longer necessary, since you just spin off a new process. In many ways it's akin to the unix daemon concept, except at the efficiency level you'd expect from a single compiled application. Every client gets a process. Every application feature gets a process. Every subsystem gets a process. Suddenly, applications become trees of processes pitching data back and forth in messages. Suddenly, if one goes down, its owner just restarts it, and everything is kosher.

      It's not the greatest thing since sliced bread; there are a lot of things that Erlang isn't good for. However, what you're asking for is Erlang's original problem domain. This is what Erlang is for. I know, it's a pretty big time investment to pick up a new language. Trust me: you will make all your time back in writing far shorter, far more obvious code than you did in learning the language. You can pick up the basics in 20 hours. It's a good gamble.

      Developing servers becomes *really* different when you can start thinking of them as swarms.
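
      For readers who don't know Erlang, the "owner just restarts it" pattern can be imitated, very loosely, in Python. This is only an analogy under heavy assumptions: Python threads are nothing like Erlang processes in cost or isolation, and worker, supervise and the "crash" message are made up for the sketch. The shape is what matters: workers talk only through mailboxes, and a supervisor restarts a worker that dies.

```python
import queue
import threading


def worker(mailbox, results):
    """Handle messages until told to stop; any exception kills the worker."""
    while True:
        message = mailbox.get()
        if message is None:            # stop signal
            return
        if message == "crash":         # simulated bug in the handler
            raise RuntimeError("worker crashed")
        results.put(message.upper())


def supervise(mailbox, results, max_restarts=3):
    """Run the worker, restarting it after a crash; return the restart count."""
    restarts = 0
    while True:
        crashed = []

        def run():
            try:
                worker(mailbox, results)
            except Exception as exc:   # the supervisor observes the failure
                crashed.append(exc)

        thread = threading.Thread(target=run)
        thread.start()
        thread.join()
        if not crashed:
            return restarts            # worker stopped normally
        restarts += 1
        if restarts > max_restarts:
            raise RuntimeError("too many restarts, giving up")


mailbox, results = queue.Queue(), queue.Queue()
for message in ["hello", "crash", "world", None]:
    mailbox.put(message)

restart_count = supervise(mailbox, results)

processed = []
while not results.empty():
    processed.append(results.get())
```

      The crashing message is lost, the rest of the mailbox is still served by the replacement worker, and nothing outside the supervisor had to know.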

      --
      StoneCypher is Full of BS
    3. Re:Slightly off topic by Anonymous Coward · · Score: 0

      In Erlang, you can run millions of processes concurrently without problems. (Linux is proud of tens of thousands, and rightfully so.)

      But that's not a valid comparison - they're not *real* processes.
    4. Re:Slightly off topic by stonecypher · · Score: 1

      And you get that idea from what, exactly?

      --
      StoneCypher is Full of BS
    5. Re:Slightly off topic by poopdeville · · Score: 1

      Think functional language (which gives you concurrency for free) with very strong OO support (using the Actors model).

      --
      After all, I am strangely colored.
    6. Re:Slightly off topic by nuzak · · Score: 1

      > But that's not a valid comparison - they're not *real* processes.

      As far as erlang is concerned, they are. And if you want to glue erlang to the outside world without using an FFI, you can run erlang nodes in Java or Emacs Lisp (no kidding), and you can communicate with a node from the shell or even run erlang code on an ad-hoc node on the fly with erl_call.

      --
      Done with slashdot, done with nerds, getting a life.
    7. Re:Slightly off topic by TheRaven64 · · Score: 3, Informative
      Erlang does some things very well. For good parallel programming, you need a language that enforces one rule: data may be mutable or aliased, but not both. If you don't have this rule, then debugging complexity scales with the factorial of the degree of parallelism. Erlang does this by making processes the only mutable data type. There are a few things that make it a nice language beyond that:
      • Very low overhead processes; creating a process in Erlang is only slightly more expensive than making a function call in C.
      • Higher order functions.
      • Pattern matching everywhere (e.g. function arguments, message receiving, etc). If you want two different behaviours for a function depending on the structure of the data that it is passed (e.g. handlers for two different types of packet, with different headers) you can write two versions of the function with a pattern in the argument.
      • Guard clauses on functions, lets you implement design-by-contract and also lets you separate out validation of arguments from the body of a function, giving cleaner code.
      • Simple message passing syntax, with pattern matching on message receive for out-of-order retrieval.
      • Asynchronous message delivery; very scalable.
      • Lists and tuples as basic language primitives.
      • Gorgeous binary syntax. I've never seen a language as good as Erlang for manipulating binary data.
      • Automatic mapping of Erlang processes to OS threads, allowing as many to run concurrently as you have CPUs.
      • Network message delivery, allowing Erlang code with only slight modifications to send messages over the network rather than to local processes (the message sending code is the same, only how you acquire the process reference is different).
      There are also a few down sides to the language:
      • The preprocessor is even worse than C's, so metaprogramming is hard (and badly needed; patterns like synchronous message sending or futures require a lot of copy-and-pasting).
      • Implementing ADTs is ugly (but no worse than C).
      • Variables are single static assignment, which is a cheap cop-out for the compiler writer and makes code convoluted at times.
      • Message sending and function call syntax is very different for no good reason. You are meant to wrap exposed (public) messages in functions, which makes things even more messy.
      • Calling code in other languages is a colossal pain.
      • The API is inconsistent (e.g. some modules manipulating ADTs take the ADT as the first argument, some take it as the last).
      Erlang is a great language for a lot of tasks, particularly servers, but it's not suited for everything.
      --
      I am TheRaven on Soylent News
    8. Re:Slightly off topic by dawsdesign · · Score: 1

      Are you ever in #erlang on freenode?

    9. Re:Slightly off topic by stonecypher · · Score: 1

      No. I left in disgust because of the behavior of several regulars. I started #erlang on blitzed, but its population (so far) is modest. (That will change soon, though I decline to explain why.)

      --
      StoneCypher is Full of BS
    10. Re:Slightly off topic by stonecypher · · Score: 1
      Uh.

      Implementing ADTs is ugly (but no worse than C).
      Whereas the current state of things leaves a lot to be desired, it's nowhere near as bad as C; that's why ETS and DETS are effectively polymorphisms of one another.

      metaprogramming is hard (and badly needed; patterns like synchronous message sending or futures require a lot of copy-and-pasting).
      I see no reason that synchronous message passing needs any cut and pasting; why not just make a module to implement it? As far as futures, frankly I think they're misguided; allowing the compiler to wait for a return and to keep going is cute and all, but the only latency that prevents is the latency in naïve design, and doing the checking per-variable per-access introduces enormous overhead. Erlang is a soft-realtime language; you cannot implement futures without trashing all sorts of very important guarantees.

      Variables are single static assignment, which is a cheap cop-out for the compiler writer
      No, it isn't. This has a lot to do with functional language optimization, and the perceived "correctness" of side effects. The language's original designer, Joe Armstrong, takes the position that mutability is a bad idea, that it leads to bugs, and so on; that's why, in his book, he tries really hard to discourage the use of the process dictionary. The language may not strictly forbid side effects, but immutability is a hell of a step towards it, without screwing around with implementing sockets and dictionaries as monads, or what have you. Mind you, I disagree with the choice, but that doesn't mean that it's a cheap cop-out. They chose to do that for a reason which they believed had to do with bug resistance and reliable design.

      Message sending and function call syntax is very different for no good reason.
      You think the syntax for sending a message to a process should necessarily be similar? They're not particularly related.

      You are meant to wrap exposed (public) messages in function, which makes things even more messy.
      I have a hard time understanding this belief. Please explain further.

      Calling code in other languages is a colossal pain.
      This is just downright false. You can load .DLLs or .sos directly into the virtual machine. You can use ports (which C calls stdin/stdout.) You can use sockets. You can use unix named pipes. You can use externalised message passing. Hell, Erlang even ships with a library to load your application as a fully fledged node in the Erlang network, which only takes five lines of code. I have a hard time imagining a language being easier to integrate into other languages, and I'm an experienced developer who speaks more than 40 languages, including several glue languages.
      --
      StoneCypher is Full of BS
    11. Re:Slightly off topic by Dan+Farina · · Score: 1

      Not to nit-pick (okay, well, only a little) this 300 bytes thing floating around is in error; the correct figure is 318 machine words, as seen at the erlang efficiency guide. Still a pittance, but quite a bit larger than 300 bytes on most architectures.

    12. Re:Slightly off topic by stonecypher · · Score: 1

      Actually, that's a very significant difference to me. Thank you for pointing out that error. I will amend it when speaking in the future. (You just saved me some egg-on-face in a future project; I appreciate it.)

      --
      StoneCypher is Full of BS
  23. The unfortunate thing about databases by PhrostyMcByte · · Score: 3, Insightful

    Is that most of them have poor native APIs when it comes to scalability. Some of them have something like

    handle = query("SELECT...");
    /*do something*/
    result = wait(handle);

    But that is far from optimal. When will they be smart and release an async API that notifies you via callback when complete? This would be very useful for apps that need maximum scalability.
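
    A minimal sketch of the callback style being asked for, in Python. It assumes a hypothetical blocking `run_query` standing in for a driver call; `query_async` and `on_complete` are illustrative names, not any real database API. A thread pool hides the blocking call here, so this shows only the calling convention, not the thread savings a native async client would give.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

def run_query(sql):
    # Hypothetical stand-in for a blocking database call.
    return [("row for", sql)]

def query_async(executor, sql, on_complete):
    """Submit sql and invoke on_complete(rows, error) when it finishes."""
    def _done(future):
        error = future.exception()
        rows = None if error else future.result()
        on_complete(rows, error)
    future = executor.submit(run_query, sql)
    future.add_done_callback(_done)
    return future

def demo():
    finished = threading.Event()
    received = []

    def on_complete(rows, error):
        received.append((rows, error))
        finished.set()

    with ThreadPoolExecutor(max_workers=2) as pool:
        query_async(pool, "SELECT 1", on_complete)
        # The caller is free to do other work here instead of waiting.
        finished.wait(timeout=5)
    return received
```

    The caller never calls a `wait(handle)` equivalent; it is told when the result exists.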

    Microsoft's .NET framework is actually a great example of doing the right thing - it has these types of async methods all over the place. But then you have to deal with cross-platform issues and problems inherent with a GC.

    It's not that much different for web frameworks either. None that I've tried (RoR, PHP, ASP.NET) have support for async responding - they all expect you to block execution should you want to query a db/file/etc. and just launch boatloads of threads to deal with concurrent users. I guess right now with hardware being cheaper it is easier to support rapid development and scale an app out to multiple servers.

    1. Re:The unfortunate thing about databases by Khazunga · · Score: 4, Informative

      Most databases have async APIs. Postgresql and mysql have them in the C client libraries. Most web development languages, though, do not expose this feature in the language API, and for good reason. Async calls can, in rare cases, be useful for maximizing the throughput of the server. Unfortunately, they're more difficult to program, and much more difficult to test.

      High scale web applications have thousands of simultaneous clients, so the server will never run out of stuff to do. Async calls have zero gain in terms of server throughput (requests/s). It may reduce a single request execution time, but the gain does not compensate the added complexity.

      --
      If at first you don't succeed, skydiving is not for you
    2. Re:The unfortunate thing about databases by M.+Baranczak · · Score: 1

      Interesting.

      Do you think that non-blocking IO really offers enough performance gains to compensate for the resulting spaghetti code? This isn't a rhetorical question, I'm really curious.

    3. Re:The unfortunate thing about databases by Jhan · · Score: 1

      Why the spaghetti?

      Troubling blocking IO code in C++:ish pseudo:

      result1 = doIo(foo); // Blocking IO, wait 5s
      ... // Maybe do other things
      result2 = doIo(bar); // Blocking IO, wait 5s
      ... // Maybe do other things
      // Result 1 and 2 are first used here
      print("Result was "+result1+"&"+result2);
      // Min 10s to get here

      So, add this object

      class Unblocker {
        operationHandle operationInProgress;
        Unblocker(parameter) {
          operationInProgress = sendIo(parameter); // Non-blocking version of doIo above, as provided by API
        }
        Result result() {
          return waitIo(operationInProgress); // API-call to block until IO completes and return result, as provided by API
        }
      }

      ...and the result is a very similar program:

      Unblocker unresult1(foo); // doIo call has been unblocked into sendIo call
      ... // Maybe do other things
      Unblocker unresult2(bar); // same
      ... // Maybe do other things
      // Result 1 and 2 are first used here
      print("Result was "+unresult1.result()+"&"+unresult2.result());
      // Min 5s to get here

      50% speed-up! Not shabby!
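
      The same idea can be sketched in runnable form with Python's standard `concurrent.futures`. `do_io` is a hypothetical slow call (a 0.2 s sleep stands in for the 5 s above), and the pool uses threads, which is an assumption of this sketch and not something the pseudocode requires.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def do_io(name, delay=0.2):
    # Hypothetical slow I/O call; the sleep stands in for network or disk latency.
    time.sleep(delay)
    return "data:" + name

def fetch_both(a, b):
    """Start both operations first, then block only when the results are needed."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        pending_a = pool.submit(do_io, a)
        pending_b = pool.submit(do_io, b)
        # Other work could run here while both operations are in flight.
        return pending_a.result(), pending_b.result()

def timed_fetch():
    start = time.monotonic()
    results = fetch_both("foo", "bar")
    return results, time.monotonic() - start
```

      Because the two waits overlap, the elapsed time is close to one delay, not the sum of both.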

      --

      I choose to remain celibate, like my father and his father before him.

    4. Re:The unfortunate thing about databases by Anonymous Coward · · Score: 0

      I believe the pattern/technique you are referring to is often called "IOU" or "futures" and can, indeed, be very useful when doing asynchronous programming. There really is no need for "spaghetti" code (as you rightly point out). And, with a decent library in place, most anyone can use async calls safely.

      Rogue Wave has a nice, well thought out, API for C++ which you might find of interest: http://www.roguewave.com/support/docs/leif/sourcepro/html/protocolsug/2-4.html

      This technique can be used not only with multi-threading, but also with any async-based API (such as OS and DB calls). I mention this mainly due to the fact that sometimes people forget that techniques used in one domain of programming often are applicable in others with slight adjustment. Of course, that's not always the case :-). But here, it seems to be so.

    5. Re:The unfortunate thing about databases by PhrostyMcByte · · Score: 1

      The only async apis they have are like the example I gave before. These are sub-optimal!

      It's true that with a single server handling 10, 100, 200 RPS, the stupid threaded model will likely not make a big difference in _throughput_. It will make a MASSIVE difference in CPU/RAM usage though, and let you easily scale up to 10000 RPS on commodity hardware using just a single thread. Some people like to maximize their hardware usage.

      And async is certainly not much more difficult - it's a new way of thinking, sure, something new to learn. But it's not really that difficult! Compare the two:

      /* threaded */
      int res = send(sock, buf, size, 0);
      if(res != -1) {
        /* do something */
      }
      else {
        /* handle error */
      }

      /* async */
      send(sock, buf, size, 0, on_finish, ctx);

      void on_finish(int res, void *ctx) {
        if(res != -1) {
          /* do something */
        }
        else {
          /* handle error */
        }
      }
    6. Re:The unfortunate thing about databases by bidule · · Score: 1

      Is that most of them have poor native APIs when it comes to scalability. Some of them have something like

      handle = query("SELECT..."); /*do something*/
      result = wait(handle);
      But that is far from optimal. When will they be smart and release an async API that notifies you via callback when complete? This would be very useful for apps that need maximum scalability.
      I don't understand what is wrong with that.

      Are you unhappy about the /*do something*/ part, because you'd want the handle released early to minimize this process impact?

      Are you unhappy about the wait() call, because you don't want to code and manage your own threads to handle the call asynchronously?
      --
      ID: the nose did not occur naturally, how would we wear glasses otherwise? (apologies to Voltaire)
    7. Re:The unfortunate thing about databases by Jhan · · Score: 1

      Is that most of them have poor native APIs when it comes to scalability. Some of them have something like
      handle = query("SELECT...");
      /*do something*/
      result = wait(handle);
      But that is far from optimal. When will they be smart and release an async API that notifies you via callback when complete? This would be very useful for apps that need maximum scalability.

      Since you seem to maybe be talking circumspectly about Java:

      final Foo objectToNotify = theFoo;
      // Thread.start(), not run(), is what launches the new thread
      new Thread(new Runnable() {
        public void run() {
          handle = query("SELECT...");
          /*do something*/
          result = wait(handle);
          objectToNotify.notify(result);
        }
      }).start();

      The same can be achieved in any language with threads. Are you asking for callbacks in a pure C environment?

      --

      I choose to remain celibate, like my father and his father before him.

    8. Re:The unfortunate thing about databases by nuzak · · Score: 1

      I would be very surprised to see a database that didn't offer per-row callback functions in its call level API -- even SQLite has them. I won't hazard any guesses about MySQL, googling for the subject turned up too much PHP noise to be conclusive.

      I actually find async programming with a good API to be easier, because everything's an event, and you don't have to design the flow of control of everything else around constantly returning to poll for results, or deal with the locking and race conditions if you do it threaded. Mind you, that's "with a good API", and those are far between.

      --
      Done with slashdot, done with nerds, getting a life.
    9. Re:The unfortunate thing about databases by PhrostyMcByte · · Score: 1

      see this comment for more along the lines of what I'm talking about.

      I'm unhappy about the wait() call because it doesn't lend itself to fully async coding - if you've got nothing to do in that context, you're stuck blocking the thread when it could be doing other things. So now you have to waste CPU on context switches and waste RAM on state for a new thread.

      A good callback-based API doesn't have these deficiencies. You just call a function to dequeue completion callbacks, from however many threads you'd like. You're never stuck blocking when there is work to be done, and you don't waste time polling needlessly.
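
      A rough sketch of that "dequeue completion callbacks" shape in Python. `CompletionQueue`, `start`, and `dispatch` are hypothetical names, and for brevity each operation runs on its own background thread; a real implementation would sit on top of an OS facility such as epoll or I/O completion ports instead.

```python
import queue
import threading

class CompletionQueue:
    """Operations finish in the background; callbacks run wherever dispatch() is called."""

    def __init__(self):
        self._done = queue.Queue()

    def start(self, operation, callback):
        def _run():
            try:
                outcome = (operation(), None)
            except Exception as exc:  # hand the error to the callback instead of losing it
                outcome = (None, exc)
            self._done.put((callback, outcome))
        threading.Thread(target=_run, daemon=True).start()

    def dispatch(self, timeout=None):
        """Dequeue one completion and run its callback on the calling thread."""
        callback, (result, error) = self._done.get(timeout=timeout)
        callback(result, error)

def demo():
    cq = CompletionQueue()
    seen = []
    cq.start(lambda: 6 * 7, lambda result, error: seen.append((result, error)))
    cq.start(lambda: 1 / 0, lambda result, error: seen.append((result, type(error).__name__)))
    cq.dispatch(timeout=5)
    cq.dispatch(timeout=5)
    return sorted(seen, key=str)
```

      Any number of threads can call `dispatch()`; none of them is tied to a particular operation while it is in flight.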

    10. Re:The unfortunate thing about databases by Anonymous Coward · · Score: 0

      Sorry but I fail to see how asynchronous calls on a client to fetch results have anything to do with more scalability on the server. How does doing that change the actual load on the server? You say the server needs to keep more threads alive when the clients block, but I'd think that you would at least need to start the same number of threads in the asynchronous method of responding to a client. I always imagine a solution for more scalability would be in a middle layer (tier if you will ;-), which caches the results of common queries. But even then IMHO ultimately scaling is the task of the DBMS and a middle tier is a solution to a problem on the server.

    11. Re:The unfortunate thing about databases by Jhan · · Score: 1

      What I meant to say was of course:

      Since you seem to maybe be talking circumspectly about Java:

      final Foo objectToNotify = theFoo;
      // Thread.start(), not run(), is what launches the new thread
      new Thread(new Runnable() {
        public void run() {
          handle = query("SELECT...");
          /*do something*/
          result = wait(handle);
          objectToNotify.notify(result);
        }
      }).start();
      --

      I choose to remain celibate, like my father and his father before him.

    12. Re:The unfortunate thing about databases by Topherbyte · · Score: 0

      while not completely async, PHP does offer mysql_unbuffered_query, which does not block execution. However, this puts the onus on the database to return rows as quickly as the script tries to fetch them, and if the script isn't fetching rows fast enough, the database may keep the table locked, to the detriment of other users who may be trying to perform an UPDATE.

      Queries are usually preferable to reading files, IMHO, as the database gives you a finer level of granularity to tune via cache, indexes, etc, as well as complex ways in which to handle data. The lower overhead of reading files make sense for static data, such as page headers and footers.

      As in all multi-tiered development, it's about getting the balance right.

    13. Re:The unfortunate thing about databases by fittekuk · · Score: 1

      That is simply completely incorrect. All major databases support this. Oracle, Sybase, and Informix for sure. Hell, I was doing that with Sybase DBAPI 10 years ago.

    14. Re:The unfortunate thing about databases by Anonymous Coward · · Score: 0

      That is quite elegant. I'm sure all the mutexes and locking mechanisms you've placed in ctx are as well.

    15. Re:The unfortunate thing about databases by Kenneth+Stephen · · Score: 2, Interesting

      I'm afraid the parent post is an example of not seeing the forest because of all the trees.....

      Application code should never ever be aware of deployment issues. Making it aware of such things is a sure way to ensure nightmares when your environment changes. For example, let's say you have to send mail. You could take the option of always talking to localhost under the assumption that your app will always be deployed on a machine with a mail server. But consider the case when the app is taken and deployed in a production environment with a firewall around it, and to send mail, you have to send mail to another system. Your app breaks. The right way to do this is to externalize the existence of a mail server into some properties / config file that gets updated at application deployment.
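
      As an illustration of the mail-server case, a small Python sketch that reads the host and port from deployment configuration. The `[mail]` section and its `host` and `port` keys are assumptions made up for this example, as is the fallback to localhost on port 25.

```python
import configparser

DEFAULTS = {"host": "localhost", "port": "25"}

def load_mail_settings(config_text):
    """Read the SMTP host and port from deployment config instead of hard-coding them."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    # Fall back to the defaults when the deployment supplies no [mail] section.
    section = parser["mail"] if parser.has_section("mail") else {}
    host = section.get("host", DEFAULTS["host"])
    port = int(section.get("port", DEFAULTS["port"]))
    return host, port
```

      The application calls `load_mail_settings` once at startup; moving behind a firewall then means editing the config file, not the code.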

      This is so fundamental that it seems obvious. Let's apply this philosophy to the case at hand: the application should never ever have to know whether there is a failover server / hot standby / cluster in place or not. It should just assume that it's going to execute a statement and if there is a failure, the transaction will roll back. Whether the transaction will error out and roll back depends on the properties of the environment. For example, DB2 can do clustering / HADR (high-availability data replication). And on AIX, you have server clustering solutions like HACMP, transaction checkpointing and partition mobility, and a whole host of other technologies which can intervene so that the application does not fail in case of a database / database server failure.

      If a database server ever introduces an API for making applications aware of failover issues, that's a sure sign that the database architects are asleep at the wheel.

      --

      There is no such thing as luck. Luck is nothing but an absence of bad luck.

    16. Re:The unfortunate thing about databases by Fulcrum+of+Evil · · Score: 1

      I'm unhappy about the wait() call because it doesn't lend itself to fully async coding - if you've got nothing to do in that context, you're stuck blocking the thread when it could be doing other things. So now you have to waste CPU on context switches and waste RAM on state for a new thread.

      So? You can spend a little bit of CPU time and run more threads or, since the load on the DB is likely to be the bottleneck, get more boxes. Hardware is cheap. Debug time is not, and async programming is harder than sync programming.

      --
      "We returned the General to El Salvador, or maybe Guatemala, it's difficult to tell from 10,000 feet"
    17. Re:The unfortunate thing about databases by PhrostyMcByte · · Score: 1

      did you reply to the wrong comment? async has nothing to do with deployment or failover, it's just a different method for being notified when an operation is done.

    18. Re:The unfortunate thing about databases by Anonymous Coward · · Score: 0

      since the load on the DB is likely to be the bottleneck
      I think you've hit the nail on the head with that statement. Application servers are easy to scale out. Just stick any decent sticky load-balancing solution in front of what can be a really large number of fairly inexpensive servers. DBs are harder to scale. Replication solutions do work somewhat, but they introduce their own set of headaches, especially when you replicate to many DB servers.

      Much like the cost of bugs increasing geometrically based on how early they're found, application logic gets increasingly more expensive the closer it gets to your DB layer. If you can do it in the user's browser without the app feeling slow, that processing power scales infinitely. The next best option is to do it on the app servers. The last resort should be doing it in the database.
    19. Re:The unfortunate thing about databases by Fulcrum+of+Evil · · Score: 1

      You can also do things like separate objects that are heavily used into their own application, whose only job is to serve copies of that object and update the database on behalf of the app servers. This allows you to bring some of the horizontal scaling to the DB, as you can do things like partition your dataset and cache reads.

      --
      "We returned the General to El Salvador, or maybe Guatemala, it's difficult to tell from 10,000 feet"
    20. Re:The unfortunate thing about databases by Khazunga · · Score: 1

      The only async apis they have are like the example I gave before. These are sub-optimal!
      Here you go...

      It will make a MASSIVE difference in CPU/RAM usage...
      No it won't. It will make zero difference in CPU usage. It will make a difference in RAM usage proportional to the relation between the time to process a request synchronously and the same time asynchronously. Given that, in most applications, this relation approaches 1, the gain is minuscule.

      And async is certainly not much more difficult - it's a new way of thinking, sure, something new to learn. But it's not really that difficult!
      It's way more difficult to test. You don't know which of your code will be executing when you receive the response from the database, so you can end up with the typical challenges of parallel programming: race conditions, need for locks, time-dependent bugs.
      --
      If at first you don't succeed, skydiving is not for you
    21. Re:The unfortunate thing about databases by Anonymous Coward · · Score: 0

      ASP.NET does support async pages, although not many people that I've seen know about them and use them.

      http://msdn.microsoft.com/msdnmag/issues/07/03/WickedCode/

    22. Re:The unfortunate thing about databases by crucini · · Score: 1

      Async calls have zero gain in terms of server throughput (requests/s). It may reduce a single request execution time, but the gain does not compensate the added complexity...

      Agreed that it cannot increase throughput; it could substantially reduce latency when two or more independent SQL queries are needed. Or if you need at least one SQL query and another synchronous backend call - say to a proprietary db:
      1. Asynch SQL: select * from user ... (takes t1 milliseconds)
      2. Get session from proprietary db (takes t2 milliseconds)
      3. Wait for asynch call to join

      In that scenario, total time is max(t1, t2) instead of t1 + t2.
      Are you sure the added complexity is that bad? Of course it does conflict a bit with object-relational mappers and similar paraphernalia.
    23. Re:The unfortunate thing about databases by crucini · · Score: 1

      It's way more difficult to test. You don't know which of your code will be executing when you receive the response from the database...

      In the general case, I agree. I prefer multi-process or select(2) to multi-threaded after watching some debugging nightmares. In this case I believe we can constrain which code is running during the concurrent portion; in fact, without much loss of performance, we can use a "great big lock" within the concurrent region.

      From the app programmer's view, the flow would be:
      1. Get HTTP (or other) request
      2. Create concurrent-request object C
      3. Add requests to C, including:
        • SQL queries
        • Back-end HTTP requests
        • dbm queries

      4. C.launch()
      5. C.wait(timeout) - wait until all responses are received, or timeout ms have elapsed
      6. Handle responses

      Note that concurrency only exists between launch and wait; therefore it is only an issue for the platform programmer, not the app programmer (who never has to deal with concurrency). Also, it can be solved with a coarse lock, and the only penalty will be a slight slowdown if two back-end processes respond at the same time.
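
      The flow above can be sketched in Python roughly as follows. `ConcurrentRequest` and `slow_lookup` are hypothetical names; the pool is thread-based, and calls that fail or miss the deadline are simply left out of the result, which is one possible policy and not the only one.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import wait as wait_for_futures

class ConcurrentRequest:
    """Collect independent back-end calls, run them together, then join once."""

    def __init__(self):
        self._calls = {}
        self._futures = {}
        self._pool = None

    def add(self, name, func, *args):
        self._calls[name] = (func, args)

    def launch(self):
        self._pool = ThreadPoolExecutor(max_workers=max(1, len(self._calls)))
        for name, (func, args) in self._calls.items():
            self._futures[name] = self._pool.submit(func, *args)

    def wait(self, timeout_ms):
        """Return {name: result} for calls that finished cleanly within timeout_ms."""
        done, _ = wait_for_futures(list(self._futures.values()), timeout=timeout_ms / 1000)
        self._pool.shutdown(wait=False)  # do not block on stragglers
        return {name: f.result() for name, f in self._futures.items()
                if f in done and f.exception() is None}

def slow_lookup(value, delay):
    time.sleep(delay)  # stands in for a SQL query or back-end HTTP request
    return value
```

      Typical use: `c = ConcurrentRequest()`, a few `c.add(...)` calls, `c.launch()`, then `c.wait(300)`. The application code on either side of those two calls stays single-threaded.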

      Am I missing something?
    24. Re:The unfortunate thing about databases by Khazunga · · Score: 1

      No, you're not missing anything. The flow you propose does balance ease of development/debugging with the ability to parallelize query execution. This is, in fact, the architecture I use when pages depend on multiple webservice calls, which usually exhibit a much longer wall-clock execution time than db calls. I just question the amount of parallel database queries that can be executed in most web applications. Most of them are quite simple, and all data can be retrieved in less than three queries, so the gain is not so large.

      Anyhow, this is the kind of optimization I'd keep an eye for, but wouldn't introduce in an application without a profiler showing that I need it.

      Full async, which was what I thought you were proposing, is still something I will steer clear of if at all possible. Time-dependent bugs are a bitch.

      --
      If at first you don't succeed, skydiving is not for you
  24. Great Book by wonko_el_sano · · Score: 1

    Scalable Internet Architectures by Theo Schlossnagle http://www.amazon.com/Scalable-Internet-Architectures-Developers-Library/dp/067232699X.

    Great tutorial at OSCON about it too.

    But the biggest piece of advice is to never guess about where things are slow. Measure them and then fix the slow parts. Don't change a thing until you've benchmarked it.
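
    One way to act on that in Python is the standard-library profiler. In this sketch `profile_call` is a hypothetical helper and `build_report` a made-up workload; the point is only that the report, not a hunch, says where the time goes.

```python
import cProfile
import io
import pstats

def profile_call(func, *args):
    """Run func under cProfile and return (result, report sorted by cumulative time)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buffer = io.StringIO()
    pstats.Stats(profiler, stream=buffer).sort_stats("cumulative").print_stats(5)
    return result, buffer.getvalue()

def build_report(n):
    # Hypothetical workload standing in for a slow request handler.
    return ",".join(str(i) for i in range(n))
```

    Run it before and after a change, on the same input, and compare the two reports.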

  25. Availability Isn't Scalability by John_Booty · · Score: 2, Insightful

    From what I have read in industry papers and from conversations with friends, the apps I have worked on just don't address scaling issues. Our maximum load during typical usage is far below the maximum potential load of the system, so we never spend time considering what would happen when there is an extreme load on the system.

    Is it just me, or is the question hopelessly confused? He's using the term "availability" but it sounds like he's talking about "scalability."

    Availability is basically percentage of uptime. You achieve that with hot spares, mirroring, redundancy, etc. Scalability is the ability to perform well as workloads increase. Some things (adding load-balanced webservers to a webserver farm) address both issues, of course, but they're largely separate issues.
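
    To make "percentage of uptime" concrete, a tiny Python calculation of how much downtime a given availability figure allows. It assumes a 365-day year; the function name is only illustrative.

```python
HOURS_PER_YEAR = 365 * 24

def downtime_hours_per_year(availability_percent):
    """Allowed downtime per (non-leap) year for a given uptime percentage."""
    return HOURS_PER_YEAR * (1 - availability_percent / 100)
```

    At 99.9% this comes to about 8.76 hours a year, and at 99.99% to about 52.6 minutes. Neither number says anything about how the system behaves as load grows, which is the scalability question.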

    The first thing this poster needs to do is get a firm handle on exactly WHAT he's trying to accomplish, before he can even think about finding resources to help him do it.

    --

    OtakuBooty.com: Smart, funny, sexy nerds.
  26. Define your thread's purposes by srealm · · Score: 5, Informative

    I've worked in multiple extremely super-scaled applications (including ones sustaining 70,000 connections at any one time, 10,000 new connections each minute, and 15,000 concurrent throttled file transfers at any one time - all in one application instance on one machine).

    The biggest problem I have seen is people don't know how to properly define their thread's purpose and requirements, and don't know how to decouple tasks that have in-built latency or avoid thread blocking (and locking).

    For example, often in a high-performance network app, you will have some kind of multiplexor (or more than one) for your connections, so you don't have a thread per connection. But people often make the mistake of doing too much in the multiplexor's thread. The multiplexor should ideally only exist to be able to pull data off the socket, chop it up into packets that make sense, and hand it off to some kind of thread pool to do actual processing. Anything more and your multiplexor can't get back to retrieving the next bit of data fast enough.

    Similarly, when moving data from a multiplexor to a thread pool, you should be a) moving in bulk (lock the queue once, not once per message), and b) using the Least Loaded pattern, where each thread in the pool has its OWN queue: you move the entire batch of messages to the thread that is least loaded, and the next time the multiplexor has a batch, it will move it to a different thread because THAT one is now least loaded. Assuming your processing takes longer than the data takes to be split into packets (IT SHOULD!), then all your threads will still be busy, but there will be no lock contention between them, and occasional lock contention ONCE when they get a new batch of messages to process.
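
    A compact Python sketch of that hand-off, under some assumptions: `Worker` and `dispatch` are hypothetical names, queue length is used as the load measure, and the worker threads themselves (which would call `drain()` in a loop) are left out to keep the example short.

```python
import threading
from collections import deque

class Worker:
    def __init__(self):
        self.lock = threading.Lock()
        self.pending = deque()
        self.processed = []

    def load(self):
        return len(self.pending)  # approximate; read without the lock on purpose

    def push_batch(self, batch):
        with self.lock:  # one lock acquisition per batch, not per message
            self.pending.extend(batch)

    def drain(self):
        with self.lock:  # take everything queued so far in one swap
            batch, self.pending = self.pending, deque()
        self.processed.extend(batch)  # processing happens outside the lock
        return len(batch)

def dispatch(workers, batch):
    """Hand the whole batch to whichever worker currently has the shortest queue."""
    target = min(workers, key=Worker.load)
    target.push_batch(batch)
    return target
```

    Each worker only ever contends with the multiplexor for its own lock, and only once per batch.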

    Finally, decouple your I/O-bound processes. Make your I/O-bound things (e.g. reporting via socket back to some kind of stats/reporting system) happen in their own thread if they are allowed to block. And make sure your worker threads aren't waiting to give the I/O-bound thread data - in this case, a similar pattern to the above in reverse works well - where each thread PUSHING to the I/O-bound thread has its own queue, and your I/O-bound thread has its own queue, and when that is empty, it just swaps in the contents of all the worker queues (or just the next one in a round-robin fashion), so the workers can put data onto those queues at their leisure again, without lock contention with each other.

    Never underestimate the value of your memory - if you are doing something like reporting to a stats/reporting server via socket, you should implement some kind of Store and Forward system. This is both for integrity (if your app crashes, you still have the data to send), and so you don't blow your memory. This is also true if you are doing SQL inserts to an off-system database server - spool it out to local disk (local solid-state is even better!) and then just have a thread continually reading from disk and doing the inserts - in a thread not touched by anything else. And make sure your SAF uses *CYCLING FILES* that cycle on max size AND time - you don't want to keep appending to a file that can never be erased - and preferably, make that file a memory mapped file. Similarly, when sending data to your end-users, make sure you can overflow the data to disk so you don't have 3 MB of data sitting in memory for a single client, who happens to be too slow to take it fast enough.
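
    A bare-bones Python sketch of a store-and-forward spool with cycling files. The `Spool` class, the `spool-NNNNNN.log` naming, and the newline-delimited record format are all assumptions of this example; it uses plain buffered files, not memory-mapped ones, and does not fsync, so it illustrates the cycling logic only.

```python
import os
import time

class Spool:
    """Append records to numbered files, starting a new file on size or age limits."""

    def __init__(self, directory, max_bytes=1 << 20, max_age_s=60.0):
        self.directory, self.max_bytes, self.max_age_s = directory, max_bytes, max_age_s
        self.index = 0
        self._open_next()

    def _open_next(self):
        self.index += 1
        self.path = os.path.join(self.directory, "spool-%06d.log" % self.index)
        self.file = open(self.path, "ab")
        self.opened_at = time.monotonic()

    def append(self, record: bytes):
        too_big = self.file.tell() + len(record) + 1 > self.max_bytes
        too_old = time.monotonic() - self.opened_at > self.max_age_s
        if self.file.tell() > 0 and (too_big or too_old):
            self.file.close()
            self._open_next()
        self.file.write(record + b"\n")
        self.file.flush()

    def closed_files(self):
        """Files that are finished and safe for the forwarder to send, then delete."""
        names = sorted(n for n in os.listdir(self.directory) if n.startswith("spool-"))
        return [os.path.join(self.directory, n) for n in names
                if os.path.join(self.directory, n) != self.path]

def forward(spool, send):
    """Send every record from finished files, deleting each file once it is sent."""
    for path in spool.closed_files():
        with open(path, "rb") as handle:
            for line in handle:
                send(line.rstrip(b"\n"))
        os.remove(path)
```

    In a real deployment `forward` would run in its own thread, with `send` doing the socket write or SQL insert, so the producers never wait on the remote side.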

    And last thing, make sure you have architected things in a way that you can simply start up a new instance on another machine, and both machines can work IN TANDEM, allowing you to just throw hardware at the problem once you reach your hardware's limit. I've personally scaled up an app from about 20 machines to over 650 by ensuring the collector could handle multiple collections - and even making sure I could run multiple collectors side-by-side for when the data is too much for one collector to crunch.

    I don't know of any papers on this, but this is my experience writing extremely high performance network apps :)

    1. Re:Define your thread's purposes by stormeru · · Score: 0

      Hey, did you write Napster? :)

    2. Re:Define your thread's purposes by Anonymous Coward · · Score: 0

      How would the least loaded pattern compare to instead allowing threads to perform work-stealing on CAS-based data structures? Or do you instead combine the two approaches?

    3. Re:Define your thread's purposes by cshay · · Score: 1

      Excellent post. Thank you.

    4. Re:Define your thread's purposes by rastoboy29 · · Score: 1

      Nice! Thanks for posting.

    5. Re:Define your thread's purposes by RAMMS+EIN · · Score: 1

      ``I've worked in multiple extremely super-scaled applications (including ones sustaining 70,000 connections at any one time, 10,000 new connections each minute, and 15,000 concurrent throttled file transfers at any one time - all in one application instance on one machine).''

      I'm interested in what software (OS, programming language, etc.) you used. Would you tell me?

      --
      Please correct me if I got my facts wrong.
    6. Re:Define your thread's purposes by srealm · · Score: 1

      This was on Linux, using C++ - and very little in the way of libraries. We DID of course have to modify some system settings (like max FD's per process, etc). The socket library we used was custom written, and everything was non-blocking.

      We used Berkeley DB for a data repository (read-only, most of this data loaded in memory where we could afford it - or at least partially loaded in memory to speed up accessing). The files were mirrored from a central repository, and we had some Small File Caching ability so that frequently sent files less than a certain size would be loaded into memory and LRU'd out.

      The app WAS serving up P2P files, and even had some logic in there to do things like restrict the bandwidth system-wide. I am currently working on re-writing some of the stuff I wrote for this application in open-source library form (still C++). But that is still a work in progress.

    7. Re:Define your thread's purposes by Anonymous Coward · · Score: 0

      Just out of curiosity... what protocol did you use for connections? TCP itself can address 2^16 ports, so that's fewer than your 70,000 at any given time...

  27. multiple live servers by Anonymous Coward · · Score: 0
    If possible, try to have multiple live servers instead of fail-over. As Scott Courtney of VeriSign (who run the .COM DNS infrastructure) said in a talk recently:

    What we prefer is all active equipment. I don't want spares sitting in racks, I don't want spare sites if I can help it. Active to everything. Nothing like production traffic to flush out an issue.


    http://video.google.com/videoplay?docid=-5525246919548243924#26m20s
    1. Re:multiple live servers by Anonymous Coward · · Score: 0

      There are a couple of problems with using all equipment all the time. First, if load failure or timeouts are an issue with your application, then the increased load after a single component fails can cause further failures. Second, synchronization of a live view can have a response time cost and a complexity cost (development time and uncaught bugs).

      My suggestion to the article submitter would be to do a google search on the obvious terms ("high availability", "failover", "load balancing") and find out what the possibilities are, then figure out which approach best matches his requirements and resources.

  28. Has anyone actually answered the question? by smackenzie · · Score: 3, Insightful

    I see a lot of recommendations for various technologies, software packages, etc. -- but I don't think this addresses the original question.

    What you are asking about, of course, is enterprise-grade software. This typically involves an n-tier solution with massive attention to the following:

    - Redundancy.
    - Scalability.
    - Manageability.
    - Flexibility.
    - Securability.
    - and about ten other "...abilities."

    The classic n-tier solution, from top to bottom is:

    - Presentation Tier.
    - Business Tier.
    - Data Tier.

    All of these tiers can be made up of internal tiers. (For example, the Data Tier might have a Database and a Data Access / Caching Tier. Or the Presentation Tier can have a Presentation Logic Tier, then the Presentation GUI, etc.)

    Anyway, my point is simply that there is a LOT to learn in each tier. I'd recommend hitting up good ol' Amazon with the search term "enterprise software" and buy a handful of well-received books that look interesting to you (and it will require a handful):

    http://www.amazon.com/s/ref=nb_ss_gw/002-8545839-8925669?initialSearch=1&url=search-alias%3Daps&field-keywords=enterprise+software+

    Hope this helps.

    1. Re:Has anyone actually answered the question? by lycono · · Score: 3, Informative
      The list of books in that search that are even remotely related to what the OP was asking is very short. I count 7 total results (of 48) in that list that _might_ be useful. Of which, only 1 actually sounds like it might be what the OP wants:
      • How To Succeed In The Enterprise Software Market (Hardcover): Useless, it's about the industry, not about writing the software.
      • Scaling Software Agility: Best Practices for Large Enterprises (The Agile Software Development Series): Useless, it describes how to use agile practices in large enterprise level software teams.
      • Groupware, Workflow and Intranets: Reengineering the Enterprise with Collaborative Software (Paperback): Useless, it just describes how various, existing enterprise level software categories are supposed to work together.
      • Metrics-Driven Enterprise Software Development: Effectively Meeting Evolving Business Needs (Hardcover): Possibly useful, if he wants to know how to use metrics to help write the software. I have a feeling this doesn't give him techniques for writing enterprise software specifically though, which sounds more like what he wants.
      • SAP R/3 Enterprise Software: An Introduction: Useless, it's an SAP manual.
      • Essential Software Architecture (Hardcover): Possibly, can't tell from the description whether there is enough enterprise specific information to be useful.
      • Large-Scale Software Architecture: A Practical Guide using UML: Aha! Something that sounds like what the guy is asking for.
      (Not that I'm saying the books are useless in general, I'm just not sure they're what this guy/girl is looking for.)

      I think this guy/girl is looking for something along the lines of this comment but in "accepted" book format. It doesn't look like the search returns a "handful"....
    2. Re:Has anyone actually answered the question? by smackenzie · · Score: 1

      I can't believe that you didn't include the "Fisher Price Baby Bowling Set" (comes up on the third page)! What were you thinking?

      Actually, I'm going to completely agree with you; bad original search term. Amazon usually does better (and I should have checked).

      The search term "enterprise architecture" seems to produce better general results.
      http://www.amazon.com/s/ref=nb_ss_gw/102-6220372-7109710?initialSearch=1&url=search-alias%3Daps&field-keywords=enterprise+architecture

    3. Re:Has anyone actually answered the question? by kingradar · · Score: 1

      This "enterprise" post reminds me of several WTF's. But I link to one in particular:

      http://worsethanfailure.com/Articles/Bitten_by_the_Enterprise_Bug.aspx

      In general, enterprise has come to mean overpriced and underperforming. By making something "enterprise" you're saying you designed it such that you can throw money at the problem. By breaking things into multiple "tiers" you're saying that if any one tier gets overloaded, you can fix the problem by throwing money/hardware at that tier.

      From my perspective, the best way to solve performance/reliability problems is with sound design, good programming (algorithms), and careful tool selection. That means architecting your app so that there is no single point of failure. If one node goes down, can the other nodes recover and continue functioning? In the same vein, can you add nodes to the cluster and scale increased loads across more machines without encountering bottlenecks? It's been documented elsewhere, but you want a(n) algorithms, not a(log n) algorithms. All too often the answer to scaling "enterprise" software is to buy a bigger box. That can get expensive very fast. The better, albeit more difficult, solution is to write the app so that multiple machines can work in concert. And finally, make sure that the tools you use will be able to scale. In general this relates to what database system and libraries you use (and not so much the language).

      I'll address the language issue too. A lot of people have mentioned Erlang. While I think it's a great language for server applications, there just isn't the community support to make it a practical choice. (Exception: if you don't need libraries or you plan to write _everything_ yourself, as is often the case for embedded systems, then maybe Erlang is a good choice. Hence why you find Erlang in routers.) Erlang is also problematic because of the small number of people skilled in its use. For me it really comes down to choosing C on Linux, or C# on Windows. (I've written scalable apps supporting several hundred thousand users using both.) It's a simple fact that it takes less time to write production code in C# (or Java) than in C (or C++). So what you need to ask yourself is whether the runtime efficiency of C (or C++) outweighs its added development cost.

      One of the apps I wrote is the SMTP/POP/IMAP server used to support my free e-mail service (http://lavabit.com/). For that project, hardware was comparatively expensive (I paid for everything myself), and my time was relatively cheap. (I started by only working on the code in between consulting gigs.) So it made sense to write the app in C, and use Linux. Over time, the efficiency savings have made the decision, while painful at times, the correct one. I'm able to support 70K users very cheaply. If I had chosen C#/Windows, I might have gotten the project done faster, but I'd need more expensive hardware. (I'm using Dell 1650's at the application tier, with beefier machines at the database/storage tier. Note, I have a two tier architecture.) I would also have had to shell out lots of dollars for Windows licenses. It just didn't make sense. For more on my mail server, read this other post http://slashdot.org/comments.pl?sid=191034&cid=15711157.

      Another large project I worked on was a social networking site sponsored by a large carbonated beverage firm. In this case the pockets were deeper but the timeline was shorter. So it made sense to write the app in C#. It reminds me of the saying that in software development you have three factors: cost, quality and time. You get to pick two of those, but not three.

      I'll close by saying again, the best way to solve performance/reliability problems is with sound design, good programming (algorithms), and careful tool selection.

  29. Statelessness by tweek · · Score: 3, Interesting

    I don't know if anyone has mentioned it but the key to a web application being scalable horizontally is statelessness. It's much easier to throw another server behind the load balancer than it is to upgrade the capacity on one. I've never been a fan of sticky sessions myself. This requires a different approach to development in terms of memory space and what not. With a horizontally scalable front tier, you can't always guarantee that someone will be talking to the same server on the next request that they were on the previous request. It requires a little more overhead in terms of either replicating the contents of memory between all application servers or on the database tier because you persist everything to the database.
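    A minimal sketch of what that looks like in practice, assuming the "persist everything" variant. The class names are hypothetical, and the dict-backed store merely stands in for a real database or cache cluster shared by all app servers.

```python
import json
import uuid


class SharedSessionStore:
    """Stand-in for a shared store (database, cache cluster); here just a dict."""

    def __init__(self):
        self._rows = {}

    def load(self, session_id):
        raw = self._rows.get(session_id)
        return json.loads(raw) if raw is not None else {}

    def save(self, session_id, data):
        self._rows[session_id] = json.dumps(data)


class AppServer:
    """Keeps no per-user state in its own memory between requests."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def handle(self, session_id, item):
        session = self.store.load(session_id)    # fetch state for this request
        session.setdefault("cart", []).append(item)
        self.store.save(session_id, session)     # persist before responding
        return {"served_by": self.name, "cart": session["cart"]}


if __name__ == "__main__":
    store = SharedSessionStore()
    a, b = AppServer("app-a", store), AppServer("app-b", store)
    sid = str(uuid.uuid4())
    a.handle(sid, "widget")
    # The next request lands on a different server and still sees the cart.
    print(b.handle(sid, "gadget"))
```

    The load balancer can then send any request to any server, at the price of a store round trip per request.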

    At least that's my opinion.

    --
    "Fighting the underpants gnomes since 1998!" "Bruce Schneier knows the state of schroedinger's cat"
    1. Re:Statelessness by sapgau · · Score: 1

      Agreed. I've read this somewhere else confirming the same reasons on why to eventually design your application for scalability from the start no matter how small the application.

      It is a greater headache to convert for example your webapp from file based cookies to persistent cookies in the DB. Of course trying to be careful not to serialize too much data in cookies either.

      my $0.02

    2. Re:Statelessness by demi · · Score: 1

      Sort of, but this is a naïve understanding. State is required or you're not doing anything interesting. You're just pushing it around. If you're the guy in charge of the web servers, it might seem like having sessions on the app server and plugin based request routing is a good idea, but it just pushes the problem to the app server guy. If you're the app server guy, it might seem like a good idea to put sessions in a database but that just pushes the state there. You're not solving any fundamental problems, just playing checkers with it. And replicating session information is a losing proposition for scaling because as the number of multimaster nodes grows so does its overhead. Loosely-coupled multiple master replication like that is very hard to get right.

      Sometimes you can get away with keeping state with the client, sometimes this isn't desirable. It depends what you're doing but it's not a general solution.

      The general solution to horizontal scalability is horizontal partitioning. This works for the stateful tiers as well. But integrating this with high availability is either complicated or inefficient (you get to pick).
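      For what it's worth, the simplest form of horizontal partitioning is routing each key to a shard by a stable hash. This is only a sketch: `shard_for` and the shard count are illustrative, and plain modulo hashing moves most keys whenever the shard count changes, which is part of why doing this well gets complicated.

```python
import hashlib


def shard_for(key: str, shard_count: int) -> int:
    """Map a key to a shard index using a stable hash (not Python's salted hash())."""
    if shard_count <= 0:
        raise ValueError("shard_count must be positive")
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


if __name__ == "__main__":
    for user in ("alice", "bob", "carol"):
        # Every app server computes the same shard for the same user.
        print(user, "-> shard", shard_for(user, 4))
```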

      --
      demi
  30. HA is an IT thing by Ryan+Amos · · Score: 1

    Developers need not worry about HA too much. Your IT department should be able to set this up for you rather seamlessly. With things like LVS/Keepalived you can easily implement load balancing and auto-failover for databases, web servers, etc (you don't even need to code in multiple DB servers; VRRP works wonders for this kind of thing.) As long as the application is designed sanely to begin with, HA as it is typically discussed comes down to minimizing the impact of hardware failure by buying two of everything and making failover happen automatically (human response time is anywhere from 5-15 minutes in a best-case scenario, where worst-case for auto failover is http://keepalived.org/) is an excellent solution for something your size. It's basically an LVS frontend with host checking and automatic failover capability (via VRRP,) custom host checks (i.e. run a typical SQL query every 3 seconds to check that everything is ok, if it's not, remove the DB from the pool, do some other stuff and rebalance the cluster. It can all run off one IP on the front end so the app won't notice what happened.)
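    The host-check idea can be sketched in a few lines. This is Python, not keepalived configuration, and it is an illustration only: the function names are invented, and sqlite3 stands in for whatever database driver you actually use.

```python
import sqlite3


def is_healthy(connect, query="SELECT 1"):
    """Run a cheap query; any exception counts as a failed check."""
    try:
        conn = connect()
        try:
            conn.execute(query).fetchone()
        finally:
            conn.close()
        return True
    except Exception:
        return False


def healthy_backends(backends):
    """Return the names of backends that pass the check, in the order given."""
    return [name for name, connect in backends.items() if is_healthy(connect)]


def broken_connect():
    raise ConnectionError("backend unreachable")  # simulates a dead server


if __name__ == "__main__":
    backends = {
        "db1": lambda: sqlite3.connect(":memory:"),
        "db2": broken_connect,
    }
    print(healthy_backends(backends))  # prints ['db1']
```

    A real checker would run this on a timer and only drop a backend after several consecutive failures, to avoid flapping.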

    1. Re:HA is an IT thing by dennypayne · · Score: 2, Insightful

      The first two sentences here are one of my biggest pet peeves...if application developers don't start becoming more network-aware, and vice versa, I think you're dead meat. Hint: there are very few applications these days that aren't accessed over the network. I see so many "silos" like this when I'm consulting. The network guys and the app guys have no idea what the "other side" does. If they actually worked together on these types of issues instead of talking past each other, something might actually get done.


      So yes, developers absolutely need to worry about HA. It makes a difference whether your app is stateless or not. How the app is health-checked from the load balancer. How chatty the app is on the network. Etc, etc...


      Denny
      --
      Erecting the wall of separation between church and state is absolutely essential in a free society. - Thomas Jefferson
    2. Re:HA is an IT thing by psykocrime · · Score: 1

      Never a truer word have I heard spoken. Well said, friend!

      --
      // TODO: Insert Cool Sig
    3. Re:HA is an IT thing by webgrappa · · Score: 1

      I also think that developers should not focus too much on HA, but that is thanks to virtualization, the bright future of HA. Need more performance? Just add another blade to the VMware infrastructure!

    4. Re:HA is an IT thing by vidarh · · Score: 1
      You gloss over a lot with the phrase "As long as the application is designed sanely to start with".

      In a HA environment, developers need to worry about a range of things such as minimizing state in any process, ensuring operations are idempotent (can be retried with the same result) etc., or keepalived/LVS/VRRP will not help you much.

      You need to design your apps to be resistant to bad results, suddenly dropped network connections, timeouts, high load situations (i.e. ensuring additional clients to any component get rejected if they would jeopardize the whole component) etc.

      By "resistant" I mean "don't just feed errors back to the user if you have any way of recovering".
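      As a rough illustration of "recover instead of feeding the error back", here is a retry wrapper with exponential backoff. It is only safe for idempotent operations, and `TransientError` and `call_with_retries` are names invented for this sketch.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (timeouts, dropped connections)."""


def call_with_retries(operation, attempts=3, base_delay=0.1, sleep=time.sleep):
    """Retry an idempotent operation with exponential backoff; re-raise after the last attempt."""
    for attempt in range(attempts):
        try:
            return operation()
        except TransientError:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)


if __name__ == "__main__":
    calls = {"n": 0}

    def flaky_lookup():
        calls["n"] += 1
        if calls["n"] < 3:
            raise TransientError("timeout")
        return "ok"

    # Two failures are absorbed; the third attempt succeeds.
    print(call_with_retries(flaky_lookup, sleep=lambda s: None))
```

      Retrying blindly under high load makes things worse, so in practice this gets paired with a cap on attempts (as here) and some way of rejecting new work when a component is saturated.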

      Frankly, most applications not explicitly designed with HA in mind fail on most or all of these.

    5. Re:HA is an IT thing by James+Youngman · · Score: 1

      Developers need not worry about HA too much. Your IT department should be able to set this up for you rather seamlessly.
      You have that totally backwards. Completely.
  31. You are talking about two things by ChrisA90278 · · Score: 2, Informative

    You are talking about two things: reliability and performance. And there are two ways to measure performance: latency (what one end user sees) and throughput (number of transactions per unit time). You have to decide what to address.

    You can address reliability and throughput by investing a LOT of money in hardware and using things like round-robin load balancing, clusters and mirrored DBMSes, RAID 5 and so on. Then losing a power supply or a disk drive means only degraded performance.

    Latency is hard to address. You have to profile and collect good data. You may have to write test tools to measure parts of the system in isolation. You need to account for every millisecond before you can start shaving them off.
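    One cheap way to start accounting for the milliseconds is to wrap each stage of a request in a timer and see where the time goes. A sketch, with stage names made up for the example:

```python
import time
from collections import defaultdict
from contextlib import contextmanager

timings = defaultdict(list)  # stage name -> list of elapsed seconds


@contextmanager
def timed(stage):
    """Record wall-clock time spent inside the with-block under `stage`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[stage].append(time.perf_counter() - start)


def report():
    """Return (stage, total_ms, calls) rows, slowest stage first."""
    rows = [(s, sum(v) * 1000.0, len(v)) for s, v in timings.items()]
    return sorted(rows, key=lambda r: r[1], reverse=True)


if __name__ == "__main__":
    with timed("db_query"):
        time.sleep(0.02)   # pretend this is the query
    with timed("render"):
        time.sleep(0.005)  # pretend this is template rendering
    for stage, total_ms, calls in report():
        print(f"{stage:10s} {total_ms:8.1f} ms over {calls} call(s)")
```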

    Of course you could take a quick look for obvious stuff like poorly designed SQL databases, lack of indexes on joined tables and cgi-bin scripts that require a process to be started each time they are called.

  32. Doing this currently by Anonymous Coward · · Score: 0

    I just got hired at a company whose code is a mess and I have to clean it up. The biggest mistakes I see causing this are tables that are not normalized, business logic that is not properly separated (neither in the DB nor in the code), MVC standards not being followed, redundant coding, and code/data bloat (i.e. loading more data and/or code than is needed to perform a given function or task).

    That's a lot to wade through. My best suggestion is to go with a prebuilt MVC architecture and do your best to throw it all into there. The new Struts 2 is awesome if you know Java. Stay away from Ruby on Rails if you want scalability (as even the Ruby on Rails site requires PHP to scale). If you are using PHP, PHPulse is the fastest framework out there but is lacking in documentation. Comparable ones like Cake and Zend have loads of documentation but are more bloated and far slower.

  33. Lots of Options by curmudgeon99 · · Score: 3, Interesting

    First of all, excellent question.

    Second: ignore the ass above who said dump Java. Modern hotspots have made Java as fast or faster than C/C++. The guy is not up to date.

    Third: Since this is a web app, are you using an HttpSession/sendRedirect or just a page-to-page RequestDispatcher/forward? As much as it's a pain in the ass--use the RequestDispatcher.

    Fourth: see what your queries are really doing by looking at the explain plan.

    Five: add indexes wherever practical.

    Six: Use AJAX wherever you can. The response time for an AJAX function is amazing and it is really not that hard to do Basic AJAX.

    Seven: Use JProbe to see where your application is spending its time. You should be bound by the database. Anything else is not appropriate.

    Eight: Based on your findings using JProbe, make code changes to, perhaps, put a frequently-used object from the database into a class variable (static).
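    The idea in point Eight, sketched in Python rather than Java (a class attribute plays the role of a static field). The class, the lookup table and the five-minute staleness limit are all assumptions for illustration.

```python
import time


class CountryCodes:
    """Caches a rarely-changing lookup table at class level, shared by all instances."""

    _cache = None          # class variable, analogous to a Java static field
    _loaded_at = 0.0
    max_age_seconds = 300  # assumption: five minutes of staleness is acceptable
    db_hits = 0

    @classmethod
    def _load_from_db(cls):
        cls.db_hits += 1   # stand-in for a real query
        return {"US": "United States", "DE": "Germany"}

    @classmethod
    def get(cls, now=time.monotonic):
        t = now()
        if cls._cache is None or t - cls._loaded_at > cls.max_age_seconds:
            cls._cache = cls._load_from_db()
            cls._loaded_at = t
        return cls._cache


if __name__ == "__main__":
    CountryCodes.get()
    CountryCodes.get()
    print(CountryCodes.db_hits)  # the second call is served from the cache
```

    Under concurrent requests two threads may both reload at expiry; for read-mostly reference data that is usually harmless, but it is worth knowing about.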

    These are several ideas that you could try. The main thing that experience teaches is this: DON'T optimize and change your code UNTIL you have PROOF of where the slow parts are.

    1. Re:Lots of Options by Anonymous Coward · · Score: 0

      Second: ignore the ass above who said dump Java. Modern hotspots have made Java as fast or faster than C/C++. The guy is not up to date.

      Really?

      http://shootout.alioth.debian.org/gp4/benchmark.php?test=all&lang=gpp&lang2=java
      http://shootout.alioth.debian.org/gp4/benchmark.php?test=all&lang=gcc&lang2=java

      Java has certainly made performance gains over the years but it is most certainly not a replacement for C/C++ if you care about raw performance at all.

    2. Re:Lots of Options by Anonymous Coward · · Score: 0

      Java is just terrible, those who think otherwise are just kidding them selves.

    3. Re:Lots of Options by nuzak · · Score: 2, Informative

      The VM speed is not Java's problem. The decrepit servlet architecture, which was designed from the start around one-thread-per-request, is. Anything that fixes this architecture is essentially a patch on a broken system. Even if you escape, you're going to find that many JavaEE components will have you buying stock in RAM manufacturers.

      A good JMS provider is nice to have for HA though. Nothing like durable message storage to help you sleep well.

      --
      Done with slashdot, done with nerds, getting a life.
    4. Re:Lots of Options by curmudgeon99 · · Score: 1

      Funny, I seem to recall long maddening hours working on a colleague's code, trying to find his god damned memory leak. I'm sure you can find specific circumstances in which a C program is faster than Java. Then why are all these companies standardizing on Java? You think you're brighter than all these corporations? I doubt it. C++ is dying and I say good riddance.

    5. Re:Lots of Options by Wdomburg · · Score: 3, Insightful

      Six: Use AJAX wherever you can. The response time for an AJAX function is amazing and it is really not that hard to do Basic AJAX.

      AJAX can be a performance win. It can also be a nightmare if done poorly. I've seen far too many "web 2.0" applications that flood servers with tons of AJAX calls that return far too little data without a consideration for the cost (TCP connections aren't free, logging requests isn't free).

      Response time is also variable. What feels 'amazing' local to the server can be annoyingly slow over an internet connection, especially if the design is particularly interactive.

      Couple things I'd suggest:

      1) Don't do usability testing on a LAN. An EV-DO card wouldn't be a bad choice for an individual. For a larger scale development environment a secondary internet connection works well.

      2) Remember that a page can be dynamic without AJAX. Response time toggling the display property of an object is far more impressive than establishing a new network connection and fetching the data.

      3) Isolate AJAX interfaces in their own virtual host so that you can use less verbose logging for API calls. This is a good idea for images as well.

    6. Re:Lots of Options by Anonymous Coward · · Score: 0

      Funny, I seem to recall long maddening hours working on a colleague's code, trying to find his god damned memory leak.

      Funny, I'm not a very good painter so I guess I should blame my inability to paint a masterpiece on the canvas, brushes and paint I use instead of my own incompetence.

      I'm sure you can find specific circumstances in which a C program is faster than Java.

      Yes, if by "specific" circumstances you really mean "virtually all".

      Then why are all these companies standardizing on Java? You think you're brighter than all these corporations? I doubt it. C++ is dying and I say good riddance.

      Thanks for the personal attack. The Fortune 500 company I work for utilizes C/C++ almost exclusively on the backend. I guess I should run to the CTO's office to tell him that we must be doing things wrong because a person on the internet thinks everyone is standardizing on Java.

    7. Re:Lots of Options by wtarreau · · Score: 1

      Second: ignore the ass above who said dump Java. Modern hotspots have made Java as fast or faster than C/C++. The guy is not up to date.

      ROTFL! You made my day, really! Every person who tried to demonstrate this stupid statement to me finished by giving pointers to benchmarks computing PI in different languages, or meaningless things like this.

      But when it comes to real work where you need to allocate memory every 5 instructions, it's amazing how big the gap between the two is! Also, the large memory footprint considerably reduces CPU cache efficiency. A big company I know is replacing a big web app running on about 20 CPUs (coded in C) with this brilliant technology... More than 200 CPUs, and half a terabyte of RAM. You can surely imagine that the poor CPU caches will always thrash with such a large dataset! So maybe with 10 times the CPUs and RAM your crap is "as fast or faster than C/C++", but please stop spreading dangerous statements like the above to people who are starting a project!

      Six: Use AJAX wherever you can. The response time for an AJAX function is amazing and it is really not that hard to do Basic AJAX.

      I discovered who you are: you are a developer who always tests his code on the local network and never through the net. You need 100 microseconds to perform an HTTP request on the LAN, but 100 ms over a DSL link. If you use AJAX to fill a table with 100 values, one request per value, it will require 10 ms on your LAN (immediate response), but 10 seconds over a DSL link. Good! What a smart choice! You should use AJAX only to avoid reloading complete pages, but not wherever you can as you stated above. Also, the fact that you only considered the function's response time is a clear indication that you don't care at all about the end-user experience.
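      The arithmetic behind those numbers, as a small sketch. It assumes the requests are issued one after another and that round-trip time dominates; the function name is made up.

```python
def sequential_fetch_seconds(requests: int, round_trip_seconds: float) -> float:
    """Total wait when each request starts only after the previous one returns."""
    return requests * round_trip_seconds


if __name__ == "__main__":
    lan = sequential_fetch_seconds(100, 100e-6)  # 100 requests at 100 microseconds
    dsl = sequential_fetch_seconds(100, 100e-3)  # 100 requests at 100 milliseconds
    batched = sequential_fetch_seconds(1, 100e-3)  # one request returning all 100 values
    print(f"LAN: {lan:.2f} s, DSL: {dsl:.2f} s, DSL batched: {batched:.2f} s")
```

      Fetching all 100 values in a single call brings the DSL case back down to roughly one round trip.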

      Hint: if you cannot afford to buy a modem to try your apps through the net, at least start with 2% packet loss between your server and client over the LAN, using this rule on your server (assuming you use Linux). It will mimic one of the problems people encounter through the net (though it will not add latency):

      $ iptables -I OUTPUT -p tcp --sport 80 -m random --average 2 -j DROP

      And do not list to us what apps you have written, surely if people have been using them, they don't anymore.

    8. Re:Lots of Options by smellotron · · Score: 2, Insightful

      Then why are all these companies standardizing on Java? You think you're brighter than all these corporations? I doubt it. C++ is dying and I say good riddance.

      Most companies standardize on Java not because it gives them a higher "maximum power", but because it gives them a higher "minimum power". The language as a whole is designed (whether intentionally or not) to reduce the damage done by less-than-stellar programmers, which is more important to a business than increasing the power of the superstar programmers.

      Plus, you don't think Sun puts out all of the Java marketing buzz for nothing, do you? Java is sponsored by a corporation, and connections like that make a big difference to managers. While it may be a good thing for many companies to standardize on Java, they're not doing it because Java has an ounce of technical superiority.

      I haven't done any personal testing, but I'm under the impression that speed-wise C/C++ stack > Java (everything is on the heap) > C/C++ heap. Since realtime Java code seems to encourage disabling garbage collection, that seems consistent. So yeah, you might be able to tweak out your speed by avoiding the heap in some cases, but I'm not sure. Maybe someone in the hard-real-time environment can enlighten us on that.

    9. Re:Lots of Options by curmudgeon99 · · Score: 1

      I maintain that your information is out of date. Java will exceed the speed of a C++ program in many cases nowadays. I worked for a large insurance company and we had to prove to them using benchmarks that Java was faster. We won and that company now only uses Java. You're avoiding the issues of real-world development also. What is the expense of a memory leak? You can claim that they also can exist in Java but as we all know in C++ they are a constant problem--especially for the newbie programmers. So, are you putting this forward as an advantage of C++? I see the dangers to the inexperienced programmer as being much higher in C++ than in Java. What's your point? Are you proud that it's so easy to create a memory leak in C++?

    10. Re:Lots of Options by curmudgeon99 · · Score: 1

      Obviously, you want to be intelligent when you build an application. The idea of using AJAX "wherever you can" implies that the developer actually use his brain when deciding whether or not you "can". Obviously, if you're bringing back 500 values, you're going to see network latency. Hence the emphasis on "can". AJAX is not a free lunch but it has really helped out some applications that have massive, nationwide user bases.

    11. Re:Lots of Options by smellotron · · Score: 1

      Java will exceed the speed of a C++ program in many cases nowadays. I worked for a large insurance company and we had to prove to them using benchmarks that Java was faster.

      Yes... and I was implicitly asking readers to provide me with better information... which you sort-of have. But I'd really like to be able to see some side-by-side comparisons (if one isn't strictly faster than the other, then where are the differences showing up?).

      You're avoiding the issues of real-world development also. What is the expense of a memory leak? You can claim that they also can exist in Java but as we all know in C++ they are a constant problem--especially for the newbie programmers.

      No, I believe I explicitly addressed that the strengths of Java weren't technical strengths. We're agreeing on this point, really. Though, I am a firm believer in the RAII idiom in C++, which really makes C++ a lot simpler once you get used to it... but yes, it's a higher bar to start with, especially for newbie programmers. And it's certainly something I never heard about in school (whereas I learned about garbage collection algorithms as a freshman).

      Are you proud that it's so easy to create a memory leak in C++?

      Nope. But there are strengths to using C++'s model that Java users tend to ignore. RAII allows you to manage any resources, not just memory. I think Python has the best setup, though: garbage collection with deterministic destruction, which lets you use RAII without forcing you to manage your memory (though you still have to think about it, since the GC isn't as advanced as Java's, and gets stuck on cycles when destructors exist unless you explicitly use weak references, IIRC).

    12. Re:Lots of Options by kbjorklu · · Score: 1

      I maintain that your information is out of date. Java will exceed the speed of a C++ program in many cases nowadays.

      I agree, memory allocation can be faster, and the VM can compile the code to use all available features on the target processor. I doubt that the compiler can perform as many and thorough optimizations as a static compiler though.

      I worked for a large insurance company and we had to prove to them using benchmarks that Java was faster. We won and that company now only uses Java.

      I'm sure that using different implementations would have given different results.

      You're avoiding the issues of real-world development also. What is the expense of a memory leak? You can claim that they also can exist in Java but as we all know in C++ they are a constant problem--especially for the newbie programmers. So, are you putting this forward as an advantage of C++? I see the dangers to the inexperienced programmer as being much higher in C++ than in Java. What's your point? Are you proud that it's so easy to create a memory leak in C++?

      It's a common misunderstanding that garbage collection prevents memory leaks. It doesn't, it prevents use of dangling pointers. In C++ you can either create a memory leak (by not calling delete) or create a dangling pointer (by calling delete but leaving references to the deleted object). The former is always a bug, the latter only if you dereference the pointer (after which you usually crash or corrupt data). In Java, you can create a memory leak (by having references to unneeded objects in live objects), but you can't create a dangling pointer. You can accidentally use an object which is supposed to be unneeded though (causing a violation of program invariants / contracts, without a crash but possibly corrupting data).

      See e.g. http://www.ibm.com/developerworks/java/library/j-leaks/ for more information.

    13. Re:Lots of Options by curmudgeon99 · · Score: 1

      I believe if you read the post you replied to you will see I wrote "you can create" a memory leak in Java. Therefore, we are in agreement.

    14. Re:Lots of Options by Anonymous Coward · · Score: 0

      I'm a Java nut and I can tell you that Java IS NOT FASTER than C/C++. That statement is false. It is "fast enough" though, which is the way I would word it. If you want the best performance you will need a third-party JVM like BEA or an IBM alternative.

  34. Admit it.... by Jailbrekr · · Score: 5, Funny

    you work for Skype, don't you?

    --
    Feed the need: Digitaladdiction.net
  35. They use it to make money, so no criticism allowed by FatSean · · Score: 0, Flamebait

    Is that your point? 'Cause Microsoft is making a metric assload of money off of Windows, and Apple is making money off of OSX too.

    I think I smell a Google fan-boy.

    --
    Blar.
  36. Disaster planning by jhines · · Score: 1

    It sounds like you need to do some basic disaster planning. Think in terms of "what if this happens?"

    Like you lose your data center? How good is your backup, is it off site, do you have a tested plan for restoring the data and system on an interim basis on someone's system?

    Then you can look at some more specific things: what happens if I lose this server, this connection, this router, and specific services (DNS, email, etc.)?

    The big question ($$$) depends on how much you have to lose. If you can afford a day of downtime, you don't have to spend as much effort on HA as, say, the NYSE or an airline.

  37. Service Availability Forum by zix619 · · Score: 1

    Among others, another possibility is the Service Availability Forum (http://www.saforum.org/). You can download an open source implementation at http://developer.osdl.org/dev/openais/ and play with it (runs on top of Linux).

  38. High availability and scalability... by pjr.cc · · Score: 1

    These really are 2 different things. Though they do sometimes cross over - oracle RAC is a good example of that.

    As for where to read from a developer perspective? (which a lot of people replying seem to have missed was the actual question). There are TONNES.

    But split the question in two, where can I read about HA:
    start here-> http://en.wikipedia.org/wiki/High_availability There are also many books on the subject (one of the few I remember liking came out of the Sun BluePrints series). The problem with HA is that it's very subjective. You can talk about HA for, say, web applications and just talk session sharing and an intelligent load balancer (ironically, the same thing gives you scalability until you get to the DB), or you can talk all the way down to fault-tolerant hardware. Also take a look at the whitepapers that came out of such projects as MOSIX, VAX clusters, Oracle HA (both RAC and Data Guard), IBM WebSphere (there's a lot on the various IBM sites about HA for all their products, and one is bound to be similar to yours in nature), and Sun J2EE. A lot of these do go into development aspects as well and give some fantastic concepts and paradigms to follow. But you really need to define the requirements for HA, i.e. 0 downtime or 30 minutes of downtime is a HUGE difference! (And that really is just one scenario of many; as a developer you're usually faced with multiple requirements.)
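    To make the requirements question concrete, it helps to convert between an availability percentage and a yearly downtime budget. A small sketch, with function names invented here and leap years ignored:

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600, ignoring leap years


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Yearly downtime budget implied by an availability target."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


def availability_for_downtime(minutes_per_year: float) -> float:
    """Availability percentage implied by a yearly downtime budget."""
    return 100 * (1 - minutes_per_year / MINUTES_PER_YEAR)


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99):
        print(f"{target}% -> {allowed_downtime_minutes(target):.1f} min/year")
    print(f"30 min/year -> {availability_for_downtime(30):.4f}%")
```

    Thirty minutes a year already works out to better than "four nines", which is a very different engineering problem from tolerating a day of downtime.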

    Scalability is a different issue, and usually very application and environment dependent. Again, http://en.wikipedia.org/wiki/Scalability is a good start, but finding general literature is often very hard because it's so dependent on the situation and the application.

    Personally, I've found I learn best from example. Look at things such as J2EE application servers (WebSphere, Sun JES, etc.), load balancing, Oracle RAC vs. Data Guard, MySQL NDB vs. replication vs. read-only replica databases, Apache, PHP, Samba, Windows (most things): pick just about any mainstream application and its documentation will almost certainly cover both HA and scalability at a level helpful to a developer.

    If you want to get even more complex, take a look at even cooler forms of scalability and HA that involve things like utility computing (VMware DRS or Egenera, for example). Have a look at their design documents, because they offer even more diverse examples of both subjects at a more abstract layer (i.e. below the OS and entirely on the hardware).

    In both cases, it's hard to go from "we weren't thinking about HA or scalability when we built it" to "it's HA and scalable". HA tends to be a little easier because clusters can wrap themselves around almost any situation, but scaling on such systems usually means "I need bigger, faster and more CPUs, more memory and better disk until I can figure out how to code scaling into it".

    Always keep in mind though, the law of diminishing returns almost always applies.

  39. Re:They use it to make money, so no criticism allo by fimbulvetr · · Score: 2, Interesting

    The point was that he seemed to consider it so academic and so "well known" that he could just dismiss it without considering it.

    Google seems to have taken this elementary technique and turned it into something that can kick the crap out of an over-engineered solution under the right circumstances. I've read the paper, and assuming it is really used the way they say it is, I can say that it does a fantastic job on both performance AND HA, based on my personal experience with Gmail, Google, Groups, AdWords, Maps, Analytics, etc.

    Fanboy? Maybe, depending on your definition. Impressed? Hell yes.

  40. How about your local paper? by Bearhouse · · Score: 1

    Look for a job where they've got lots of OLTP!

  41. Hard stuff by ldapboy · · Score: 1

    It's just hard to do, and I've never seen a good book on the subject
    (in fact I've considered writing one on and off for years but sadly
    the $$ I can make as a consultant on performance, scalability and
    availability far exceeds the likely rewards from publishing a book).

    Best advice is to look at some open source projects that are used
    in highly scalable applications. The other thing I'd say is that
    there isn't one true technique -- at this point everyone makes up
    their own solution as they go. Often the applications' characteristics
    drive the scaling architecture so each application is different.

  42. SEIA by lesinator · · Score: 1

    Software Engineering for Internet Applications will guide you in the right direction.

  43. A second or two? by Bluesman · · Score: 1

    On modern hardware, on an internal network, "a second or two" is an eternity. Instead of worrying about what would happen if all 60,000 people used the app at once (unlikely), I'd find the bottlenecks you have now and fix those.

    Prioritize. You have statistics already about typical usage, and typical wait and service times. Fix the problem that exists, instead of the problem that doesn't, but might someday.

    --
    If moderation could change anything, it would be illegal.
    1. Re:A second or two? by Shados · · Score: 1

      For many complex, distributed, multi-tier applications, especially those doing heavy real-time calculations across multiple systems, and web apps doing pretty specific user session monitoring and personalisation, a second or two is pretty freakin' blazing, actually. A second or two is only an eternity if you're talking about a standalone system doing simple stuff with only a few boundaries to cross, if any.

    2. Re:A second or two? by Bluesman · · Score: 1

      From the original question:

      "The type of work being done is generally straightforward reads or updates that typically hit two or three DB tables per transaction. So this isn't a complicated site and the usage is pretty low."

      There is no reason this should be taking multiple seconds. He has a basic problem there. Now is not the time to be thinking about multiple distributed tiers, as much as he wishes he were working on a really complex, cool system. He's not.

      All of the people chiming in with their own details aren't helping the original poster out, they're steering him way off base. It's impossible to recommend anything to this guy without knowing more detail about his problem, how it can be partitioned, what hardware and software he's currently using, and what his cost constraints are.

      But I'm almost positive that his simple database queries aren't optimized and there's some other bottleneck when handling only 90 simultaneous users.

      --
      If moderation could change anything, it would be illegal.
    3. Re:A second or two? by Shados · · Score: 1

      Definitely possible, yes. Though when you have 60 thousand users, the 2-3 tables per transaction you're hitting could very well be hundreds of gigs. And the database is only ONE part of the equation in a big system, and it definitely can take multiple seconds. That a transaction takes 1-2 sec doesn't take away from the SSL and authentication, or querying the user-specific properties. The app may be graphics-intensive, who knows :) That's why I was just replying to the main question, which had to do with high-availability development. The rest is his or her problem.

      That being said, it's very possible that if they're going around toying with server-side cursors all over the place and crappy queries, it will be an issue. But 90 simultaneous users on an app that contains transactions made by 60k users probably adds up to something fast. I mean, I've worked on an app that had -3- concurrent users, and the entire app was 30 tables, with never more than 6 tables accessed at a time. "Small fry stuff", one thinks. Until you realise that one of the actors on the tables would pump in millions of rows per -day- in the best of cases. Not much to do about that at that point but scale somehow...

  44. You get Performance, Easy of Use , or Cost pick 2 by Anonymous Coward · · Score: 0

    Anyone who uses Java and fast performance in the same sentence hasn't been around enough heavy systems to know what the hell they're talking about. You get Performance, Ease of Use, and Cost: pick two. It's an old rule, but it still applies today. The C/C++ folks are laughing their arse(s) off at Java and performance. And if anyone is old enough to know assembler, try not to cringe too much when the Java or the C/C++ folks talk about performance.

  45. No fallacy.... by encoderer · · Score: 4, Insightful

    Huh? So, because there is only one master it is unlikely to fail?
    Yes. If you take that sentence in context, the answer is "Yes." Compared to the likelihood that one of the thousands of worker-machines will fail during any given job, it IS unlikely that the single Master will fail. Moreover, while any given job may take hours to run, it also seems that many take just moments. Furthermore, just because a job may take hours to run doesn't mean it's CRITICAL that it be completed in hours. And, at times when a job IS critical, that scenario is addressed in the preceding sentence: It is easy for a caller to make the master write periodic checkpoints that the caller can use to restart a job on a different cluster on the off-chance that a Master fails.

    If a job is NOT critical, the master fails, the caller determines the failure by checking for the abort-condition, and then restarts the job on a new cluster.

    It's not a logical fallacy, nor is it a bad design.

    For the benefit of anyone reading through, here is the paragraph in question. It follows a detailed section on how the MapReduce library copes with failures in the worker machines.

    It is easy to make the master write periodic checkpoints of the master data structures described above. If the master task dies, a new copy can be started from the last checkpointed state. However, given that there is only a single master, its failure is unlikely; therefore our current implementation aborts the MapReduce computation if the master fails. Clients can check for this condition and retry the MapReduce operation if they desire.
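    The two recovery options described in that paragraph (checkpoint and resume, or abort and let the client retry) can be sketched together in Python. This is a toy model, not Google's implementation: `run_job`, `run_with_retry`, `MasterFailure`, the dict standing in for durable checkpoint storage and the `fail_at` crash switch are all hypothetical.

```python
class MasterFailure(Exception):
    """Raised by the hypothetical job runner when the master dies."""


def run_job(tasks, checkpoint, fail_at=None):
    """Process tasks in order, recording progress in `checkpoint`.

    `checkpoint` is a dict standing in for durable storage. `fail_at`
    simulates a master crash just before the task with that index.
    """
    start = checkpoint.get("next_task", 0)
    results = checkpoint.setdefault("results", [])
    for index in range(start, len(tasks)):
        if fail_at is not None and index == fail_at:
            raise MasterFailure(f"master died before task {index}")
        results.append(tasks[index] * 2)      # stand-in for real work
        checkpoint["next_task"] = index + 1   # checkpoint after each task
    return results


def run_with_retry(tasks, max_attempts=3, fail_at=None):
    """Client-side loop: detect the abort and restart from the checkpoint."""
    checkpoint = {}
    for attempt in range(1, max_attempts + 1):
        try:
            # Only the first attempt is made to crash in this simulation.
            crash = fail_at if attempt == 1 else None
            return run_job(tasks, checkpoint, crash)
        except MasterFailure:
            continue
    raise RuntimeError("job failed on every attempt")


if __name__ == "__main__":
    print(run_with_retry([1, 2, 3, 4], fail_at=2))
```

    Dropping the checkpoint (passing a fresh dict on every attempt) gives the simpler behaviour the paper says the current implementation uses: the computation aborts and the client reruns it from the start.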
  46. Cache what you can by borkus · · Score: 1

    You can get massive savings in processing by using various caching techniques. Caching lets you save the results of one process for use later.

    1. Client side cache. Most developers shudder when they think of a web page being cached on the browser. However, some pages (like help pages, new articles) do not change in real time and can be stored on the client's browser for a few minutes. Learn how to use the HTTP caching directives to reduce the number of unique pages requested by each user.

    2. HTML Output caching. While some parts of a page may change, some elements (such as navigation elements, footers, etc) may not need to be recalculated with each page load. Many app servers let you save sections of the page so that once one user has generated them, you can reuse them again.

    3. Database Caching. Frameworks like Hibernate allow you to cache the results of SQL calls so that if the same SQL is reissued (even between different users) the cache reads the result, not the database. Usually you can pick which calls are cached versus which ones have to be live.
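    The third kind of cache can be sketched in a few lines of Python. This is a hand-rolled illustration of caching query results with a time-to-live, not Hibernate's API; `TTLCache`, `get_or_load` and the injectable `clock` are assumptions made for the example.

```python
import time


class TTLCache:
    """Cache loader results per key for a fixed number of seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so tests can fake time
        self._store = {}             # key -> (expires_at, value)

    def get_or_load(self, key, loader):
        """Return the cached value for `key`, calling `loader` on a miss."""
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]          # fresh hit: the database is not touched
        value = loader()             # miss or expired: run the real query
        self._store[key] = (now + self._ttl, value)
        return value


if __name__ == "__main__":
    cache = TTLCache(ttl_seconds=60)

    def slow_query():
        # Stand-in for a real database call.
        return [("alice", 3), ("bob", 5)]

    sql = "SELECT name, open_tickets FROM users"
    print(cache.get_or_load(sql, slow_query))   # loads
    print(cache.get_or_load(sql, slow_query))   # served from the cache
```

    Using the SQL text (plus its parameters) as the key is what lets the same query issued by different users be answered from memory until the entry expires.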

    1. Re:Cache what you can by smellotron · · Score: 1

      Frameworks like Hibernate allow you to cache the results of SQL calls so that if the same SQL is reissued (even between different users) the cache reads the result, not the database. Usually you can pick which calls are cached versus which ones have to be live.

      Also consider looking at something like memcache, which is a very fast distributed caching mechanism. You can use it to cache more than just SQL queries, too.

  47. Good book to read by georgewilliamherbert · · Score: 1

    Blueprints for High Availability, Evan Marcus and Hal Stern, second edition. http://www.amazon.com/Blueprints-High-Availability-Evan-Marcus/dp/0471430269/ref=cm_taf_title_featured?ie=UTF8&tag=tellafriend-20

    Deals with the subject of high availability from the IT side rather than programming, but anyone dealing with HA systems needs to understand these issues.

  48. Actual answers by utnapistim · · Score: 1

    Well ... I've read what others wrote here and I don't think you got many actual answers (welcome to slashdot :) ).

    While there were some (very) good points about both scalability and HA, they didn't tell you how to go about learning that; HA and HS are two areas where by reading the books or following case studies, you can understand the basic problems, but not see how you actually go about building a particularly scalable or HA system (because it's usually a system, not a single server).

    I've worked in maintenance for a C++ server, where we gave the users guarantees of both low response time under stress and minimal downtime.

    While working on that, we had to attack the problems that appeared from different angles, and usually the solutions we used were particular to the problems and couldn't be generalized very well.

    For example, when different threads used the same input data, it was better in some situations to duplicate the data for each thread instead of using locks on it, so you didn't incur the delays involved in locking. In other places, we used locking as even with locking those places wouldn't create bottlenecks.

    In some value objects (large data collections), we used the copy-on-write pattern, sharing object values within the same thread (to minimize allocations and memory fragmentation), and deep copies when the data needed to be sent to other threads.

    We split the server into multiple different servers handling the different parts of processing (so we could throw hardware at the computation-heavy parts of the application logic) and used load balancing.

    We also used in-memory databases in one case, for storing some of the information.

    Some IO operations (like logging) ran on different threads with messages passed to them (having multiple threads for logging for example).
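    As a rough illustration of handing log I/O to another thread, here is a Python sketch (the server described above was C++, so this is an analogue, not the original code). It uses the standard library's `QueueHandler` and `QueueListener`; the `make_async_logger` helper is hypothetical.

```python
import logging
import logging.handlers
import queue


def make_async_logger(name, target_handler):
    """Return (logger, listener); records are written by a background thread.

    Callers only pay for putting a record on an in-memory queue. The
    listener thread takes records off the queue and passes them to
    `target_handler`, which does the slow I/O.
    """
    log_queue = queue.Queue()
    listener = logging.handlers.QueueListener(log_queue, target_handler)
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.addHandler(logging.handlers.QueueHandler(log_queue))
    logger.propagate = False   # keep records out of the root logger
    listener.start()
    return logger, listener


if __name__ == "__main__":
    logger, listener = make_async_logger("server", logging.StreamHandler())
    logger.info("request %s handled", 42)
    listener.stop()   # drains the queue, then joins the background thread
```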

    For HA, we had a complete monitoring system, with processes listening for heartbeats and making load statistics for the different modules, and every computation part had a backup server ready to take over at certain loads.
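    The heartbeat part of such a monitor can be sketched as follows. This is a simplified Python illustration, not the system described above: `HeartbeatMonitor`, `choose_active`, the node names and the timeout are assumptions, and a real monitor would also have to deal with network partitions and with load-based takeover.

```python
import time


class HeartbeatMonitor:
    """Track the last heartbeat per node and report the ones gone silent."""

    def __init__(self, timeout_seconds, clock=time.monotonic):
        self._timeout = timeout_seconds
        self._clock = clock          # injectable so tests can fake time
        self._last_seen = {}         # node name -> time of last heartbeat

    def beat(self, node):
        """Record a heartbeat from `node`."""
        self._last_seen[node] = self._clock()

    def dead_nodes(self):
        """Return the nodes whose last heartbeat is older than the timeout."""
        now = self._clock()
        return sorted(
            node
            for node, seen in self._last_seen.items()
            if now - seen > self._timeout
        )


def choose_active(primary, backup, monitor):
    """Route to the backup only while the primary is considered dead."""
    return backup if primary in monitor.dead_nodes() else primary
```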

    My point with these examples is that none of these solutions can be applied ad hoc; each possible bottleneck has to be studied separately, and the solutions to avoid it can vary depending on a lot of factors (while you can get ideas, you won't learn scalability or HA from a book).

    The best solution I can think of for learning about HS and HA is working on a product that needs them and getting direct experience in improving its scalability and availability.

    --
    Tie two birds together: although they have four wings, they cannot fly. (The blind man)
  49. Re:You get Performance, Easy of Use , or Cost pick by Wdomburg · · Score: 1

    It really depends on the application. We're handling hundreds of concurrent connections and millions of connections a day per server, with CPU utilization averaging 43.09% and never exceeding 63.47%. If you subtract time spent waiting for I/O, the average drops to 9.89% and the peak to 23.24%.

    Performance bottlenecks often lie in the disks and network, not in the application.

  50. Classics by Anonymous Coward · · Score: 0

    The C10K problem
    http://kegel.com/c10k.html

    High-Performance Server Architecture:
    http://pl.atyp.us/content/tech/servers.html

    (Note that this refutes http://ask.slashdot.org/comments.pl?sid=277739&cid=20331189 because this emphasizes that you should do as much work in one thread as possible in order to avoid redundant context switches. Thus don't separate threads based on role where one task must be picked up by many threads to complete -- instead, each thread should be able to execute any sequence through the state machine for any client.)

    Maybe not a classic, but still nice (see its references):
    Threads, Tasks, Coroutines, Processes, and Events:
    http://shlang.com/writing/threads-tasks.html

  51. Impressive by Gazzonyx · · Score: 1
    From what you and Raven64 have said, my interest in Erlang is piqued! I think I'll give it a shot when I have some downtime this semester (I'm always hearing "try Ruby, lisp, perl, $silverBulletLanguage", and I just don't have the time). I just have one question about what you've said:

    Suddenly, applications become trees of processes pitching data back and forth in messages.
    We aren't talking a win32 style message pump kind of message passing mechanism, are we? I truly can't stand the message pump in win32 - it always feels like such a 'hack'; I don't have a better solution, though, so I've been waiting for a better form of IPC. Yet Raven64 said there is no shared memory, so I'm confused on how the message passing happens. Hooks? By value, but not by reference (hence no shared memory)? Or are we talking recursion style process trees where the parent sleeps until the child scopes out?


    Thanks for your reply, I think I'll give it a shot!

    --

    If I mod you up, it doesn't necessarily mean I agree with what you've said, sorry.

    1. Re:Impressive by stonecypher · · Score: 3, Informative

      We aren't talking a win32 style message pump kind of message passing mechanism, are we?
      No. Many people have also raised the question of whether MSMQ, the new OS-level messaging service, is modelled on Erlang's; again, the answer is no.

      The problem is, it's hard to explain why. The overhead of using things like that is tremendous; Erlang's message system is used for quite literally all communication between processes, and a system like Windows Events or MSMQ would reduce Erlang applications to a crawl. Erlang uses an ordered, staged mailbox model, much like Smalltalk's. If you haven't used Smalltalk, then frankly I'm not aware of another parallel.

      It's important to understand just how fundamental message passing is in Erlang. Send and receive are fundamental operators, and this is a language that doesn't have for loops, because it thinks they're too high level and unspecific (you can make them yourself; I know, that must sound crazy, but once you get it, it makes perfect sense.)

      I truly can't stand the message pump in win32 - it always feels like such a 'hack'; I don't have a better solution, though, so I've been waiting for a better form of IPC.
      You're about to see a completely different approach. I'm not saying it's the best, or the most flexible, but I really like it, and it genuinely is very different. What Erlang does can relatively straightforwardly be imitated with blocking and callbacks in C, but that involves system threads, and then you start getting locking and imperative behavior back, which is one of the things it's so awesome to get rid of (imagine - no more locks, mutexes, spin controls and so forth. Completely unnecessary, in workload, in debugging and in CPU time spent. It's a huge change.)

      Really, it's a whole different approach. You've just got to learn it to get it.

      Yet Raven64 said there is no shared memory, so I'm confused on how the message passing happens.
      No, I said that. I wrote some code to help explain it to you, though of course slashdot's retarded lameness filters wouldn't pass it, so I put it behind this link. Sorry it's not inline.

      Hopefully that will help. Sorry about the lack of whitespace; SlashDot's amazingly lame lameness filter is triggering on clean, readable code.
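      For readers without Erlang at hand, here is a rough Python analogue of the mailbox idea. It is not the code linked above, and it is far heavier than Erlang's processes (it uses OS threads and `queue.Queue`); `spawn` and `counter` are hypothetical names. The only way to touch the counter's state is to send it a message, so the application code needs no locks of its own.

```python
import queue
import threading


def spawn(behaviour, *args):
    """Run `behaviour(mailbox, *args)` in its own thread; return the mailbox."""
    mailbox = queue.Queue()
    thread = threading.Thread(
        target=behaviour, args=(mailbox, *args), daemon=True
    )
    thread.start()
    return mailbox


def counter(mailbox):
    """A process that owns `total`; nothing else can read or write it."""
    total = 0
    while True:
        message = mailbox.get()          # blocking receive, in arrival order
        if message[0] == "add":
            total += message[1]
        elif message[0] == "get":
            message[1].put(total)        # reply to the sender's mailbox
        elif message[0] == "stop":
            return


if __name__ == "__main__":
    box = spawn(counter)
    box.put(("add", 2))                  # send; returns immediately
    box.put(("add", 3))
    reply = queue.Queue()
    box.put(("get", reply))
    print(reply.get())
    box.put(("stop",))
```

      Messages here are immutable tuples by convention; Python would happily pass a shared mutable object by reference, which is exactly what Erlang rules out.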
      --
      StoneCypher is Full of BS
  52. An article I wrote last year may help you by wtarreau · · Score: 1

    Hi,

    an article I wrote last year about application scaling using load balancing may help you. It will not solve your problems but will certainly help you with the concepts, best practises and traps to avoid, which is a good starting point.

    You can read it online here: http://1wt.eu/articles/2006_lb/ or you can download it as a PDF here:
    http://www.exceliance.fr/en/ART-2006-making%20applications%20scalable%20with%20LB.pdf

    Also, what you need is to perform benchmarks frequently throughout the development cycle of your applications. Using
    traffic generators, you will simulate a lot of users and see how your application/database behaves. And believe me, it
    never breaks where you expected it to! On the first run, it's almost always caused by too much memory usage.
    Then you optimize it (decrappify it in fact), then you break on concurrency (threads, processes, file descriptors,
    sockets, ...), then you optimize it again and break on I/O in the database. Then you have to rewrite all your requests,
    and when you finally saturate the frontend servers with 1% of your target load, you realize that you have to rewrite
    everything using a faster language. But at least, you will be able to save time by starting on a few cheap servers, for
    the time needed to translate the code for version 2.
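    A traffic generator does not have to be elaborate to be useful. The Python sketch below fires a fixed number of calls at a target from several threads and reports latency figures. It is a toy, not a replacement for a real load-testing tool: `run_load` is a hypothetical helper, and `request_fn` stands in for whatever issues one request against your application.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_load(request_fn, concurrency, total_requests):
    """Call `request_fn` `total_requests` times from `concurrency` threads.

    Returns the request count and the median, 95th-percentile and maximum
    latency in seconds.
    """
    if total_requests < 1 or concurrency < 1:
        raise ValueError("need at least one request and one worker")

    def timed_call(_):
        start = time.perf_counter()
        request_fn()
        return time.perf_counter() - start

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = sorted(pool.map(timed_call, range(total_requests)))

    p95_index = min(len(latencies) - 1, int(len(latencies) * 0.95))
    return {
        "count": len(latencies),
        "median": latencies[len(latencies) // 2],
        "p95": latencies[p95_index],
        "max": latencies[-1],
    }


if __name__ == "__main__":
    # Stand-in for a real request: sleep for one millisecond.
    stats = run_load(lambda: time.sleep(0.001), concurrency=8, total_requests=200)
    print(stats)
```

    Running the same test at increasing concurrency levels and watching where the 95th percentile starts to climb is one simple way to find the first breaking point.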

    You talked about 60000 users. If it's a population of 60000 users, it's not much. If it's 60000 concurrent users, it's
    a huge load and you will have to educate yourself in network and operating system tuning, because tuning the app alone
    will not be enough.

    Good luck!
    Willy

  53. Re:You get Performance, Easy of Use , or Cost pick by Anonymous Coward · · Score: 0

    How many of those apps are running a Java backbone? Sure, Java is not the sole cause of the world's slowdown in performance, but don't make it into something that it's not (i.e. a performance speed demon). The heavy lifting is not done in Java, and there are a lot of reasons why. (If people are missing this one, just what the hell are they teaching in the CS departments in colleges these days anyway? This is basic stuff, no-brainer material.) Time to grow up and move on.

  54. Databases and Streaming Architectures by Anonymous Coward · · Score: 0

    I would recommend familiarity with streaming architectures, which provide asymptotically better memory usage than the traditional store-and-forward model used in most J2EE and .NET applications. This is especially important if you will be sending large datasets to clients. These two articles outline streaming architecture, including both theory and practical implementations with performance results and analysis.

    Another point: for enterprise applications, it generally pays off better to focus on tuning the database tier, in my experience. If that's true in your case, understanding SQL optimizations, lock optimizations, and the various types of table indexes (e.g. clustering indexes) would be your best option.

  55. Tolerance of delays? by spagetti_code · · Score: 1


    The users of our apps are business professionals who are forced to use them, so they are are more tolerant of access times being a second or two slower than they could be.


    Actually, being forced to use your app doesn't make them more tolerant of delays. It makes *you* more tolerant because your users can't go away. They still hate the delays.

    1. Re:Tolerance of delays? by Anonymous Coward · · Score: 0

      Now if only those Indian fuckers could understand that... (sorry, I'm bitter)

  56. Re:You get Performance, Easy of Use , or Cost pick by Wdomburg · · Score: 1

    Erm, quite a few of them. Little site called eBay, for example, who migrated from a C++ implementation to Java in 2002. I happen to know one of the top ISPs in the country will be migrating about 20 million mailboxes to Java mailstores in the near future.

    When people think scalability has much to do with what language an application is written in, I start suspecting they've never worked in a real data center before.

  57. Re:You get Performance, Easy of Use , or Cost pick by Anonymous Coward · · Score: 0

    Like I said before, you get Performance, Ease of Use, or Cost: pick 2. So if they opted for Performance and Ease of Use in their BACKBONE, then by the rule of thumb of the last 30 years, their costs are too high for their operations. (But I guess when you have billions, costs don't matter in the short term, though the stockholders might want to look into operational efficiency, since it affects ROI.)

  58. Re:You get Performance, Easy of Use , or Cost pick by Wdomburg · · Score: 1

    So did you forget to make an actual argument, or do you just not have one?

  59. Really only 2 things to think about at the base by Shados · · Score: 2, Informative

    Task queuing to deal with server downtimes, and horizontal scalability.

    The first is handled by just about any messaging/queue system. J2EE has had one for ages, Microsoft has MSMQ, which it recently (better late than never... ::SLAPS::) integrated directly into .NET via WCF, and there are others. In its simplest form, you really just send your jobs to a "queue" and have automated processes pick 'em up and handle 'em. If the processes go down, they'll just handle them when they get back up, so even a whole database server farm going down at the same time won't make you lose queued-up requests. Nifty (it of course gets more complicated than that, but the basic scenarios can be learned by following an internet tutorial).
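    The core of that behaviour (a job is only removed once it has been handled) fits in a short Python sketch. This is an in-memory toy, not MSMQ or JMS: `JobQueue`, `send` and `drain` are hypothetical names, and a real queue would persist the jobs to disk so they also survive a crash of the queue itself.

```python
from collections import deque


class JobQueue:
    """In-memory stand-in for a durable message queue."""

    def __init__(self):
        self._pending = deque()

    def send(self, job):
        """Enqueue a job; the sender does not wait for it to be handled."""
        self._pending.append(job)

    def __len__(self):
        return len(self._pending)

    def drain(self, handler):
        """Hand jobs to `handler` in order; return how many were handled.

        A job is removed only after the handler succeeds. If the backend
        is down (the handler raises ConnectionError), the job stays at the
        head of the queue for the next attempt.
        """
        handled = 0
        while self._pending:
            job = self._pending[0]        # peek; do not remove yet
            try:
                handler(job)
            except ConnectionError:
                break                     # backend is down; retry later
            self._pending.popleft()       # acknowledge only after success
            handled += 1
        return handled
```

    The worker process simply calls `drain` again once the backend is reachable, so nothing sent during the outage is lost.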

    Then horizontal scaling. Why horizontal? Because just taking a random new box and plugging it into the network is easier and faster (especially in case of emergency) than having to take servers down to upgrade them (vertical scaling). It also adds redundancy, so the more servers you add to your farm, the less likely your system will go down. There are documents on it all over (Microsoft Patterns & Practices has some on their web site, non-MS documentation is hard to miss if you google for it, and many third parties will be more than happy to spam you with their solutions), but it really just comes down to: "Use an RDBMS that handles clustering and table partitioning, use distributed caching solutions, push as much stuff as you can to the client side (stuff that doesn't need to be trusted only!), and make sure that nothing ever depends on resources that can only be accessed from a single machine (think local flat files, in-process session management, etc.)".

    With that, no matter what goes down, things go on purring, and if someone ever bitches that the system is slow, you just buy a $1000 server, stick a standard pre-made image on the disk, plug it in, have fun.

    Oh, and fast network switches are a must :)

    1. Re:Really only 2 things to think about at the base by Anonymous Coward · · Score: 0

      In short: business apps are generally I/O-bound, not CPU-bound. Throwing more processors at the problem WON'T solve it.

  60. Re:Read the following document. by Anonymous Coward · · Score: 0

    > 1: First of all _drop_ Java. (when are people going to learn... *sigh*)

    Sounds like something a 14 year old teenybopper stuck in his mom's basement might say. Dude, where have you been the last 10 years. Java PWNS the server side.

    > 2: Read http://h30163.www3.hp.com/NTL/view/?id=090015ea800 98885&p=880-8bf/090015ea80098885/TPCONTNT.pdf&toc= y

    Thanks for sending us to some BS terms of use page...loser.

    > 3: Use asynchronous IO

    Uh, dude, that'll be built into the app server in most cases.

    > 4: If at all possible, stay away from threads.

    All enterprise apps are multi-threaded you fool. Threading though, is best left to the container, not to business logic developers.

    Please, do us a favor, and stay FAR, FAR, FAR away from enterprise computing.

  61. Thank you! by Anonymous Coward · · Score: 0

    Yaws & MNesia & Erlang => thank you! You just made my life a lot brighter.

  62. Some enlightenment by GBuddha · · Score: 1

    The response to this question proves what I've suspected for a long time - Many nerds/geeks/techies who post on slashdot have strong opinions about technology and almost everything else and will go to great lengths to prove that they are right. Finally, there's a question that requires actual knowledge about technology to answer it (not just opinions) and look at the number of posts rated 5 (or even the total number of posts).

  63. Re:You get Performance, Easy of Use , or Cost pick by Anonymous Coward · · Score: 0

    Sorry, this is one of my beefs with slashdot.

    There are a lot of smart people here who are speaking outside of their fields of expertise. I am not one of those people on this topic. I gave you enough clues to end this argument several comments back. To me this is a basic argument that is normally resolved in the 3rd or 4th year of CS in college. Unfortunately you haven't accepted the simple overview. I have used Java since 1998 (and been in the IT business a lot longer). I have a full command of the Java language and many others as well. So I am informed on what I am speaking about.

    If you are really interested in the subject I recommend a good 4-year program in Computer Science (Berkeley is one of the best). If you are unable to take that route, I recommend a good computer club (HAL-PC is the best in the U.S.); much of their stuff is online.

    Some things to ponder in your search for IT knowledge in this area are:

    1.) Why don't they replace or add user functionality to the Cisco high-speed routers in Java (or Ajax if you prefer)?
    2.) Why don't they allow high-speed database functions to be added in Java (or Ajax if you prefer)?
    3.) In viewing high-speed and/or high-availability architectures, what are the limitations and risks of having an interpreted language in the main path of the high-speed data flows?

    (This should be enough to get you started.)

    ========
    ========

    Better questions for the uninitiated are:
    1.) What is the performance envelope of Java/Ajax?
    2.) What are Java's strong points and weaknesses?
    3.) What is the difference between Java Beans, Java Applets and Java Applications?
    4.) What is a Java Server?
    5.) etc...

    The answers to these questions will lead you away from Java as a high-speed solution in most cases.

    High Availability (HA) is a different discussion. Rapid Application Development (RAD) is also a different discussion. However, Java/Ajax can be a very viable solution in these situations.

    ========
    ========

    (A side note:)
    Also, the Slashdot moderators are for the most part very good writers, but are almost totally clueless about the technical issues, and wouldn't know when to mod up a brief technical comment if it jumped up and bit them in the arse. Thus the posting as AC.

  64. Apprentice with a Technical Architect by Anonymous Coward · · Score: 0

    Apprentice with a Technical Architect if you want to learn about HA - High Availability.
    That's very different than scalability, but a good Tech Arch will know and be able to explain best practices for scalable apps too.

    As a technical architect, these are the types of problems I work on daily. From network design, network load balancers, to web server hardware, to application server architecture and clustering to app server hardware to DB server clustering and redundant SAN storage design.

    Then add in remote replication at least 200 miles away and disaster recovery systems in alternate locations. Don't forget about local and off-site backups. Some systems need to follow the sun around the world every day with the primary system and replicated secondary systems following. These are for 20,000 plus users and they aren't trivial web sites. These are complex, data-driven MUST HAVE apps for my business with almost a million transactions a minute.

    It can't be down by accident - ever. Well, if it is, I'm fired. My two envelopes are already written.

    Oh, and setting the budget for all the software, servers, storage, and networking for all this is also my job.

    My biggest complaint with developers is their lack of knowledge about the capabilities of the operating system they are writing for. Java people are the worst. Seems that about 90% of them have bought into the "I don't need to know the OS" BS. Read a little about the different CPUs and don't try to tell me that you are CPU bound, so porting from C to Java and from Windows to Solaris on SPARC will be better. That's just stupid.

    Oh, did I mention I was a developer for 15 years first and a sysadmin for 5 years? Watch out for Tech Archs without any hands-on experience.

  65. False Dichotomy-Re:Availability Isn't Scalability by Anonymous Coward · · Score: 0

    The OP sounds valid to me. If a system is not scalable, availability can suffer as a direct result when workloads increase.

    I don't think scalability and availability form a dichotomy.

  66. stateless component pools by MikeFM · · Score: 1

    Unless you have an app that needs to be very tightly written I think the easiest way to write a scalable app is just to break the app down into components that can each be on their own server or duplicated across multiple servers. If each component isn't keeping state for itself then it doesn't matter which copy of a given component you make a request on so you can split tasks between copies with simple load balancing techniques. This also helps keep your application code clean as it makes you keep your components discrete.
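    Why statelessness makes load balancing simple can be shown with a small Python sketch. The names here (`RoundRobinPool`, `make_replica`, `price_with_tax`) are hypothetical, and the "replicas" are local functions standing in for copies of a component on different servers.

```python
import itertools


def price_with_tax(amount_cents, rate_percent):
    """Stateless component: the result depends only on the arguments."""
    return amount_cents + amount_cents * rate_percent // 100


def make_replica(name):
    """Build one copy of the component, tagged with the server it runs on."""
    def replica(amount_cents, rate_percent):
        return name, price_with_tax(amount_cents, rate_percent)
    return replica


class RoundRobinPool:
    """Send each request to the next replica in turn."""

    def __init__(self, replicas):
        replicas = list(replicas)
        if not replicas:
            raise ValueError("need at least one replica")
        self._cycle = itertools.cycle(replicas)

    def call(self, *args):
        replica = next(self._cycle)
        return replica(*args)


if __name__ == "__main__":
    pool = RoundRobinPool([make_replica("app-1"), make_replica("app-2")])
    for _ in range(3):
        print(pool.call(1000, 8))
```

    Because no replica remembers anything between calls, the answer is the same whichever copy serves the request, and adding capacity means adding one more entry to the pool.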

    I prefer to create a pool of each component in virtual machines and then let VMware manage the cluster of virtual machines - moving the VMs to the appropriate physical machines as needed to keep performance at peak. I'd suggest one copy of each component per physical machine you have. A lot easier than trying to measure everything and guess where problem areas will be (of course using both methods together is better).

    I let my components communicate with each other using XML-RPC. You can use SOAP but I find XML-RPC to be more lightweight and easy to use. Individual components can be written in different languages and even run on different operating systems. Just use whatever tools make the most sense for each component. This is especially good with web apps as it makes it easy to create alternative interfaces. The web-UI becomes a thin, to-the-point, component and it's easy to write alternative UIs for mobile devices, desktop applications, command-line access, etc.
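
    For a feel of how little code XML-RPC needs, here is a small sketch using Python's standard xmlrpc modules. The lookup_user function and its return value are invented for the example, and server and client run in one process only to keep it self-contained; normally they would be separate components on separate machines.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


def lookup_user(user_id):
    # Hypothetical component method. It returns only plain types
    # (dict, int, str) that XML-RPC knows how to marshal.
    return {"id": user_id, "name": "user-%d" % user_id}


# Port 0 asks the OS for any free port; logRequests=False keeps it quiet.
server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
server.register_function(lookup_user, "lookup_user")
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# The caller could be written in any language with an XML-RPC library.
proxy = ServerProxy("http://127.0.0.1:%d/" % port)
user = proxy.lookup_user(42)

server.shutdown()
server.server_close()
```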

    The only real blocking point is when you need to work with large amounts of data. You can use a clustering db to spread the load across multiple machines. You can also break database use up into its own logical components. If two tables aren't logically connected then there is no reason you can't put those tables in different databases on different physical machines. Some basic tactics such as having one database for data crunching and another for caching and storing session data can be pretty effective. You've already split your application into components so it isn't too hard to see that those different components don't usually need to do everything in a single database.
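
    A rough sketch of splitting unrelated tables across databases. In-memory SQLite connections stand in for databases on different physical machines, and the area names, tables and helper functions are all hypothetical.

```python
import sqlite3

# One database per logical area. With real servers these would be
# connections to different hosts instead of in-memory SQLite.
DATABASES = {
    "sessions": sqlite3.connect(":memory:"),
    "reports": sqlite3.connect(":memory:"),
}

DATABASES["sessions"].execute(
    "CREATE TABLE session (token TEXT PRIMARY KEY, user_id INTEGER)"
)
DATABASES["reports"].execute(
    "CREATE TABLE sale (user_id INTEGER, amount INTEGER)"
)


def db_for(area):
    # Components ask for a connection by logical area, never by host name,
    # so an area can be moved to another machine without touching callers.
    return DATABASES[area]


def start_session(token, user_id):
    db_for("sessions").execute(
        "INSERT INTO session VALUES (?, ?)", (token, user_id)
    )


def record_sale(user_id, amount):
    db_for("reports").execute(
        "INSERT INTO sale VALUES (?, ?)", (user_id, amount)
    )


def total_sales(user_id):
    row = db_for("reports").execute(
        "SELECT COALESCE(SUM(amount), 0) FROM sale WHERE user_id = ?",
        (user_id,),
    ).fetchone()
    return row[0]


start_session("abc", 7)
record_sale(7, 150)
record_sale(7, 50)
```

    The catch is that you can no longer join session and sale in one query, which is why this only suits tables that aren't logically connected.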

    Being broken into components running in their own virtual machines makes it pretty easy to address availability too. Component pools can detect when an individual component is having issues, isolate it, restart it, replace it, alert an admin, etc.
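
    The detect / restart / isolate / alert cycle could look something like the sketch below. The Component and ComponentPool classes are invented stand-ins; a real pool would ping VMs over the network and page an admin instead of appending to a list.

```python
class Component:
    """Hypothetical stand-in for one component instance in a VM."""

    def __init__(self, name):
        self.name = name
        self.healthy = True
        self.restarts = 0

    def ping(self):
        return self.healthy

    def restart(self):
        self.restarts += 1
        self.healthy = True


class ComponentPool:
    def __init__(self, components, max_restarts=2):
        self.active = list(components)
        self.isolated = []
        self.alerts = []
        self.max_restarts = max_restarts

    def check(self):
        # Iterate over a copy because failing components get removed.
        for comp in list(self.active):
            if comp.ping():
                continue
            if comp.restarts < self.max_restarts:
                # First response to trouble: try a restart.
                comp.restart()
            else:
                # Restarts did not help: pull it from rotation and alert.
                self.active.remove(comp)
                self.isolated.append(comp)
                self.alerts.append(
                    "%s isolated after %d restarts" % (comp.name, comp.restarts)
                )


pool = ComponentPool([Component("a"), Component("b")], max_restarts=1)
pool.active[1].healthy = False
pool.check()  # "b" is restarted and stays in the pool
```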

    --
    At what price learning? At what cost wisdom? The price is a man's peace of mind, and the cost is his life.
    1. Re:stateless component pools by tweek · · Score: 1

      I've been so busy dealing with a cluster at the office that I totally neglected my slashdot!

      In response to your post and the one right below it: yes, all of these things are valid. In fact we use one method at my current company (independent components across multiple servers with discrete stateless transactions) and used another at a previous company (partitioning database usage across different data sources - login traffic goes to one data source, session traffic to another, ad infinitum).

      I'm familiar with all these techniques but was not trying to get into advanced cluster techniques for the OP. Based on what he said, I was trying to simply provide the Yugo upgrade path that would buy him some time. It was more of a "keep this basic concept in mind when you want a cheap scaling path beyond the bigger hardware route".

      I didn't even want to get into the nuances and semantics around the phrase "high availability" or "cluster". Here's a hint - ask a JPL guy what a cluster is and then ask a corporate IT administrator what a cluster is.

      --
      "Fighting the underpants gnomes since 1998!" "Bruce Schneier knows the state of schroedinger's cat"
  67. Scale Up ... & Down by 1110110001 · · Score: 1

    You say you want a scaling application. But in the next sentence you only speak about scaling up. Scaling goes both ways and you want your application to go both ways. That's important. Otherwise you might be stuck with those ten servers you just needed for a usage peak.
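
    To make "both ways" concrete, a tiny sketch of a sizing rule that can return a negative number. The users-per-server capacity is an assumed figure you would have to measure yourself, and the function names are made up.

```python
import math


def servers_needed(concurrent_users, users_per_server, minimum=1):
    # users_per_server is an assumption; measure it for your own app.
    return max(minimum, math.ceil(concurrent_users / users_per_server))


def plan(current_servers, concurrent_users, users_per_server):
    # Positive result: add that many servers.
    # Negative result: that many servers can be released.
    target = servers_needed(concurrent_users, users_per_server)
    return target - current_servers


peak_change = plan(2, 900, 100)       # during the usage peak
after_peak_change = plan(10, 90, 100) # once the peak is over
```

    If nothing ever acts on the negative result, you keep paying for the peak forever.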

  68. I guess that makes sense. by FatSean · · Score: 1

    It's just that where I work, one single Master server is NOT 'good enough' and this solution would be laughed out of the meeting. Multiple masters in different physical locations with automatic failover would be the baseline. I guess I'm not seeing it from their perspective.

    --
    Blar.
  69. What is High Availability to you? by Anonymous Coward · · Score: 0

    "What is high availability to you?" That is the question HP posed in a Service Guard class I was in once. It's a valid question though. I work with mission critical hospital systems in health care and deal with high availability on a medical hosting service. This means in my particular environment, we need 24/7 operation with minimum or no downtime (
    Linux HA Project
    IBM HACMP (High Availability Clustered Multi-processing)
    HP Service Guard for Linux (also available on HPUX)
    Oracle RAC (Real Applications Cluster)

    Those are some areas to start with. If you are doing Oracle, you can create a GRID compute environment which will allow for true clustering and load balancing with Oracle in a shared environment with SAN. One thing to keep in mind is that a SAN is required for most clustering. RedHat also offers the GFS filesystem, which is a true proven clustered filesystem. There is another called GPFS which has been used cross platform as well, but requires licensing.

    When it comes to redundant hardware for HA, make sure you support the minimum requirements for heartbeat paths depending on what clustering solution you want. If you use HACMP or Service Guard, you will likely use a SAN HB and at least 1 redundant network path. Also when using a SAN, use multiple HBAs to provide redundancy with multi-path software such as dm-multipath (Linux), Securepath (UNIX), HDLM (Unix), MPIO (IBM UNIX), SDD (IBM UNIX). There are plenty of documents on how to do HA under various environments. I recommend looking at some of the IBM redbooks on HACMP and on Clustering. They also have redbooks for Oracle tuning on Linux with POWER, which will give you an idea about how to do Linux Oracle clusters. If you can create an Oracle Metalink account, you can find out some of the tuning and detailed info about Oracle clusters.

    I am sure there are others I am missing, but that covers the base for most clusters. The only other thing is finding a persistent messaging platform (like IBM Websphere MQ - MQ Series) to handle message passing in applications. IPC is good under UNIX for programming, but not as good with clusters, security, or transaction guarantees.

    The only other thing to remember is cost. HA environments do incur costs higher than small unreliable environments. Things like mirrored drives, redundant HBAs, redundant power supplies and power feeds, redundant NICs, etc. People worry about petty things like how likely drives are to fail, etc. If you architect your environment properly and build your clusters, you build around that. RAID 5 on your SAN, redundant cards, fault tolerant hardware, better reporting mechanisms (HP and IBM integrate daemons on all their OS's to report potential hardware failures with mid-range to high end servers). Look at what your SLA is and what you have to provide and then look for the best, most reliable hardware and software to fit in your budget to provide that. Not everyone can buy millions in hardware and software to run a true mission critical environment.
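
    One quick way to read an SLA is to turn the availability percentage into a downtime budget. This is plain arithmetic, not anything vendor specific; the function name is made up and leap years are ignored.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # ignoring leap years


def allowed_downtime_minutes(availability_percent):
    # Whatever is not covered by the availability target is the
    # yearly budget for outages, planned or not.
    return MINUTES_PER_YEAR * (100 - availability_percent) / 100


two_nines = allowed_downtime_minutes(99.0)     # about 3.65 days a year
three_nines = allowed_downtime_minutes(99.9)   # under 9 hours a year
five_nines = allowed_downtime_minutes(99.999)  # a little over 5 minutes a year
```

    Each extra nine cuts the budget by a factor of ten, which is roughly where the extra hardware and software money goes.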

  70. The users by jgrahn · · Score: 1

    The users of our apps are business professionals who are forced to use them, so they are are more tolerant of access times being a second or two slower than they could be.

    I realize that this is just background to the question, but it strikes me as an odd thing to say:

    • Being forced to do something doesn't mean you tolerate it.
    • Do you realize how stressful it is to be forced to work with slow applications? How harmful it is to your work?

    I have been the victim of numerous really crappy internal applications. It makes me mad, because it shows a lack of respect for my work. What tends to happen in practice, if the users are technically minded, is that they don't use the applications as intended, and invent some primitive system on the side. Excel sheets, and so on.

    And the people who support the real system are usually happily unaware of this. They live in a fantasy land, where things work fairly well and the users are pleased.