Can .NET Really Scale?
swordfish asks: "Does anyone have first-hand experience with scaling .NET to support 100+ concurrent requests on a decent 2-4 CPU box with web services? I'm not talking about a cluster of 10 dual-CPU systems, but a single system. The obvious answer is 'buy more systems', but what if your customer says, 'I only have 20K budgeted for the year'? No matter what Slashdot readers say about buying more boxes, try telling that to your client, who can't afford anything more. I'm sure some of you will think, 'what are you smoking?' But the reality of current economics means 50K on a server for small companies is a huge investment. One could argue 5 cheap systems for 3K each could support that kind of load, but I haven't seen it, so inquiring minds want to know!"
"OK, I've heard different opinions on whether or not .NET scales well, and I've been working with it for the last 7 months. So far, from what I can tell, it's very tough to scale, for a few different reasons:
- Currently there isn't a mature messaging server, and MSMQ is not appropriate for a high-load messaging platform.
- SOAP is too damn heavyweight to scale well beyond 60 concurrent requests on a single-CPU 3 GHz system.
- SQL Server doesn't support C# triggers or a way to embed C# applications within the database.
- The throughput of SQL Server is still around 200 concurrent requests for a single- or dual-CPU box. I've read the posts about the Transaction Processing Performance Council, but get real: who can afford to spend 6 million on a 64-CPU box?
- The clients we target are small-ish, so they can't spend more than 30-50K on a server. So where does that leave you in terms of scalability?
- I've been running benchmarks with dynamic code that does quite a bit of reflection, and the performance doesn't impress me.
- I've also compared the performance of a static ASP/HTML page to a web service page, and the throughput goes from 150-200 to about 10-20 on a 2.4-2.6 GHz system.
- To get good throughput with SQL Server you have to use async calls, but what if you have to do sync calls? From what I've seen the performance isn't great (it's OK), and I don't like the idea of setting up partitions. Sure, you can put mirrored RAID on all the DB servers, but that doesn't help me if a partition goes down and the data is no longer available.
- I asked a MS SQL Server DBA about real-time replication across multiple servers and his remark was "it doesn't work, don't use it."
If they're that strapped for cash they should be looking at open source.
... but Unix/Java programmers aren't. Wanting to write the code for free, too?
Apache, FreeBSD and a cluster of 10 or so $1k servers and a nice DB server running PostgreSQL.
Works for me.
Why on earth did you bring open source into it? If the man wanted to know about Linux & BSD, he would've asked.
It's a damn simple question: can .NET really scale?
If you don't have any experience with the scalability of .NET, I advise you to keep your mouth shut. The signal/noise ratio is bad enough already.
My first inclination is to recommend throwing that $20k at an ASP that can provide the server infrastructure to give you support for 100 concurrent connections.
Barring that, my recommendation would be to split the web front end and database, spending about $10k on each (using Dell or HP). I can almost guarantee that you aren't going to get 100 concurrent connections for less than $80k to $100k without doing some sort of load distribution. If you strip down the amount of dynamic content and, say, script a refresh of a static page, you might be able to do it, but we don't really know what the app is going to be doing.
Jerry
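The "script a refresh of a static page" idea can be sketched roughly as below, assuming the dynamic part can be rendered to a file on a schedule (cron, Task Scheduler, or similar) and served as a plain file in between. The `(name, value)` row format and any file paths are hypothetical; the point is only that the expensive work runs once per refresh instead of once per request.

```python
import html
import os
import tempfile
from datetime import datetime, timezone


def render_page(rows):
    """Render (name, value) rows from the dynamic backend into static HTML."""
    items = "\n".join(
        f"<li>{html.escape(str(name))}: {html.escape(str(value))}</li>"
        for name, value in rows
    )
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S UTC")
    return f"<html><body><ul>\n{items}\n</ul><p>Generated {stamp}</p></body></html>\n"


def publish_static(page, path):
    """Write the page to a temp file beside `path`, then swap it into place,
    so the web server never serves a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(page)
        os.replace(tmp_path, path)  # single-step rename; atomic on POSIX filesystems
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

A scheduled job would call something like `publish_static(render_page(rows), "index.html")` every N seconds, where `rows` comes from whatever query the app already runs; N is the staleness the client is willing to accept.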
I don't really know an answer but I will throw in my tidbit.
But first let me apologize for all the nutheads who say "drop MS - use Linux" and all the derivatives thereof. That doesn't help anyone, and doesn't answer the question. Might as well say "use a dustmop, works great on my floors!".
My advice would be to *try* and use a cluster of some sort instead of the one server approach. Sure, you can get some great big reliable iron - that is wicked fast... But what I have found is that scaling really needs more *bandwidth*. Not network bandwidth but memory, disk, I/O, that sort of bandwidth. Of course, the more machines - the more licenses... Good luck!
This entire story is lacking units... I am so confused. It is like this:
"I bought a 400 car from my dealer, who said it could go 0-1200 in 57, but I talked to an auto mechanic and he said that the rpm throttled at 4.5 billion, so I don't know if I should get a turbo charger which would at least boost the speed to 1295!!"
If you are talking about 100 concurrent requests per second: any DB worth its salt should handle that IFF the database queries aren't too complex. If they are, your schemas suck. This is doubly true on a 3 GHz machine.
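The units complaint can be made concrete with Little's law, L = λW: the average number of requests in flight (L) equals the arrival rate (λ, requests per second) times the average time each request spends in the system (W). A minimal sketch; the response times below are illustrative assumptions, not figures from the thread.

```python
def throughput_needed(in_flight, avg_response_s):
    """Little's law L = lambda * W, solved for lambda (requests per second)."""
    if avg_response_s <= 0:
        raise ValueError("average response time must be positive")
    return in_flight / avg_response_s


def in_flight_at(rate_per_s, avg_response_s):
    """Little's law solved for L: average number of requests being served at once."""
    return rate_per_s * avg_response_s


# Illustrative response times, not measurements from the thread.
for w in (0.05, 0.25, 1.0):
    print(f"100 concurrent at {w:.2f} s each -> {throughput_needed(100, w):.0f} req/s; "
          f"100 req/s at {w:.2f} s each -> {in_flight_at(100, w):.0f} in flight")
```

At a quarter-second response time, "100 concurrent" means 400 requests per second, while "100 per second" means only about 25 in flight, which is why the two readings of the question lead to very different hardware answers.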
2. SOAP is too damn heavy weight to scale well beyond 60 concurrent requests for a single CPU 3ghz system.
It doesn't sound like you're talking about .NET specifically, but just SOAP in general. Make sure you separate out the platform from the product. Saying web services with SOAP won't work is a long way from saying .NET doesn't scale.
3. SQL Server doesn't support C# triggers or a way to embed C# applications within the database
Embedding applications in the database violates basic scaling principles: you need to separate out into n-tier, right? You don't want the database server doing anything but serving databases. Now, having said that, Yukon (the next version of MS SQL) will indeed let you do certain things in the database with .NET languages, but that's rarely going to be a way to make your system run faster and scale more. Plus, I'm confused: what's your alternative? What database are you going to recommend that allows you to embed C# (C++, whatever) programs in the database itself?
9. I asked a MS SQL Server DBA about real-time replication across multiple servers and his remark was "it doesn't work, don't use it."
Sounds like it's time to get a more informed consultant who can demonstrate failure or success beyond a throwaway line. I'm not saying replication does or doesn't work, but you can't base your enterprise plans on a single line from a single guy, let alone strangers like me on Slashdot. Furthermore, this isn't a .NET question, it's an SQL question.
It's easy to make big decisions if you break them up into a series of smaller ones. Look at each of your questions and decide if it pertains to .NET, or just a particular product. You might go with .NET and not use MS SQL Server, for that matter.
You're bound to get lots of responses of how to scale the system up. I'll focus on scaling the requirements down.
Unless the transactions are really long, "100+ concurrent requests" as a sustained rate is a lot of activity for a small business. So, that raises some questions:
-- What percentage of these Web service requests are read-only "query" style, and can you use application-aware caching to return results out of RAM instead of having to hit disk for each one?
-- What is the client to this application, and can there be ways to help induce a smoother load from them (e.g., discount rates if the application is used in off hours or on weekends)? Or is the 100+ concurrent requests going on 24x7?
-- Do all the requests have to be filled by the server, or can you blend in some P2P concepts so the clients can absorb some of the load?
-- Can you increase the amount of data handled per transaction (perhaps by switching to document-style SOAP or REST instead of RPC-style SOAP) and thereby reduce the number of requests and excessive message parsing and marshalling?
There are probably a bunch of other things to do as well, but those came to mind off the top of my head.
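The last question (more data per transaction, fewer requests) can be illustrated by counting bytes: every request pays a fixed envelope cost, so folding many items into one document-style message amortizes it, and it also cuts the number of round trips and parse/marshal passes. A minimal sketch; the envelope is a simplified stand-in rather than a schema-valid SOAP message, and the `GetPrice`/`GetPrices` operations are hypothetical.

```python
ENVELOPE = (
    '<?xml version="1.0"?>'
    '<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">'
    "<soap:Body>{body}</soap:Body></soap:Envelope>"
)


def rpc_style(item_ids):
    """One request per item: every item pays for its own envelope."""
    return [ENVELOPE.format(body=f"<GetPrice><id>{i}</id></GetPrice>") for i in item_ids]


def document_style(item_ids):
    """One request carrying all items in a single document."""
    ids = "".join(f"<id>{i}</id>" for i in item_ids)
    return [ENVELOPE.format(body=f"<GetPrices>{ids}</GetPrices>")]


ids = list(range(50))
rpc, doc = rpc_style(ids), document_style(ids)
print(len(rpc), "requests,", sum(map(len, rpc)), "bytes (RPC style)")
print(len(doc), "request,", sum(map(len, doc)), "bytes (document style)")
```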
You don't really describe the kind of apps you will be running to know if your observations matter in the slightest. You say that you get poor performance when your app does a lot of reflection; why is it doing reflection? Is this really a need, or are you just doing it "because you can"? Are you using this app when you further state that your performance drops by a factor of 10 vs static HTML? Why would you be comparing the two anyway? If you're serving static pages you shouldn't be looking at a web service anyway, so there's no real sense comparing the two.
You mentioned DB issues; what type of access are you doing with your databases? Are you thinking replication to deal with scaling across a server farm? Is this data being constantly updated by the servers, or is it mainly static? If you have simple, primarily read-only data, then something like MySQL would be a far better choice; you just don't need the overhead of a full-blown DB server (like SQL Server, or Oracle, or even Postgres).
Really what you need is to identify what your requirements are and tailor the end result to the systems that best meet those requirements. This also includes support and things like backups (e.g. can the db you choose do online backups if that's a requirement, etc).
1, Buy *a lot* of memory for the box
2, Cache as much as you can of the dynamic content
3, Try to stay away from bloated protocols
1: Java and .NET are the same but different: they both require a hefty amount of RAM to operate at best performance (and at least Java just gets better the more memory is available on the server ;)
2: Maybe doesn't help much with scalability, but performance will go up, and maybe you'll get good enough scalability too. Database access is always slower than a hashmap lookup (if said hashmap can stay in RAM, of course).
3: Web services etc. are maybe good in theory, but at the moment those technologies are a duck in a pond when it comes to scalability and performance. Use a high-performance .NET remoting implementation instead; you can probably find a few with a quick Google search (IIOP comes to mind, a good way to make future interfacing with other technologies just as easy as with web services/SOAP while gaining better performance in the bargain).
Also investigate how much you can make your site use asynchronous notifications; more is better. Even if the MS messaging client is too bad, you can write your own asynchronous "protocol".
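Point 2 (a RAM lookup beats a database hit) boils down to a read-through cache. A minimal sketch, assuming the data can tolerate being up to `ttl_seconds` stale; `load_from_db` is a hypothetical stand-in for the real query, and the clock is injectable only so expiry can be tested.

```python
import time
from threading import Lock


class TTLCache:
    """Read-through cache: serve from RAM, fall back to the loader on a miss or expiry."""

    def __init__(self, loader, ttl_seconds, clock=time.monotonic):
        self._loader = loader
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (expires_at, value)
        self._lock = Lock()

    def get(self, key):
        now = self._clock()
        with self._lock:
            hit = self._entries.get(key)
            if hit is not None and hit[0] > now:
                return hit[1]
        value = self._loader(key)  # slow path runs outside the lock
        with self._lock:
            self._entries[key] = (now + self._ttl, value)
        return value


calls = []


def load_from_db(key):  # hypothetical stand-in for a real database query
    calls.append(key)
    return f"row-{key}"


cache = TTLCache(load_from_db, ttl_seconds=30)
cache.get(7)
cache.get(7)
cache.get(8)
print(calls)  # [7, 8]
```

Running the loader outside the lock keeps one slow query from blocking every other request, at the cost that two simultaneous misses on the same key may both hit the database; the cache also never evicts, so unbounded key sets would need a size limit.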
Example configuration is a Windows 2000 box with dual Xeons and 2GB of RAM
I wrote and administer a J2EE application that supports online rebate offers for a very large company. We have over 350,000 registered users and typically 500 simultaneous sessions on a dual 1 GHz PIII Linux box with MS SQL Server on a similar dual CPU W2K box for the database.
Whatever you are doing with your application (probably misapplication of EJB) is wrong.
In other words, it's not what you're using to do it, it's how you're doing it. If you're just pumping out files to clients on modems, 100+ concurrent requests isn't much. If those requests are all CPU-bound, I hope they're all niced or set to a low priority, otherwise you won't be able to log into the machine in a reasonable amount of time. If it's 100+ concurrent connections, but those connections aren't necessarily waiting for a response (just idle until the user does something) then you might not even care.
How many whatevers you have must always be qualified by knowledge of what those whatevers are doing. Otherwise your whatevers won't fit in your $20k thingamajig. And then Mr. Bigglesworth gets upset.
Of course, whether .NET is a properly-implemented system is a separate debate...
"I asked a MS SQL Server DBA about real-time replication across multiple servers and his remark was "it doesn't work, don't use it."
We are running transactional replication on several large databases (6-14 GB) on a Media Metrix top 50 website with no problems. It needs to be set up correctly (batch size, timeouts, etc.), but it does work quite nicely. The DB machine is heavy hardware, but it is able to keep up with 12-15 front-end web servers, all with applications hitting the DB.
I find it funny to watch the war between the "why are you suggesting open source?" crowd and the "open source is the only way" crowd. I have built IIS/ASP/SQL Server solutions and I have built Apache/PHP/PostgreSQL solutions. There is a place and time for both solutions.
As an aside, I have to say that I have avoided .NET so far due to the heavy memory footprint it places on a system. Yes, VB.NET is faster than VBScript, but if you were using compiled COM objects in the first place, .NET costs more memory for a slower system. (I do think that .NET's ability to do in-place object updates rocks, but I hope you have a development server for bouncing, and PLAN your updates...)
But more to the point, your customers don't seem to have the budget to succeed in any domain. If you can't afford more than 20K for a machine and licenses, surely you can't afford to pay the programmers an adequate salary either. So does that mean open source? Heck no... you still have to pay the programmers! I don't think I have *ever* seen a project where the programmers were *cheaper* than the hardware.
Google regularly handles way beyond your transaction requirements; why not look back through Slashdot for the coverage of how Google does this?
Some hints:
1. Google builds its own servers...
2. Google then chooses the best OS/DB combination...
I am the network admin at a large .Net website (5+ million unique visitors each month) and we often handle hundreds of tens of simultaneous requests. The entire site runs on 6 webservers and two database servers that run at less than 50% capacity during peak times.
If you can't scale above 100 connections on a 3GHz system then you are doing something wrong. Check your code, check your databases.
Your question is about as useful as "I have a piece of string that is not long enough, what can I use instead that is longer?"
There are actually lots of reasons. Not to say that in all cases you *should* go with a big server instead of a bunch of little weeny-boxen... but the point is that "bigger server" doesn't equal "bad". Here's a few reasons:
For one, there's reliability:
-first of all, the more expensive systems have more internal redundancy, which is a good thing (sucks to hamstring even a cheap $1000 machine because the $5 cpu-fan dies, let alone a $3000 middle-of-the line machine because a $50 power-supply dies... or the $5 fan inside the $50 power-supply).
-if p(c) is the probability of a cheap machine crashing, and p(e) is the probability of a single expensive machine (your entire system) crashing, and you require all N of your cheap computers to be running in order to constitute an "up" system... then your overall system crash probability (p*) is:
p*(c) = 1-(1-p(c))^N
vs.
p*(e) = p(e)
so, by buying more, cheaper servers, you're increasing your crash-likelihood, by both increasing p(c) and increasing N (unless you buy additional cheap servers to failover to... but then you have to manage and support failover which is additional $$$ as well in terms of buying/developing/implementing more advanced systems and taking on a higher administration overhead).
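The two formulas above, plus the failover case mentioned in passing, can be checked numerically. A minimal sketch, assuming machine failures are independent; the probabilities used are illustrative assumptions, not data from the thread. `p_down_k_of_n` is an extension for "n boxes, of which k must be up" (n - k spares) and reduces to p*(c) when k = n.

```python
from math import comb


def p_down_all_required(p_cheap, n):
    """p*(c) = 1 - (1 - p(c))^N: down if any one of N required boxes is down."""
    return 1 - (1 - p_cheap) ** n


def p_down_single(p_expensive):
    """p*(e) = p(e): a single box is the whole system."""
    return p_expensive


def p_down_k_of_n(p_cheap, n, k):
    """Down if fewer than k of n boxes are up, i.e. n - k boxes are spares for failover."""
    return sum(
        comb(n, up) * (1 - p_cheap) ** up * p_cheap ** (n - up)
        for up in range(k)
    )


# Illustrative probabilities only (assumed, not from the thread).
print(f"5 cheap, all required: {p_down_all_required(0.02, 5):.4f}")
print(f"1 expensive:           {p_down_single(0.01):.4f}")
print(f"6 cheap, need 5:       {p_down_k_of_n(0.02, 6, 5):.4f}")
```

With these assumed numbers, five cheap boxes that must all be up are roughly ten times as likely to be down as the single expensive box, while one spare pushes the cluster below it; the spare's cost is the failover management the post goes on to describe.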
Not all systems are distributable, and those that are are often more complicated and/or expensive (but not always).
There's also administration cost:
-Obviously it's easier to manage one box than 10 (or easier to manage 5 boxes than a hundred). Not to say that there aren't nice tools for mass-administration... but it is still more work, and anyone who says different is selling something (and something you want to think twice about before buying).
There's ancillary costs:
-hey! if you have ten boxes talking to each other to comprise one "system", then you need a network connecting them! That's another fast switch... and again, because you don't want to lose an expensive "system" because of a failure of one cheap part, you need to buy an expensive switch.
-power costs money, believe it or not.
-so does rack-space.
-so do IPs... unless you're gonna NAT your little cluster, in which case you need to set up a NATing router for them... and that's another single point of failure unless you wanna shell out $$$ of one form or another (again: buy/develop/implement).
-you're probably gonna need some sort of KVM switch.
I could go on, but I don't want to. Anyway, the point is that it is more complicated than many of the lot in this particular audience are likely to make out. It is often still the best route (and increasingly so!), but you can't just say that the answer is *always* to buy more, cheaper machines. There are many things to consider.
It's more like people who don't know what they are talking about have purchased equipment that could do the job, if only they would not insist on ASP and other Microshit. It's more like someone bought a nice diesel pickup truck to haul manure, but insists on using model airplane fuel to make it go. Our hero is asking, "How can I make this alcohol-based fluid act like diesel? I know that it would be silly to try to move all that manure with 20,000 model airplanes and my client really does not have that kind of money. Someone tell me it's going to work." It's funny to read astroturfers like this recommend a fleet of 20,000 model airplanes. It'll be fast!
Install an SSH server on Windows and you'll have much of the same functionality as UNIX through the command line.
"With UNIX, I'm in Ireland (I'm usually based in the US) and I get a call: 'We just got a new user, could you add them?' I whip out my Ericsson 68i and Sharp Zaurus, ssh into the server and run a script to add the user."
Did you even bother to check out whether this was possible in Windows? I guess not: this site shows you how to add a user from the command line in Windows. In fact, you could even write a script to do that (batch files... remember those?) In fact, here are lots of handy other things you can do from the command line in Windows, including changing user passwords, forcing users to log off, and more.
Once again, ignorance of what Windows can do is no excuse. I administer 16 Linux boxes... I'm not anti-Linux by any stretch of the imagination, and I know that there are lots of situations where Linux is the better choice. But that still doesn't mean I'm ignorant about what Windows can and can't do.
Why can't you just use Activestate Perl to hit a few Win32 API calls to do the job? Connect to the machine, whack the user database around with some custom programming, and then you're done.
Great idea, if you have to use NT.
But if I did that for my smaller clients, I'd have to charge them an arm and a leg for each Windows Server I deployed.
They would not like an invoice that read like this:
Windows Solution
Windows 2003 Server 10 CAL - $1000
Install Windows 2003 - $300
Make Windows Behave Like Unix - $3000
Instead, they like this:
FreeBSD Solution
Install FreeBSD - $300
Donation to FreeBSD.org - $300
So for my smaller customers, it's not an option that makes economic sense.
There's nothing wrong with Windows, but remote management is VERY difficult.
This is the important bit
In addition, UNIX has a rich history of remote management; there are whole books that can help me. But for Windows, where's the "Remote Windows Management Using Activestate Perl and a few Win32 Calls for Dummies"?
Holy mother of fscking god.
STOP USING WEB SERVICES.
#1) If you are using the [WebMethod] shit and hosting your SOAP calls via IIS, you need a smack in the head.
#2) If you are using SOAP to communicate between the layers of your application, and are not exposing the SOAP methods for external consumers of the web services, you need more smacks in the head.
#3) If you don't know what you are doing, hire someone who does. (And by the sound of your point #6 about using reflection and dynamic code in the production app, you don't.)
If you don't *NEED* a remote facility between the layers, stop using SOAP, or any other remote procedure calling solution. Nothing pisses me off more than bandwagon-jumping know-nothings using a fancy fucking hammer to solve a problem which requires far less.
If you are in .NET and you *NEED* a remote facility between your layers (and if you were working for me, you'd damn well prove it), then for the love of god, switch to Remoting. Don't know what that is? Grab a book, dumbass. You can use a binary formatter and jump your speed by an order of magnitude, or you can fall back to a SOAP formatter on remoting and still double your performance.
It would appear the largest problem you have in overcoming your problems with .NET is your own stupidity. No matter if you are on .NET, Java, PHP+MySQL, Perl or x86 Assembler, it would appear that you do not have the experience to sufficiently manage either your application development or your client's expectations.
Bottom line: to support 100+ concurrent requests, there is no reason you shouldn't be able to do that for under 20K... (although I wonder where that number came from... Do these servers sit in a vacuum? Who's running them?)
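The binary-versus-SOAP formatter point is mostly about bytes on the wire and parsing work. .NET Remoting's formatters can't be reproduced here, so this is only a Python analogue under stated assumptions: the same hypothetical order record encoded as a simplified XML envelope and as a fixed binary layout packed with the standard `struct` module. It shows the size gap, not the "order of magnitude" speedup claimed above.

```python
import struct
from xml.sax.saxutils import escape

# Hypothetical record; field names are illustrative only.
ORDER = {"order_id": 123456, "customer_id": 789, "quantity": 3, "unit_price": 19.99}


def as_xml(order):
    """Simplified SOAP-like envelope (not schema-valid SOAP)."""
    fields = "".join(f"<{k}>{escape(str(v))}</{k}>" for k, v in order.items())
    return (
        '<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">'
        f"<soap:Body><Order>{fields}</Order></soap:Body></soap:Envelope>"
    ).encode("utf-8")


# Fixed binary layout: little-endian int64, int32, int32, float64 (24 bytes).
BINARY_LAYOUT = struct.Struct("<qiid")


def as_binary(order):
    return BINARY_LAYOUT.pack(
        order["order_id"], order["customer_id"], order["quantity"], order["unit_price"]
    )


def from_binary(blob):
    order_id, customer_id, quantity, unit_price = BINARY_LAYOUT.unpack(blob)
    return {"order_id": order_id, "customer_id": customer_id,
            "quantity": quantity, "unit_price": unit_price}


print(len(as_xml(ORDER)), "bytes as XML vs", len(as_binary(ORDER)), "bytes packed")
```

The trade-off is the usual one: the binary form is compact and cheap to decode, but both ends must agree on the exact layout, which is why it suits internal tiers you control rather than external consumers.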
From a purely academic standpoint, what the heck were you guys thinking when you planned to spend only 20K on the hardware for an app that does 100+ concurrent transactions? That sounds like enough business to afford quite a heck of a lot more.
If you are/were so budget-constrained, why are you spending thousands on server software (.NET server, SQL Server, etc.)? If you are so budget-constrained, you shoulda bought open source.
I've designed infrastructure and application-level systems that use .NET and happily meet your requirements (MSMQ is not scalable? Huh?), and then some. So yes, to answer all your question, it works. But if you don't know what you're doing it's very simple to fuck it up, regardless of whether you're using Microsoft products or not.
Coming here (!) and asking questions about whether or not a given Microsoft product is viable seems to me like a losing proposition. FWIW, most professionals that work with Microsoft technologies are far more willing to admit shortcomings in those products and suggest alternatives, something that the /. crowd seems incapable of. So at least if you hire someone in the know you won't get BS left and right.
So get some help.
This guy is trolling. From his post:
... ...
...
I've found Red Hat 9 most impressive.
The included version of Wine
From the Red Hat 9 Release Notes:
The following packages have been removed from Red Hat Linux 9:
- wine - Developer resource constraints
Dude, do you read Slashdot?
..."
... wait. AHA!
Because, off the cuff, I can think of at least five other sites, with dozens of other readily contacted individuals, that are going to give you more accurate, more informed, and more sympathetic answers than the site on the Web that publishes a depiction of Bill Gates wearing Borg gear.
Moreover, in case you haven't noticed, the vocal readership here isn't exactly a group of Windows devotees. Whenever the new Linux kernel comes out the admins just issue an announcement that ends with "You know what to do
So unless this is a scheme to generate loads of comments designed to convince your client to implement FreeBSD instead
But you aren't exactly right either.
You are simplifying when you say to not 'embed applications' in the DB. I will interpret 'embedding applications' in the DB as doing business logic in the database.
Many times it is more resource efficient for the _database server_ to perform some of the business logic in the _database server_.
It can be more efficient for the database to do some operations which results in a relatively small result set rather than pushing a lot of data up to the application server.
The bottleneck will usually not be the CPU on the database server, it will be the disks. And the disks are better utilized when you do the manipulation inside the DB server itself.
This breaks the paradigm of separating the business logic tier from the data access layer. Design that is easy to maintain and design that is efficient to execute don't always go hand in hand.
I'm a pragmatist. I say, make an n-tier application. Make an object oriented design. But don't be rigid, break the rules if it suits your purposes. Hey, I even use a goto every once in a while when it makes my code faster or simpler.
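The "small result set" argument can be sketched by computing the same totals two ways. In-process SQLite stands in for a networked database server here, so this shows the shape of the two approaches (10,000 rows shipped to the application versus one row per region), not real network or disk savings; the `sales` table and its contents are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north" if i % 2 else "south", float(i % 100)) for i in range(10_000)],
)


def totals_in_app(conn):
    """Ship every row to the application tier, then aggregate there."""
    totals = {}
    for region, amount in conn.execute("SELECT region, amount FROM sales"):
        totals[region] = totals.get(region, 0.0) + amount
    return totals


def totals_in_db(conn):
    """Let the database aggregate; only one row per region comes back."""
    return dict(conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region"))


print(totals_in_db(conn))
```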
I think this article was asking for numbers and setup information, and probably a lot of other people would be interested in yours if your claim is true. Please elaborate.
I'm not trolling; I'm curious.
He only said that those 1-megabyte messages negatively affected the average, not that they could be passed with anything approaching "near-real-time" speed.
This is not rocket science, and I had presumed this rule had been learned a long time ago... but here it is again:
"To ensure scalability, host each server component of an application on its own hardware, optimised for the specific task assigned."
In other words, DO NOT deploy everything onto one machine. Remember the old adage "Jack of all trades, master of none".
So, put the database server on its own box, with dual cpu, loads of memory and RAID-mirrored drives.
Put IIS, the ASP.Net app (and the web services if you're feeling cheap) onto a fast, single cpu box, enough memory to turn off paging and a single drive - GHOST'd onto CD for backup.
Install an extra network card in both, and set it up solely as the route for traffic between them.
Implementing this hardware for less than 20K should be trivial.
If you can't comfortably support 200 concurrent users with this, you need professional help - my consulting rates are quite reasonable...
MS SQL already has a stored procedure language, T-SQL; why not use that?
In my experience, the object-relational style mappings provided by, for example, Java Stored Procedures in Oracle are a real performance killer. Why would C# stored procedures be any different?
Your drugs must be more expensive than mine. (-:
/. as a whole is as clueless as ever, but you did see a few good posts.
Back on topic(-ish): as well as the low-bandwidth point the grandparent made, I think it's more germane to mention that any one of sixty-to-a-hundred failures will keep a Windows server (and hence VNC) off the air, but you only get a-handful-to-a-few-dozen chances to kill a Linux server stone dead as far as remote access is concerned.