Building/Testing of a High Traffic Infrastructure?
New Breeze asks: "I'm currently working on my first web 'application', and have discovered that I know less than nothing about setting up the infrastructure to manage a high traffic system.
Where does one go to learn about setting up the infrastructure required to host something like Slashdot? Or do you just say, 'Not my area!' and help them find a consultant?"
"My experience is pretty much limited to:
1. Install the web server on one box, and the database on the same box if it's a small installation or on a separate box if performance seems to need it. Add more memory and processors based on SWAG (Scientific Wild Ass Guess) criteria.
2. Contract with a hosting company.
I had a potential customer ask what I would recommend if they wanted to self-host. They have around 300 remote locations, with multiple users from each location hitting the application at the same time, so saying "a couple of beefy servers" probably isn't the right answer.
I haven't a clue. The last place I worked with on something like this hired a high-dollar consultant who spent a huge pile of their money setting up a load-balanced, Oracle Parallel Server, redundant-everything system.
How do you test it? I've worked where they actually had a room with hundreds of systems on racks that they would configure to run test transactions against different servers and software builds for stress testing, but that's not in my budget..."
That one was easy, ...Next
Seriously...they know all about serving up content on high traffic sites. Not only is it high traffic, but it's rather big files that they're delivering. When we're testing the networks that we set up, both wired and wireless, we often visit pr0n sites for our benchmarks.
"He uses statistics as a drunken man uses lampposts...for support rather than illumination." - Andrew Lang
Post a URL here and we'll help.
-- MarkusQ
P.S. Clever use of the text describing the link may help you control how much traffic you get, from low ("M. Moore Nude!") to high ("SCO caught robbing courthouse").
My sites have crashed under a 1200-user load, and sometimes act a little fruity due to database concurrency issues even if they don't crash... I'd like to know the answer to your question too.
Speak for yourself.
Well, this sounds like it's pretty big. So load balancing is probably part of the answer. You can pretty much load-balance anything from firewalls to database-servers. That includes web servers, yes.
The most "difficult" (read: expensive) part is setting up storage that is both large, reliable and fast enough to support everything.
1. Submit Link to slashdot with your webserver hosting a lot of large video files supporting the link.
2. Have link approved (Note - duplicating any story just posted is probably the best way to get approval, plus lots of people crying dupe)
3. Learn what caused the webserver to melt and how long it took to melt.
4. Fix the problem that caused step #3
5. Repeat 1-4 until server doesn't melt.
6. Congrats! You've learned how to host a high demand web server.
P.S. When I first tried to read this story, I got "Nothing for you to see here. Please move along" ... somewhat ironic I'd say ...
Hulk SMASH Celiac Disease
This particular area, given its critical nature to one's business, is not something that inexperienced IT engineers should test and decide on by themselves. There are many folks out there who can give you suggestions or even a written assessment for a nominal fee.
Seriously, what kind of Slashdot reader are you, if you don't already know someone who could help you test / recommend load?
Go to one of your region's LUGs or one of those silly Slashdot meetup.com's (-;
Gamblers Forum
These are very obvious links to a shock site, ignore them and mod parent down. Seriously, AC, don't you get tired of this?
Then you need to hire a consultant.
I won't ask it then. Thanks for the warning, that was a close call.
it really depends on what you need.
In my experience though hardware (especially memory) and bandwidth come before a superoptimized software front-end & database.
A good introduction I can recommend is called "Developing IP-Based Services: Solutions for Service Providers and Vendors" - I forget who wrote it. But definitely worth reading on the subject.
Please don't tell him. We don't need another Slashdot. Servers worldwide surrender.
OK. It's easy. There are three steps involved:
1. Build a low performance infrastructure.
2. Put on an RT sticker and chromed exhaust pipes.
3. Done.
You have a serious problem. I suggest you start accepting their newfound sexuality and let them be who they are... they say it's genetic, and anyway it's likely your efforts to make them more masculine will only backfire.
First step, do the math.
What was once a "high volume" app may be nothing for modern equipment. You're talking about on the order of 1K concurrent users (300 sites * several users per site).
If "use" means manually typing data into forms, viewing mostly static pages, etc. this isn't really a very "high volume" application, and a single decent server should handle it.
If, on the other hand, "use" means constantly running complex queries against a billion item data set, you're doomed.
So where do you fall in this spectrum?
Coming up next...where's the bottleneck?
-- MarkusQ
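The "do the math" step above can be sketched in a few lines. Only the 300 sites figure comes from the question; the users per site and the time between clicks are made-up numbers for illustration, so plug in your own.

```python
# Back-of-envelope load estimate. Only the 300 sites comes from the question;
# every other number here is an illustrative assumption.
def requests_per_second(sites, users_per_site, seconds_between_clicks):
    """Average request rate if every user clicks once per think-time interval."""
    concurrent_users = sites * users_per_site
    return concurrent_users / seconds_between_clicks


if __name__ == "__main__":
    # 300 sites, assume 4 users each, one page request every 20 seconds.
    rate = requests_per_second(sites=300, users_per_site=4, seconds_between_clicks=20)
    print(f"{300 * 4} concurrent users -> about {rate:.0f} requests/second")
```

With those assumed numbers, 1200 form-filling users work out to roughly 60 requests per second on average. Peaks will be higher, so treat this as a starting point for testing, not a sizing guarantee.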
If you want to throw some serious load at your equipment, get a few other systems saturating your network with Apache Benchmark (ab) requests. It gives lots of useful data, like response times, etc. And you're best off toppling the application, finding out why it failed, and working on that, as someone already suggested. Then rinse and repeat.
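If you can't install ab on the test boxes, the same idea can be roughed out in Python. This is only a sketch and not a replacement for ab: `run_load`, `fetch_url`, and `summarize` are names invented here, and the target URL is a placeholder for a test server you control.

```python
import statistics
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def fetch_url(url):
    """Fetch one URL and discard the body; raises on HTTP or network errors."""
    with urllib.request.urlopen(url, timeout=10) as response:
        response.read()


def run_load(fetch, total_requests, concurrency):
    """Call fetch() total_requests times from a pool of worker threads.

    Returns (latencies_in_seconds_for_successes, failure_count).
    """
    def timed_call(_):
        start = time.perf_counter()
        try:
            fetch()
            ok = True
        except Exception:
            ok = False
        return time.perf_counter() - start, ok

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(timed_call, range(total_requests)))
    latencies = [elapsed for elapsed, ok in results if ok]
    failures = sum(1 for _, ok in results if not ok)
    return latencies, failures


def summarize(latencies):
    """Median and worst-case latency; needs at least one successful request."""
    return {"median": statistics.median(latencies), "max": max(latencies)}


if __name__ == "__main__":
    # Hypothetical target: point this only at a test server you control.
    target = "http://localhost:8080/"
    latencies, failures = run_load(lambda: fetch_url(target), 200, 20)
    print(f"{len(latencies)} ok, {failures} failed")
    if latencies:
        print(summarize(latencies))
```

Raise the concurrency step by step and watch where the median latency or the failure count starts to climb. Keep in mind the load generator itself can become the bottleneck.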
Looks like Apache has updated their tools since the last time I had to do this...
http://httpd.apache.org/test/
Check out what those guys do at Wikipedia. Don't forget to look at their useful links at the bottom.
Or maybe it's overkill.
If your sites are acting fruity...I suggest you accept their newfound sexuality and let them be who they are... they say it's genetic, and it's likely your efforts to make them more masculine will only backfire.
I'm currently working as a cook in a restaurant, and have discovered that I know less than nothing about performing stomach surgery. Where does one go to learn about the techniques and tools necessary for curing stomach cancer? Or do you just say, 'Not my area!' and help them find an oncologist?
Seriously.. you have a lot to learn, and a lot of what you need to know just comes from experience which you can't get from a book.
First: learn how everything works. When you click a link in your "application" (why the quotes?), what happens? For instance, does it run a Controller object? If you're using a language like Ruby or Perl, is it "pre-compiled" or does it have to interpret a script on each hit? Does the controller then go to the database and populate variables, then insert them into a template, then render the template? Is the template cached? How are your database settings? Enough memory for joins? Are all your queries using the appropriate indexes? Are you familiar with your database's performance-measuring variables and tools? Are you pulling more data than you need to in each query?
Once you have an understanding of what's happening, then you can start measuring. Where are the bottlenecks? This is a very important thing to keep in mind in programming or system architecture: DON'T OPTIMIZE UNLESS YOU NEED TO! Keep your system and code as simple as possible. For instance don't cache things in your program (making it more complicated and harder to maintain) unless you have a BENCHMARK IN HAND showing a performance bottleneck.
You might not need to move your database to another machine. What you need to do depends on your app.
Yes, you will need to do a lot of testing to identify your "first round" of bottlenecks. You need to build a lot of diagnostics into your app to help you identify how long different steps take.
Always deploy your app in stages, one site at a time, until you start identifying some problems. Then fix those problems before continuing deployment. Never "flip a switch" and reveal any change all at once.
Good luck!
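The advice above about building diagnostics into the app to see how long each step takes might look something like this. It is a minimal sketch: `timed`, `handle_request`, and the step names are invented for illustration, and the sleeps stand in for real database and template work.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Accumulated wall-clock seconds and call counts per named step (one process).
step_totals = defaultdict(float)
step_counts = defaultdict(int)


@contextmanager
def timed(step):
    """Record how long the enclosed block took under the given step name."""
    start = time.perf_counter()
    try:
        yield
    finally:
        step_totals[step] += time.perf_counter() - start
        step_counts[step] += 1


def handle_request():
    # Hypothetical request handler; the sleeps stand in for real work.
    with timed("db_query"):
        time.sleep(0.02)
    with timed("render_template"):
        time.sleep(0.005)


if __name__ == "__main__":
    for _ in range(5):
        handle_request()
    # Slowest step first: that is the first candidate for optimization.
    for step, total in sorted(step_totals.items(), key=lambda kv: -kv[1]):
        avg_ms = total / step_counts[step] * 1000
        print(f"{step}: {step_counts[step]} calls, {avg_ms:.1f} ms avg")
```

Numbers like these are the "benchmark in hand" the post asks for before you start caching or restructuring anything.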
First off, if this is a "must succeed with no problems" project, all bets are off -- hire an experienced consultant so you have someone to blame. Also, this technique only works when you have the type of site which will *build up* to expected load -- not get turned on instantly.
This is tough to generalize without knowing specifics, but here goes:
1. Make sure your application can work correctly when load balanced across multiple boxes
2. Keep webserving and DB work on different machines
3. Make sure your application can work with another database without much work (this gives you the option to hire, say, an Oracle DBA and buy an Oracle license if MySQL can't keep up.. does it even support row-locking yet?)
4. Have extra hardware handy, in the rack. Do NOT turn it on yet.
5. Observe the application running; determine bottlenecks, tune
6. If you can't tune it to perform adequately, NOW is the time to break out the extra hardware while re-evaluating the implementation.
If you throw all your hardware at the problem at once, you get very little warning when the shit starts to hit the fan, and no response scenario. Do NOT make that mistake. Load, test, tune, repair, repeat.
Do daemons dream of electric sleep()?
This is a very interesting topic. I just started my new job, coming from an internship. There we had a web server, a database server, a devbox, and a log processing box for WebTrends analysis. But now at my new job I'm being introduced to high-level PIX boxes, F5 load balancers, redundant web servers, transaction servers, etc. One thing I just learned the other day is that they use the F5 to handle SSL encryption/decryption instead of relying on the webservers. I never knew that was possible. But eventually I want to be able to do all that my boss does right now. Anything less is less than perfection...MUAHAHA.
1: Gradual growth. Find bottleneck, remove it. Repeat. Make sure to start with a growable database and web site technology, but that shouldn't be too tough. Also, stay ahead of the game, always with overcapacity, both to cover for outages and for sudden growth spurts.
2: Instant growth from 0 to thousands+: Hire someone who knows what they are doing. In the first scenario, you have the time to learn what is actually going on, which is an advantage. In this one, you don't, and the customer base is too big (i.e., $$$) to screw with.
That basically covers it. Specific advice will vary widely based on databases and web technology deployed, so just about any other specific advice you get here is as likely to be wrong for you as right.
A CDN from the likes of Savvis (think: Digital Island) or Akamai (Buyer Beware here) all but completely alleviates flash-crowd pain. Ask for a free trial or a trial period with no commitment and see what I mean. In fact, you're an idiot for not at least looking into this.
I run a site which peaks above 5,000 page views/second. That part is static, and runs thttpd. No problems at all.
The other part is dynamic. It runs on Apache (load balanced, no problem) with a PostgreSQL server. If you don't need its features, "just say no"!
It is the single part in our system that causes the most problems. When your tables grow semi-large (less than 800k rows) and you do a few joins, it chooses strange - and slooooow - ways to execute your queries. Combine that with a few journalists who want to insert and update articles, and you have a sysadmin's worst nightmare.
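When a query suddenly runs slowly, the first thing to look at is the plan the database chose. PostgreSQL has `EXPLAIN` and `EXPLAIN ANALYZE` for this. The sketch below uses SQLite's `EXPLAIN QUERY PLAN` from Python's standard library purely as a stand-in so it runs anywhere; the `articles` table and index name are invented for illustration.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the planner's description of each step for the given query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]  # last column is the human-readable detail


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE articles (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT)"
)
query = "SELECT title FROM articles WHERE author_id = ?"

before = query_plan(conn, query, (42,))
conn.execute("CREATE INDEX idx_articles_author ON articles (author_id)")
after = query_plan(conn, query, (42,))

print("without index:", before)  # a full scan of the articles table
print("with index:   ", after)   # a search using idx_articles_author
```

The exact wording of the plan text varies between database versions, so read it rather than parse it.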
I think that the most important thing is to first have a site and worry as it grows... :)
http://www.terratoday.com - Environmental news, discussions & more!
By the way, how do you people install and configure a server with Apache + MySQL + PHP + mod_perl for a site with a lot of bandwidth?
And what do you do to keep that server secure? I mean, how do you protect the server so that users can't get out of their homedir using PHP/Perl scripts? Does a chroot solve the problem?
What constitutes 'high traffic' for you?
I've been developing a high traffic site (well, maybe medium traffic) at about 1.5 million transactions per month. We have customers using the site all over North America, plus a few in Europe and Asia, and the whole thing is hosted internally off of our 10MB link.
We have each 'tier' clustered as a pair of servers - 1GHz/256MB is more than sufficient for our 2 Apache servers. 3GHz/1GB is our Tomcat tier, and I'm not sure what the DB runs on, but they're the beefiest servers of all the tiers.
Within the app architecture, try to ensure that you can scale to more servers. We have the ability to add more servers to any of the above tiers without any changes, plus any long-running processes (complicated reports and such) get dispatched to a fourth layer of servers we call 'backend' (by RMI). These 'backend' servers can be low-end (300MHz/256MB are fine), because they run non-time-critical tasks and generally might email their results or whatever.
In this way, we've avoided the EJB complications while also having full redundancy at every level. There was some custom framework involved, but it's been working well. Our application was complex enough to warrant an advanced framework (similar to Struts, except we wrote ours before Struts came out), yet EJB seemed too heavy for what we wanted to accomplish. Of course it didn't hurt that the only thing we paid licenses for was the DB.
Importantly, though, this was the right solution *for us*. It's serving us well, and already scaling well beyond the number of customers we originally anticipated would be using it. While this meets our needs fairly well, it may or may not be the right type of solution for what you're looking for, particularly because I don't know what your application is supposed to do.
You can accomplish anything you set your mind to. The impossible just takes a little longer.
dear /.:
please do my job for me! thanks!
AC
These are some very basic thoughts on the subject. They may not be 100% right for you, but will get you thinking in the right way:
Rule 1 - Three-tier architecture is popular for a reason - it works. Offload the user interface (web) to dedicated boxes, make the application itself run on separate boxes, and keep the database separate.
Rule 2 - When possible, scale horizontally, not vertically. Make sure your application is as stateless as possible and lets you just drop in an extra server when needed without a lot of reconfiguring. Make sure you can survive the loss of a server without loss of data. Lots of cheap servers will almost always work out better (and cheaper) than one big ass box.
Rule 3 - Make as much of your application static as possible. Even pseudostatic data (something that updates every minute or so) should be made static and have a process regenerating it every minute or so. Not wasting your CPU time rendering a menu or something on every hit will add up fast under heavy stress.
Rule 4 - Strip your HTML. For example, some crappier web languages (think ColdFusion) have a tendency to insert spaces for every line of code etc. A large application running CF (don't ask) would insert enough spaces to make a simple page hundreds of KB in size. Just turning on the "write to output only on demand" option will drop the size of the page to next to nothing. So know what it is that you are producing on output and make sure it is lean. Turning on server-side compression solves this better, but adds to CPU requirements. On truly stateless web servers this just means you need more web servers. So MAKE YOUR WEB SERVERS STATELESS.
Rule 5 - Know how many users your upstream connection can handle (in simplest terms - average size of HTML communication * number of users) and make sure you do not exceed it. Limit your connectivity at the load balancer. Having some users not be able to access your site is better than having ALL users not be able to access your site. Make sure you have plenty of bandwidth to spare. If you are setting up a multi-site presence, make sure your intersite communication a - is not going over the same line as incoming traffic and b - has sufficient bandwidth and low enough latency to serve the traffic.
Rule 6 - Professional load testing tools cost big bucks. But if you are careful you can fake it with some open source software. Google it. When testing, remember to take into consideration the limitations of your tester system and bandwidth.
-Em
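Rule 3 above (regenerate pseudostatic data on a timer instead of on every hit) could be sketched like this. `regenerate` and `render_menu` are invented names, and the menu markup is a placeholder; the one real trick is writing to a temporary file and renaming it, so the web server never serves a half-written page.

```python
import os
import tempfile


def regenerate(path, render):
    """Render content and swap it into place atomically.

    Readers (e.g. the web server) see either the old file or the new one,
    never a partially written file.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(render())
        os.chmod(tmp_path, 0o644)   # mkstemp creates the file owner-only
        os.replace(tmp_path, path)  # atomic rename on the same filesystem
    except BaseException:
        os.unlink(tmp_path)
        raise


def render_menu():
    # Hypothetical renderer; a real one would query the database here.
    return "<ul><li>Home</li><li>News</li></ul>\n"


if __name__ == "__main__":
    # In production a cron job or loop would call this every minute or so.
    out_dir = tempfile.mkdtemp()
    target = os.path.join(out_dir, "menu.html")
    regenerate(target, render_menu)
    with open(target, encoding="utf-8") as f:
        print(f.read())
```

The database then gets hit once a minute by the regenerating process, no matter how many visitors request the page.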
RelevantElephants: A Somatic WebComic...
Hire a consultant to do it right from the start.
If you can't, this is sort of a mischievous way of doing it, but one that can work well in a pinch. Get your basic requirements down in writing (bandwidth, OS and app software, server requirements, disk space, backup scheme, etc.) and then contact one of the high-end services like Rackspace to ask for a proposal for their services based on those requirements. In the resulting conversations, you'll learn a lot about what kind of infrastructure is "standard" (on the high end anyway), what kinds of costs could be involved, and you'll get what is practically a checklist for what you should consider setting up on your own. Whether you use the high-end service or go off and build your own homegrown setup, you'll find yourself a lot more educated. (And then, of course, you should find ways to shuffle some paying business to the poor salesperson whose time you wasted.)
Silas
You shouldn't have clicked on the link, just copied the URL into your browser. Sheesh. Even the simplest gags still work around here.
It may or may not be a great idea depending on your situation. For one, the cost of an SSL card for an F5 is so high that it may be easier to just get extra servers. For another, I work with some banking applications, and having data sent cleartext, even on an inside network directly connected to load balancers, is NOT a valid option.
However, if local security can be ignored and you have the money to spend, F5s offer a nice offload of encryption processing. But then again, so do hardware cards for individual servers.
-Em
Hire a consultant this time, work with them closely if possible, learn how to do it for next time.
I've worked on a very high traffic system. At one point we were pushing 100MBPS in traffic. I had about 15 servers, 1 database server, and a load balancer. The traffic was mostly static html pages, with a bit of php/mysql for about 1/10th of the traffic.
We had a master database server that was replicated to all the webservers. When reading from the database, each webserver would read its own local copy. MySQL replication kept the data on the local webservers fresh.
Updates to the database were easy as only a small number of users were doing any updates. All updates were able to go through one server and wrote directly to the master database.
The load balancer was managed by the hosting company. It simply made sure that all the webservers shared the traffic load. Any webserver that died for whatever reason would automatically stop getting traffic sent to it.
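The read-from-the-local-copy, write-to-the-master split described above can be sketched as a thin routing layer in the application. This is a guess at the shape, not the poster's actual code: `ReplicatedDatabase` is an invented name, and `master` and `replica` are assumed to be any objects with an `execute(sql, params)` method, such as DB-API cursors.

```python
class ReplicatedDatabase:
    """Route SELECTs to a local replica and everything else to the master.

    Replication itself (copying the master's changes to each replica) is
    assumed to be handled by the database server, not by this class.
    """

    def __init__(self, master, replica):
        self.master = master
        self.replica = replica

    def execute(self, sql, params=()):
        # Only plain SELECTs are treated as reads; anything else is a write.
        verb = sql.lstrip().split(None, 1)[0].upper()
        target = self.replica if verb == "SELECT" else self.master
        return target.execute(sql, params)
```

One caveat with any setup like this: a read issued right after a write may hit a replica that has not caught up yet, so pages that must show fresh data may need to read from the master.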
It's also a good opportunity for people to learn from others' experiences. Christ, man, I don't see why people have to hoard their knowledge. What kind of example does that set?
Everyone has to start somewhere right?
What's your background? There are lots of different ways to solve every problem. I think it's much more of an assessment of what kind of problems you're good at solving. If you think you can conceptualize what your system needs to do, and evaluate different components objectively, do it.
Coming from someone who's implemented some massive testing infrastructures and custom tools, worked on computational biology frameworks, and is currently working on fault-tolerant, scalable SIP-based telephony systems and protocol development: it's really just like any other massive project. Go incrementally and solve one problem at a time. If you're good with databases and know where they excel, use them; otherwise use data structures. If you are strong with Perl and Apache, base it on Linux (perhaps with MySQL); otherwise go to a bookstore, pick up books on a couple of easy components, and stick with what you're good at. I personally also recommend actually getting maintenance on open source products you're not incredibly familiar with, as a little help goes a long way.
So anyway, again, above all, go with what you're good at. If you give some more details perhaps people can make some more concrete recommendations.
There is a reason why this is a specialty. There isn't a clear answer.
The answer depends on many factors such as:
- how heavy are the pages (many pictures?)
- what's the platform (Lamp/J2EE/etc....)
- how is the usage? If someone gives you a figure for concurrent users, ask yourself what they mean by that. Some apps have users constantly submitting, others once every few minutes
- how are they connected? A reverse proxy can really help for slow connections!
- if you have performance problems, investigate where the pain really is. Is it the (R)DBMS, the app server, memory, or IO?
- etc. etc.
Most of all: test! Get something like The Grinder or OpenSTA and put some serious load and stress on the setup. See where it hurts.
Make sure that if you have a problem, you actually fix the right problem. It is OK to add hardware, but you have to know what hardware to get.
Also, many problems can be handled by configuration, such as preventing the system from coming to a crashing halt by limiting the number of connections to the number you can handle.
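Most web servers have a setting for this connection cap (Apache's MaxClients, for example), and that is the first place to look. If you need the same behavior inside your own application, the idea can be sketched as follows. `ConnectionLimiter` is an invented name, and the 503 response is just one reasonable way to refuse politely.

```python
import threading


class ConnectionLimiter:
    """Admit at most max_active requests at once; refuse the rest immediately."""

    def __init__(self, max_active):
        self._slots = threading.BoundedSemaphore(max_active)

    def try_handle(self, handler):
        """Run handler() if a slot is free; otherwise return a 503-style refusal."""
        if not self._slots.acquire(blocking=False):
            return 503, "Server busy, try again shortly"
        try:
            return 200, handler()
        finally:
            self._slots.release()
```

Turning away the overflow quickly keeps the requests you do accept fast, which is the point made above: some users failing beats all users failing.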
Look at this Perl strategy doc. It has some good advice that will also help you in non-Perl environments.
---
Accommodating "high traffic" that is mostly bandwidth intensive is quite a different problem than accommodating traffic that is database intensive.
Here's what I do: Bitty Browser & Andromeda
There's no shortcut or substitute for good profiling and benchmarking of your application. If you're doing anything mission critical, SWAG flat out isn't going to cut it. You need to PROFILE to figure out what resources your app uses so you can (a) tune and (b) allocate appropriately. For instance, if your app is making a lot of database queries you can look at ways to cut those down (such as caching responses where possible). And you know you'll need fairly beefy database servers (or conversely, that you can get away with hosting the database on the same box handling the web front end).
BENCHMARKING allows you to size the hardware appropriately. This needs to be done scientifically. Set up whatever architecture you have in mind and benchmark it. If it doesn't meet the expected load, plus reasonable headroom for future growth, plus reasonable slop for load spikes, you can use your profiling results to help spot bottlenecks. Conversely, if you're getting "too good" performance (e.g. some servers are staying idle) you'll know where you can safely cut. The key here is to handle it scientifically. Measure, vary only one variable, then measure again. Rinse. Repeat.
Even if your company goes with a consultant, you need to be deeply involved in the process. Web application performance is too deeply tied to the application (duh) to allow independent evaluation of the infrastructure requirements. You need a top-to-bottom approach, from application architecture, to implementation, to deployment, to get it right. All steps of the process are going to impact performance.
that inevitably brings out all the morons who will tell you to spend more money because that's all they know how to do.
Why not just try going with less and seeing what happens. I have run several PHP-Nuke sites off of a P166 and Knoppix, yeah that's right, from the CD! no freakin' hard drive at all. And this is off a home DSL line with 128K upstream and a virtual domain.
Now I confess these sites don't get any significant traffic most of the time, but there are times when I get a few dozen hits an hour and I've never had problems and they've literally been up for years. There is a delay when it hits the CD sometimes, but it's nothing compared to how bad most commercial sites stall while their freaking ad servers choke.
I would think even a moderate desktop PC and a slightly faster DSL line could handle at least hundreds of simultaneous users on a halfway functional LAMP setup.
Cisco's "How to build a datacentre" should give some insights.
e.g. Multi-peer BGP'd address space, feeding something like a Cisco 6509, with PIX, IDS, CSS and maybe the SSL modules. A 16-port GigE could then be used for upstream and downstream links, maybe just straight into your ~10 frontend servers, ideally caching reverse-proxies, with connections to another 6509 with GigE, which connects to your content web servers. Obviously databases etc. should also hang off this stuff.
On the side of all this would be a terminal server (Cyclades are good) for "oh shit" access, and preferably a management network, again using a totally different switch, and a dedicated line to your office (preferably 2, one going east, one going west).
Oh, and don't forget power - UPS for the little 30-minute glitches, and generators for really bad times. Good aircon/dust filters, and also get some FM200 to make sure that the place doesn't burn to the ground.
After you've got all this, you should be away, but just like software, MEASURE and UNDERSTAND where the bottlenecks are (leased lines, network, firewall, CPU, memory, bus, disk, web server, database etc. etc.) and know what you can do to get 50% more out of your current solution.
Enjoy.
Dom De Vitto
Individual building blocks and interconnects are easy to evaluate, and once you've done them all you'll have a good idea of the sort of performance to expect. It takes an understanding of how all the pieces work, individually and together. It's more work for your brain than...
Brute force. Build it, exercise it, see where it breaks, swap out a block, rinse, repeat.
If you just want things that work, understanding them is the best approach. If you need to convince people with little knowledge and lots of prejudice, the brute force approach is best. Involving them in this manner is more conducive to check signing and referral work to other clueless clients and is, I suspect, the reason we see such idiocy as brute force testing when a little math would reveal.
Now I'm the grandest Tiger in the Jungle!
Jesus what an asshole this parent poster is. Someone asks for advice and this arrogant guy calls them incompetent for not being born with the knowledge. Someone please mod him troll; this is exactly why non-techies think we're all arrogant.
I don't really understand a lot of the stuff you said (I am not a sysadmin). For example:
What does it mean to not scale "vertically"? When I read that, the only thing that comes to mind is to put the boxes next to each other, not on top of each other. From context I gather that horizontally means extra machines, but what does vertically mean?
For "dropping in an extra server when needed without a lot of reconfiguring", what do you mean by "a lot of reconfiguring"? Obviously you need to get the machine, install the os, set up networking, install the web server, setup the web application, point it at the database, etc. How does the application being "stateless" help? I guess, what are some examples of state that an application can have that will make configuring an additional web server difficult?
Concerning the pseudo static data regeneration, what if the thing that was being updated was only accessed once every half-hour on average? I am assuming then that generating the page on demand would be better?
I don't really know what you mean by "MAKE YOUR WEB SERVERS STATELESS". I mean, they have to know if a request just came in, where the data is, what time it is etc, and that stuff gives it state. I am assuming you mean something else by stateless but I cannot figure it out.
Thanks for the help!
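One way to picture the "stateless" question above: the state that makes adding a web server painful is anything one box remembers about a particular user, such as a login session held in that server's memory. If the next request lands on a different box, the session is gone. A sketch of one common workaround follows, assuming every web server shares the same secret key; the function names and key are invented for illustration. The session travels in a signed cookie, so any server can verify it without remembering anything.

```python
import hashlib
import hmac

# Assumption: every web server is configured with the same secret key.
SECRET_KEY = b"replace-with-a-real-secret"


def sign_session(user_id):
    """Build a cookie value that carries the user id plus a signature."""
    signature = hmac.new(SECRET_KEY, user_id.encode(), hashlib.sha256).hexdigest()
    return f"{user_id}:{signature}"


def read_session(cookie):
    """Return the user id if the signature checks out, else None."""
    user_id, _, signature = cookie.rpartition(":")
    expected = hmac.new(SECRET_KEY, user_id.encode(), hashlib.sha256).hexdigest()
    return user_id if hmac.compare_digest(signature, expected) else None


if __name__ == "__main__":
    cookie = sign_session("alice")           # issued by web server A
    print(read_session(cookie))              # verified by web server B
    print(read_session("alice:forged-sig"))  # tampered cookie is rejected
```

With something like this in place, a new server needs only the shared configuration (the key and the database address) before it can take traffic. Keeping sessions in the shared database is another option with the same effect.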
<high-level position here>
<name of stupid small company here>
If this is just for internal users and telecommuters, then you really need to get an idea of how many people will actually be using the app, and then put it on a server and simulate the effects of more and more users until it starts to tax the system. Then you can calculate how many users each server can support at 40-60% load and get that many servers behind a load balancing device. If it's only a few servers you can use a router to run the load balancing, or get a dedicated load balancing device to do it.
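The sizing arithmetic in that last step might look like this. The 40-60% target comes from the post; the user counts and the single spare server are assumptions for illustration, and `servers_needed` is an invented name.

```python
import math


def servers_needed(peak_users, users_per_server_at_full_load,
                   target_utilization, spares=1):
    """Servers required so each runs at or below target_utilization at peak."""
    usable_per_server = users_per_server_at_full_load * target_utilization
    return math.ceil(peak_users / usable_per_server) + spares


if __name__ == "__main__":
    # Assumed numbers: 1200 peak users, and load testing showed one server
    # saturating at about 400 users. Run each server at 50% and keep 1 spare.
    print(servers_needed(1200, 400, 0.5))  # 7
```

The users-per-server figure has to come from your own load test; it is the one number you cannot guess.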
I have had a great experience with Rackspace for managed servers and ServerBeach for unmanaged.
They will hook you up. I have never had a more knowledgeable group of people on the other end of the phone trying to sell me something and later supporting it. Security question, they conference with a security guy, network question, they conference with a ccie.
Oh yeah, and a lot of these places even have biometric scanners to get into the 24/7 monitoring room and the server rooms. They have standard hardware setups, and they generally use pre-ghosted installs of FreeBSD or Windows 2000. Of course, everything is RAID-5 and backed up religiously. The best in the business.
I don't respond to AC's.
This is one of those areas where there is no set answer. There are lots of articles on the topic, but usually on systems larger than what you plan to build. Go to user groups; many people there are running smaller sites, but some might be doing what you are.
The main thing is to define what you call a lot of traffic. A lot to one person isn't a lot to another.
Then nail down your budget; that will be your most defining factor.
Then, when designing, use a design that is easy to scale. That way, if you are off, you can scale with little pain.
Personally I would put money into the database server; they can be a real pain to scale. Design the web side as a farm, even if it's only two web servers to start with. Decide how you plan to load balance. For a couple of web boxes, DNS round robin will do, but for anything bigger you have to look at real load balancing options. Also, what is your SLA? That will determine how big your farm needs to be and whether to keep hot or cold spare boxes around. If it's a farm, how are you going to keep content in sync? Then power, cooling, security, and on and on. It's a lot of work, but when it's done and everyone is happy, you can't wait for an even bigger project.
I work for a website that does a lot of traffic (it's a specialized industry, and no, it's not pr0n). The site pushes about 10Mbit/s from 9-5 during the week through 6 webservers. There are a couple of things you need to look at as far as making a site like that work.
The first thing you should do is look at your system and determine what your resource drains are. Do you have a database? Is it read-write or read-only? What are your replication and growth options for that app? That affects your scalability at that point, and similarly applies to applications like EJB or other app servers. Do you use sessions? Do you have some sort of session aggregator available so sessions can be accessible from multiple webservers? There are lots of things like this you need to find. I, for example, set up separate webservers (tux & apache) to handle static and dynamic content so that my DB connections would not be held by processes not using them.
The next thing you need to do, is know how your system is used. You should be able to statistically break this out from your logs by looking at a small set of users during testing. I found that 60% of my hits were to one page, and I knew I had to really optimize that (someone mentioned apache bench, which can work very well for testing single pages). Also, you need to know how parts of your site use your resources. If you have a single DB server and multiple webservers, you don't want anything slowing your DB because that cascades back to your webservers. We have pretty strict performance testing guidelines whenever a part of the site is updated, and I recommend doing your performance testing as you go.
The final thing you need to do is have a growth plan. Do you know how to set up load balancing for your webservers? Can your DB/app servers be replicated, or do you just need to buy faster hardware as you grow? Do you know your capacity thresholds from your performance testing? If your system is going to grow, you're going to need to be able to answer these questions.
If you make sure you know your scalability issues, and you don't lock yourself into something that can't grow, you should be OK. Beyond that, test for speed under load, and track how your performance changes over time. That will help you know when you need to grow your hardware. HTH.
I run a dynamic (auction) website with 240Mb/s peak traffic. However, I got there by 5 years of removing bottlenecks. Still running on Apache/PHP and MySQL (~60 servers).
To start from zero, I recommend:
1. Do a lot of testing. Try Microsoft Stress Test - a free tool to record macros on IE and replay them on several machines simultaneously, simulating 100s of clients.
2. Redundancy. Use LVS and heartbeat for load balancing and failover. Use database replication as well.
Knowledge comes from Experience, and experience comes from Doing.
Mistakes will be made. The key is in mitigating the effects of those mistakes. Redundancy and Manageability are your two biggest buzzwords here. A good load test and utilization projections are definitely key, but no matter what you think your userbase will be, if it's a public application, you'll almost certainly be wrong. Try to prepare for the most traffic possible.
Redundancy on every level, including the switching infrastructure, is a very good plan. Any decent server sold can use multiple bonded NICs for redundancy. If possible, design your network such that if a switch fails, traffic fails over to another switch, etc.
I would suggest going to many local datacenters and interviewing each with probing questions relating to your situation. You will find that they are all relatively equal in terms of Standard DC items:
Diversity of route (physical entrance of cabling into the building) and redundant carriers.
Cooling
Power and backup gens
The things they differ on will be the readiness of their NOC team (do you have to fill out a web form or call a call center in East St. Louis to get a problem fixed in San Jose, or can you just call the NOC and someone goes to your cage?) and the monitoring/alerting they provide their customers for issues on the datacenter network. Infrastructure-wise, most DCs can provide you with Ping/Power/Pipe, but the service and SLAs are where they get points.
Do a LOT of reading. Depending on your platform, you have many choices. Linux vendors and Microsoft both have good platforms WRT building redundant networks, provided you do your homework.
Which brings you to manageability. Make sure that you have a deployment framework you can live with right from the start. Deploying code by hand is alright when you have 2 sites in IIS x 3 or 4 machines, but it gets hairy when you have 15 sites x 20 webservers. Make sure you can deploy web content, mid-tier apps, etc, with the "click of a button". This helps to ease the possibility of repetitive mistakes being made. Depending on the app, you may have to roll-your-own, but it's worth it.
Scalability. Make sure you pick a DC that can grow with you. If you plan to start out with 4 1u rackmount webservers and maybe a 7u DB, plus some storage array, make sure there is "room to move" in the DC without needing to cross-connect all over their facility with a cage here and a couple cabinets on the other end. Scalability testing by your engineers would be a great plan also. During load testing, if you're planning on using 2 mid-tier servers to process "Project X" from the web-users, set up 6 or 8 and load them up with bogus traffic. See how long it takes to kill your DB server.
Monitoring/analysis. Make sure you have a monitoring system into which you can hook custom monitors and alerts. Of your installation, those parts with the lowest levels of monitoring will be the ones most prone to breakage. Good packages here are NetCool and HP Openview. Expensive though. It's something you can probably write in-house until you need to spend the big bucks for an enterprise package.
Look to do a lot of reading, but break it into chunks. There is (I hope) no book called "Building and Maintaining High Traffic Enterprise Networks, for dummies, vol2". Every network will be different. But if you componentize your search, you will yield great results. If you look to build your own monitoring or code deployment system, read up on WMI, read Cisco-related newsgroups for network layer redundancy, etc.
Consultant is NOT a dirty word. Make sure you hire one for the right reasons. You do not want someone to come in and "make it so". You want someone with more experience than you have to work WITH you to design a network that you understand, can maintain, and which will scale. There's an art to it, hire Chris van Allsburg, not Picasso, Dali or certainly not Poll
At the company I work for, we have re-built our entire web site system and internal systems over the last couple of years. We've gone from single processor compaq server with webapp and DB on one to a load-balanced multi-application server [all dual processor] with primary and backup oracle databases. Why - because our traffic [both paying and just visiting] was expanding dramatically all the time. At least now we have loads of headroom in the system to allow a decent level of growth and we can just drop in additional servers if required.
The only way to design this stuff is partly by planning and mainly by testing. Ensure your application is lean. Ensure it will scale from one server to ten without any problems for users. Load-test the hell out of it. We used a bank of PCs running Grinder in the end and, after a lot of effort, we found the major bottleneck. It required the ADDITIONAL investment of two VERY expensive XML/XLS->XML applications boxes to get round it.
To get back to my first comment, if you are going to do it properly, it is going to cost a lot of money [and save the usual open-source-is-free comments - if you are going to need some serious database capability for example, you are going to pay some serious money]. So if your budget is insufficient, walk away from it.
Throwing hardware at the problem is usually pretty cost effective - given that consultants are expensive.
I've seen a single (fairly dated) 12 cpu sparc box serve up about 600 simultaneous connections for a cgi driven application without faltering.
Get a system that you can ramp up by adding processors and RAM, and you should be able to handle the load that you are talking about with two boxes (one for FE and one for BE).
Here's my one tip to save some $:
If you have a bunch of redundant equipment, that equipment does not need a whole bunch of built-in redundancy.
Buy 2 good load balancers with redundant power supplies and SCSI disks with hardware RAID. Depending on how much database your app needs, that's where your hardest-to-avoid point of failure will be. Look into what Slashdot does for high performance; I forget the name of the software, but it's a distributed caching type system. Linux Journal had an article about it and it looked very interesting.
Also, find out how much load you need to support; high traffic means a lot of different things to different people. Use siege to slam your setup once you think it's good. Make charts and graphs of the data. You won't really understand the data until you try to process it into something that you can explain to someone else.
When it comes down to it your biggest bottleneck will often be your pipe. 2 fast ipvs load balancers, 10 web servers, and 2 big database servers could easily handle more than the 1 ethernet connection your isp provides if you're hosting moderate database sites.
Also, it's very likely your database performance is going to be the killer once you go into production. Design your schema very well, and test it extensively. Monitor all your queries, find out which ones cost the most, and optimize them. Have a test AND a stage environment similar if not identical to your production one, and USE IT!
Make sure you know how to use all your tools; there's nothing like trying to search through man pages while your site is down. Make sure you have redundancy in personnel, whether all on staff or consultants. Make schedules and let people know who is responsible.
Oh, and monitor it like crazy, from at least 2 different sites that can page you 24x7, and don't ignore your pager at 3am just cuz you're asleep.
Also, make backups like crazy, the largest percentage of your disk and storage in general should be used by backups! Test restores of your backups on your test environment.
One final thing, then I'll go: don't be afraid of buying things on eBay. Redundancy is worth more than speed when you're in a 24x7 environment. I really like to buy 3 year old servers and fibre disk arrays there for 1/10th the cost new. These were $30k servers 3-4 years ago, now going for $200-800. They have 3 redundant power supplies, hardware RAID controllers, multiple PCI buses, quad processors, 1-4GB RAM, and run great. Also, the SAN market on eBay is very saturated with sellers, and you can ignore anything that's close to retail price. I've seen 10x36GB disk fibre arrays with full dual redundant power supplies and controllers for $199 buy it now, and not broken crappy ones. I've got a 10x18GB one; with software RAID 0 under Linux I get 95MB/sec sustained (20GB files) reads and writes. (In Windows 2000 Server and Windows XP I get about 35MB/sec doing similar software RAID 0; this is one of many reasons you should ALREADY know not to try to use Windows in a production environment on the web.)
I'm envious, I love setting stuff like that up! It's my favorite thing to do in IT!!
Whew, have a good day.
First off, I'd say you're doing this bass-ackwards. You really should have already answered all these and many other questions before ever laying fingers to keyboard.
It depends on lots of things. Who's going to manage the self-hosted host? If they have an IT dept. maybe they can provide the hardware sizing. In any case you will first need to establish the usage patterns and then go forward from there.
http://tinyurl.com/3t236
... can be obtained by using specialized testing tools like Smartbits (just one example out of thousands of them). Basically, these devices/tools generate managed traffic. So you can point one of these devices at your servers/networks and start measuring points of saturation. Just spend a few minutes on Google searching for testing tools and you will end up with a list of 100s of them to be used in different areas like wired/wireless, security, VoIP, etc.
A good alternative is the O'Reilly book Web Performance Tuning (http://www.website-owner.com/books/servers/webtuning.asp).
Dw.
I have some experience with administration of web sites with very high traffic. My previous experience was with p0rn sites (lots of sites, lots of concurrent accesses). My current job is at Skyrock / Skyblog, that serves about 25 million pages every day.
In both jobs, the infrastructure was extremely similar.
The entry point is one (or more) load balancer.
A load balancer does more than blindly let you have multiple backends. It also accepts client connections, buffers the request, fetches the data over already established (keepalive) sessions, buffers it, and transmits it in large chunks to the client. This alone really helps to reduce the number of Apache processes that are taking up resources (especially memory) for nothing.
The load balancer can also do other things, like protecting the servers against some attacks, plotting the current workload of every backend, compress HTML pages, etc.
At my previous job, we were using Foundry ServerIrons. Now, we are using Zeus ZXTM http://www.zeus.co.uk/ with great success. Although it's very expensive software, it's way cheaper than Foundries, way more configurable, way more user-friendly and we are very pleased with it so far. A single PC handles 300 Mb/s (Linux 2.6 is needed for epoll).
The load balancer can also be configured to send the requests to this or that server according to the request.
Thus, servers are dedicated to specific tasks.
We have a bunch of static servers for static HTML, CSS, images, etc. They run minimal Apache servers, designed for speed, with NPTL and the worker MPM. Non-forking servers like thttpd or lighttpd are also an option. The static servers are mainly old P3 machines, with only 512 MB RAM.
Then, we have servers for PHP. The Apache they are running is huge (our web sites need a lot of modules), the hosts are dual 3 GHz Xeons with 2 GB RAM, and there are some other specific tweaks.
Content differentiation is important. It's a waste to spawn huge Apache processes to serve static stuff just because the same host should also be able to serve PHP. Also, tuning (esp. NFS) is very different for static and dynamic content. And as a specialized server often serves the same files, caching is more efficient.
We run Gentoo Linux on all web servers, plus one DragonFlyBSD (mostly for testing).
The same content differentiation is made for SQL servers. One SQL server serves one sort of thing, so that caching is efficient. Also don't forget that on x86, Linux and MySQL can hardly use more than 2 GB of RAM, which is really annoying with big tables. We are switching SQL servers to Transtec Opteron-based servers for that.
On high traffic infrastructures, the I/O is often the bottleneck especially if you serve a lot of different content.
For our blog service, we had to buy a Storagetek disk array with 56 disks (fiber channel, 15k) in RAID 10. As NFS would introduce too much delay, we directly plugged two web servers to the controller of the disk array. These web servers are the NFS servers for the PHP servers, but they also directly serve the static content.
The access time of hard disk is really annoying. For shared data, but also for databases. We found that RAID 5 was way too slow (even with the high-end Storagetek/LSI controller) since we have about 1 write for 5 reads. So we had to switch everything to RAID 10. It really performs better, but it's obviously more expensive.
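The RAID 5 versus RAID 10 gap above can be roughed out with the textbook write-penalty rule of thumb: each logical write costs about 4 disk operations on RAID 5 (read data, read parity, write data, write parity) and about 2 on RAID 10 (one per mirror). This is a back-of-the-envelope sketch, not a model of the StorageTek controller; the 6,000 requests/s figure is an invented example, and controller caches will shift real numbers:

```python
RAID5_WRITE_PENALTY = 4   # rule-of-thumb value, not measured
RAID10_WRITE_PENALTY = 2  # rule-of-thumb value, not measured


def backend_iops(reads, writes, write_penalty):
    """Disk operations per second needed to serve the given front-end load."""
    return reads + writes * write_penalty


# Hypothetical load with the poster's mix of about 1 write per 5 reads.
reads_per_sec, writes_per_sec = 5000, 1000

raid5 = backend_iops(reads_per_sec, writes_per_sec, RAID5_WRITE_PENALTY)
raid10 = backend_iops(reads_per_sec, writes_per_sec, RAID10_WRITE_PENALTY)
print(f"RAID 5 needs {raid5} disk ops/s, RAID 10 needs {raid10} disk ops/s")
```

Even at a modest write share, the parity overhead adds up, which is consistent with the poster's experience of having to move everything to RAID 10.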
Another bottleneck was the sharing of PHP sessions between all the load-balanced PHP servers. We first used a MySQL/InnoDB-based solution, but it scaled poorly. That's why I had to write specific software: Sharedance http://sharedance.pureftpd.org/
In a high-traffic infrastructure, my hint would be to use many modest, but specialized servers over one huge mega-fast server that does everything. This is way more scalable. And easier to manage, even from a financial point of view. You can b
You don't mention if you're on the applications side of the world or the network, so I'll cover a little about both.
1. If you're on the app side, make friends with the network side and vice versa. To understand web site management and acceleration, you will need to know about both parts. Making peace with the other team is crucial to a successful site.
2. If you are on the app side, start thinking about concurrency from the start. You're going to have not 2-3 users at the same time, but more like hundreds if not thousands. This means that you can't do things like lock up tables and the like in the database. If at all possible write your application so that users don't need to come back to the same server to track their session information. Make sure each request is tracked quickly and easily. Also, differentiate your static content from the dynamic content -- you'll eventually want to cache the static content and life will be easier with static objects being served out of a known location. And please... please, please, please... make sure your app generates clean HTTP headers. Set your cache controls correctly, don't duplicate headers, don't be a smart-ass with your headers. Just use clean headers. ASSUME that there will be proxies between you and the client. ASSUME that you will not be able to control all of them.
3. Don't forget about megaproxies. Depending on the nature of your site, you're going to have a ton of your users coming from a small handful of addresses (e.g., AOL). While some megaproxies have fixed the issue of a single user coming out of multiple proxy servers, not all have. This means anything that you use for client IP persistence is broken.
4. Client IP addresses... don't assume you have them. Don't assume they represent a unique user. They don't. Many load balancers/web accelerators also need to act as proxy and will replace the client IP address anyway. (Don't stress about logging -- any reasonable one will insert the client IP address in a HTTP header that you can extract like X-Forwarded-For:)
5. Peak load on your web servers. Apache can go fast, scale, blah blah blah... my ass. It's not the web server or operating system that is going to determine your peak performance. It is your application itself. Be prepared to fess up to the reality that your application peak performance is not going to be hundreds or thousands of requests per second unless you go insane with the optimization. (e.g., write your application into the web server and embed the whole thing into the kernel, etc.) Assume you're more likely going to get a few dozen requests/sec per app server. Keep that in mind as you plan server purchases and scaling.
6. HTTP request does not equal TCP connection. Don't assume that. With HTTP multiplexing like the stuff that Netscaler does (web accelerator), you're going to see most of your requests coming out of a small handful of TCP connections. Make sure your application supports that. Even if you don't use a web accelerator, browsers will do that too. Don't cheat and force the connection closed on every HTTP request; your web server will crap out.
7. This is related to 6, but don't forget that web connections are very short lived compared to what the original designers of TCP were thinking about. As a result, you're going to run into cases where you run out of ephemeral ports (netstat -an will show a ton of ports in TIME_WAIT) even though your machine is idle. This is why HTTP Multiplexing is important -- you don't want a lot of connection churn. Yes, you can tweak your OS settings so that TIME_WAIT expires quickly, but that isn't going to help your overall performance. (TCP connection setup/teardown is a huge burden on a HTTP request that may only span a few packets...)
8. Look into HTTP acceleration technology from the get go. I've used several different brands and I've found Netscaler's to be the best. They are crazy fast and capable boxes that have a ton of features (like the HTTP multiplexing, SSL acceleration, HTTP compression, web
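Point 4 in the list above, pulling the real client address out of X-Forwarded-For, can be sketched like this. It is a minimal illustration that assumes headers arrive as a plain dict with that exact key and that you know the addresses of your own load balancers; `client_ip` and `trusted_proxies` are names invented here:

```python
def client_ip(headers, peer_addr, trusted_proxies):
    """Best-guess client address for logging.

    headers         -- dict of request headers (assumed already parsed)
    peer_addr       -- address of the TCP peer that connected to us
    trusted_proxies -- set of addresses of our own load balancers
    """
    # Only believe the header when the request came through our own proxy;
    # anyone else can put whatever they like in it.
    if peer_addr not in trusted_proxies:
        return peer_addr
    forwarded = headers.get("X-Forwarded-For", "")
    # The header is a comma-separated chain; the first entry is the
    # address the outermost proxy claims to have seen.
    first = forwarded.split(",")[0].strip()
    return first or peer_addr


print(client_ip({"X-Forwarded-For": "203.0.113.7, 10.0.0.5"}, "10.0.0.1", {"10.0.0.1"}))
```

In keeping with points 3 and 4, treat the result as something to log, not as a unique user identity.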
better than ab is siege, which can deal with HTTP/1.1 Keep-Alives, and give more regression-style stats. it's at joedog.org.
better than siege would be something like httperf, and autobench, which will give you some indication whether or not your client generating the requests is still healthy. autobench also allows you to run multiple instances of httperf on different machines, and then aggregate the numbers after the test.
remember folks, there are only 65535 (minus 1024) ports that any machine can be using with one IP...that has to be considered as well, including at the load balancer layer.
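To put a number on that port limit: if every closed connection holds its port in TIME_WAIT for a while, the port range divided by that hold time caps how many new connections per second one source IP can open to one destination IP and port. A rough sketch, where the 60 second TIME_WAIT is an assumption (it varies by OS and tuning) and many operating systems default to a narrower ephemeral range than the one quoted above:

```python
def usable_ports(low=1024, high=65535):
    """Port count as quoted in the comment above: 65535 minus 1024."""
    return high - low


def max_new_connections_per_sec(time_wait_secs=60, low=1024, high=65535):
    """Sustained connection rate before ports run out.

    Applies per (source IP, destination IP, destination port) combination.
    time_wait_secs is an assumed value, not a universal constant.
    """
    return usable_ports(low, high) / time_wait_secs


print(usable_ports(), "ports,", round(max_new_connections_per_sec()), "new connections/s")
```

A load balancer that opens a fresh backend connection for every request can hit this ceiling long before it runs out of CPU, which is the argument for keepalive and multiplexing made earlier in the thread.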
So at peak traffic you are getting as many hits as Yahoo. Name the domain or admit you are either lying or can't do math.
As others have suggested, you may already be in over your head. But even to pick a consultant, you need to have a rough idea of the options and their cost/benefit trade-offs. The large vendors (IBM, Sun, Microsoft, etc.) and some second-tier vendors such as Netscape and BEA have overviews of the application and architecture of their products on their respective web sites... that will cost you a day of reading and give you a headache from reading conflicting claims of superiority, BUT you will know the jargon and the current technology. Reading a book or two wouldn't hurt, but they tend not to be completely up to date. Also, look up SOA... the buzzword du jour in building web-delivered business services. If you have not googled already, you really ought to. My first hit was a comparison of the performance of a dozen web servers with clear graphics and concise info on suggested benchmarking techniques.
In addition to hardware [do I need RAID? etc.], OS, and web server infrastructure issues, don't forget your implementation language choice... what pool of programming skills will be available to write the code? For instance, here is how Perl stacks up, but you have many choices these days.
And above all never forget "SH*T HAPPENS": how and how often and what are you backing up in case of crashes, fires etc.
I'm not so sure I agree. Over the past three years I've done a considerable amount of work in the adult web industry (yes, actual work of the non-fun variety). The most professional hosts that allow adult sites aren't exclusively "adult hosts". Actually, many adult hosts are terribly unreliable and/or unprofessional, and few of them offer managed hosting, which is what this person needs.
That said, there are a number of extremely reliable, professional managed hosting companies that attract a lot of adult paysite owners. Rackspace Managed Hosting in Texas is one of them; Netgroup Data Center in Denmark is another. (They don't come cheap.)
But this person seems also to need to know how to implement the software side as well, and I can't say for sure whether even a managed hosting company is going to be able to pick up all the slack. Maybe it's time to call IBM...
Another value-add of load balancers is that they let you easily swap servers in and out for maintenance. No need to modify DNS etc. I presume for most high volume services, this hardware is now standard.
...for the users will give you some problems with latency.
Depending on the user origin this can lead to a terrible experience for the users.
If you hate Ask Slashdot, then don't read it. We have all had to learn something in our lives, and at some point it comes from community: asking questions, reading a book, taking a training course, or testing with the help of others. Your lack of a sense of community is highly selfish, and I hope that when you have a question and need information from someone, they refuse to help you. Especially a doctor, when they try to wrench the gerbil out of your ass.
First, although 300 locations with a few users each may sound like a high-volume site, it is not. I don't want to burst any bubbles, but it simply is not high-traffic in today's world. I work with large e-tailing sites that get 200,000 unique visitors per hour.
The first step is to determine the type of load you will receive. Is it call-center type traffic, where they will have dedicated staff accessing the application, or will it be more like Internet traffic that comes in waves when it feels like it? If your application fits the call-center model, then you need to know the maximum number of operator-types that will be online at any given time. If it is more like an Internet site, such as Slashdot, then you need to either project the number of sessions per hour, also called the arrival rate, or examine the web logs to find out.
Concurrent users and arrival rates are not the same--one is the output of the other. In arrival rate mode, the number of concurrent users varies depending on the number of visitors arriving that minute and the speed of the site. If the site slows down, which it will at a higher rate of visitors, then the sessions will take longer. If the sessions take longer while visitors continue to come to the site, the number of concurrent users rises. Internet visitors do not know how many users are on the site and certainly won't obey any threshold that you determine.
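The relationship described above is essentially Little's Law: average concurrent users equals arrival rate times average session duration. A small sketch, where the 5 and 10 minute session lengths are assumed values for illustration and only the 200,000 visitors per hour comes from this post:

```python
def concurrent_users(arrivals_per_hour, session_secs):
    """Average number of sessions in progress (Little's Law: L = lambda * W)."""
    arrivals_per_sec = arrivals_per_hour / 3600.0
    return arrivals_per_sec * session_secs


# Same arrival rate, but the site slows down and sessions take twice as long.
normal = concurrent_users(200_000, session_secs=5 * 60)
slowed = concurrent_users(200_000, session_secs=10 * 60)
print(f"healthy site: about {normal:.0f} concurrent users")
print(f"slow site:    about {slowed:.0f} concurrent users")
```

The arrival rate is the input you cannot control; concurrency is what your site's speed turns it into, which is why a load test should drive arrivals and not a fixed user count.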
The second step is to test over the Internet, and from as many remote locations as possible. You said that there were to be 300 remote offices. Are these all in the US, or are any of them international? Testing on a local LAN does not tell you much of anything, because there is no latency and everything runs at the speed of your switch. Very few people have 100 megabit connections to the Internet, so it is not realistic to test that way. Real users have a mix of line speeds, and come from a variety of locations. It is best to test from 5 or more geographically dispersed locations, using a distribution of the line speeds that your end users will be using. If each of these 300 sites has a T1, and each site has an average of 3 users, then each user should run at 512Kbps, not 1.54Mbps.
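The per-user line speed above is just the site's link divided among the users sharing it. A quick sketch of that arithmetic, taking a T1 as 1,544 kbps and assuming all users at a site are active at once (the worst case):

```python
T1_KBPS = 1544  # nominal T1 line rate in kilobits per second


def per_user_kbps(line_kbps, users_per_site):
    """Bandwidth each simulated user should be throttled to."""
    return line_kbps / users_per_site


print(f"{per_user_kbps(T1_KBPS, 3):.0f} kbps per user")
```

That lands within a few kbps of the 512Kbps figure quoted above, so throttling each simulated user to roughly a third of a T1 is the realistic setting for this scenario.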
Lastly, perform realistic transactions on the site; don't just hit the home page. Real users on the site will probably start at the home page and traverse the site, doing various things. You should have an idea of what these actions will be, or you can examine the web logs to determine the top 10 paths through the site. Then write scripts for each path and run them proportionately. You also need to build think or dwell times into each page. Real users don't go from page to page as fast as possible! They take time to fill out forms. A good load test takes into account how familiar a person is with the site and what the person's patience with the site will be. A person using an SSL connection to purchase something has more patience than someone browsing a catalog. By the same token, an operator-type person does not have any choice about whether they can use the site or not; however, their productivity will be directly proportional to the speed of the site.
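Running scripted paths "proportionately" with think times can be sketched as a weighted random choice plus a randomized pause. The path names, weights and 8 second mean below are invented for illustration, and exponential think times are just one common modeling assumption; real values should come from your own web logs:

```python
import random

# Hypothetical share of sessions following each scripted path.
PATH_WEIGHTS = {"browse_catalog": 0.6, "search": 0.3, "checkout": 0.1}


def pick_path(weights, rng):
    """Choose a scripted path with probability proportional to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


def think_time(mean_secs, rng):
    """Random pause between pages, exponentially distributed around the mean."""
    return rng.expovariate(1.0 / mean_secs)


def plan_sessions(count, weights, mean_think_secs, seed=0):
    """Return (path_name, think_seconds) pairs for a load generator to replay."""
    rng = random.Random(seed)  # seeded so a test run can be repeated
    return [
        (pick_path(weights, rng), think_time(mean_think_secs, rng))
        for _ in range(count)
    ]


for path, pause in plan_sessions(5, PATH_WEIGHTS, mean_think_secs=8.0):
    print(f"{path:15s} think {pause:5.1f}s")
```

The plan only decides what each virtual user does and how long they wait; the actual HTTP requests for each path still have to be scripted in whatever load tool you use.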
There are very few open source or free tools that do these things for you. Your options are to 1) wing it as best you can using the SWAG method you described, or 2) seek help. There are various Do-It-Yourself outsourced solutions, such as Test Perspective or some other total outsourced solution. The DIY method will probably get you the best value, but you are subject to your own work, and don't have anyone to blame if things go wrong.
No Not Again! Its whats for dinner.
Most porn companies are clueless, and can barely handle listing the files in a directory.
They get a MAJOR SCREWING from hosting companies that charge big $$ to figure out how to handle the load.
Places like equinix put all that crap up for show. There is nothing all that great about hand scans. The weak link is the utter morons that work security in those places.
I can tell you that more than once I've handed a security guard my ID at a big-name datacenter and he's given me back a nametag for another company, and access to their cage!
Here's an overview of our stuff
Well, all that together costs about $2,336,000.
"What's in your wallet"
Yes, I am a smart ass; it's better than the alternative.
Check out Load Runner; apache bench is probably far too simplistic if you're doing a serious web application. http://www.wilsonmar.com/1loadrun.htm
I wonder what is the configuration for slashdot servers that they manage to stay alive while any URLs mentioned on this site go down in seconds?
First you need to think about how you might partition things. Slashdot can be partitioned by stories because the story forums are generally independent of each other. Thus, if traffic builds up, split the traffic across multiple servers by dividing up which story goes to which server. If you have little e-stores, then you can obviously partition by store because each little store is probably independent of the others.
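Partitioning by story boils down to a stable mapping from story ID to server. A minimal sketch using a hash of the ID; the server names are placeholders, and this is only an illustration of the idea, not a description of how Slashdot actually routes requests:

```python
import hashlib


def server_for(story_id, servers):
    """Map a story (or store) ID to one server, the same one every time."""
    # Use a stable hash; Python's built-in hash() of a str changes between runs.
    digest = hashlib.md5(str(story_id).encode("utf-8")).hexdigest()
    return servers[int(digest, 16) % len(servers)]


servers = ["web1", "web2", "web3"]  # hypothetical host names
for story in (1001, 1002, 1003, 1004):
    print(story, "->", server_for(story, servers))
```

One caveat with plain modulo hashing: adding a server remaps most IDs, so any per-server caches go cold. A lookup table or consistent hashing avoids that at the cost of more moving parts.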
Another thing to consider is to put the web server on one machine and the database server on another.
The hard part is databases in which there is a lot of interaction and no clear way to divide. When you do a general key-word search on ebay, for example, you have to search across multiple "kinds" of things. In such a case it might be time to get a big-ass Oracle or DB2 system along with needed DB partitioning experts.
But even an ebay search may be partitioned by having the search server be separate from the auction detail server(s) themselves. It just might result in a delay between item posts and the time the search server gets the info to search on. The frequency of updates between the search servers and detail servers may have to be adjusted for traffic, because during peak times frequent updates may not be possible.
Which brings up another related topic: degrading gracefully. Plan what happens when traffic gets too high. You may want to prioritize services or features such that some lower-priority features are shut down before higher priority ones if things get steamy. For example, if ebay got flooded, then they may want to disallow new items for a while. If you don't plan this, then ALL services may go down.
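One simple way to plan that kind of degradation is to give every feature a load limit above which it is switched off, with the core features never shed. The feature names and thresholds below are made up to match the auction example; how you measure "load" (CPU, queue depth, response time) is left as an assumption:

```python
NEVER_SHED = float("inf")

# Hypothetical limits: a feature stays on while load is at or below its limit.
FEATURE_LOAD_LIMITS = {
    "view_item": NEVER_SHED,  # core feature, always available
    "bid": 0.95,
    "search": 0.85,
    "post_new_item": 0.70,    # lowest priority, shed first
}


def enabled_features(load, limits=FEATURE_LOAD_LIMITS):
    """Return the set of features to keep serving at this load (1.0 = capacity)."""
    return {name for name, limit in limits.items() if load <= limit}


for load in (0.5, 0.8, 0.9, 0.99):
    print(f"load {load:.2f}: {sorted(enabled_features(load))}")
```

The point is that the shedding order is decided in advance and written down, so a flood takes out new item posts first and not everything at once.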
Table-ized A.I.
who said i hate slashdot? it's just a funny quote i remembered.
First, separate the web serving from the database server, put them on different machines.
Second, web serving is easily (and massively) scalable. Buy a file server with a good RAID array (and backups!), then a bunch of front-end web servers. Start with round-robin DNS for load balancing. If you want, move to some LVS-based load balancers for failover, etc.
Third, database clustering is not an easy thing to do - if your database server doesn't offer good, scalable clustering, then you just have to buy a single, beefy machine.
steve
Oh, you're not stuck, you're just unable to let go of the onion rings.
drink lots of coffee
Anandtech has some great articles about how they started and what they used over their few years of existence. You really shouldn't start too big, though. Maybe start with a decent database backend and a couple of web servers. http://www.anandtech.com/it/
Hi OP,
Good Luck!
you may want to read this from the creator of LiveJournal.com: http://www.danga.com/words/2004_oscon/oscon2004.p
Simple.
Load balancers (Foundry) - NIC1 Webservers NIC2 - NFS (NetApp) for pages.
Back to the SAME webservers as above:
Webservers NIC3 - Load balancers (Foundry - a differnet set) - NIC1 DB Servers NIC2 - NFS (NetApp) for data.
For the Webservers just get a lot of cheap 1U boxen and fill them full of RAM so the pages and NFS are cached as much as can be. Run the same image on each box (netbooting is even better - no hard drives to fail). Too much traffic on the frontend? Just add another box. If you netboot everything, it makes backups so simple. One backup (tape) box and a few netboot boxen tossed in and you are good to go.
I did this years back with huge clusters of Sun Netra T1 boxen (1999 era) and you could not slashdot it. Load gets too high? Unpack and rack a few new boxen and netboot 'em. It is quite simple, easy to manage and very, very scalable. The biggest part of this is getting good DB programmers to write the DB part of the setup. This is somewhat the same way that Hotmail ran before the MS takeover.
If NFS is too slow for some reason, you can do the same thing with Fibre Channel SANs.
Remember: Simple is better.
You may really want to consider hiring a Network Engineer consultant. In particular, someone well versed in Cisco products. My load balancing product of choice is the F5 BigIP, but a Cisco CSM would work too. You can hire a "CCIE", but these guys are always way overpaid and perform just as well as a non-certified seasoned engineer.
I am not sure how much traffic you actually are looking to handle in terms of megabits or gigabits per second, but let's just assume you need something less than 100 megs. I will also assume there is no need for geographic redundancy.
Secure a good colocation facility that you will have physical access to. I like colos run by large internet carriers which also provide cheap 100/1000 meg ethernet connectivity to the internet. This is typically cheaper to terminate than bringing SONET out to the suburbs.
Hire or consult a network engineer. Unless you really want to become a makeshift expert overnight in switching and routing, do not tread here by yourself.
Think about the future. Your network will need to scale, so make sure the switching and routing can grow as you grow.
Think about redundancy. Don't just drop in a single router and switch because "it's cheaper". Don't just bring in one circuit because "it's cheaper". But don't go too crazy on redundancy -- compare the financial impact of an outage vs. the cost of equipment. If you lose a million dollars for each outage minute during peak hours, that lousy extra $50,000 for an additional 7206 doesn't look so bad after all.
Avoid homebrew equipment where possible. Yea, you could save thousands by not using an F5 BigIP load balancer or making your own VPN box, but when things go wrong, who's going to support it? Is it really worth it? Only purchase hardware that you have extensively tested in your lab environment. Make sure you buy that support contract. Yea, they likely make a profit off of each one, but when things go down, its their ass, not yours!
More technically speaking, I would throw a pair of routers on the edge. Create at least two major network segments, external and internal. Use internal non-routable addressing for your internal segment. Put a BigIP between the two segments for load balancing. You can build a "VIP" which contains many different hosts that are dynamically load balanced based on specifications. Boxes can be taken out of load balancing if they do not respond properly to specified requests, or go down. Cisco's CSM can do the same. Stay away from Local Director.
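To make the "VIP" idea above a bit more concrete, here is a minimal Python sketch of a pool that takes hosts out of rotation when they fail a health check. It is only an illustration: the `Pool` class, the host names, and the `check` callback are hypothetical, and this is not how a BigIP or CSM is actually configured.

```python
from typing import Callable, Dict, List


class Pool:
    """A virtual IP (VIP) fronting several backend hosts (hypothetical model)."""

    def __init__(self, hosts: List[str], check: Callable[[str], bool]):
        self.hosts = list(hosts)
        self.check = check  # returns True if the host answered the probe correctly
        self.healthy: Dict[str, bool] = {h: True for h in self.hosts}
        self._next = 0

    def run_health_checks(self) -> None:
        """Probe every host and record whether it may receive traffic."""
        for host in self.hosts:
            self.healthy[host] = self.check(host)

    def pick(self) -> str:
        """Round-robin over the hosts that passed the last health check."""
        for _ in range(len(self.hosts)):
            host = self.hosts[self._next % len(self.hosts)]
            self._next += 1
            if self.healthy[host]:
                return host
        raise RuntimeError("no healthy backends")


# Assumed example: web2 stops answering its probe and drops out of rotation.
pool = Pool(["web1", "web2", "web3"], check=lambda host: host != "web2")
pool.run_health_checks()
picks = [pool.pick() for _ in range(4)]
print(picks)
```

A real load balancer would run the probes on a timer and usually requires several consecutive failures before pulling a box, but the selection logic is roughly this shape.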
You probably don't need a firewall in your network, unless you are restricting backchannel access. Use ACLs on the router, and your load balancing layer between the public and privately addressed segments will act as a natural barrier.
If you are serving web traffic to a lot of slower clients, and want to decrease download time, check out some of the web compression products. It turned some of our 18 second page load times (over dialup) into 4 second load times...
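For a rough feel of why compression helps slow clients, here is a small Python sketch using gzip from the standard library as a stand-in. The compression products mentioned above are not named, so this is not their method; the sample page and the 56k link speed are assumptions, and the timing ignores latency and protocol overhead.

```python
import gzip


def transfer_seconds(n_bytes: int, bits_per_second: float) -> float:
    """Time to move n_bytes over a link, ignoring latency and protocol overhead."""
    return n_bytes * 8 / bits_per_second


# Hypothetical page: repetitive HTML markup, which compresses well.
page = ("<tr><td>item</td><td>description of the item</td></tr>\n" * 500).encode("utf-8")
compressed = gzip.compress(page)

DIALUP_BPS = 56_000  # nominal 56k modem; real throughput is lower

print(f"raw:  {len(page)} bytes, {transfer_seconds(len(page), DIALUP_BPS):.1f} s")
print(f"gzip: {len(compressed)} bytes, {transfer_seconds(len(compressed), DIALUP_BPS):.1f} s")
```

Real pages compress less dramatically than this artificial one, and images are already compressed, so measure with your own content.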
I am not a DBA, so I cannot advise you on how you can scale above 1 DB machine. I'm sure there is some sort of clustering facility.
Good luck, and don't do it by yourself.
Real networking isn't something you pick up overnight; you might as well spend the money upfront and do it right.
---- Booth was a patriot ----
I work at a company that's a hosted CRM. My first and most important suggestion is: MAKE SURE THE APPLICATION IS EFFICIENT. If your app does a lot of unnecessary crap in the database or whatnot, you are screwed no matter what you use.
Past that, the suggestions for hardware load balancers are right on. You have (for example) 2 machines hosting the same application behind a hardware LB. Add a nice SQL machine and you are set. Add machines as needed (the SQL part gets tricky, though).
snowulf.com
Most technically advanced pr0n site I've seen.
1. Foundry ServerIrons at the front-end layer.
2. Front-end proxying/caching. Not just static content either, take dynamic content that need not be updated often and put it on the front-end in a fashion that does not require over-weight httpds (i.e. no mod_perl). Use session affinity tricks on the front end (such as mod_rewrite with cookies). squid for caching as necessary.
3. Back-end heavy servers should have a maximum amount of memory, and obviously lower maxclients.
4. NetApp storage on the back-end, scaled as needed.
5. http://www.backhand.org/mod_log_spread/
6. Well designed network topology and aggressive switch partitioning: hint, use vlans and minimize trunking.
Testing with slower machines, sometimes purposely putting slow components into the mix (10-base-T between the machines, for instance), will give you an easier way to find the bottlenecks.
I have a colocated 400 MHz PowerPC 604ev which I use for testing which can push somewhere between 40 and 50 Mbps. It's much easier to get it to its ceiling than my other colo'd server, which is a 1.3 GHz G4 with tons of SCSI disks that can completely saturate the 100 Mbps upstream link. And when the 604ev is too fast, I also have an m68060 Amiga.
Using a slower machine also makes the effect of optimisations much easier to measure. Having tons of L2 / L3 cache means that synthetic benchmarks sometimes won't even come close to real-world performance. The same goes for having tons of memory and / or an intelligent disk controller, either of which can make measured disk hits unrepresentative of reality.
Slower machines have their uses.
Seriously, dedicated service is cheap and comes with plenty of bandwidth. I have had my site hosted with these guys for a long time and the price is right. Here's the URL: www.powerstorm.net
Not meaning to spam but they might be able to help out for a small budget.
Got hosting
Disclaimer: I'm an employee of Akamai, so I'm not unbiased at all.
Have you considered using a CDN like Akamai? They're in the business of distributing content (dynamic, static, big, small, whatever) for you so you don't have to worry about much of the complexity already mentioned on the thread: load balancers, bw provisioning, hw provisioning...
Note that all the comments on this thread about writing a good web app still matter a lot. Akamai (or anything else) won't help you if, for example, you write it to be heavily database bound and it has bad locking semantics.
-- Sef
(myname at akamai.com)
Okay, this depends quite a bit on how big a "large" application is going to get. If you are talking more than 2-3 TB of traffic per month, you will want to get a consultant or a team. Plain and simple. However, if you are looking at setting up a medium-sized application server, you may want to look into a managed hosting environment. I define medium as 3-10 servers, doing between 500 GB and 2 TB of bandwidth a month. This may be a relatively popular website like theonion.com or FHM (fhmus.com).
Anything smaller than that may simply require a single systems administrator. Someone with a couple years experience will easily know how to handle a 2-4 machine setup, possibly with a load balancer.
As with others, my current job may skew my opinion, so please serve with salt.
--If I said something interesting it probably wasn't correct
Yes, I know you asked where to find out how to set up a high-traffic architecture. I think you came to the right place on Slashdot.
10s of millions of page views a day, most of them dynamic.
Although I have never seen much documentation online, I have set up many architectures in the past, and they are still able to handle very high-volume traffic.
It really all depends on ONE factor: money.
I will give 2 choices, I have implemented both:
Appropriate budget:
Frontends/Load Balancing: We had a pair of Big/IPs with SSL accelerators, configured for redundancy. That rocks.
Behind them, we had a clustered NetApp F840 with gigabit interfaces, on a gigabit network.
Frontends: We were running Apache, with all the binaries, config, web pages and Perl scripts located on the shared filesystem. Each machine had dual CPUs, 2GB memory and 2x36GB SCSI drives. We had 26 of them, double the capacity really needed, so if a machine or two were to go down during the night there was no need to worry; it could wait until business hours the next day. Great for peaks as well.
As a database backend we had an Oracle cluster on a SUN 6650, 14 CPUs, 14GB of memory, connected to EMC storage. One machine was configured as the master, the other as a standby with the possibility to take down the primary and mount its filesystem directly from the SAN. Pretty much all the config was on the SAN, on different volumes, and could be mounted on either machine. Each volume had a copy and an hourly update in case of failure of the primary volume.
Now for a more realistic scenario with low budget:
- Load Balancer: Get 2 Linux machines; I'd suggest machines with 2GB memory, 2x36GB disks and 2x3GHz CPUs, running Linux Virtual Server. (http://www.linuxvirtualserver.org/)
- Build 2 Linux machines that you would use as NFS servers (if you are short on budget you could also use them as Oracle or MySQL servers), and configure them with 2 external SCSI arrays that can be mounted on either machine. If you are really short on budget, don't use external arrays but big enough internal drives, and use, for example, rsync to replicate the data between the 2 of them. (I would personally use LVM, establish a snapshot copy on the master and do an rsync of this snapshot. If you have a database on it, put it in quiet (hot-backup) mode while you do the snapshot.)
- FrontEnds: Get a couple of machines with, for example, 2 CPUs, 2GB memory and 2x36GB drives. Configure them to mount the filesystem from the NFS servers.
- Database: on a budget, use MySQL (Oracle would work too). Configure one machine as master, the other as read-only. Have all your machines query either machine for read-only requests, and go to the master only for write requests.
If you need more power: configure more frontends and more read-only slave database servers. Now, if you are write intensive on the database, more than read intensive, then it becomes a bit more complicated.
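The read/write split described above could be sketched in application code roughly like this. It is a minimal Python illustration, not a real database driver: the `ReadWriteRouter` class, the host names, and the list of read-only verbs are all assumptions.

```python
import itertools
from typing import List

# Statements treated as read-only in this sketch (an assumption, not exhaustive).
READ_VERBS = {"SELECT", "SHOW", "DESCRIBE", "EXPLAIN"}


class ReadWriteRouter:
    """Send writes to the master; spread reads over the master and the read-only copies."""

    def __init__(self, master: str, replicas: List[str]):
        self.master = master
        # Reads may go to either machine, so the master is part of the rotation.
        self._read_cycle = itertools.cycle([master] + list(replicas))

    def route(self, sql: str) -> str:
        """Return the host that should run this statement."""
        words = sql.split()
        verb = words[0].upper() if words else ""
        if verb in READ_VERBS:
            return next(self._read_cycle)
        return self.master  # anything not known to be read-only goes to the master


router = ReadWriteRouter("db-master", ["db-replica-1"])
print(router.route("SELECT * FROM orders"))
print(router.route("UPDATE orders SET paid = 1"))
```

Adding another read-only server is then just one more entry in the replicas list. Keep in mind that a read-only copy can lag slightly behind the master, so a read issued right after a write may need to go to the master.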
If you want to know more, contact me off-list.
In short:
1. Keep it simple.
2. Set up monitoring.
3. Use a staging server.
4. Back up data. Back up hardware. Back up staff.
5. Never believe the traffic "estimates"
Location:
Use a decent colo facility. Make sure that the techies seem competent. Confirm that they have multiple network peerings and good bandwidth. Run some traceroutes from locations around the country, if possible, to get a handle on the lag. Ask them about their redundant power, their 24/7 NOC, and their strategy for managing DDoS. Try to find a colo that isn't about to go out of business; an empty colo is bad, but so is a full one.
Hardware:
Realize that the main contributing factor to your "down time" will be your code, not, for instance, a network switch. Remember this when people tell you to set up some sort of complex HA switch configuration, etc. Think more in terms of "hot standby" than "no single point of failure".
Fully redundant hardware is expensive to buy, expensive to configure, and expensive to maintain. If you need the fault tolerance, then you should have the budget to do it right. If not, then don't blow all the cash on a switch cluster.
Ask yourself, "What if?" What if the firewall dies? What if the load balancer dies? What if the database dies? Make sure that you can recover within an acceptable amount of time. If that's a week, then maybe you just need a reliable hardware supplier; if it's 1 day, you had better have the part at the office; if it's 1 hour, the part had better be racked in the cage and configured.
Software:
Use what you know. I have seen very large sites created using lots of technologies. They can all do the job; you just have to play by their rules. I would recommend using something that a healthy number of other people are using so that you will get some improvements as time goes on. PHP, Perl, Java, and .NET seem to be popular choices. Don't get hung up on benchmarks claiming that server A is 13% faster than server B. The main thing is that the technology is easy to use and reliable.
OS:
Linux of course! :-) All I can say is, make sure that it's stable. Don't choose an OS that exhibits ANY stability issues. I have heard horror stories from people that used NT 4. Anything that needs a "reboot cycle" should be a big red flag. You should only have to reboot when you upgrade the kernel, etc.
Security:
I am no guru here, but the main point is to think in layers, firewall externally and between layers. Don't go too nuts, the site has to be usable, but add as much as you can put up with, and have time for.
Web clusters, load balancers and all that:
Session is the key to a transactional web site. Sessions are usually maintained via browser cookies.
A content load balancer will stick a user to a webserver based on the cookie. So you probably want one of those. IP based sticky doesn't work all that well because some providers, AOL, send requests out using multiple IP addresses. An extra wrinkle here is SSL, unless your lb can peer inside the SSL data it won't be able to get at the cookie. So if you are using BOTH http and https on the same session you will need an lb that can peer into ssl data.
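One way to picture cookie-based stickiness is the small Python sketch below, which maps a session cookie to a backend by hashing its value. This is an assumption made for illustration: commercial load balancers often insert their own cookie instead of hashing yours, and the cookie name `SESSIONID` and the backend names are hypothetical.

```python
import hashlib
from http.cookies import SimpleCookie
from typing import List, Optional


def pick_backend(cookie_header: Optional[str], backends: List[str],
                 cookie_name: str = "SESSIONID") -> Optional[str]:
    """Map a session cookie to one backend so the same user keeps hitting it.

    Returns None when there is no session cookie yet, in which case the
    balancer is free to choose any backend.
    """
    if not cookie_header:
        return None
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    if cookie_name not in cookie:
        return None
    digest = hashlib.sha256(cookie[cookie_name].value.encode("utf-8")).digest()
    return backends[int.from_bytes(digest[:8], "big") % len(backends)]


backends = ["web1", "web2", "web3"]
first = pick_backend("SESSIONID=abc123; theme=dark", backends)
second = pick_backend("theme=dark; SESSIONID=abc123", backends)
print(first, second)  # the same backend both times
```

The cookie has to be readable for this to work, which is why the balancer needs to terminate or otherwise see inside SSL when a session spans both http and https. Note also that plain modulo hashing reshuffles most sessions whenever the backend list changes.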
Some lbs can also help out with abuse and are crossover security devices. All are routers, and most have access control lists, SYN cookies, and other security features. Still, they are generally not designed to be the front line defense, but constitute another layer.
Database:
Don't go the Oracle RAC route unless you are going to buy more than 4 CPUs. A 4-way server is cheaper and FAR easier to set up. Maybe get a 2-way Xeon with n+1 power and an external RAID array. Then get a shitty single-CPU machine with a big internal SCSI disk to use as a backup in case the main DB dies. The backup DB can be used as a development DB in the meantime.
Sometimes it's easier to split up your user group into a number of clusters than to scale one cluster to service all of the users. If the users do
Mate,
You've lost the plot... All the guy was after was to know where to begin. So he could read up and start somewhere. All you did was say "go to a bookstore, pick up books..." Maybe he was after some titles, some links, not just a useless answer about going to the bookstore.
You come to me.
I browse at +5 Flamebait- moderation for all or moderation for none.
The old cliche:
Good judgment comes from experience. Experience comes from bad judgment.
Assuming that your application can run on one machine to start with, you will just need to make sure you have enough bandwidth and enough machine power so you don't have to worry about those details right away. Be ready to hack things to make your application scale better, that is probably the most difficult thing to prepare for. If this is a one person job, you probably want to have a hosting company that will do the hand holding necessary to keep things secure.
Don't screw around with cheap hardware - make sure you have multiple CPUs, RAID 10, dual power supplies, and a backup machine with a copy of everything.
For another, I work with some banking applications and having data sent cleartext, even on an inside network directly connected to load balancers is NOT a valid option.
My last sysadmin job (I'm in bizdev now) was at a brokerage firm. While the solution wasn't implemented before the higher ups pillaged the company, I and the network engineer came up with a way around this issue: use the F5 SSL accelerator to encrypt/decrypt the SSL stream, then use SSH port forwarding to make sure the cleartext data was encrypted between the machines. We never got it into production, but it worked great in the lab.
One of the nice side effects of F5 boxes being built around FreeBSD (even if the kernel has evolved so much over the years that it's a completely different beast now).
God invented whiskey so the Irish would not rule the world.
Our sites obviously have to serve millions of people, so they have to be pretty robust. I can't tell you every detail because we're all pretty specialized and don't get to see everything ourselves, but from working with our database guys and network guys, I do have a pretty good 10,000 foot picture of how things work. Here's a general sense of what you'll have to do to really be robust:
1. Your database gets its own server, as powerful as you can afford. If you're a really big site, you're using Oracle, and really, a database cluster rather than a single server. IMPORTANT: Only the DBA can touch the production databases. Developers MUST submit requests to the DBA for any changes. Nobody should be touching a production database from their desktop, other than maybe being able to run queries to check data, and they use a separate, limited login for that. Changes are done by the DBA ONLY.
2. You put a firewall between the database server and your middleware server. The firewall is a dedicated device, and you're careful about the ports you leave open. Only the middleware server and DBA workstations on your intranet can touch the database.
3. Your middleware server(s) are as powerful as you can afford (this will be a theme here) and ONLY run middleware. This means, business rule processing. Everything that touches a database in any way MUST come through middleware -- no direct connections, ever. IMPORTANT: developers don't directly install middleware; network staff only.
4. A firewall (again, dedicated device) between the middleware server and the web server. Only the web server (and network staff workstations on your intranet) are allowed to touch the middleware server.
5. A set of web servers for your websites, as powerful as you can afford (hate to keep repeating this, but if you skimp you'll end up screwing yourself down the road). IMPORTANT: Developers should NEVER have access to production web servers; they should give their stuff to the networking staff when it's ready. Also, if you're doing FTP and such, put it on a separate server.
6. A firewall outside your web server, which only permits port 80 traffic and is twice as paranoid as your other firewalls. Log everything "funny".
In general, you'll have to hire some people: someone really good at security, to configure all your firewalls, someone good at setting up load-balancing to set up all three layers, someone to help you set up a good development environment...
One thing lots of people overlook: You'll want a "sandbox", i.e. a dedicated set of test database, middleware server, and web server that your developers can play with when working on their sites. You'll also want to set up a UAT (User Acceptance Test) environment similar to your sandbox, so projects can be moved to UAT for testing before being rolled out to production. You can't do UAT on a sandbox; sandboxes are constantly changing. You need a stable environment for UAT.
Anyway... Hope that helps, it's just advice, you know? Not all of it directly addresses high-volume sites, some of it is about site stability and security, but I think it all ties in together. If your site is being changed by developers, it won't be stable... And if you don't have a paranoid firewall setup, it won't be secure. A lot of webmasters would consider this layout to be (putting it politely) seriously paranoid, but hell, just because you're paranoid doesn't mean they're not out to get you. And, anyway, like I said, high volume does imply these other considerations...
Good luck!
Farewell! It's been a fine buncha years!
- Over 750 requests/second on 29 servers - average >20 requests/second each (Yes, I know some are not HTTP servers). Compare that to some commercial solutions.
- commodity hardware
- squid for caching/load balancing, feeding Apache
- multi-tiered architecture
- dual Opteron for the master mysql database
My guess is that the bottleneck is going to be the database. We've done extensive testing with a million-customer sample database, running multiple instances of test applications from 10 other boxes, but that doesn't exactly prove much as it's too predictable.
Don't guess. You have too much riding on it to guess. Build proper test infrastructure. Not only will it pay off now, but it'll be hugely valuable in the future as you change and expand the application.
If you're not sure what real users will do, get some to try it out and record their activity. Then spend a little time building a load model, where you describe types of users, their activity patterns, and their expectations. (E.g.: "At the end of the month the 800 salespeople will rush to meet their quota, and during a peak hour they'll each do..." Generally I end up with nice series of spreadsheets, so I can adjust registered users and see peak hits per second come out the other end.) Then simulate the projected load and see where the real bottlenecks are.
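The spreadsheet-style load model described above can be sketched in a few lines of Python. The 800 salespeople come from the example; every other number, and the `UserType` structure itself, is a made-up assumption to be replaced with figures from recorded user activity.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class UserType:
    name: str
    registered: int                    # how many users of this type exist
    active_fraction_at_peak: float     # share of them online in the peak hour
    requests_per_active_user_per_hour: float


def peak_hits_per_second(user_types: List[UserType]) -> float:
    """Sum the peak-hour requests over all user types and convert to per-second."""
    per_hour = sum(
        u.registered * u.active_fraction_at_peak * u.requests_per_active_user_per_hour
        for u in user_types
    )
    return per_hour / 3600.0


# Hypothetical end-of-month rush: half of the 800 salespeople online,
# each making 180 requests during the peak hour.
model = [UserType("salesperson", 800, 0.5, 180)]
print(peak_hits_per_second(model))
```

Changing the number of registered users or the activity assumptions and re-running gives the "peak hits per second out the other end" that you then feed into the simulated load.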
You should be really wary about optimizing without data. As Knuth says, "Premature optimization is the root of all evil." I know a number of people who build very high volume stuff, and I don't know any of them who haven't been frequently surprised at exactly where the bottleneck turned out to be.
Also, start small and work up. There's no need to build a huge load testing suite all in one go; often you'll learn enough from the first simple tests to point developers and sysadmins in the right direction.
My last sysadmin job (I'm in bizdev now) was at a brokerage firm. While the solution wasn't implemented before the higher ups pillaged the company, I and the network engineer came up with a way around this issue: use the F5 SSL accelerator to encrypt/decrypt the SSL stream, then use SSH port forwarding to make sure the cleartext data was encrypted between the machines. We never got it into production, but it worked great in the lab
Interesting. The only thing is: is SSH encryption any less computationally intensive than SSL?
-Em
RelevantElephants: A Somatic WebComic...
Your sharedance software is interesting. Don't know if you are aware of memcached though, (http://www.danga.com/memcached/, by Livejournal guys) and if so did it lack something that prompted you to write your own?
VIVA1023.com | Political Fashion.
Okay, you said 300 sites, each with multiple users hitting the application constantly. How much CPU time does each hit take? How much bandwidth? Frankly, unless the application is really CPU and memory intensive, your limiting factor will tend to be bandwidth, not server speed or memory. Without knowing what the application is, it is hard to tell you what to look at. I would start with the database. Will it handle the peak number of transactions? What about the bandwidth between the database server and the application server? It takes a big server to saturate a gigabit link, so that will not tend to be an issue.
Figure out the peak transaction rate and the bandwidth/memory/drive space/DB transactions, then multiply that by 1.2 to 1.5, and that is what you will need to scale your app. Frankly, I guess that a normal server with an okay database server would support your application. If it is a relatively simple form/database/report style application, 900 to a thousand users does not seem to be that big of a load. What I have seen kill more apps when they try to scale than any other factor is poor database design. If your queries and indexes are good, you should have no problem. If they are not, fix them.
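The 1.2 to 1.5 multiplier above is simple arithmetic, shown here as a small Python sketch. The peak figures are invented placeholders; only the headroom factors come from the comment.

```python
from typing import Dict


def with_headroom(peak: Dict[str, float], factor: float) -> Dict[str, float]:
    """Scale every measured peak figure by the same safety factor."""
    return {resource: value * factor for resource, value in peak.items()}


# Hypothetical measured peaks for the application.
measured_peak = {
    "db_transactions_per_sec": 100.0,
    "bandwidth_mbit": 20.0,
    "memory_gb": 4.0,
}

for factor in (1.2, 1.5):
    print(factor, with_headroom(measured_peak, factor))
```

The two results bracket what you would provision for; the measured peaks themselves have to come from real tests, not guesses.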
See my blog http://ilovecookes.blogspot.com/ for light hearted technical information.
Hmm, the URL goes to Google with a "nyud.info" embedded in it; whois says that belongs to:
Registrant Organization:Gay Nigger Association of America
Created On:08-Sep-2004
The name is a "Dong Bird", but Google says the phone number it's registered to belongs to someone else. (Of course that doesn't mean too much; my friends' apartment has one guy's name on the phone, but there are 4 guys living there (college, 4 bedrooms).) The phone number is right for the town and city listed, though, but neither Google nor Yahoo can find the address listed in the whois info, and the address on the phone number isn't the one in the whois. RandMcNalley.com offers that there are addresses for 100-999 S. Coit Road, and a 1200-2998 N. Coit Road.
So it seems to be falsified whois info. Aren't there rules/laws against that (yet)?
The page (opened in lynx; I can guess what it is going to be) is just a frame showing "http://www.metahusky.net/~snoof/bk.jpg". The site itself looks like some girl's website (pictures of the family's trip to the zoo with the kid in the photos; it's all clean there, and nothing obviously says that this picture is supposed to be there).
So if the image is a shocker, it might be up there without this person's knowledge.