there's no wildcard mechanism to ask the tracker for a list which torrents it serves.
Actually.. there is. Not all trackers support it, but on one that did, I got over 30 MB of hashes back... :)
As I have mentioned before, getting IPs for a torrent is easy peasy. Seriously easy. I even made a small program to do it when this site first was in the news, and it took about two hours' work to get it running, and that's only because I'm a bad coder and knew little of the torrent protocol.
Gist of it:
1. You can connect to a tracker and say you are downloading the torrent with hash X, and would like some peers. You then get a serving of IPs and ports.
2. Some trackers allow you to connect to them and get a list of ALL the torrent hashes they serve. Think several megabytes of hashes. I didn't do much with it, other than to note that some trackers indeed support it.
3. Some trackers also let you give a different IP address when connecting to the tracker (for when you're using a proxy or have weird NATs), which opens the door to spoofing.
The python code (along with some text around it) is available at my blag.
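For anyone who wants to see the shape of it without clicking through, here is a minimal sketch of the three points above. This is not the code from the blag, just an illustration of the plain HTTP tracker conventions as I understand them: the function names are made up, the tracker URL in the usage note is a placeholder, and it only builds URLs and decodes a compact peer list, with no network calls.

```python
import struct
from urllib.parse import urlencode


def build_announce_url(tracker_url, info_hash, peer_id, port=6881, ip=None):
    """Point 1: ask the tracker for peers of the torrent with this hash.

    Point 3: the optional 'ip' parameter lets the client claim another address.
    """
    params = {
        "info_hash": info_hash,  # 20 raw bytes, percent-encoded by urlencode
        "peer_id": peer_id,      # 20 bytes identifying this client
        "port": port,
        "uploaded": 0,
        "downloaded": 0,
        "left": 0,
        "compact": 1,            # request the 6-bytes-per-peer reply format
    }
    if ip is not None:
        params["ip"] = ip
    return tracker_url + "?" + urlencode(params)


def scrape_url(announce_url):
    """Point 2: by convention 'announce' in the last path part becomes 'scrape'.

    Returns None when the URL does not follow that convention.
    """
    head, sep, tail = announce_url.rpartition("/")
    if not tail.startswith("announce"):
        return None
    return head + sep + "scrape" + tail[len("announce"):]


def parse_compact_peers(blob):
    """Decode a compact peer list: 4 bytes of IPv4 plus 2 bytes of port each."""
    peers = []
    for offset in range(0, len(blob) - len(blob) % 6, 6):
        ip_bytes, port = struct.unpack_from("!4sH", blob, offset)
        peers.append((".".join(str(b) for b in ip_bytes), port))
    return peers
```

Usage would be something like build_announce_url("http://tracker.example/announce", hash_bytes, b"-XX0001-123456789012"), then fetching that URL and bdecoding the reply before handing its "peers" value to parse_compact_peers. Whether a given tracker honors a scrape without a hash, or the ip parameter, is up to the tracker.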
So it gives you a minor advantage over the free content?
It gives you alternatives..
Like one alternative Heavy weapon, "Natasha", which slows the enemies it hits, BUT does only 50% of the stock weapon's damage.
Or the "Brass Beast": 20% more damage, but it takes much longer (50%) to start firing, and you can hardly move (60% slower) while it's spun up.
Or the "Tomislav": spins up faster (40%), no spin-up sound, but does less damage (-20%).
Which one is the best weapon? Also, keep in mind you can only change weapons before spawning. Each of the alternatives is a specialized version of the main weapon, better in some cases but worse in others. And you're locked to that choice until you die.
Take the Brass Beast, and you're better suited for defending, but almost useless for attacking. Take the Tomislav, and you have to sneak up on people. Not that easy for a slow-moving mountain.
One school I was at used names from Star Trek. I don't remember them all, but I do remember two pretty good ones (IMHO):
Uhura -> router
Worf -> Firewall
Actually, there is one place where Intel's integrated GPU knocks the socks off all the competition... Video encoding!
Just look at the benchmarks and image examples from AnandTech's review.
And that's the old Sandy Bridge. If we see 30%-50% improvement over that again.. I can see some uses for the integrated card :)
http://theoatmeal.com/comics/apple
high-level languages allow you to escape the bounds of that kind of thinking (all programs reduce to how they execute) and think in terms of higher-level abstractions. And that is the whole point which the "everything boils down to C eventually" argument misses.
Thank you! As one who really likes to work in python, I'll say that this is exactly the thing everyone misses!
Whenever I see people who say things like "It's just a different syntax", 90% of the time they just continue writing that low-level language with a different syntax.. instead of actually using the new language's strengths.
The result can be a lot like this. And sure, that first code works, it solves the problem, and it kinda shows their point.. Done that way, it IS just a different syntax. And it can probably be frustrating, since you might bump into the language's weaknesses without seeing any of its strengths.
Which, again, proves their point to themselves, and since people love to be right..
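To make that concrete, here is a tiny made-up illustration (not the example linked above, and the function names are just for this sketch): the same task written first as C with python syntax, then the way python wants you to write it.

```python
def squares_of_evens_c_style(numbers):
    """C written with python syntax: manual index and manual accumulator."""
    result = []
    i = 0
    while i < len(numbers):
        if numbers[i] % 2 == 0:
            result.append(numbers[i] * numbers[i])
        i = i + 1
    return result


def squares_of_evens_idiomatic(numbers):
    """Same result, stated as what you want instead of how to loop."""
    return [n * n for n in numbers if n % 2 == 0]
```

Both give the same answer. The first one is the kind of code that convinces people it's "just a different syntax".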
Then perhaps, the administrators of the civics tests should be voted in democratically and not subject to regulation or influence by the government.
-And what does that turtle stand on?
-Another turtle!
-And that turtle then, what does it stand on?
-You're very clever, young man. But it's turtles all the way down!
Reminds me of a part from Terry Pratchett's "The Dark Side of The Sun"
Behind Korodore the darkness of the big security room glowed here and there as the other security officers watched. Only Korodore knew that under the horticultural dome by the north lawn was another, smaller security room checking on this one. And occasionally he switched to his own private circuit and watched the officers there. And, hidden by him in a place the exact location of which he had scrubbed from his mind, was a small biocomputer. He had programmed it carefully. It watched him.
Well, technically, Norway got its constitution in 1814, and can thus arguably be said to be younger.
Care to compare peni..errr democracy and freedom?
Here in Norway, from winter 2009 to today, it has gone from 10.63 NOK per liter to 15 NOK per liter. That's a bit over 40% higher cost in under 3 years.
And 15 NOK per liter is around 10 USD per gallon.
And at least here in Bergen the mass transit system is a poor joke.
15 (Norwegian kroner per liter) = 10.1842151 US$ per US gallon
Right...
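For anyone who wants to check the numbers themselves, here is a small sketch of the arithmetic. The exchange rate is an assumption: 5.575 NOK per USD is simply what the quoted calculator result implies, not an official figure, and the function names are made up for this example.

```python
LITERS_PER_US_GALLON = 3.785411784  # exact by definition of the US gallon

# Assumed rate, back-derived from the quoted "10.1842151 US$ per US gallon".
ASSUMED_NOK_PER_USD = 5.575


def nok_per_liter_to_usd_per_gallon(nok_per_liter, nok_per_usd=ASSUMED_NOK_PER_USD):
    """Convert a price in NOK per liter to USD per US gallon."""
    return nok_per_liter * LITERS_PER_US_GALLON / nok_per_usd


def percent_increase(old, new):
    """Relative increase from old to new, in percent."""
    return (new - old) / old * 100.0
```

With those assumptions, 15 NOK per liter comes out at roughly 10.18 USD per gallon, and going from 10.63 to 15 NOK per liter is an increase of about 41%.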
So you'd make a cloud.. of cloud providers? Woah.. +1 "Yo dawg" to you, mate!
I wonder if I could put that into something useful :P
Like this? http://web.archive.org/web/20050412233112/http://lineman.net/node/270
or ... more general stuff.. http://fr.thehackademy.net/madchat/esprit/textes/The_Art_of_Deception.pdf
Speed limits are set by people smarter than you are about the subject at hand
... and they set it for the worst driver group.
Older people generally have much slower reaction times. Many of them have worse eyesight, too.
Young people, who just got their license, have much less experience than other drivers.
People not from the area, who do not know the roads as well as the locals.
When setting the speed limit, they have to account for all of those. I'm going to sound like an arrogant asshole here, but.. My reaction time is generally better than the average person's, and my ability to see patterns (in this case, traffic patterns) also seems to be a lot better. I also know the road I drive to work on every day extremely well. And when I drive, I stay focused. I don't push people driving slower than me, but keep my distance. I also watch, for example, when new cars enter the highway, even if it's not my lane, to see if there's enough space (and if not, I clear space and prepare for an abrupt lane change from cars in the other lane).
The fact is, not everyone is equal behind the wheel. The law does not (and cannot) account for that. It goes for safe defaults. That does not mean that anyone who drives over those limits is driving faster than what's safe. I drive under the speed limit if that's what I think is safe (hello, snow and ice), and above the limit if I feel that's safe (if the speed limit is, say, 80 km/h, I will probably vary from 7 to 100 km/h depending on the conditions, like my sight range: sharp turn = no sight beyond that = regulate speed to be able to stop in time if there suddenly is, e.g., a standstill around the bend).
What's paradoxical is that the slower I drive, the less .. what shall we say? data to process? .. I get, and the harder I find it to keep myself focused when driving.
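The "be able to stop within what you can see" part can be put into numbers with the usual textbook model: distance covered during the reaction time, plus braking distance at constant deceleration. The sketch below is only that model. The 1.0 s reaction time and 7 m/s^2 deceleration are assumptions for a dry road, not measurements, and the function names are made up.

```python
import math


def stopping_distance_m(speed_kmh, reaction_s=1.0, decel_ms2=7.0):
    """Reaction distance plus braking distance, in meters."""
    v = speed_kmh / 3.6  # km/h to m/s
    return v * reaction_s + v * v / (2.0 * decel_ms2)


def max_speed_kmh(sight_m, reaction_s=1.0, decel_ms2=7.0):
    """Highest speed that still lets you stop within the visible distance.

    Solves v*t + v^2/(2a) = d for v (positive root of the quadratic).
    """
    a, t = decel_ms2, reaction_s
    v = -a * t + math.sqrt(a * a * t * t + 2.0 * a * sight_m)
    return v * 3.6
```

Under those assumptions, 80 km/h needs about 57 m to stop and 100 km/h about 83 m, so the same bend that is fine at one speed is not at the other. On snow and ice the deceleration drops a lot, which is exactly why the safe speed can end up well under the limit.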
Are you streaming music? What type of streaming do you use? Icecast? Have you asked if some of your users would like to host some relays? There are also certain sites out there that offer relaying in exchange for some branding (or in some cases, ads on the relay page).
And what kind of hardware do you need? Software? Are you looking for a full server, or just a streaming relay?
Have you looked at VPSes? Some offer pretty good deals on bandwidth (although you should contact them first and check if it's okay to actually use the deals.. ). One example: http://www.alvotech.de/vserver/ - when it comes to bandwidth they say:
The available bandwidth per vServer is 1,000Mbps. Once traffic has reached 1,000 GB the bandwidth is limited to 10Mbps until the end of the month. Upon request, the traffic limit can be replaced with a fee of 6.90 EUR per 1,000 GB additional traffic.
And yes, a VPS is perfectly fine for serving net radio to a few hundred users, if you've got some external relays for the bandwidth hogging.
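To see why the relays matter, here is a rough back-of-the-envelope sketch against the limits quoted above. The 128 kbit/s stream bitrate and the listener counts are assumptions for illustration, protocol overhead is ignored, and the function names are made up.

```python
def required_mbps(listeners, bitrate_kbps):
    """Outgoing bandwidth in Mbit/s for a given number of simultaneous listeners."""
    return listeners * bitrate_kbps / 1000.0


def hours_until_cap(listeners, bitrate_kbps, cap_gb=1000):
    """Hours of continuous streaming before the monthly traffic cap is reached."""
    cap_megabits = cap_gb * 8000.0  # 1 GB = 8000 megabits (decimal units)
    return cap_megabits / required_mbps(listeners, bitrate_kbps) / 3600.0


def max_listeners(link_mbps, bitrate_kbps):
    """How many listeners fit on a link of the given speed."""
    return int(link_mbps * 1000 // bitrate_kbps)
```

With 300 listeners at 128 kbit/s you need about 38 Mbit/s, which the 1,000 Mbps link handles easily, but the 1,000 GB cap is gone after roughly 58 hours of that. Once throttled to 10 Mbps, only about 78 listeners fit, so anything beyond that has to go through relays (or the extra traffic fee).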
Or get the LP version of the album :)
What would that solve, exactly? A copied signature is still a valid one. You'll just have two notes someplace in the world that both have the same valid signature.
I just have an interest in marketing because of my smugness and contempt of human intelect.
Please say you did that on purpose :D
I've just gotten my NoSQL feet wet by playing around for a weekend with python + mongodb. I am pretty used to SQL, and generally had the same thinking you (and most other SQL people) had.
But many people liked it, so I figured I should at least have a look at it. I made a small webapp for tracking my movies, with lookups against imdb and with users. I was surprised to see that most of the problems I anticipated weren't problems at all, and things mostly just worked naturally. For a quick get-started intro to python + mongodb: Part 1 and Part 2. If you've got the spare time and some interest, poking around with it is a great little weekend project.
Anyway, back to your question. MongoDB stores data in a format very similar to JSON (technically BSON, a JSON superset), if you're familiar with that. Unordered key->value and ordered lists. The python driver translates the data to and from native python dict/list structures. I started with three fields: filename, added and imdb. The imdb field was more or less the raw data from imdb (json format, decoded to python native and encoded to mongodb's BSON format again).
Later on I added an option for users to mark movies as favorites and seen (by adding two new fields to the movie entries, "seenby" and "favoriteof" - both lists - which were added to a movie entry the first time someone marked it as seen or favorite). To add a new user I just did movie["seenby"].append(user_id) and movies.save(movie).
When I wanted to query the db, I created a data structure of what I wanted, and sent that to the server. The server would then return all documents that matched that example structure. So, to find the entry for file "/bla/test.mp4" I would do movies.find( {'filepath': '/bla/test.mp4'} ).
For finding by imdb Title value: {'imdb.Title': '300'}. For finding all favorites of a user: {"favoriteof": user_id} (yes, it handles the list of users as you'd expect, and finds every movie whose "favoriteof" list has that user in it. It also of course skips all entries without that field).
mongodb also supports some special keywords for searching. Let's say I have a list of 3 users, and want all movies that any of them have favorited. {"favoriteof" : {"$in": users} } would fix that - for movies that all of them have as favorite, {"favoriteof" : {"$all": users} }. Sorting was done using sort( field_n_direction_list ).
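If the "query by example" idea is hard to picture, here is a toy matcher in plain python that mimics the behaviour described above. To be clear, this is NOT pymongo and not how MongoDB is implemented. It is a simplified sketch with made-up function names and sample data, covering only exact matches, dotted keys, list membership, $in and $all, so you can play with the semantics without a server.

```python
_MISSING = object()


def get_path(doc, dotted_key):
    """Follow a dotted key like 'imdb.Title' into nested dicts."""
    current = doc
    for part in dotted_key.split("."):
        if not isinstance(current, dict) or part not in current:
            return _MISSING
        current = current[part]
    return current


def matches(doc, query):
    """True if the document satisfies every key in the example query."""
    for key, wanted in query.items():
        value = get_path(doc, key)
        if value is _MISSING:
            return False  # entries without the field are skipped
        values = value if isinstance(value, list) else [value]
        if isinstance(wanted, dict) and "$in" in wanted:
            if not any(v in wanted["$in"] for v in values):
                return False
        elif isinstance(wanted, dict) and "$all" in wanted:
            if not all(w in values for w in wanted["$all"]):
                return False
        elif wanted != value and wanted not in values:
            return False
    return True


def find(collection, query):
    """Return all documents in a list that match the example query."""
    return [doc for doc in collection if matches(doc, query)]


# Made-up sample data shaped like the movie entries described above.
movies = [
    {"filepath": "/bla/test.mp4", "imdb": {"Title": "300"}, "favoriteof": [1, 2]},
    {"filepath": "/bla/other.mp4", "imdb": {"Title": "Alien"}, "favoriteof": [2, 3]},
    {"filepath": "/bla/unseen.mp4", "imdb": {"Title": "Brazil"}},
]
```

For example, find(movies, {"favoriteof": {"$in": [1, 3]}}) returns the first two entries, while {"favoriteof": {"$all": [1, 2]}} returns only the first. Also worth knowing: save() has been removed from newer pymongo versions, where the append-and-save step would be an update_one with $addToSet instead.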
You have a full list of modifiers here. And all of them can of course be combined to quickly and easily create powerful queries. And you of course have options for indexes. You might notice that you do lose something compared to normal SQL here: if you wanted both movie and user info, you'd have to make two queries (well, from what I've understood), so highly relational data is not a good fit for this. Also, you don't have the type constraints any more.
In the app I also wanted to list all movie genres (I did some preprocessing of the imdb data, splitting the comma-separated genres string into a list of genres) and the number of times each genre was used. This led me to mapreduce, which was the thing I both anticipated most and feared most. Well, I kinda chickened out, since the pymongo doc had an excellent example which did exactly what I wanted, but I did get a look at it at least :) And it was fast enough not to make a noticeable dent in load time for a few hundred movie entries.
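The genre count itself is easy to picture in plain python. The sketch below is not MongoDB's mapreduce and not the pymongo doc example, just the same two steps (emit a (genre, 1) pair per genre, then sum per genre) on made-up data, with a "genres" field name that I'm assuming for illustration.

```python
from collections import Counter


def split_genres(raw):
    """Preprocessing step: 'Action, Drama' -> ['Action', 'Drama']."""
    return [g.strip() for g in raw.split(",") if g.strip()]


def map_phase(movies):
    """Map: emit one (genre, 1) pair for every genre on every movie."""
    for movie in movies:
        for genre in movie.get("genres", []):
            yield genre, 1


def reduce_phase(pairs):
    """Reduce: sum the emitted counts per genre."""
    totals = Counter()
    for genre, count in pairs:
        totals[genre] += count
    return totals


# Made-up sample entries.
sample = [
    {"genres": split_genres("Action, Drama")},
    {"genres": split_genres("Drama")},
    {"filename": "no-genres-yet.mp4"},
]
genre_counts = reduce_phase(map_phase(sample))
```

Here genre_counts ends up with Drama counted twice and Action once, and the entry without genres is simply skipped.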
*Cough* well, that was a long post.. I hope it helped you at least a bit in answering your question, and maybe inspires you to take a closer look at it when you get some spare time. I've only used it over a weekend, so I've probably just scratched the surface, and I have probably missed some neat features or horrible gotchas here and there.
That's what I get for not reading the release notes.. But still, the new event mod seems to be a bit limited.
The event Multi-Processing Module (MPM) is designed to allow more requests to be served simultaneously by passing off some processing work to supporting threads, freeing up the main threads to work on new requests. It is based on the worker MPM, which implements a hybrid multi-process multi-threaded server.
This MPM tries to fix the 'keep alive problem' in HTTP. After a client completes the first request, the client can keep the connection open, and send further requests using the same socket. This can save significant overhead in creating TCP connections. However, Apache HTTP Server traditionally keeps an entire child process/thread waiting for data from the client, which brings its own disadvantages. To solve this problem, this MPM uses a dedicated thread to handle both the Listening sockets, all sockets that are in a Keep Alive state, and sockets where the handler and protocol filters have done their work and the only remaining thing to do is send the data to the client. The status page of mod_status shows how many connections are in the mentioned states.
The improved connection handling does not yet work for certain connection filters, in particular SSL. For SSL connections, this MPM will fall back to the behaviour of the worker MPM and reserve one worker thread per connection.
So it looks like it's still got some way to go before being on the same level as, for example, nginx. It seems like a wrapper around the worker MPM that can keep track of idle connections. However, nginx and cherokee, which are the ones I have most experience with, can also keep connections idle while waiting for new data from backend / storage. It seems like the event module still needs a thread for that. Will be interesting to see practical comparisons :)
For me it still sounds pretty weird.. :)
But yeah, RAM will help a lot, both for caching files, and especially if you use an opcode cache like APC (which can also keep a hot cache in RAM).
The more RAM, the less it has to wait for the horribly slow disk to spin around, and thus the faster the answer. Great Win (TM) :)
I'm still waiting for reasonably priced SSDs to become normal in servers. *sigh* Being able to use an SSD for caching hot data automatically, without killing it instantly.. Sure, RAM is faster and cheap, but an SSD gives you quite a bit more space for the price, and is still vastly superior to HDD. RAM as first cache, SSD as 2nd cache, disk for the stuff that no one uses.
And regarding the game you mentioned.. My experience is that in the vast majority of cases, the speed problem isn't the language, but stupid code.. Write smarter, not harder! :D I don't know what he did wrong, but I know that he did something (or many things) very wrong. 12 cores? For 400 users?
Reminds me of one of my sites, which uses long polling. The first implementation (which I knew was bad, but it was easy and worked for small numbers of users) was just apache -> mod_python -> django -> polling the events table every second to see if the latest id had changed. And abort the routine after 60 seconds and reconnect.
As I said, it worked... for small numbers of users. When it started to hit the limit, it hit it in spectacular fashion. As requests got delayed and errors started cropping up, connect queues piled up and clients reconnected and reconnected... It was like a small snowflake starting to roll downhill, and suddenly a house-sized snowball hit our server.
The current rewrite works fine for up to 1000 concurrent requests (tested on my dev server, which is weaker than prod), and worked (albeit with noticeable delays) up to 1500 connections. Current peak is around 500 connections. If we hit the new limit I'll need to rewrite it again. I have some ideas, but it would require more complexity overall, so for now the current one works great.
Err, anyway, my point with that rambling was.. If you do something very stupid, then you need gigantic amounts of hardware, and can probably do things smarter. Of course.. you have to look at the situation :)
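For the curious, the first (bad) long-polling approach boils down to something like the sketch below. This is a reconstruction of the idea only, not the actual site code: the function and parameter names are made up, and the database lookup is passed in as a plain callable so the loop can be run without django or a database.

```python
import time


def naive_long_poll(get_latest_id, last_seen_id, timeout_s=60, interval_s=1.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Hold the request open, checking for a new event id once per interval.

    Returns the new id as soon as it differs from last_seen_id, or None when
    the timeout passes and the client is expected to reconnect.
    """
    deadline = clock() + timeout_s
    while clock() < deadline:
        latest = get_latest_id()  # one database query per check, per client
        if latest != last_seen_id:
            return latest
        sleep(interval_s)
    return None
```

The catch is visible right in the loop: every connected client ties up a worker for up to 60 seconds and fires one query per second, so 500 waiting clients mean roughly 500 queries per second just to find out that nothing happened. Once things slow down, the reconnects pile on top of that.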
Sounds like a fun and interesting system, and a fair bit beyond what I'm usually working with!
Yeah, shared hosting might explain some of the reasoning behind it :) Running different php processes under different user ids. Still.. "some virtual hosts are running over 30 processes sometimes" - that doesn't really address the main question. With only two logical execution units (two cores) and (rounding off) 200 php processes, 2 cpus over 200 processes = 0.01 cpu slice per process? Not counting httpd processing.. For that not to be CPU bound, wouldn't that mean that each php process on average needs to spend around 90% or more of its time IO bound?
Please don't get me wrong here, I'm not trying to point out wrong things. I'm just fascinated by large-scale web systems, and honestly curious about the different challenges between the level I'm used to (I have yet to have any need to move something to more than 3 servers, and rather small servers too. Most stuff I poke a finger into works fine on one machine with some consideration. Oh, and I'm usually working on 1-2 web apps per system) and large-scale systems.
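The slice arithmetic above, as a quick sketch. It assumes all processes are alive and competing at the same time and ignores everything else on the box (httpd, the kernel), so treat it as a rough bound rather than a measurement. The function names are made up.

```python
def cpu_share_per_process(cores, processes):
    """Average fraction of one core available to each process."""
    return cores / processes


def min_wait_fraction(cores, processes):
    """Minimum average fraction of time each process must spend off the CPU."""
    return max(0.0, 1.0 - cpu_share_per_process(cores, processes))
```

For 2 cores and 200 processes that gives a 0.01 share, meaning each process has to be waiting on something about 99% of the time on average. So "90% or more" is if anything a generous way of putting it.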
Wait.. You're running 190 php processes on two cores? Are you serving static files with php, or using it to query a db on a different machine? And if so, is your DB so slow that you need 190 concurrent requests to get it to max out? Data that can not be cached with memcache, or pages that can't be cached with varnish?
Please, I'm honestly curious what all those php processes are doing, which involves sitting idle 90% of the time. Could you enlighten me?