Facebook Releases Open Source Web Server
Dan Jones writes "Ah the irony. The week Facebook is being asked to cough up source code to satisfy an alleged patent infringement, the company releases an open source Web server. The Web server framework that Facebook will offer as open source is called Tornado, was written in the Python language and is designed for quickly processing thousands of simultaneous connections. Tornado is a core piece of infrastructure that powers FriendFeed's real-time functionality, which Facebook maintains. While Tornado is similar to existing Web-frameworks in Python, it focuses on speed and handling large amounts of simultaneous traffic."
Facebook was built with PHP?
I don't think it means what you think it means.
They are not the same thing (as the article makes clear).
This sounds interesting and I will definitely take a look, but I doubt I'll be ditching Pylons any time soon.
I posted this news first, but it seems somebody else got it on the front page :/
http://slashdot.org/firehose.pl?op=view&id=5864125
Here, have a cookie...
"That's not IRONY, chumps & chumpettes, it's just coincidental".
Relax... There's life outside Slashdot.
Tornado includes both a Web server and a Web framework. The framework can take advantage of the (non-blocking) server architecture to achieve high performance. Apparently you can also run it under mod_wsgi, but I can't really see an advantage of using it in that scenario when compared to other Python frameworks.
I wonder if the Tornado authors set out to re-implement Twisted Python just for kicks or because they didn't know it existed.
Twisted supports epoll, kqueue, Win32 IOCP, select, etc.
Lone Gunmen crew.
Submitted by Dan Jones on Thursday September 10, @03:42PM
Submitted by pharazon on Thursday September 10, @11:51PM
Sorry he beat you. It's just been posted now.
Twisted is hard to learn. It's the sort of thing that programmers will re-implement just to avoid reading the documentation.
Or maybe they wanted to have control. Whatever the case, they would have known. Everybody (who uses Python for web work) would know a bit about Twisted ... it's on the front page of python.org
It's just coincidental!
That's pretty bold to claim your framework has better performance than another one that's not publicly available.
Twisted is a networking library; this seems to be a webserver (which Twisted can do) as well as a framework.
If this webserver is supposed to be fast, then just how fast is it? Is it faster than lighttpd? YAWS? I'd like to know.
Most Python web servers use threading or multiple processes to handle concurrent requests and are not implemented as event driven systems. Most Python web applications are not designed to be implemented on event driven systems but rely on the ability to block during handling of web requests, something which the former allows but which doesn't work well with event driven systems as it blocks the main event loop and prevents anything else happening. So, it is not similar to other Python web servers or frameworks.
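A toy illustration of the point above, not code from Tornado or any real server: request handlers are modelled as generators and a single loop drives them round-robin. The names `run_loop`, `cooperative_handler`, `blocking_handler` and `demo` are made up for this sketch. The "blocking" handler does all its work before giving control back, so the other handler gets no turn until it is done.

```python
from collections import deque


def cooperative_handler(name, steps, log):
    """Does one unit of work, then yields so the loop can serve others."""
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield


def blocking_handler(name, steps, log):
    """Does all of its work in one go, the way a blocking call would."""
    for i in range(steps):
        log.append(f"{name}:{i}")
    yield


def run_loop(handlers):
    """Single-threaded round-robin loop over generator-based handlers."""
    queue = deque(handlers)
    while queue:
        handler = queue.popleft()
        try:
            next(handler)
        except StopIteration:
            continue  # handler finished, drop it
        queue.append(handler)  # not finished, schedule another turn


def demo(first_handler):
    """Run handler "a" (of the given kind) alongside a cooperative "b"."""
    log = []
    run_loop([first_handler("a", 2, log), cooperative_handler("b", 2, log)])
    return log
```

With two cooperative handlers the work interleaves (`a:0, b:0, a:1, b:1`); with a blocking "a", handler "b" only starts once "a" has finished (`a:0, a:1, b:0, b:1`). In a real event-driven server the stall would be wall-clock time spent in a blocking call rather than loop iterations.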
It should be further highlighted that WSGI for Python is effectively designed for that blocking model, and it isn't really a good idea to use it with a server that is built on the event driven model and also uses multiple processes. Attempting to do so can have undesirable effects such as those described in 'http://blog.dscpl.com.au/2009/05/blocking-requests-and-nginx-version-of.html'. Some seem to hope that WSGI 2.0 will support asynchronous systems, but the reality is that it almost definitely will not, so they should stop dreaming.
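For readers who haven't looked at WSGI itself, here is a minimal sketch of what the interface looks like, using only the standard library. The application is a plain synchronous callable: the server calls it once per request and gets the response back when the call returns, so anything slow inside it holds up whoever made the call. `call_app` is a hypothetical helper written for this example, not part of any server.

```python
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    """Minimal WSGI app: called once per request, returns the body when done."""
    body = b"Hello from a plain WSGI app\n"
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]


def call_app(app):
    """Invoke the app roughly the way a server would, without opening a socket."""
    environ = {}
    setup_testing_defaults(environ)  # fills in a fake GET / request
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    body = b"".join(app(environ, start_response))
    return captured["status"], body
```

Nothing in that call signature lets the application say "I'm waiting on I/O, go serve someone else", which is the mismatch with event driven servers described above.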
So, although these sorts of high performance servers are interesting, their applicability to most existing Python web applications is limited, because in practice the web application has to be designed around the event driven model and you can't really use the standardised Python WSGI interface or the components that build on it.
This doesn't mean that these types of servers aren't useful; they just aren't going to solve everyone's problems and will principally remain a niche solution for things that need to maintain many long lived connections.
As to the benchmarks they give, it is very much just a pissing competition and nothing more. The bulk of web sites would never even handle enough hits to trouble the limits of the other hosting solutions they compare to. For larger sites, they are never going to use a single machine anyway, but use a cluster of machines to spread load and for redundancy. Yes, it may provide more head room for individual machines, but again we aren't talking about a situation which the majority would even have to deal with.
I don't know, Java, C++ and python all run at fine speeds if you write proper code for the language. C++ is probably the fastest in most cases, but Java is going to be a real close second written properly and on the right VM. While I don't like python myself, there's a reason it gets used in games: it can perform well enough to be used extensively if you can deal with compile time, which wouldn't really matter for a long-running process like a web server.
Perl isn't HORRIBLE; again, startup time is its biggest problem. PHP has issues, but with Zend, precompiling and caching, it again works better than most expect.
I know nothing at all of Erlang so I won't speak to it.
MySQL is known for being fast as hell under the right workload, just gotta use it the right way.
Mix in some memcached and you can serve a lot of hits.
Considering the number of extremely high traffic websites that use a mix of software about like this one, I think you'd have to be pretty stupid to put the blame on the software that's used.
Do you run a server farm that gets more traffic than Wikipedia, Yahoo or MySpace? I'll talk some shit about languages and say that everything should be written in C at the highest, by proper programmers so we don't end up with OSes that need gigs of ram to boot ... but ...
While possible, even I'm not arrogant enough to call them stupid.
I don't find anything about Wikipedia's setup 'impressive', but it's certainly done properly. Their mix of PHP, Python and MySQL is all used exactly as it should be and serves a massive number of people on a relatively low amount of processing power.
But again ... stupid? No, they are hardly stupid.
Heretic! Burn him!
It's a web app framework. Please click through the links.
Actually Python is pretty slow, about 50 times slower than C++, but that's usually ok since you can put the bottleneck into a C++ module. However, if all the server software is in Python, things will be significantly slower.
Perl is actually horrible: it is the slowest language in the survey I linked, except for Ruby, plus we all know what the code looks like.
As for Erlang, it fares relatively well (though still 15 times slower than C++), but its main competitor would probably be Haskell, which also is faster.
Victims of 9/11: <3000. Traffic in the US: >30,000/y
Not really. It just means the onus is on the other framework to prove them wrong. Think of it as throwing down the gauntlet.
This comparison is fairer (IMO), as it shows that a well-written program can achieve the same performance in many languages. P.S. Your own graph shows Perl as having the potential to be better than Python if it's written well.
about 50 times slower than C++
At doing what? Once you get to large applications that do more than pure maths/simple tasks, the performance of the language becomes negligible compared to the performance of what you are writing. For something as large and as frequently updated as Facebook, C++ (or any compiled language) is out of the question, and if you've got the hardware to support it you're much better off going with a language that results in less code (see my link) and easier maintenance (even I can read Python code and know what's going on, and I could before I knew Python too!)
I doubt there has been a week without me having some problem with Facebook functionality. Usually it means I can't access a friend's profile, some photos or such for anywhere from half an hour to several hours. (Rather a big bug when that is what Facebook is used for.) I've often heard that someone I know hasn't been able to log in for a whole day. The list goes on and on; similar big problems are rather frequent.
Lesser bugs (such as me getting some notification multiple times, etc.) often occur several times a day.
I understand that having such a large user base must be difficult. It is understandable that some bugs come up every once in a while. Their service works "OK, good enough" most of the time, and since it isn't used for anything crucial or really important, bugs don't matter. However, I would rejoice about this release a lot more if Facebook had a record of providing a high quality, bug free service.
Actually, for tight loops, CPython (the main implementation) is a whopping 200x slower than C.
Reasons why tight loop speed doesn't matter:
- This isn't the kernel. Tight loops don't occur much. If you're polling or spinlocking, stop it and go read up on select, or switch up to a high-performance async library like Twisted. If you're doing number-crunching, use things like comprehensions or multiprocessing.Pool.map to accelerate your math. (Or use both; the former gets a speed boost in implementation, while the latter is concurrent across multiple processors.)
- Programs are usually not CPU-bound. Profilers tell all, really. Games are usually GPU-bound, unless they're written without a separate sound thread, in which case they get I/O-bound. Webservers are usually I/O-bound, and spend most of their time in select/epoll/etc. waiting for connections.
- Implementations can and will get fast, eventually. Unladen Swallow is one thing being talked about, but PyPy is also worth mentioning. The former is a bunch of CPython improvements, the latter is a JIT Python interpreter that matches C code for tight loop speed.
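As a small sketch of the number-crunching suggestion in the first bullet: the same computation written as an explicit loop, as a list comprehension, and farmed out with `multiprocessing.Pool.map`. The function names here are made up for the example, and how much either variant actually helps depends on the workload, so measure before assuming.

```python
from multiprocessing import Pool


def square(n):
    """Stand-in for some per-item number crunching."""
    return n * n


def squares_loop(numbers):
    """Explicit loop: one append call per item."""
    result = []
    for n in numbers:
        result.append(square(n))
    return result


def squares_comprehension(numbers):
    """Same result as squares_loop, built by a list comprehension."""
    return [square(n) for n in numbers]


def squares_parallel(numbers, processes=2):
    """Same result again, with the items spread over worker processes."""
    with Pool(processes=processes) as pool:
        return pool.map(square, numbers)
```

When trying `squares_parallel`, call it from a script under an `if __name__ == "__main__":` guard, since worker processes may need to re-import the module that defines `square`. For work as cheap as squaring an integer, the cost of shipping items between processes will likely outweigh any gain; it pays off when each item is expensive.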
I know this is not a popular idea with a lot of people, particularly those working in places where "OMG speed is critical," but Python's execution speed just doesn't matter compared to its readability and time/LOCs required to get up off the ground and running.
~ C.
As we say in #python, "Programming is hard!"
Learning Twisted is so much easier than rolling your own networking mini-library. Sure, a lot of people are kicking and screaming in the beginning, but once they actually sit down and start coding, they usually say something like, "Oh, hey, this is nice."
There's a reason it's popular.
~ C.
MySQL is known for being fast as hell under the right workload, just gotta use it the right way.
Sure it's fast, when you don't turn on data validation.
If someone is passing you on the right, you are an asshole for driving in the wrong lane.
They explicitly state that they looked at Twisted and chose to write something more user-friendly. Having looked at Twisted (3-4 years ago though) and at Tornado's samples and benchmarks I think they succeeded. Twisted seems to be going the way of Zope: an interesting platform that did everything its own way and shut itself out from the rest of the Python universe, eventually losing relevancy.
I think a Tornado/Django mashup (Tornado infrastructure, Django front-end/application bootstrapping) would be realllly interesting....
The Twisted folks have been working on web frameworks for years (Nevow/Athena comes to mind). One problem with Twisted is that the core devs don't focus much on marketing (à la RoR), so not many people know about it. These guys had a good Comet implementation before the phrase was coined.
I use CPython for performance dependent stuff and have found the loops themselves, even not doing anything, are surprisingly slow. Do you have a reference for your "200x slower than C" claim? I'd be interested to see if it tallied with my experiences.
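This doesn't supply a reference for the 200x figure, but the Python side of such a comparison is easy to measure yourself. A minimal sketch with the standard `timeit` module; `empty_loop` and `seconds_per_iteration` are names invented for this example, and the result will vary by machine and interpreter version.

```python
import timeit


def empty_loop(n):
    """A loop that does nothing, to isolate per-iteration overhead."""
    for _ in range(n):
        pass


def seconds_per_iteration(n=1_000_000, repeat=5):
    """Best-of-`repeat` wall-clock time for one loop iteration, in seconds."""
    best = min(timeit.repeat(lambda: empty_loop(n), number=1, repeat=repeat))
    return best / n
```

To get a ratio against C you would time an equivalent loop there too, taking care that the compiler doesn't optimise an empty loop away entirely, which would make the comparison meaningless.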
Bret Taylor says:
When we started, we did use Twisted. In practice, I found Twisted tedious. The deferred abstraction works, but I didn't love it in practice. Likewise, the HTTP/web support in Twisted is very chaotic (see http://twistedmatrix.com/trac/wiki/WebDevelopment ... - even they acknowledge this). In general, it seems like Twisted is full of demo-quality stuff, but most of the protocols have tons of bugs.
Given all those factors, it didn't seem to provide a lot of value. Our core I/O loop is actually pretty small and simple, and I think resulted in fewer bugs than would have come up if we had used Twisted.
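To give a feel for how small a core I/O loop can be, here is a sketch of the general pattern using Python's standard `selectors` module, which picks epoll, kqueue or select depending on the platform. This is not Tornado's actual loop; the function names and the upper-casing "protocol" are invented for the example, and it uses a local socket pair instead of a listening socket so it can run anywhere.

```python
import selectors
import socket


def shout_back(conn, selector):
    """Read callback: reply with the upper-cased data, or clean up on EOF."""
    data = conn.recv(4096)
    if data:
        conn.sendall(data.upper())
    else:
        selector.unregister(conn)
        conn.close()


def run_until_idle(selector, timeout=0.1):
    """Dispatch ready callbacks until nothing is registered or nothing is ready."""
    while selector.get_map():
        events = selector.select(timeout)
        if not events:
            break
        for key, _mask in events:
            key.data(key.fileobj, selector)  # key.data holds the callback


def demo():
    selector = selectors.DefaultSelector()
    client, server = socket.socketpair()
    server.setblocking(False)
    selector.register(server, selectors.EVENT_READ, shout_back)

    client.sendall(b"ping")
    run_until_idle(selector)
    reply = client.recv(4096)

    client.close()
    run_until_idle(selector)  # server side sees EOF and unregisters itself
    selector.close()
    return reply
```

A real server would also register the listening socket, accept new connections in a callback, and buffer writes instead of calling `sendall` on a non-blocking socket, but the shape of the loop stays the same: wait for readiness, then call whatever was registered for that socket.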
It is both - a non-blocking webserver and a framework designed to take advantage of that. Please click through the links (Is that British for RTFA?)
Perl is slower at what? Regexes, text processing?... Heh.
Btw, nobody implements B-trees or other stuff like that in pure Perl. They just use the C implementations on CPAN.
And ugly code is all in the eye of the beholder. To me, Java's verbosity is ugly. Python slightly less so. Lua is nice. C and Perl are beautiful.
I don't know, Java, C++ and python all run at fine speeds if you write proper code for the language. C++ is probably the fastest in most cases, but Java is going to be faster written properly and on the right VM.
Fixed that for you.
Actually Python is pretty slow, about 50 times slower than C++, but that's usually ok since you can put the bottleneck into a C++ module.
It's not quite as simple as that, since you also have to consider all the other factors involved (like amount of effort to stabilize the production solution, flexibility of the solution, etc.) Speed is only one - important - aspect.
And if you're in an I/O bound process, it matters not at all; you're going to be waiting for devices to do their stuff anyway...
"Little does he know, but there is no 'I' in 'Idiot'!"
I wonder if the Tornado authors set out to re-implement Twisted Python (http://twistedmatrix.com/trac/) just for kicks or because they didn't know it existed. Twisted supports epoll, kqueue, Win32 IOCP, select, etc.
And what makes you think they didn't know? Are you privy to information that objectively and clearly indicates the authors DID NOT have any valid technical or business reason AT ALL to implement Tornado as opposed to adopting Twisted?
To be honest, I don't know of any evidence, for or against. I have no clue of their reasons (intelligent and/or stupid). As a result I don't assume either. A more constructive and useful question would have been: "I wonder what technical or business reasons (if any) led Facebook to implement Tornado? Did they find a technical problem with Twisted? Did they have a strategic reason not to use it? Did they already have a lot of functionally-related Python code built in-house, making the creation of Tornado a reasonable step? I would like to know so that I can clearly understand this on its own merits."
I dunno, it's the pragmatic engineer in me talking here.
I never made it past the kicking and screaming part :-).
I remember when legal used to mean lawful, now it means some kind of loophole. - Leo Kessler
True, you can. But Facebook is at the forefront, the bleeding edge. They're doing stuff that nobody else is doing yet. So it'd be more like complaining about a surgeon killing a patient during a procedure that no one has ever tried before, like a heart transplant in 1967. And they do successfully serve most of their customers. I'm no Facebook fanboy, I'm just saying they are pushing the limits, and also doing what they can to advance the technology.
Threads suck ass anyway.
I know this is not a popular idea with a lot of people, particularly those working in places where "OMG speed is critical," but Python's execution speed just doesn't matter compared to its readability and time/LOCs required to get up off the ground and running.
I have heard this perspective before but found that when you have a team of developers that share this philosophy, you end up with VERY slow software. When you are writing software used by one person maybe you can focus on readability, but if you're dealing with thousands or millions of users, forget about it. Readability does not have to come at the cost of efficient code either. There are very few occasions where making code fast makes it very confusing. In fact, I would argue that there is a lot of slow code that is very confusing to me. If people take the time to optimize it, it usually is easier to understand because there is no unneeded execution. In the few cases where very fast code is a little hard to understand, you just need to add a few lines of documentation to explain what you're doing.
No Sigs!
If you bothered to follow the link I provided, there is a large Read-the-FAQ link answering your question and more.