Twitter Not Rocket Science, but Still a Work in Progress
While it may not be rocket science, the Twitter team has been making a concerted effort to effect better communication with their community at large. Recently they were set upon by a barrage of technical and related questions, and the resulting answers are actually somewhat interesting. "Before we share our answers, it's important to note one very big piece of information: We are currently taking a new approach to the way Twitter functions technically with the help of a recently enhanced staff of amazing systems engineers formerly of Google, IBM, and other high-profile technology companies added to our core team. Our answers below refer to how Twitter has worked historically--we know it is not correct and we're changing that."
Man, who *IS* this guy that his trolling has gotten its own front page article?
This is one of those cases where "effect" is a verb.
Will "a rather pointless waste of time" fit on a postcard? I guess I'm not 'hip' enough to care about Twitter, it all seems a bit pointless for anyone who's not some interweb celebrity.
It doesn't mean much now, it's built for the future.
I wonder what they mean by "elegant filesystem-based approach"? Maybe they're going to treat tweets like email and store it all in the filesystem rather than a database? There are certainly some proven, extremely high-volume email servers, so you know that method scales.
I wonder what the disadvantages are of setting up a front end to an email system and converting incoming tweets to actual emails. On the retrieval side you just read the mailbox, convert back to the tweet format, and send them on to the destination.
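For what it's worth, here is a rough sketch of that idea using Python's standard mailbox module. Everything about it is an assumption for illustration: the function names store_tweet and read_tweets, the Maildir layout, and the example.com addresses are all made up, and nothing here reflects how Twitter actually stores anything.

```python
import mailbox
import os
import tempfile
from email.message import EmailMessage


def store_tweet(box, author, text):
    """Wrap a tweet in an email message and drop it in the mailbox."""
    msg = EmailMessage()
    msg["From"] = author          # hypothetical: one address per user
    msg["Subject"] = "tweet"
    msg.set_content(text)
    return box.add(msg)           # returns the key of the stored message


def read_tweets(box):
    """Read every message back and convert it to (author, text) pairs."""
    return sorted((m["From"], m.get_payload().strip()) for m in box)


with tempfile.TemporaryDirectory() as root:
    # One Maildir on disk; each tweet becomes one file inside it.
    box = mailbox.Maildir(os.path.join(root, "tweets"), create=True)
    store_tweet(box, "alice@example.com", "hello world")
    store_tweet(box, "bob@example.com", "is this thing on")
    for author, text in read_tweets(box):
        print(author, text)
```

One obvious disadvantage shows up right away: reading a timeline means opening and parsing one file per tweet, so the retrieval side does more work than a single indexed database query would.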
I came to the datacenter drunk with a fake ID, don't you want to be just like me?
You know, I wonder how hard it would be to do a Twitter clone on Google's App Engine. It seems like it would be the perfect fit: relatively simple application that needs to be massively scalable.
Grammar can be fun!
...a geek-celeb circlejerk?
This guy's the limit!
Hiring folks who used to work at IBM or Google is not the same thing as "large companies control[ling] how Twitter works." Some day, you'll have a job and you'll understand that. [Sorry to be an asshole about this, but your comment just shouts "teenage kid who's never had a serious job."] People with experience with large-scale applications may already know solutions to some of the problems Twitter is seeing. Those solutions aren't always in the text books; and if they were trivial and obvious, then such applications would be much more common.
...twitter blog is hosted on blogger (Google), and this morning it was out of service.
how long until
It seems to be meant to suggest that the article's use of "affect" is incorrect. Surely this is mistaken. If suggesting that twitter has anything to do with better communication isn't an affectation, I don't know what is.
Like the AC said, I think you're wildly exaggerating how ideological workplaces are, particularly from the point of view of a server monkey.
What I'm listening to now on Pandora...
Plurk has been gaining popularity in the past 24 hours, and it's handling scalability rather well so far (after having been mentioned by Leo Laporte, Robert Scoble, TechCrunch, and others). I'm very curious to see how well it would hold up if it had the same number of users as Twitter, though.
the JoshMeister on Security
but who would quit Google to work for Twitter???
You may find my appearance and demeanor foolish, but it is you who plays the fool.
I'm a twitter-shitter!
- None can love freedom heartily, but good men; the rest love not freedom, but license. -- John Milton
Ruby is not Twitter's problem. The algorithm Twitter uses is the problem.
When I started irately writing this post, I wrote it in a tone that would have gotten me modded into oblivion. But then I realized that ignorance, not idiocy, drives the particular myth I'm debunking. Let me educate, not flame, those of you who haven't formally studied computer science.
It's become fashionable to blame Ruby for Twitter's problems, but that's wrong. The particular choice of language doesn't matter a bit when you talk about scalability, no matter what the language or the problem.
First, sending a twitter message is an algorithm. An algorithm is just a recipe for doing something to some data. Although most computer science literature deals with more abstract and general algorithms, like those for sorting and searching, the same principles apply to even the most mundane processes, like what rm foo does to a file system, or how a database engine runs an INSERT.
One way we can talk about algorithms is to use something called Big-O Notation, which describes the relationship between how much stuff an algorithm processes and how long it takes to run.*
It's easier to see things with examples. Say we have an algorithm and we give it three sets of data, D1, D2, and D3, each twice as large as the last, so that D2 is twice as large as D1, and D3 is four times as large as D1.
If we call the algorithm O(1), it will take the same amount of time to process D1, D2, and D3. If we instead say it's O(N), D2 will take twice as long to run as D1, and D3 will take four times as long.
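If you'd rather see that than take my word for it, here's a minimal sketch in Python that counts steps instead of measuring time. The two functions and the data sizes are made up for illustration; D1, D2, and D3 are just lists of 1000, 2000, and 4000 items.

```python
def constant_steps(data):
    """O(1): touch only the first item, no matter how much data there is."""
    steps = 0
    if data:
        _ = data[0]
        steps += 1
    return steps


def linear_steps(data):
    """O(N): touch every item exactly once."""
    steps = 0
    for _ in data:
        steps += 1
    return steps


d1 = list(range(1000))
d2 = list(range(2000))   # twice as large as D1
d3 = list(range(4000))   # four times as large as D1

for name, data in (("D1", d1), ("D2", d2), ("D3", d3)):
    print(name, constant_steps(data), linear_steps(data))
```

The constant column stays at 1 for all three sets, while the linear column doubles each time the data doubles.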
If N represents the number of users for a web application, and we want to double N, twice as many users, we'd need twice as many web servers if the bottleneck algorithms are O(N). If the database is the bottleneck, we'd need a twice-beefier database server, or some partitioning.
Things start to get interesting with O(N^2). In that case, D2 takes four times longer to run than D1, and D3 takes four times longer than D2, which is sixteen times longer than D1.
That means that if we want to support twice as many users, we need four times as many web servers, or more likely, a four-times beefier database server.
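Here's a small Python sketch of that quadratic growth, again counting steps rather than timing anything. The "compare every user with every other user" loop is a hypothetical stand-in for whatever the real bottleneck is, and the user counts are arbitrary.

```python
def quadratic_steps(n_users):
    """O(N^2): do one unit of work for every ordered pair of users."""
    steps = 0
    for _ in range(n_users):
        for _ in range(n_users):
            steps += 1
    return steps


base = quadratic_steps(100)
for n_users in (100, 200, 400):
    steps = quadratic_steps(n_users)
    # The ratio is also how many times more hardware you'd need
    # to finish in the same amount of time.
    print(n_users, steps, steps // base)
```

Doubling the users from 100 to 200 gives a ratio of 4, and doubling again to 400 gives 16.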
It can get a lot worse than O(N^2) too, especially if you're not paying attention to complexity. For example, many graph (think social networking) algorithms can easily become O(2^N), which is a lot worse than N to a constant power.
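To get a feel for O(2^N), here's a Python sketch of one way a graph algorithm can end up there: looking at every possible subset of a group of users. This is an illustrative assumption, not a claim about anything Twitter actually does.

```python
from itertools import combinations


def count_subsets(n_users):
    """O(2^N): visit every possible subset of n_users users."""
    users = range(n_users)
    visited = 0
    for size in range(n_users + 1):
        for _ in combinations(users, size):
            visited += 1
    return visited


for n_users in (4, 8, 16):
    print(n_users, count_subsets(n_users))
```

Every single user you add doubles the work, so going from 16 users to 32 multiplies it by about 65,000 rather than by 4.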
When you try to scale a poorly-designed algorithm (pretty much anything worse than O(N)), you start running out of cores, rack space, electricity, and atoms in the universe.
One useful bit about big-O notation is that it lets us ignore piddly details that don't matter. Say we had an O(2N) version of the O(N) algorithm. Sure, the O(2N) algorithm might take twice as long to run, but it can still handle double the data with double the capacity or double the time. Even if it's O(10N), you don't start boiling the oceans to cool your data center when you want to increase your visit capacity a thousandfold.
This observation is why the choice of language doesn't matter. If a language implementation is slow, all it does is add a constant factor to any algorithms written in that language. A Python application might be ten times slower than one written in C, but its big-O complexity will be the same.
At the worst, that means you'll need ten times as many servers as with the C web application. The increase in development efficiency writing in Python (or Ruby on Rails, or Lisp, or anything else) might make the trade-off worth it. You can deal with a constant factor slowdown.
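As a back-of-the-envelope sketch in Python: assume an O(N) workload, a made-up per-server capacity, and a language that is exactly ten times slower. All three numbers are hypothetical.

```python
import math

CAPACITY = 1_000_000  # hypothetical: steps one server can handle per second


def servers_needed(users, constant_factor):
    """Servers required for an O(N) workload with a given constant factor."""
    return math.ceil(users * constant_factor / CAPACITY)


for users in (1_000_000, 2_000_000, 4_000_000):
    fast = servers_needed(users, 1)    # stand-in for the C version
    slow = servers_needed(users, 10)   # stand-in for a 10x slower language
    print(users, fast, slow, slow // fast)
```

The slow language always costs ten times the servers, but both columns double when the users double. The gap never gets worse as you grow.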
If, on the other hand, you code a wicked fast implementation of an O(N^3) algorithm in C, no amount of hardware will save you. You'll hit a number of users beyond which your servers slow to a crawl and you lose blagosphereic karma. Even if you double your capacity, or buy a four-times-beefier database server, that
Oh, ideology affects cage monkeys. The use of open source versus closed source, purchasing enough licenses for the software you use, making '99.999%' uptime actually mean that instead of simply hiding downtime, and forcing people to spend more time documenting how much time they spent on a task than actually doing the task are all policies I've seen affect server work. Those may not be ideologies per se, but they certainly arise from philosophies about how things should be done.
Psst, there's actually no "Twitter team." It's just one guy with like ten accounts.
DRM: Terminator crops for your mind!