Twitter Throttling Hits Third-Party Apps
Barence writes "Twitter's battle to keep the microblogging service from falling over is having a dire effect on third-party Twitter apps. Users of Twitter-related apps such as TweetDeck, Echofon and even Twitter's own mobile software have complained of a lack of updates, after the company imposed strict limits on the number of times third-party apps can access the service. Over the past week, Twitter has reduced the number of API calls from 350 to 175 an hour. At one point last week, that number was temporarily reduced to only 75. A warning on TweetDeck's support page states that users 'should allow TweetDeck to ensure you do not run out of calls, although with such a small API limit, your refresh rates will be very slow.'"
Isn't that an update nearly every 20 seconds? How fast do people need to see that you're currently wiping your butt?
Any information that needs to be distributed more than once per minute probably shouldn't be relying on twitter.
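For anyone checking the 20-second figure, here is a quick sketch of the arithmetic. The limits are the ones quoted in the summary. The `feeds` parameter is an assumption added for illustration: it models a client that splits one hourly budget across several timelines, which the summary does not describe.

```python
def seconds_between_refreshes(calls_per_hour: int, feeds: int = 1) -> float:
    """Average gap between refreshes of one feed when the hourly budget
    is spent evenly across `feeds` feeds (feeds is a hypothetical knob)."""
    return 3600 * feeds / calls_per_hour


for limit in (350, 175, 75):
    gap = seconds_between_refreshes(limit)
    print(f"{limit} calls/hour -> one call every {gap:.1f} s")
# 350 calls/hour -> one call every 10.3 s
# 175 calls/hour -> one call every 20.6 s
# 75 calls/hour -> one call every 48.0 s
```

With a single feed, 175 calls an hour is indeed one call roughly every 20 seconds. A client that refreshes several feeds from the same budget would see proportionally longer gaps per feed.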
Over the past week, Twitter has reduced the number of API calls from 350 to 175 an hour.
Okay, if you're making that many calls to Twitter then there might be an inherent flaw in their RESTful interfaces. I think for a long time, the "web" as we know it has suffered from the lack of the Event/Listener paradigm. This is a pretty simple design concept that I'm going to refer to as the Observer. Let's say I want to know what Stephen Hawking is tweeting about and I want to know 24/7. If I have to make more than one call, something is wrong. That one call should tell Twitter who I am, where it can contact me, and what I want to keep tabs on--be it a keyword or a user. So all I should ever have to do is tell Twitter I want to know everything from Stephen Hawking and everything with #stephenhawking or whatever, and from that point on it will try to deliver those messages to me via any number of technologies. Simple pub/sub message queues could be implemented here to alleviate my need to continually go to Twitter and say: "Has Stephen Hawking said anything new yet? *millisecond pause* Has Stephen Hawking said anything new yet? *millisecond pause* ..." ad infinitum. I'm not claiming to know Twitter's internals, but from a cursory glance the API looks like it's missing this sort of Observer paradigm, which would allow for the scalability they need.
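A minimal sketch of the subscribe-once idea, assuming an in-process hub. The `Hub` class, the topic strings and the callback signature are all hypothetical and are not part of any Twitter API. A real service would push over a network connection instead of calling a local function.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Callback = Callable[[str, str], None]  # receives (topic, message)


class Hub:
    """Hypothetical pub/sub hub: clients register interest once, then get pushed to."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callback]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callback) -> None:
        # The single "call" the client makes: who to notify, and about what.
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, message: str) -> int:
        """Push the message to every subscriber of the topic; return how many were notified."""
        callbacks = self._subscribers.get(topic, [])
        for callback in callbacks:
            callback(topic, message)
        return len(callbacks)


hub = Hub()
inbox = []
hub.subscribe("user:stephenhawking", lambda topic, msg: inbox.append((topic, msg)))
hub.subscribe("tag:#stephenhawking", lambda topic, msg: inbox.append((topic, msg)))

hub.publish("user:stephenhawking", "example tweet text")  # delivered to the subscriber
hub.publish("user:someoneelse", "not interesting")        # nobody subscribed, nothing sent
print(inbox)
```

The subscriber never asks "anything new yet?". Work happens only when something is published, and only for topics that someone has registered for.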
...
I'm not pointing the finger at Twitter; it's a widespread problem that even I have been a part of. Ruby makes coding RESTful interfaces so easy that it's very, very tempting to just throw up a few controllers that are basically CRUD interfaces for databases and call it a day. I suspect that Twitter is feeling the impending pain of popularity right about now.
My work here is dung.
It's high time that the so-called "Web 2.0" companies ditch the NoSQL bullshit they've started to put into place. It's not bringing the scalability benefits they all claimed it would, and it's leading to data with very questionable reliability otherwise (not that their data is particularly valuable in the first place...)
A lot of these scalability problems could be solved by using a proper RDBMS on proper hardware that's designed to handle huge concurrent workloads. This level of traffic isn't new by any means. There are many POS systems around the world, from retail operations to airlines, that deal with a similar level of "traffic".
It doesn't matter if they go with a database and hardware stack from Oracle, or a DB2 and hardware stack from IBM, or even use Sybase's ASE on hardware from HP. They just need to invest in some real hardware and some real database systems that are meant for dealing with absolutely huge loads.
Ditch NoSQL databases. Ditch shitty servers. Start using real software, and start using real hardware. That's what other businesses do when they "grow up". If twitter is a viable business, it's time for them to grow up, too.
I wonder if it would have much of an impact if they switched from the verbose JSON/XML over HTTP formats for the API to a binary UDP-based protocol. Twitter seems well suited to such a protocol since it is so simple and the messages are so short.
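To get a rough feel for the size difference, here is a sketch that encodes the same message as JSON and as a packed binary record. The field names and the binary layout (8-byte user id, 4-byte Unix timestamp, 2-byte text length, then UTF-8 text) are assumptions made up for this comparison and are not Twitter's actual format. The sketch only measures payload bytes; it does not send anything over UDP, and HTTP headers would add further overhead on the JSON side.

```python
import json
import struct

# Hypothetical wire header: user id (u64), timestamp (u32), text length (u16), network byte order.
HEADER = struct.Struct("!QIH")


def encode_json(user_id: int, timestamp: int, text: str) -> bytes:
    payload = {"user_id": user_id, "created_at": timestamp, "text": text}
    return json.dumps(payload).encode("utf-8")


def encode_binary(user_id: int, timestamp: int, text: str) -> bytes:
    body = text.encode("utf-8")
    if len(body) > 0xFFFF:
        raise ValueError("text too long for a 2-byte length field")
    return HEADER.pack(user_id, timestamp, len(body)) + body


def decode_binary(packet: bytes):
    user_id, timestamp, length = HEADER.unpack_from(packet)
    text = packet[HEADER.size:HEADER.size + length].decode("utf-8")
    return user_id, timestamp, text


message = (12345, 1278000000, "short status update")
as_json = encode_json(*message)
as_binary = encode_binary(*message)
print(len(as_json), "bytes as JSON,", len(as_binary), "bytes as binary")
```

The binary form saves the repeated key names and punctuation, but the text itself is the same size either way. Whether that saving matters depends on where the real bottleneck is.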
Is it that they are doing too much processing on the data or wasting too much bandwidth, or is their database causing trouble? Since it's Twitter, obviously any bandwidth used is a waste, but you know what I mean.
Company bases a business model on offering their resources for free, only to discover to their chagrin that people will take them up on it. Where oh where have I heard this one before?
DRM: Terminator crops for your mind!
Nobody goes on Twitter anymore -- it's too crowded! (With apologies to Yogi Berra.)
I've abandoned my search for truth; now I'm just looking for some useful delusions.
What about clients who don't have a constant connection to the Internet, or who have a dynamic IP? Now twitter has to poll them, to see if they exist. You end up with the same situation, except worse.
E-mail seems to be doing just fine, despite these "shortcomings".
Yawn, just because the client isn't polling (REST is just a way of saying polling to make people feel better) doesn't mean this doesn't work on just about every damn device out there. TCP keep-alives are supported by all the major TCP stacks and all the minor ones I've ever used (although not strictly required per RFC 1122). With reasonable configuration parameters for maintaining connections with little data transfer, it's possible to keep a port open for basically an indefinite time period. Once the port is open, it's going to consume server resources (and having more than a few 10k ports per IP is a problem, and is itself probably a good reason for having some kind of periodic queue poll type mechanism), but it's going to significantly lower the bandwidth vs a polling mechanism.
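For reference, a minimal sketch of turning on TCP keep-alives from a client using Python's standard `socket` module. `SO_KEEPALIVE` is widely available. The three tuning options are platform-specific (they exist on Linux), so the sketch applies them only when the constants are present. The `enable_keepalive` helper and its default timings are illustrative assumptions, not recommended values.

```python
import socket


def enable_keepalive(sock: socket.socket, idle_s: int = 60,
                     interval_s: int = 30, probes: int = 4) -> None:
    """Turn on keep-alive probes for an otherwise idle TCP connection."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Optional tuning; these constants are not defined on every platform.
    if hasattr(socket, "TCP_KEEPIDLE"):   # seconds of idle time before the first probe
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_s)
    if hasattr(socket, "TCP_KEEPINTVL"):  # seconds between probes
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_s)
    if hasattr(socket, "TCP_KEEPCNT"):    # failed probes before the connection is dropped
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)


sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
enable_keepalive(sock)
print("keep-alive enabled:", sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)
sock.close()
```

Once the socket is connected, the OS sends the probes itself, so the application does not have to generate any traffic to hold the connection open.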
That said, a big part of the problem is HTTP, and the insistence on using it as an API data transport even when it's not well suited for that. Even worse, though, is the use of web servers like Apache that consume significant resources for keep-alive connections. To be fair, Apache was designed more for an environment where a lot of different machines connect for short periods of time and then are done. The HTTP 1.1 keep-alive mode didn't mesh well with the one-process-per-connection model, and works only marginally better with the one-thread-per-connection model now in use.
So, basically I don't think any of your arguments hold, even across actual network failures, client standby, network changes, etc. The client will be notified of connection loss and can simply reconnect. Once reconnected, queued notifications can be issued, or the client can repoll before reconstructing the notification system.
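The reconnect-and-flush behaviour described above could be sketched like this, assuming the server keeps a bounded backlog per subscriber. `ClientSession`, `max_backlog` and the `delivered` list (which stands in for writes to the client's socket) are all hypothetical names for illustration.

```python
from collections import deque


class ClientSession:
    """Hypothetical server-side record for one subscriber."""

    def __init__(self, max_backlog: int = 1000) -> None:
        self.connected = False
        self.delivered = []  # stands in for data written to the client's connection
        # Bounded queue: when full, the oldest notification is dropped.
        self._backlog = deque(maxlen=max_backlog)

    def notify(self, message: str) -> None:
        if self.connected:
            self.delivered.append(message)
        else:
            self._backlog.append(message)  # hold until the client comes back

    def disconnect(self) -> None:
        self.connected = False

    def reconnect(self) -> int:
        """Mark the client as online and flush queued notifications in order."""
        self.connected = True
        flushed = 0
        while self._backlog:
            self.delivered.append(self._backlog.popleft())
            flushed += 1
        return flushed


session = ClientSession()
session.notify("tweet 1")   # client offline: queued
session.notify("tweet 2")   # client offline: queued
print(session.reconnect())  # 2
session.notify("tweet 3")   # client online: delivered immediately
print(session.delivered)    # ['tweet 1', 'tweet 2', 'tweet 3']
```

The server never has to poll the client to see whether it exists. A client that was offline long enough to overflow the backlog would fall back to a one-off repoll, as the comment suggests.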
Frankly, as someone who works with extremely high-bandwidth (many GBytes/sec), high I/O rate (100k transactions/sec) systems per node, I'm shocked at the problems Twitter has. Fundamentally, I'm betting someone who didn't have to deal with the BS could get the whole system running on a few fairly high-powered server nodes. The entire data set could probably fit in RAM on a modern high-end server. It's not like they are moving a lot of multi-MB messages around, or running really complex searches.
Just imagine what google would be like if written the same way.
Does Twitter make money? I'm not trolling, I'm serious. A quick search yields this article:
http://www.pcworld.com/businesscenter/article/200635/twitter_to_promote_marketers_special_offers.html
Even the author of my linked article has doubts. If I wasn't making money, I'd try to limit my expenditures (bandwidth costs, etc.) too. It's not surprising to me.
So how do they make money?
Twitter is a fundamentally stupid idea. It is like trying to run all of the mailing lists in the world from one server (and by 'like' I mean exactly the same). The end result is half as useful and twice as shitty. Seriously, write a web2.0 listserv interface and you will amaze tweeters. You can tweet with email, holy cow!
Yes, it's a mailing list, surprise!