There’s a lot of agitation today because Twitter broke under the message volume generated during Steve Jobs’ Macworld keynote. I don’t know what that volume was, but I just checked the lower volume of tweets (i.e., updates) going through the “public timeline” (i.e., everything) twice, and both times it was under 200 messages per minute. So let’s say there’s a much higher volume at peak times, hypothesize that Twitter would like to grow a lot, and suppose that Twitter wants to handle 10,000-100,000 messages/minute (i.e., 1,000+/second) as soon as possible.
That’s easy using CEP (Complex Event Processing). A Twitter update is just a string of 140 or fewer characters. It is associated with three pieces of metadata – author, time, and mode of posting. It should be visible in real time to any of the author’s “followers,” as well as in a single public timeline; perhaps there will be other kinds of Twitter channels in the future. In most cases, these updates are only visible to a user upon page refresh; almost none need to be pushed out in true real time.
No Twitter user seems to have more than about 7,000 followers, even Robert Scoble or Evan Williams.* The average number of followers, at least among active updaters, is probably in the low hundreds now. So basically, this is all a heckuva lot easier than the tick-monitoring systems Wall Street firms are using today.
*I believe there’s a hard cap of 7,500, but nobody seems to have bumped against it yet. Twitterholic gives a different figure than Twitter does for Scoble. And it correctly shows Dave Troy with a little over 10,000.
Here’s how to implement that. You start with a complex event/stream processing engine like StreamBase or Coral8, both of which can handle a large multiple of that volume without working up a sweat. And you basically do everything in RAM. 6 million messages per hour of under 200 bytes each? Not a lot of RAM needed for that. A set of 25 or so recent messages cached for each of, say, the 100,000 or 1 million most recent users? Also not a whole lot of RAM. And if you push out aged messages and replace them with more recent message IDs, the cache gets smaller. Banging things to disk for persistence is an exercise left to the reader.
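To make that concrete, here’s a minimal sketch of the in-RAM caching scheme described above – written in plain Python rather than in StreamBase or Coral8, since the point is the data structure, not any particular engine. The class name and field choices are my own illustration, not anybody’s actual implementation; the “25 or so recent messages per user” cache is modeled with a bounded deque, which evicts aged messages automatically, exactly the behavior described in the paragraph.

```python
from collections import defaultdict, deque

RECENT_PER_USER = 25  # cache roughly 25 recent messages per user, per the text

class TimelineCache:
    """Hypothetical in-RAM fan-out cache for Twitter-style updates."""

    def __init__(self):
        self.followers = defaultdict(set)  # author -> set of follower names
        # Each timeline is a bounded deque: appending past maxlen
        # silently drops the oldest entry, so the cache stays small.
        self.timelines = defaultdict(lambda: deque(maxlen=RECENT_PER_USER))
        self.public = deque(maxlen=RECENT_PER_USER)  # the single public timeline

    def follow(self, follower, author):
        self.followers[author].add(follower)

    def post(self, author, text, timestamp, mode):
        # A message is <=140 characters plus three pieces of metadata:
        # author, time, and mode of posting -- well under 200 bytes.
        msg = (author, text, timestamp, mode)
        self.public.append(msg)
        self.timelines[author].append(msg)
        # Fan the message out to each follower's cached timeline.
        for f in self.followers[author]:
            self.timelines[f].append(msg)

cache = TimelineCache()
cache.follow("reader", "scoble")
cache.post("scoble", "Live from the keynote", 1200000000, "web")
```

At 1,000 messages/second with followings in the low hundreds, that’s a few hundred thousand in-memory appends per second – trivial for any of the engines mentioned. Persistence to disk is, as the text says, left to the reader.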
Meanwhile, Dave Winer views Twitter as a distributed computing challenge and Larry Dignan asks whether Twitter needs to be reliable at all (he leans to “Yes” because of the new apps that would open up).