January 16, 2008

Fixing Twitter in three letters: CEP

There’s a lot of agitation today because Twitter broke under the message volume generated during Steve Jobs’ Macworld keynote. I don’t know what that volume was, but I just checked the lower volume of tweets (i.e., updates) going through the “public timeline” (i.e., everything) twice, and both times it was under 200 messages per minute. So, let’s say there’s a much higher volume at peak times, and also hypothesize that Twitter would like to grow a lot, and say that Twitter would like to handle 10,000-100,000 messages/minute – i.e., 1,000+/second at the top of that range – as soon as possible.

That’s easy using CEP (Complex Event Processing). A Twitter update is just a string of 140 or fewer characters. It is associated with three pieces of metadata – author, time, and mode of posting. It should be visible in real time to any of the author’s “followers,” as well as in a single public timeline; perhaps there will be other kinds of Twitter channels in the future. In most cases, these updates are only visible to a user upon page refresh. Almost no Twitter user seems to have more than about 7,000 followers, even Robert Scoble or Evan Williams.* The average number of followers, at least among active updaters, is probably in the low hundreds now. So basically, this is all a heckuva lot easier than the tick-monitoring systems Wall Street firms are using today.

*I believe there’s a hard cap of 7,500, but nobody seems to have bumped against it yet. Twitterholic gives a different figure than Twitter does for Scoble. And it correctly shows Dave Troy with a little over 10,000.

Here’s how to implement that. You start with a complex event/stream processing engine like StreamBase or Coral8, both of which can handle a large multiple of that volume without working up a sweat. And you basically do everything in RAM. 6 million messages per hour of under 200 bytes each? Not a lot of RAM needed for that. A set of 25 or so recent messages cached for each of, say, the 100,000 or 1 million most recent users? Also not a whole lot of RAM. And if you push out aged messages and replace them with more recent message IDs, the cache gets smaller. Banging things to disk for persistence is an exercise left to the reader.
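To make the in-RAM idea concrete, here is a minimal Python sketch of the fan-out and per-user cache described above. It is not StreamBase or Coral8 code and does not use either product's API; the class, method, and field names are all hypothetical, and the 8-byte message ID in the sizing helper is an assumption. Persistence and eviction of aged message bodies are left out, as in the text.

```python
from collections import defaultdict, deque
from dataclasses import dataclass

TIMELINE_LENGTH = 25  # recent messages cached per user, the figure used in the post


@dataclass(frozen=True)
class Update:
    """One Twitter update plus its three pieces of metadata."""
    msg_id: int
    author: str
    posted_at: float  # seconds since the epoch
    mode: str         # mode of posting; the values used here are illustrative
    text: str         # 140 or fewer characters


class InMemoryTimelines:
    """Hypothetical in-RAM fan-out: each follower's cache holds message IDs only."""

    def __init__(self, timeline_length=TIMELINE_LENGTH):
        self.followers = defaultdict(set)  # author -> set of follower names
        self.messages = {}                 # msg_id -> Update (never evicted in this sketch)
        self.public = deque(maxlen=timeline_length)
        self.timelines = defaultdict(lambda: deque(maxlen=timeline_length))

    def follow(self, follower, author):
        self.followers[author].add(follower)

    def post(self, update):
        if len(update.text) > 140:
            raise ValueError("update exceeds 140 characters")
        self.messages[update.msg_id] = update
        self.public.append(update.msg_id)
        # A deque with maxlen drops its oldest entry automatically,
        # which is the "push out aged messages" step.
        for follower in self.followers[update.author]:
            self.timelines[follower].append(update.msg_id)

    def timeline(self, user):
        """Return the user's cached updates, newest first."""
        return [self.messages[i] for i in reversed(self.timelines[user])]


def burst_bytes_per_minute(messages_per_minute=100_000, bytes_per_message=200):
    """Raw message bytes arriving per minute at the assumed peak rate."""
    return messages_per_minute * bytes_per_message


def timeline_cache_bytes(users=1_000_000, per_user=TIMELINE_LENGTH, id_bytes=8):
    """Size of the ID-only timeline cache; the 8-byte ID is an assumption."""
    return users * per_user * id_bytes
```

With the defaults, burst_bytes_per_minute() comes to 20,000,000 bytes (20 megabytes per minute, or 1.2 gigabytes per hour at 6 million messages), and timeline_cache_bytes() comes to 200,000,000 bytes for a million users, ignoring Python's own object overhead. Both fit comfortably in the RAM of a single server.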

Meanwhile, Dave Winer views Twitter as a distributed computing challenge and Larry Dignan asks whether Twitter needs to be reliable at all (he leans to “Yes” because of the new apps that would open up).

Comments

12 Responses to “Fixing Twitter in three letters: CEP”

  1. Henri Asseily on January 18th, 2008 3:01 am

    RAM doesn’t matter, really.
    Worst case, just add on a solid state disk and you’ve got all the speed you need.
    If you want to be sneaky, figure out how to make each message (or chunk of messages) its own unique document in memory space for the purpose of the OS, then create a massive virtual memory space on the SSD and let the OS do all the work regarding memory optimization. It will automatically page to the SSD the least used memory, and you don’t have to deal with it until you wipe old messages.

  2. Scripting News for 1/18/2008 « Scripting News Annex on January 18th, 2008 11:54 pm

    […] read this story on DBMS2, as part of the initial discussion, that explained there is commercial-grade software used […]

  3. Curt Monash on January 19th, 2008 12:11 am

    Henri,

    RAM always matters. :)

    But you’re right that solid-state memory has a role to play here. If I’m right with my ballpark figure of 20 megabytes/minute as an initial design goal (and that only for bursts), then a few hundred gigabytes of solid-state memory would be a huge help.

    Eventually Twitter will surely want to institute more searchability of message archives. That’s when solid-state memory will really shine.

    CAM

  4. Mike on January 19th, 2008 1:54 pm

    AFAIK, Twitter’s performance problems are mainly caused by the fact that it’s implemented in Rails; I don’t believe it is a data management/data access issue at all. But I could be wrong.

  5. Curt Monash on January 19th, 2008 3:52 pm

    Mike,

    That may be. But if it were to scale up significantly, database access could quickly become the bottleneck. Also, I think it’s likely that for usability we’ll need more filters or channels, and that compounds the data access issues.

    CAM

  6. Don Park on January 19th, 2008 6:58 pm

    As I see it, the main problem is that they don’t see the problem. Hopefully, improvements made by Joi’s Twitter Japan team will trickle up to Twitter US.

  7. Jim Deville on January 21st, 2008 1:13 pm

    Mike, Curt:
    Yes, Twitter is running on Rails. However, if you go back to the original scaling kerfuffle, you’ll see that Rails wasn’t the scaling issue. DB access was. Rails can scale just fine. It just didn’t have the ability to properly scale across multiple databases at that time.

    I don’t know if the keynote was a database thing or not, but I wanted to point out that Rails isn’t necessarily the problem.

  8. Curt Monash on January 21st, 2008 5:06 pm

    Jim,

    Thanks. I was going to post that myself. There’s a comment in the original Dave Winer scripting.com comment thread (linked above) from a Twitter guy spelling out the point.

    Best,

    CAM

  9. Text Technologies»Blog Archive » Sturgeon’s Law, and the future technology of social technology on February 5th, 2008 6:04 am

    […] Filtering technology for Twitter (CEP would do the job) […]

  10. Text Technologies»Blog Archive » The comprehensive guide to upgrading – or replacing – Twitter on February 9th, 2008 9:56 pm

    […] and distributing messages in real time. As I’ve already pointed out, this should be done via complex event/stream processing (CEP), not by writing everything first to a database. The need for much more complex filters just makes […]

  11. Twitter turmoil: when does it end? | Enterprise Alley | ZDNet.com on April 24th, 2008 11:15 am

    […] than what Twitter needs today. (Coral8 can do the same things.) That’s a lot of headroom. http://www.dbms2.com/2008/01/16/twitter-could-e… has some […]

  12. vaibhavi on July 18th, 2013 11:29 am

    can you give er model for twitter?
