A press person recently asked about:
… start-ups that are building technologies to enable MySQL and other SQL databases to get over some of the problems they have in scaling past a certain size. … I’d like to get a sense as to whether or not the problems are as severe and wide spread as these companies are telling me? If so, why wouldn’t a customer just move to a new database?
While that sounds as if he was asking about scale-out relational DBMS in general, MySQL or otherwise, short-request or analytic, it turned out that he was asking just about short-request scale-out MySQL. My thoughts and comments on that narrower subject include(d) but are not limited to:
- The biggest web companies had to go to non-transparently sharded MySQL years ago. The NoSQL movement is, in no small part, an attempt to improve upon that. Ditto for scale-out short-request MySQL.
- Some overlapping categories of companies or projects who need scale-out short-request database processing are:
  - The aforementioned big companies who have other applications they haven’t hand-sharded yet.
  - Other web companies whose applications are getting that big.
  - Conventional enterprises whose web efforts happen to be very big.
  - Sensor networks and other massive sources of machine-generated data.
  - Certain specialized areas (e.g., financial trading).
- Relatively few of these applications are totally impossible to do in Oracle. But the Oracle approach might be very expensive.
- In particular, there’s a break point when companies — often SaaS vendors — outgrow Oracle Standard Edition.
- Yes, the alternative is usually either MySQL or Oracle.
- InnoDB isn’t an alternative to these newer technologies; it’s just a piece of the puzzle and indeed of default MySQL now. Several of them — e.g. dbShards — are meant to be used in conjunction with InnoDB.
- Merging his list and mine, the high-performance/scale-out MySQL alternatives look like dbShards, Schooner, ScaleBase, ScaleDB, Tokutek, Akiban, Xeround, and Clustrix. The first two are to my knowledge more proven than the rest.
- Proprietary hardware and the associated hardware/appliance pricing aren’t very appealing for these applications. That speaks against Oracle Exadata and Clustrix, and is the reason Schooner switched to a software-only strategy despite some initial appliance sales.
- However, hardware band-aids such as solid-state drives or even RAM-based solid-state storage could make more sense:
  - If, for performance, you’re scaling out your database so that it fits in RAM on each box, you don’t really have a disk-based architecture anyway, now do you?
  - Even if you’re not doing that yet, silicon-based storage could be a big help if your problem is throughput rather than storage capacity.
  - In principle, devices of that kind can be moved from one application to another, after the first one is rearchitected not to need them. (In practice, however, I don’t know of anybody who is doing that. I also don’t believe that Kaminario et al. are marketing that kind of idea, more’s the pity.)
- My notes on all this from April, 2010 are already badly outdated, but may be interesting anyway.
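
For readers unsure what "non-transparently sharded" means in the first bullet, here is a minimal sketch of the idea: the application itself, rather than the DBMS, decides which MySQL instance holds each row. Everything here is an assumption for illustration. The shard names, the `shard_for` and `route` helpers, and the choice of hashing a user ID are all hypothetical, and this is not how any particular company or any of the products listed above does it.

```python
import hashlib

# Hypothetical shard map; these names are placeholders, not real hosts.
SHARDS = ["mysql-shard-0", "mysql-shard-1", "mysql-shard-2", "mysql-shard-3"]


def shard_for(user_id, shards=SHARDS):
    """Pick a shard by hashing the key.

    The application does this itself, which is what makes the
    sharding "non-transparent": the DBMS knows nothing about it.
    """
    digest = hashlib.md5(str(user_id).encode("utf-8")).digest()
    return shards[int.from_bytes(digest[:8], "big") % len(shards)]


def route(user_ids, shards=SHARDS):
    """Group keys by shard, as an application must do before it can
    issue one query per shard and merge the results by hand."""
    plan = {name: [] for name in shards}
    for uid in user_ids:
        plan[shard_for(uid, shards)].append(uid)
    return plan


if __name__ == "__main__":
    for name, ids in route(range(20)).items():
        print(name, ids)
```

Even in a toy like this, the pain points show up quickly: any query that spans users has to be fanned out and merged in application code, and changing the number of shards remaps most keys. That, roughly, is the burden the newer scale-out technologies try to lift.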