I have a bunch of backlogged post subjects in or around short-request processing, based on ongoing conversations with my clients at Akiban, Cloudant, Code Futures (dbShards), DataStax (Cassandra) and others. Let’s start with Akiban. When I posted about Akiban two years ago, it was reasonable to say:
- Akiban is in the short-request DBMS business.
- MySQL compatibility is one way to access Akiban, but it’s not the whole story.
- Akiban’s main point of technical differentiation is to arrange data hierarchically on disk so that many joins are “zero-cost”.
- Walking the hierarchy isn’t a great way to get at data for every possible query; Akiban recognizes the need for other access techniques as well.
All of the above are still true. But unsurprisingly, plenty of the supporting details have changed.
Akiban company basics include:
- 20 employees.
- Reasonable amounts of venture capital.
- Offices in downtown Boston (in the same office complex as Cloudant).
- Enterprise edition product in beta, planned for Q2 release.
- Open source edition coming some time after that.
- Several users whose detailed stories are in Akiban’s marketing materials (user names are currently under NDA).
Akiban technical basics start:
- The central idea of Akiban is that if you have a hierarchy of tables in a schema, such as Customer-Order-Detail, then the rows of a child table are physically both ordered by and interleaved with the rows of its parent.
- A kind of physical composite key called hKey, organized into something called a group index, then takes the place of a conventional index. Akiban says this makes many kinds of joins essentially free.*
- Akiban further says that even when you don’t get that benefit, reading and joining via hKeys and the group index is usually at least as fast as it would be to use conventional b-trees. Even so, other data access approaches are on the drawing board.
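To make the interleaving idea concrete, here is a toy sketch in Python. It is not Akiban's actual storage format or API; the `GroupStore` class, the table ordinals, and the tuple-shaped keys are all assumptions made up for illustration. The point is only that if a child row's key extends its parent's key, then sorting by key interleaves children with parents, and a Customer-Order-Detail join for one customer becomes a single contiguous scan.

```python
from bisect import bisect_left, insort

# Hypothetical table ordinals; they only fix the order of row types under a parent.
CUSTOMER, ORDER, DETAIL = 1, 2, 3


class GroupStore:
    """Toy sorted store keyed by hKey-like tuples (illustrative, not Akiban's format)."""

    def __init__(self):
        self._keys = []   # keys kept in sorted ("physical") order
        self._rows = {}   # key -> row payload

    def put(self, hkey, row):
        # Insert the key in sorted position the first time it is seen.
        if hkey not in self._rows:
            insort(self._keys, hkey)
        self._rows[hkey] = row

    def scan_prefix(self, prefix):
        """Return every (key, row) whose key starts with prefix, in stored order."""
        start = bisect_left(self._keys, prefix)
        out = []
        for key in self._keys[start:]:
            if key[:len(prefix)] != prefix:
                break  # left the parent's contiguous range
            out.append((key, self._rows[key]))
        return out


def build_example():
    store = GroupStore()
    # Rows arrive in arbitrary order; the store keeps them sorted by key.
    store.put((CUSTOMER, 2), {"name": "Bob"})
    store.put((CUSTOMER, 1), {"name": "Alice"})
    store.put((CUSTOMER, 1, ORDER, 11), {"total": 5})
    store.put((CUSTOMER, 1, ORDER, 10), {"total": 30})
    store.put((CUSTOMER, 1, ORDER, 10, DETAIL, 100), {"sku": "A"})
    store.put((CUSTOMER, 2, ORDER, 20), {"total": 7})
    return store


if __name__ == "__main__":
    for key, row in build_example().scan_prefix((CUSTOMER, 1)):
        print(key, row)
```

Scanning the prefix for customer 1 yields Alice's row, then order 10, then that order's detail row, then order 11, with no separate index lookup per table. A real engine would of course do this against disk pages rather than a Python list.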
Because Akiban’s great virtue is short-request join performance, its target market starts with online businesses that maintain some kind of customer profile along with transaction or presence data — for example retailers or dating services.
*For marketing purposes, the word “essentially” is usually omitted.
In its initial release, Akiban will be a single-server product, dependent on MySQL. That is:
- Akiban has a reasonable amount of MySQL compatibility, but won’t promise to be a full MySQL work-alike in all edge cases.
- Akiban (the company) advises you to take your existing, overstressed MySQL application and instruct its load balancer to send the worst “problem queries” to Akiban (the product). Akiban believes there’s a great chance it will execute those queries 10-100X faster.
- Akiban (the company) figures that if you have a MySQL performance problem, you already are replicating MySQL data to multiple read slaves. It wants Akiban (the product) to be one more of those slaves.
- Technically, Akiban (the core DBMS product) isn’t a MySQL storage engine. Rather, there’s a separate MySQL storage engine that receives the replicated data, and then sends it on to Akiban to be stored in Akiban’s structure.
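The routing advice above could look roughly like the following sketch. Everything here is hypothetical: the backend names, the `choose_backend` function, and the regular expression standing in for a list of known "problem queries" are my own illustration, not anything Akiban or a particular load balancer ships. It assumes writes keep going to the MySQL primary, since Akiban is fed by replication.

```python
import re

# Hypothetical patterns for queries flagged as join-heavy "problem queries".
PROBLEM_PATTERNS = [
    re.compile(r"\bfrom\s+customers\b.*\bjoin\s+orders\b", re.IGNORECASE | re.DOTALL),
]


def choose_backend(sql, patterns=PROBLEM_PATTERNS):
    """Pick a backend name for one SQL statement.

    Reads that match a flagged pattern go to the Akiban-backed replica,
    other reads go to an ordinary MySQL read slave, and everything else
    (writes, DDL) goes to the MySQL primary.
    """
    is_read = sql.lstrip()[:6].lower() == "select"
    if is_read and any(p.search(sql) for p in patterns):
        return "akiban_replica"
    if is_read:
        return "mysql_replica"
    return "mysql_primary"


if __name__ == "__main__":
    print(choose_backend("SELECT * FROM customers c JOIN orders o ON o.cid = c.id"))
    print(choose_backend("SELECT 1"))
    print(choose_backend("UPDATE customers SET name = 'x' WHERE id = 1"))
```

In practice the list of problem queries would more likely come from a slow-query log than from hand-written patterns, but the division of labor is the same.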
Akiban’s idea of initially focusing on existing MySQL installations makes sense, because:
- That’s where the pain is.
- Working through MySQL lets Akiban say “Don’t worry about using new technology from a small company; if something goes wrong you can always use any other flavor of MySQL instead, including from a very large database company in Redwood Shores.”
Looking ahead, Akiban tends to conflate the ideas of:
- An open source version of its product.
- A version of its product that is a relational DBMS with full fit-and-finish, without relying on MySQL in any way.
- A version of its product that people will use for new, greenfield applications.
I think those aspects of Akiban’s strategy could still use some refinement.
As for possible Akiban futures:
- Founder/CTO Ori Herrnstadt likes to point out that Akiban’s physical architecture could be a great match for object-oriented programs, and hence Akiban would be well-suited for an object-oriented interface, presumably in the form of ORM (Object-Relational Mapping/er) transparency.
- Ori also notes that Akiban’s data organization scheme would lend itself nicely to scale-out. (dbShards bases some of its scale-out on a similar concept.)
- The potential second, columnar copy of the data I wrote about two years ago, while still a possible direction, is a much lower priority than it seemed then, due to the ability to get good performance in other ways.
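On the scale-out point, one way to see why hierarchical organization helps is that every row in a hierarchy can be sharded by the key of its root row, so a customer and all of that customer's orders and details land on the same node. The sketch below is purely my own illustration, not a description of Akiban's or dbShards' design; it assumes a hypothetical tuple-shaped key whose first two elements identify the root row, and a hypothetical `shard_for` helper.

```python
import zlib


def shard_for(hkey, num_shards):
    """Map a hierarchical key to a shard using only its root portion.

    Assumes (hypothetically) that hkey is a tuple whose first two elements
    are the root table's ordinal and the root row's primary key, e.g.
    (1, 42) for a customer and (1, 42, 2, 7) for one of its orders.
    """
    root = hkey[:2]
    # crc32 is stable across runs, unlike Python's built-in hash() for some types.
    digest = zlib.crc32(repr(root).encode("utf-8"))
    return digest % num_shards


if __name__ == "__main__":
    customer = (1, 42)
    order = (1, 42, 2, 7)
    detail = (1, 42, 2, 7, 3, 900)
    print(shard_for(customer, 4), shard_for(order, 4), shard_for(detail, 4))
```

Because all three keys share the same root, they map to the same shard, which keeps the "zero-cost" join local to one server.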
One thing we haven’t talked about much is write speed. It would seem challenging for Akiban to achieve append-only/log-structured-merge kinds of write speeds. Update-in-place seems like a more suitable model. To me, that screams “solid-state storage”, but of course the reality is that plenty of high-volume MySQL sites today do update-in-place on cloudy disk-based systems. And a few of them even seem to be using Akiban.