Tokutek has a paradoxical pitch: Tokutek writes data particularly quickly, and therefore you’re supposed to buy Tokutek for query-oriented uses. Highlights of the Tokutek story include:
- Tokutek is a MySQL storage engine.
- MySQL/Tokutek writes indexed data a lot faster than B-tree-based alternatives. (The claim is 10s of 1000s of rows per second on a single server.)
- MySQL/Tokutek reads data at B-tree speeds. (But not, I presume, at the speed of specialized analytic DBMS.)
- Tokutek is not yet ACID-compliant. They’re working on that, but we don’t know what the performance implications will be when they achieve it. ACID compliance won’t come as soon as the May release (Tokutek Version 2.0).
- Tokutek has made one sale. Others are in the pipeline.
Tokutek’s initial target market is the usual combination of clickstream/personalization/other network management. The idea is that many data warehouse technologies have trouble getting latency below, say, 15 seconds to 5 minutes, at least at very high update volumes. So if immediacy is more important than raw complex query performance, Tokutek’s performance profile could be attractive.
Tokutek’s core technology is something it’s named “fractal tree” indexing. The “fractal” part of the name is based on a kind of “self-similar at different size ranges” flavor to the scheme. But if I understood correctly, it doesn’t sound like much of a tree, in that it always goes root-branch-leaf, rather than ever going root-branch-branch-branch-leaf or whatever.
The basic idea is that (to a first approximation) there’s one index block each of length 1, 2, 4, 8, and so on. With each insert, blocks get emptied and filled, so that each block is either completely empty or completely full. (And each block is in a sorted state as soon as it’s filled.) This results in a huge amount of shuffling in RAM — where the shorter blocks live — but disk only gets touched for occasional orderly pre-sorted merges. Tricks are obviously needed to:
- Make Tokutek lookup speed be order (log(N)) rather than order (log^2(N)) — that’s done by salting each block with information about where values lie in the next one.
- Make Tokutek disk merges operate in background.
- Dump data quickly when it’s time to delete it — I didn’t ask how Tokutek does that.
I have a plane to catch, so I’ll just post this much and stop. More later.