I went to Bracknell Wednesday to spend time with the Kognitio team. I think I came away with a better understanding of what the technology is all about, and why certain choices have been made.
Like almost every other contender in the market,* Kognitio WX-2 queries disk-based data in the usual way. Even so, WX-2’s design is very RAM-centric. Data gets on and off disk in mind-numbingly simple ways – table scans only, round-robin partitioning only (as opposed to the more common hash), and no compression. However, once the data is in RAM, WX-2 gets to work, happily redistributing as seems optimal, with little concern about which node retrieved the data in the first place. (I must confess that I don’t yet understand why this strategy doesn’t create ridiculous network bottlenecks.) How serious is Kognitio about RAM? Well, they believe they’re in the process of selling a system that will include 40 terabytes of the stuff. Apparently, the total hardware cost will be in the $4 million range.
*Exasol is the big exception. They basically use disk as a source from which to instantiate in-memory databases.
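To make the round-robin versus hash contrast concrete, here is a minimal Python sketch. It is not Kognitio code; the function names, the `customer_id` column, the four-node cluster, and the skewed sample rows are all assumptions made up for illustration. It shows the general trade-off: round-robin spreads rows evenly no matter what they contain, while hash partitioning keeps equal keys together at the risk of uneven nodes.

```python
import zlib

def round_robin_partition(rows, num_nodes):
    """Assign row i to node (i mod num_nodes), ignoring row contents."""
    nodes = [[] for _ in range(num_nodes)]
    for i, row in enumerate(rows):
        nodes[i % num_nodes].append(row)
    return nodes

def hash_partition(rows, num_nodes, key):
    """Assign each row to the node selected by a hash of its key column."""
    nodes = [[] for _ in range(num_nodes)]
    for row in rows:
        # crc32 is used only because it is deterministic across runs.
        h = zlib.crc32(str(row[key]).encode("utf-8"))
        nodes[h % num_nodes].append(row)
    return nodes

# Hypothetical skewed data: 90 of 100 rows share customer_id 1.
rows = [{"customer_id": 1 if i < 90 else i, "amount": i} for i in range(100)]

rr_sizes = [len(n) for n in round_robin_partition(rows, 4)]
hp_nodes = hash_partition(rows, 4, "customer_id")
hp_sizes = [len(n) for n in hp_nodes]

# Number of nodes that hold at least one row for customer_id 1.
nodes_with_customer_1 = sum(
    any(r["customer_id"] == 1 for r in node) for node in hp_nodes
)

print("round-robin node sizes:", rr_sizes)
print("hash node sizes:       ", hp_sizes)
print("nodes holding customer 1 under hash:", nodes_with_customer_1)
```

With round-robin, rows for the same key end up scattered across every node, so a join or aggregation on that key needs the data moved between nodes first. That is consistent with the description above of redistribution happening once the data is in RAM, though how WX-2 actually does it is not something this sketch claims to show.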
Other technical highlights of the Kognitio WX-2 story include:
- WX-2 is designed for shared-nothing MPP. But like most other shared-nothing vendors, Kognitio often winds up supporting SAN-in-a-box disk arrays.
- WX-2 is fairly silicon-heavy. In a typical installation, 8-core nodes will each manage 140-300 gigabytes of disk. I get the impression WX-2 is more CPU- than disk-bound, which may be why Kognitio has little interest in disk-based data compression.
- WX-2 has complete equality among nodes; there is no head/queen. For example, any node can receive, optimize, or compile a query.
- WX-2 compiles queries into low-level code. Roger says this reduces code path length by a factor of 10.
- WX-2’s optimizer is aware of what data is in RAM. A WX-2 DBA can deliberately replicate part or all of the database to RAM, in a way that the optimizer is aware of. This is not an ordinary cache, although I forgot to ask whether there’s an ordinary cache as well. Roger says that most WX-2 clients use this capability.
- Typical WX-2 installations have a little less than one data storage process per disk assigned to a node. Those disks that set aside a little space for the actual software get mirrored in their entirety; the default is one process for that mirrored pair and one process for each of the other disks.
- Typical WX-2 installations have one query execution process per core.
- WX-2 has its own RAID-like scheme, rather than relying on RAID from storage providers.
- For those who care about such things, Kognitio claims WX-2 has linear scalability.
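The process-count bullets above reduce to simple arithmetic, sketched below in Python. The function name and the 8-disk node are hypothetical; the 8 cores, the single mirrored pair sharing one storage process, and the one-query-process-per-core default come from the list above. This is a reading of those defaults, not a description of how WX-2 is actually configured.

```python
def processes_per_node(num_disks, num_cores, mirrored_disks=2):
    """Estimate default process counts for one node.

    Assumes exactly one mirrored pair (the disks that also hold the
    software) sharing a single data storage process, one storage process
    for each remaining disk, and one query execution process per core.
    """
    if num_disks < mirrored_disks:
        raise ValueError("node needs at least the mirrored pair of disks")
    storage = 1 + (num_disks - mirrored_disks)
    query = num_cores
    return {"storage": storage, "query": query}

# Hypothetical node: 8 disks (an assumption) and 8 cores (from the list above).
counts = processes_per_node(num_disks=8, num_cores=8)
print(counts)
print("storage processes per disk:", counts["storage"] / 8)
```

Under these assumptions an 8-disk node runs 7 storage processes, which matches "a little less than one data storage process per disk."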
Non-technical highlights include:
- Kognitio is privately held. The investors at this point seem mainly to be two individuals, one of whom is Geoff Squire of Oracle fame.
- Kognitio has $16 million in revenue. Almost $10 million of that is in the WX-2 business.
- Kognitio names 14 customers for WX-2, all of whom are references.
- Long little more than a UK national champion, Kognitio now has four WX-2 customers in the US market.
- Current business activity is about 50-50 license and SaaS (which Kognitio calls DaaS for Data As A Service).
- Installed WX-2 customers top out around 10 terabytes of user data. But a 50 terabyte deal has been sold, and a 100 terabyte deal looks really good in the pipeline.
- On the strength of two academic customers, genetic research is a bit of a focus vertical for Kognitio right now. Or, since Kognitio also has a university astronomy deal, we could say science is a focus market overall. (Their slides also mention oil & gas.)
- Most other WX-2 users fall into the usual verticals – telecom, analytic outsourcing, media/advertising analysis, retailing, etc.