XtremeData is announcing its DBx data warehouse appliance today. Highlights include:
- XtremeData is announcing a single pricing metric — $20,000 per terabyte of user data.
- DBx currently has no compression – so when XtremeData adds compression to DBx, price/TB will naturally go down further.
- XtremeData’s DBx node hardware is based on a board, called the XtremeData In-Socket Accelerator (ISA), that combines an Intel-compatible CPU with some FPGAs. In addition there is a head node and a data loading node. Failover/high availability for the head node seems to be mainly futures.
- DBx software is based on PostgreSQL. XtremeData says it kept PostgreSQL’s front end and replaced the execution engine. I haven’t checked exactly which PostgreSQL features are in or out of DBx.
- (This subject edited, after Dave DeWitt pointed out how unclear the first version was.) XtremeData’s DBx of course does complete parallel redistribution of data after every intermediate result set. The basic idea of DBx’s data redistribution is that after each intermediate result set, DBx recalculates histograms and uses them to redistribute data, and hence work, approximately evenly across nodes. This recalculation is done in the FPGAs. XtremeData claims that this constant re-setting of the execution plan is more extensive, and results in more even data distribution, than rival vendors’ strategies. (Based outside Chicago, XtremeData was founded by high-performance computing (HPC) guys, and its design priorities reflect that. XtremeData’s database management guys, by the way, are in Bangalore.)
- XtremeData’s DBx uses Infiniband to support the resulting large amount of data movement.
- At least in theory, XtremeData’s DBx scales up to 1024 nodes, the limit at which its Infiniband switches can support full bandwidth.
- XtremeData’s smallest DBx product is a half-rack system with 8 nodes, rated at 30 TB of user data.
- Because of its data redistribution strategy, XtremeData says DBx doesn’t much care about the physical distribution of data. Hash distribution is the default, but it has less benefit than in other MPP analytic DBMS systems.
- XtremeData also claims that DBx is particularly good at being schema-agnostic, in that competing MPP analytic DBMS products shine most for schemas that don’t lead to a lot of data redistribution (e.g., one big fact table, N small dimension tables that can be replicated at each node). However, I’m skeptical about that point. E.g., Oracle Exadata also uses Infiniband, and features no more data redistribution per query than DBx does, so where exactly is the bottleneck Exadata faces but DBx doesn’t?
- XtremeData says it has tested DBx up to 10-15 concurrent queries. There seem to be no workload management features in the first DBx release, but naturally there’s a technical roadmap in that direction.
- XtremeData claims a variety of DBx beta tests, successful proofs-of-concept (POCs), and even customers intending to buy, but no actual sales to date.
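To make the histogram-based redistribution idea from the list above a bit more concrete, here is a minimal software sketch. It is emphatically not XtremeData's implementation (which reportedly runs in FPGAs, and whose details I don't have). It just assumes a simple equi-depth histogram over a redistribution key, and the function names `equi_depth_boundaries` and `redistribute` are hypothetical.

```python
from bisect import bisect_right


def equi_depth_boundaries(keys, num_nodes):
    """Pick num_nodes - 1 split points so each key range holds about the same row count.

    Assumption: a full sort stands in for whatever histogram a real system builds.
    """
    ordered = sorted(keys)
    n = len(ordered)
    return [ordered[(i * n) // num_nodes] for i in range(1, num_nodes)]


def redistribute(rows, key_fn, num_nodes):
    """Assign each row of an intermediate result set to a node by key range.

    Rows with equal keys always land on the same node, so a later join or
    aggregation on that key can proceed node-locally.
    """
    keys = [key_fn(row) for row in rows]
    bounds = equi_depth_boundaries(keys, num_nodes)
    nodes = [[] for _ in range(num_nodes)]
    for row, key in zip(rows, keys):
        nodes[bisect_right(bounds, key)].append(row)
    return nodes


if __name__ == "__main__":
    # Keys that are heavily skewed in value space (cubes of 0..999).
    skewed = [i ** 3 for i in range(1000)]
    per_node = redistribute(skewed, key_fn=lambda r: r, num_nodes=4)
    print([len(n) for n in per_node])
```

The point of recomputing the split points from the actual intermediate data, rather than relying on a fixed hash or static ranges, is that the row counts stay roughly even no matter how skewed the key values are. One caveat of this simple version: a single very frequent key value would still pile up on one node.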
XtremeData has kindly permitted me to post its DBx launch slide deck. Three specific POC/prospect price/performance comments may be found on Slide 9.
XtremeData says that the clock speed on the FPGAs it uses is 200 megahertz, clearly much less than an Intel-compatible CPU’s. However, XtremeData also says 100s or 1000s of steps can be done at once on an FPGA. The reason for this seems to be “pipelining” much more than on-chip parallel streams. XtremeData’s explanation seemed to focus on the point that many rows of data could be processed independently of each other, and hence at once. I’m not wholly convinced that this is a standard use of the word “pipelining”. The point may be moot anyway, in that XtremeData’s reported performance advantages are nowhere near what one would get by naively assuming DBx can do ~1000 times as many steps per clock cycle at 1/12th – 1/16th a normal clock speed.
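For what it's worth, the naive arithmetic in the paragraph above works out as follows. This is just a back-of-the-envelope sketch using the figures quoted there (200 MHz, ~1000 steps per cycle, 1/12th to 1/16th of a normal clock speed). The helper name `naive_speedup` is mine, and nothing here is a measured result.

```python
FPGA_CLOCK_MHZ = 200     # figure quoted by XtremeData
STEPS_PER_CYCLE = 1000   # the "~1000 times as many steps" naive assumption


def naive_speedup(steps_per_cycle, clock_ratio):
    """Throughput advantage if the per-cycle gain were offset only by the slower clock."""
    return steps_per_cycle * clock_ratio


for divisor in (12, 16):
    implied_cpu_ghz = FPGA_CLOCK_MHZ * divisor / 1000
    speedup = naive_speedup(STEPS_PER_CYCLE, 1 / divisor)
    print(f"CPU at {implied_cpu_ghz:.1f} GHz: naive speedup of about {speedup:.0f}x")
```

In other words, the naive assumption implies something like a 60-80x advantage over a 2.4-3.2 GHz CPU, which is the yardstick the reported numbers fall well short of.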
XtremeData now has the obvious URL, but at the time of this posting it’s a work in progress.