Former SAP CEO Hasso Plattner has written a paper called A Common Database Approach for OLTP and OLAP Using an In-Memory Column Database, in association with a SIGMOD keynote address.* The approach Plattner advocates is an MPP in-memory column store, presumably somewhat akin to SAP’s frequently renamed Business Warehouse Accelerator/Business Intelligence Accelerator/BWA/BIA/Son-of-TREX technology. There also are strong similarities to the MPP in-memory row store project H-Store/VoltDB, although I don’t know whether Plattner would go so far as to adopt the H-Store view that all transactions should run in stored procedures. Unsurprisingly, SAP applications are used as the OLTP paradigm throughout.
*Thanks to Dave Kellogg for tipping me off to Plattner’s paper. I only went to two SIGMOD sessions, neither of which was Plattner’s. Nobody actually mentioned Plattner’s talk to me when I was down at SIGMOD.
Perhaps the most interesting part is Plattner’s claim that what’s demanding about OLTP isn’t database updating per se, but rather maintaining aggregates for quick-response analytics. In his main example of that point, Plattner proposes a real-life “more than 18” table schema, of which 2 are base tables, and (most of?) the rest are materialized views that his proposed database architecture dispenses with (because analytic performance is sufficiently good without them). Thus, Plattner’s core columnar argument seemingly is:
columnar –> natively fast analytics –> no need to maintain aggregates –> much lower update burden.
That said — if Plattner’s paper contained a clear statement of how much more expensive it is to insert or update a single row in a columnar vs. row-based system, I overlooked it. Instead, Plattner seems to be arguing that the volume of base-table updates is low enough that — whatever it may be — column-store update overhead is an acceptable price to pay. (At one point he claims that only 5% of the data inserted in a financial application ever gets changed.) That may actually be true in a financial accounting system, but seems more questionable in a sufficiently large application that gets its updates from automatic devices, or from the consumer web.
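To make that tradeoff concrete, here is a toy Python sketch of the two designs. It is my own illustration, not anything from Plattner's paper: the class names, the `amount` column, and the sample postings are all hypothetical, and counting "writes" as list appends and dictionary updates is a crude stand-in for real storage costs. It only shows where the work lands. It does not say which side is cheaper, which is exactly the number I couldn't find in the paper.

```python
from collections import defaultdict


class RowStoreWithAggregates:
    """Base table plus eagerly maintained aggregate tables (hypothetical model)."""

    def __init__(self, group_keys):
        self.rows = []
        # One running-total table per grouping key, standing in for a materialized view.
        self.aggregates = {key: defaultdict(float) for key in group_keys}
        self.base_writes = 0
        self.aggregate_writes = 0

    def insert(self, row):
        self.rows.append(dict(row))
        self.base_writes += 1
        # Every insert also has to touch every aggregate table.
        for key, totals in self.aggregates.items():
            totals[row[key]] += row["amount"]
            self.aggregate_writes += 1

    def total_by(self, key):
        # Analytics are a lookup, because the work was done at insert time.
        return dict(self.aggregates[key])


class ColumnStoreNoAggregates:
    """One list per column and no aggregate tables (hypothetical model)."""

    def __init__(self, columns):
        self.columns = {name: [] for name in columns}
        self.column_appends = 0

    def insert(self, row):
        # The row is split up, so one logical insert touches every column.
        for name, values in self.columns.items():
            values.append(row[name])
            self.column_appends += 1

    def total_by(self, key):
        # Analytics scan just the two columns involved, at query time.
        totals = defaultdict(float)
        for group, amount in zip(self.columns[key], self.columns["amount"]):
            totals[group] += amount
        return dict(totals)


# Made-up postings, purely for illustration.
postings = [
    {"account": "A", "region": "EU", "amount": 100},
    {"account": "A", "region": "US", "amount": 50},
    {"account": "B", "region": "EU", "amount": 25},
]

row_store = RowStoreWithAggregates(["account", "region"])
col_store = ColumnStoreNoAggregates(["account", "region", "amount"])
for posting in postings:
    row_store.insert(posting)
    col_store.insert(posting)
```

In the row-store model, aggregate writes grow with the number of materialized views (16 or more in Plattner's example). In the columnar model, per-insert work grows with the number of columns instead. Which of those is the smaller burden in practice depends on cost factors this sketch doesn't capture.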
Other highlights include:
- Like most modern observers, Plattner believes Postgres-style timestamping beats update-in-place.
- Plattner also offers a less common reason for liking timestamped inserts over updates-in-place — he thinks timestamps are helpful in planning-oriented applications. In particular, he wants timestamp-aware SQL extensions.
- Plattner claims columnar designs have a 10:1 compression advantage over row stores — specifically 20X vs. 2X — at least using compression schemes that allow for updating at reasonable speed. That seems exaggerated.
- Plattner seemed to drop various references to memory-centric structures SAP already uses. (SAP has long done a lot in-memory, in both the OLTP and planning areas. Years ago SAP told me of a customer that was buying >1 TB of RAM just to run SAP’s planning software. SAP also bragged that >99% of transactions never hit disk, in some sense of “transaction”.)
- There are lots of references to “tenants”, SaaS, and/or SAP’s SaaS product line. So SaaS is evidently a design point. That makes sense. First, SaaS is one of SAP’s biggest vulnerabilities. Second, the toughest customization a SaaS customer might want is to add a few columns to standard tables, which might be easier to accommodate with a columnar approach.
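For readers who want the timestamped-insert idea from the first two bullets spelled out, here is a minimal Python sketch. It is an assumption-laden toy of my own: `InsertOnlyTable`, `put`, `get`, and the `as_of` parameter are hypothetical names, the "timestamp" is a simple logical counter rather than a wall clock, and none of this is Postgres's actual implementation or the SQL extensions Plattner has in mind. The point is only that an "update" becomes a new version, so earlier states stay readable, which is what a planning application would want.

```python
import itertools


class InsertOnlyTable:
    """Toy insert-only store: updates append a new timestamped version."""

    def __init__(self):
        self._versions = []               # (timestamp, key, value), in timestamp order
        self._clock = itertools.count(1)  # logical clock, an assumption of this sketch

    def put(self, key, value):
        """Record a new version of key; nothing is overwritten."""
        ts = next(self._clock)
        self._versions.append((ts, key, value))
        return ts

    def get(self, key, as_of=None):
        """Return the latest value of key, or its value as of a given timestamp."""
        result = None
        for ts, k, v in self._versions:
            if as_of is not None and ts > as_of:
                break  # versions are in timestamp order, so later ones can be skipped
            if k == key:
                result = v
        return result


table = InsertOnlyTable()
first = table.put("budget", 100)   # original plan
second = table.put("budget", 120)  # revised plan, stored alongside the original

current = table.get("budget")
original = table.get("budget", as_of=first)
```

A real system would index versions rather than scan them, and would need some policy for eventually pruning old ones. Both are outside what this sketch tries to show.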