SAP AG
Analysis of SAP AG, and most especially its memory-centric BI Accelerator technology. Also covered are SAP’s overall database, connectivity, and analytics strategies. Related subjects include:
- SAP’s Business Objects business intelligence subsidiary
- Memory-centric data management
- Columnar database management
- (in Text Technologies) SAP’s TREX search engine and Inxight text analytics technology
- (in The Monash Report) Strategic issues for SAP
- (in Software Memories) Historical notes on SAP
Will database compression change the hardware game?
I’ve recently made a lot of posts about database compression. 3X or more compression is rapidly becoming standard; 5X+ is coming soon as processor power increases; 10X or more is not unrealistic. True, this applies mainly to data warehouses, but that’s where the big database growth is happening. And new kinds of data — geospatial, telemetry, document, video, whatever — are highly compressible as well.
This trend suggests a few interesting possibilities for hardware, semiconductors, and storage.
- The growth in demand for storage might actually slow. That said, I frankly think it’s more likely that Parkinson’s Law of Data will continue to hold: Data expands to fill the space available. E.g., video and other media have near-infinite potential to consume storage; it’s just a question of resolution and fidelity.
- Solid-state (aka semiconductor or flash) persistent storage might become practical sooner than we think. If you really can fit a terabyte of data onto 100 gigs of flash, that’s a pretty affordable alternative. And by the way — if that happens, a lot of what I’ve been saying about random vs. sequential reads might be irrelevant.
- Similarly, memory-centric data management is more affordable when compression is aggressive. That’s a key point of schemes such as SAP’s or QlikTech’s. Who needs flash? Just put it in RAM, persisting it to disk just for backup.
- There’s a use for faster processors. Compression isn’t free. What you save on disk space and I/O you pay for at the CPU level. Those 5X+ compression levels do depend on faster processors, at least for the row store vendors.
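To put rough numbers on the "terabyte onto 100 gigs of flash" point, here is a minimal sketch of the arithmetic. The function name and the choice to treat a terabyte as 1000 GB are my own assumptions for illustration; the ratios are the 3X, 5X, and 10X figures mentioned above.

```python
def compressed_size_gb(raw_gb: float, ratio: float) -> float:
    """Return the on-media size in GB for raw_gb of data at a given compression ratio."""
    if ratio <= 0:
        raise ValueError("compression ratio must be positive")
    return raw_gb / ratio


RAW_GB = 1000.0  # one terabyte, taken as 1000 GB to keep the numbers round

# Footprint of one raw terabyte at each compression level discussed in the post.
footprints = {ratio: compressed_size_gb(RAW_GB, ratio) for ratio in (3, 5, 10)}

for ratio, size_gb in footprints.items():
    print(f"{ratio}X compression: about {size_gb:.0f} GB on flash, RAM, or disk")
```

Only at the 10X level does the terabyte actually fit into 100 gigs, which is why the flash and RAM scenarios depend on the more aggressive end of the range.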
| Categories: Data warehousing, Database compression, Memory-centric data management, QlikTech and QlikView, SAP AG | 6 Comments | 
Word of the day: “Compression”
IBM sent over a bunch of success stories recently, with DB2’s new aggressive compression prominently mentioned. Mike Stonebraker made a big point of Vertica’s compression when last we talked; other column-oriented data warehouse/mart software vendors (e.g. Kognitio, SAP, Sybase) get strong compression benefits as well. Other data warehouse/mart specialists are doing a lot with compression too, although some of that is governed by please-don’t-say-anything-good-about-us NDAs.
Compression is important for at least three reasons:
- It saves disk space, which is a major cost issue in data warehousing.
- It saves I/O, which is the major performance issue in data warehousing.
- In well-designed systems, it can actually make on-chip execution faster, because the gains in memory speed and movement can exceed the cost of actually packing/unpacking the data. (Or so I’m told; I haven’t aggressively investigated that claim.)
When evaluating data warehouse/mart software, take a look at the vendor’s compression story. It’s important stuff.
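The I/O-versus-CPU trade-off behind those reasons can be sketched as a back-of-the-envelope model. Everything here is an assumption for illustration: the function name, the throughput figures, and the simplification that I/O and decompression do not overlap. It is not a measurement of any vendor's product.

```python
def scan_seconds(raw_mb, ratio, io_mb_per_s, decompress_mb_per_s=None):
    """Estimate time to scan raw_mb of logical data stored at a compression ratio.

    I/O time covers only the compressed bytes; CPU time covers unpacking the
    full logical volume. Pass decompress_mb_per_s=None for uncompressed data.
    """
    io_time = (raw_mb / ratio) / io_mb_per_s
    cpu_time = 0.0 if decompress_mb_per_s is None else raw_mb / decompress_mb_per_s
    return io_time + cpu_time


RAW_MB = 100_000   # hypothetical 100 GB fact table
IO_MB_PER_S = 200  # hypothetical disk read throughput

uncompressed = scan_seconds(RAW_MB, 1, IO_MB_PER_S)
fast_cpu = scan_seconds(RAW_MB, 4, IO_MB_PER_S, decompress_mb_per_s=1000)
slow_cpu = scan_seconds(RAW_MB, 4, IO_MB_PER_S, decompress_mb_per_s=200)

print(f"uncompressed: {uncompressed:.0f} s")
print(f"4X, fast decompression: {fast_cpu:.0f} s")
print(f"4X, slow decompression: {slow_cpu:.0f} s")
```

With these made-up numbers, 4X compression more than halves scan time when unpacking is fast, and makes things worse when unpacking is as slow as the disk. That is the sense in which compression isn't free.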
EDIT: DATAllegro claims in a note to me that they get 3-4x storage savings via compression. They also make the observation that fewer disks ==> fewer disk failures, and spin that — as it were 🙂 — into a claim of greater reliability.
| Categories: Data warehouse appliances, Data warehousing, Database compression, DATAllegro, IBM and DB2, SAP AG, Vertica Systems | 3 Comments | 
QlikTech – flexible, memory-centric, columnar BI
QlikTech has a pretty interesting story, and a number of customers seem to agree. Their flagship product QlikView is a BI suite that runs off an in-memory copy of the data. Specifically, that copy is logically relational and physically columnar. In an important feature, QlikView is happy to import data from multiple sources at once, such as a warehouse plus an operational data store.
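"Logically relational and physically columnar" can be illustrated with a toy example. This is a generic sketch of the idea, not QlikView's actual storage format, which I haven't seen described at this level; the sample rows, column names, and the `to_columns` helper are all hypothetical.

```python
# Logical view: a relation, one record per row.
rows = [
    {"region": "EMEA", "product": "A", "revenue": 120},
    {"region": "APAC", "product": "B", "revenue": 75},
    {"region": "EMEA", "product": "B", "revenue": 200},
]


def to_columns(records):
    """Pivot a list of same-shaped row dicts into one list per column."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


# Physical view: each column stored contiguously.
columns = to_columns(rows)

# An aggregate over one attribute touches only that attribute's list.
total_revenue = sum(columns["revenue"])
print(columns["region"], total_revenue)
```

The query still thinks in terms of rows and attributes; only the in-memory layout changes.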
So the QlikTech pitch is essentially “Buy our stuff, and you can start doing BI immediately, running any queries and reports you want to. No reason to limit your queries to any kind of dimensional model. No need to prepare the data.” More precisely, QlikTech claims to do away with some kinds of data preparation; obviously, cleaning and so on might still be necessary. Indeed, they describe their classic use case as being the combination of data partly from an operational store and partly from a pre-existing warehouse.
| Categories: Business intelligence, Memory-centric data management, QlikTech and QlikView, SAP AG | 1 Comment | 
Who’s who in columnar relational database management systems
The best known columnar RDBMS is surely Sybase’s IQ Accelerator, evolved from a product acquired in the mid-1990s. Problem – it doesn’t have a shared-nothing architecture of the sort needed to exploit grid/blade technology. Whoops. The other recognized player is SAND, but I don’t know a lot about them. Based on their website, it would seem that grids and compression play a big part in their story. Less established but pretty interesting is Kognitio, who are just beginning to make marketing noise outside the UK. SAP’s BI Accelerator is also a compressed columnar system, but operates entirely in-memory and hence is limited in possible database size. Mike Stonebraker’s startup Vertica is of course the new kid on the block, and there are other columnar startups as well whose names currently escape me.
| Categories: Data warehousing, Investment research and trading, Kognitio, SAP AG, TransRelational | 3 Comments | 
Competitive issues in data warehouse ease of administration
The last person I spoke with at the Netezza conference on Tuesday was a customer/presenter that the company had picked out for me. One thing he said baffled me: he claimed that Netezza was a real appliance vendor, but DATAllegro wasn’t, presumably due to administrability issues. Now, it wasn’t clear to me that he’d ever evaluated DATAllegro, so I didn’t take this too seriously, but still the exchange brought into focus the great differences between data warehouse products in the area of administration. For example:
- Netezza has no indices at all. And no caches. And the hardware is preconfigured. This all makes administration pretty simple.
- DATAllegro has almost no indices, and also has preconfigured hardware. But it has some partitioning, optionally.
- Teradata also has preconfigured hardware. It does have indices, but rather simple ones. Plus it has join indices. And it has a few more configuration options in other areas (e.g., block size) than the other appliance vendors. (Yes, I count Teradata among the appliances.)
- If you go through all the fuss of installing SAP’s applications and BI technology anyway, the incremental administration of just SAP BI Accelerator is pretty light.
- Oracle and IBM have mammothly complex indexing options, but have put large amounts of work into tools to lessen the resulting administrative burden.
- IBM offers preconfigured hardware units to simplify some installation issues.
- Come to think of it, I don’t really know how hard it is to administer columnar systems (e.g., Sybase IQ).
| Categories: Data warehouse appliances, Data warehousing, DATAllegro, Greenplum, IBM and DB2, Netezza, Oracle, SAP AG, Teradata | 3 Comments | 
SAP’s BI Accelerator
I wrote about SAP’s BI Accelerator quite a bit in my white paper on memory-centric data management, but otherwise I seem not to have posted much about it here. In essence, it’s a product that’s all RAM-based, and generally geared for multi-hundred-gigabyte data marts. The basic design is a compression-heavy column-based architecture, evolved from SAP’s text-indexing technology TREX. Like data warehouse appliances, it eschews indexing, relying instead on blazingly fast table scans.
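To make "compression-heavy, column-based, no indexes, just scans" concrete, here is a minimal sketch using dictionary encoding, one common columnar compression technique. I am not claiming this is how BIA encodes data; the function names and the sample column are hypothetical.

```python
def dictionary_encode(values):
    """Replace each value with a small integer code into a sorted dictionary."""
    dictionary = sorted(set(values))
    code_of = {value: code for code, value in enumerate(dictionary)}
    codes = [code_of[value] for value in values]
    return dictionary, codes


def scan_equals(dictionary, codes, target):
    """Return row positions where the column equals target.

    There is no index: every code in the column is examined, but each
    comparison is a cheap integer test against a single looked-up code.
    """
    if target not in dictionary:
        return []
    target_code = dictionary.index(target)
    return [position for position, code in enumerate(codes) if code == target_code]


country = ["DE", "US", "DE", "FR", "US", "DE"]
dictionary, codes = dictionary_encode(country)

print(dictionary)                               # distinct values, stored once
print(codes)                                    # the column as small integers
print(scan_equals(dictionary, codes, "DE"))     # matching row positions
```

The repeated strings are stored once and the column itself shrinks to small integers, which is what makes holding it in RAM and scanning all of it plausible.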
I asked Lothar Schubert of SAP how BIA was doing in the market in its early going. This was his response:
| Categories: Analytic technologies, Business intelligence, Data warehouse appliances, Data warehousing, Database compression, Memory-centric data management, SAP AG | 8 Comments | 
Is data warehousing now all about sequential access?
A lot of evidence is pointing to a major paradigm shift in data warehouse RDBMS, along the lines of:
Old way: Assume I/O is random; lower total execution time by improving selectivity and thus lowering the amount of I/O.
New way: Drive the amount of random I/O to near zero, and do as much sequential I/O as necessary to achieve this goal.
Examples include:
- Data warehouse appliances (see especially this discussion of DATAllegro’s architecture)
- Columnar systems (see Nathan Myer’s first comment in this discussion of the much-hyped Required Technologies prototype)
- Memory-centric systems, notably SAP’s BI Accelerator
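The old-way/new-way contrast can be sketched as a crude cost model. All figures below are hypothetical, and the model assumes one random read per fetched row with no caching, which is a deliberate oversimplification.

```python
def random_io_seconds(rows_fetched, ms_per_random_read):
    """Old way: pay one random read for each row the index selects."""
    return rows_fetched * ms_per_random_read / 1000


def sequential_scan_seconds(table_mb, mb_per_second):
    """New way: read the whole table sequentially, whatever the selectivity."""
    return table_mb / mb_per_second


TABLE_ROWS = 1_000_000_000  # hypothetical fact table
TABLE_MB = 100_000          # about 100 bytes per row
MS_PER_RANDOM_READ = 5      # hypothetical seek plus rotational delay
SEQ_MB_PER_S = 100          # hypothetical sequential throughput

scan = sequential_scan_seconds(TABLE_MB, SEQ_MB_PER_S)
selective_query = random_io_seconds(TABLE_ROWS // 1000, MS_PER_RANDOM_READ)

# Number of randomly fetched rows at which the full scan becomes cheaper.
break_even_rows = scan * 1000 / MS_PER_RANDOM_READ

print(f"full sequential scan: {scan:.0f} s")
print(f"index lookup of 0.1% of rows: {selective_query:.0f} s")
print(f"break-even at {break_even_rows:.0f} rows")
```

Under these assumptions even a query that selects only 0.1% of the rows loses to a full scan, and the index only pays off below roughly 0.02% selectivity. Real systems blur this with caching and clustering, but it shows why driving random I/O toward zero is attractive.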
| Categories: Data warehouse appliances, DATAllegro, Memory-centric data management, SAP AG, Theory and architecture, TransRelational | 4 Comments | 
QlikView – a leader in memory-centric BI
QlikTech — the vendor of QlikView — contacted me to tell their memory-centric BI story. A Swedish company with >$23 million in estimated license revenue last year and a 100%ish growth rate, they claim to be the leader in that space, pulling ahead of Applix. But for now, I’ll call them “a” leader, and say that their story sounds like a hybrid between those of Applix (TM1 product) and SAP (BI Accelerator).
| Categories: Business intelligence, Cognos, Memory-centric data management, QlikTech and QlikView, SAP AG | 1 Comment | 
ANTs’ memory-centric characteristics to the fore?
An eWeek article suggests that ANTs is repositioning with a strong emphasis on memory-centricity. ANTs’ website, frankly, doesn’t support this theory, giving a more balanced tech overview in line with how they pitched me in a briefing last November. Still, it’s an interesting possibility to watch.
The main focus of the article actually wasn’t ANTs, but rather SAP’s wildest dreams in expanding the scope of its BI Accelerator technology. But the new-to-me part was the positioning of ANTs.
| Categories: ANTs Software, Memory-centric data management, SAP AG | 2 Comments | 
Data warehouse appliances
If we define a “data warehouse appliance” as “a special-purpose computer system, with appliance administrability, that manages a data warehouse,” then there are two major contenders: Netezza and DATAllegro, both startups, both with a small number of disclosed customers. Past contenders would include Teradata and White Cross (which seems to have just merged into Kognitio), but neither would admit to being in that market today. (I suspect this is a mistake on Teradata’s part, but so be it.) IBM with DB2 on the z-Series wouldn’t be properly regarded as an appliance player either, although IBM is certainly conscious of appliance competition. And SAP’s BI Accelerator does not persist data at this time.
In principle, the Netezza and DATAllegro stories are similar — take an established open source RDBMS*,  build optimized hardware to run it, and optimize the software configuration as well.  Much of the optimization is focused on getting data on and off disk sequentially, minimizing any random accesses.  This is why I often refer to data warehouse appliances as being the best alternative to memory-centric data management.  Beyond that, the optimizations by the two vendors differ considerably.
*Netezza uses PostgreSQL; DATAllegro uses Ingres. 
Hmm. I don’t feel like writing more on this subject at this very moment, yet I want to post something urgently because there’s an IOU in my Computerworld column today for it. OK. More later.
