Until recently, I was extremely critical of ParAccel’s marketing. But there was an almost-clean sweep of the relevant ParAccel executives, and the specific worst practices I was calling out have for the most part been eliminated. So I was open to talking and working with ParAccel again, and that’s now happening. On my recent California trip, I chatted with three ParAccel folks for a few hours. Based on that and other conversations, here’s the current ParAccel story as I understand it.
I’ve already noted that PADB (ParAccel Analytic DataBase) 3.0 is coming soon, but pending its arrival, ParAccel’s technical story is primarily about query performance. More specifically:
- ParAccel asserts that PADB is much faster than other analytic DBMS — even close competitors such as Vertica — on especially complex queries. “60-way joins” were mentioned. So was the flattening of correlated subqueries.
- ParAccel also claims industry-leading performance on simpler queries, but not by the same (or perhaps even particularly large) margins.
- Mercifully, ParAccel no longer claims to have never, ever lost on performance in a customer evaluation. But it still says that is very close to being true.
- Major reasons ParAccel gives for PADB’s high performance include:
  - Like Vertica, Sybase IQ, and others, PADB uses a columnar architecture.
  - ParAccel thinks PADB’s newest query optimizer, fondly named Omne, is outstanding.
  - ParAccel’s PADB compiles its queries.
  - In general, ParAccel is just performance-obsessed.
- One could also mention:
  - ParAccel’s PADB runs smoothly in-memory, if that’s what you want.
  - ParAccel also offers a Flash option for PADB.
  - Like many other analytic DBMS vendors, ParAccel has created a custom networking protocol. (ParAccel has talked about that altogether too much in the past.)
  - Like Vertica, ParAccel’s PADB generally decompresses data as late as the particular compression scheme allows. (Well, actually, that’s not a point ParAccel mentions unless asked.)
  - ParAccel has long encouraged one to put part of one’s database on direct-attached storage as a kind of persistent cache, plus all of it on a storage-area network, because PADB can optimize its scans to run against both physical stores.
  - ParAccel’s PADB does encryption a block at a time, rather than a row at a time, so there’s very little overhead to using the encryption feature.
- ParAccel says that PADB has no indexes, materialized views, etc., notwithstanding that I heard something different from Barry Zane a few years ago. This is the basis for ParAccel’s claim that no tuning (or at least very little) is required, or indeed even possible …
- … and similarly, it is the reason ParAccel encourages prospects to do ad-hoc queries in their POCs (Proofs Of Concept), at least when Vertica is the competitor.
- However, ParAccel’s PADB has a rather complex initial setup. This has been the basis for widespread skepticism about ParAccel’s “no tuning” claim. ParAccel is working to automate that away, but admits to being only part-way through the process.
- Highlights of ParAccel’s data writing strategy include:
  - PADB writes data to disk transactionally.
  - PADB usually writes data to disk a block at a time, because data typically arrives fast enough (via either bulk load or streaming) for that to work out.
  - PADB is append-only …
  - … so PADB has a garbage-collection mechanism called Vacuum. Right now Vacuum has to be started manually, but it doesn’t block reads or writes; full background garbage collection is, of course, a roadmap feature.
  - As is natural for append-only systems, ParAccel’s PADB has MVCC (MultiVersion Concurrency Control) and snapshot isolation.
  - Name a compression method, and PADB probably has it: 13 in all by ParAccel’s count, including dictionary/token, run-length encoding, delta, LZ, and so on.
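ParAccel didn’t explain how Omne flattens correlated subqueries. As a generic illustration of the technique (not PADB’s code), the sketch below evaluates one hypothetical query both ways over an in-memory Python list: `SELECT order_id FROM orders o WHERE amount > (SELECT AVG(amount) FROM orders i WHERE i.customer = o.customer)`. The table, column names, and data are all made up for the example.

```python
from collections import defaultdict

# Hypothetical table: (order_id, customer, amount)
ORDERS = [
    (1, "a", 10.0), (2, "a", 30.0),
    (3, "b", 5.0), (4, "b", 7.0), (5, "b", 9.0),
    (6, "c", 100.0),
]


def correlated(orders):
    """Naive plan: re-run the inner AVG subquery once per outer row (O(n^2))."""
    result = []
    for order_id, customer, amount in orders:
        inner = [a for _, c, a in orders if c == customer]  # correlated on customer
        if amount > sum(inner) / len(inner):
            result.append(order_id)
    return result


def flattened(orders):
    """Decorrelated plan: one GROUP BY pass, then a hash lookup per outer row."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for _, customer, amount in orders:
        sums[customer] += amount
        counts[customer] += 1
    avg = {c: sums[c] / counts[c] for c in sums}
    return [oid for oid, c, amount in orders if amount > avg[c]]


print(correlated(ORDERS), flattened(ORDERS))  # [2, 5] [2, 5]
```

Both plans return the same rows; the flattened one replaces a full inner scan per outer row with a single aggregation plus a join, which is why decorrelation matters most on exactly the kind of complex queries ParAccel emphasizes.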
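ParAccel didn’t say what its compiled query code looks like. The sketch below only illustrates the general idea of compiling a query rather than interpreting it: a filter predicate is turned into Python source once and `eval`’d into a function, instead of walking the expression tree for every row. The tuple-based expression format and all function names are hypothetical.

```python
import operator

OPS = {">": operator.gt, "<": operator.lt, "==": operator.eq}


def interpret(expr, row):
    """Interpreted evaluation: walk the expression tree again for every row."""
    kind = expr[0]
    if kind == "and":
        return interpret(expr[1], row) and interpret(expr[2], row)
    if kind == "col":
        return row[expr[1]]
    if kind == "const":
        return expr[1]
    return OPS[kind](interpret(expr[1], row), interpret(expr[2], row))


def to_source(expr):
    """Translate the expression tree into equivalent Python source text."""
    kind = expr[0]
    if kind == "and":
        return f"({to_source(expr[1])} and {to_source(expr[2])})"
    if kind == "col":
        return f"row[{expr[1]!r}]"
    if kind == "const":
        return repr(expr[1])
    return f"({to_source(expr[1])} {kind} {to_source(expr[2])})"


def compile_predicate(expr):
    """Compile once; the returned function is then called per row.
    eval is acceptable here only because the source is generated, not user text."""
    return eval(f"lambda row: {to_source(expr)}", {})


# Hypothetical predicate: x > 5 AND y < 10
PRED = ("and", (">", ("col", "x"), ("const", 5)), ("<", ("col", "y"), ("const", 10)))
ROWS = [{"x": 6, "y": 3}, {"x": 4, "y": 3}, {"x": 7, "y": 12}]

fast = compile_predicate(PRED)
print([fast(r) for r in ROWS])  # [True, False, False]
```

The compiled and interpreted forms give identical answers; the difference is that the per-row dispatch on node types disappears, and real systems that generate native code push that advantage much further than a Python lambda can.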
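Late decompression is easiest to see with run-length encoding in a column store. The sketch below, a generic illustration rather than PADB’s implementation, keeps a hypothetical column as `(value, run_length)` pairs and answers a SUM and a filtered COUNT directly on the runs, so individual rows are never materialized.

```python
from itertools import groupby


def rle_encode(column):
    """Compress a column into (value, run_length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(column)]


def rle_decode(runs):
    """Full decompression; only needed when a query truly requires raw rows."""
    return [value for value, length in runs for _ in range(length)]


def sum_on_runs(runs):
    """SUM(column) computed per run: one multiply per run instead of one add per row."""
    return sum(value * length for value, length in runs)


def count_where_on_runs(runs, predicate):
    """COUNT(*) WHERE predicate(column): the predicate runs once per run, not per row."""
    return sum(length for value, length in runs if predicate(value))


COLUMN = [3, 3, 3, 3, 1, 1, 2, 2, 2]
RUNS = rle_encode(COLUMN)
print(RUNS)                                          # [(3, 4), (1, 2), (2, 3)]
print(sum_on_runs(RUNS))                             # 20
print(count_where_on_runs(RUNS, lambda v: v >= 2))   # 7
```

How late decompression can be pushed depends on the encoding, hence the “as late as the particular compression scheme allows” caveat: RLE runs and dictionary codes can often be filtered or aggregated in place, whereas LZ-style block compression generally has to be unpacked first.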
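To make the append-only / MVCC / Vacuum combination concrete, here is a single-threaded toy: updates append new versions instead of overwriting, a reader sees the newest version committed at or before its snapshot, and `vacuum` discards versions that no remaining snapshot can see. It assumes each write is its own immediately committed transaction and ignores concurrency and deletes; the class and method names are hypothetical and say nothing about how PADB’s Vacuum actually works.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    key: str
    value: str
    created: int  # id of the (immediately committed) transaction that wrote it


class AppendOnlyStore:
    """Toy MVCC store: versions are appended, never updated in place."""

    def __init__(self):
        self.versions = []  # kept in commit order
        self.last_committed = 0

    def snapshot(self):
        """In this toy, a snapshot is just the id of the last committed transaction."""
        return self.last_committed

    def write(self, key, value):
        """Append a new version; older versions stay put for existing snapshots."""
        self.last_committed += 1
        self.versions.append(Version(key, value, self.last_committed))
        return self.last_committed

    def read(self, key, snapshot):
        """Snapshot isolation: newest version of key committed at or before snapshot."""
        for v in reversed(self.versions):
            if v.key == key and v.created <= snapshot:
                return v.value
        return None

    def vacuum(self, oldest_snapshot):
        """Drop versions superseded for every snapshot >= oldest_snapshot."""
        newest_visible = {}
        for i, v in enumerate(self.versions):
            if v.created <= oldest_snapshot:
                newest_visible[v.key] = i  # later indexes overwrite earlier ones
        self.versions = [
            v for i, v in enumerate(self.versions)
            if v.created > oldest_snapshot or newest_visible[v.key] == i
        ]


store = AppendOnlyStore()
store.write("price", "10")
old = store.snapshot()
store.write("price", "12")
print(store.read("price", old), store.read("price", store.snapshot()))  # 10 12
store.vacuum(oldest_snapshot=store.snapshot())
print(len(store.versions))  # 1
```

Because writes only append, readers holding old snapshots never block writers; the cost is deferred to garbage collection, which is why an append-only design needs something like Vacuum in the first place.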
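Two of the compression methods ParAccel lists are simple enough to show in a few lines. The sketch below is a generic illustration of dictionary (token) encoding and delta encoding, not PADB’s on-disk formats; the sample data is invented.

```python
from itertools import accumulate


def dict_encode(column):
    """Replace each value with a small integer code into a sorted dictionary."""
    dictionary = sorted(set(column))
    code_of = {value: code for code, value in enumerate(dictionary)}
    return dictionary, [code_of[value] for value in column]


def dict_decode(dictionary, codes):
    """Map codes back to the original values."""
    return [dictionary[code] for code in codes]


def delta_encode(column):
    """Store the first value, then the differences between neighbors."""
    return column[:1] + [b - a for a, b in zip(column, column[1:])]


def delta_decode(deltas):
    """A running sum restores the original values."""
    return list(accumulate(deltas))


CITIES = ["SF", "NYC", "SF", "SF", "LA"]
dictionary, codes = dict_encode(CITIES)
print(dictionary, codes)  # ['LA', 'NYC', 'SF'] [2, 1, 2, 2, 0]

TIMESTAMPS = [1000, 1003, 1004, 1010]
print(delta_encode(TIMESTAMPS))  # [1000, 3, 1, 6]
```

Because the dictionary here is sorted, a range predicate such as `city < 'NYC'` can be evaluated on the integer codes (`code < 1`) without decoding; delta encoding pays off when neighboring values are close, as with sorted keys or timestamps.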
Tracking ParAccel’s customer success has long been difficult. The 2009 Gartner Magic Quadrant claim of ~20 ParAccel customers seems odd to everybody, including ParAccel. ParAccel’s own reporting of customer wins around then was quite confusing. And ParAccel’s customer count a year before that was extremely low. But ParAccel’s Michael Weir just rounded up some figures for me, namely:
- ParAccel has 30+ revenue-recognized customers, not counting OEMs, OEMs’ customers, or paid POCs.
- 2 ParAccel customers have > 100 TB of user data.
- 7 ParAccel customers have > 10 TB of user data.
- The largest ParAccel cluster is 28 nodes and growing.
Naturally, Michael went on to note that even relatively small databases can have high value.
One last note: ParAccel has approximately 78 employees.