1. It boggles my mind that some database technology companies still don’t view compression as a major issue. Compression directly affects storage and bandwidth usage alike — for all kinds of storage (potentially including RAM) and for all kinds of bandwidth (network, I/O, and potentially on-server).
Trading off less-than-maximal compression so as to minimize CPU impact can make sense. Having no compression at all, however, is an admission of defeat.
2. People tend to misjudge Hadoop’s development pace in either of two directions. An overly expansive view is to note that some people working on Hadoop are trying to make it be all things for all people, and to somehow imagine those goals will soon be achieved. An overly narrow view is to note an important missing feature in Hadoop, and think there’s a big business to be made out of offering it alone.
At this point, I’d guess that Cloudera and Hortonworks have 500ish employees combined, many of whom are engineers. That allows for a low double-digit number of 5+ person engineering teams, along with a number of smaller projects. The most urgently needed features are indeed being built. On the other hand, a complete monument to computing will not soon emerge.
3. Schooner’s acquisition by SanDisk has led to the discontinuation of Schooner’s SQL DBMS SchoonerSQL. Schooner’s flash-optimized key-value store Membrain continues. I don’t have details, but the Membrain web page suggests both data store and cache use cases.
4. There’s considerable personnel movement at Boston-area database technology companies right now. Please ping me directly if you care.
Comments on Gartner’s 2012 Magic Quadrant for Data Warehouse Database Management Systems — evaluations
To my taste, the most glaring mis-rankings in the 2012/2013 Gartner Magic Quadrant for Data Warehouse Database Management are that it is too positive on Kognitio and too negative on Infobright. Secondarily, it is too negative on HP Vertica, and too positive on ParAccel and Actian/VectorWise. So let’s consider those vendors first.
Gartner seems confused about Kognitio’s products and history alike.
- Gartner calls Kognitio an “in-memory” DBMS, which is not accurate.
- Gartner doesn’t remark on Kognitio’s worst-in-class* compression.
- Gartner gives Kognitio oddly high marks for a late, me-too Hadoop integration strategy.
- Gartner writes as if Kognitio’s next attempt at the US market will be the first one, which is not the case.
- Gartner says that Kognitio pioneered data warehouse SaaS (Software as a Service), which actually has existed since the pre-relational 1970s.
Gartner is correct, however, to note that Kognitio doesn’t sell much stuff overall.
In the cases of HP Vertica, Infobright, ParAccel, and Actian/VectorWise, the 2012 Gartner Magic Quadrant for Data Warehouse Database Management’s facts are fairly accurate, but I dispute Gartner’s evaluation. When it comes to Vertica: Read more
The 2012 Gartner Magic Quadrant for Data Warehouse Database Management Systems is out. I’ll split my comments into two posts — this one on concepts, and a companion on specific vendor evaluations.
- Maintaining working links to Gartner Magic Quadrants is an adventure. But as of early February, 2013, this link seems live.
- I also commented on the 2011, 2010, 2009, 2008, 2007, and 2006 Gartner Magic Quadrants for Data Warehouse DBMS.
Let’s start by again noting that I regard Gartner Magic Quadrants as a bad use of good research. On the facts:
- Gartner collects a lot of input from traditional enterprises. I envy that resource.
- Gartner also does a good job of rounding up vendor claims about user base sizes and the like. If nothing else, you should skim the MQ report for that reason.
- Gartner observations about product feature sets are usually correct, although not so consistently that they should be relied on.
When it comes to evaluations, however, the Gartner Data Warehouse DBMS Magic Quadrant doesn’t do as well. My concerns (which overlap) start:
- The Gartner MQ conflates many different use cases into one ranking (inevitable in this kind of work, but still regrettable).
- A number of the MQ vendor evaluations seem hard to defend. So do some of Gartner’s specific comments.
- Some of Gartner’s criteria seemingly amount to “parrots back our opinions to us”.
- As do I, Gartner thinks a vendor’s business and financial strength are important. But Gartner overdoes the matter, drilling down into picky issues it can’t hope to judge, such as assessing a vendor’s “ability to generate and develop leads.” *
- The 2012 Gartner Data Warehouse DBMS Magic Quadrant is closer to being a 1-dimensional ranking than 2-dimensional, in that entries are clustered along the line x=y. This suggests strong correlation among the results on various specific evaluation criteria.
|Categories: Data integration and middleware, Data warehousing, Database compression, Emulation, transparency, portability, Hadoop, Market share and customer counts, Oracle, Text||5 Comments|
Alternate title: TokuDB updates
Tokutek turns a performance argument into a functionality one. In particular, Tokutek claims that TokuDB does a much better job than alternatives of making it practical for you to update indexes at OLTP speeds. Hence, it claims to do a much better job than alternatives of making it practical for you to write and execute queries that only make sense when indexes (or other analytic performance boosts) are in place.
That’s all been true since I first wrote about Tokutek and TokuDB in 2009. However, TokuDB’s technical details have changed. In particular, Tokutek has deemphasized the ideas that:
- Vaguely justified the “fractal” metaphor, namely …
- … the stuff in that post about having one block each sized for each power of 2, …
- … which seem to be a form of what is more ordinarily called “cache-oblivious” technology.
Rather, Tokutek’s new focus for getting the same benefits is to provide a separate buffer for each node of a b-tree. In essence, Tokutek is taking the usual “big blocks are better” story and extending it to indexes. TokuDB also uses block-level compression. Notes on that include: Read more
|Categories: Akiban, Database compression, Market share and customer counts, NewSQL, Tokutek and TokuDB||7 Comments|
GenieDB is one of the newer and smaller NewSQL companies. GenieDB’s story is focused on wide-area replication and uptime, coupled to claims about ease and the associated low TCO (Total Cost of Ownership).
GenieDB is in my same family of clients as Cirro.
The GenieDB product is more interesting if we conflate the existing GenieDB Version 1 and a soon-forthcoming (mid-year or so) Version 2. On that basis:
- GenieDB has three tiers.
- GenieDB’s top tier is the usual MySQL front-end.
- GenieDB’s bottom tier is either Berkeley DB or a conventional MySQL storage engine.
- GenieDB’s bottom tier stores your entire database at every node.
- If you replicate locally, GenieDB’s middle tier operates a distributed cache.
- If you replicate wide-area, GenieDB’s middle tier allows active-active/multi-master replication.
The heart of the GenieDB story is probably wide-area replication. Specifics there include: Read more
|Categories: Cache, Cloud computing, Clustering, GenieDB, Market share and customer counts, MySQL, NewSQL||4 Comments|
I plan to write about several NewSQL vendors soon, but first here’s an overview post. Like “NoSQL”, the term “NewSQL” has an identifiable, recent coiner — Matt Aslett in 2011 — yet a somewhat fluid meaning. Wikipedia suggests that NewSQL comprises three things:
- OLTP- (OnLine Transaction Processing)/short-request-oriented SQL DBMS that are newer than MySQL.
- Innovative MySQL engines.
- Transparent sharding systems that can be used with, for example, MySQL.
I think that’s a pretty good working definition, and will likely remain one unless or until:
- SQL-oriented and NoSQL-oriented systems blur indistinguishably.
- MySQL (or PostgreSQL) laps the field with innovative features.
To date, NewSQL adoption has been limited.
- NewSQL vendors I’ve written about in the past include Akiban, Tokutek, CodeFutures (dbShards), Clustrix, Schooner (Membrain), VoltDB, ScaleBase, and ScaleDB, with GenieDB and NuoDB coming soon.
- But I’m dubious whether, even taken together, all those vendors have as many customers or production references as any of 10gen, Couchbase, DataStax, or Cloudant.*
That said, the problem may lie more on the supply side than in demand. Developing a competitive SQL DBMS turns out to be harder than developing something in the NoSQL state of the art.
In connection with Amazon’s Redshift announcement, ParAccel reached out, and so I talked with them for the first time in a long while. At the highest level:
- ParAccel now has 60+ customers, up from 30+ two years ago and 40ish soon thereafter.
- ParAccel is now focusing its development and marketing on analytic platform capabilities more than raw database performance.
- ParAccel is focusing on working alongside other analytic data stores — relational or Hadoop — rather than supplanting them.
There wasn’t time for a lot of technical detail, but I gather that the bit about working alongside other data stores:
- Is relatively new.
- Works via SELECT statements that reach out to the other data stores.
- Is called “on-demand integration”.
- Is built in ParAccel’s extensibility/analytic platform framework.
- Uses HCatalog when reaching into Hadoop.
Also, it seems that ParAccel:
- Is in the early stages of writing its own analytic functions.
- Bundles Fuzzy Logix and actually has some users for that.
|Categories: Amazon and its cloud, Cloud computing, Data warehousing, Hadoop, Market share and customer counts, ParAccel, Predictive modeling and advanced analytics, Specific users||5 Comments|
I’ve been known to gripe that covering big companies such as Microsoft is hard. Still, Doug Leland of Microsoft’s SQL Server team checked in for phone calls in August and again today, and I think I got enough to be worth writing about, albeit at a survey level only,
Subjects I’ll mention include:
- Parallel Data Warehouse
- Columnar data management
- In-memory data management (Hekaton)
One topic I can’t yet comment about is MOLAP/ROLAP, which is a pity; if anybody can refute my claim that ROLAP trumps MOLAP, it’s either Microsoft or Oracle.
Microsoft’s slides mentioned Yahoo refining a 6 petabyte Hadoop cluster into a 24 terabyte SQL Server “cube”, which was surprising in light of Yahoo’s history as an Oracle reference.
|Categories: Columnar database management, Data warehouse appliances, Data warehousing, Database compression, Hadoop, Hortonworks, In-memory DBMS, MapReduce, Market share and customer counts, Microsoft and SQL*Server, Oracle, Yahoo||9 Comments|
My clients at Couchbase checked in.
- After multiple delays, Couchbase 2.0 is well into beta, with general availability being delayed by the holiday season as much as anything else.
- Couchbase (the company) now has >350 subscription customers, almost all for Couchbase (the product) — which is to say for what was known as Membase, which is basically a persistent version of Memcached.
- There also are many users of open source Couchbase, most famously LinkedIn.
- Orbitz is a much-mentioned flagship paying Couchbase customer.
- Couchbase customers mainly seem to be replacing a caching layer, Memcached or otherwise.
- Couchbase headcount is just under 100.
The big changes in Couchbase 2.0 versus the previous (1.8.x) version are:
- JSON storage, including secondary indexes.
- Multi-data-center replication.
- A back-end change from SQLite to a heavily forked version of CouchDB, called Couchstore.
Couchbase 2.0 is upwards-compatible with prior versions of Couchbase (and hence with Memcached), but not with CouchDB.
Technology notes on Couchbase 2.0 include: Read more
|Categories: Basho and Riak, Cache, Cassandra, Clustering, Couchbase, MapReduce, Market share and customer counts, MongoDB and 10gen, NoSQL, Open source, Structured documents||4 Comments|
With Strata/Hadoop World being next week, there is much Hadoop discussion. One theme of the season is BI over Hadoop. I have at least 5 clients claiming they’re uniquely positioned to support that (most of whom partner with a 6th client, Tableau); the first 2 whose offerings I’ve actually written about are Teradata Aster and Hadapt. More generally, I’m hearing “Using Hadoop is hard; we’re here to make it easier for you.”
If enterprises aren’t yet happily running business intelligence against Hadoop, what are they doing with it instead? I took the opportunity to ask Cloudera, whose answers didn’t contradict anything I’m hearing elsewhere. As Cloudera tells it (approximately — this part of the conversation* was rushed): Read more
|Categories: Business intelligence, Cloudera, EAI, EII, ETL, ELT, ETLT, Hadoop, HBase, Health care, Investment research and trading, MapR, Market share and customer counts, Telecommunications, Web analytics||4 Comments|