Upcoming webinar on investigative analytics
I recently coined the phrase investigative analytics to conflate
- Statistics, data mining, machine learning, and/or predictive analytics.
- The more research-oriented aspects of business intelligence tools:
- Ad-hoc query.
- Drilldown.
- Most things done by BI-using “business analysts”
- Most things within BI called “data exploration.”
- Analogous technologies as applied to non-tabular data types such as text or graph.
This will be be basis for my part of a webcast on March 10 at 11 am Pacific/2 pm Eastern time. The other main part of the webcast will be a demo by the webcast’s joint sponsors Aster Data and Tableau Software.
Some of Aster’s verbiage in describing and titling the webinar is so hyperbolic that I do not want to give the impression of endorsing it. But I am very hopeful that the webinar itself will be interesting and informative, and will point people at least somewhat in the direction of the benefits Aster is claiming.
Categories: Analytic technologies, Aster Data, Business intelligence, Data warehousing, Presentations, Tableau Software | 3 Comments |
Comments on the 2011 Forrester Wave for Enterprise Data Warehouse Platforms
The Forrester Wave: Enterprise Data Warehouse Platforms, Q1 2011 is now out,* hot on the heels of the Gartner Magic Quadrant. Unfortunately, this particular Forrester Wave is riddled with inaccuracy. Read more
Categories: Analytic technologies, Columnar database management, Data warehousing, EMC, Exadata, Greenplum, Netezza, Oracle, Pricing, SAP AG, Sybase, Teradata, Vertica Systems | 8 Comments |
Clarification on dbShards’ shard replication
After I posted recently about dbShards, a Very Smart Commenter emailed me with the challenge “but each individual shard is still replicated via two-phase commit, and everybody knows two-phase commit is fundamentally slow.” I replied that no, it wasn’t exactly two-phase commit, but fumbled the explanation of why — so I decided to escalate straight to dbShards honcho Cory Isaacson. Read more
Membase and CouchOne merged to form Couchbase
Membase, the company whose product is Membase and whose former company name is Northscale, has merged with CouchOne, the company whose product is CouchDB and whose former name is Couch.io. The result (product and company) will be called Couchbase. CouchDB inventor Damien Katz will join the Membase (now Couchbase) management team as CTO. Couchbase can reasonably be regarded as a document-oriented NoSQL DBMS, a product category I not coincidentally posted about yesterday.
In essence, Couchbase will be CouchDB with scale-out. Alternatively, Couchbase will be Membase with a richer programming interface. The Couchbase sweet spot is likely to be: Read more
Categories: Application areas, Cache, Couchbase, CouchDB, Market share and customer counts, memcached, NoSQL, Open source, Parallelization, Solid-state memory | 2 Comments |
Notes on document-oriented NoSQL
When people talk about document-oriented NoSQL or some similar term, they usually mean something like:
Database management that uses a JSON model and gives you reasonably robust access to individual field values inside a JSON (JavaScript Object Notation) object.
Or, if they really mean,
The essence of whatever it is that CouchDB and MongoDB have in common.
well, that’s pretty much the same thing as what I said in the first place. 🙂
Of the various questions that might arise, three of the more definitional ones are:
- Why JSON rather than XML?
- What’s with this fluidity between the terms “document” and “object”?
- Are you serious about the lack of joins?
Let me take a crack at each. Read more
Categories: CouchDB, MapReduce, MarkLogic, MongoDB, NoSQL, Object, Structured documents | 16 Comments |
Columnar compression vs. column storage
I’m getting the increasing impression that certain industry observers, such as Gartner, are really confused about columnar technology. (I further suspect that certain vendors are encouraging this confusion, as vendors commonly do.) So here are some basic points.
A simple way to think about the difference between columnar storage and columnar (or any other kind of) compression is this:
- Columnar storage is a reference to how data is grouped together on disk (or in solid-state memory).
- (Columnar) compression is a reference to whether the actual data is on disk, or whether you save space by storing some smaller substitute for the actual data.
Specifically, if data in a relational table is grouped together according to what row it’s in, then the database manager is called “row-based” or a “row store.” If it’s grouped together according to what column it’s in, then the database management system is called “columnar” or a “column store.” Increasingly, row-based and columnar storage are being hybridized.
There are two main kinds of compression — compression of bit strings and more intelligent compression of actual data values. Compression of actual data values can reasonably be called “columnar,” in that different columns of data can be compressed in different ways, often depending only on the data in that column.* Read more
Categories: Columnar database management, Data warehousing, Database compression, Exadata, Vertica Systems | 21 Comments |
The Continuent Tungsten MySQL replication story
To the consternation of its then-CEO, I wrote very little about my then-client Continuent. However, when I knew Schooner’s recent announcement was coming, I reached out to other MySQL scale-out vendors too. I’ve already posted accordingly about CodeFutures (the dbShards guys) and ScaleBase. Now it’s late-responding Continuent’s turn.
Actually, what I’m mainly going to do is quote a very long email that Continuent’s current CEO/former CTO Robert Hodges sent me, and which I lightly edited. Read more
Categories: Continuent, Data integration and middleware, dbShards and CodeFutures, MySQL, Open source, Parallelization, Schooner Information Technology | 9 Comments |
Comments on the Gartner 2010/2011 Data Warehouse Database Management Systems Magic Quadrant
Edit: Comments on the February, 2012 Gartner Magic Quadrant for Data Warehouse Database Management Systems — and on the companies reviewed in it — are now up.
The Gartner 2010 Data Warehouse Database Management Systems Magic Quadrant is out. I shall now comment, just as I did to varying degrees on the 2009, 2008, 2007, and 2006 Gartner Data Warehouse Database Management System Magic Quadrants.
Note: Links to Gartner Magic Quadrants tend to be unstable. Please alert me if any problems arise; I’ll edit accordingly.
In my comments on the 2008 Gartner Data Warehouse Database Management Systems Magic Quadrant, I observed that Gartner’s “completeness of vision” scores were generally pretty reasonable, but their “ability to execute” rankings were somewhat bizarre; the same remains true this year. For example, Gartner ranks Ingres higher by that metric than Vertica, Aster Data, ParAccel, or Infobright. Yet each of those companies is growing nicely and delivering products that meet serious cutting-edge analytic DBMS needs, neither of which has been true of Ingres since about 1987. Read more
ParAccel PADB technical notes
I posted last October about PADB (ParAccel Analytic DataBase), but held back on various topics since PADB 3.0 was still under NDA. By the time PADB 3.0 was released, I was on blogging hiatus. Let’s do a bit of ParAccel catch-up now.
One big part of PADB 3.0 was an analytics extensibility framework. If we match PADB against my recent analytic computing system checklist, Read more
Categories: Analytic technologies, Data warehousing, EMC, MapReduce, ParAccel, Parallelization, Storage | 2 Comments |
Exadata notes
It’s been a while since I penetrated Oracle’s tight message control and actually talked with them about Exadata. But Doug Henschen wrote a good article about Exadata based on an Andy Mendelsohn webcast. I agree with almost all of it. At first I was a little surprised that Exadata’s emphasis shift from data warehousing to OLTP/generic consolidation hasn’t gone more quickly, but on the other hand:
- On the data warehouse side Exadata can alleviate screaming pain points.
- In OLTP consolidation, Exadata mainly can save money. (Yes, I just said a product from Oracle can save customers money, and I meant it. You may stop laughing at any time.)
Doug did overstate when he said that columnar architectures give 100X or more compression. That doesn’t happen. Yes, columnar compression can be >10X in a variety of use cases, while pre-Exadata Oracle index bloat can approach 10X at times; but even if you’re counting that way I doubt there are many instances in which it actually multiplies out to >100.
In other Exadata news, the long-standing observation that Oracle doesn’t like to do on-site Exadata POCs still holds true. A couple of existing Oracle users — one rather well-known — recently told me that Oracle won’t let them text Exadata except on Oracle premises. In one case, this is a deal-breaker keeping Exadata from being considered for a purchase, and Oracle still won’t budge.
Finally, I’m pretty sure that this “new” Softbank Teradata replacement Oracle has been touting since September as competitive evidence — which Doug’s article also references — isn’t quite what it sounds like. I believe Teradata’s version of the story, which somewhat edited goes like this: Read more