Updating our vendor client disclosures
From time to time, I disclose our vendor client lists. Another iteration is below. To be clear:
- This is a list of Monash Advantage members.
- All our vendor clients are Monash Advantage members, unless …
- … we work with them primarily in their capacity as technology users. (A large fraction of our user clients happen to be SaaS vendors.)
- We do not usually disclose our user clients.
- We do not usually disclose our venture capital clients, nor those who invest in publicly-traded securities.
- Included in the list below are two expired Monash Advantage members who haven’t said they will renew, as mentioned in my recent post on analyst bias. (You can probably imagine a couple of reasons for that obfuscation.)
With that said, our vendor client disclosures at this time are:
- Aster Data
- Cloudera
- CodeFutures/dbShards
- Couchbase
- EMC/Greenplum
- Endeca
- IBM/Netezza
- Infobright
- Intel
- MarkLogic
- ParAccel
- QlikTech
- salesforce.com/database.com
- SAND Technology
- SAP/Sybase
- Schooner Information Technology
- Skytide
- Splunk
- Teradata
- Vertica
Terminology: Transparent sharding
When databases are too big to manage via a single server, responsibility for them is spread among multiple servers. There are numerous names for this strategy, or versions of it — all of them at least somewhat problematic. The most common terms include:
- (Shared-nothing) MPP (Massively Parallel Processing), often used to describe analytic DBMS. On the whole, these terms have worked pretty well, but they have issues even so. First, “MPP” means different things to different marketers. Second, most ostensibly “shared-nothing” systems aren’t really “shared-nothing.” They generally support at least storage arrays, if not storage-area networks (SANs); indeed, in a couple of cases (most notably EMC Greenplum), SAN support is prominent in their marketing message.
- (Horizontal) partitioning and/or data distribution. These have significant problems. “Partitioning” and “distribution” are easily confused with each other, not least because the term “partitioning” is used in different ways by different DBMS product vendors.
- Sharding, commonly used to describe scaled-out MySQL in Internet Request Processing use cases. This one has the advantage of being concise, but is beginning to mean two different things, in that it is used both when the data is REALLY in separate databases on different machines (i.e., the application has to explicitly reference the shard it wants to talk to) and also when the database is transparently distributed (e.g. via dbShards).
- Coherent caching and/or distributed shared memory, describing cases when data is in RAM. Besides being RAM-specific, these terms can be vague as to whether the same data is recopied onto different systems, or whether they are focused on letting (relatively) large in-memory data stores be spread across a cluster.
I plan to start using the term transparent sharding to denote a data management strategy in which data is assigned to multiple servers (or CPUs, cores, etc.), yet looks to programmers and applications as if it were managed by just one. Thus,
- dbShards and ScaleBase feature transparent sharding (this is the case which inspired me to introduce the term).
- Anything which has ever reasonably been called a “shared-nothing” MPP DBMS features transparent sharding.
- Memcached features transparent sharding. So, I imagine, do other caching systems I am less familiar with.
- Shared-disk DBMS do not feature transparent sharding, even if their query work can be scaled out across multiple servers. (But Oracle Exadata does, because of its server tier.)
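To make the definition concrete, here is a minimal sketch of the idea in Python. It is not how dbShards, ScaleBase, Memcached, or any MPP DBMS actually routes data; the class `ShardedStore`, its method names, and the hash-modulo placement rule are all assumptions made up for illustration. The only point is that the caller sees one store, while the data is assigned to several.

```python
import hashlib


class ShardedStore:
    """Hypothetical key-value store that hides its shards from callers."""

    def __init__(self, shard_count):
        # Each dict stands in for a separate server (or CPU, core, etc.).
        self._shards = [{} for _ in range(shard_count)]

    def _shard_for(self, key):
        # Assumed placement rule: a stable hash of the key, modulo the
        # shard count, so the same key always lands on the same shard.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self._shards)
        return self._shards[index]

    def put(self, key, value):
        self._shard_for(key)[key] = value

    def get(self, key, default=None):
        return self._shard_for(key).get(key, default)

    def shard_sizes(self):
        # Exposed only so the distribution can be inspected.
        return [len(shard) for shard in self._shards]


# The application never names a shard; it talks to one logical store.
store = ShardedStore(shard_count=4)
for i in range(100):
    store.put(f"user:{i}", {"id": i})
```

By way of contrast, non-transparent sharding would have the application compute the shard index itself and open a connection to that specific database.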
| Categories: Parallelization | 19 Comments |
Terminology: Internet Request Processing (IRP)
As I observed previously, we need a term that means “like OLTP but not necessarily transactional”, to help describe a category of use cases that can reasonably be addressed by NoSQL or scale-out SQL systems alike.* So here’s a candidate phrase: Internet Request Processing (IRP). If we use that, I’ll call Schooner, Cassandra, Couchbase, et al. IRP DBMS, while other people will probably call them IRP databases.
*Consider, for example, the overlapping use cases for Schooner, dbShards, ScaleBase, Couchbase, and DataStax/Cassandra.
In my proposed terminology, an internet request processing (IRP) use case is one in which: Read more
| Categories: NoSQL, OLTP | 8 Comments |
Terminology: Analytic platforms
A few weeks ago, I described the elements of an “analytic computing system” or “analytic platform,” while reserving judgment as to which of the two terms would or should win out. I am now capitulating to the term analytic platform, under the influence of, among others, Sharmila Mulligan (and Aster Data in general), Vertica and a variety of fellow analysts (Merv Adrian, Neil Raden, Seth Grimes, Jim Kobielus, and Colin White). While Google evidence would suggest it’s way too early to make this call, I think it’s time to say “analytic platform” will win.
What’s more, I now think the phrase “analytic platform” should win. While I think the term “platform” is overused to the point of silliness, at least the phrase “analytic platform” is short. Thus, it could be modified in various descriptive or not-so-descriptive ways: “Advanced analytic platform,” “graph analytics platform,” “customer analytics platform,” “social media analytics platform,” “CRM analytics platform,” “text analytics platform,” or whatever. By way of contrast, try doing that with “analytic computing system,” and see if you can keep a straight face.
To take this in the direction of an actual definition, I’ll say that the three essential elements of an analytic platform are: Read more
| Categories: Analytic technologies, Data warehousing | 1 Comment |
Some quick notes on HP-Vertica
HP is acquiring Vertica.
| Categories: Complex event processing (CEP), In-memory DBMS, Investment research and trading, Memory-centric data management, StreamBase, VoltDB and H-Store | 12 Comments |
Now we know why Vertica has been so weirdly evasive
Communicating with Vertica has been tricky recently. But HP has now announced that it is buying Vertica, which pretty much forces me to comment about Vertica.
So I’ll indulge in a little bit of explanation as to what I know about Vertica, whether for publication or under NDA. My analysis of the HP/Vertica combination, and expectations for same, will go into another post. Read more
| Categories: Analytic technologies, Data warehousing, HP and Neoview, Market share and customer counts, Michael Stonebraker, Vertica Systems | 10 Comments |
Upcoming webinar on investigative analytics
I recently coined the phrase investigative analytics to conflate
- Statistics, data mining, machine learning, and/or predictive analytics.
- The more research-oriented aspects of business intelligence tools:
  - Ad-hoc query.
  - Drilldown.
  - Most things done by BI-using “business analysts.”
  - Most things within BI called “data exploration.”
- Analogous technologies as applied to non-tabular data types such as text or graph.
This will be the basis for my part of a webcast on March 10 at 11 am Pacific/2 pm Eastern time. The other main part of the webcast will be a demo by the webcast’s joint sponsors Aster Data and Tableau Software.
Some of Aster’s verbiage in describing and titling the webinar is so hyperbolic that I do not want to give the impression of endorsing it. But I am very hopeful that the webinar itself will be interesting and informative, and will point people at least somewhat in the direction of the benefits Aster is claiming.
| Categories: Analytic technologies, Aster Data, Business intelligence, Data warehousing, Presentations, Tableau Software | 3 Comments |
Comments on the 2011 Forrester Wave for Enterprise Data Warehouse Platforms
The Forrester Wave: Enterprise Data Warehouse Platforms, Q1 2011 is now out,* hot on the heels of the Gartner Magic Quadrant. Unfortunately, this particular Forrester Wave is riddled with inaccuracy. Read more
| Categories: Analytic technologies, Columnar database management, Data warehousing, EMC, Exadata, Greenplum, Netezza, Oracle, Pricing, SAP AG, Sybase, Teradata, Vertica Systems | 3 Comments |
Clarification on dbShards’ shard replication
After I posted recently about dbShards, a Very Smart Commenter emailed me with the challenge “but each individual shard is still replicated via two-phase commit, and everybody knows two-phase commit is fundamentally slow.” I replied that no, it wasn’t exactly two-phase commit, but fumbled the explanation of why — so I decided to escalate straight to dbShards honcho Cory Isaacson. Read more
| Categories: Parallelization, dbShards and CodeFutures | 13 Comments |
Membase and CouchOne merged to form Couchbase
Membase, the company whose product is Membase and whose former company name is Northscale, has merged with CouchOne, the company whose product is CouchDB and whose former name is Couch.io. The result (product and company) will be called Couchbase. CouchDB inventor Damien Katz will join the Membase (now Couchbase) management team as CTO. Couchbase can reasonably be regarded as a document-oriented NoSQL DBMS, a product category I not coincidentally posted about yesterday.
In essence, Couchbase will be CouchDB with scale-out. Alternatively, Couchbase will be Membase with a richer programming interface. The Couchbase sweet spot is likely to be: Read more
