February 28, 2011

Updating our vendor client disclosures

Edit: This disclosure has been superseded by a March, 2012 version.

From time to time, I disclose our vendor client lists. Another iteration is below. To be clear:

This is a list of Monash Advantage members.
All our vendor clients are Monash Advantage members, unless …
… we work with them primarily in their capacity as technology users. (A large fraction of our user clients happen to be SaaS vendors.)
We do not usually disclose our user clients.
We do not usually disclose our venture capital clients, nor those who invest in publicly-traded securities.
Included in the list below are two expired Monash Advantage members who haven’t said they will renew, as mentioned in my recent post on analyst bias. (You can probably imagine a couple of reasons for that obfuscation.)

With that said, our vendor client disclosures at this time are:

Aster Data
Cloudera
CodeFutures/dbShards
Couchbase
EMC/Greenplum
Endeca
IBM/Netezza
Infobright
Intel
MarkLogic
ParAccel
QlikTech
salesforce.com/database.com
SAND Technology
SAP/Sybase
Schooner Information Technology
Skytide
Splunk
Teradata
Vertica

Categories: About this blog, Aster Data, Cloudera, Couchbase, dbShards and CodeFutures, EMC, Greenplum, IBM and DB2, Infobright, Intel, MarkLogic, Netezza, ParAccel, QlikTech and QlikView, SAND Technology, SAP AG, Schooner Information Technology, Splunk, Sybase, Tableau Software, Teradata, Vertica Systems

1 Comment

February 24, 2011

Terminology: Transparent sharding

When databases are too big to manage via a single server, responsibility for them is spread among multiple servers. There are numerous names for this strategy, or versions of it — all of them at least somewhat problematic. The most common terms include:

(Shared-nothing) MPP (Massively Parallel Processing), often used to describe analytic DBMS. On the whole, these terms have worked pretty well, but they have issues even so. First, “MPP” means different things to different marketers. Second, most ostensibly “shared-nothing” systems aren’t really “shared-nothing.” They generally support at least storage arrays, if not storage-area networks (SANs); indeed, in a couple of cases (most notably EMC Greenplum), SAN support is prominent in their marketing message.
(Horizontal) partitioning and/or data distribution. These have significant problems. “Partitioning” and “distribution” are easily confused with each other, not least because the term “partitioning” is used in different ways by different DBMS product vendors.
Sharding, commonly used to describe scaled-out MySQL in Internet Request Processing use cases. This one has the advantage of being concise, but is beginning to mean two different things, in that it is used both when the data is REALLY in separate databases on different machines (i.e., the application has to explicitly reference the shard it wants to talk to) and also when the database is transparently distributed (e.g. via dbShards).
Coherent caching and/or distributed shared memory, describing cases when data is in RAM. Besides being RAM-specific, these terms can be vague as to whether the same data is recopied onto different systems, or whether they are focused on letting (relatively) large in-memory data stores be spread across a cluster.

I plan to start using the term transparent sharding to denote a data management strategy in which data is assigned to multiple servers (or CPUs, cores, etc.), yet looks to programmers and applications as if it were managed by just one. Thus,

dbShards and ScaleBase feature transparent sharding (this is the case which inspired me to introduce the term).
Anything which has ever reasonably been called a “shared-nothing” MPP DBMS features transparent sharding.
Memcached features transparent sharding. So, I imagine, do other caching systems I am less familiar with.
Shared-disk DBMS do not feature transparent sharding, even if their query work can be scaled out across multiple servers. (But Oracle Exadata does, because of its server tier.)
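
To make the definition concrete, here is a minimal sketch of transparent sharding in Python. Everything in it is hypothetical: the class name, the method names, and the hash-modulo routing are assumptions chosen for illustration, not how dbShards, ScaleBase, memcached, or any MPP DBMS actually assigns data. The only point is that callers use a single put/get interface and never name a shard.

```python
import hashlib


class TransparentlyShardedStore:
    """Hypothetical key-value store that hides shard routing from callers."""

    def __init__(self, shard_count):
        # Each dict stands in for a separate server (an assumption for the sketch).
        self._shards = [{} for _ in range(shard_count)]

    def _shard_for(self, key):
        # Use a stable hash so a given key always lands on the same shard.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self._shards)
        return self._shards[index]

    def put(self, key, value):
        # The caller never says which shard to write to.
        self._shard_for(key)[key] = value

    def get(self, key, default=None):
        # The caller never says which shard to read from.
        return self._shard_for(key).get(key, default)

    def shard_sizes(self):
        # Exposed only so the sketch can show that data really is spread out.
        return [len(shard) for shard in self._shards]


store = TransparentlyShardedStore(shard_count=4)
for user_id in range(1000):
    store.put(f"user:{user_id}", {"id": user_id})

print(store.get("user:42"))
print(store.shard_sizes())
```

By contrast, in the non-transparent kind of sharding described earlier, the application itself would have to compute or look up the shard and connect to it explicitly.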

Categories: Parallelization, Transparent sharding

28 Comments

February 24, 2011

Terminology: Internet Request Processing (IRP)

As I observed previously, we need a term that means “like OLTP but not necessarily transactional”, to help describe a category of use cases that can reasonably be addressed by NoSQL or scale-out SQL systems alike.* So here’s a candidate phrase: Internet Request Processing (IRP). If we use that, I’ll call Schooner, Cassandra, Couchbase, et al. IRP DBMS, while other people will probably call them IRP databases.

*Consider, for example, the overlapping use cases for Schooner, dbShards, ScaleBase, Couchbase, and DataStax/Cassandra.

In my proposed terminology, an internet request processing (IRP) use case is one in which: Read more

Categories: NoSQL, OLTP

8 Comments

February 24, 2011

Terminology: Analytic platforms

A few weeks ago, I described the elements of an “analytic computing system” or “analytic platform,” while reserving judgment as to which of the two terms would or should win out. I am now capitulating to the term analytic platform, under the influence of, among others, Sharmila Mulligan (and Aster Data in general), Vertica and a variety of fellow analysts (Merv Adrian, Neil Raden, Seth Grimes, Jim Kobielus, and Colin White). While Google evidence would suggest it’s way too early to make this call, I think it’s time to say “analytic platform” will win.

What’s more, I now think the phrase “analytic platform” should win. While I think the term “platform” is overused to the point of silliness, at least the phrase “analytic platform” is short. Thus, it could be modified in various descriptive or not-so-descriptive ways: “Advanced analytic platform,” “graph analytics platform,” “customer analytics platform,” “social media analytics platform,” “CRM analytics platform,” “text analytics platform,” or whatever. By way of contrast, try doing that with “analytic computing system,” and see if you can keep a straight face.

To take this in the direction of an actual definition, I’ll say that the three essential elements of an analytic platform are: Read more

Categories: Analytic technologies, Data warehousing

2 Comments

February 14, 2011

Some quick notes on HP-Vertica

HP is acquiring Vertica. Read more

Categories: In-memory DBMS, Investment research and trading, Memory-centric data management, StreamBase, Streaming and complex event processing (CEP), VoltDB and H-Store

13 Comments

February 14, 2011

Now we know why Vertica has been so weirdly evasive

Communicating with Vertica has been tricky recently. But HP has now announced that it is buying Vertica, which pretty much forces me to comment about Vertica. 🙂 So I’ll indulge in a little bit of explanation as to what I know about Vertica, whether for publication or under NDA. My analysis of the HP/Vertica combination, and expectations for same, will go into another post. Read more

Categories: Analytic technologies, Data warehousing, HP and Neoview, Market share and customer counts, Michael Stonebraker, Vertica Systems

10 Comments

February 12, 2011

Upcoming webinar on investigative analytics

I recently coined the phrase investigative analytics to conflate

Statistics, data mining, machine learning, and/or predictive analytics.
The more research-oriented aspects of business intelligence tools:
    Ad-hoc query.
    Drilldown.
    Most things done by BI-using “business analysts.”
    Most things within BI called “data exploration.”
Analogous technologies as applied to non-tabular data types such as text or graph.

This will be the basis for my part of a webcast on March 10 at 11 am Pacific/2 pm Eastern time. The other main part of the webcast will be a demo by the webcast’s joint sponsors Aster Data and Tableau Software.

Some of Aster’s verbiage in describing and titling the webinar is so hyperbolic that I do not want to give the impression of endorsing it. But I am very hopeful that the webinar itself will be interesting and informative, and will point people at least somewhat in the direction of the benefits Aster is claiming.

Categories: Analytic technologies, Aster Data, Business intelligence, Data warehousing, Presentations, Tableau Software

3 Comments

February 11, 2011

Comments on the 2011 Forrester Wave for Enterprise Data Warehouse Platforms

The Forrester Wave: Enterprise Data Warehouse Platforms, Q1 2011 is now out,* hot on the heels of the Gartner Magic Quadrant. Unfortunately, this particular Forrester Wave is riddled with inaccuracy. Read more

Categories: Analytic technologies, Columnar database management, Data warehousing, EMC, Exadata, Greenplum, Netezza, Oracle, Pricing, SAP AG, Sybase, Teradata, Vertica Systems

8 Comments

February 9, 2011

Clarification on dbShards’ shard replication

After I posted recently about dbShards, a Very Smart Commenter emailed me with the challenge “but each individual shard is still replicated via two-phase commit, and everybody knows two-phase commit is fundamentally slow.” I replied that no, it wasn’t exactly two-phase commit, but fumbled the explanation of why — so I decided to escalate straight to dbShards honcho Cory Isaacson. Read more

Categories: dbShards and CodeFutures, Parallelization, Transparent sharding

15 Comments

February 8, 2011

Membase and CouchOne merged to form Couchbase

Membase, the company whose product is Membase and whose former company name is Northscale, has merged with CouchOne, the company whose product is CouchDB and whose former name is Couch.io. The result (product and company) will be called Couchbase. CouchDB inventor Damien Katz will join the Membase (now Couchbase) management team as CTO. Couchbase can reasonably be regarded as a document-oriented NoSQL DBMS, a product category I not coincidentally posted about yesterday.

In essence, Couchbase will be CouchDB with scale-out. Alternatively, Couchbase will be Membase with a richer programming interface. The Couchbase sweet spot is likely to be: Read more

Categories: Application areas, Cache, Couchbase, CouchDB, Market share and customer counts, memcached, NoSQL, Open source, Parallelization, Solid-state memory

2 Comments

Monash Research blogs

DBMS 2 covers database management, analytics, and related technologies.
Text Technologies covers text mining, search, and social software.
Strategic Messaging analyzes marketing and messaging strategy.
The Monash Report examines technology and public policy issues.
Software Memories recounts the history of the software industry.

User consulting

Building a short list? Refining your strategic plan? We can help.

Vendor advisory

We tell vendors what's happening -- and, more important, what they should do about it.

Monash Research highlights

Learn about white papers, webcasts, and blog highlights, by RSS or email.
