March 6, 2011

Three ways Fedex is a metaphor for data integration

It occurs to me that there are three reasons why Federal Express, aka Fedex, is a great metaphor for data integration. Read more

Categories: Data integration and middleware, EAI, EII, ETL, ELT, ETLT, SnapLogic

2 Comments

March 4, 2011

Teradata, Aster Data, and Teradata/Aster

Teradata is acquiring Aster Data. Naturally, the deal is being presented with a Treaty of Tordesillas kind of positioning — Teradata does X, Aster Data does Y, and everybody looks forward to having X and Y in the same product portfolio. That said, my initial positioning and product strategy thoughts on the Teradata/Aster combination go something like this. Read more

Categories: Analytic technologies, Aster Data, Columnar database management, Data warehouse appliances, Data warehousing, Database compression, RDF and graphs, Specific users, Teradata

9 Comments

March 3, 2011

Terminology: Investigative analytics

In my post on the six useful things you can do with analytic technology, one of the six was

Research, investigate, and analyze in support of future decisions.

I’m calling that investigative analytics, and am hopeful the term will catch on.

I went on to say that the term conflated several disciplines, namely:

Statistics, data mining, machine learning, and/or predictive analytics. …

The more research-oriented aspects of business intelligence tools. …

Analogous technologies as applied to non-tabular data types such as text or graph.

By way of contrast, I don’t regard business activity monitoring (BAM) or other kinds of monitoring-oriented business intelligence (BI) as part of “investigative analytics,” because they don’t seem particularly investigative.

Based on the above, I propose the following simple definition of the investigative analytics activity or process:

Seeking (previously unknown) patterns in data.

Categories: Analytic technologies, Business intelligence

22 Comments

March 2, 2011

How about “Short Request Processing”?

While my other terminology posts seem to have gone pretty well, the Internet Request Processing name is proving a bit problematic. People seem pretty cool with the “request processing” part, but there are issues with the modifier, including:

“Internet” doesn’t really cover everything.
“Network” in practice sounds too low-level, and is also too general.
“Online” is also too general.

So how about just going with “short”? OLTP requests are inherently short. “GET” and “SET” are certainly short. 🙂 In general, queries that do not involve JOINs are probably short requests. Analytic queries, however, are generally not short. Even better, all that can apply to the syntax and the execution time alike. 🙂

Please note that I’m focused more here on describing use cases than products. Whether products generally used to do one kind of thing can also be stretched to do another — e.g., complex analytics hardwired into a Cassandra application — is not my primary concern.
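To make the short/not-short contrast concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `users` and `orders` tables and their rows are made up purely for illustration; the point is only that the first query is a single-key lookup with no JOIN, while the second joins and aggregates, which is the shape analytic queries tend to take.

```python
import sqlite3

# Hypothetical toy schema, invented for this illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, amount REAL);
""")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "ann"), (2, "bob")])
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 1, 10.0), (2, 1, 5.0), (3, 2, 7.5)],
)

# A "short request": one key, one table, no JOIN. Short to write,
# and it touches a single row via the primary key.
short_result = conn.execute(
    "SELECT name FROM users WHERE id = ?", (1,)
).fetchone()

# An analytic-style query: a JOIN plus aggregation over every order row.
# Longer in syntax, and its work grows with the size of the data.
analytic_result = conn.execute("""
    SELECT u.name, SUM(o.amount)
    FROM users u
    JOIN orders o ON o.user_id = u.id
    GROUP BY u.name
    ORDER BY u.name
""").fetchall()
```

On a toy data set both queries return instantly, of course; the difference in execution time only shows up at scale.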

Categories: NoSQL, OLTP

11 Comments

February 28, 2011

Updating our vendor client disclosures

Edit: This disclosure has been superseded by a March, 2012 version.

From time to time, I disclose our vendor client lists. Another iteration is below. To be clear:

This is a list of Monash Advantage members.
All our vendor clients are Monash Advantage members, unless …
… we work with them primarily in their capacity as technology users. (A large fraction of our user clients happen to be SaaS vendors.)
We do not usually disclose our user clients.
We do not usually disclose our venture capital clients, nor those who invest in publicly-traded securities.
Included in the list below are two expired Monash Advantage members who haven’t said they will renew, as mentioned in my recent post on analyst bias. (You can probably imagine a couple of reasons for that obfuscation.)

With that said, our vendor client disclosures at this time are:

Aster Data
Cloudera
CodeFutures/dbShards
Couchbase
EMC/Greenplum
Endeca
IBM/Netezza
Infobright
Intel
MarkLogic
ParAccel
QlikTech
salesforce.com/database.com
SAND Technology
SAP/Sybase
Schooner Information Technology
Skytide
Splunk
Teradata
Vertica

Categories: About this blog, Aster Data, Cloudera, Couchbase, dbShards and CodeFutures, EMC, Greenplum, IBM and DB2, Infobright, Intel, MarkLogic, Netezza, ParAccel, QlikTech and QlikView, SAND Technology, SAP AG, Schooner Information Technology, Splunk, Sybase, Tableau Software, Teradata, Vertica Systems

1 Comment

February 24, 2011

Terminology: Transparent sharding

When databases are too big to manage via a single server, responsibility for them is spread among multiple servers. There are numerous names for this strategy, or versions of it — all of them at least somewhat problematic. The most common terms include:

(Shared-nothing) MPP (Massively Parallel Processing), often used to describe analytic DBMS. On the whole, these terms have worked pretty well, but they have issues even so. First, “MPP” means different things to different marketers. Second, most ostensibly “shared-nothing” systems aren’t really “shared-nothing.” They generally support at least storage arrays, if not storage-area networks (SANs); indeed, in a couple of cases (most notably EMC Greenplum), SAN support is prominent in their marketing message.
(Horizontal) partitioning and/or data distribution. These have significant problems. “Partitioning” and “distribution” are easily confused with each other, not least because the term “partitioning” is used in different ways by different DBMS product vendors.
Sharding, commonly used to describe scaled-out MySQL in Internet Request Processing use cases. This one has the advantage of being concise, but is beginning to mean two different things, in that it is used both when the data is REALLY in separate databases on different machines (i.e., the application has to explicitly reference the shard it wants to talk to) and also when the database is transparently distributed (e.g. via dbShards).
Coherent caching and/or distributed shared memory, describing cases when data is in RAM. Besides being RAM-specific, these terms can be vague as to whether the same data is recopied onto different systems, or whether they are focused on letting (relatively) large in-memory data stores be spread across a cluster.

I plan to start using the term transparent sharding to denote a data management strategy in which data is assigned to multiple servers (or CPUs, cores, etc.), yet looks to programmers and applications as if it were managed by just one. Thus,

dbShards and ScaleBase feature transparent sharding (this is the case which inspired me to introduce the term).
Anything which has ever reasonably been called a “shared-nothing” MPP DBMS features transparent sharding.
Memcached features transparent sharding. So, I imagine, do other caching systems I am less familiar with.
Shared-disk DBMS do not feature transparent sharding, even if their query work can be scaled out across multiple servers. (But Oracle Exadata does, because of its server tier.)
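The definition above can be sketched in a few lines of Python. This is a toy under stated assumptions, not a description of how dbShards, ScaleBase, Memcached, or any MPP DBMS is actually implemented: the class name `TransparentShardedStore` is hypothetical, plain in-process dictionaries stand in for servers, and hash-modulo placement is just one simple way to assign keys. What it shows is the "transparent" part: callers use `get` and `set` as if there were one store, and never name a shard.

```python
import hashlib


class TransparentShardedStore:
    """Toy key-value store whose data is spread over several shards.

    Hypothetical illustration: each dict stands in for a separate server.
    """

    def __init__(self, num_shards=4):
        self._shards = [{} for _ in range(num_shards)]

    def _shard_for(self, key):
        # Hash the key and map it onto a shard. A stable hash is used
        # (rather than Python's built-in hash()) so placement is the
        # same from one run to the next.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self._shards)
        return self._shards[index]

    def set(self, key, value):
        self._shard_for(key)[key] = value

    def get(self, key, default=None):
        return self._shard_for(key).get(key, default)

    def shard_sizes(self):
        # Exposed only so the distribution can be inspected.
        return [len(shard) for shard in self._shards]


# The application never mentions a shard; it just reads and writes keys.
store = TransparentShardedStore(num_shards=4)
for i in range(100):
    store.set(f"user:{i}", i)

value = store.get("user:42")
total_keys = sum(store.shard_sizes())
```

In the non-transparent kind of sharding, by contrast, the application itself would hold the list of shards and choose which one to talk to on every request.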

Categories: Parallelization, Transparent sharding

27 Comments

February 24, 2011

Terminology: Internet Request Processing (IRP)

As I observed previously, we need a term that means “like OLTP but not necessarily transactional”, to help describe a category of use cases that can reasonably be addressed by NoSQL or scale-out SQL systems alike.* So here’s a candidate phrase: Internet Request Processing (IRP). If we use that, I’ll call Schooner, Cassandra, Couchbase, et al. IRP DBMS, while other people will probably call them IRP databases.

*Consider, for example, the overlapping use cases for Schooner, dbShards, ScaleBase, Couchbase, and DataStax/Cassandra.

In my proposed terminology, an internet request processing (IRP) use case is one in which: Read more

Categories: NoSQL, OLTP

8 Comments

February 24, 2011

Terminology: Analytic platforms

A few weeks ago, I described the elements of an “analytic computing system” or “analytic platform,” while reserving judgment as to which of the two terms would or should win out. I am now capitulating to the term analytic platform, under the influence of, among others, Sharmila Mulligan (and Aster Data in general), Vertica, and a variety of fellow analysts (Merv Adrian, Neil Raden, Seth Grimes, Jim Kobielus, and Colin White). While Google evidence would suggest it’s way too early to make this call, I think it’s time to say “analytic platform” will win.

What’s more, I now think the phrase “analytic platform” should win. While I think the term “platform” is overused to the point of silliness, at least the phrase “analytic platform” is short. Thus, it could be modified in various descriptive or not-so-descriptive ways: “Advanced analytic platform,” “graph analytics platform,” “customer analytics platform,” “social media analytics platform,” “CRM analytics platform,” “text analytics platform,” or whatever. By way of contrast, try doing that with “analytic computing system,” and see if you can keep a straight face.

To take this in the direction of an actual definition, I’ll say that the three essential elements of an analytic platform are: Read more

Categories: Analytic technologies, Data warehousing

2 Comments

February 14, 2011

Some quick notes on HP-Vertica

HP is acquiring Vertica. Read more

Categories: In-memory DBMS, Investment research and trading, Memory-centric data management, StreamBase, Streaming and complex event processing (CEP), VoltDB and H-Store

13 Comments

February 14, 2011

Now we know why Vertica has been so weirdly evasive

Communicating with Vertica has been tricky recently. But HP has now announced that it is buying Vertica, which pretty much forces me to comment about Vertica. 🙂 So I’ll indulge in a little bit of explanation as to what I know about Vertica, whether for publication or under NDA. My analysis of the HP/Vertica combination, and expectations for same, will go into another post. Read more

Categories: Analytic technologies, Data warehousing, HP and Neoview, Market share and customer counts, Michael Stonebraker, Vertica Systems

10 Comments



Monash Research blogs

DBMS 2 covers database management, analytics, and related technologies.
Text Technologies covers text mining, search, and social software.
Strategic Messaging analyzes marketing and messaging strategy.
The Monash Report examines technology and public policy issues.
Software Memories recounts the history of the software industry.

User consulting

Building a short list? Refining your strategic plan? We can help.

Vendor advisory

We tell vendors what's happening -- and, more important, what they should do about it.

Monash Research highlights

Learn about white papers, webcasts, and blog highlights, by RSS or email.
