October 17, 2012

Notes on analytic hardware

I took the opportunity of Teradata’s Aster/Hadoop appliance announcement to catch up with Teradata hardware chief Carson Schmidt. I love talking with Carson, about both general design philosophy and his views on specific hardware component technologies.

From a hardware-requirements standpoint, Carson seems to view Aster and Hadoop as more similar to each other than either is to, say, a Teradata Active Data Warehouse. In particular, for Aster and Hadoop:

The most obvious implication is differences in the choice of parts, and in their ratios. Also, in the new Aster/Hadoop appliance, Carson is content to skate by with RAID 5 rather than RAID 1.
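
To see why RAID 5 is the thriftier choice, consider a quick back-of-the-envelope comparison of usable capacity. The sketch below uses made-up drive counts and sizes, not Teradata's actual configuration:

```python
# Illustrative usable-capacity comparison of RAID 1 vs. RAID 5.
# Drive count and size are hypothetical, not Teradata's actual configuration.

def usable_tb(drives: int, drive_tb: float, raid_level: int) -> float:
    """Usable capacity of a single RAID group, ignoring hot spares."""
    if raid_level == 1:
        return drives / 2 * drive_tb      # mirroring: half the raw capacity
    if raid_level == 5:
        return (drives - 1) * drive_tb    # lose one drive's worth to parity
    raise ValueError("only RAID 1 and RAID 5 are modeled here")

drives, drive_tb = 12, 3.0                # e.g., one tray of 12 x 3 TB drives
print(f"RAID 1: {usable_tb(drives, drive_tb, 1):.0f} TB usable")  # 18 TB
print(f"RAID 5: {usable_tb(drives, drive_tb, 5):.0f} TB usable")  # 33 TB
```

The flip side is that RAID 5 pays for that extra capacity with parity-update overhead on writes and slower rebuilds, which is presumably why mirroring remains attractive for more update-intensive systems.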

I think Carson’s views about flash memory can be reasonably summarized as: Read more

October 17, 2012

Hadoop/RDBMS integration: Aster SQL-H and Hadapt

Two of the more interesting approaches for integrating Hadoop and MapReduce with relational DBMS come from my clients at Teradata Aster (via SQL/MR and SQL-H) and Hadapt. In both cases, the story starts:

Of course, there are plenty of differences. Those start: Read more
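
For a flavor of what the SQL/MR half of that integration looks like from client code, here is a sketch. The host, table, column, and function names are all hypothetical, the SQL/MR argument grammar is simplified, and the connection goes through psycopg2 on the assumption that Aster's PostgreSQL-derived wire protocol accepts it:

```python
# Sketch: invoking an Aster-style SQL/MR table function from Python.
# Host, database, table, and function names are all hypothetical, and
# the argument syntax is simplified -- this shows the shape of SQL/MR,
# not its exact grammar.
import psycopg2  # assumes a PostgreSQL-compatible driver works here

conn = psycopg2.connect(host="aster-queen", dbname="beehive",
                        user="analyst", password="...")
cur = conn.cursor()

# A SQL/MR function sits in the FROM clause like a table: the ON clause
# feeds it rows, PARTITION BY / ORDER BY control how the underlying
# MapReduce job sees those rows, and the output is just another relation.
cur.execute("""
    SELECT user_id, session_id, COUNT(*) AS clicks
    FROM sessionize(
        ON weblog_clicks
        PARTITION BY user_id
        ORDER BY click_time
    )
    GROUP BY user_id, session_id
""")
for row in cur.fetchmany(10):
    print(row)
```

The point of the pattern is that MapReduce-style polymorphic functions and ordinary SQL compose in a single statement, which is also roughly the spirit of SQL-H's reach into Hadoop-resident data.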

October 17, 2012

The Teradata Aster Big Analytics Aster/Hadoop appliance

My clients at Teradata are introducing a mix-em/match-em Aster/Hadoop box, officially called the Teradata Aster Big Analytics Appliance. Basics include:

My views on the Teradata Aster Big Analytics Appliance start: Read more

October 16, 2012

Hadapt Version 2

My clients at Hadapt are coming out with a Version 2 to be available in Q1 2013, and perhaps slipstreaming some of the features before then. At that point, it will be reasonable to regard Hadapt as offering:

Solr is in the mix as well.

Hadapt+Hadoop is positioned much more as “better than Hadoop” than as “a better scale-out RDBMS”, and rightly so, given its limitations when viewed strictly from an analytic RDBMS standpoint. I.e., Hadapt is meant for enterprises that want to do several of:

Hadapt has 6 or so production customers, a dozen or so more coming online soon, 35 or so employees (mainly in Cambridge or Poland), reasonable amounts of venture capital, and the involvement of a variety of industry luminaries. Hadapt’s biggest installation seems to have tens of terabytes of relational data and hundreds of terabytes of multi-structured data; Hadapt is very confident in its ability to scale an order of magnitude beyond that with the Version 2 product, and reasonably confident it could go even further.

At the highest level, Hadapt works like this: Read more

October 15, 2012

What is meant by “iterative analytics”

A number of people and companies are using the term “iterative analytics”. This is confusing, because it can mean at least three different things:

  1. You analyze something quickly, decide the result is not wholly satisfactory, and try again. Examples might include:
    • Aggressive use of drilldown, perhaps via an advanced-interface business intelligence tool such as Tableau or QlikView.
    • Any case where you run a query or a model, think about the results, and run another one after that.
  2. You develop an intermediate analytic result, and use it as input to the next round of analysis (see the sketch below). This is roughly equivalent to saying that iterative analytics refers to a multi-step analytic process involving a lot of derived data.
  3. #1 and #2 conflated/combined. This is roughly equivalent to saying that iterative analytics refers to all of investigative analytics, sometimes known instead as exploratory analytics.

Based on both my personal conversations and a quick Google check, it’s reasonable to say that #1 and #3 seem to be the most common usages, with #2 trailing a little bit behind.

But often it’s hard to be sure which of the various possible meanings somebody has in mind.
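
To make meaning #2 concrete, here is a minimal sketch of a multi-step analysis in which each round's derived data feeds the next. The data, column names, and threshold are all made up:

```python
# Sketch of "iterative analytics" in sense #2: each step's derived
# result becomes the input to the next step. All data here is made up.
import pandas as pd

orders = pd.DataFrame({
    "customer": ["a", "a", "b", "c", "c", "c"],
    "amount":   [120,  80, 300,  40,  60, 500],
})

# Step 1: derive per-customer aggregates from the raw data.
per_customer = orders.groupby("customer")["amount"].agg(["sum", "count"])

# Step 2: the derived table -- not the raw one -- is the input here.
high_value = per_customer[per_customer["sum"] > 250]

# Step 3: and that result in turn drives the next round of analysis.
share = high_value["sum"].sum() / orders["amount"].sum()
print(f"High-value customers account for {share:.0%} of revenue")
```

In meaning #1, by contrast, the iteration happens in the analyst's head between queries rather than in the dataflow itself.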

Related links

Monash’s First and Third Laws of Commercial Semantics state:

October 12, 2012

(Relational) database (management system) — three analytic glossary draft entries

These are three closely-related draft entries for the DBMS2 analytic glossary. Please comment with any ideas you have for their improvement!

1. Database management system (DBMS)

In our definition, a database management system (DBMS) is:

Commonly, that API takes the form of a data manipulation language (DML) such as SQL or MDX, but our definition allows for APIs as simple as those of key-value stores.
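
To illustrate how simple the low end of that API spectrum can be, here is a sketch of a generic key-value interface (an in-memory stand-in, not any particular product's API):

```python
# Sketch: a data management API can be as simple as get/put/delete.
# This in-memory stand-in mimics a key-value store's interface; real
# products add persistence, distribution, and concurrency control.
from typing import Optional

class KeyValueStore:
    def __init__(self) -> None:
        self._data: dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)

    def delete(self, key: str) -> None:
        self._data.pop(key, None)

store = KeyValueStore()
store.put("user:42", b'{"name": "Ada"}')
print(store.get("user:42"))  # the whole "query language" is get-by-key
```

By the definition above, something exposing only this interface still qualifies as a DBMS; under the first alternative definition below, it would merely be “data management software”.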

There are two major alternatives to our definition:

  1. The above could be a definition of “data management software”, with the term “DBMS” reserved for systems with a true DML.
  2. Many vendors and industry observers abbreviate “database management system” or “data management software” as “database”.

Two important distinctions among categories of DBMS and the processing they’re optimized for are:

2. Database

The term database has two common meanings in IT: Read more

October 11, 2012

Oracle and IBM — strategic context

By my standards, I’ve been writing a lot about Oracle and IBM recently. Let me now step back and review the context in which I view them.

At the highest level, Oracle and IBM have similar strategic priorities, in line with the Innovator’s Dilemma/Innovator’s Solution issues I keep mentioning. That is:

Of course, there are major differences in the two companies’ product and service portfolios. Some of the biggest are: Read more

October 9, 2012

IBM Pure jargon

As best I can tell, IBM now has three related families of hardware/software bundles, aka appliances, aka PureSystems, aka something that sounds like “expert system” but in fact has nothing to do with the traditional rules-engine meaning of that term. In particular,

Within the PureData line, there are three sub-families:

The Netezza part of the story seems to start:

Perhaps someday I’ll be able to supply interesting details, for example about the concurrency improvement or about the uses (if any) customers are finding for Netezza’s in-database analytics — but as previously noted, analyzing big companies is hard.

October 7, 2012

IBM’s ETL

Bearing in mind the difficulties in covering big companies and their products, I had a call with IBM about its core ETL technology (Extract/Transform/Load), and have some notes accordingly. It’s pretty reasonable to say that there are and were a Big Three of high-end ETL vendors:

However, IBM fondly thinks there are a Big Two, on the theory that Informatica PowerCenter can’t scale as well as IBM and Ab Initio can, and hence gets knocked out of deals when particularly strong scalability and throughput are required. Read more
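
For readers who don't live in this world, the pattern itself is simple; the sketch below shows the three stages in miniature. File and table names are hypothetical, and the high-end products' value lies in scalability, metadata, and connectivity rather than in the basic pattern:

```python
# Sketch of the core ETL pattern: extract, transform, load.
# Source and target names are hypothetical; the high-end tools discussed
# here differ mainly in how far they can parallelize each stage.
import csv
import sqlite3

def extract(path):
    """Extract: pull raw rows out of a source system (here, a CSV feed)."""
    with open(path, newline="") as f:
        yield from csv.DictReader(f)

def transform(rows):
    """Transform: cleanse and reshape rows on their way through."""
    for row in rows:
        yield (row["customer_id"].strip(), float(row["amount"]))

def load(rows, conn):
    """Load: write the conformed rows into the warehouse target table."""
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?)", rows)
    conn.commit()

conn = sqlite3.connect("warehouse.db")
conn.execute("CREATE TABLE IF NOT EXISTS fact_sales (customer_id TEXT, amount REAL)")
load(transform(extract("sales_feed.csv")), conn)
```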

October 6, 2012

Analyzing big companies is hard

Analyzing companies of any size is hard. Analyzing large ones, however, is harder yet.

Such limitations should be borne in mind in connection with anything I write about, for example, Oracle, Microsoft, IBM, or SAP.

There are many reasons for large companies to communicate less usefully with analysts than smaller ones do. Some of the biggest are:

Read more
