Data models and architecture

Discussion of issues in data modeling, and whether databases should be consolidated or loosely coupled. Related subjects include:

October 30, 2013

Glassbeam instantiates a lot of trends

Glassbeam checked in recently, and they turn out to exemplify quite a few of the themes I’ve been writing about. For starters:

Glassbeam basics include:

All Glassbeam customers except one are SaaS/cloud (Software as a Service), and even that one was only offered a subscription (as oppose to perpetual license) price.

So what does Glassbeam’s technology do? Glassbeam says it is focused on “machine data analytics,” specifically for the “Internet of Things”, which it distinguishes from IT logs.* Specifically, Glassbeam sells to manufacturers of complex devices — IT (most of its sales so far ), medical, automotive (aspirational to date), etc. — and helps them analyze “phone home” data, for both support/customer service and marketing kinds of use cases. As of a recent release, the Glassbeam stack can: Read more

October 24, 2013

JSON in Teradata

I coined the term schema-on-need last month. More precisely, I coined it while being briefed on JSON-in-Teradata, which was announced earlier this week, and is slated for availability in the first half of 2014.

The basic JSON-in-Teradata story is as you expect:

JSON virtual columns are referenced a little differently than ordinary physical columns are. Thus, if you materialize a virtual column, you have to change your SQL. If you’re doing business intelligence through a semantic layer, or otherwise have some kind of declarative translation, that’s probably not a big drawback. If you’re coding analytic procedures directly, it still may not be a big drawback — hopefully you won’t reference the virtual column too many times in code before you decide to materialize it instead.

My Bobby McFerrin* imitation notwithstanding, Hadapt illustrates a schema-on-need approach that is slicker than Teradata’s in two ways. First, Hadapt has full SQL transparency between virtual and physical columns. Second, Hadapt handles not just JSON, but anything represented by key-value pairs. Still, like XML before it but more concisely, JSON is a pretty versatile data interchange format. So JSON-in-Teradata would seem to be useful as it stands.

*The singer in the classic 1988 music video Don’t Worry Be Happy. The other two performers, of course, were Elton John and Robin Williams.

October 10, 2013

Aster 6, graph analytics, and BSP

Teradata Aster 6 has been preannounced (beta in Q4, general release in Q1 2014). The general architectural idea is:

There’s much more, of course, but those are the essential pieces.

Just to be clear: Teradata Aster 6, aka the Teradata Aster Discovery Platform, includes HDFS compatibility, native MapReduce and ways of invoking Hadoop MapReduce on non-Aster nodes or clusters — but even so, you can’t run Hadoop MapReduce within Aster over Aster’s version of HDFS.

The most dramatic immediate additions are in the graph analytics area.* The new SQL-Graph is supported by something called BSP (Bulk Synchronous Parallel). I’ll start by observing (and some of this is confusing):

Use cases suggested are a lot of marketing, plus anti-fraud.

*Pay no attention to Aster’s previous claims to do a good job on graph — and not only via nPath — in SQL-MR.

So far as I can infer from examples I’ve seen, the semantics of Teradata Aster SQL-Graph start:

Within those functions, the core idea is:  Read more

September 29, 2013

ClearStory, Spark, and Storm

ClearStory Data is:

I think I can do an interesting post about ClearStory while tap-dancing around the still-secret stuff, so let’s dive in.


To a first approximation, ClearStory ingests data in a system built on Storm (code name: Stormy), dumps it into HDFS, and then operates on it in a system built on Spark (code name: Sparky). Along the way there’s a lot of interaction with another big part of the system, a metadata catalog with no code name I know of. Or as I keep it straight:

Read more

September 24, 2013


There’s a growing trend for DBMS to beef up their support for multiple data manipulation languages (DMLs) or APIs — and there’s a special boom in JSON support, MongoDB-compatible or otherwise. So I talked earlier tonight with IBM’s Bobbie Cochrane about how JSON is managed in DB2.

For starters, let’s note that there are at least four strategies IBM could have used.

IBM’s technology choices are of course influenced by its use case focus. It’s reasonable to divide MongoDB use cases into two large buckets:

IBM’s DB2 JSON features are targeted at the latter bucket. Also, I suspect that IBM is generally looking for a way to please users who enjoy working on and with their MongoDB skills.  Read more

September 21, 2013


Two years ago I wrote about how Zynga managed analytic data:

Data is divided into two parts. One part has a pretty ordinary schema; the other is just stored as a huge list of name-value pairs. (This is much like eBay‘s approach with its Teradata-based Singularity, except that eBay puts the name-value pairs into long character strings.) … Zynga adds data into the real schema when it’s clear it will be needed for a while.

What was then the province of a few huge web companies is now poised to be a broader trend. Specifically:

That migration from virtual to physical columns is what I’m calling “schema-on-need”. Thus, schema-on-need is what you invoke when schema-on-read no longer gets the job done. ;)

Read more

September 8, 2013

Layering of database technology & DBMS with multiple DMLs

Two subjects in one post, because they were too hard to separate from each other

Any sufficiently complex software is developed in modules and subsystems. DBMS are no exception; the core trinity of parser, optimizer/planner, and execution engine merely starts the discussion. But increasingly, database technology is layered in a more fundamental way as well, to the extent that different parts of what would seem to be an integrated DBMS can sometimes be developed by separate vendors.

Major examples of this trend — where by “major” I mean “spanning a lot of different vendors or projects” — include:

Other examples on my mind include:

And there are several others I hope to blog about soon, e.g. current-day PostgreSQL.

In an overlapping trend, DBMS increasingly have multiple data manipulation APIs. Examples include:  Read more

July 20, 2013

The refactoring of everything

I’ll start with three observations:

As written, that’s probably pretty obvious. Even so, it’s easy to forget just how pervasive the refactoring is and is likely to be. Let’s survey some examples first, and then speculate about consequences. Read more

June 23, 2013

Impala and Parquet

I visited Cloudera Friday for, among other things, a chat about Impala with Marcel Kornacker and colleagues. Highlights included:

Data gets into Parquet via batch jobs only — one reason it’s important that Impala run against multiple file formats — but background format conversion is another roadmap item. A single table can be split across multiple formats — e.g., the freshest data could be in HBase, with the rest is in Parquet.

Read more

April 14, 2013

Introduction to Deep Information Sciences and DeepDB

I talked Friday with Deep Information Sciences, makers of DeepDB. Much like TokuDB — albeit with different technical strategies — DeepDB is a single-server DBMS in the form of a MySQL engine, whose technology is concentrated around writing indexes quickly. That said:

*For reasons that do not seem closely related to product reality, DeepDB is marketed as if it supports “unstructured” data today.

Other NewSQL DBMS seem “designed for big data and the cloud” to at least the same extent DeepDB is. However, if we’re interpreting “big data” to include multi-structured data support — well, only half or so of the NewSQL products and companies I know of share Deep’s interest in branching out. In particular:

Edit: MySQL has some sort of an optional NoSQL interface, and hence so presumably do MySQL-compatible TokuDB, GenieDB, Clustrix, and MemSQL.

Also, some of those products do not today have the transparent scale-out that Deep plans to offer in the future.

Read more

← Previous PageNext Page →

Feed: DBMS (database management system), DW (data warehousing), BI (business intelligence), and analytics technology Subscribe to the Monash Research feed via RSS or email:


Search our blogs and white papers

Monash Research blogs

User consulting

Building a short list? Refining your strategic plan? We can help.

Vendor advisory

We tell vendors what's happening -- and, more important, what they should do about it.

Monash Research highlights

Learn about white papers, webcasts, and blog highlights, by RSS or email.