Teradata

Analysis of data warehousing giant Teradata. Related subjects include:

August 31, 2014

Notes from a visit to Teradata

I spent a day with Teradata in Rancho Bernardo last week. Most of what we discussed is confidential, but I think the non-confidential parts and my general impressions add up to enough for a post.

First, let’s catch up with some personnel gossip. So far as I can tell:

The biggest change in my general impressions about Teradata is that they’re having smart thoughts about the cloud. At least, Oliver is. All details are confidential, and I wouldn’t necessarily expect them to become clear even in October (which once again is the month for Teradata’s user conference). My main concern about all that is whether Teradata’s engineering team can successfully execute on Oliver’s directives. I’m optimistic, but I don’t have a lot of detail to support my good feelings.

Some quick-and-dirty positioning and sales qualification notes, which crystallize what we already knew:

Also: Read more

July 23, 2014

Teradata bought Hadapt and Revelytix

My client Teradata bought my (former) clients Revelytix and Hadapt.* Obviously, I’m in confidentiality up to my eyeballs. That said — Teradata truly doesn’t know what it’s going to do with those acquisitions yet. Indeed, the acquisitions are too new for Teradata to have fully reviewed the code and so on, let alone made strategic decisions informed by that review. So while this is just a guess, I conjecture Teradata won’t say anything concrete until at least September, although I do expect some kind of stated direction in time for its October user conference.

*I love my business, but it does have one distressing aspect, namely the combination of subscription pricing and customer churn. When your customers transform really quickly, or even go out of existence, so sometimes does their reliance on you.

I’ve written extensively about Hadapt, but to review:

As for what Teradata should do with Hadapt: Read more

July 14, 2014

21st Century DBMS success and failure

As part of my series on the keys to and likelihood of success, I outlined some examples from the DBMS industry. The list turned out too long for a single post, so I split it up by millennia. The part on 20th Century DBMS success and failure went up Friday; in this one I’ll cover more recent events, organized in line with the original overview post. Categories addressed will include analytic RDBMS (including data warehouse appliances), NoSQL/non-SQL short-request DBMS, MySQL, PostgreSQL, NewSQL and Hadoop.

DBMS rarely have trouble with the criterion “Is there an identifiable buying process?” If an enterprise is doing application development projects, a DBMS is generally chosen for each one. And so the organization will generally have a process in place for buying DBMS, or accepting them for free. Central IT, departments, and — at least in the case of free open source stuff — developers all commonly have the capacity for DBMS acquisition.

In particular, at many enterprises either departments have the ability to buy their own analytic technology, or else IT will willingly buy and administer things for a single department. This dynamic fueled much of the early rise of analytic RDBMS.

Buyer inertia is a greater concern.

A particularly complex version of this dynamic has played out in the market for analytic RDBMS/appliances.

Otherwise I’d say:  Read more

June 18, 2014

Using multiple data stores

I’m commonly asked to assess vendor claims of the kind:

So I thought it might be useful to quickly review some of the many ways organizations put multiple data stores to work. As usual, my bottom line is:

Horses for courses

It’s now widely accepted that different data managers are better for different use cases, based on distinctions such as:

Vendors are part of this consensus; already in 2005 I observed:

For all practical purposes, there are no DBMS vendors left advocating single-server strategies.

Vendor agreement has become even stronger in the interim, as evidenced by Oracle/MySQL, IBM/Netezza, Oracle’s NoSQL dabblings, and various companies’ Hadoop offerings.

Multiple data stores for a single application

We commonly think of one data manager managing one or more databases, each in support of one or more applications. But the other way around works too; it’s normal for a single application to invoke multiple data stores. Indeed, all but the strictest relational bigots would likely agree:  Read more

February 9, 2014

Distinctions in SQL/Hadoop integration

Ever more products try to integrate SQL with Hadoop, and discussions of them seem confused, in line with Monash’s First Law of Commercial Semantics. So let’s draw some distinctions, starting with (and these overlap):

In particular:

Let’s go to some examples. Read more

November 19, 2013

How Revolution Analytics parallelizes R

I talked tonight with Lee Edlefsen, Chief Scientist of Revolution Analytics, and now think I understand Revolution’s parallel R much better than I did before.

There are four primary ways that people try to parallelize predictive modeling:

One confusing aspect of this discussion is that it could reference several heavily-overlapping but not identical categories of algorithms, including:

  1. External memory algorithms, which operate on datasets too big to fit in main memory by, for starters, reading in and working on a part of the data at a time. Lee observes that these are almost always parallelizable.
  2. What Revolution markets as External Memory Algorithms, which are those external memory algorithms it has gotten around to implementing so far. These are all parallelized. They are also all in the category of …
  3. … algorithms that can be parallelized by:
    • Operating on data in parts.
    • Getting intermediate results.
    • Combining them in some way for a final result.
  4. Algorithms of the previous category, where the way of combining them specifically is in the form of summation, such as those discussed in the famous paper Map-Reduce for Machine Learning on Multicore. Not all of Revolution’s current parallel algorithms fall into this group.

To be clear, all Revolution’s parallel algorithms are in Category #2 by definition and Category #3 in practice. However, they aren’t all in Category #4.
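To make the categories concrete, here is a minimal sketch of the Category #3/#4 pattern: operate on the data in parts, get an intermediate result per part, and combine the intermediate results by summation. It is illustrative Python, not Revolution's code; the function names (`chunks`, `partial_stats`, `combine`, `mean_and_variance`) are hypothetical, and the in-memory list stands in for a dataset that would really be read from disk a piece at a time.

```python
from typing import Iterable, List, Tuple


def chunks(data: List[float], size: int) -> Iterable[List[float]]:
    """Yield successive pieces of the data (a stand-in for reading from disk)."""
    for start in range(0, len(data), size):
        yield data[start:start + size]


def partial_stats(chunk: List[float]) -> Tuple[int, float, float]:
    """Intermediate result for one chunk: count, sum, and sum of squares."""
    return len(chunk), sum(chunk), sum(x * x for x in chunk)


def combine(parts: Iterable[Tuple[int, float, float]]) -> Tuple[int, float, float]:
    """Combine intermediate results by plain summation."""
    n, s, ss = 0, 0.0, 0.0
    for count, total, total_sq in parts:
        n += count
        s += total
        ss += total_sq
    return n, s, ss


def mean_and_variance(data: List[float], chunk_size: int) -> Tuple[float, float]:
    """Mean and population variance computed one chunk at a time."""
    n, s, ss = combine(partial_stats(c) for c in chunks(data, chunk_size))
    mean = s / n
    return mean, ss / n - mean * mean
```

Because the combining step is a sum, the order in which chunks are processed doesn't matter, so `partial_stats` could run on different cores or nodes. An algorithm whose intermediate results need some other kind of merge would still fit Category #3, but not Category #4.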


November 10, 2013

RDBMS and their bundle-mates

Relational DBMS used to be fairly straightforward product suites, which boiled down to:

Now, however, most RDBMS are sold as part of something bigger.


October 24, 2013

JSON in Teradata

I coined the term schema-on-need last month. More precisely, I coined it while being briefed on JSON-in-Teradata, which was announced earlier this week, and is slated for availability in the first half of 2014.

The basic JSON-in-Teradata story is as you expect:

JSON virtual columns are referenced a little differently than ordinary physical columns are. Thus, if you materialize a virtual column, you have to change your SQL. If you’re doing business intelligence through a semantic layer, or otherwise have some kind of declarative translation, that’s probably not a big drawback. If you’re coding analytic procedures directly, it still may not be a big drawback — hopefully you won’t reference the virtual column too many times in code before you decide to materialize it instead.
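The point about changing your SQL can be illustrated with a toy model. This is plain Python rather than Teradata syntax, and everything in it (the `rows` table, `virtual_column`, `materialize`) is a hypothetical stand-in: a "virtual" column is extracted from the JSON document at query time, while a "materialized" one is an ordinary column that gets referenced directly.

```python
import json

# Hypothetical table: one ordinary column (id) plus a raw JSON document column (doc).
rows = [
    {"id": 1, "doc": '{"device": "phone", "price": 9.5}'},
    {"id": 2, "doc": '{"device": "tablet"}'},
]


def virtual_column(row, key):
    """Schema-on-need: pull a field out of the JSON document at query time."""
    return json.loads(row["doc"]).get(key)


def materialize(table, key):
    """Copy the field into an ordinary column, parsing each document once."""
    return [{**row, key: virtual_column(row, key)} for row in table]


# Query against the virtual column: access goes through the extraction function.
phones_virtual = [r["id"] for r in rows if virtual_column(r, "device") == "phone"]

# After materializing, the same query has to be rewritten to use the physical column.
table = materialize(rows, "device")
phones_physical = [r["id"] for r in table if r["device"] == "phone"]
```

The two queries return the same rows, but they are spelled differently, and that difference is the rewrite cost described above.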

My Bobby McFerrin* imitation notwithstanding, Hadapt illustrates a schema-on-need approach that is slicker than Teradata’s in two ways. First, Hadapt has full SQL transparency between virtual and physical columns. Second, Hadapt handles not just JSON, but anything represented by key-value pairs. Still, like XML before it but more concisely, JSON is a pretty versatile data interchange format. So JSON-in-Teradata would seem to be useful as it stands.

*The singer in the classic 1988 music video Don’t Worry Be Happy. The other two performers, of course, were Bill Irwin and Robin Williams.

October 18, 2013

Entity-centric event series analytics

Much of modern analytic technology deals with what might be called an entity-centric sequence of events. For example:

Analytic questions are asked along the lines “Which sequences of events are most productive in terms of leading to the events we really desire?”, such as product sales. Another major area is sessionization, along with data preparation tasks that boil down to arranging data into meaningful event sequences in the first place.
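As a minimal sketch of the data preparation step, here is one common way to sessionize: group events by entity, order them by time, and start a new session whenever the gap between consecutive events exceeds a threshold. The event layout, the `sessionize` name and the 30-minute default are all assumptions for illustration, not any particular vendor's method.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Assumed event layout: (entity_id, timestamp_in_seconds, event_name).
Event = Tuple[str, int, str]


def sessionize(events: List[Event], max_gap: int = 1800) -> Dict[str, List[List[Event]]]:
    """Arrange raw events into per-entity sessions, split on long gaps."""
    by_entity: Dict[str, List[Event]] = defaultdict(list)
    for event in events:
        by_entity[event[0]].append(event)

    sessions: Dict[str, List[List[Event]]] = {}
    for entity, evs in by_entity.items():
        evs.sort(key=lambda e: e[1])  # order each entity's events by time
        entity_sessions = [[evs[0]]]
        for prev, cur in zip(evs, evs[1:]):
            if cur[1] - prev[1] > max_gap:
                entity_sessions.append([cur])    # gap too long: new session
            else:
                entity_sessions[-1].append(cur)  # same session continues
        sessions[entity] = entity_sessions
    return sessions


def sessions_ending_in(sessions: Dict[str, List[List[Event]]], goal: str) -> List[List[str]]:
    """Event-name sequences for sessions whose last event is the desired one."""
    return [
        [e[2] for e in session]
        for entity_sessions in sessions.values()
        for session in entity_sessions
        if session[-1][2] == goal
    ]
```

Once events are arranged this way, "which sequences lead to a sale?" becomes a matter of filtering and counting sequences, as `sessions_ending_in` hints.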

A number of my clients are focused on such scenarios, including WibiData, Teradata Aster (e.g. via nPath), Platfora (in the imminent Platfora 3), and others. And so I get involved in naming exercises. The term entity-centric came along a while ago, because “user-centric” is too limiting. (E.g., the data may not be about a person, but rather specifically about the actions taken on her mobile device.) Now I’m adding the term event series to cover the whole scenario, rather than the “event sequence(s)” I might appear to have been hinting at above.

I decided on “event series” earlier this week, after noting that:  Read more

October 10, 2013

Aster 6, graph analytics, and BSP

Teradata Aster 6 has been preannounced (beta in Q4, general release in Q1 2014). The general architectural idea is:

There’s much more, of course, but those are the essential pieces.

Just to be clear: Teradata Aster 6, aka the Teradata Aster Discovery Platform, includes HDFS compatibility, native MapReduce and ways of invoking Hadoop MapReduce on non-Aster nodes or clusters — but even so, you can’t run Hadoop MapReduce within Aster over Aster’s version of HDFS.

The most dramatic immediate additions are in the graph analytics area.* The new SQL-Graph is supported by something called BSP (Bulk Synchronous Parallel). I’ll start by observing (and some of this is confusing):

Use cases suggested are a lot of marketing, plus anti-fraud.

*Pay no attention to Aster’s previous claims to do a good job on graph — and not only via nPath — in SQL-MR.
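For readers who haven't met BSP before, here is a minimal sketch of the general model: computation proceeds in supersteps, each vertex sees only the messages sent to it in the previous superstep, and a barrier separates one superstep from the next. This is a generic textbook-style illustration in Python, not Aster's SQL-Graph API. The function name `bsp_min_label` and the adjacency-dict graph format are assumptions, and the graph is assumed to be undirected with every vertex present as a key.

```python
from collections import defaultdict
from typing import Dict, List


def bsp_min_label(graph: Dict[str, List[str]]) -> Dict[str, str]:
    """Label each vertex with the smallest vertex id in its connected component."""
    label = {v: v for v in graph}

    # Superstep 0: every vertex sends its own label to its neighbors.
    inbox: Dict[str, List[str]] = defaultdict(list)
    for v, neighbors in graph.items():
        for n in neighbors:
            inbox[n].append(label[v])

    while inbox:  # halt when no messages are in flight
        outbox: Dict[str, List[str]] = defaultdict(list)
        # Compute phase: each vertex reads only last superstep's messages
        # and updates only its own state, so vertices could run in parallel.
        for v, messages in inbox.items():
            best = min(messages)
            if best < label[v]:
                label[v] = best
                for n in graph[v]:
                    outbox[n].append(best)
        # Barrier: new messages become visible only in the next superstep.
        inbox = outbox
    return label
```

Vertices that end up with the same label are in the same connected component, which is the kind of building block that marketing and anti-fraud graph analyses tend to start from.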

So far as I can infer from examples I’ve seen, the semantics of Teradata Aster SQL-Graph start:

Within those functions, the core idea is:  Read more
