September 20, 2013

Trends in predictive modeling

I talked with Teradata about a bunch of stuff yesterday, including this week’s announcements about in-database predictive modeling. The specific news was about partnerships with Fuzzy Logix and Revolution Analytics. But what I found more interesting was the surrounding discussion. In a nutshell:

This is the strongest statement of perceived demand for in-database modeling I’ve heard. (Compare Point #3 of my July predictive modeling post.) It also fits with what I’ve been hearing about R.

*That’s very similar to the list of sectors for SAS HPA.

**To support their extremely high focus on product quality, semiconductor manufacturers have been using state-of-the-art analytic tools for at least 30 years.

In-database modeling is a performance feature, and performance can have several kinds of benefit, which may be summarized as “cheaper”, “better”, and “previously impractical”. My impression is that in-database modeling is pretty far toward the “previously impractical” end of the spectrum; enterprises don’t adopt a new way of predictive modeling until they want to create models that the old way can’t get done.

Basically, I think that models are increasingly:

I think the first point pretty much implies the second, but the converse isn’t as clear; one can tweak old-style models in quick-turnaround fashion even more easily than one can develop the more complex newer styles.

And finally: I’m not hearing that modeling — even when it’s parallel and in-database fast — is commonly done on a complete many-terabyte dataset. It’s not a question I always remember to ask; for example, I didn’t bring it up with Teradata. But when I do, I rarely hear of models being trained on more than a few terabytes of data each.


One Response to “Trends in predictive modeling”

  1. Thomas W. Dinsmore on September 21st, 2013 9:50 am

    There are many benefits to modeling in-database, not the least of which is the elimination of data movement. Even if one builds models on “only” a few terabytes, the effort to move that data adds serious latency to the analytic cycle time. Moreover, there are very few server-based analytic products that can work with terabyte-sized data sets, so analysts working outside of a database typically work with sets of 100 GB or less.

    SAS HPA has failed to gain acceptance, but not because it has a limited number of algorithms. The reasons this product has failed are (1) SAS has priced the product out of the market; (2) the product architecture restricts deployment choices; and (3) the product’s software engineering makes high demands on infrastructure leading to exploding TCO. HPA runs “in the appliance” and not “in the database”, so it requires specially constructed platforms that bulk up the memory and reduce storage. In the case of Teradata, SAS HPA runs only on the dedicated 720 appliance, and not on other members of the Teradata family.

    SAS’ more recent attempts to deploy HPA into Hadoop in a “run beside on the node” approach have also predictably run into roadblocks, since HPA cannot run on the typical commodity node servers that Hadoop customers use.

    SAS scoring is also limited to those algorithms supported by the SAS Scoring Accelerator or the SAS Analytics Accelerator, a subset of SAS analytic capabilities. Since both of these products have limitations of their own (and also require substantial additional licensing fees), and since SAS/STAT does not export PMML, many SAS customers simply rebuild the scoring code in SQL, C, Java, R or Python.

    Revolution R Enterprise, on the other hand, will run on any instance of Teradata 14.10, in any Teradata appliance. And any R code developed anywhere, on any platform, will run in Teradata in Revolution R.
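
As an illustration of the comment's point about rebuilding scoring code by hand, here is a minimal Python sketch of what that can look like for a simple logistic regression. The feature names, coefficients, and table name are hypothetical, chosen only for the example; they are not taken from SAS, Teradata, or any real model. The sketch scores one record in Python and also renders the same formula as a SQL expression, which is roughly the kind of translation a team would otherwise do manually.

```python
import math

# Hypothetical model: intercept and coefficients are made up for illustration.
INTERCEPT = -1.5
COEFFICIENTS = {"tenure_months": -0.03, "support_calls": 0.4}


def score(row):
    """Return the predicted probability for one record (a dict of feature values)."""
    linear = INTERCEPT + sum(w * row[name] for name, w in COEFFICIENTS.items())
    return 1.0 / (1.0 + math.exp(-linear))


def to_sql(table):
    """Render the same logistic formula as a SQL query over the given table."""
    terms = " + ".join(f"({w!r} * {name})" for name, w in COEFFICIENTS.items())
    return (
        f"SELECT 1.0 / (1.0 + EXP(-({INTERCEPT!r} + {terms}))) AS score "
        f"FROM {table}"
    )


if __name__ == "__main__":
    print(score({"tenure_months": 10, "support_calls": 2}))
    print(to_sql("customers"))  # "customers" is a hypothetical table name
```

Keeping the coefficients in one place and generating both the Python and SQL forms from them avoids the two versions drifting apart, which is a common risk when scoring logic is re-implemented by hand.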
