I talked with Teradata about a bunch of stuff yesterday, including this week’s announcements about in-database predictive modeling. The specific news was about partnerships with Fuzzy Logix and Revolution Analytics. But what I found more interesting was the surrounding discussion. In a nutshell:
- Teradata is finally seeing substantial interest in in-database modeling, rather than just in-database scoring (which has been important for years) and in-database data preparation (which is a lot like ELT: Extract/Load/Transform).
- Teradata is seeing substantial interest in R.
- It seems as if similar groups of customers are interested in both parts of that, such as:
This is the strongest statement of perceived demand for in-database modeling I’ve heard. (Compare Point #3 of my July predictive modeling post.) And it fits with what I’ve been hearing about R.
*That’s very similar to the list of sectors for SAS HPA.
**To support their extremely high focus on product quality, semiconductor manufacturers have been using state-of-the-art analytic tools for at least 30 years.
In-database modeling is a performance feature, and performance can have several kinds of benefit, which may be summarized as “cheaper”, “better”, and “previously impractical”. My impression is that in-database modeling is pretty far toward the “previously impractical” end of the spectrum; enterprises don’t adopt a new way of predictive modeling until they want to create models that the old way can’t get done.
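To make that concrete, here is a minimal sketch of the difference between pulling rows out of the database to fit a model and pushing the model’s arithmetic down to where the data lives. It’s written in Python against SQLite purely for illustration; the table and column names are invented, and nothing in it reflects Teradata’s, Fuzzy Logix’s, or Revolution Analytics’ actual interfaces. The point is just that, for even a simple least-squares fit, in-database computation can shrink what leaves the warehouse from every row down to a handful of aggregates.

```python
# A minimal sketch of "pull the data out to model" versus "compute the model
# where the data lives". SQLite stands in for the data warehouse, and the
# table and column names (sales_history, ad_spend, revenue) are invented;
# nothing here reflects Teradata's, Fuzzy Logix's, or Revolution Analytics'
# actual interfaces.
import random
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_history (ad_spend REAL, revenue REAL)")
rows = []
for _ in range(10_000):
    x = random.uniform(0.0, 100.0)
    rows.append((x, 3.0 * x + random.gauss(0.0, 5.0)))
conn.executemany("INSERT INTO sales_history VALUES (?, ?)", rows)

# Old way: ship every row to the modeling tool, then fit revenue ~ ad_spend there.
data = conn.execute("SELECT ad_spend, revenue FROM sales_history").fetchall()
n = len(data)
sx = sum(x for x, _ in data)
sy = sum(y for _, y in data)
sxx = sum(x * x for x, _ in data)
sxy = sum(x * y for x, y in data)
slope_out = (n * sxy - sx * sy) / (n * sxx - sx * sx)
intercept_out = (sy - slope_out * sx) / n

# In-database flavor: the same sufficient statistics are computed inside the
# database, so only five aggregate numbers cross the wire instead of 10,000 rows.
n, sx, sy, sxx, sxy = conn.execute(
    "SELECT COUNT(*), SUM(ad_spend), SUM(revenue), "
    "SUM(ad_spend * ad_spend), SUM(ad_spend * revenue) FROM sales_history"
).fetchone()
slope_in = (n * sxy - sx * sy) / (n * sxx - sx * sx)
intercept_in = (sy - slope_in * sx) / n

print(f"client-side fit: revenue ~ {intercept_out:.2f} + {slope_out:.2f} * ad_spend")
print(f"in-database fit: revenue ~ {intercept_in:.2f} + {slope_in:.2f} * ad_spend")
```

Real in-database modeling libraries obviously cover far richer model types than a one-variable regression; the sketch is only meant to show why pushing the computation to the data, rather than the data to the computation, becomes attractive as tables grow.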
Basically, I think that models are increasingly:
- Richer and more diverse than before. (See, for example, Point #5 of my July predictive modeling post.)
- Developed in a more experimental and quickly-iterative way than before.
I think the first point pretty much implies the second, but the converse isn’t as clear; one can tweak old-style models in quick-turnaround fashion even more easily than one can develop the more complex newer styles.
And finally: I’m not hearing that modeling, even when it’s parallel, in-database, and fast, is commonly done on a complete many-terabyte dataset. It’s not a question I always remember to ask; for example, I didn’t bring it up with Teradata. But when I do, I rarely hear of models being trained on more than a few terabytes of data each.