I have long complained about difficulties in discussing Netezza’s TwinFin i-Class analytic platform. But I’m ready now, and in the grand sweep of the product’s history I’m not even all that late. The Netezza i-Class timing story goes something like this:
- Netezza i-Class was first foreshadowed in February, 2010.
- Netezza i-Class customer testing started in October, 2010 or so. Netezza i-Class evidently has been shipped to 4-5 partners and a single-digit number of end-user organizations, spread across some usual-suspect industries (financial services, telecom, and so on).
- Netezza i-Class 1.0 general availability is still in the (near) future.
My advice to Netezza as to how it should describe TwinFin i-Class boils down to:
1. The Netezza platform has been enhanced in two major ways:
- There’s a good way to run all kinds of analytic processes. This is very flexible and powerful, but tightly integrated with the SQL engine even so.
- You are supplying some specific high-performing, highly parallel, big-data analytic process building blocks. More precisely, you have greatly extended the set of such building blocks; you had some cool building blocks (notably Spatial) even before this.
2. There are three main ways to get at this:
- Extended SQL.
- Programming, in a bunch of languages and paradigms, integrated into the SQL.
- Partner code, with them doing the programming for you.
Some of the rah-rah words aside, that’s a pretty fair overview. Here’s more detail.
To refresh your memory: Netezza TwinFin i-Class functionality basics include, as best I can tell (and there’s some more detail at the links above):
- You can run processes in a usual-suspect set of languages on Netezza i-Class (even Fortran).
- One notable example is R; indeed, there’s an R client for talking to Netezza TwinFin.
- Netezza provides its own Hadoop implementation, which differs from standard Hadoop implementations most notably in that it manages data relationally via the usual Netezza DBMS, not in anything like HDFS.
- Anything written in any language except C/C++ (or of course SQL) — and in particular anything involving Hadoop — runs out-of-process versus the Netezza DBMS. C/C++ can run in-process, for maximum performance.*
- There’s an assortment of parallelized mathematical analytic packages built into Netezza i-Class. The matrix algebra ones are called nzMatrix. Most of the rest are part of a collection called nzAnalytics. Often these are implemented as stored procedures, as they may make multiple passes through the data.
- Netezza has thoughtfully ported thousands of analytic procedures for you to the Netezza platform (in essence, the basic R/CRAN and GNU libraries). These are not promised to be parallel on their own, but you’re welcome to invoke an instance on each node and parallelize that way.
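To make that last point concrete, here is a minimal Python sketch of the "one serial instance per node" pattern. It is not Netezza code. The `partitions` list stands in for data slices, `serial_summary` is a hypothetical stand-in for any non-parallel library routine, and a thread pool stands in for the nodes.

```python
from concurrent.futures import ThreadPoolExecutor

def serial_summary(values):
    """Hypothetical serial routine: returns (count, sum, sum of squares)."""
    return len(values), sum(values), sum(v * v for v in values)

def parallel_mean_variance(partitions):
    """Run the serial routine once per partition, then combine the partials."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        partials = list(pool.map(serial_summary, partitions))
    n = sum(p[0] for p in partials)
    total = sum(p[1] for p in partials)
    total_sq = sum(p[2] for p in partials)
    mean = total / n
    variance = total_sq / n - mean * mean  # population variance
    return mean, variance

# Made-up data, split into three slices.
partitions = [[1.0, 2.0], [3.0, 4.0], [5.0]]
mean, variance = parallel_mean_variance(partitions)
```

This only works directly when the per-partition results can be combined afterward, as counts and sums can. A routine that needs to see all the data at once would not parallelize this way.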
I forgot to check, but I’m guessing any extension of workload management to cover non-DBMS processes won’t be in the first release of Netezza i-Class.
*However, Netezza says that if you can batch requests to return even just 500-1000 records at a time, the out-of-process performance penalty — which is based on wait time for transferring data between processes — becomes insignificant.
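The batching claim is easy to sanity-check with a toy cost model: one fixed wait per inter-process transfer, plus a constant amount of work per record. The sketch below is my own illustration, and the timing constants are hypothetical rather than Netezza measurements.

```python
import math

def out_of_process_cost(n_records, batch_size, per_transfer_wait, per_record_work):
    """Total time under a simple model: one fixed wait per batch transferred,
    plus a constant amount of work per record."""
    n_transfers = math.ceil(n_records / batch_size)
    return n_transfers * per_transfer_wait + n_records * per_record_work

def overhead_fraction(n_records, batch_size, per_transfer_wait, per_record_work):
    """Share of total time spent waiting on inter-process transfers."""
    total = out_of_process_cost(n_records, batch_size, per_transfer_wait, per_record_work)
    useful = n_records * per_record_work
    return (total - useful) / total

# Hypothetical numbers, chosen only for illustration.
N = 1_000_000
WAIT = 50e-6   # assumed 50 microseconds of wait per transfer
WORK = 1e-6    # assumed 1 microsecond of work per record

fractions = {b: overhead_fraction(N, b, WAIT, WORK) for b in (1, 10, 100, 500, 1000)}
for b, f in fractions.items():
    print(f"batch={b:>4}  overhead share={f:.1%}")
```

Under these assumed constants, the transfer wait dominates at one record per transfer and drops below 5% of total time at 1000 records per batch. The real crossover depends on the actual wait and work times, which I don’t have.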
None of that is particularly new information. But after a visit to Netezza on Tuesday, I’ve finally gotten some kind of handle on how i-Class is architected. Highlights of the Netezza i-Class architecture story, as I understand them, include:
- It all starts with UDtFs — User-Defined (table) Functions, which are subject to the usual limitations.
- To overcome the standard limitations of UDtFs, Netezza built:
- A set of UDtFs that, taken together, execute command-line programs.
- For each language (Java, Python, R, etc., and I think also C/C++), a library that talks to the command-line executor. This library can talk to multiple instances of the executor, so it’s not limited to a single data stream. It can also persist past the life of a query.
- Similarly, Netezza built a C/C++ library that talks to the command-line executor and also talks MPI (Message Passing Interface).
- This has not yet been exposed outside Netezza.
- Rather, MPI is used by nzMatrix, so that nzMatrix can invert (for example) really, really big matrices.
- There are two* main ways to invoke all this.
- SQL. Any analytic process can be invoked via a SQL UDtF. Netezza tends to use the term UDAP (User-Defined Analytic Process) interchangeably for the process itself and for the SQL UDtF that encapsulates it.
- Netezza’s (interfaces to an) R client. More on that below.
- Netezza’s version of Hadoop is an important special case. The mappers and reducers you write in Hadoop are UDAPs.
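The core idea in the list above, a table function whose body hands rows to a command-line program and collects what comes back, can be sketched in a few lines of Python. This is an illustration of the pattern only, not Netezza’s API. `run_external`, `CHILD_PROGRAM`, and the slice data are all hypothetical, and the child program is a trivial stand-in.

```python
import subprocess
import sys

# Stand-in external program: reads one number per line on stdin and
# writes its square on stdout. In practice this could be any executable.
CHILD_PROGRAM = [
    sys.executable, "-c",
    "import sys\nfor line in sys.stdin:\n    print(float(line) ** 2)",
]

def run_external(rows, command=CHILD_PROGRAM):
    """Hypothetical table-function body: stream rows to a command-line
    program and return whatever rows it writes back."""
    payload = "".join(f"{row}\n" for row in rows)
    result = subprocess.run(
        command, input=payload, capture_output=True, text=True, check=True
    )
    return [float(line) for line in result.stdout.splitlines()]

# One executor instance per data slice; the slices here are made up.
slices = [[1, 2], [3, 4]]
outputs = [run_external(s) for s in slices]
```

Each call here starts a fresh child process and waits for it to finish. A library that keeps the process alive across calls, as described above, would avoid paying that startup cost on every query.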
I didn’t delve far enough into Netezza’s UDAP syntax to understand how it compares to, say, Aster’s SQL/MR.
*From a marketing standpoint, Netezza might prefer to count partner code separately as a third way, but I’m focusing on the technology here, which is used by partners and end-user organizations alike.
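As for the Hadoop point, the notion of mappers and reducers that read from and write to relational tables rather than a file system can be sketched as follows. This is a toy, with SQLite standing in for the DBMS. The `mapper`, `reducer`, and `map_reduce` names and the table layout are my assumptions, not anything from Netezza.

```python
import sqlite3
from collections import defaultdict

def mapper(row):
    """Emit (key, value) pairs for one input row: here, one per word."""
    (text,) = row
    for word in text.lower().split():
        yield word, 1

def reducer(key, values):
    """Collapse all values for one key into a single output row."""
    return key, sum(values)

def map_reduce(conn, input_sql, output_table):
    """Read rows from a table, run map/shuffle/reduce, write a table back."""
    groups = defaultdict(list)
    for row in conn.execute(input_sql):
        for key, value in mapper(row):
            groups[key].append(value)
    results = [reducer(k, v) for k, v in groups.items()]
    conn.execute(f"CREATE TABLE {output_table} (word TEXT, n INTEGER)")
    conn.executemany(f"INSERT INTO {output_table} VALUES (?, ?)", results)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (body TEXT)")
conn.executemany("INSERT INTO docs VALUES (?)", [("big data",), ("big iron",)])
map_reduce(conn, "SELECT body FROM docs", "word_counts")
counts = dict(conn.execute("SELECT word, n FROM word_counts"))
```

Because the output lands in an ordinary table, it can be joined or filtered with plain SQL afterward, which is presumably much of the appeal.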
Other Netezza/Hadoop notes include:
- Netezza has the usual kind of Cloudera partnership.
- Since Netezza’s owner IBM has a Hadoop implementation, it seems obvious there will be some partnership action with that too. But at this point it’s not so far along.
The Netezza TwinFin i-Class R story goes something like this:
- Assume you’re using R on a client. (I’m not sure whether Netezza has an R client to give or recommend to you.)
- There are three Netezza packages that change how R works, by letting it use stuff on the Netezza box.
- nzR translates between logical R memory structures and Netezza tables. In particular, nzR allows R to run, not just in-memory, but against the data on the Netezza box.
- nzMatrix lets you do R matrix algebra against the data on the Netezza box.
- nzAnalytics lets you invoke various algorithms that run on the Netezza box, against Netezza data.
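The general trick behind a package like nzR, as I understand it, is that a client-side object stands in for a database table and turns operations into work done on the server. Here is a minimal sketch of that pushdown idea, in Python rather than R and with SQLite standing in for the Netezza box. `RemoteColumn` is a hypothetical class of my own, not part of any Netezza package.

```python
import sqlite3

class RemoteColumn:
    """Hypothetical proxy for a column that lives in the database.
    Aggregates are translated to SQL rather than computed client-side."""

    def __init__(self, conn, table, column):
        self.conn, self.table, self.column = conn, table, column

    def _scalar(self, expr):
        sql = f"SELECT {expr} FROM {self.table}"
        return self.conn.execute(sql).fetchone()[0]

    def __len__(self):
        return self._scalar("COUNT(*)")

    def mean(self):
        return self._scalar(f"AVG({self.column})")

    def max(self):
        return self._scalar(f"MAX({self.column})")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?)", [(10.0,), (20.0,), (60.0,)])

amount = RemoteColumn(conn, "sales", "amount")
summary = (len(amount), amount.mean(), amount.max())
```

Only the three scalar results cross the wire. The rows themselves never leave the database, which is the whole point when the table is too big for client memory.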
A recently announced Netezza partnership with Revolution Analytics is meant to lead to Revolution replacing Netezza’s ports of R libraries with its own preferred distribution, and then supporting same.
Finally, there’s Netezza Spatial.
- Netezza claims multiple orders of magnitude of performance advantage for Netezza Spatial vs. geospatial alternatives, which is always a nice thing to be able to say.
- Generally, Netezza Spatial is now regarded as being part of i-Class.
- However, the product timing and adoption comments above don’t apply to Netezza Spatial.
- Netezza Spatial has a couple of dedicated salespeople, and seems to be well-liked by retailers.
- Netezza surely wishes everybody would forget about some of the rewrites and controversy associated with Netezza Spatial.
Perhaps there are yet more pieces of the Netezza TwinFin i-Class story I’m overlooking, but I hope I now have most of the major aspects at least partway right.