Netezza has officially announced the Netezza Developer Network. Associated with it is a set of technical capabilities that basically amount to programming user-defined functions and other logic, in C, straight onto the Netezza nodes (aka SPUs), and specifically onto the FPGAs rather than the PowerPC processors. Technically, I think what this boils down to is:
- Extending Netezza’s SQL via user-defined functions (which probably wasn’t too hard, especially since the Netezza engine is related to PostgreSQL).
- Providing a C-to-Verilog compiler.
- Providing an application development environment and associated tools. (Presumably rather primitive, but I haven’t really checked it out.)
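To make the first bullet concrete, here is a minimal sketch of what extending SQL with a user-defined function looks like in general. It uses Python's built-in sqlite3 module as a stand-in, not Netezza's actual interface (which is C on the SPUs, and which I haven't examined). The risk_score function, the accounts table, and the weights are all made up for illustration.

```python
import sqlite3

def risk_score(balance, late_payments):
    """Hypothetical scoring rule, invented purely for illustration."""
    return 0.001 * balance + 5.0 * late_payments

# In-memory SQLite database standing in for a real warehouse.
conn = sqlite3.connect(":memory:")

# Register the Python function so SQL can call it by name with 2 arguments.
conn.create_function("risk_score", 2, risk_score)

conn.execute(
    "CREATE TABLE accounts (id INTEGER, balance REAL, late_payments INTEGER)"
)
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?, ?)",
    [(1, 1000.0, 0), (2, 250.0, 3), (3, 4000.0, 1)],
)

# The UDF is usable both in the select list and in the WHERE clause.
rows = conn.execute(
    "SELECT id, risk_score(balance, late_payments) AS score "
    "FROM accounts "
    "WHERE risk_score(balance, late_payments) > 5 "
    "ORDER BY id"
).fetchall()

for account_id, score in rows:
    print(account_id, score)
```

The only point is that once the function is registered, SQL can call it like any built-in, so the filtering happens inside the engine rather than in a separate application.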
The applications mentioned in the NDN press release, and I quote directly, are:
- Multi-dimensional geospatial analytics on comprehensive data sets for risk management
- Predictive model scoring for customer segmentation, enabling real-time offer provisioning for customers
- Iterative modeling and analytics on billions of call detail records (CDRs) for telco price optimization
- Real-time Monte Carlo simulations on terabytes of detail-level data for risk management
- “Fingerprinting” with hashing algorithms for chain-of-custody document fingerprinting and to ensure that files transferred are intact
- Fuzzy text search analysis uses algorithms that provide a “best guess” of most likely results
Netezza says that the greatest interest has come from usual-suspect sophisticated users, specifically intelligence agencies and perhaps also financial services firms. But naturally, the partners actually trotted out at Netezza’s user conference were mainly hopeful small-company ISVs. The biggest stir was made by not-so-small SAS, which evidently believes this new capability will provide massive improvements to SAS/Netezza combined performance.
In principle, there are four different ways this new programmability could be a big win:
- Code might just run faster on FPGAs — or on an MPP system in general — than on standard processors. I don’t currently have an opinion as to whether this situation is likely to arise in practice to any significant degree. (Note to self: Talk with one or both of Netezza partners SAS and SPSS on this subject soon.)
- A communication bottleneck is eliminated. Today, query result sets have to be sent to an application box via gigabit Ethernet (or whatever) to be processed. I’m sure that’s a biggie. Rival vendors, who run on (more) standard hardware, have this problem to a much lesser extent.
- Network traffic internal to the appliance is also reduced, as data can be massaged right on the node rather than shipped off for processing elsewhere. For some kinds of applications, such as scoring or certain kinds of data reduction, this is surely a big deal. Once again, other MPP data warehouse specialists can and should offer such capabilities too.
- Non-tabular datatypes can now be supported. E.g., there are small outfits offering XML and geospatial support, and Netezza has done some internal work to show off its ability to store and load images. I’ll say more about this in another post, not necessarily tonight.
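The second and third bullets are easy to illustrate with a toy model. The sketch below is plain Python, not Netezza code: the partitions stand in for SPUs, and the score function, field names, and threshold are all invented. It counts how many items cross the network when every row is shipped to a host for scoring, versus when each node scores its own rows and ships only a partial count.

```python
import random

def score(row):
    """Hypothetical linear model; the weights are invented for illustration."""
    return 0.5 * row["calls"] + 2.0 * row["complaints"]

def make_partitions(num_nodes, rows_per_node, seed=0):
    """Build one list of synthetic rows per node (stand-ins for SPUs)."""
    rng = random.Random(seed)
    return [
        [
            {"calls": rng.randint(0, 100), "complaints": rng.randint(0, 5)}
            for _ in range(rows_per_node)
        ]
        for _ in range(num_nodes)
    ]

def host_side(partitions, threshold):
    """Ship every row to the host, then score and count there."""
    shipped = 0
    high = 0
    for part in partitions:
        for row in part:
            shipped += 1  # each raw row crosses the network
            if score(row) > threshold:
                high += 1
    return high, shipped

def node_side(partitions, threshold):
    """Score on each node and ship only one partial count per node."""
    shipped = 0
    high = 0
    for part in partitions:
        local = sum(1 for row in part if score(row) > threshold)
        shipped += 1  # a single number leaves the node
        high += local
    return high, shipped

partitions = make_partitions(num_nodes=4, rows_per_node=1000)
host_result = host_side(partitions, threshold=30)
node_result = node_side(partitions, threshold=30)
print("host-side: high scorers, items shipped =", host_result)
print("node-side: high scorers, items shipped =", node_result)
```

Both approaches give the same answer, but the first moves 4,000 rows and the second moves 4 numbers. Real workloads won't reduce that neatly, of course, but scoring and data reduction are exactly the cases where the gap is large.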