Netezza is about to make its biggest product announcement in years. In particular:
- Netezza is cutting prices to under $20K/terabyte of user data, with even lower numbers promised for the near future.
- Netezza is replacing its PowerPC chips with Intel-based IBM blades.
- There will be substantial changes in how data flows between the various parts of a Netezza node.
- Netezza claims this will all produce an immediate 10-15X increase in price-performance, based on a 3X cut in price/terabyte and a 3-5X improvement in mixed-workload performance. (Edit: Netezza now agrees that it shouldn't have phrased things that way.)
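The headline figure comes from multiplying the two factors. This sketch assumes, as the claim implicitly does, that the price cut and the speedup compound independently on the same workload. The function name is hypothetical.

```python
def price_performance_gain(price_cut: float, perf_gain: float) -> float:
    """Combined improvement in performance per dollar.

    A price_cut-fold drop in $/TB buys price_cut times as much capacity per
    dollar; a perf_gain-fold speedup multiplies on top of that.
    """
    return price_cut * perf_gain


# Inputs as stated: ~3X lower price/terabyte, 3-5X better mixed-workload performance.
low = price_performance_gain(3, 3)
high = price_performance_gain(3, 5)
print(f"{low:g}X to {high:g}X")  # prints "9X to 15X"
```

Multiplying the endpoints gives 9-15X, close to the quoted 10-15X range.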
Allow me to explain.
For months, it has been an increasingly open secret that Netezza was planning a major refresh of its product line. As signaled by a blog post from Netezza’s product marketing VP Phil Francisco, many of the details are finally fit to post.*
*A couple more will be revealed next week, and a longer-term roadmap will be laid out during Netezza’s conference tour in September. (By the way, yours truly will be keynoting the Boston, Chicago, San Francisco, Washington, London, and Milan iterations of same. Come by and say hi!)
- Netezza will very soon announce a new product family that replaces its current NPS one. Naturally, NPS equipment and spare parts will be offered for quite a while to the installed base. (Edit: Netezza is calling this family TwinFin. This post has been edited to reflect that name.)
- Netezza TwinFin will have a list price slightly under $20K/terabyte of user data.
- Netezza TwinFin will scale to somewhat over 700 terabytes.
- These figures are based on 2.25X compression, in the middle of what Netezza says is the actual typical range of 2-2.5X. Netezza says that 4X compression is on the roadmap.
- Netezza will also soon unveil a roadmap for several other product families — high capacity (i.e., super-low price/terabyte), higher performance (i.e., higher throughput and not-so-low price/terabyte), and low-end/starter.
- All of Netezza’s new product families will be based on the same software, general hardware architecture, etc. The biggest differences will lie in the mix of different kinds of parts (e.g., CPU vs. disk).
- Netezza also plans to unveil a roadmap for further software enhancements, some of which (e.g., better compression) it believes will lead to further strong gains in price/performance.
- As always, Netezza’s architecture will rely strongly on FPGAs (Field-Programmable Gate Arrays).
- However, Netezza has now decided that conventional Intel-based boards are a better companion to the FPGAs than its currently-used PowerPC chips.
- Obvious implications of Netezza's move to Intel CPUs include:
  - Pretty much any kind of software that runs on a data warehouse appliance can be built on or ported to Netezza, if it's not there already.
  - In some cases, analytic performance will be greatly improved (Netezza says 100X with a straight face, although that's far from being an across-the-board claim).
- In case anybody cares, Netezza’s new systems will run on Linux top to bottom. Previously, Netezza nodes ran Nucleus.
- Netezza claims load speeds in the new family fast enough for most purposes — “2+ terabytes/hour”. But — unlike query processing — loading does depend on the head node. Netezza says it has a roadmap to change that and go to true parallel load.
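The capacity, price, compression and load figures above combine into some quick back-of-envelope numbers. This is a minimal sketch that treats the ~700 TB ceiling as user data (as the compression note implies), uses $20K/TB and 2 TB/hour as bounds since the quoted figures are "slightly under" and "2+", and assumes hardware price stays fixed when compression changes. The class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TwinFinSizing:
    """Back-of-envelope sizing from the quoted figures (hypothetical helper)."""
    user_tb: float                       # uncompressed user data, in terabytes
    compression: float = 2.25            # quoted basis; typical range 2-2.5X
    price_per_user_tb: float = 20_000.0  # ceiling: list price is "slightly under" this
    load_tb_per_hour: float = 2.0        # floor of the "2+ terabytes/hour" claim

    def stored_tb(self) -> float:
        # Disk space consumed after compression.
        return self.user_tb / self.compression

    def list_price_ceiling(self) -> float:
        # Upper bound on list price, since $/TB is quoted as slightly under $20K.
        return self.user_tb * self.price_per_user_tb

    def load_hours_ceiling(self) -> float:
        # Upper bound on a full load through the head node at the quoted rate.
        return self.user_tb / self.load_tb_per_hour

    def price_per_user_tb_at(self, ratio: float) -> float:
        # Same hardware, different compression: $/user-TB scales inversely with ratio.
        return self.price_per_user_tb * self.compression / ratio


full = TwinFinSizing(user_tb=700)
print(round(full.stored_tb(), 1))     # 311.1
print(full.list_price_ceiling())      # 14000000.0
print(full.load_hours_ceiling())      # 350.0
print(full.price_per_user_tb_at(4))   # 11250.0
```

The last line shows why better compression (the 4X on Netezza's roadmap) translates directly into lower price per terabyte of user data, if the hardware price holds steady.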
Beyond the switcheroo in components, Netezza is making substantial changes to its hardware architecture. In current Netezza products, the FPGA plays the role of a disk controller on steroids — it receives data, does some SQL or other analytic operations on it, and then throws it over the wall to the CPU for the rest of the processing. Netezza TwinFin, however, adds an actual disk controller. More important, it adds fast interconnects between the FPGAs, the disk controller, and RAM — specifically, as Phil Francisco put it in an email,
using multiple parallel channels of PCIe with much faster interconnection rates and lower contention between the blade server and the “DB accelerator card” with the FPGAs.
DMA (Direct Memory Access) technology also fits into the picture somehow.
Given faster interconnects, as well as faster CPUs, Netezza has changed its basic data path. Previously, data went from disk to the FPGAs (where it was filtered) to RAM, and from that point perhaps to the CPU for more processing. Now, however, data goes from disk to disk controller to RAM, and only then to the FPGAs.
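One way to see why the reordering matters: in the old path, what lands in RAM has already been filtered for one query, so it is of little use to the next; in the new path, RAM holds raw blocks that any later query can reuse. The toy simulation below treats the old path as simply uncacheable, a deliberate simplification. Stage names, block layout and functions are hypothetical; the post does not describe Netezza's actual caching rules.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Row = int
Block = List[Row]

# The two data paths described above, as ordered stage names.
OLD_PATH = ["disk", "fpga", "ram", "cpu"]
NEW_PATH = ["disk", "disk_controller", "ram", "fpga", "cpu"]


def ram_holds_raw_blocks(path: Sequence[str]) -> bool:
    # If data reaches RAM before the FPGA filter, memory holds raw,
    # query-independent blocks that later queries can reuse.
    return path.index("ram") < path.index("fpga")


def run_queries(path: Sequence[str], table: List[Block],
                predicates: List[Callable[[Row], bool]]) -> Tuple[int, List[List[Row]]]:
    """Count disk block reads for a series of filter queries over one table."""
    cacheable = ram_holds_raw_blocks(path)
    ram: Dict[int, Block] = {}  # unbounded here; capacity is ignored for clarity
    disk_reads = 0
    results = []
    for pred in predicates:
        out: List[Row] = []
        for block_id, block in enumerate(table):
            if cacheable and block_id in ram:
                raw = ram[block_id]          # served from memory
            else:
                raw = block                  # read from disk
                disk_reads += 1
                if cacheable:
                    ram[block_id] = raw
            out.extend(r for r in raw if pred(r))  # the FPGA-style filter step
        results.append(out)
    return disk_reads, results


table = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
queries = [lambda r: r % 2 == 0, lambda r: r > 9]
print(run_queries(OLD_PATH, table, queries)[0])  # 8
print(run_queries(NEW_PATH, table, queries)[0])  # 4
```

Both paths return the same query results; only the number of trips to disk differs.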
The big win here — beyond the usual benefits of standard CPUs — is that Netezza now has a viable cache. Apparently, Netezza’s current product line doesn’t even cache the most heavily reused tables, such as those storing small dimensions or Netezza’s zone map information. Netezza TwinFin will be able to cache those and more, with a default of 256 megabytes of cache per core, and the ability to grow up to 1 gigabyte if needed or desired.
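The post gives only the cache sizes (256 megabytes per core by default, up to 1 gigabyte). The sketch below shows how a per-core cache of that size could keep small dimension tables and zone maps resident while a large fact-table scan streams past. Least-recently-used eviction, the class and key names, and the use of binary megabytes are all assumptions for illustration, not Netezza's documented design.

```python
from collections import OrderedDict

MB = 1024 * 1024
DEFAULT_CACHE_BYTES = 256 * MB   # per-core default quoted above
MAX_CACHE_BYTES = 1024 * MB      # per-core ceiling quoted above (1 gigabyte)


class CoreCache:
    """Per-core block cache sized in bytes; LRU eviction is an assumption."""

    def __init__(self, capacity_bytes: int = DEFAULT_CACHE_BYTES) -> None:
        self._check(capacity_bytes)
        self.capacity = capacity_bytes
        self.used = 0
        self._items: "OrderedDict[str, int]" = OrderedDict()  # key -> size, oldest first

    @staticmethod
    def _check(capacity_bytes: int) -> None:
        if not 0 < capacity_bytes <= MAX_CACHE_BYTES:
            raise ValueError("per-core cache must be between 1 byte and 1 GB")

    def _evict_until(self, limit: int) -> None:
        # Drop least recently used entries until usage fits within limit.
        while self.used > limit:
            _, size = self._items.popitem(last=False)
            self.used -= size

    def get(self, key: str) -> bool:
        """Return True on a hit and mark the entry as most recently used."""
        if key not in self._items:
            return False
        self._items.move_to_end(key)
        return True

    def put(self, key: str, size: int) -> None:
        if key in self._items:
            self.used -= self._items.pop(key)
        if size > self.capacity:
            return  # larger than the whole cache: stream it, don't cache it
        self._evict_until(self.capacity - size)
        self._items[key] = size
        self.used += size

    def resize(self, capacity_bytes: int) -> None:
        """Grow (or shrink) the cache, evicting if it shrinks below current use."""
        self._check(capacity_bytes)
        self.capacity = capacity_bytes
        self._evict_until(capacity_bytes)


cache = CoreCache()
cache.put("dim_store", 10 * MB)     # small dimension table
cache.put("zone_maps", 2 * MB)      # zone map metadata
cache.put("fact_sales", 500 * MB)   # too big for 256 MB: not cached
print(cache.get("dim_store"), cache.get("zone_maps"), cache.get("fact_sales"))
# prints "True True False"
cache.resize(MAX_CACHE_BYTES)       # grow to the 1 GB ceiling
cache.put("fact_sales", 500 * MB)
print(cache.get("fact_sales"), cache.used // MB)  # prints "True 512"
```

Sizing eviction by bytes rather than by entry count matters here because the items worth caching (dimensions, zone maps) are small, while the items worth streaming (fact-table scans) are large.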
- Phil Francisco posted again, considerably clarifying Netezza’s hardware architecture and somewhat disputing my perception that it’s such a big change.
- Earlier, Netezza put together some of the better vendor slides I’ve seen in a while, and I got permission to post a few — mainly some very clear pictures (photos and architectural diagrams), plus a couple of tables that tease apart the claim of 3-5X overall performance increase.
- Here’s the official page for the IBM BladeCenter HS21 technology Netezza will use.
- Industry impact: “The Netezza Price Point” just changed.
- Here's a follow-up post clarifying Netezza TwinFin's FPGA use.