What does Netezza do in the FPGAs anyway, and other questions
The news of Netezza’s new TwinFin product family has generated a lot of comments and questions, some pretty reasonable, some quite silly. E.g., I’ve seen it suggested privately or publicly that
- Netezza’s older products only handle one query at a time (nonsense, and I’m going to loyally protect the identity of the person who emailed that odd suggestion to me)
- A Netezza node can be a single point of failure (also nonsense, although performance degradation from a node failure might be considerable)
- Netezza has a cache consistency problem (also hardly true, except insofar as it’s an issue to overcome in future development as Netezza moves toward parallelizing bulk loads, transactional updates, and/or trickle feeds).
Netezza’s Phil Francisco addressed some points of this nature in a recent blog post.
More reasonable is the question:
Now that Netezza has changed its architecture, what are all those FPGAs (Field-Programmable Gate Arrays) being used for anyway?
The short answer is:
Almost everything they were used for before, except they aren’t substituting for the disk controller any more.
The longer answer is:
- Projections
- Restrictions/selections
- “Visibility,” which for now seems to mean recognizing which rows are and aren’t valid under Netezza’s form of MVCC (MultiVersion Concurrency Control). (A toy sketch of these first three functions appears after this list.)
- Compression and/or decompression (I’m a little confused as to which, but I imagine it’s both)
- Netezza’s form of UDFs (User-Defined Functions)
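To make those first three items concrete, here’s a minimal C sketch of the per-row work being described: stream rows past a snapshot-style MVCC visibility check, apply a restriction predicate, and project out only the needed columns. The row layout, field names, and Postgres-style transaction-id test are my own illustrative assumptions, not Netezza’s actual on-disk format or FPGA logic.

```c
/* Toy sketch of projection, restriction, and visibility as
   streaming per-row operations. Purely illustrative. */
#include <stdio.h>
#include <stdint.h>

typedef struct {
    uint64_t created_txid;  /* transaction that inserted this row version */
    uint64_t deleted_txid;  /* transaction that deleted it; 0 = still live */
    int32_t  customer_id;   /* example payload columns */
    int32_t  region;
    double   revenue;
} Row;

/* "Visibility": is this row version valid under our snapshot? */
static int visible(const Row *r, uint64_t snapshot_txid) {
    return r->created_txid <= snapshot_txid &&
           (r->deleted_txid == 0 || r->deleted_txid > snapshot_txid);
}

/* "Restriction": the WHERE clause, e.g. WHERE region = 7 */
static int passes_restriction(const Row *r) {
    return r->region == 7;
}

int main(void) {
    Row stream[] = {
        {10, 0,  1, 7, 99.5},
        {12, 0,  2, 3, 10.0},  /* fails the restriction */
        {10, 11, 3, 7, 42.0},  /* deleted before our snapshot */
    };
    uint64_t snapshot = 15;

    for (size_t i = 0; i < sizeof stream / sizeof stream[0]; i++) {
        const Row *r = &stream[i];
        if (!visible(r, snapshot) || !passes_restriction(r))
            continue;
        /* "Projection": emit only customer_id and revenue */
        printf("%d,%.2f\n", r->customer_id, r->revenue);
    }
    return 0;
}
```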
Under NDA, Phil told me that one more item from the list on Page 11 of this two-year-old Netezza white paper explaining its FPGA use is coming soon.
Related links
- A recent discussion of the use of FPGAs for SQL operations in a post and comment thread around XtremeData’s product launch
- A January 2008 post by Phil Francisco about Netezza’s FPGA use, based on the white paper linked above (I reposted it because Netezza’s own link is broken)
- Daniel Abadi tore into Netezza’s critics, but also opined that Netezza oversold the significance of the TwinFin announcement
Comments
Interesting. It seems the use of specialized hardware for processing large volumes of data is becoming a reality. I think it’s a smarter solution than stringing together lots of cheap commodity PCs, as in MapReduce.
No one really talks about it, but FPGAs also draw much less power than high-end server CPUs: 10-30 watts vs. 100-130 watts.
For example, our (XtremeData) dbX appliance replaces a CPU with our “SQL In Silicon” (FPGA) in-socket accelerator, removing 90 watts per server. That’s 16 nodes × 90 watts per rack, or 1.44 kW per rack.
Thus, I’m sure the TwinFin S-Blade sees a real power/cooling benefit over a dual-socket CPU-only blade.
> Compression and/or decompression (I’m a little
> confused as to which, but I imagine it’s both)
Curt,
It was Netezza’s marketing group that opted for euphony over precision in labeling the block of logic performing decompression the “Compress Engine”. Notice how they carefully avoided calling it the “Compression Engine” or the “Decompression Engine”. The ensuing confusion was entirely predictable.
Netezza has always focused on scan-mostly activities. Thus it makes sense to invest FPGA engineers and gates in decompression. All the more so since decompression is deterministic — the bit stream expresses exactly what to do.
By contrast, compression is a much more challenging activity. Nearly any compression scheme offers multiple valid encodings of the same data. Sometimes making good choices requires look-ahead or at least back-patching. That sort of thing is _really_ hard to implement in an FPGA. Netezza’s experience is that, carefully implemented, software compression has minimal impact on CPU utilization.
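The asymmetry shows up even in the most trivial scheme, run-length coding: the decoder simply obeys the stream, while the encoder must scan ahead to choose run boundaries. A toy C sketch to illustrate — my own example, nothing to do with Netezza’s actual Compress Engine:

```c
/* Run-length coding as a toy illustration: decoding is a pure,
   deterministic interpretation of the stream; encoding must look
   ahead to find each run's end. Purely illustrative. */
#include <stdio.h>

/* Decode (count, byte) pairs: no choices to make. */
static size_t rle_decode(const unsigned char *in, size_t n,
                         unsigned char *out) {
    size_t o = 0;
    for (size_t i = 0; i + 1 < n; i += 2)
        for (unsigned char k = 0; k < in[i]; k++)
            out[o++] = in[i + 1];
    return o;
}

/* Encode: must scan ahead to decide where each run ends --
   the simplest possible form of the look-ahead described above. */
static size_t rle_encode(const unsigned char *in, size_t n,
                         unsigned char *out) {
    size_t o = 0;
    for (size_t i = 0; i < n; ) {
        size_t run = 1;
        while (i + run < n && in[i + run] == in[i] && run < 255)
            run++;
        out[o++] = (unsigned char)run;
        out[o++] = in[i];
        i += run;
    }
    return o;
}

int main(void) {
    unsigned char data[] = "aaabbbbcc";
    unsigned char enc[32], dec[32];
    size_t en = rle_encode(data, sizeof data - 1, enc);
    size_t dn = rle_decode(enc, en, dec);
    printf("%zu bytes -> %zu encoded -> %zu decoded\n",
           sizeof data - 1, en, dn);
    return 0;
}
```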
Finally, let me close with an historical analogy from the world of computer graphics. The earliest graphical workstations had only a bit-mapped frame buffer manipulated in a high-level language (e.g. Xerox Alto + Smalltalk). The drive for increasing graphics performance led successively to carefully crafted assembly language (e.g. the Mac toolkit), pixel interpolation hardware, and ultimately full graphics pipelines of ever-increasing complexity. Yet today does anyone argue against “proprietary” graphics hardware? Could it be that hardware support for processing database fields is as inevitable as hardware support for generating pixels?
/john