June 21, 2010

A partial overview of Netezza database software technology

Netezza is having its user conference Enzee Universe in Boston Monday–Wednesday, June 21-23, and naturally will be announcing new products there, and otherwise providing hooks and inducements to get itself written about. (The preliminary count is seven press releases in all.) To get a head start, I stopped by Netezza Thursday for meetings that included a 3½-hour session with 10 or so senior engineers, and have exchanged some clarifying emails since.

It might be best to start with some Netezza product introduction and naming housekeeping:

Highlights of our NPS 6.0 conversation include:

The basic idea of clustered base tables (“base tables” are ones that are not, for example, materialized views) is to range partition in multiple dimensions at once. Then you rule out (as in don’t retrieve) all those blocks that fail a match in any one of the cluster dimensions. Netezza says its customers were doing a lot of work to simulate this benefit by multiple sorts; Netezza’s implementation will now handle that much more automatically. Netezza says that talking to customers revealed that 4-5 cluster dimensions was almost always the most anybody would need; they will ship support for 4. That makes sense. In most cases, you’d want to cluster on the answers to “W” questions (Who, What, Where, When, but probably not Why), in one dimension each. However, Netezza does call out geospatial as an ideal use case, precisely because 2 (or more rarely 3) dimensions each have “equal weight.”
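To make the block-elimination idea concrete, here is a minimal Python sketch. It assumes each storage block records an inclusive (min, max) range per cluster dimension, which is the usual zone-map idea. The `Block` class, the dimension names, and the sample values are all hypothetical, and nothing here describes Netezza's actual storage format.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Range = Tuple[int, int]  # inclusive (low, high)


@dataclass
class Block:
    """Hypothetical storage block with per-dimension min/max metadata."""
    block_id: int
    zone_map: Dict[str, Range]


def may_match(block: Block, predicate: Dict[str, Range]) -> bool:
    """Return False as soon as any one dimension rules the block out."""
    for dim, (lo, hi) in predicate.items():
        block_lo, block_hi = block.zone_map[dim]
        if block_hi < lo or block_lo > hi:
            return False  # no overlap in this dimension, so skip the block
    return True


def blocks_to_scan(blocks: List[Block], predicate: Dict[str, Range]) -> List[int]:
    """IDs of the blocks that cannot be ruled out and must be read."""
    return [b.block_id for b in blocks if may_match(b, predicate)]


# Four blocks laid out on a 2 x 2 grid of (customer_id, day) ranges.
blocks = [
    Block(0, {"customer_id": (1, 100), "day": (1, 10)}),
    Block(1, {"customer_id": (1, 100), "day": (11, 20)}),
    Block(2, {"customer_id": (101, 200), "day": (1, 10)}),
    Block(3, {"customer_id": (101, 200), "day": (11, 20)}),
]
query = {"customer_id": (150, 160), "day": (12, 14)}
print(blocks_to_scan(blocks, query))
```

With the data clustered on both dimensions, a predicate on either one (or both) eliminates blocks; a table sorted on only one column would get that benefit for that column alone.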

I don’t know how other vendors implement clustered base tables, but in Netezza’s case it’s via a space-filling curve. (Actually, they called it a “Hilbert space-filling curve,” but I oppose that phrasing, as it’s apt to lead to extremely incorrect use of the term “Hilbert space.”) That is, each row is mapped to a tuple (a 4-tuple, say) of its values in the cluster dimensions, and the tuples are then sorted in the linear order defined by a space-filling curve. Happily, Netezza hasn’t experienced problems clustering columns that have particularly challenging cardinality or skew.
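As an illustration of sorting by a space-filling curve, the sketch below uses the Hilbert curve in just 2 dimensions, with the well-known iterative conversion from a grid cell (x, y) to its distance along the curve. It assumes both dimensions have already been bucketed into integers in the range 0 to n-1, with n a power of two. The function name and sample points are hypothetical, and Netezza did not describe its own mapping (over up to 4 dimensions) in this detail.

```python
def hilbert_index(n: int, x: int, y: int) -> int:
    """Distance of cell (x, y) along a Hilbert curve over an n-by-n grid.

    n must be a power of two; x and y must be in range(n).
    """
    d = 0
    s = n // 2
    while s > 0:
        # Which quadrant of the current sub-square is the cell in?
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip so the next-level sub-curve has the standard orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


# Sort a few hypothetical 2-dimensional keys into curve order.
rows = [(3, 0), (0, 0), (1, 1), (0, 3), (2, 2)]
clustered = sorted(rows, key=lambda p: hilbert_index(4, p[0], p[1]))
print(clustered)
```

The property that matters for clustering is locality: cells that are consecutive along the curve are always neighbors in the grid, so rows stored near each other tend to be close in every dimension at once.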

If I understood correctly, you can only zone map (and presumably cluster) on integers and dates right now, but that will change soon. (Edit: In blog comments and email, Tim Greenwood of Netezza explained to me that the NPS 6.0 workarounds to that were much more robust than I realized.)

Netezza put a lot of work for NPS 6 into something it calls “table grooming,” which amounts to recopying tables in more beneficial form. Uses for table grooming – which is a manually initiated process – include but probably aren’t limited to:

The core ideas of table grooming include:

This can be done part of a table at a time. Reads and loads are unaffected by the process, or at least not blocked. Delete commits are indeed blocked during a reorg, but Netezza estimates that the blocking lasts a few minutes during the grooming of a clustered base table, 10-15 seconds if space is being reclaimed, and something similar for an Alter Table.

And finally, here are some notes on Netezza’s query optimization and planning.

Comments

15 Responses to “A partial overview of Netezza database software technology”

  1. Netezza’s silicon balance | DBMS2 -- DataBase Management System Services on June 21st, 2010 8:00 am

    [...] I’ve mentioned in a couple of other posts, Netezza is stressing that the most recent wave of its technology is software-only, with no hardware upgrades made or needed. In other words, [...]

  2. The Netezza and IBM DB2 approaches to compression | DBMS2 -- DataBase Management System Services on June 21st, 2010 8:05 am

    [...] I spent 3 ½ hours talking with 10 of Netezza’s more senior engineers. Friday, I talked for 1 ½ hours with IBM Fellow and DB2 Chief Architect Tim Vincent, and we agreed [...]

  3. What kinds of data warehouse load latency are practical? | DBMS2 -- DataBase Management System Services on June 21st, 2010 8:15 am

    [...] took advantage of my recent conversations with Netezza and IBM to discuss what kinds of data warehouse load latency were practical. In both cases I got [...]

  4. Notes on a spate of Netezza-related blog posts | DBMS2 -- DataBase Management System Services on June 21st, 2010 8:16 am

    [...] A long discussion of Netezza’s technology, focusing on the database parts [...]

  5. Tim Greenwood on June 21st, 2010 10:42 am

    You wrote “If I understood correctly, you can only zone map (and presumably cluster) on integers and dates right now, but that will change soon.”
    Integers and dates are zone mapped by default, but you can zone map, and cluster on all data types except for numeric of size > 18.

  6. Curt Monash on June 21st, 2010 10:54 am

    @Tim,

    I thought that functionality didn’t make it out of QA for NPS 6.0. Did I misunderstand?

  7. Tim Greenwood on June 21st, 2010 11:24 am

    In NPS 6.0 you can cluster (organize on) all datatypes except for numeric of size > 18. These types are zone mapped by including them in the ORDER BY clause of CREATE MATERIALIZED VIEW

  8. Curt Monash on June 21st, 2010 12:52 pm

    @Tim,

    Thanks!

    Next question — why do zone maps have anything to do with materialized views?

    CAM

  9. Daniel Lemire on June 22nd, 2010 9:16 pm

    There are many “space filling curves” of which “Hilbert space filling curves” are a particular instance.

  10. Daniel Lemire on June 22nd, 2010 9:21 pm

    How do “clustered tables” differ from DB2’s multidimensional clustering?

  11. Curt Monash on June 23rd, 2010 2:03 am

    Daniel,

    I have just begun to look into DB2 in detail. I haven’t gotten to the clustered tables part.

  12. Lots of Aster Data analytic packages | DBMS2 -- DataBase Management System Services on June 27th, 2010 7:35 am

    [...] Netezza (user conference) [...]

  13. M-A-O-L » Netezza Updates on July 16th, 2010 3:14 am

    [...] A partial overview of Netezza database software technology [...]

  14. It can be hard to analyze analytics | DBMS 2 : DataBase Management System Services on October 10th, 2010 11:14 am

    [...] TwinFin i-Class was renamed/repackaged/repriced before it ever shipped. Even so, when Tim Young or Phil Francisco tries to recall exactly the [...]

  15. EMC/Greenplum notes | DBMS 2 : DataBase Management System Services on October 13th, 2010 12:30 am

    [...] of Netezza’s nzMatrix, Greenplum has built in some linear algebra capabilities as a building block for analytics. In [...]
