For various reasons, I’m not going to try to give a comprehensive overview of the Netezza story. But I’d like to highlight four points that illustrate much of the difference between Netezza’s architecture and that of more conventional data warehousing DBMSs.
- It’s all about sequential access. Netezza data is stored in “extents” 3 megabytes in size. DATallegro does something similar.
- There is very little indexing in Netezza systems. Indeed, they say 98-99% of processing is via hash joins. Much the same is true of DATallegro.
- Netezza’s idea of “materialized views” is much more limited than the state of the art. Netezza has something it calls a “materialized view,” but that’s only a restriction/projection of a single table. No pre-joins, no aggregates. They’re confident they can outperform conventional systems without those aids, and they want to keep their database structures SIMPLE.
- Netezza’s substitute for range partitioning is very simple. Netezza features “zone maps,” which note the minimum and maximum value of each column (if such concepts are meaningful) in each extent. This can amount to effective range partitioning over dates; if data is added over time, there’s a good chance that the data in any particular date range is clustered, and a zone map lets you pick out which extents hold data in the desired date range. But that seems to be the primary scenario in which zone maps confer a large benefit.
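
To make the zone map point concrete, here is a minimal Python sketch of the idea, not Netezza's actual implementation. Everything in it is an assumption for illustration: the names `Extent`, `build_extents`, and `scan_date_range` are hypothetical, extents are sized by row count rather than 3 megabytes, and only a single date column is tracked.

```python
from dataclasses import dataclass
from datetime import date, timedelta

EXTENT_ROWS = 4  # hypothetical: rows per extent (real extents are sized in bytes)


@dataclass
class Extent:
    rows: list      # (order_date, amount) tuples stored in this extent
    min_date: date  # zone map entry: smallest date in the extent
    max_date: date  # zone map entry: largest date in the extent


def build_extents(rows, extent_rows=EXTENT_ROWS):
    """Pack rows into extents in arrival order, recording min/max per extent."""
    extents = []
    for i in range(0, len(rows), extent_rows):
        chunk = rows[i:i + extent_rows]
        dates = [r[0] for r in chunk]
        extents.append(Extent(chunk, min(dates), max(dates)))
    return extents


def scan_date_range(extents, lo, hi):
    """Return matching rows and the number of extents actually read."""
    matches, extents_read = [], 0
    for ext in extents:
        # The zone map says no row in this extent can qualify: skip it unread.
        if ext.max_date < lo or ext.min_date > hi:
            continue
        extents_read += 1
        matches.extend(r for r in ext.rows if lo <= r[0] <= hi)
    return matches, extents_read


# Sixteen rows, one per day, loaded in date order (the favorable case).
rows = [(date(2006, 1, 1) + timedelta(days=d), d) for d in range(16)]
lo, hi = date(2006, 1, 6), date(2006, 1, 7)

clustered = build_extents(rows)
clustered_matches, clustered_reads = scan_date_range(clustered, lo, hi)

# The same rows interleaved so that every extent spans nearly the whole range.
scattered = build_extents(sorted(rows, key=lambda r: r[1] % 4))
scattered_matches, scattered_reads = scan_date_range(scattered, lo, hi)

print(clustered_reads, scattered_reads)
```

In the date-ordered load, the query touches one extent out of four. With the same rows interleaved, every extent's minimum and maximum straddle the query range, so nothing can be skipped and the zone map buys nothing. That is the sense in which zone maps mainly help when the data happens to be clustered on the column being filtered.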