Logless, lockless Netezza more carefully explained
I talked at length with Bill Blake and Doug Johnson of Netezza today. (Bill is exactly the guy I previously complained about losing access to.) One takeaway was a clarification of their approach to transactions, which sounds even cooler than I first thought. It’s actually not a new idea; they just timestamp rows with CreateIDs and DeleteIDs, then exploit those to the hilt. Actually, it seems like this approach would be interesting in OLTP as well, although I’m not aware of it being used in any of the more successful OLTP DBMSs. (Yes, this is an open invitation to fans of less-established DBMS products to tell me of their virtues, preferably in a flame-free manner.)
Here, as best I currently understand it, is how Netezza handles several of the functions one would expect from a transactional DBMS.
Concurrency, serializability, read repeatability, and locking. A query only returns rows that had been committed before the query began. A “visibility list” of updates in progress while the query is underway is maintained so that any contending results can be filtered out (I presume this is a performance feature). There’s no locking code whatsoever.
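To make the visibility rule concrete, here is a minimal Python sketch of how a scan might filter rows by CreateID and DeleteID. It is my own illustration, not Netezza's code: the names `Row`, `committed_before`, `is_visible`, and `scan` are hypothetical, and I am assuming that IDs are monotonically increasing transaction numbers and that the visibility list is simply a set of IDs still in progress.

```python
from dataclasses import dataclass
from typing import Optional, Set, List


@dataclass
class Row:
    value: str
    create_id: int                    # ID of the transaction that inserted the row
    delete_id: Optional[int] = None   # ID of the transaction that deleted it, if any


def committed_before(txn_id: int, query_id: int, in_progress: Set[int]) -> bool:
    """Assumption: a transaction counts as committed for this query if its ID
    is older than the query's ID and it is not on the visibility list."""
    return txn_id < query_id and txn_id not in in_progress


def is_visible(row: Row, query_id: int, in_progress: Set[int]) -> bool:
    # The insert must have been committed before the query began.
    if not committed_before(row.create_id, query_id, in_progress):
        return False
    # A delete hides the row only if it, too, was committed before the query.
    if row.delete_id is not None and committed_before(
        row.delete_id, query_id, in_progress
    ):
        return False
    return True


def scan(rows: List[Row], query_id: int, in_progress: Set[int]) -> List[str]:
    """Return the values a query would see; no locks are taken anywhere."""
    return [r.value for r in rows if is_visible(r, query_id, in_progress)]
```

Note that readers never block writers in this sketch: a row deleted by a still-running transaction simply stays visible to the query.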
I forgot to ask what happens when two transactions try to update the same row, but I presume it’s a standard two-phase commit, and that there’s no need to get particularly good performance in such cases because the situation almost never arises.
Rollback, rollforward, logging. There’s no log. Rollback happens by slapping DeleteIDs on any row that has too recent a CreateID.
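A rough Python sketch of that logless rollback follows. The function names and the dict-based row layout are hypothetical. Two details are my assumptions rather than anything Netezza told me: that a rolled-back row is stamped with its own CreateID as its DeleteID, and that DeleteIDs newer than the rollback point are cleared so that recently deleted rows come back.

```python
from typing import Dict, List, Optional

RowDict = Dict[str, Optional[object]]


def rollback_to(rows: List[RowDict], last_good_id: int) -> List[RowDict]:
    """Undo everything newer than last_good_id without consulting a log."""
    for row in rows:
        if row["create_id"] > last_good_id:
            # Row was created too recently: mark it deleted.
            # Assumption: reuse the CreateID as the DeleteID.
            row["delete_id"] = row["create_id"]
        elif row["delete_id"] is not None and row["delete_id"] > last_good_id:
            # Assumption: a too-recent delete is undone by clearing the DeleteID.
            row["delete_id"] = None
    return rows


def live_values(rows: List[RowDict]) -> List[object]:
    """Values of rows that carry no DeleteID."""
    return [r["value"] for r in rows if r["delete_id"] is None]
```

The point of the sketch is that all the information needed for rollback already sits in the two ID columns, which is why no separate log is required.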
Point-in-time snapshotting. We didn’t talk about the use of time stamps for point-in-time snapshotting (e.g., for compliance), but the possibilities are obvious.
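Since we didn't discuss it, the following is purely my own guess at what such a snapshot query could look like, written as a small Python sketch. `Version` and `as_of` are hypothetical names, and I am assuming that old row versions are retained with their CreateIDs and DeleteIDs intact.

```python
from typing import Dict, List, NamedTuple, Optional


class Version(NamedTuple):
    key: str
    value: str
    create_id: int
    delete_id: Optional[int]  # None means the version is still current


def as_of(versions: List[Version], point_id: int) -> Dict[str, str]:
    """Reconstruct the table as it stood at point_id.

    A version belongs to the snapshot if it had been created by then
    and had not yet been deleted by then.
    """
    return {
        v.key: v.value
        for v in versions
        if v.create_id <= point_id
        and (v.delete_id is None or v.delete_id > point_id)
    }
```

In this sketch an update is modeled as a delete of the old version plus an insert of the new one under the same ID, so exactly one version of a key is live at any point.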
All this sounds like great stuff. So where’s the catch? Well, you obviously have to put two columns into each table for the timestamp IDs. But while those columns have to be stored on disk, they will rarely be a burden on the microprocessors or data movement subsystems, since the FPGA will filter them out as soon as they come off disk. And as a pure storage burden, the timestamps are much less of a deal than logs would be.
As for any direct downside to not keeping logs – besides their use in rollback/rollforward and the like, logs are good for replication and as an added copy of the data for recovery purposes. I don’t see why either factor would be a big deal for most Netezza customers.
Comments
2 Responses to “Logless, lockless Netezza more carefully explained”
Curt,
If I understand this right, there’s no real way to have multiple ‘transactions’ at the same time, which would seem to be a significant limitation if multiple users want to do updates etc. at the same time (which is remarkably common, even in DW systems).
Also, does this mean that an updated row effectively changes position on the disk? If so, how does this affect Netezza’s zone maps? Zone maps only work well in systems that load data in strict date order (thereby providing a kind of date partitioning). If the row order is changed later, the performance benefits of zone maps will degrade over time.
Stuart
DATAllegro
[...] Timestamps are used for inserts and deletes; otherwise, there are no data changes. (Without that kind of approach, the update strategy in Point #2 couldn’t be viable.) A big benefit to these timestamps is that you can assure integrity via “snapshot isolation”; i.e., by a virtual rollback to a recent point in time. Thus, Vertica can get away without any kind of locks or, for that matter, transaction/redo logs. Row-oriented Netezza uses a similar logless, lockless approach. [...]