Vertica briefed me last month on its forthcoming Vertica 4.0 release. I think it’s fair to say that Vertica 4.0 is mainly a cleanup/catchup release, washing away some of the tradeoffs Vertica had previously made in support of its innovative DBMS architecture.
For starters, there’s a lot of new analytic functionality. This isn’t Aster/Netezza-style ambitious. Rather, there’s a lot more SQL-99 functionality, plus some time series extensions of the sort that financial services firms – an important market for Vertica – need and love. Vertica did suggest a couple of these time series extensions are innovative, but I haven’t yet gotten detail about those.
Perhaps even more important, Vertica is cleaning up a lot of its previous SQL optimization and execution weirdnesses. In no particular order, I was told:
- Vertica’s delete performance is up “literally” 30-100X, at least in the case of “large” deletes. Performance for “large” updates has been enhanced as well.
- Vertica has finally cleaned up all vestiges of its prior bias to star schemas. For example, Vertica concedes that its product previously would sometimes force a star execution plan that wasn’t really appropriate.
- It is no longer the case that you need to define projections before you load a table into Vertica. This is now fully automatic.
- Vertica 4.0 automatically redesigns the database when new nodes are added to the system.
- When a database designer does hand-tune projections – and there’s no shame in this still being a possibility in Vertica 4.0 – that hand-tuning is now pulled back into the automatic generation/recommendation/whatever wizards for further projections. I.e., there’s a kind of DBA round-trip engineering going on.
- Vertica used to require that tables being joined be identically “segmented” (I think this means distributed across nodes). That is no longer the case in 4.0.
- In connection with this new-found flexibility, Vertica now supports full outer joins directly, rather than requiring the left outer join/right outer join/UNION kluge.
- The Vertica 4.0 optimizer is smarter than its predecessor about things like predicate pushdown into subqueries, or exploiting commonality between predicates and partition keys.
- There’s a fundamental change that I don’t understand very well in the basic unit of work of the Vertica execution engine. It sounds as if in the past all the disk-based data containers the query needed got opened at once and read into memory, whether or not there were enough RAM and CPU cores to handle them, and this problem has now been fixed.
- Vertica always seemed to say that you could query immediately on new data, because even if it hadn’t hit disk yet – the ROS (Read-Optimized Store) – it was available in memory – the WOS (Write-Optimized Store). And queries were in essence federated between the ROS and WOS. But apparently it’s a new feature in Vertica 4.0 that you can read totally fresh data without locking. I confess to not understanding this very well either. (It has something to do with what Vertica calls “Epochs”.)
- Temporary tables can now be created in Vertica on a local/session basis without any DDL. Making temporary tables easier and more performant is important for a variety of reasons:
  - MicroStrategy, Company V* et al. use lots of temp tables. E.g., Company V on Vertica has 3,000 permanent tables and 5,000-7,000 temporary ones.
  - Vertica rightly points out that temporary tables are also important for ELT (Extract/Load/Transform).
  - Vertica further says that single-node OEMs such as security appliance vendors use lots of temp tables.
*Company V = one of the more prominent vertical-market application providers.
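For anybody who hasn’t had to write it, the left outer join/right outer join/UNION kluge mentioned above looks roughly like the sketch below. To be clear about the assumptions: this runs against SQLite via Python’s standard library purely as a stand-in, not against Vertica; the `customers` and `orders` tables are made up for illustration; and the “right outer join” half is written as a left outer join with the tables swapped, since older SQLite versions lack RIGHT JOIN.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in; this is not Vertica.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (cust_id INTEGER, name TEXT);   -- hypothetical table
CREATE TABLE orders    (cust_id INTEGER, amount INTEGER);  -- hypothetical table
INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO orders    VALUES (2, 50), (3, 75);
""")

# The kluge: every customer with any matching orders (left outer join),
# plus the orders that matched no customer at all (the other outer join,
# written here with the tables swapped), glued together with UNION ALL.
EMULATED_FULL_OUTER_JOIN = """
SELECT c.cust_id, c.name, o.cust_id, o.amount
FROM customers c
LEFT OUTER JOIN orders o ON c.cust_id = o.cust_id
UNION ALL
SELECT c.cust_id, c.name, o.cust_id, o.amount
FROM orders o
LEFT OUTER JOIN customers c ON o.cust_id = c.cust_id
WHERE c.cust_id IS NULL
"""

rows = conn.execute(EMULATED_FULL_OUTER_JOIN).fetchall()
for row in rows:
    print(row)
```

With direct support, all of that collapses into a single `FULL OUTER JOIN ... ON ...` clause, which is easier to write and gives the optimizer one join to plan rather than two plus a union.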
In other Vertica highlights:
- It sounds as if 4.0 is the first Vertica release with what I would regard as serious workload management.
- While Vertica has stored and retrieved Unicode since Vertica 3.5 or so, 4.0 will be the first Vertica release in which Unicode is sorted and collated properly.
- Stored-procedure-like functionality is still a future for Vertica.