Last October I wrote about the Teradata 13 release of Teradata’s database management software. Teradata 13, which will be used across the various Teradata product lines, has now been announced for GCA (General Customer Availability)*. So far as I can tell, there were two main points of emphasis for Teradata 13:
- Performance (of course, performance is a point of emphasis for almost any release of any analytic DBMS product), especially but not only in the areas of aggregates, ETL (Extract/Transform/Load), and UDFs (User Defined Functions).
- UDFs, especially but not only in the areas of data mining and geospatial analysis.
To put it even more concisely, the focus of Teradata 13 is on advanced analytic performance, although there are of course some enhancements in simple query performance and in analytic functionality as well.
*Teradata development chief Scott Gnau said a couple of customers have already received Teradata 13, although this was recent enough that presumably nobody has it in production. But let’s not take all that too literally, since — for example — I heard nothing about the length or breadth of the beta cycle.
As just one example, when I asked Scott what was different between Teradata 13 as it is shipping now vs. Teradata 13 as it was foreshadowed back in October, he cited:
- Improved performance
- Additional “content,” including:
  - Faster loading (sounds like an aspect of performance to me)
  - In-database data mining initiatives (these fit in both the “UDF” and “performance” buckets).
But the parts of Teradata 13 that Scott already discussed back in October, 2008 largely boil down to performance and/or UDFs as well.
Scott also foreshadowed an area of emphasis for future Teradata releases — temporal data analysis. Teradata 13 offers a new PERIOD datatype, which Scott thinks is a “sleeper” on its own for the value customers will find in it. And Scott made it clear that Teradata plans much more functionality for temporal data analysis in the future.
As I understand it, PERIOD works like this: Suppose you have a table that maintains, say, address or employment status. When you update it, you naturally create a Start_Date and End_Date for the validity of certain information. Teradata’s PERIOD datatype automagically uses this to maintain a Period where information was true, even when that period is wholly in the past. Thus when you update a row with new information, you wind up with two rows — the newly changed row, and also a second row with the old information and an effectiveness period for same.
Note: I have no further detail about Teradata’s PERIOD datatype at this time. Even what I said includes enough guesswork that there are probably at least small errors in it.
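To make the idea concrete, here is a minimal Python sketch of the behavior as I described it above: updating a row closes out the old row's validity period and adds a new open-ended one. This is only a model of the concept. The names (`Period`, `AddressRow`, `update_address`, `address_as_of`, `OPEN_END`) are all hypothetical, and none of this is Teradata syntax or a claim about how Teradata implements PERIOD.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

# Hypothetical stand-in for "still valid, no end date yet".
OPEN_END = date.max


@dataclass(frozen=True)
class Period:
    start: date  # inclusive
    end: date    # exclusive

    def contains(self, day: date) -> bool:
        return self.start <= day < self.end


@dataclass(frozen=True)
class AddressRow:
    customer_id: int
    address: str
    validity: Period


def update_address(rows: List[AddressRow], customer_id: int,
                   new_address: str, effective: date) -> List[AddressRow]:
    """Close the customer's current row and append a new open-ended row."""
    result = []
    for row in rows:
        if row.customer_id == customer_id and row.validity.end == OPEN_END:
            # Keep the old information, but end its period at the change date.
            closed = Period(row.validity.start, effective)
            result.append(AddressRow(row.customer_id, row.address, closed))
        else:
            result.append(row)
    result.append(AddressRow(customer_id, new_address, Period(effective, OPEN_END)))
    return result


def address_as_of(rows: List[AddressRow], customer_id: int,
                  day: date) -> Optional[str]:
    """Return the address that was valid for the customer on the given day."""
    for row in rows:
        if row.customer_id == customer_id and row.validity.contains(day):
            return row.address
    return None


history = [AddressRow(1, "12 Elm St", Period(date(2007, 1, 1), OPEN_END))]
history = update_address(history, 1, "99 Oak Ave", date(2009, 3, 1))
```

After the update, `history` holds two rows for the customer, and `address_as_of` can answer "what was true on this date?" for periods wholly in the past.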
The Teradata 13 UDF, in-database data mining, and SAS integration stories seem to go something like this:
- Teradata offered UDF support in C before Teradata 13. With Teradata 13 it supports Java UDFs as well.
- Any Teradata UDF is automatically parallel, running across all nodes, etc.
- Teradata 13 cleans up a variety of UDF issues, including:
  - Allowing the use of UDFs in certain aggregates that didn’t support them before.
  - Recursion, whatever that means in this context. (Perhaps the prior point is a hint.)
  - Extended memory management/making more memory available to UDFs.
- Teradata’s work to enhance SAS integration has been focused on its general UDF framework. The memory management extensions seem to have been particularly important to running SAS. (Note: That link refers to putting SAS on a “single node” in a Teradata grid. Scott gave me the impression that no such thing was possible. So I’m a bit confused. I’m also not sure it matters much.)
- Teradata expects this same general UDF framework to support integration with a variety of analytic technologies. But the only examples actually discussed were SAS and geospatial.
- Actually, we didn’t really discuss geospatial much either, so I’ll just refer you back to my October, 2008 post (already linked above) about Teradata’s geospatial datatype.
Besides UDFs, the other performance focus in Teradata 13 seems to be aggregations and OLAP. One Teradata 13 performance boost lies in aggressive query rewriting. Business intelligence tools, written to support multiple analytic DBMS (including non-current versions), can produce very messy SQL queries. Teradata 13 takes an optimizing compiler mindset to those, and in some cases can get significant speedup as a result. I get the impression there was work on other OLAP and aggregation speed-ups as well.
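I don't know which rewrites Teradata 13 actually performs, so the following is only a toy Python illustration of the general flavor: cleaning up the kind of redundant WHERE-clause predicates that tool-generated SQL often carries. The function `simplify_conjuncts` and the sample predicates are hypothetical, and a real optimizer works on a parsed query tree rather than on strings.

```python
from typing import List


def simplify_conjuncts(conjuncts: List[str]) -> List[str]:
    """Drop always-true and duplicate predicates from an AND-ed list.

    Toy assumption: predicates are compared as whitespace-normalized strings,
    which is far cruder than what a real query rewriter does.
    """
    trivially_true = {"1=1", "1 = 1"}
    seen = set()
    kept = []
    for predicate in conjuncts:
        normalized = " ".join(predicate.split())
        if normalized in trivially_true or normalized in seen:
            continue  # contributes nothing to the filter
        seen.add(normalized)
        kept.append(normalized)
    return kept


# The sort of WHERE clause a BI tool might generate, split on AND.
messy = ["1=1", "region = 'EU'", "region  =  'EU'", "year = 2008", "1 = 1"]
cleaned = simplify_conjuncts(messy)
```

Here five generated predicates shrink to two, which is the "optimizing compiler" idea in miniature: same answer, less work.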
Also, Teradata 13 added a feature for load performance that Scott cites as being useful in the cases of heavy ETL (actually, it sounded more like ELT — Extract/Load/Transform) and OLAP aggregate-building. Namely, for the first time Teradata lets you turn off hash distribution. Teradata still wants you to hash-distribute whatever you’re going to persist to disk. But if you’re just creating a temporary table that will be dropped as soon as the load process completes, you’re now allowed to skip the hash distribution step. Scott says this can lead to >30% improvements in load performance.
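As a rough picture of what gets skipped, here is a small Python sketch contrasting the two load paths. It assumes a shared-nothing layout in which each row normally goes to the parallel unit chosen by hashing a key column. The unit count, the CRC32 hash, and the function names are all illustrative assumptions, not Teradata internals.

```python
import zlib
from collections import defaultdict
from typing import Dict, List

NUM_UNITS = 4  # hypothetical number of parallel units


def hash_distribute(rows: List[dict], key: str,
                    num_units: int = NUM_UNITS) -> Dict[int, List[dict]]:
    """Route every row to the unit selected by hashing its key column."""
    units: Dict[int, List[dict]] = defaultdict(list)
    for row in rows:
        digest = zlib.crc32(str(row[key]).encode("utf-8"))
        units[digest % num_units].append(row)
    return units


def load_without_hashing(batches: List[List[dict]]) -> Dict[int, List[dict]]:
    """Leave each incoming batch on the unit that received it.

    No per-row hashing and no redistribution, so rows sharing a key
    may end up on different units.
    """
    return {unit: list(batch) for unit, batch in enumerate(batches)}


rows = [{"id": i % 3, "value": i} for i in range(12)]
distributed = hash_distribute(rows, "id")
staged = load_without_hashing([rows[0:3], rows[3:6], rows[6:9], rows[9:12]])
```

The hashed path guarantees that all rows with the same key sit together, which is what you want for anything you persist and query. The unhashed path gives that up in exchange for doing less work per row, which seems like a reasonable trade for a staging table that is about to be dropped.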