It was obviously just a matter of time before there would be an Aster appliance from Teradata and some tuned bidirectional Teradata-Aster connectivity. These have now been announced. I didn’t notice anything particularly surprising in the details of either. About the biggest excitement is that Aster is traditionally a Red Hat shop, but for the purposes of appliance delivery has now embraced SUSE Linux.
Along with the announcements comes updated positioning such as:
- Better SQL than the MapReduce alternatives have.
- Better MapReduce than the SQL alternatives have.
- Easy(ier) way to do complex analytics on multi-structured data. (Aster has embraced that term.)
and of course
- Now also with Teradata’s beautifully engineered hardware and system management software!
As might also be expected, the announcements are accompanied by pictures along the lines of “There are your various data sources; there’s Teradata; there’s Aster; there’s Hadoop; look at all the nice arrows connecting them!”
Teradata Aster further decided it was time for a 5.0 DBMS release. Highlights include:
- Aster’s SQL-MapReduce has more flexible inputs. Specifically, if you view SQL-MapReduce as steroid-enhanced table functions, each of those functions can now take multiple tables as input. Aster is rightly positioning this as the key feature of the Aster 5.0 release.
- Workload management now explicitly manages not only CPU and I/O, but also RAM. That surely makes it safer to use algorithms which aggressively create temporary data structures. And the allocation is dynamic, in that it can be throttled back if workloads require.
- There’s more SQL functionality — I think this is minor, as Aster seems to have had pretty good SQL coverage already.
- Performance has been improved; i.e., Bottleneck Whack-A-Mole has progressed in multiple ways. One improvement Aster thinks is cutting-edge is a hybrid join that starts out as a hash join, then reverts to a merge join if it has to spill out of memory (e.g., if the available RAM is throttled back).
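For intuition, the hybrid hash/merge join idea can be sketched roughly as follows. This is a simplified, single-node Python illustration of the general technique, not Aster’s actual implementation; the row format and the row-count memory budget are invented for the example:

```python
from collections import defaultdict

def hash_join(build, probe, key):
    """In-memory hash join: index the build side, stream the probe side."""
    index = defaultdict(list)
    for row in build:
        index[row[key]].append(row)
    return [(b, p) for p in probe for b in index[p[key]]]

def merge_join(left, right, key):
    """Sort-merge join: sort both sides, then walk them in lockstep."""
    left = sorted(left, key=lambda r: r[key])
    right = sorted(right, key=lambda r: r[key])
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Emit the cross product of the matching runs on both sides.
            i2 = i
            while i2 < len(left) and left[i2][key] == lk:
                j2 = j
                while j2 < len(right) and right[j2][key] == rk:
                    out.append((left[i2], right[j2]))
                    j2 += 1
                i2 += 1
            i, j = i2, j2
    return out

def hybrid_join(left, right, key, max_hash_rows):
    """Try a hash join; fall back to a merge join if the build side
    won't fit in the (possibly throttled-back) memory budget."""
    build, probe = (left, right) if len(left) <= len(right) else (right, left)
    if len(build) <= max_hash_rows:
        pairs = hash_join(build, probe, key)
        # Normalize so the output is always (left_row, right_row).
        return pairs if build is left else [(p, b) for (b, p) in pairs]
    return merge_join(left, right, key)
```

Either path produces the same joined rows; the point is that the planner doesn’t have to commit up front to an amount of memory that workload management might later take away.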
Also, Aster is always expanding its library of prebuilt analytic functions/packages — often in connection with specific customer engagements — and took this opportunity to mention numerous recent or near-future additions to the list.
Part of Aster’s motivation in making multiple input tables available to its parallel analytic functions seems to be to allow the use of intermediate result sets alongside raw data. In some ways, this seems to be an alternative to the MPI-based approach favored by SAS, and highlights limitations of the vanilla MapReduce paradigm. The specific examples given were k-means clustering and — which I’d never heard of before — SAX pattern matching.
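To see why a parallel analytic function might want an intermediate result set as a second input, consider a single k-means iteration: the raw points are one input, and the current centroids — themselves an intermediate result from the previous pass — are the other. A toy Python sketch, with data layouts invented for the example:

```python
def kmeans_step(points, centroids):
    """One k-means iteration with two inputs: raw data points, plus the
    centroids computed on the previous pass (an intermediate result set).
    Returns the updated centroids."""
    def nearest(p):
        # Index of the centroid closest to point p (squared Euclidean distance).
        return min(range(len(centroids)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])))

    # Assign each point to its nearest centroid (the "map"-like phase) ...
    clusters = {i: [] for i in range(len(centroids))}
    for p in points:
        clusters[nearest(p)].append(p)

    # ... then recompute each centroid as the mean of its cluster
    # (the "reduce"-like phase). Empty clusters keep their old centroid.
    return [
        tuple(sum(c) / len(c) for c in zip(*pts)) if pts else centroids[i]
        for i, pts in clusters.items()
    ]
```

In vanilla MapReduce, getting the centroids to every mapper on every iteration takes an awkward side channel; letting the function simply declare a second table input is the cleaner expression of the same idea.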
For an example of two true data tables being used as inputs, Aster offered a case of advertising attribution, with the data being about impressions and also conversions. Frankly, I suspect a “join them all and let MapReduce sort them out” strategy would also work for that application; if you join on something like Customer_ID, just how big would the result set really be? Even so, we can imagine other cases in which messy boundaries for graphs or time series make that strategy unappealing, and — you read it here first! — Aster’s target use cases are focused on time series and graphs.
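To make the “join them all” strategy concrete: a toy Python sketch of last-touch attribution, with schemas invented for the example. Per customer, the joined result grows only as impressions × conversions, which for most customers is a small number:

```python
from collections import defaultdict

def attribute(impressions, conversions):
    """Naive last-touch attribution: for each conversion, look at all of
    that customer's earlier impressions and credit the most recent one.
    Rows are tuples invented for this sketch:
      impressions: (customer_id, timestamp, channel)
      conversions: (customer_id, timestamp, value)"""
    by_cust = defaultdict(list)
    for imp in impressions:
        by_cust[imp[0]].append(imp)

    credits = defaultdict(int)
    for cust, ts, _value in conversions:
        # The effective "join on Customer_ID" step: all of this
        # customer's impressions that precede the conversion.
        prior = [imp for imp in by_cust[cust] if imp[1] < ts]
        if prior:
            last = max(prior, key=lambda imp: imp[1])
            credits[last[2]] += 1   # credit the last-touched channel
    return dict(credits)
```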
And finally: Whenever I ask the Aster folks “So, how big are Aster databases that are actually in production?”, they try to convince me that this is the wrong thing to ask. But — without actually answering the question — they did say:
- The new Teradata Aster appliance has been tested to a couple hundred terabytes.
- They are very confident about scaling Aster to a few hundred terabytes.
- They don’t have much in the way of proof in the 1 petabyte range.