Since Linda and I are leaving on vacation in a few hours, Aster Data graciously gave me permission to morph its “12:01 am Monday, November 2” embargo into “late Friday night.”
Aster Data is officially announcing the 4.0 release of nCluster. There are two big pieces to this announcement:
- Aster is offering a slick vision for integrating big-database management and general analytic processing on the same MPP cluster, under the not-so-slick name “Data-Application Server.”
- Aster is also offering a sophisticated vision for workload management.
In addition, Aster has matured nCluster in various ways, for example cleaning up a performance problem with single-row updates.
Highlights of the Aster “Data-Application Server” story include:
- At its core, the Aster “Data-Application Server” is the Aster nCluster MPP analytic DBMS, enhanced with basic application server functionality (I didn’t ask for details of that part), running on the same nCluster worker nodes that answer SQL queries.
- Thus, Aster is eliminating a lot of the data movement that plagues three-tier architectures and other less-integrated approaches.
- The Aster “Data-Application Server” further offers integrated workload management for applications and queries; more on that below.
- The Aster “Data-Application Server” requires applications to be parallelized and invoked via Aster’s SQL/MapReduce.
- As befits a MapReduce-based system, the Aster “Data-Application Server” lets you write your applications in lots of different languages (the usual suspects, and it also does .NET).
- The Aster “Data-Application Server” runs applications in their own process spaces, protecting the DBMS server from crashes and other damaging behavior.
- The Aster “Data-Application Server” allows applications to manage memory themselves, persistently, and not just via relational constructs. Thus, if you want your application to maintain a graph, mini rules engine, and/or finite state machine, you can, without doing SQL contortions.
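To make the last point concrete, here is a minimal Python sketch of what "application-managed state instead of SQL contortions" can look like. It is not Aster's SQL/MapReduce API. The function names (`funnel_fsm`, `run_partitioned`), the funnel states and the event data are all hypothetical. The driver only imitates the partition-then-invoke pattern (group rows by a key, hand each group to a function), and it does so sequentially on one machine. It models the in-memory, non-relational state only, not the persistence Aster describes.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical finite state machine for a purchase funnel:
# (current_state, event) -> next_state
TRANSITIONS = {
    ("start", "view"): "browsing",
    ("browsing", "view"): "browsing",
    ("browsing", "add_to_cart"): "cart",
    ("cart", "view"): "cart",
    ("cart", "checkout"): "purchased",
}


def funnel_fsm(partition_rows):
    """Run the FSM over one partition (one user's time-ordered events).

    State lives in ordinary Python variables, not in relational tables.
    Unknown (state, event) pairs leave the state unchanged.
    """
    state = "start"
    user_id = None
    for user_id, _ts, event in partition_rows:
        state = TRANSITIONS.get((state, event), state)
    yield (user_id, state)


def run_partitioned(rows, fn):
    """Hypothetical stand-in for PARTITION BY user ORDER BY ts:
    sort, group by user, and call fn once per partition."""
    rows = sorted(rows, key=itemgetter(0, 1))
    for _user, part in groupby(rows, key=itemgetter(0)):
        yield from fn(part)


# Hypothetical clickstream: (user_id, timestamp, event)
EVENTS = [
    ("u1", 1, "view"), ("u2", 1, "view"), ("u1", 2, "add_to_cart"),
    ("u1", 3, "checkout"), ("u2", 2, "checkout"),
]

print(list(run_partitioned(EVENTS, funnel_fsm)))
# → [('u1', 'purchased'), ('u2', 'browsing')]
```

In an MPP system each partition would presumably be processed on the worker node that holds it, in parallel with the others. The sequential loop here is only for illustration.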
In a compelling proof point for the slickness of Aster’s “Data-Application Server,” Aster has leapfrogged Teradata and Netezza in the extent to which SAS functionality is integrated into Aster’s DBMS. (Aster and SAS both say that you can do full SAS modeling in parallel on Aster, but even so I wouldn’t be surprised to discover that some parts of SAS’ system turn out to be exceptions.) Of course, Aster is hardly the only analytic DBMS vendor explicitly working to support general analytic processing; that’s why we see lots of MapReduce announcements, and it’s also why Teradata enhanced its UDFs (User-Defined Functions) to have some kind of persistent memory.* But I don’t know of anybody else whose approach is quite so elegant and general at this time.
*Unfortunately, I don’t yet know much about Teradata’s UDF enhancements. I neglected to drill down on Global Persistent Memory when it was mentioned a couple of times at Teradata Partners last week, and Teradata was unable to accommodate my request this week for a rapid follow-up briefing on the subject.
Aster’s approach to workload management is similarly stylish. The idea is:
- Many variables can be taken into account, such as user role, expected query duration, and the actual duration of a running query.
- SQL statements can be written against any of these variables.
- The SQL statements serve as rules to set query/task priorities.
- There seem to be a few different ways to measure priority, including explicit allocation of CPU or I/O resources, as well as more conventional “This group of queries gets higher priority than that one” kinds of metrics.
- The whole thing provides integrated workload management for queries, applications, load jobs, data redistribution, and so on.
Right now the interface is – well, you’re manipulating a SQL table. A more conventional workload management GUI is slated for the second quarter of 2010.
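As a rough illustration of rules-as-SQL priority assignment, the Python sketch below uses the standard-library `sqlite3` module. Several details are my own assumptions for illustration and do not reflect Aster's actual rule schema or scheduler:

- the `tasks` table layout and column names
- the rule predicates
- the first-match-wins policy
- the proportional CPU-share calculation

```python
import sqlite3

# Hypothetical task attributes; the post cites user role, expected duration,
# and actual elapsed time as examples of variables rules can reference.
TASKS = [
    # (task_id, task_type, user_role, expected_secs, elapsed_secs)
    (1, "query", "executive", 5, 2),
    (2, "query", "analyst", 600, 900),
    (3, "load", "etl", 3600, 120),
    (4, "query", "analyst", 10, 1),
]

# Hypothetical rules: (rule_order, SQL predicate, priority).
# Higher priority number = more resources. Lowest rule_order is checked first.
RULES = [
    (1, "user_role = 'executive'", 10),
    (2, "task_type = 'query' AND elapsed_secs > expected_secs", 1),  # demote overruns
    (3, "task_type = 'load'", 3),
    (4, "1 = 1", 5),  # default rule matches everything
]


def assign_priorities(conn):
    """Return {task_id: priority}, applying rules in order; first match wins."""
    priorities = {}
    for _order, predicate, priority in sorted(RULES):
        # Predicates are trusted, admin-authored SQL in this sketch.
        rows = conn.execute(f"SELECT task_id FROM tasks WHERE {predicate}").fetchall()
        for (task_id,) in rows:
            priorities.setdefault(task_id, priority)
    return priorities


def cpu_shares(priorities):
    """Assumed policy: split CPU in proportion to each task's priority."""
    total = sum(priorities.values())
    return {task_id: p / total for task_id, p in priorities.items()}


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tasks (task_id INTEGER PRIMARY KEY, task_type TEXT, "
    "user_role TEXT, expected_secs INTEGER, elapsed_secs INTEGER)"
)
conn.executemany("INSERT INTO tasks VALUES (?, ?, ?, ?, ?)", TASKS)

print(assign_priorities(conn))  # → {1: 10, 2: 1, 3: 3, 4: 5}
```

Keeping the rules in an ordinary table means they can be edited with plain SQL, which is roughly the interface the post describes. A real system would also re-evaluate rules as running tasks' elapsed times change.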
Discussing subjects such as mirroring and ILM (Information Lifecycle Management) with Aster can be tricky, as Aster uses the word “partition” in confusing ways. Anyhow, Aster has a few different levels of compression, and the ability to apply different levels of compression to different partitions, to change compression levels via ALTER TABLE, and to alter (presumably increase) compression on the fly when doing online backup.

Aster is also part of a growing trend to eschew RAID, instead doing mirroring in its own software. (Other examples of this strategy would be Vertica, Oracle Exadata/ASM, and Teradata Fallback.) Prior to nCluster 4.0, this caused a problem, in that the block sizes for mirroring were so large as to create a lag in transactional updating. But Aster says this problem is now solved, and indeed claims that nCluster 4.0 is superior to most rivals in transactional efficiency.
And finally, while I was talking w/ Aster Data anyway, I checked up on cloud and MapReduce customer penetration. The answers were:
- Aster has two serious production cloud users, both of which have been disclosed for a while, namely:
  - ShareThis, which runs Aster nCluster on Amazon EC2
  - Didit, which runs Aster nCluster on AppNexus
- Outside of those two, Aster sees some cloud use for test, development, prototyping, etc.
- Every single Aster customer uses SQL/MapReduce — i.e., they invoke MapReduce via Aster nCluster SQL queries.
- Some of those customers use MapReduce for ETL, some use it for actual analytics.