January 24, 2011

Choices in analytic computing system design

When I posted a long list of architectural options for analytic DBMS, I left a couple of IOUs in for missing parts. One was in the area of what is sometimes called advanced-analytics functionality, which roughly speaking means aspects of analytic database management systems that are not directly related to conventional* SQL queries.

*Main examples of “conventional” = filtering, simple aggregations.

The point of such functionality is generally twofold. First, it helps you execute analytic algorithms with high performance, due to reducing data movement and/or executing the analytics in parallel. Second, it helps you create and execute sophisticated analytic processes with (relatively) little effort.

For now, I’m going to refer to an analytic RDBMS that has been extended by advanced-analytics functionality as an analytic computing system, rather than as some kind of “platform,” although I suspect the latter term is more likely to wind up winning.  So far, there have been five major categories of subsystem or add-on module that contribute to making an analytic DBMS a more fully-fledged analytic computing system:

The most structural or architectural are the UDF framework and the non-UDF analytic execution engine.  But even those are in essence add-on modules, which means that pretty much any vendor can do any part of them if they invest enough resources in the effort. So I expect considerable convergence over time as the industry and market discover which capabilities are or aren’t particularly useful.

When I’m being told about an analytic DBMS that supposedly has evolved into an analytic computing system, some of my top-of-mind questions are:

Please note what I’m not including in this discussion — the integration of DBMS and fairly ordinary business intelligence. That may have virtues, for reasons of price or performance, and the virtues may grow as in-memory BI and/or data management capabilities evolve. But for the foreseeable future, BI/DBMS integration is a fairly separate matter from the integration of analytic DBMS with sophisticated investigative analytics.

Comments

8 Responses to “Choices in analytic computing system design”

  1. Tweets that mention Analytic computing systems, aka analytic platforms | DBMS 2 : DataBase Management System Services -- Topsy.com on January 24th, 2011 2:12 am

    [...] This post was mentioned on Twitter by Curt Monash and Oracle FAQ, Design4people (rss). Design4people (rss) said: Choices in analytic computing system design: http://bit.ly/hkg0VO [...]

  2. Vlad Rodionov on January 24th, 2011 2:53 pm

    UDF is not enough. Users need UDAF and UDTF as well.

    UDF – user defined functions (scalar to scalar mapping)
    UDAF – user defined aggregate functions (rows to scalar mapping)
    UDTF – user defined table functions (row to rows mapping)

    Then, all of these functions must be easily incorporated (installed) into analytical DBMS, hence C++ is not the best option here. We need Java, Scala, Python etc support to make things more interactive (on the fly).

    MapReduce is nice to have but its programming model is too rigid and limited to support effectively many useful distributed algorithms (any iterative algorithm, PageRank, for example :)).
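A minimal sketch of the three function shapes distinguished in the comment above, written in plain Python rather than against any particular DBMS's UDF API. All function names and the sample data are hypothetical and chosen only for illustration; a real system would register such functions through its own framework.

```python
from typing import Iterable, Iterator


def udf_fahrenheit(celsius: float) -> float:
    """UDF shape: one scalar in, one scalar out."""
    return celsius * 9.0 / 5.0 + 32.0


def udaf_range(values: Iterable[float]) -> float:
    """UDAF shape: many rows in, one scalar out."""
    vals = list(values)
    return max(vals) - min(vals)


def udtf_split_words(text: str) -> Iterator[str]:
    """UDTF shape: one row in, zero or more rows out."""
    for word in text.split():
        yield word.lower()


# Hypothetical sample data.
readings = [10.0, 25.0, 15.0]

converted = [udf_fahrenheit(c) for c in readings]   # applied per row
spread = udaf_range(readings)                       # applied per group
words = list(udtf_split_words("Analytic Computing System"))
```

The difference that matters to the engine is cardinality: a scalar function can be applied row by row with no coordination, an aggregate needs all rows of a group brought together, and a table function can change the number of rows flowing to the next operator.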

  3. Curt Monash on January 24th, 2011 4:47 pm

    Vlad,

    “What forms can the inputs and outputs of a UDF take?” was meant to address your UDAF/UDTF point, but thank you for spelling it out more clearly!

  4. John Cieslewicz on January 25th, 2011 3:05 pm

    Vlad,

    You make a great point that an analytic computing system needs to support a wide range of programming languages. Multiple programming language support is something that we considered from day one when we built SQL-MapReduce at Aster Data (http://www.asterdata.com/resources/mapreduce.php). SQL-MapReduce provides rich Java support complete with an Eclipse development plugin for interactive development. SQL-MapReduce also supports C, C++, Python, SAS, R, and a whole host of languages that can run on Linux.

    We feel that these options make deep analytics more user friendly and open to a wider development community. It allows enterprises to take advantage of more of the skill sets they have in-house rather than forcing analytics development to one particular language. That’s why SQL-MapReduce functions (UDFs, UDAs, and UDTs!) are first class citizens in nCluster along with — and within — any SQL query.

    To your point about MapReduce being too limited for some computations like PageRank, we designed SQL-MapReduce to allow pipelining of arbitrary Map and Reduce functions, or as we call them, Row and Partition functions. This means that unlike traditional MapReduce, which runs one map step and then one reduce step, in nCluster you can pipeline multiple map and reduce steps in any order and also mix these steps with SQL.
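The pipelining idea described in the comment above can be sketched in plain Python as a chain of per-row and per-partition steps. This is only an illustration of the general pattern, not SQL-MapReduce syntax or any Aster Data API; `row_step`, `partition_step`, and the sample events are all hypothetical.

```python
from collections import defaultdict
from typing import Callable, Hashable, Iterable, Iterator, List, Tuple

Row = Tuple


def row_step(rows: Iterable[Row], fn: Callable[[Row], Iterable[Row]]) -> Iterator[Row]:
    """Map-like step: fn sees one row at a time and may emit any number of rows."""
    for row in rows:
        yield from fn(row)


def partition_step(
    rows: Iterable[Row],
    key: Callable[[Row], Hashable],
    fn: Callable[[Hashable, List[Row]], Iterable[Row]],
) -> Iterator[Row]:
    """Reduce-like step: group rows by key, then fn sees one whole group at a time."""
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row)
    for k in sorted(groups):
        yield from fn(k, groups[k])


def normalize(row):
    user, page = row
    yield (user.strip().lower(), page)


def count_per_user(user, group):
    yield (user, len(group))


def bucket(row):
    user, n = row
    yield ("heavy" if n >= 3 else "light", 1)


def sum_per_bucket(label, group):
    yield (label, sum(n for _, n in group))


# Hypothetical click events: (user, page).
events = [("Ann", "home"), ("bob", "home"), ("ann ", "cart"),
          ("ann", "pay"), ("Bob", "cart"), ("cy", "home")]

# Four steps chained in one pipeline: row, partition, row, partition.
cleaned = row_step(events, normalize)
per_user = partition_step(cleaned, key=lambda r: r[0], fn=count_per_user)
labeled = row_step(per_user, bucket)
result = list(partition_step(labeled, key=lambda r: r[0], fn=sum_per_bucket))
```

Each step consumes the previous step's output directly, so the steps can be ordered freely instead of being fixed to a single map followed by a single reduce. A distributed engine would additionally have to repartition data across nodes before each partition step, which this single-process sketch ignores.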

  5. The Data Blog: Aster Data Blog » Blog Archive » 2011: The Year of the Analytics Platform – Part I on January 26th, 2011 1:34 pm

    [...] By late 2010, the term “analytic platform” started to take shape. The definition of it fit exactly with what Aster Data has built. And now, traditional DW appliances are claiming to be analytic platforms. Even Netezza is taking the same box they had before and calling it “An Appliance for Deep Business Analytics,” and pure columnar MPP DBMS’s like Vertica and ParAccel overnight went from being ‘the world’s fastest database’ to ALL claiming to be an analytic(s) platform.  This is a typical marketing trajectory if you now see where the future lies in big data management.  The market as a whole is gravitating to accept that if you truly want to manage big, diverse data, you ultimately want to analyze all of it, and for that you’re really in need of a big data analytic platform – not just a big data store. Recently, Curt Monash supports a similar notion when describing choices in analytic computing system design. [...]

  6. Sameer Al-Sakran on February 3rd, 2011 7:25 pm

    “MapReduce is nice to have but its programming model is too rigid and limited to support effectively many useful distributed algorithms (any iterative algorithm, PageRank, for example)”

    This is beyond silly. The first major application for MapReduce at Google was PageRank.

  7. Now we know why Vertica has been so weirdly evasive | DBMS 2 : DataBase Management System Services on February 14th, 2011 11:34 am

    [...] has positioned itself as an analytic platform company despite not obviously having the technology to back that [...]

  8. Analytic platform | DBMS 2 : DataBase Management System Services on August 19th, 2012 3:24 am

    [...] original feature list for analytic platforms (January, 2011) Categories: Analytic glossary, Aster Data, Data warehouse appliances, Data [...]
