April 21, 2011

In-memory, parallel, not-in-database SAS HPA does make sense after all

I talked with SAS about its new approach to parallel modeling. The two key points are:

The whole thing is called SAS HPA (High-Performance Analytics), in an obvious reference to HPC (High-Performance Computing). It will run initially on RAM-heavy appliances from Teradata and EMC Greenplum.

A lot of what’s going on here is that SAS found it annoyingly difficult to parallelize modeling within the framework of a massively parallel DBMS such as Teradata. Notes on that aspect include:

SAS’ problems developing in-database modeling stem, in essence, from the limitations of UDFs (User Defined Functions). So why weren’t, for example, Teradata’s 2009 enhancements to its UDF capabilities enough? The clearest example SAS gave me is that, while database tables are commonly limited to something on the order of 1,000 columns (their figure as well as mine), SAS might need 50,000-100,000 columns. One reason seems to be interactions between variables; SAS used the word “multiplied” a few times, but even so was coy about whether this could simply be regarded as quadratic terms in a regression. Another reason seems to be that in some cases, every distinct value in a column spawns a new column in an intermediate table/array; indeed, this seems to be going on in the previously discussed case of logistic regression.
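To make the column arithmetic concrete, here is a minimal Python sketch of the two effects described above: one indicator column per distinct value, and one product column per pair of columns. SAS did not confirm that either is exactly what it does, and the table shape used below (40 numeric variables plus 10 categorical variables with 30 levels each) is purely an illustrative assumption, as are the function names.

```python
def one_hot_width(cardinalities):
    """Columns produced when each categorical variable becomes one 0/1 column per distinct value."""
    return sum(cardinalities)


def with_pairwise_interactions(n_columns):
    """Columns after adding one product column per unordered pair of distinct columns.

    This is an upper bound: products of two indicators from the same variable
    are always zero and would normally be dropped.
    """
    return n_columns + n_columns * (n_columns - 1) // 2


def one_hot_encode(rows, column):
    """Replace one categorical column with one 0/1 column per distinct value."""
    levels = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k != column}
        for level in levels:
            new_row[f"{column}={level}"] = 1 if row[column] == level else 0
        encoded.append(new_row)
    return encoded


# Hypothetical input table: 50 source variables, well under a 1,000-column limit.
numeric_columns = 40
categorical_cardinalities = [30] * 10

expanded = numeric_columns + one_hot_width(categorical_cardinalities)
print(expanded)                              # 340
print(with_pairwise_interactions(expanded))  # 57970
```

Under those assumptions, a 50-variable table turns into an intermediate structure of tens of thousands of columns, which is the order of magnitude SAS was talking about.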

SAS code will be launched by the DBMS/data warehouse appliances, so potentially it can run under their native workload management. Teradata presumably has enough workload management richness to exploit that; EMC Greenplum, as of my August 2010 notes, probably did not.

SAS was gracious enough to let me post its slide deck, in both shorter and longer versions. Due to a technical glitch during the call, I neither looked at the slides nor took notes. I think the biggest loss from those difficulties is that I didn’t learn what the futures at the end of the longer deck were all about.

Comments

7 Responses to “In-memory, parallel, not-in-database SAS HPA does make sense after all”

  1. Application areas for SAS HPA | DBMS 2 : DataBase Management System Services on April 21st, 2011 3:24 am

    [...] I talked with SAS about its forthcoming in-memory parallel SAS HPA offering, we talked briefly about application areas. The three SAS cited [...]

  2. unholyguy on April 21st, 2011 11:29 am

    I think it is more than just the column limit: the data access and internode communication algorithms that work well for MPP SQL are not suited to statistical analysis. Stat is much less set-based when you come down to it

    The key bits I got from the deck are

    – Multi-pass methods; only the first pass should hit disk, keep data memory-resident
    – Even ostensibly simple problems might require more than one pass
    – Chatty Node to Node communication

    Similar problems to what the graph analysis people are trying to solve through Pregel and Hama, more of a BSP-style compute

    (http://en.wikipedia.org/wiki/Bulk_Synchronous_Parallel)
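The pattern this comment describes (load once, then make repeated passes over memory-resident partitions, exchanging partial results between passes) can be sketched in a few lines of Python. This is not SAS code and not taken from the deck. It is a single-process illustration that assumes logistic regression fitted by plain gradient descent, with hypothetical function names and toy data, and with a list of lists standing in for per-node RAM.

```python
import math


def load_partitions():
    """Stand-in for the one pass that reads from disk; afterwards the data stays in RAM."""
    # Each inner list is the slice of (features, label) rows held by one hypothetical node.
    # The first feature is a constant 1.0 so that weights[0] acts as the intercept.
    return [
        [([1.0, 0.5], 1), ([1.0, -1.5], 0)],
        [([1.0, 2.0], 1), ([1.0, -0.5], 0)],
    ]


def local_gradient(partition, weights):
    """Compute phase: one node scans only its own in-memory rows."""
    grad = [0.0] * len(weights)
    for x, y in partition:
        z = sum(w * xi for w, xi in zip(weights, x))
        p = 1.0 / (1.0 + math.exp(-z))
        for j, xi in enumerate(x):
            grad[j] += (p - y) * xi
    return grad


def fit(partitions, n_features, passes=50, step=0.5):
    """Run one superstep per pass: local compute, then a global combine."""
    weights = [0.0] * n_features
    for _ in range(passes):
        partials = [local_gradient(part, weights) for part in partitions]
        # Communication and barrier: every partial result must arrive before
        # the weights are updated and the next pass can begin.
        total = [sum(col) for col in zip(*partials)]
        weights = [w - step * g for w, g in zip(weights, total)]
    return weights


weights = fit(load_partitions(), n_features=2)
print(weights)
```

Even this small model needs many passes over the same rows, with a synchronization point between each, which is why keeping the data in memory after the first read matters so much.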

  3. unholyguy on April 21st, 2011 11:30 am

    might want to take a look at this google tech talk, think it’s relevant

    http://www.youtube.com/watch?v=PBLgUBGWcz8

  4. Curt Monash on April 21st, 2011 3:07 pm

    Camuel Gilyadov has been propagandizing me about Pregel, and I haven’t been seeing the point. Thanks!

  5. High Performance Analytics « DECISION STATS on April 22nd, 2011 4:53 am
  6. Traditional databases will eventually wind up in RAM | DBMS 2 : DataBase Management System Services on May 23rd, 2011 11:06 am

    [...] SAS HPA makes the argument that even “big data analytics” should sometimes be done in RAM. [...]

  7. Hadoop YARN — beyond MapReduce | DBMS 2 : DataBase Management System Services on July 23rd, 2012 4:26 am

    [...] find that credible because of the Greenplum/SAS/MPI [...]
