Analysis of data mining powerhouse SAS, and especially the relationship between SAS’s data mining products and various database management systems. Related subjects include:
I’ve already suggested that several apparent issues in predictive analytic agility can be dismissed by straightforwardly applying best-of-breed technology, for example in analytic data management. At first blush, the same could be said about the actual analysis, which comprises:
- Data preparation, which is tedious unless you do a good job of automating it.
- Running the actual algorithms.
Numerous statistical software vendors (or open source projects) help you with the second part; some make strong claims in the first area as well (e.g., my clients at KXEN). Even so, large enterprises typically have statistical silos, commonly featuring expensive annual SAS licenses and seemingly slow-moving SAS programmers.
As I see it, the predictive analytics workflow goes something like this: Read more
|Categories: Investment research and trading, Predictive modeling and advanced analytics, SAS Institute, Telecommunications, Web analytics||22 Comments|
When I talked with SAS about its forthcoming in-memory parallel SAS HPA offering, we talked briefly about application areas. The three SAS cited were:
- Consumer financial services. The idea here is to combine information about customers’ use of all kinds of services: banking, credit cards, loans, etc. SAS believes this serves both marketing and risk analysis purposes.
- Insurance. We didn’t go into detail.
- Mobile communications. SAS’ customers aren’t giving it details, but they’re excited about geocoding/geospatial data.
Meanwhile, in another interview I heard about, SAS emphasized retailers. Indeed, that’s what spawned my recent post about logistic regression.
The mobile communications one is a bit scary. Your cell phone, and hence your cellular company, knows where you are, pretty much from moment to moment. Even without advanced analytic technology applied to it, that’s a pretty direct privacy threat. Throw in some analytics, and your cell company might know, for example, who you hang out with (in person), where you shop, and how those things predict your future behavior. And so the government, or just your employer, might know those things too.
|Categories: Application areas, Predictive modeling and advanced analytics, SAS Institute, Surveillance and privacy, Telecommunications||2 Comments|
I talked with SAS about its new approach to parallel modeling. The two key points are:
- SAS no longer plans to go as far with in-database modeling as it previously intended.
- Rather, SAS plans to run in RAM on MPP DBMS appliances, exploiting MPI (Message Passing Interface).
The whole thing is called SAS HPA (High-Performance Analytics), in an obvious reference to HPC (High-Performance Computing). It will run initially on RAM-heavy appliances from Teradata and EMC Greenplum.
A lot of what’s going on here is that SAS found it annoyingly difficult to parallelize modeling within the framework of a massively parallel DBMS such as Teradata. Notes on that aspect include:
- SAS wasn’t exploiting the capabilities of individual DBMS to their fullest; rather, it was looking for an approach that would work across multiple brands of DBMS. Thus, for example, the fact that Aster’s analytic platform architecture is more flexible or powerful than Teradata’s didn’t help much with making SAS run within the Aster nCluster database.
- Notwithstanding everything else, SAS did make a certain set of modeling procedures run in-database.
- SAS’ previous plans to run in-database modeling in Aster and/or Netezza DBMS may never come to fruition.
|Categories: Aster Data, Data warehouse appliances, Data warehousing, EMC, Greenplum, Memory-centric data management, Netezza, Parallelization, Predictive modeling and advanced analytics, SAS Institute, Teradata, Workload management||7 Comments|
I wasn’t too impressed when I spoke with Revolution Analytics at the time of its relaunch last year. But a conversation Thursday evening was much clearer. And I even learned some cool stuff about general predictive modeling trends (see the bottom of this post).
Revolution Analytics business and business model highlights include:
- Revolution Analytics is an open-core vendor built around the R language. That is, Revolution Analytics offers proprietary code and support, with subscription pricing, that help in the use of open source software.
- Unlike most open-core vendors I can think of, Revolution Analytics takes little responsibility for the actual open source part. Some “grants” for developing certain open source R pieces seem to be the main exception. While this has caused some hard feelings, I don’t have an accurate sense of their scope or severity.
- Revolution Analytics also sells a single-user/workstation version of its product, freely admitting that this is mainly a lead generation strategy or, in my lingo, a “break-even leader.”
- Revolution Analytics boasts around 100 customers, split about 70-30 between the workstation seeding stuff and the real server product.
- Revolution Analytics has “about” 37 employees. Headquarters are at 101 University Avenue (do I have to say in what city?). There are also a development office in Seattle and a sales office in New York.
- Revolution Analytics’ pricing is by size of server. “Small” servers — i.e. up to 12 cores — start at $25K/year.
- Unsurprisingly, adoption is more alongside SAS et al. than rip-and-replace.
|Categories: Health care, Investment research and trading, Open source, Parallelization, Predictive modeling and advanced analytics, Pricing, Revolution Analytics, SAS Institute||2 Comments|
A core point in SAS’ pitch for its new MPI (Message Passing Interface) in-memory technology seems to be that logistic regression is really important, and that shared-nothing MPP doesn’t let you parallelize it. The Mahout/Hadoop folks also seem to despair of parallelizing logistic regression.
On the other hand, Aster Data said it had parallelized logistic regression a year ago. (Slides 6-7 from a mid-2010 Aster deck may be clearer.) I’m guessing Fuzzy Logix might make a similar claim, although I’m not really sure.
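To make the disagreement a bit more concrete, here is a minimal sketch of one common way to split logistic regression across data partitions: each partition computes a gradient over its own rows, the partial gradients are summed, and every partition gets the updated weights for the next pass. This is plain gradient descent on toy data, not anything SAS, Aster, or the Mahout project has told me they do; the function names, the data, the step count and the learning rate are all my own assumptions for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def local_gradient(rows, weights):
    """Gradient of the negative log-likelihood over one partition's rows."""
    grad = [0.0] * len(weights)
    for features, label in rows:
        pred = sigmoid(sum(w * x for w, x in zip(weights, features)))
        for j, x in enumerate(features):
            grad[j] += (pred - label) * x
    return grad


def combine(gradients):
    """Element-wise sum of per-partition gradients (the reduce step)."""
    return [sum(parts) for parts in zip(*gradients)]


def fit(partitions, n_features, steps=200, lr=0.1):
    """Gradient descent in which each step needs one round of communication."""
    weights = [0.0] * n_features
    n_rows = sum(len(p) for p in partitions)
    for _ in range(steps):
        # In a real cluster, each local_gradient call would run on its own node.
        grad = combine([local_gradient(p, weights) for p in partitions])
        weights = [w - lr * g / n_rows for w, g in zip(weights, grad)]
    return weights


# Hypothetical toy data: ([bias term, x], label), split into three "nodes".
data = [
    ([1.0, -2.0], 0), ([1.0, -1.0], 0), ([1.0, 1.0], 1),
    ([1.0, 2.0], 1), ([1.0, 0.5], 1), ([1.0, -0.5], 0),
]
partitions = [data[:2], data[2:4], data[4:]]

weights = fit(partitions, n_features=2)
```

The arithmetic itself splits cleanly, since the full gradient is just the sum of the per-partition gradients. What the sketch also shows is that every iteration needs a reduce followed by a broadcast of the new weights, so the cost of that per-iteration communication is presumably where the choice of architecture matters.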
|Categories: Aster Data, Hadoop, Parallelization, Predictive modeling and advanced analytics, SAS Institute||8 Comments|
I am annoyed with my former friends at Greenplum, who took umbrage at a brief sentence I wrote in October, namely “eBay has thrown out Greenplum”. Their reaction included:
- EMC Greenplum no longer uses my services.
- EMC Greenplum no longer briefs me.
- EMC Greenplum reneged on a commitment to fund an effort in the area of privacy.
The last one really hurt, because in trusting them, I put in quite a bit of effort, and discussed their promise with quite a few other people.
|Categories: Analytic technologies, Aster Data, Data integration and middleware, Data warehouse appliances, Data warehousing, EAI, EII, ETL, ELT, ETLT, EMC, Greenplum, SAS Institute, Solid-state memory||8 Comments|
A number of recent posts have had good comments. This time, I won’t call them out individually.
Evidently Mike Olson of Cloudera is still telling the machine-generated data story, exactly as he should be. The Information Arbitrage/IA Ventures folks said something similar, focusing specifically on “sensor data” …
… and, even better, went on to say: Read more
When vendors talk about the integration of advanced analytics into database technology, confusion tends to ensue. For example: Read more
|Categories: Aster Data, Greenplum, Netezza, Predictive modeling and advanced analytics, SAS Institute||7 Comments|
One thing I love about DBMS 2 is the really smart comments a number of readers — that would be you guys — make. However, not all the smart comments are made in the first 5 minutes a post is up, so some readers (unless you circle back) might miss great points other readers make. Well, here are some pointers to some of what you might have missed, along with other follow-up comments to old posts while I’m at it. Read more
As you’ve probably read, IBM and Netezza announced a deal today for IBM to buy Netezza. I didn’t sit in on the conference call, but I’ve seen the reporting. Naturally, I have some quick thoughts, which I’ve broken up into several sections below:
- Clearing some underbrush.
- Speculation about what IBM/Netezza will do.
- Speculation about alternative acquirers for Netezza.
- Speculation about what IBM/Netezza competitors will do.
|Categories: Analytic technologies, Cognos, Data integration and middleware, Data warehousing, EAI, EII, ETL, ELT, ETLT, IBM and DB2, Netezza, Oracle, SAS Institute, Solid-state memory, Vertica Systems||19 Comments|