In-memory, parallel, not-in-database SAS HPA does make sense after all
I talked with SAS about its new approach to parallel modeling. The two key points are:
- SAS no longer plans to go as far with in-database modeling as it previously intended.
- Rather, SAS plans to run in RAM on MPP DBMS appliances, exploiting MPI (Message Passing Interface).
The whole thing is called SAS HPA (High-Performance Analytics), in an obvious reference to HPC (High-Performance Computing). It will run initially on RAM-heavy appliances from Teradata and EMC Greenplum.
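To make the MPI angle concrete, here is a generic sketch of the pattern (not SAS HPA code, just an illustration using Python's mpi4py): each node computes partial statistics over its own in-memory slice of the data, and an allreduce combines them so every node ends up with the globally fitted model. The data and the one-parameter regression are made up.

```python
# Generic MPI-style parallel model fitting; illustrative only, not SAS HPA code.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# Pretend each node holds one partition of (x, y) pairs entirely in RAM.
rng = np.random.default_rng(seed=rank)
x = rng.normal(size=100_000)
y = 2.0 * x + rng.normal(scale=0.1, size=x.size)

# Local sufficient statistics for a no-intercept least-squares fit.
local_stats = np.array([np.dot(x, y), np.dot(x, x)])
global_stats = np.empty_like(local_stats)

# Message passing: sum the partial statistics across all nodes.
comm.Allreduce(local_stats, global_stats, op=MPI.SUM)

slope = global_stats[0] / global_stats[1]
if rank == 0:
    print(f"globally fitted slope: {slope:.4f}")
```

Run under something like mpiexec -n 8, each rank holds its own partition, and the only cross-node traffic is the tiny statistics vectors, which is the general appeal of doing this over MPI rather than inside a DBMS.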
A lot of what’s going on here is that SAS found it annoyingly difficult to parallelize modeling within the framework of a massively parallel DBMS such as Teradata. Notes on that aspect include:
- SAS wasn’t exploiting the capabilities of individual DBMS to their fullest; rather, it was looking for an approach that would work across multiple brands of DBMS. Thus, for example, the fact that Aster’s analytic platform architecture is more flexible or powerful than Teradata’s didn’t help much with making SAS run within the Aster nCluster database.
- Notwithstanding everything else, SAS did make a certain set of modeling procedures run in-database.
- SAS’ previous plans to run in-database modeling in Aster and/or Netezza DBMS may never come to fruition.
Notes on short-request scale-out MySQL
A press person recently asked about:
… start-ups that are building technologies to enable MySQL and other SQL databases to get over some of the problems they have in scaling past a certain size. … I’d like to get a sense as to whether or not the problems are as severe and widespread as these companies are telling me? If so, why wouldn’t a customer just move to a new database?
While that sounds as if he was asking about scale-out relational DBMS in general, MySQL or otherwise, short-request or analytic, it turned out that he was asking just about short-request scale-out MySQL. My thoughts and comments on that narrower subject include(d) but are not limited to: Read more
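As background on what "scaling past a certain size" usually means for MySQL shops: the standard workaround is application-level sharding, in which each row's key is hashed to pick one of several MySQL instances. A minimal, hypothetical sketch of just the routing step (host names and shard count are invented, and no actual MySQL driver is shown):

```python
# Hypothetical application-level sharding: hash a key to pick a MySQL shard.
import hashlib

SHARD_HOSTS = [
    "mysql-shard-0.example.com",
    "mysql-shard-1.example.com",
    "mysql-shard-2.example.com",
    "mysql-shard-3.example.com",
]

def shard_for(key: str) -> str:
    """Return the host that owns this key, via a stable hash."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return SHARD_HOSTS[int(digest, 16) % len(SHARD_HOSTS)]

# All rows for a given customer land on the same shard:
print(shard_for("customer:1874"))
```

The pain points the question alludes to are exactly what this sketch hides: cross-shard joins and transactions, and rebalancing data when the number of shards changes.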
Endeca topics
I visited my then-clients at Endeca in January. We focused more on underpinnings (and strategic counsel) than on the coolness of what the product actually does. But going over my notes, I think there’s enough to write up now.
Before saying much else about Endeca, there’s one confusion to dispose of: What’s the relationship between Endeca’s efforts in e-commerce (helping shoppers navigate websites) and business intelligence (helping people navigate their own data)? As Endeca tells it:
- Endeca’s e-commerce and business intelligence efforts are reflections of the same technical approach. Indeed, I’m pretty sure Endeca’s product lines still share much/most of the same technology.
- Endeca went after e-commerce first because that’s where the provable ROI was. As I pointed out a couple of times in 2007, Endeca became a market leader in that area.
- Endeca increased its BI efforts later.
- Circa 2009-10, Endeca differentiated its e-commerce and BI product lines from each other.
- An e-commerce line extension called Page Builder is what really got Endeca through the recent recession.
- The BI product line Latitude was launched in the fall of 2010.
Endeca’s positioning in the business intelligence market boils down to “investigative analytics for people who aren’t hardcore analysts.” Endeca’s technological support for that stresses: Read more
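For context, the shared technical approach in both of Endeca's markets is usually described as guided or faceted navigation: filter the record set by the facet values chosen so far, then show counts for the facet values that remain, to guide the next click. A toy sketch of that idea follows (it is not Endeca's engine, and the records and field names are made up):

```python
# Toy faceted navigation: filter records by chosen facets, then count the
# remaining values of every other facet to guide the next refinement.
from collections import Counter

records = [
    {"category": "camera", "brand": "Acme", "price_band": "$100-200"},
    {"category": "camera", "brand": "Bolt", "price_band": "$200-500"},
    {"category": "lens",   "brand": "Acme", "price_band": "$200-500"},
]

def navigate(records, selections):
    matches = [r for r in records
               if all(r.get(f) == v for f, v in selections.items())]
    open_facets = {k for r in matches for k in r} - selections.keys()
    guidance = {f: Counter(r[f] for r in matches if f in r) for f in open_facets}
    return matches, guidance

matches, guidance = navigate(records, {"brand": "Acme"})
print(len(matches), "matches; refine by:", guidance)
```

The same mechanics apply whether the records are products on a retail site or rows of an enterprise's own data, which is the e-commerce/BI kinship described above.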
Categories: Business intelligence, Columnar database management, Database compression, Endeca | 11 Comments |
Netezza TwinFin i-Class overview
I have long complained about difficulties in discussing Netezza’s TwinFin i-Class analytic platform. But I’m ready now, and in the grand sweep of the product’s history I’m not even all that late. The Netezza i-Class timing story goes something like this:
- Netezza i-Class was first foreshadowed in February, 2010.
- Netezza i-Class customer testing started in October, 2010 or so. Netezza i-Class evidently has been shipped to 4-5 partners and a single-digit number of end-user organizations, spread across some usual-suspect industries (financial services, telecom, and so on).
- Netezza i-Class 1.0 general availability is still in the (near) future.
My advice to Netezza as to how it should describe TwinFin i-Class boils down to: Read more
Categories: Cloudera, Data warehouse appliances, Data warehousing, GIS and geospatial, Hadoop, IBM and DB2, MapReduce, Netezza, Parallelization, Predictive modeling and advanced analytics | 5 Comments |
Unpacking the EMC Greenplum Q1 sales disaster rumors
A well-connected tipster believes:
- EMC Greenplum’s* revenue target for Q1 had been $35 million.
- Actual EMC Greenplum revenue for Q1 was $3 million, or maybe it was $8 million.
- EMC Greenplum had 75 sales teams trying to generate this revenue.
In the past I might have called Greenplum for clarification, but they’re not knocking themselves out to inform me these days, nor to inspire me with confidence in what they say. Read more
Categories: Data warehouse appliances, EMC, Greenplum | 3 Comments |
Attensity update
I talked with Michelle de Haaff and Ian Hersey of Attensity back in February. We covered a lot of ground, so let’s start with a very high-level view.
- Two years ago, Attensity merged with two other companies in somewhat related businesses, thus expanding 4X or so in size.
- Due to the merger, Attensity now has two core lines of business:
- Text analytics.
- Driving actions, such as call center or social media response, based on text analytics.
- The combined Attensity is part American, part German.
- Attensity’s German part compels it to do some public financial reporting. Attensity will do $50-60 million in 2011 revenue.
- Attensity crunches text in 17 languages. English is preeminent. #2 is — you guessed it! — German.
- A big part of Attensity’s business (or at least of its value proposition) is analyzing the text in social media. Attensity boasts coverage of 75 million social media sources, such as blogs, forums, or review sites.
The four most interesting technical points were probably:
- Attensity has changed how it does exhaustive extraction. I’m having some trouble writing that part up, so for now I’ll just refer you to Attensity’s own description of the new way of doing things.
- Attensity has development work underway meant to address some of the problems in text analytics/other analytics integration. I don’t feel I got enough detail to want to talk about that yet.
- Attensity runs its own data centers, with approximately 60 Hadoop/HBase nodes and 30 nodes of Apache Solr (open source text search). More on that below.
- Attensity now OEMs Vertica. More on that below too.
Some more specific notes include: Read more
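One side note on the Solr tier mentioned above: Solr answers searches over a plain HTTP interface. Here is a minimal sketch of a text query against a hypothetical local Solr instance with a core name I made up ("social"); it says nothing about Attensity's actual configuration:

```python
# Minimal Solr text query over HTTP (hypothetical host and core name).
import json
from urllib.parse import urlencode
from urllib.request import urlopen

params = urlencode({
    "q": 'text:"battery life"',   # full-text search on a 'text' field
    "rows": 10,                   # return at most 10 documents
    "wt": "json",                 # response format
})

with urlopen(f"http://localhost:8983/solr/social/select?{params}") as resp:
    results = json.load(resp)

print(results["response"]["numFound"], "matching documents")
```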
Categories: Analytic technologies, Cloud computing, Hadoop, HBase, Predictive modeling and advanced analytics, Software as a Service (SaaS), Sybase, Vertica Systems | 7 Comments |
What Starcounter may be up to
Starcounter seems to be offering an in-memory object-based/object-oriented/whatever short-request DBMS that also talks SQL. I haven’t been briefed at this point, and hence don’t have detail beyond what’s on their rather breathless web site. I’m guessing this isn’t an H-Store/VoltDB architecture, but rather something more like what Workday runs.
Most of the crunch I found on the Starcounter website (emphasis mine) is:
Let’s say that it is possible to make a database that is 10,000 times faster than what you use today. It would then be possible for your computer language objects to live inside the database from the very beginning. From the first { Customer a = new Customer(); }. The objects could live in the database, not as a copy, but as both database object and a Java or C# object at the same time. The database would transparently be your heap. The time it would take to save your object to the database would be reduced to nothing.
If such a database existed, you could say goodbye to caches and the duality of business objects, the database objects/rows and the complexity that follows. The speed would be amazing. Goodbye to time consuming scale-out solutions. Actually, you would be able to say good bye to the databases as you know them. You only need your simple objects.
Such a technology would be the ultimate NoSQL database. But what if the ultimate NoSQL database had SQL support, ACID, checkpoints and recovery and other enterprise features? Your pure, clean objects would then become the fastest and most powerful database in the world.
Besides that, other clues to what Starcounter is doing include references to Hibernate and to the declining cost of RAM.
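To make the "database as heap" pitch concrete in non-marketing terms: the claim is that constructing an object is the write, with no separate save or persist step. Below is a deliberately naive sketch of that programming model, written in Python rather than Starcounter's Java/C#, and using none of Starcounter's actual API:

```python
# Naive illustration of "objects live in the database": constructing an
# instance registers it in an in-memory table; queries run over live objects.
class Database:
    tables = {}

    @classmethod
    def select(cls, klass, predicate):
        return [obj for obj in cls.tables.get(klass, []) if predicate(obj)]

class Persistent:
    def __init__(self):
        # The constructor is the insert; there is no explicit save() anywhere.
        Database.tables.setdefault(type(self), []).append(self)

class Customer(Persistent):
    def __init__(self, name, spend):
        super().__init__()
        self.name = name
        self.spend = spend

Customer("Alice", 120.0)
Customer("Bob", 45.0)
big_spenders = Database.select(Customer, lambda c: c.spend > 100)
print([c.name for c in big_spenders])
```

A real product would of course also need the SQL, ACID, checkpointing, and recovery machinery the quote alludes to; the sketch only illustrates the absence of a save step.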
Categories: In-memory DBMS, Memory-centric data management, Starcounter | 7 Comments |
Use cases for low-latency analytics
At various times I’ve noted the varying latency requirements of different analytic use cases, which can be as different as the speed of a turtle is from the speed of light. In particular, back when I wrote more about CEP (Complex Event Processing), I listed some applications for super-low-latency and not-so-low-latency CEP alike. Even better were some longish lists of “active data warehousing” use cases I got from Teradata in August, 2009, generally focused on interactive customer response (e.g. personalization, churn prevention, upsell, antifraud) or in some cases logistics.
In the slide deck for the Teradata 6680/solid-state drive announcement, however, Teradata went in a slightly different direction. In its list of “hot data use case examples”, Teradata suggested: Read more
Categories: Data warehousing, Teradata | 2 Comments |
Teradata integrates in solid-state storage
For once, I think Teradata’s annual hardware refresh is pretty interesting, because of the integration of flash storage into its high-end “active enterprise data warehouse” product line. The essence of the announcement is:
- Teradata is rolling out a new appliance,* the 6680, which combines hard-disk and solid-state drives, relying on Teradata Virtual Storage.
- Teradata is also rolling out a hard-disk-based appliance,* the 6650, in a more routine annual refresh.
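Teradata Virtual Storage is the piece that decides which data lands on the solid-state drives, based on how hot (frequently accessed) the data is. As a deliberately simplified sketch of temperature-based placement, with invented block identifiers and access counts and no claim to reflect Teradata's actual algorithm:

```python
# Simplified temperature-based tiering: rank data blocks by recent access
# frequency and place the hottest ones on SSD until SSD capacity runs out.
def place_blocks(access_counts, ssd_capacity_blocks):
    """access_counts maps block_id -> recent access count; returns a placement map."""
    ranked = sorted(access_counts, key=access_counts.get, reverse=True)
    return {block: ("ssd" if i < ssd_capacity_blocks else "hdd")
            for i, block in enumerate(ranked)}

# Hypothetical access statistics for five data blocks, with SSD room for two.
counts = {"b1": 900, "b2": 15, "b3": 480, "b4": 3, "b5": 120}
print(place_blocks(counts, ssd_capacity_blocks=2))
# b1 and b3 land on SSD; the cooler blocks stay on spinning disk.
```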
Categories: Data warehouse appliances, Pricing, Solid-state memory, Teradata | 3 Comments |
Revolution Analytics update
I wasn’t too impressed when I spoke with Revolution Analytics at the time of its relaunch last year. But a conversation Thursday evening was much clearer. And I even learned some cool stuff about general predictive modeling trends (see the bottom of this post).
Revolution Analytics business and business model highlights include:
- Revolution Analytics is an open-core vendor built around the R language. That is, Revolution Analytics offers proprietary code and support, with subscription pricing, that make the underlying open source software easier to use.
- Unlike most open-core vendors I can think of, Revolution Analytics takes little responsibility for the actual open source part. Some “grants” for developing certain open source R pieces seem to be the main exception. While this has caused some hard feelings, I don’t have an accurate sense for their scope or severity.
- Revolution Analytics also sells a single-user/workstation version of its product, freely admitting that this is mainly a lead generation strategy or, in my lingo, a “break-even leader.”
- Revolution Analytics boasts around 100 customers, split about 70-30 between the workstation seeding stuff and the real server product.
- Revolution Analytics has “about” 37 employees. Headquarters are at 101 University Avenue (do I have to say in what city? 🙂 ). There are also a development office in Seattle and a sales office in New York.
- Revolution Analytics’ pricing is by size of server. “Small” servers — i.e. up to 12 cores — start at $25K/year.
- Unsurprisingly, adoption is more alongside SAS et al. than rip-and-replace.