January 19th, 2008 Curt Monash
Rich Skrenta is quite a successful entrepreneur, so it’s likely that he doesn’t really mean the more ridiculous parts of this rant on the MapReduce debate. E.g., he cheerfully disregards the fact that the data warehouse appliance vendors have ALREADY disrupted the market he’s focusing on. Index-light row-based and columnar systems are both super fast at data mining extracts.
But let’s go straight to the one interesting thing he said, Read the rest of this entry »
Posted in Analytics and analytic technologies, Google, BigTable, and MapReduce, SAS Institute | 1 Comment »
January 14th, 2008 Curt Monash
I’m getting a flood of press releases today, because many of the companies I write about were selected to Intelligent Enterprise’s list of 12 most influential vendors plus 36 more to watch in the areas Intelligent Enterprise covers (which seems to be pretty much the analytics-related parts of what I write about here and on Text Technologies). It looks like a pretty reasonable list, although I think they forced the issue in some of the small analytics vendors they selected, and of course anybody can quibble with some of the omissions.
Among the companies they cited, you can find topical categories here for IBM (and Cognos), Informatica, Microsoft, Netezza, Oracle, SAP/Business Objects (both), SAS, and Teradata; QlikTech; Cast Iron, Coral8, DATAllegro, HP, ParAccel, and StreamBase; and Software AG. On Text Technologies you’ll find categories for some of the same vendors, plus Attensity, Clarabridge, and Google. There also are categories for some of these vendors on the Monash Report.
Posted in Business Objects, Cast Iron Systems, Coral8, DATAllegro, HP and Neoview, IBM and DB2, Informatica, Microsoft and SQL*Server, Netezza, Oracle, ParAccel, QlikTech and QlikView, SAP, BI Accelerator, and MaxDB, SAS Institute, Software AG and ADABAS, StreamBase, Teradata | No Comments »
December 14th, 2007 Curt Monash
There are at least 16 different vendors offering appliances and/or software that do database management primarily for analytic purposes.* That’s a lot to keep up with,. So I’ve thrown together a little overview of the analytic data management landscape, liberally salted with links to information about specific vendors, products, or technical issues. In some ways, this is a companion piece to my prior post about data warehouse appliance myths and realities.
*And that’s just the tabular/alphanumeric guys. Add in text search and you run the total a lot higher.
Numerous data warehouse specialists offer traditional row-based relational DBMS architectures, but optimize them for analytic workloads. These include Teradata, Netezza, DATAllegro, Greenplum, Dataupia, and SAS. All of those except SAS are wholly or primarily vendors of MPP/shared-nothing data warehouse appliances. EDIT: See the comment thread for a correction re Kognitio.
Numerous data warehouse specialists offer column-based relational DBMS architectures. These include Sybase (with the Sybase IQ product, originally from Expressway), Vertica, ParAccel, Infobright, Kognitio (formerly White Cross), and Sand. Read the rest of this entry »
Posted in Analytics and analytic technologies, Cognos and Applix TM1, DATAllegro, Data warehouse appliances, Data warehousing, Dataupia, Greenplum, IBM and DB2, Kognitio and WX2, Netezza, Oracle, ParAccel, Relational database management systems, SAS Institute, Sybase, Teradata, Vertica Systems | 10 Comments »
November 7th, 2007 Curt Monash
I followed up with Keith Collins of SAS today about SAS-in-the-database, expanding on what I learned or thought I did when we talked last month. Here’s the scoop:
SAS users do a lot of data filtering, aka data preparation, in SAS. These have WHERE clauses, just like SQL. However, only some of them map to actual SQL WHERE clauses. SAS is now implementing many of the rest as UDFs (User-Defined Functions), one DBMS at a time, starting with Teradata. In addition, SAS users can write custom filters that get registered as UDFs. This capability will be released with SAS 9.2. (The timing on SAS 9.2 is in line with the comment thread to my prior post on SAS-in-the-DBMS.) Read the rest of this entry »
Posted in Analytics and analytic technologies, Data warehousing, Database theory and practice, Intel, Relational database management systems, SAS Institute | No Comments »
October 10th, 2007 Curt Monash
After a hurried discussion with SAS CTO Keith Collins and a followup with Teradata CTO Todd Walter, I think I’ve figured out the essence of the SAS port to Teradata. (Subtle nuances, however, have to await further research.) Here’s what I think is going on:
1. SAS is porting or creating two different products or modules, with two different names (and I don’t know exactly what those names are). The two different things they are porting amount to modeling (i.e., analysis) and scoring (i.e., using the results of the model for automated decision-making).
2. Both products are slated for delivery at or near the time of SAS 9.2, which is slated for GA at or near the middle of next year. (Maybe somebody from SAS could send me the official word, as well as product names and so on?)
3. The essence of the modeling port is a library of static UDFs (User Defined Functions).
4. The essence of the SAS scoring port is the ability to easily generate a single “dynamic” UDF to score according to a particular model. This would seem to leverage Teradata scoring-related enhancements much more than it would compete or conflict with them.
5. There are two different kinds of benefits SAS gets from integrating with an MPP (Massively Parallel Processing) DBMS. One is actual parallel processing of operations, shortening absolute calculation time dramatically, and also leveraging Moore’s Law without painful SMP (Symmetric MultiProcessing) overhead. The other is a radical reduction in data movement costs for the handoff between the database and the SAS software. Interestingly, SAS reports huge performance gains even from putting its software on a single node inside the Teradata grid. That is, changing how data movement is done is already a huge win, even when there’s no reduction in the overall amount moved. But of course, in the complete implementation, where database and SAS processing are done on the same nodes, there’s also a huge reduction in actual data movement effort required.
One obvious question would be: How hard would it be for SAS to replicate this work on other MPP DBMS? Well, at its core this work involves implementing a variety of elementary arithmetic and data manipulation functions. So a first-best guess is that a fairly efficient port would be easy (given that this one has already been performed), but that the last 20% or whatever of the performance optimizations require a lot more work. As to whether or not this is more than a theoretical question — well, both SAS and SPSS are disclosed members of the Netezza Developers Network. As for SMP DBMS — well, some of the work certainly could be replicated, but other important parts don’t even make sense on Oracle or Microsoft the way they do on Teradata, Netezza, DATAllegro, et al. Read the rest of this entry »
Posted in Analytics and analytic technologies, Data warehouse appliances, Data warehousing, Relational database management systems, SAS Institute, Teradata | 4 Comments »
October 8th, 2007 Curt Monash
One of the big announcements at the Teradata user conference this week (confusingly named “Partners”) is SAS integration. Now, SAS is integrating with other MPP data warehouse appliance vendors as well, but it’s likely that the Teradata integration is indeed the most advanced. For example, one customer proofpoint offered was an insurer who used this capability to reevaluate its risk profile at high speed after Hurricane Katrina. I doubt any of the other SAS/DBMS integrations I know of were in customer hands a year ago.
Three still-open questions I hope to address over the next couple of days are: Read the rest of this entry »
Posted in Analytics and analytic technologies, Data warehouse appliances, Data warehousing, Relational database management systems, SAS Institute, Teradata | No Comments »
September 27th, 2007 Curt Monash
Netezza has officially announced the Netezza Developer Network. Associated with that is a set of technical capabilities, which basically boil down to programming user-defined functions or other capabilities straight onto the Netezza nodes (aka SPUs). And this is specifically onto the FPGAs, not the PowerPC processors. In C. Technically, I think what this boils down to is:
- Extending Netezza’s SQL via user-defined functions (which probably wasn’t too hard, especially since the Netezza engine is related to PostgreSQL).
- Providing a C-to-Verilog compiler.
- Providing an application development environment and associated tools. (Presumably rather primitive, but I haven’t really checked it out.)
The applications mentioned in the NDN press release, and I quote directly, are:
- Multi-dimensional geospatial analytics on comprehensive data sets for risk management
- Predictive model scoring for customer segmentation, enabling real-time offer provisioning for customers
- Iterative modeling and analytics on billions of call detail records (CDRs) for telco price optimization
- Real-time Monte Carlo simulations on terabytes of detail-level data for risk management
- “Fingerprinting” with hashing algorithms for chain-of-custody document fingerprinting and to ensure that files transferred are intact
- Fuzzy text search analysis uses algorithms that provide a “best guess” of most likely results
Netezza says that the greatest interest has come from usual-suspect sophisticated users, specifically intelligence agencies and perhaps also financial services firms. But naturally, the partners actually trotted out at Netezza’s user conference were mainly hopeful small-company ISVs. The biggest stir was made by not-so-small SAS, which evidently believes this new capability will provide massive improvements to SAS/Netezza combined performance.
In principle, there are four different ways this new programmability could be a big win: Read the rest of this entry »
Posted in Data warehouse appliances, Data warehousing, Native XML, Netezza, PostgreSQL, Relational database management systems, SAS Institute, Specialized data management in general | 8 Comments »
February 23rd, 2007 Curt Monash
Business Intelligence Lowdown has a well-dugg post listing what it claims are the 10 largest databases in the world. The accuracy leaves much to be desired, as is illustrated by the fact that #10 on the list is only 20 terabytes, while entirely unmentioned is eBay’s 2-petabyte database (mentioned here, and also here). Read the rest of this entry »
Posted in DATAllegro, Data warehouse appliances, Data warehousing, Database theory and practice, Greenplum, IBM and DB2, Netezza, Oracle, SAS Institute, Teradata | 3 Comments »
October 4th, 2006 Curt Monash
SAS has its own data store, called SAS Intelligence Storage. It’s a relational system running on SMP boxes, whose unique feature is that it has fixed-length records and hence is a perfect array, for speedy lookup. This is highly analogous to classical MOLAP systems. However, SAS reports that customers store up to several hundred terabytes of data in SAS Intelligence Storage, which is definitely not very analogous to what goes on in the MOLAP world.
It sounds as if the product is optimized for data mining and generic OLAP alike. Indeed, SAS Intelligence Storage is used to power both SAS’s data mining and other advanced analytics, and also its more conventional BI suite.
Posted in Data warehousing, MOLAP, Relational database management systems, SAS Institute | 1 Comment »