Data warehouse appliance hardware strategies
Recently, I’ve done extensive research into the hardware strategies of computing appliance vendors, across multiple functional areas. Data warehousing, firewall/unified threat management, antispam, data integration – you name it, I talked to them. Of course, each vendor has a unique twist. But some architectural groupings definitely emerged.
The most common approaches seem to be:
Type 1: Custom assembly from off-the-shelf parts. In this model, the only unusual (but still off-the-shelf) parts are usually in the area of network acceleration (or occasionally encryption). Also, the box may be balanced differently than standard systems, in terms of compute power and/or reliability.
Type 2 (Virtual): We don’t need no stinkin’ custom hardware. In this model, the only “appliancy” features are in the areas of easy deployment, custom operating systems, and/or preconfigured hardware.
And of course there are also appliances of Type 0: Custom hardware including proprietary ASICs or FPGAs.
Different markets have different emphases; e.g., firewall appliances are typically Type 1, while antispam devices cluster in Type 2. But the data warehouse appliance market is highly diverse, which perhaps shouldn’t be a surprise. After all, the revenue market leader is non-appliance software vendor Oracle, while noisy upstart Netezza is famous for its FPGA.
| Categories: Data warehouse appliances, Data warehousing, DATAllegro, Greenplum, IBM and DB2, Kognitio, Netezza, Teradata | 8 Comments |
And then there were two: DATAllegro seems to be going with standard hardware
A while ago – for example, in a comment dated July 9, 2006 — CEO Stuart Frost of DATAllegro hinted that the company might port its software to commodity hardware before long. If this user story is to be believed, that has now happened. (Specific quote: “the Datallegro system is based on Dell and EMC hardware …”) Officially, the company is doing a Sgt. Schultz on the subject. But the evidence is pretty clear.
| Categories: Data warehouse appliances, Data warehousing, DATAllegro | 3 Comments |
Can MySQL scale?
Making the rounds of cyberspace is a report by MediaTemple, a hosting company, on how it believes it will solve its difficulties with grid-based MySQL hosting.
Takeaways include:
1. MySQL has real issues with handling diverse, high-volume workloads.
2. When MySQL gets overloaded, database corruption is routine.
3. Some people write really, really bad MySQL web applications.
With the possible exception of #2, I doubt any of this surprises anybody.
| Categories: MySQL, Open source | 6 Comments |
Arguments AGAINST data warehouse appliances
Data warehouse appliance opponents like to argue that history is conclusively on their side. Database machine maker Britton-Lee, eventually bought by Teradata, fizzled. LISP machines were a spectacular failure. Rational Software’s origins as a special-purpose Ada machine maker had to be renounced before the company could succeed.
But the true story is more mixed. Teradata continues to this day as a major data warehouse technology player, and as far as I’m concerned Teradata indeed makes appliances. If we look beyond the applications stack, we find that appliances actually occupy a large and growing share of the computing market. So a persuasive anti-appliance argument has to do more than just invoke the names of Britton-Lee and Symbolics.
I just ran across an article by MIT professor Samuel Madden that attempts to make such a case. And his MIT colleague Mike Stonebraker made similar arguments to me a few days ago. They are not wholly unbiased; indeed, both are involved in Vertica Systems. With that caveat, they have an interesting three-part argument:
Who’s who in columnar relational database management systems
The best known columnar RDBMS is surely Sybase’s IQ Accelerator, evolved from a product acquired in the mid-1990s. Problem – it doesn’t have a shared-nothing architecture of the sort needed to exploit grid/blade technology. Whoops. The other recognized player is SAND, but I don’t know a lot about them. Based on their website, it would seem that grids and compression play a big part in their story. Less established but pretty interesting is Kognitio, who are just beginning to make marketing noise outside the UK. SAP’s BI Accelerator is also a compressed columnar system, but operates entirely in-memory and hence is limited in possible database size. Mike Stonebraker’s startup Vertica is of course the new kid on the block, and there are other columnar startups as well whose names currently escape me.
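The storage-layout distinction underlying all of these products is easy to illustrate. The sketch below (purely illustrative, and not any vendor’s actual format) contrasts a row layout with a column layout, and shows why compression of the run-length-encoding sort works so much better on homogeneous per-column data:

```python
# Illustrative sketch of row vs. column storage -- not any vendor's
# actual on-disk format.

rows = [
    ("2006-01-01", "ACME", 100),
    ("2006-01-01", "ACME", 105),
    ("2006-01-02", "ZETA", 100),
]

# Row store: each record's fields are stored together.
row_store = rows

# Column store: each column's values are stored together.
col_store = {
    "date":   [r[0] for r in rows],
    "ticker": [r[1] for r in rows],
    "price":  [r[2] for r in rows],
}

def run_length_encode(values):
    """Collapse adjacent duplicates into (value, count) pairs -- the kind
    of compression that pays off per-column, where values are homogeneous
    and often sorted, far more than per-row."""
    encoded = []
    for v in values:
        if encoded and encoded[-1][0] == v:
            encoded[-1] = (v, encoded[-1][1] + 1)
        else:
            encoded.append((v, 1))
    return encoded

# An analytic scan of one column touches only that column's data...
assert sum(col_store["price"]) == 305
# ...and repetitive columns compress dramatically.
assert run_length_encode(col_store["date"]) == [("2006-01-01", 2), ("2006-01-02", 1)]
```

The same example hints at the in-memory trade-off: a compressed column fits where an uncompressed table would not, which is essentially the bet products like BI Accelerator are making.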
| Categories: Data warehousing, Investment research and trading, Kognitio, SAP AG, TransRelational | 3 Comments |
Are row-oriented RDBMS obsolete?
If Mike Stonebraker is to be believed, the era of columnar data stores is upon us.
Whether or not you buy completely into Mike’s claims, there certainly are cool ideas in his latest columnar offering, from startup Vertica Systems. The Vertica corporate site offers little detail, but Mike tells me that the product’s architecture closely resembles that of C-Store, which is described in this November 2005 paper.
The core ideas behind Vertica’s product are as follows.
Mike Stonebraker Blasts “One Size Fits All”
When it comes to DBMS inventors, Mike Stonebraker is the next closest thing to Codd. And he’s become a huge non-believer in the idea that one DBMS architecture meets all needs.
Frankly, there isn’t much in his “One Size Fits All” paper that hasn’t already been said in this blog, except for the part that is specifically relevant to one of his startups, StreamBase. Still, it’s nice to have the high-powered agreement.
More recently, the argument in that paper has been extended with a benchmark-filled follow-up based on another Stonebraker startup, Vertica.
| Categories: Columnar database management, Database compression, StreamBase, Theory and architecture, Vertica Systems | Leave a Comment |
(Crosspost) New ways to read our research!
We’ve finally redesigned the Monash Information Services website. In particular, we’ve created two great new ways to read our research. First, there’s a new, Google-based integrated search engine. (And it really works well, the one glitch being that it brings back feeds and pages interchangeably. Try it out!) Also – and I really encourage you all to subscribe to this — there’s a new integrated research feed.
The reason you should care about these is in both cases the same: Our research is actually spread across multiple sites and feeds. I write about Google both in the Monash Report and on Text Technologies. I write about enterprise text management both on Text Technologies and on DBMS2. I write about computing appliances both on DBMS2 and in the Monash Report. I write about data mining in all three places. And now that there’s an integrated feed, industry history relevant to any of the other subject areas may find its way onto Software Memories. Your view of my views simply isn’t complete unless you have access to all of those sites.
| Categories: About this blog | Leave a Comment |
Data integration appliance vendor Cast Iron Systems
I’ve been doing a lot of research lately into computing appliances – not just data warehouse appliances, but security, anti-spam and other appliance types as well. Today I added Cast Iron Systems to the list.
Essentially, they offer data integration without the common add-ons. I.e., there’s little or nothing in the way of data cleansing, composite apps, business process management, and/or business activity monitoring. Data just gets imported, extracted, and/or synchronized, whether between pairs of transactional systems, or between a transactional system and a reporting database. A particularly hot area of application for them seems to be SaaS/on-demand app integration (Salesforce.com, Netsuite, etc.). In particular, they boast both Lawson and Salesforce.com as internal users, and at least at Lawson they are used for a Salesforce/Lawson integration.
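For flavor, the point-to-point synchronization pattern described above can be sketched in a few lines. This is a hypothetical illustration only – plain dictionaries stand in for the two systems, and nothing here implies anything about Cast Iron’s actual interfaces:

```python
# Hypothetical sketch of one-way, key-based synchronization between two
# systems, represented as dicts keyed by record ID. Illustrates the
# general pattern only; it implies nothing about Cast Iron's product.

def sync(source, target):
    """Copy inserts and updates from source into target; return the delta."""
    changed = {}
    for key, record in source.items():
        if target.get(key) != record:
            target[key] = record      # insert or update
            changed[key] = record
    return changed

# E.g., syncing a transactional CRM into a reporting copy:
crm = {1: {"name": "Acme", "stage": "won"}, 2: {"name": "Zeta", "stage": "open"}}
reporting = {1: {"name": "Acme", "stage": "open"}}  # stale copy of record 1

delta = sync(crm, reporting)
assert set(delta) == {1, 2}   # record 1 updated, record 2 inserted
assert reporting == crm
```

Note what the sketch leaves out – cleansing, transformation, process logic – which is exactly the point of the “without the common add-ons” positioning.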
The big advantage to this strategy is that their integrator is simple enough for appliance deployment.
| Categories: Cast Iron Systems, EAI, EII, ETL, ELT, ETLT | 5 Comments |
Bulletin on Cogito
My Bulletin on Cogito — i.e., a short-short white paper — is now available for download. Thankfully, it turned out to be pretty consistent with what I previously wrote on the company and its technology. 😉 The conclusion to the paper bears quoting here:
In deciding between conventional DBMS and specialty graph-oriented tools such as Cogito’s, there’s one key criterion: Path length. If path lengths are short and predictable, there’s a good chance that relational DBMS and their forthcoming extensions can do the job. In complex graphs with longer paths, however, relational approaches may not scale well. In such cases, specialty technologies warrant serious consideration.
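To make the criterion concrete: a path of known, short length k maps onto k relational self-joins, while a path of unpredictable length requires iterative traversal, which is where graph-oriented engines earn their keep. Below is a minimal breadth-first-search sketch of variable-length path finding – illustrative only, and unrelated to Cogito’s actual implementation:

```python
from collections import deque

# Illustrative only -- a tiny breadth-first search, not Cogito's technology.
# A path of known length k corresponds to k relational self-joins; when k
# is unknown or large, you need iteration like this instead.

def shortest_path_length(edges, start, goal):
    """Return the number of hops from start to goal, or None if unreachable."""
    frontier = deque([(start, 0)])
    seen = {start}
    while frontier:
        node, depth = frontier.popleft()
        if node == goal:
            return depth
        for neighbor in edges.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return None

graph = {"a": ["b"], "b": ["c"], "c": ["d"], "d": []}
assert shortest_path_length(graph, "a", "d") == 3     # a -> b -> c -> d
assert shortest_path_length(graph, "d", "a") is None  # unreachable
```

Expressing the equivalent in SQL would take either a join per hop or a recursive query, and it is on long, unpredictable traversals of this sort that relational engines tend to bog down.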
| Categories: Cogito and 7 Degrees, RDF and graphs | Leave a Comment |
