Data warehousing
Analysis of issues in data warehousing, with extensive coverage of database management systems and data warehouse appliances that are optimized to query large volumes of data. Related subjects include:
Netezza has another big October quarter
Netezza reported a big October quarter, ahead of expectations. And official guidance for next quarter is essentially flat quarter-over-quarter, suggesting Q3 was indeed surprisingly big. However, Netezza’s year-over-year growth for Q3 was a little under 50%, suggesting the quarter wasn’t so remarkable after all. (Netezza has a January fiscal year.)
Tentative conclusion: Netezza just tends to have big October quarters, perhaps by timing sales cycles to finish soon after the late September user conference. If Netezza’s user conference ever moves to later in the fall, expect Q3 to be weak that year.
Netezza reported 18 new customers, double last year’s figure. Read more
Categories: Analytic technologies, Data warehouse appliances, Data warehousing, Greenplum, Kognitio, Netezza | 3 Comments |
Vertica update – HP appliance deal, customer information, and more
Vertica quietly announced an appliance bundling deal with HP and Red Hat today. That got me quickly onto the phone with Vertica’s Andy Ellicott, to discuss a few different subjects. Most interesting was the part about Vertica’s customer base, highlights of which included:
- Vertica’s claim to have “50” customers includes a bunch of unpaid licenses, many of them in academia.
- Vertica has about 15 paying customers.
- Based on conversations with mutual prospects, Vertica believes that’s more customers than DATAllegro has. (Of course, each DATAllegro sale is bigger than one of Vertica’s. Even so, I hope Vertica is wrong in its estimate, since DATAllegro told me its customer count was “double digit” quite a while ago.)
- Most Vertica customers manage over 1 terabyte of user data. A couple have bought licenses showing they intend to manage 20 terabytes or so.
- Vertica’s biggest customer/application category – existing customers and sales pipelines alike – is call detail records for telecommunications companies. (Other data warehouse specialists also have activity in the CDR area.). Major applications are billing assurance (getting the inter-carrier charges right) and marketing analysis. Call center uses are still in the future.
- Vertica’s other big market to date is investment research/tick history. Surely not coincidentally, this is a big area of focus for Mike Stonebraker, evidently at both companies for which he’s CTO. (The other, of course, is StreamBase.)
- Runners-up in market activity are clickstream analysis and general consumer analytics. These seem to be present in Vertica’s pipeline more than in the actual customer base.
Categories: Analytic technologies, Business Objects, Data warehouse appliances, Data warehousing, DATAllegro, HP and Neoview, RDF and graphs, Vertica Systems | 5 Comments |
Clarifying SAS-in-the-DBMS, and other SAS tidbits
I followed up with Keith Collins of SAS today about SAS-in-the-database, expanding on what I learned or thought I did when we talked last month. Here’s the scoop:
SAS users do a lot of data filtering, aka data preparation, in SAS. These have WHERE clauses, just like SQL. However, only some of them map to actual SQL WHERE clauses. SAS is now implementing many of the rest as UDFs (User-Defined Functions), one DBMS at a time, starting with Teradata. In addition, SAS users can write custom filters that get registered as UDFs. This capability will be released with SAS 9.2. (The timing on SAS 9.2 is in line with the comment thread to my prior post on SAS-in-the-DBMS.) Read more
Categories: Analytic technologies, Data warehousing, Intel, SAS Institute, Theory and architecture | Leave a Comment |
Netezza cites three warehouses over 50 terabytes
Netezza is finally making it clear that they run some largish warehouses. Their latest press release cites Catalina Marketing, Epsilon, and NYSE Euronext as having 50+ terabytes each. I checked with Netezza’s Marketing VP Ellen Rubin, and she confirmed that those are clean figures — user data, single warehouses, etc. Ellen further tells me that Netezza’s total count of warehouses that big is “significantly more” than the 3 named in the release.
Of course, this makes sense, given that Netezza’s largest box, the NPS 10800, runs 100 terabytes. And Catalina was named as having bought a 10800 in a press release back in December, 2006. Read more
ParAccel opens the kimono slightly
Please do not rely on the parts of this post that draw a distinction between in-memory and disk-based operation. See our February 18, 2008 post about ParAccel instead. It turns out that communication with ParAccel was yet worse than I had realized.
Officially launched today at the TDWI conference, ParAccel is out to compete with Netezza. Right out of the chute, ParAccel may have surpassed Netezza in at least one area: pointlessly annoying secrecy. (In other regards I love them dearly, but that paranoia can be a real pain.) As best I can remember, here are some things about ParAccel that I both am allowed to say and find interesting:
- ParAccel offers a columnar, MPP data warehouse DBMS, called the ParAccel Analytic Database.
- ParAccel’s product runs in two main modes. “Maverick” is normal, stand-alone mode. “Amigo” mode amounts to a plug-compatible accelerator for Oracle or Microsoft SQL*Server. Early sales and marketing were concentrated on SQL*Server Amigo mode.
- ParAccel’s product also runs in another pair of modes – in-memory and disk-based. Early sales and marketing were concentrated on in-memory mode. Hybrid memory-centric processing sounds like something for a future release.
- Sun has a reseller partnership with ParAccel, focused on in-memory mode.
- Sun and ParAccel published record-shattering 100 gigabyte, 300 gigabyte, and 1 terabyte TPC-H benchmarks today, based on in-memory mode. (If you’d like to throw 13 terabytes of disk at 1 terabyte of user data, running simple and repetitive queries, that benchmark might be a useful guide to your own experience. But hey – that’s a big improvement on the prior champion, who used 40 terabytes of disk. To ParAccel’s credit, they’re not pretending that this is a bigger deal than it is.)
Infobright responds
An InfoBright employee posted something quite reasonable-looking in response to my inaugaral post about BrightHouse. Even so, InfoBright asked if they could substitute something with a slightly different tone. I agreed. Here’s what they sent in.
Curt, thanks for the write-up and the opportunity to talk about our customer success stories. As you say, our customer story is definitely “more than zero.” We are addressing a number of critical customer issues with our unique approach to data warehousing.
Infobright currently has 5 customers – customers that have bucked the trend of throwing hardware at the problem. To be perfectly braggadocio about this, we have never lost a competitive proof of concept in which we’ve been engaged. This is accomplished with the horsepower of one box (though for redundancy customers may deploy multiple boxes with a load balancer). Read more
Categories: Analytic technologies, Columnar database management, Data warehousing, Database compression, Infobright | Leave a Comment |
DATAllegro discloses a few numbers
Privately held DATAllegro just announced a few tidbits about financial results and suchlike for the fiscal year ended June, 2007. I sent over a few clarifying questions yesterday. Responses included:
- Yes, the company experienced 330% year-over-year annual revenue growth.
- The majority of DATAllegro customers have bought systems in the 25-100 terabyte range.
- One system over 250 terabytes has been in production for months (surely the one I previously wrote about); a second is being installed.
- DATAllegro has “about 100” employees. By way of comparison, Netezza reported 225 full-time employees for the year ended January, 2007 – which probably means as of January 31, 2007.
All told, it sounds as if DATAllegro is more than 1/3 the size of Netezza, although given its higher system size and price points I’d guess it has well under 1/3 as many customers.
Here’s a link. I’ll likely edit that to something more permament-seeming later, and generally spruce this up when I’m not so rushed.
Categories: Analytic technologies, Data warehouse appliances, Data warehousing, DATAllegro | 8 Comments |
Vertica — just star and snowflake schemas?
One of the longest-running technotheological disputes I know of is the one pitting flat/normalized data warehouse architectures vs. cubes, stars, and snowflake schemas. Teradata, for example, is a flagwaver for the former camp; Microstrategy is firmly in the latter. (However, that doesn’t keep lots of retailers from running Microstrategy on Teradata boxes.) Attensity (a good Teradata partner) is in the former camp; text mining rival Clarabridge (sort of a Microstrategy spinoff) is in the latter. And so on.
Vertica is clearly in the star/snowflake camp as well. I asked them about this, and Vertica’s CTO Mike Stonebraker emailed a response. I’m reproducing it below, with light edits; the emphasis is also mine. Key points include:
- Almost everybody (that Vertica sees) wants stars and snowflakes, so that’s what Vertica optimizes for.
- Replicating small dimension tables across nodes is great for performance.
- Even so, Vertica is broadening its support for more general schemas as well.
Great question. This is something that we’ve thought a lot about and have done significant research on with large enterprise customers. … short answer is as follows:
Vertica supports star and snowflake schemas because that is the desired data structure for data warehousing. The overwhelming majority of the schemas we see are of this form, and we have highly optimized for this case. Read more
Categories: Analytic technologies, Columnar database management, Data models and architecture, Data warehousing, Theory and architecture, Vertica Systems | 6 Comments |
Vertica update
Vertica has been quietly selling product for three quarters and has about 50 customers.
Andy Ellicott of Vertica pointed me to the above Richard Hackathorn quote. Sadly, he asked me not to name and shame another analyst who foolishly said Vertica hadn’t “launched” yet.
But then, I understand. I’m also not going to identify the client who gave me fits by insisting on believing that nonsense, even in the face of the well-known facts that Vertica has shipping product, paying customers, and so on.
Infobright BrightHouse — columnar, VERY compressed, simple, and related to MySQL
To a first approximation, Infobright – maker of BrightHouse — is yet another data warehouse DBMS specialist with a columnar architecture, boasting great compression and running on commodity hardware, emphasizing easy set-up, simple administration, great price-performance, and hence generally low TCO. BrightHouse isn’t actually MPP yet, but Infobright confidently promises a generally available MPP version by the end of 2008. The company says that experience shows >10:1 compression of user data is realistic – i.e., an expansion ratio that’s fractional, and indeed better than 1/10:1. Accordingly, despite the lack of shared-nothing parallelism, Infobright claims a sweet spot of 1-10 terabyte warehouses, and makes occasional references to figures up to 30 terabytes or so of user data.
BrightHouse is essentially a MySQL storage engine, and hence gets a lot of connectivity and BI tool support features from MySQL for “free.” Beyond that, Infobright’s core technical idea is to chop columns of data into 64K chunks, called data packs, and then store concise information about what’s in the packs. The more basic information is stored in data pack nodes,* one per data pack. If you’re familiar with Netezza zone maps, data pack nodes sound like zone maps on steroids. They store maximum values, minimum values, and (where meaningful) aggregates, and also encode information as to which intervals between the min and max values do or don’t contain actual data values. Read more