Data warehousing
Analysis of issues in data warehousing, with extensive coverage of database management systems and data warehouse appliances that are optimized to query large volumes of data.
ParAccel unveils its EMC-related appliance strategy
Embargoes are getting ever more stupid these days, wasting analysts’ and bloggers’ time in doomed attempts to micromanage the news flow. ParAccel is no exception to the rule. An announcement that’s actually been public knowledge for a couple of months was finally made official a few minutes ago. It’s an appliance, or at least an attempt to gain customers for an appliance. The core ideas include:
- ParAccel’s usual shared-nothing configuration is hooked up to SAN-based EMC storage at the back end.
- Around half of the total data is on internal (i.e., node-specific) disks, mirrored on the storage device. The rest of the data lives only on the EMC device. Logically, all this data is integrated, so hopefully you’ll be able to process more data per unit of time than you could on a standard ParAccel configuration. (A toy sketch of the placement scheme follows this list.)
- Also, different parts of the EMC device are dedicated to different ParAccel nodes. So, while this isn’t a shared-nothing architecture, at least it’s shared-not-very-much. (DATAllegro does something similar, although without the mirroring on direct-attached storage.)
- Backup, snapshotting, and so on are inherited from EMC. Administration will increasingly be integrated with EMC’s.
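If a picture helps: below is a minimal, purely illustrative Python sketch of the data placement described above. The node names, the 50/50 hot-data split, and all identifiers are my assumptions, not ParAccel specifics.

```python
# Toy model of the "shared-not-very-much" layout described above.
# Each node owns a dedicated slice of the SAN; roughly half the data
# is also mirrored on that node's internal disks. All names hypothetical.

NODES = ["node1", "node2", "node3", "node4"]
HOT_FRACTION = 0.5  # assumed share of data mirrored on internal disks

def place_segment(segment_id: int, hot: bool) -> dict:
    """Assign a data segment to a node and decide where it lives."""
    node = NODES[segment_id % len(NODES)]  # hash-style distribution
    placement = {"node": node,
                 "san_slice": f"emc:{node}/seg{segment_id}"}
    if hot:
        # Mirrored copy on the node's direct-attached disks
        placement["internal_disk"] = f"{node}:/data/seg{segment_id}"
    return placement

for seg in range(6):
    print(place_segment(seg, hot=(seg % 2 == 0)))
```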
Categories: Analytic technologies, Data warehouse appliances, Data warehousing, EMC, ParAccel, Parallelization | 2 Comments |
The week of trivial press releases
TDWI has several conferences per year, including one this week. And so companies active in data warehousing feel they must put out several press releases a year timed for TDWI conferences, whether or not anything newsworthy has actually happened. So far, the only one I’ve gotten with any real substance was Vertica’s (and, in compliance with Murphy’s Law, even that was glitched in its release). Most are yawnworthy product partnerships, adding some useful but me-too feature to somebody’s product line. Worst of all was the announcement that a certain vendor had established an indirect sales channel, after more than a decade in the marketplace, when all its competitors already have one.
Usually I love my work, but there are exceptions …
Categories: Analytic technologies, Data warehousing | 1 Comment |
Vertica in the cloud
I may have gotten confused again as to an embargo date, but if so, then this time I had it late rather than early. Anyhow, the TDWI-timed news is that Vertica is now available in the Amazon cloud. Of course, the new Vertica cloud offering is:
- Super-easy to set up
- Pay-as-you-go.
Slightly less obviously:
- Vertica insists its software was designed for grid computing from the ground up, and hence doesn’t need Elastra’s administrative aids for starting, stopping, and/or provisioning instances.
- This is a natural fit for new or existing Vertica customers in data mart outsourcing.
Categories: Analytic technologies, Cloud computing, Data warehousing, Software as a Service (SaaS), Vertica Systems | 1 Comment |
Vertica update
Another TDWI conference approaches. Not coincidentally, I had another Vertica briefing. Primary subjects included some embargoed stuff, plus (at my instigation) outsourced data marts. But I also had the opportunity to follow up on a couple of points from February’s briefing, namely:
Vertica has about 35 paying customers. That doesn’t sound like a lot more than they had a quarter ago, but first quarters can be slow.
Vertica’s list price is $150K/terabyte of user data. That sounds very high versus the competition. On the other hand, if you do the math versus what they told me a few months ago — average initial selling price $250K or less, multi-terabyte sites — it’s obvious that discounting is rampant, so I wouldn’t actually assume that Vertica is a high-priced alternative.
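To make the discount arithmetic concrete, here’s the back-of-envelope version in Python; the site size is my assumption, since all I know is “multi-terabyte”:

```python
# Back-of-envelope: implied discount off Vertica's list price.
# Site size is hypothetical; Vertica said only "multi-terabyte".
list_price_per_tb = 150_000      # $/TB of user data, per Vertica
site_size_tb = 3                 # assumed multi-terabyte site
actual_initial_price = 250_000   # "average initial selling price $250K or less"

list_total = list_price_per_tb * site_size_tb   # $450,000
discount = 1 - actual_initial_price / list_total
print(f"Implied discount: {discount:.0%}")      # ~44%
```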
Vertica does stress several reasons for thinking its TCO is competitive. First, with all that compression and performance, they think their hardware costs are very modest. Second, with the self-tuning, they think their DBA costs are modest too. Finally, they charge only for deployed data; the software that stores copies of data for development and test is free.
Categories: Analytic technologies, Columnar database management, Data warehousing, Database compression, Pricing, Vertica Systems | 10 Comments |
Outsourced data marts
Call me slow on the uptake if you like, but it’s finally dawned on me that outsourced data marts are a nontrivial segment of the analytics business. For example:
- I was just briefed by Vertica, and got the impression that data mart outsourcers may be Vertica’s #3 vertical market, after financial services and telecom. Certainly it seems like they are Vertica’s #3 market if you bundle together data mart outsourcers and more conventional OEMs.
- When Netezza started out, a bunch of its early customers were credit data-based analytics outsourcers like Acxiom.
- After nagging DATAllegro for a production reference, I finally got a good one: TEOCO. TEOCO specializes in figuring out whether inter-carrier telecom bills are correct. While there’s certainly a transactional invoice-processing aspect to this, the business seems to hinge mainly on doing calculations to figure out correct charges.
- I was talking with Pervasive about Pervasive DataRush, a beta product that lets you do super-fast analytics on data even if you never load it into a DBMS in the first place. I challenged them for use cases; one user turns out to be an insurance claims rule-checking outsourcer. (A toy illustration of the no-load idea follows this list.)
- One of Infobright’s references is a French CRM analytics outsourcer, 1024 Degres.
- 1010data has built up a client base of 50-60, including a number of financial and retail blue-chippers, with a soup-to-nuts BI/analysis/columnar database stack.
- I haven’t heard much about Verix in a while, but their niche was combining internal sales figures with external point-of-sale/prescription data to assess retail (especially pharma) microtrends.
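The DataRush idea of analyzing data you never load into a DBMS is easy to illustrate. Below is a toy Python sketch that streams an aggregate straight off a flat file; it’s my illustration of the general idea, not DataRush’s actual API, and the file name and layout are hypothetical.

```python
# Toy illustration of DBMS-less analytics: aggregate a flat file
# directly, with no load step. File name and columns are hypothetical.
import csv
from collections import defaultdict

def claims_total_by_rule(path: str) -> dict:
    """Sum claim amounts per rule violation, streaming row by row."""
    totals = defaultdict(float)
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            totals[row["rule_violated"]] += float(row["claim_amount"])
    return dict(totals)

# print(claims_total_by_rule("claims.csv"))
```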
To a first approximation, here’s what I think is going on. Read more
ParAccel pricing
I made a round of queries about data warehouse software or appliance pricing, and am posting the results as I get them. Earlier installments featured Teradata and Netezza. Now ParAccel is up.
ParAccel’s software license fees are actually very simple — $50K per server or $100K per terabyte, whichever is less. (If you’re wondering how the per-TB fee can ever be the smaller one, please recall that ParAccel offers a memory-centric approach to sub-TB databases.)
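That rule is simple enough to state as one line of arithmetic. A minimal sketch, with hypothetical configurations:

```python
# ParAccel license fee: $50K per server or $100K per TB, whichever is less.
def paraccel_license_fee(servers: int, user_data_tb: float) -> int:
    return min(50_000 * servers, round(100_000 * user_data_tb))

# Hypothetical configurations:
print(paraccel_license_fee(servers=8, user_data_tb=2))   # $200K: per-TB fee wins
print(paraccel_license_fee(servers=4, user_data_tb=10))  # $200K: per-server fee wins
```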
Details about how much data fits on a node are hard to come by, as is clarity about maintenance costs. Even so, pricing turns out to be one of the rare subjects on which ParAccel is more forthcoming than most competitors.
Categories: Analytic technologies, Data warehousing, ParAccel, Pricing | 3 Comments |
Yet another data warehouse database and appliance overview
For a recent project, it seemed best to recapitulate my thoughts on the overall data warehouse specialty DBMS and appliance marketplace. While what resulted is highly redundant with what I’ve posted in this blog before, I’m sharing anyway, in case somebody finds this integrated presentation more useful. The original is excerpted to remove confidential parts.
… This is a crowded market, with a lot of subsegments, and blurry, shifting borders among the subsegments.
… Everybody starts out selling consumer marketing and telecom call-detail-record apps. …
Oracle and similar products are optimized for updates above everything else. That is, short rows of data are banged into tables. The main indexing scheme is the “b-tree,” which is optimized for finding specific rows of data as needed, and also for being updated quickly in lockstep with updates to the data itself.
By way of contrast, an analytic DBMS is optimized for some or all of the following (a toy contrast is sketched after the list):
- Small numbers of bulk updates, not large numbers of single-row updates.
- Queries that may involve examining or returning lots of data, rather than finding single records on a pinpoint basis.
- Doing arithmetic calculations (commonly simple arithmetic, sorts, etc.) on the data.
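To make the contrast concrete, here’s a toy Python sketch of the two access patterns: a b-tree-style probe touches one short row, while an analytic query scans and aggregates a whole column. It’s illustrative only, not how any particular DBMS is built.

```python
# Toy contrast: OLTP-style point lookup vs. analytic bulk scan.
from bisect import bisect_left

# "Table" of (key, value) rows, kept sorted so a binary search can
# stand in for a b-tree index probe.
rows = sorted((k, k * 1.5) for k in range(1_000_000))
keys = [k for k, _ in rows]

def point_lookup(key: int) -> float:
    """OLTP pattern: find one specific row via the 'index'."""
    i = bisect_left(keys, key)
    return rows[i][1]

def column_aggregate() -> float:
    """Analytic pattern: touch every value in one column."""
    return sum(v for _, v in rows)

print(point_lookup(123_456))   # touches ~20 entries
print(column_aggregate())      # touches all 1,000,000 values
```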
Database and/or DBMS design techniques that have been applied to analytic uses include: Read more
DATAllegro finally has a blog
It took a lot of patient nagging, but DATAllegro finally has a blog. Based on the first post, I predict:
- DATAllegro’s blog will live up to CEO Stuart Frost’s talent for clear, interesting writing.
- Like a number of other vendor blogs — e.g., Netezza’s — DATAllegro’s will have infrequent but usually long posts.
The crunchiest part of the first post is probably this:
Another very important aspect of performance is ensuring sequential reads under a complex workload. Traditional databases do not do a good job in this area – even though some of the management tools might tell you that they are! What we typically see is that the combination of RAID arrays and intervening storage infrastructure conspires to break even large reads by the database into very small reads against each disk. The end result is that most large DW installations have very large arrays of expensive, high-speed disks behind them – and still suffer from poor performance.
I’ve pounded the table about sequential reads multiple times — including in a (DATAllegro-sponsored) white paper — but the point about misleading management tools is new to me.
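The cost of breaking big reads into small ones is easy to quantify with round numbers. Here’s a back-of-envelope model in Python; the disk parameters are generic assumptions, not measurements of anybody’s hardware.

```python
# Back-of-envelope: why fragmented reads hurt. Generic disk assumptions.
transfer_mb_per_s = 100   # assumed sustained sequential bandwidth per disk
seek_ms = 5               # assumed average seek + rotational delay

def effective_mb_per_s(read_size_mb: float) -> float:
    """Throughput when each read pays one seek before transferring."""
    transfer_s = read_size_mb / transfer_mb_per_s
    return read_size_mb / (seek_ms / 1000 + transfer_s)

for size in (0.064, 1, 64):   # 64 KB, 1 MB, 64 MB reads
    print(f"{size:>6} MB reads -> {effective_mb_per_s(size):6.1f} MB/s")
# 64 KB reads deliver ~11 MB/s; 64 MB reads deliver ~99 MB/s.
```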
Now if I could just get a production DATAllegro reference, I’d be completely happy …
Categories: Analytic technologies, Data warehouse appliances, Data warehousing, DATAllegro | 6 Comments |
Netezza pricing
In connection with the announcement of the Teradata 2500, I asked some Teradata competitors about pricing. Netezza’s response amounted to “We don’t disclose list pricing, but our cheapest system handles about 3 1/4 TB and sells for under $200K.” That works out to a bit over $60K per terabyte of user data, so Netezza’s actual pricing is well below the list price of the Teradata 2500.
Categories: Data warehouse appliances, Data warehousing, Netezza, Pricing, Teradata | 11 Comments |
Teradata introduces lower-cost appliances
After months of leaks, Teradata has unveiled its new lines of data warehouse appliances, raising the total number either from 1 to 3 (my view) or 0 to 2 (what you believe if you think Teradata wasn’t previously an appliance vendor). Most significant is the new Teradata 2500 series, meant to compete directly with the smaller data warehouse specialists. Highlights include:
- An oddly precise estimated capacity of “6.12 terabytes”/node (user data). This estimate is based on 30% compression, which is low by industry standards, and surely explains part of the price umbrella the Teradata 2500 is offering other vendors.
- $125K/TB of user data. Obviously, list pricing and actual pricing aren’t the same thing, and many vendors don’t even bother to disclose official price lists. But the Teradata 2500 seems more expensive than most smaller-vendor alternatives. (A rough cross-vendor comparison follows this list.)
- Scalability up to 24 nodes (>140 TB).
- Full Teradata application-facing functionality. Some of Teradata’s rivals are still working on getting all of their certifications with tier-1 and tier-2 business intelligence tools. Teradata has a rich application ecosystem.
- Performance that will be controversial until customer-benchmark trends clearly emerge.
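Pulling together the numbers from this and the earlier pricing posts, here’s a rough per-terabyte comparison. Remember that these mix list and street pricing, so it’s indicative at best; Netezza’s figure is derived from the sub-$200K, 3 1/4 TB data point above.

```python
# Rough $/TB-of-user-data comparison from the posts above.
# List vs. actual pricing differ, so this is indicative only.
price_per_tb = {
    "Teradata 2500 (list)": 125_000,
    "Vertica (list)": 150_000,
    "ParAccel (list, per-TB option)": 100_000,
    "Netezza (derived: ~$200K / 3.25 TB)": 200_000 / 3.25,
}
for vendor, price in sorted(price_per_tb.items(), key=lambda kv: kv[1]):
    print(f"{vendor:40s} ${price:,.0f}/TB")
```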