Data warehousing
Analysis of issues in data warehousing, with extensive coverage of database management systems and data warehouse appliances that are optimized to query large volumes of data.
Odd article on Sybase IQ and columnar systems
Intelligent Enterprise has an article on Sybase IQ and columnar systems that leaves me shaking my head. E.g., it ends by saying Netezza has a columnar architecture (uh, no). It also quotes an IBM exec as saying only 10-20% of what matters in a data warehouse DBMS is performance (already an odd claim), and then has him saying columnar only provides a 10% performance gain (let’s be generous and hope that’s a misquote).
Also from the article — and this part seems more credible — is:
“Sybase IQ revenues were up 70% last year,” said Richard Pledereder, VP of engineering. … Sybase now claims 1,200 Sybase IQ customers. It runs large data warehouses powered by big, multiprocessor servers. Priced at $45,000 per CPU, those IQ customers now account for a significant share of Sybase’s revenues, although the company won’t break down revenues by market segment.
Categories: Analytic technologies, Columnar database management, Data warehousing, Pricing, Specific users, Sybase | 5 Comments |
Introduction to Exasol
I had a non-technical introduction today to Exasol, a data warehouse specialist that has gotten a little buzz recently for publishing TPC-H results even faster than ParAccel’s. Here are some highlights:
- Exasol was founded back in 2000.
- Exasol is a German company, with 60 employees. While I didn’t ask, the vast majority are surely German.
- Exasol has two customers. 6-8 more are Coming Real Soon. Most or all of those are in Germany, although one may be in Asia.
- Karstadt (big German retailer) has had Exasol deployed for 3 years. The other deployed customer is the German subsidiary of data provider IMS Health.
- [Redacted for confidentiality] is a strategic investor in and partner of Exasol. [Redacted for confidentiality]’s only competing partnership is with Oracle.
- Exasol’s system is more completely written from scratch than many. E.g., all they use from Linux are some drivers, and maybe a microkernel.
- Exasol runs in-memory. There doesn’t seem to be a disk-centric mode.
- Exasol’s data access methods are sort of like columnar, but not exactly. I look forward to a more technical discussion to sort that out.
- Exasol’s claimed typical compression is 5-7X. As in the Vertica story, database operations are carried out on compressed data (see the sketch after this list).
- Exasol says it has performed a very fast TPC-H in-house at the 30 terabyte level. However, its deployed sites are probably a lot smaller than that. IMS Health is cited in its literature as 145 gigabytes.
- Oracle and Microsoft are listed as Exasol partners, so there may be some kind of plug-compatibility or back-end processing story.
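Operating on compressed data is easiest to picture with run-length encoding. Below is a minimal Python sketch of the general idea (my illustration, not Exasol's or Vertica's actual implementation; the column and values are made up), showing how a filter can be answered from (value, run length) pairs without decompressing each row.

```python
# Toy illustration of operating on compressed (run-length-encoded) data.
# NOT Exasol's implementation; it just shows why such operations can be fast.

from itertools import groupby

def rle_encode(values):
    """Compress a column into (value, run_length) pairs."""
    return [(v, len(list(run))) for v, run in groupby(values)]

def count_where_equal(rle_column, target):
    """Count matching rows without decompressing each individual row."""
    return sum(length for value, length in rle_column if value == target)

column = ["DE"] * 500_000 + ["US"] * 300_000 + ["DE"] * 200_000
compressed = rle_encode(column)             # 3 pairs instead of 1,000,000 values
print(count_where_equal(compressed, "DE"))  # 700000, touching only 3 entries
```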
Categories: Analytic technologies, Data warehousing, Exasol, Specific users | Leave a Comment |
The biggest eBay database
There’s been some confusion over my post about eBay’s multiple petabytes of data. So to clarify, let me say:
- eBay’s figure of >1.4 petabytes of data — for its largest single analytic database — counts disks or something, not raw user data.
- I previously published a strong conjecture that the database vendor in question was Teradata, which is definitely an eBay supplier. In any case, it is definitely not an Oracle data warehouse.
- While eBay isn’t saying who it is either — not even off-the-record — the 50%ish compression figures they experience just happen to map well to Teradata’s usual range.
- Edit: Just to be clear — not that there was any doubt, but I have reconfirmed that eBay is a Teradata user, in or including eBay’s Paypal division.
Categories: Analytic technologies, Data warehouse appliances, Data warehousing, eBay, Specific users, Teradata | 1 Comment |
Mike Stonebraker may be oversimplifying data warehousing just a tad
Mike Stonebraker has now responded to the second post in my five-part database diversity series. Takeaways and rejoinders include: Read more
Categories: Analytic technologies, Columnar database management, Data warehousing, Database diversity, Michael Stonebraker, Theory and architecture, Vertica Systems | 2 Comments |
Kalido — CASE for complex data warehouses
Kalido briefed me last week, under pre-TDWI embargo. To a first approximation, their story is confusingly buzzword-laden, as is evident from their product names. The Kalido suite is called the Kalido Information Engine, and it comprises:
- Kalido Business Information Modeler (the newest part)
- Kalido Dynamic Information Warehouse
- Kalido Universal Information Director
- Kalido Master Data Management
But those mouthfuls aside, Kalido has some pretty interesting things to say about data warehouse schema complexity and change.
Categories: Data integration and middleware, Data models and architecture, Data warehousing, EAI, EII, ETL, ELT, ETLT, Kalido, Theory and architecture | 1 Comment |
ParAccel technical highlights
I recently caught up with ParAccel’s CTO Barry Zane and Marketing VP Kim Stanick for a long technical discussion, which they have graciously continued by email. It would be impolitic in the extreme to comment on what led up to that. Let’s just note that many things I’ve previously written about ParAccel are now inoperative, and go straight to the highlights.
Categories: Columnar database management, Data warehousing, Emulation, transparency, portability, Microsoft and SQL*Server, ParAccel | 5 Comments |
Database management system choices – relational data warehouse
This is the third of a five-part series on database management system choices. For the first post in the series, please click here.
High-end OLTP relational database management system vendors try to offer one-stop shopping for almost all data management needs. But as I noted in my prior post, their product category is facing two major competitive threats. One comes from specialty data warehouse database management system products. I’ve covered those extensively in this blog, with key takeaways including:
- Specialty data warehouse products offer huge cost advantages versus less targeted DBMS. This applies to purchase/maintenance and administrative costs alike. And it’s true even when the general-purpose DBMS boast data warehousing features such as star indexes, bitmap indexes, or sophisticated optimizers.
- The larger the database, the bigger the difference. It’s almost inconceivable to use Oracle for a 100+ terabyte data warehouse. But if you only have 5 terabytes, Oracle is a perfectly viable – albeit annoying and costly – alternative.
- Most specialty data warehouse products have a shared-nothing architecture. Smaller parts are cheaper per unit of capacity. Hence shared-nothing/grid architectures are inherently cheaper, at least in theory. In data warehousing, that theoretical possibility has long been made practical (see the sketch after this list).
- Specialty data warehouse products with row-based architectures are commonly sold in appliance formats. In particular, this is true of Teradata, Netezza, DATAllegro, and Greenplum. One reason is that they’re optimized to stream data off of disk fairly sequentially, as opposed to relying on random seeks.
- Specialty data warehouse products with columnar architectures are commonly available in software-only formats. Even so, Vertica and ParAccel also boast appliance deals, with HP and Sun respectively.
- There is tremendous technical diversity and differentiation in the specialty data warehouse system market.
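To make the shared-nothing point concrete, here is a toy Python sketch (my illustration, not any particular vendor's code) of hash-partitioning rows across nodes so that each node aggregates only the data it owns, followed by a cheap final merge of the partial results.

```python
# Toy illustration of the shared-nothing idea: hash-partition rows across "nodes",
# let each node aggregate its own partition independently, then merge the partials.
# My sketch only -- not any particular vendor's implementation.

NUM_NODES = 4

def node_for(key):
    return hash(key) % NUM_NODES

def partition(rows):
    parts = [[] for _ in range(NUM_NODES)]
    for row in rows:
        parts[node_for(row["customer_id"])].append(row)
    return parts

def local_sum(rows_on_node):
    # Each node works only on data it owns -- no shared disk or memory needed.
    return sum(row["amount"] for row in rows_on_node)

rows = [{"customer_id": i % 1000, "amount": i % 7} for i in range(100_000)]
partials = [local_sum(p) for p in partition(rows)]  # would run in parallel, one per node
print(sum(partials))                                # cheap final merge on a coordinator
```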
Let me expand on that last point. Different features may or may not be important to you, depending on whether your precise application needs include: Read more
Categories: Analytic technologies, Data warehouse appliances, Data warehousing, Database diversity, Theory and architecture | 20 Comments |
Load speeds and related issues in columnar DBMS
Please do not rely on the parts of the post below that are about ParAccel. See our February 18 post about ParAccel instead.
I’ve already posted about a chat I had with Mike Stonebraker regarding Vertica yesterday. I naturally raised the subject of load speed, unaware that Mike’s colleague Stan Zdonik had posted at length about load speed the day before. Given that post, it seems timely to go into a bit more detail, and in particular to address three questions:
- Can columnar DBMS do operational BI?
- Can columnar DBMS do ELT (Extract-Load-Transform, as opposed to ETL; the pattern is sketched after this list)?
- Are columnar DBMS’ load speeds a problem other than in issues #1 and #2?
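For readers less familiar with the ELT variant: the idea is to load raw data unchanged and then let the warehouse DBMS itself do the transformation in SQL. Here is a minimal Python sketch; the table and column names are hypothetical, and SQLite stands in for the warehouse only to keep the example self-contained.

```python
# Minimal ELT sketch: Extract, Load the raw data as-is, then Transform inside the DBMS.
# SQLite and the table/column names are hypothetical stand-ins for a real warehouse.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staging_orders (order_id INTEGER, amount_cents INTEGER, country TEXT)")
conn.execute("CREATE TABLE fact_orders (order_id INTEGER, amount_usd REAL, country TEXT)")

# Load: bulk-insert the raw extract with no transformation at all.
raw = [(1, 1999, "de"), (2, 550, "us"), (3, 120000, "de")]
conn.executemany("INSERT INTO staging_orders VALUES (?, ?, ?)", raw)

# Transform: done by the database engine itself, after the load.
conn.execute("""
    INSERT INTO fact_orders
    SELECT order_id, amount_cents / 100.0, UPPER(country)
    FROM staging_orders
""")
print(conn.execute("SELECT * FROM fact_orders").fetchall())
```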
Vertica update
I chatted with Andy Ellicott and Mike Stonebraker of Vertica today. Some of the content is embargoed until February 19 (for TDWI), but here are some highlights of the rest.
- Vertica now is “approaching” 50 paid customers, up from 15 or so in early November. (Compared to most of Vertica’s fellow data warehouse specialists, that’s a lot.) Many — perhaps most — of these customers are hedge funds or telcos.
- Vertica’s typical lag from sale to deployment is about one quarter.
- Vertica’s typical initial selling price is $250K. Or maybe it’s $100-150K. The Vertica guys are generally pretty forthcoming, but pricing is an exception. Whatever they charge, it’s strictly per terabyte of user data. They think they are competitive with other software vendors, and cheaper, all-in, than appliance vendors.
- One subject on which they’re totally non-forthcoming (lawyers’ orders) is the recent patent lawsuit filed by Sybase. They wouldn’t even say whether they thought it was bogus because they didn’t infringe, or whether they thought it was bogus because the patent shouldn’t have been granted.
- Average Vertica database size is a little under 10 terabytes of user data, with many examples in the 15-20 TB range. Lots of customers plan to expand to 50-100 TB.
- Vertica claims sustainable load speeds of 3-5 megabytes/sec/node, irrespective of database size. Data is sucked into RAM uncompressed, then written out a gig/node at a time, compressed. Gigabyte chunks are then merged on disk; the merge is super fast (30 megabytes/second) because it doesn’t involve re-sorting. Mike insists this doesn’t compromise compression.
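As I understand that description, the pattern is roughly: buffer incoming rows in memory, spill them as sorted, compressed chunks, then merge the already-sorted chunks without a full re-sort. Here is a minimal Python sketch of that general pattern; it is my illustration only, not Vertica’s actual code, and the chunk size and compression scheme are placeholders.

```python
# Sketch of a buffer-then-merge load path: accumulate rows in RAM, spill sorted
# compressed chunks, then merge the sorted chunks without re-sorting everything.
# Illustration of the general pattern only -- not Vertica's actual implementation.

import heapq, zlib, pickle

CHUNK_ROWS = 1_000  # stand-in for "about a gigabyte per node"

def spill(buffer):
    """Sort one in-memory buffer and write it out compressed."""
    buffer.sort()
    return zlib.compress(pickle.dumps(buffer))

def load(rows):
    chunks, buffer = [], []
    for row in rows:
        buffer.append(row)
        if len(buffer) >= CHUNK_ROWS:
            chunks.append(spill(buffer))
            buffer = []
    if buffer:
        chunks.append(spill(buffer))
    return chunks

def merge(chunks):
    """Merge already-sorted chunks; no per-row sorting is needed at this stage."""
    runs = (pickle.loads(zlib.decompress(c)) for c in chunks)
    return list(heapq.merge(*runs))

merged = merge(load([(i * 7919) % 10_000 for i in range(5_000)]))
assert merged == sorted(merged)
```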
We also addressed the subject of Vertica’s schema assumptions, but I’ll leave that to another post.
Categories: Analytic technologies, Data warehousing, Database compression, Investment research and trading, Michael Stonebraker, Sybase, Theory and architecture, Vertica Systems | 6 Comments |
Kognitio WX2 overview
I had a call today with Kognitio execs Paul Groom and John Thompson. Hopefully I can now clear up some confusion that was created in this comment thread. (Most of what I wrote about Kognitio in October, 2006 still applies.) Here are some highlights. Read more