Data warehousing

Analysis of issues in data warehousing, with extensive coverage of database management systems and data warehouse appliances that are optimized to query large volumes of data. Related subjects include:

June 8, 2009

Greenplum blogs about some customers

I’ve written some about Greenplum’s customers at eBay and Fox Interactive Media. But as I recently grumped, I’m not in the mood right now to write much about other Greenplum customers. Fortunately, Greenplum has filled the gap itself. Marketing chief Paul Salazar just blogged about a number of other big Greenplum customers. And last month Paul blogged in considerable detail about what he characterizes as an enterprise data warehouse (EDW) conversion — Oracle replacement — at a large pharmaceutical company.

Categories: Application areas, Data warehousing, Greenplum, Oracle

The future of data marts

Greenplum is announcing today a long-term vision, under the name Enterprise Data Cloud (EDC). Key observations around the concept — mixing mine and Greenplum’s together — include:

Data marts aren’t just for performance (or price/performance). They also exist to give individual analysts or small teams control of their analytic destiny.
Thus, it would be really cool if business users could have their own analytic “sandboxes” — virtual or physical analytic databases that they can manipulate without breaking anything else.
In any case, business users want to analyze data when they want to analyze it. It is often unwise to ask business users to postpone analysis until after an enterprise data model can be extended to fully incorporate the new data they want to look at.
Whether or not you agree with that, it’s an empirical fact that enterprises have many legacy data marts (or even, especially due to M&A, multiple legacy data warehouses). Similarly, it’s an empirical fact that many business users have the clout to order up new data marts as well.
Consolidating data marts onto one common technological platform has important benefits.

In essence, Greenplum is pitching the story:

Thesis: Enterprise Data Warehouses (EDWs)
Antithesis: Data Warehouse Appliances
Synthesis: Greenplum’s Enterprise Data Cloud vision

When put that starkly, it’s overstated, not least because

Specialized Analytic DBMS != Data Warehouse Appliance

But basically it makes sense, for two main reasons:

Analysis is performed on all sorts of novel data, from sources far beyond an enterprise’s core transactions. This data neither has to fit nor particularly benefits from being tightly fitted into the core enterprise data model. Requiring it to do so is just an unnecessary and painful bureaucratic delay.
On the other hand, consolidation can be a good idea even when systems don’t particularly interoperate. Data marts, which commonly do in part interoperate with central data stores, have all the more reason to be consolidated onto a central technology platform/stack.

Categories: Analytic technologies, Data warehouse appliances, Data warehousing, DATAllegro, EAI, EII, ETL, ELT, ETLT, eBay, Greenplum, Microsoft and SQL*Server, Parallelization, Specific users, Teradata

30 Comments

June 8, 2009

More on Fox Interactive Media’s use of Greenplum

Greenplum’s most important reference is probably its energetic advocate Fox Interactive Media, even ahead of much larger Greenplum user eBay, and notwithstanding Aster Data’s large presence in Fox subsidiary MySpace. I just ran across a “review” of Greenplum by FIM’s Brian Dolan, neatly summarizing his views about Greenplum’s strengths, weaknesses, and uses inside Fox. Highlights include: Read more

Categories: Data warehousing, Fox and MySpace, Greenplum, Web analytics

2 Comments

June 7, 2009

Merv Adrian on SAND Technology

Merv Adrian blogged about SAND Technology, casting significant doubt on SAND’s business prospects. At this point, I can’t say I disagree. On the other hand, SAND does have public, audited financial statements showing it generating more revenue than a lot of other analytic DBMS or archiving vendors probably make. Columnar DBMS vendors doing better than SAND are Sybase, Vertica, maybe Infobright — and who else?

Categories: Archiving and information preservation, Columnar database management, Data warehousing, SAND Technology

1 Comment

June 7, 2009

Daniel Abadi on Kickfire and related subjects

Daniel Abadi has a new blog, whose first post centers around Kickfire. The money quote is (emphasis mine):

In order for me to get excited about Kickfire, I have to ignore Mike Stonebraker’s voice in my head telling me that DBMS hardware companies have been launched many times in the past and ALWAYS fail (the main reasoning is that Moore’s law allows for commodity hardware to catch up in performance, eventually making the proprietary hardware overpriced and irrelevant). But given that Moore’s law is transforming into increased parallelism rather than increased raw speed, maybe hardware DBMS companies can succeed now where they have failed in the past.

Good point.

More generally, Abadi speculates about the market for MySQL-compatible data warehousing. My responses include:

OF COURSE there are many MySQL users who need to move to a serious analytic DBMS.
What’s less clear is whether there’s any big advantage to those users in remaining MySQL-compatible when they do move. I’m not sure what MySQL-specific syntax or optimizations they’d have that would be difficult to port to a non-MySQL system.
It’s nice to see Abadi speaking well of Infobright and its technology.
To say that Infobright went open source because it was “desperate” is overstated. That said, I don’t think Infobright was on track to prosper without going open source.
While open source and MySQL go together, an appliance like Kickfire loses many (not all) of the benefits of open source.
Calpont has indeed never disclosed a customer win. Any year now … (Just kidding, Vogel!)
In general, seeing Abadi be so favorable toward Vertica competitors adds credibility to the recent Hadoop vs. DBMS paper.

Anyhow, as previously noted, I’m a big Daniel Abadi fan. I look forward to seeing what else he posts in his blog, and am optimistic he’ll live up to or exceed its stated goals.

Categories: Calpont, Columnar database management, Data warehouse appliances, Data warehousing, DBMS product categories, Infobright, Kickfire, MySQL, Open source, Theory and architecture

2 Comments

June 5, 2009

Greenplum update — Release 3.3 and so on

I visited Greenplum in early April, and talked with them again last night. As I noted in a separate post, there are a couple of subjects I won’t write about today. But that still leaves me free to cover a number of other points about Greenplum, including: Read more

Categories: Data warehousing, Database compression, EAI, EII, ETL, ELT, ETLT, Greenplum, MapReduce, Market share and customer counts, Parallelization, PostgreSQL, Pricing

11 Comments

June 5, 2009

Greenplum will be announcing some stuff

Greenplum is having a webinar Monday to announce “The Next Big Leap in Data Warehousing” (capitalization theirs). The idea they’ll be talking about is a genuinely good one. And off the top of my head I can only think of a few vendors who implemented it before Greenplum, and even fewer who emphasize it explicitly. So if you like webinars, you might want to listen in. I plan to blog about the general concept soon after the 12:01 am Monday embargo lifts. (Uh, guys, it is Monday rather than Tuesday, right?) Read more

Categories: Data warehousing, Greenplum, Specific users

1 Comment

May 21, 2009

How big are the intelligence agencies’ data warehouses?

Edit: The relevant part of the article cited has now been substantially changed, in line with Jeff Jonas’ remarks in the comment thread below.

Joe Harris linked me to an article that made a rather extraordinary claim:

At another federal agency Jonas worked at (he wouldn’t say which), they had a very large data warehouse in the basement. The size of the data warehouse was a secret, but Jonas estimated it at 4 exabytes (EB), and increasing at the rate of 5 TB per day.

Now, if one does the division, the quote claims it takes 800,000 days for the database to double in size, which is absurd. Perhaps this (Jeff) Jonas guy was just talking about a 4 petabyte system and got confused. (Of course, that would still be pretty big.) But before I got my arithmetic straight, I ran the 4 exabyte figure past a couple of folks, as a target for the size of the US government’s largest classified database. Best guess turns out to be that it’s 1-2 orders of magnitude too high for the government’s largest database, not 3. But that’s only a guess …
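For anyone who wants to redo the division, here is a minimal Python sketch. It assumes decimal units (1 EB = 1,000,000 TB, 1 PB = 1,000 TB) and a constant growth rate of 5 TB per day; the function name days_to_double is just an illustrative label, not anything from the article.

```python
# Decimal storage units expressed in terabytes.
# Assumption: decimal rather than binary units; binary units change the result only slightly.
TB_PER_PB = 1_000
TB_PER_EB = 1_000_000


def days_to_double(size_tb: float, growth_tb_per_day: float) -> float:
    """Days needed to add as much data as is already stored, at a constant daily rate."""
    return size_tb / growth_tb_per_day


# The figure as quoted: 4 EB growing at 5 TB/day.
exabyte_case = days_to_double(4 * TB_PER_EB, 5)

# The "perhaps he meant petabytes" reading: 4 PB growing at 5 TB/day.
petabyte_case = days_to_double(4 * TB_PER_PB, 5)

print(f"4 EB: {exabyte_case:,.0f} days (about {exabyte_case / 365:,.0f} years)")
print(f"4 PB: {petabyte_case:,.0f} days (about {petabyte_case / 365:.1f} years)")
```

At 4 EB the doubling time comes out to 800,000 days, or roughly 2,200 years. At 4 PB it is 800 days, a bit over two years, which is at least a plausible growth profile for a large warehouse.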

Categories: Data warehousing, Specific users

5 Comments

May 14, 2009

Facebook’s experiences with compression

One little topic didn’t make it into my long post on Facebook’s Hadoop/Hive-based data warehouse: Compression. The story seems to be:

Facebook uses gzip, and gets a little bit more than 6X compression.
Experiments suggest bzip2 would reduce data by another 20% or so, increasing compression to the 7.5X range.
The downside of bzip2 is 15-25% processing overhead, depending on the kind of data.
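The jump from 6X to roughly 7.5X follows from treating "another 20%" as a reduction in the size of the already-compressed data. A minimal sketch under that reading (the helper name and the rounded inputs are my assumptions, not Facebook's numbers to any greater precision):

```python
def improved_ratio(current_ratio: float, extra_reduction: float) -> float:
    """Compression ratio after shrinking the already-compressed output by a further fraction.

    If raw data compresses to 1/current_ratio of its size, and the compressed
    output then shrinks by extra_reduction (e.g. 0.20), the new size is
    (1 - extra_reduction) / current_ratio of the raw size.
    """
    return current_ratio / (1.0 - extra_reduction)


gzip_ratio = 6.0  # "a little bit more than 6X", rounded down (assumption)
bzip2_ratio = improved_ratio(gzip_ratio, 0.20)  # "another 20% or so"

print(f"gzip: {gzip_ratio:.1f}X, bzip2: {bzip2_ratio:.1f}X")
```

Under these rounded inputs the result is 7.5X, matching the range quoted above. The 15-25% processing overhead is the price paid for that extra storage saving.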

Categories: Data warehousing, Database compression, Facebook, Hadoop

2 Comments

May 12, 2009

How much state is saved when an MPP DBMS node fails?

Mark Callaghan raised an interesting question in the comment thread to my recent Facebook/Hadoop/Hive post:

My question is about how commercial MPP RDBMS vendors recover from single or a small number of node failures during a long running SQL query. Do any of them save enough state to avoid starting the query over?

Honestly, I’d just be guessing at the answer.

Would any vendors or other knowledgeable folks care to take a crack at answering directly?

Categories: Data warehousing, Parallelization

10 Comments

