Parallelization

Analysis of issues in parallel computing, especially parallelized database management. Related subjects include:

November 7, 2008

Big scientific databases need to be stored somehow

A year ago, Mike Stonebraker observed that conventional DBMS don’t necessarily do a great job on scientific data, and further pointed out that different kinds of science might call for different data access methods. Even so, some of the largest databases around are scientific ones, and they have to be managed somehow. For example:

Long-term, I imagine that the most suitable DBMS for these purposes will be MPP systems with strong datatype extensibility — e.g., DB2, PostgreSQL-based Greenplum, PostgreSQL-based Aster nCluster, or maybe Oracle.

October 22, 2008

Update on Aster Data Systems and nCluster

On my West Coast swing last week, I spent a few hours at Aster Data, which has now officially put out Version 3 of nCluster. Highlights included: Read more

October 17, 2008

Oracle notes

I spent about six hours at Oracle today — talking with Andy Mendelsohn, Ray Roccaforte, Juan Loaiza, Cetin Ozbutun, et al. — and plan to write more later. For now, let me pass along a few quick comments. Read more

October 15, 2008

eBay doesn’t love MapReduce

The first time I ever heard from Oliver Ratzesberger of eBay, the subject line of his email mentioned MapReduce. That was early this year. Subsequently, however, eBay seems to have become a MapReduce non-fan. The reason is simple: eBay’s parallel efficiency tests show that MapReduce leaves most processors idle most of the time. The specific figure they mentioned was parallel efficiency of 18%.
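To make the 18% figure concrete, here is a minimal sketch of how parallel efficiency is conventionally computed. It assumes the standard textbook definition (speedup divided by processor count); the post doesn't say exactly how eBay measured it, and the timings and node count below are hypothetical, chosen only to reproduce an 18% result.

```python
def parallel_efficiency(serial_seconds, parallel_seconds, processors):
    """Textbook definition (an assumption here): speedup / processor count."""
    speedup = serial_seconds / parallel_seconds
    return speedup / processors


# Hypothetical numbers, not eBay's: a job that takes 1800 s on one processor
# and 100 s on 100 processors is only 18x faster despite 100x the hardware.
efficiency = parallel_efficiency(
    serial_seconds=1800.0, parallel_seconds=100.0, processors=100
)
idle_fraction = 1.0 - efficiency

print(f"efficiency={efficiency:.0%}, idle={idle_fraction:.0%}")
```

Under that definition, 18% efficiency means that on average roughly four-fifths of the available processor capacity contributes nothing to the result, which is consistent with the "idle most of the time" reading.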

September 28, 2008

Exadata and Oracle Database Machine parallelization clarified

Some kind Oracle development managers have reached out and helped me better understand where Oracle does or doesn’t stand in query and analytic parallelization. This post supersedes prior discussions of the subject over the past week. Read more

September 25, 2008

So what’s Oracle’s MPP-aware optimizer and query execution plan story?

Edit: Answers to the title question have now shown up, and so the post below is now superseded by this one.

In most respects — including most data warehousing respects — Oracle’s query optimizer is the most sophisticated on the planet (even ahead of IBM’s, I’d say). But in all the Exadata discussion — and also in a good, comprehensive review of Oracle’s data warehouse technology — I haven’t seen any claims that Oracle has tackled the hard problems of parallel analytics.

Yes, Oracle is now getting data off of multiple disks onto multiple processors at once, without SAN bottlenecks, and doing some local filtering. That’s the heart of the Exadata storage story, and it’s indeed a huge advance over Oracle’s prior technology. But what happens to the data after that? It’s sent over to a RAC cluster. And unless I’m terribly mistaken, any further processing will be done on just a single node in that cluster.

September 24, 2008

Exadata: Oracle finally answers the data warehouse challengers

Oracle, in partnership with HP, has announced a new data warehouse appliance product line, cleverly branded “Exadata.” The basic idea seems to be that database processing is split among two sets of servers:

Numbers are being thrown around suggesting that, unlike prior Oracle offerings, the Oracle Exadata-based appliance at least has scalability and price/performance worth comparing to Teradata — hey, Exa is bigger than Tera! — Netezza, et al.

Kevin Closson, who evidently worked on the project, offers the most useful and detailed description of Oracle Exadata I’ve seen so far. In particular, he and Oracle seem to claim: Read more

September 6, 2008

SANs vs. DAS in MPP data warehousing

Generally speaking:

But if you think about it, those facts don’t exactly add up. Read more

September 5, 2008

Dividing the data warehousing work among MPP nodes

I talk with lots of vendors of MPP data warehouse DBMS. I’ve now heard enough different approaches to MPP architecture that I think it might be interesting to contrast some of the alternatives.

Read more

September 5, 2008

More on known MapReduce application areas

In surveying MapReduce applications to date, I said that they fell mainly into three overlapping categories:

and really should have included a fourth:

Nokia just released another MapReduce implementation, Disco, and its list of applications to date fits right into that template. The relevant quote is:

“This far Disco has been succesfully used, for instance, in parsing and reformatting data, data clustering, probabilistic modelling, data mining, full-text indexing, and log analysis with hundreds of gigabytes of real-world data.”
