Theory and architecture

Analysis of design choices in databases and database management systems. Related subjects include:

May 26, 2009

Teradata Developer Exchange (DevX) begins to emerge

Every vendor needs developer-facing web resources, and Teradata turns out to have been working on a new umbrella site for its own.  It’s called Teradata Developer Exchange, or DevX for short.  Teradata DevX seems to be in a low-volume beta now, with a press release/bigger roll-out coming next week or so.  Major elements are about what one would expect:

If you’re a Teradata user, you absolutely should check out Teradata DevX.  If you just research Teradata — my situation 🙂 — there are some aspects that might be of interest anyway.  In particular, I found Teradata’s downloads instructive, most particularly those in the area of extensibility.  Mainly, these are UDFs (User-Defined Functions), in areas such as:

Also of potential interest is a custom-portlet framework for Teradata’s management tool Viewpoint.  A straightforward use would be to plunk some Viewpoint data into a more general system management dashboard.  A yet cooler use — and I couldn’t get a clear sense of whether anybody’s ever done this yet — would be to offer end users some insight as to how long their queries are apt to run.

May 15, 2009

MySQL forking heats up, but not yet to the benefit of non-GPLed storage engine vendors

Last month, I wrote “This is a REALLY good time to actively strengthen the MySQL forkers,” largely on behalf of closed-source/dual-source MySQL storage engine vendors such as Infobright, Kickfire, Calpont, Tokutek, or ScaleDB. Yesterday, two of my three candidates to lead the effort — namely Monty Widenius/MariaDB/Monty Program AB and Percona — came together to form something called the Open Database Alliance.  Details may be found:

But there’s no joy for the non-GPLed MySQL storage engine vendors in the early news.

May 14, 2009

Facebook’s experiences with compression

One little topic didn’t make it into my long post on Facebook’s Hadoop/Hive-based data warehouse: Compression. The story seems to be:

May 14, 2009

The secret sauce to Clearpace’s compression

In an introduction to archiving vendor Clearpace last December, I noted that Clearpace claimed huge compression successes for its NParchive product (Clearpace likes to use a figure of 40X), but didn’t give much reason that NParchive could compress a lot more effectively than other columnar DBMS. Let me now follow up on that.

To the extent there’s a Clearpace secret sauce, it seems to lie in NParchive’s unusual data access method.  NParchive doesn’t just tokenize the values in individual columns; it tokenizes multi-column fragments of rows.  Which particular columns to group together in that way seems to be decided automagically; the obvious guess is that this is based on estimates of the cardinality of their Cartesian products.
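To make the idea concrete, here is a minimal sketch of multi-column tokenization. It is not Clearpace's actual algorithm, which wasn't disclosed in that detail; it only illustrates the guess above. The helper names (`tokenize`, `combination_ratio`, `best_pair`) and the sample rows are hypothetical. The sketch scores each pair of columns by how many distinct value combinations actually occur, relative to the size of the full Cartesian product of their individual value sets, and then dictionary-encodes the pair that scores lowest.

```python
from itertools import combinations


def tokenize(values):
    """Dictionary-encode a sequence: return (dictionary, list of integer tokens)."""
    dictionary = {}
    tokens = []
    for value in values:
        # Assign the next unused integer to each value the first time it is seen.
        tokens.append(dictionary.setdefault(value, len(dictionary)))
    return dictionary, tokens


def combination_ratio(rows, cols):
    """Distinct observed combinations divided by the Cartesian product size.

    A ratio well below 1.0 means the columns are correlated, so one token
    per multi-column fragment needs a small dictionary.
    """
    observed = len({tuple(row[c] for c in cols) for row in rows})
    cartesian = 1
    for c in cols:
        cartesian *= len({row[c] for row in rows})
    return observed / cartesian


def best_pair(rows, columns):
    """Pick the column pair whose observed combinations are sparsest (assumed heuristic)."""
    return min(combinations(columns, 2), key=lambda pair: combination_ratio(rows, pair))


# Hypothetical sample data: city determines state, status is independent.
rows = [
    {"city": "Boston", "state": "MA", "status": "open"},
    {"city": "Boston", "state": "MA", "status": "closed"},
    {"city": "Austin", "state": "TX", "status": "open"},
    {"city": "Austin", "state": "TX", "status": "closed"},
    {"city": "Dallas", "state": "TX", "status": "open"},
    {"city": "Dallas", "state": "TX", "status": "closed"},
]

pair = best_pair(rows, ["city", "state", "status"])
fragments = [tuple(row[c] for c in pair) for row in rows]
dictionary, tokens = tokenize(fragments)
```

In this toy data the city/state pair wins, because only three of the six possible city/state combinations ever occur. Each row's two values then collapse into a single small token, where plain per-column tokenization would store two.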

Off the top of my head, examples for which this strategy might be particularly successful include:

May 11, 2009

Facebook, Hadoop, and Hive

A few weeks ago, I posted about a conversation I had with Jeff Hammerbacher of Cloudera, in which he discussed a Hadoop-based effort at Facebook he previously directed. Subsequently, Ashish Thusoo and Joydeep Sarma of Facebook contacted me to expand upon and in a couple of instances correct what Jeff had said. They also filled me in on Hive, a data-manipulation add-on to Hadoop that they developed and subsequently open-sourced.

Updating the metrics in my Cloudera post,

Nothing else in my Cloudera post was called out as being wrong.

In a new-to-me metric, Facebook has 610 Hadoop nodes, running in a single cluster, due to be increased to 1000 soon. Facebook thinks this is the second-largest* Hadoop installation, or else close to it. What’s more, Facebook believes it is unusual in spreading all its apps across a single huge cluster, rather than doing different kinds of work on different, smaller sub-clusters.

April 30, 2009

eBay’s two enormous data warehouses

A few weeks ago, I had the chance to visit eBay, meet briefly with Oliver Ratzesberger and his team, and then catch up later with Oliver for dinner. I’ve already alluded to those discussions in a couple of posts, specifically on MapReduce (which eBay doesn’t like) and the astonishingly great difference between high- and low-end disk drives (to which eBay clued me in). Now I’m finally getting around to writing about the core of what we discussed, which is two of the very largest data warehouses in the world.

Metrics on eBay’s main Teradata data warehouse include:

Metrics on eBay’s Greenplum data warehouse (or, if you like, data mart) include:


April 24, 2009

Some DB2 highlights

I chatted with IBM Thursday, about recent and imminent releases of DB2 (9.5 through 9.7). Highlights included:

April 22, 2009

Clearing some of my buffer

I have a large number of posts still in backlog.  For starters, there are ones based on recent visits with Aster, Greenplum, Sybase, Vertica, and a Very Large User.  I suspect I’ll write more soon on Oracle as well.  Plus there’s my whole future-of-online-media area.  And quite a bit more will grow out of planned research.

So there are a whole lot of other worthy subjects I doubt I’ll be getting to any time soon.  In some cases, of course, other people are doing great jobs of writing about same. Here are pointers to a few links that I am glad to recommend:

April 20, 2009

MySQL storage engine round-up, with Oracle-related thoughts

Here’s what I know about MySQL storage engines, more or less.

April 20, 2009

Calpont update — you read it here first!

Calpont has gone through a lot of strategy iterations since its founding. The super-short version is that Calpont originally planned an appliance built around a SQL chip, much like Kickfire. But after various changes in management and venture backing, Calpont turned itself into a software-only analytic DBMS vendor relying on a MySQL front end. Calpont is now at the stage of announcing an Early Adopter program at the MySQL conference on Wednesday, although details of Calpont’s product release timing, pricing, feature set, etc. are all To Be Determined.

Minor highlights of the Calpont technical story include:
