January 28, 2011

Schooner — flash-based, now software-only, and very fast

Last October I wrote about Schooner Information Technology, which made flash-based appliances for MySQL, memcached, or persistent memcached. Schooner sold those appliances to close to 20 customers, but even so decided that software-only was the better way to go.

Schooner’s core value proposition is that one Schooner box with flash does the job of a lot of MySQL or NoSQL boxes with hard drives. Highlights of the Schooner story (more detail is on the Schooner website) now include:

January 25, 2011

ScaleBase, another MPP OLTP quasi-DBMS

Liran Zelkha of ScaleBase raised his hand on Twitter. It turns out ScaleBase has a story rather similar to that of CodeFutures/dbShards. That is:

Our talk didn’t get deeply technical, and I don’t know exactly how ScaleBase’s replication works. But ScaleBase’s website refers to a small transaction log kept in a distributed cache, which sounds at least directionally similar to the dbShards approach, if not identical to it.

ScaleBase is a year or so old, with about 6 people, based in the Boston area despite strong Israeli roots. ScaleBase has raised a round of venture capital; I didn’t ask for details.

Liran says that ScaleBase is in closed beta, with some production users, at least one of whom has over 100 database servers.

January 25, 2011

dbShards update

I talked yesterday with Cory Isaacson of CodeFutures, and hence can follow up on my previous post about dbShards. dbShards basics include:

One dbShards customer writes half a billion rows on a busy day and serves 3,000-4,000 pages per second, naturally with multiple queries per page. This is on a 32-node cluster with uninspiring hardware, in the cloud. The database has 16 shards, which together aggregate 128 virtual shards. I forgot to ask how big the database actually is. Overall, dbShards is up to a dozen or so signed customers, half of whom are in production or soon will be.
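The virtual-shard arrangement can be illustrated with a minimal sketch. This is not dbShards’ actual routing code, which the post does not describe; it assumes a hash-based mapping from each shard key to one of 128 virtual shards, plus a lookup table that assigns 8 virtual shards to each of 16 physical shards. All function names are hypothetical.

```python
import hashlib

# Assumed parameters, taken from the customer example above:
# 128 virtual shards aggregated onto 16 physical shards.
NUM_VIRTUAL_SHARDS = 128
NUM_PHYSICAL_SHARDS = 16


def virtual_shard(key: str) -> int:
    """Hash a shard key to a virtual shard number in [0, NUM_VIRTUAL_SHARDS)."""
    # A stable hash (unlike Python's salted built-in hash()) so every client agrees.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_VIRTUAL_SHARDS


def build_shard_map(num_virtual: int, num_physical: int) -> dict:
    """Assign virtual shards to physical shards in equal contiguous blocks."""
    if num_virtual % num_physical != 0:
        raise ValueError("virtual shards must divide evenly across physical shards")
    per_physical = num_virtual // num_physical
    return {v: v // per_physical for v in range(num_virtual)}


# The routing table: virtual shard number -> physical shard number.
SHARD_MAP = build_shard_map(NUM_VIRTUAL_SHARDS, NUM_PHYSICAL_SHARDS)


def physical_shard(key: str) -> int:
    """Route a key in two steps: key -> virtual shard -> physical shard."""
    return SHARD_MAP[virtual_shard(key)]
```

The apparent appeal of the indirection is that rebalancing can be done by moving whole virtual shards and updating the map, rather than rehashing every key. As a rough scale check, half a billion rows in a day averages out to about 5,800 row writes per second, or roughly 360 per physical shard if spread evenly; busy-hour peaks would be higher.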

dbShards’ replication scheme works like this:

January 24, 2011

Do we still need EDWs?

Colin White reopened the question of whether enterprise data warehouses (EDW) are still needed, lining up and knocking down a number of traditional pro-EDW arguments, in more detail than I ever have. So this feels like a good time to revisit my answer to the question of the EDW’s role, whose money quote was:

At conventional enterprises … Manage some of your data to enterprise data warehouse standards, but not all of it. Specifically, your highest-value data should be in something that looks like a classic enterprise data warehouse, and your lower-value data shouldn’t.

For sufficiently small enterprises, the “something that looks like a classic enterprise data warehouse” might just be your One Central Database, combining OLTP (OnLine Transaction Processing) and analytics. Otherwise, the chances are high that you’re going to want to copy your data crown jewels to an EDW, even if they’re also being used as analytic inputs directly from the OLTP systems that first capture them.

As I’ve recently reviewed, there are huge amounts of specialized technology for SQL queries and other analytics. Classical EDW vendors may not be the best or lowest-cost providers of such technology. And even when the EDW is technically competitive, the bureaucratic processes around it can impede rapid adoption of important analytic tools. So Colin is directionally right, in that most large enterprises should be taking the EDW concept less seriously than they currently do. But core EDW technology and business attitudes shouldn’t be entirely discarded either.

January 24, 2011

Choices in analytic computing system design

When I posted a long list of architectural options for analytic DBMS, I left in a couple of IOUs for missing parts. One was in the area of what is sometimes called advanced-analytics functionality, which, roughly speaking, means aspects of analytic database management systems that are not directly related to conventional* SQL queries.

*Main examples of “conventional” = filtering, simple aggregations.

The point of such functionality is generally twofold. First, it helps you execute analytic algorithms with high performance, by reducing data movement and/or executing the analytics in parallel. Second, it helps you create and execute sophisticated analytic processes with (relatively) little effort.
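To make the first point concrete, here is a minimal sketch of the generic technique, not any vendor’s API: each partition (standing in for a node of an MPP system) computes a small partial aggregate locally, and only those partials, rather than the raw rows, are shipped to a coordinator and merged. It uses Welford’s algorithm within each partition and the pairwise merge of Chan et al. to compute a mean and variance; the function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Partial:
    """Per-partition state: row count, running mean, sum of squared deviations."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0


def local_aggregate(values):
    """Welford's algorithm over one partition, run where the data lives."""
    count, mean, m2 = 0, 0.0, 0.0
    for x in values:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)
    return Partial(count, mean, m2)


def merge(a, b):
    """Combine two partials (Chan et al. pairwise update).

    Associative and commutative up to floating-point rounding, so partials
    can be merged in any order or in a tree.
    """
    if a.count == 0:
        return b
    if b.count == 0:
        return a
    n = a.count + b.count
    delta = b.mean - a.mean
    mean = a.mean + delta * b.count / n
    m2 = a.m2 + b.m2 + delta * delta * a.count * b.count / n
    return Partial(n, mean, m2)


def parallel_mean_variance(partitions):
    """Return (count, mean, population variance) across all partitions.

    Threads stand in for MPP nodes; in CPython they illustrate the data flow,
    not a real speedup.
    """
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(local_aggregate, partitions))
    total = reduce(merge, partials, Partial())
    if total.count == 0:
        raise ValueError("no rows")
    return total.count, total.mean, total.m2 / total.count
```

The same pattern, local partial state plus an associative merge, is what lets a statistic more involved than a simple SUM run in parallel while moving only a few numbers per node.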

For now, I’m going to refer to an analytic RDBMS that has been extended by advanced-analytics functionality as an analytic computing system, rather than as some kind of “platform,” although I suspect the latter term is more likely to wind up winning. So far, there have been five major categories of subsystems or add-on modules that contribute to making an analytic DBMS a more fully-fledged analytic computing system:


January 22, 2011

Mega-trends driving data warehousing and business intelligence

Philip Russom opines (emphasis mine):

What’s driving change in data warehousing (DW) and business intelligence (BI)? There are obvious scalability issues, due to burgeoning data, reports, and user communities. Plus, end-users need more real-time and on-demand BI. For many organizations, integrating existing systems into DW/BI is a higher priority than putting in new ones. And the “do more with less” economy demands more BI at lower costs. Hence, most drivers of change in BI and DW concern four Mega-Trends: size, speed, interoperability, and economics.

Depending on which universe of enterprises and vendors you’re looking at, Philip’s claim of “most” may be technically true. But from where I sit, Philip omitted two other crucial trends: new kinds of data and increased analytic sophistication.

A year ago, I divided data into three kinds:

Most organizations on the planet could benefit from better understanding or exploiting their human-generated tabular data. But even so, many of the best opportunities to add analytic value come from capturing and analyzing fundamentally newer kinds of information.

I further would suggest that analytic sophistication is going up, for at least two reasons:

Some of the best examples of these trends, especially the second one, may be found in what I recently called analytic profiling.

January 20, 2011

Notes, links, and comments January 20, 2011

I haven’t done a pure notes/links/comments post for a while. Let’s fix that now. (A bunch of saved-up links, however, did find their way into my recent privacy threats overview.)

First and foremost, the fourth annual New England Database Summit (née “Day”) is next week, on Friday, January 28. As per my posts in previous years, I think well of the event, which has a friendly, gathering-of-the-clan flavor. Registration is free, but the organizers would prefer that you register online by the end of this week, if you would be so kind.

The two things potentially wrong with the New England Database Summit are parking and the rush-hour drive home afterwards. I would listen with interest to any suggestions about dinner plans.

One thing I hope to figure out at the Summit or before is what the hell is going on on Vertica’s blog or, for that matter, at Vertica. The recent Mike Stonebraker post that spawned a lot of discussion and commentary has disappeared. Meanwhile, Vertica has had three consecutive heads of marketing leave the company since June, and I don’t know who to talk to there any more.

January 19, 2011

Sound bites on HP/Microsoft and Neoview

HP and Microsoft put out a press release.  Three new appliances are being announced, and we’re being reminded of at least one past announcement. I wasn’t briefed, and wouldn’t want to comment on, say, price/performance or feature particulars. That said:

January 18, 2011

Architectural options for analytic database management systems

Mike Stonebraker recently kicked off some discussion about desirable architectural features of a columnar analytic DBMS. Let’s expand the conversation to cover desirable architectural characteristics of analytic DBMS in general.

January 12, 2011

Mike Stonebraker on “real column stores”

Mike Stonebraker has a post up on Vertica’s blog trying to differentiate “real” from “pretend” column stores. (Edit: That post seems to have come back down, but as of 1/19 it can be found in Google Cache.) In essence, Mike argues that the One Right Way to design a column store is Vertica’s, a position that Daniel Abadi used to share but has since retreated from.

There are some good things about that post, and some not-so-good. The worst paragraph is probably

Several row-store vendors (including Oracle, Greenplum and Aster Data) now claim to be selling a column store.   Obviously, this would require a complete rewrite of a DBMS to move from Figure 1 to Figure 2.  Hence, none of the “pretenders” have actually done this.  Instead all have implemented some aspects of column stores, and then claim to be the real thing.  This blog defines what the “real enchilada” looks like, and how to tell it from the pretenders.

which I question on two levels.
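The quoted paragraph turns on the difference between row-oriented and column-oriented storage. A toy sketch, using a hypothetical three-column table, shows why that difference matters for analytic scans: summing one column touches every field of every record in a row layout, but only that column’s values in a columnar one. It is an illustration of the layouts only and says nothing about which vendors’ designs qualify as “real” column stores.

```python
# Hypothetical three-column table (id, region, amount), used only for illustration.
ROWS = [
    (1, "east", 120.0),
    (2, "west", 75.5),
    (3, "east", 42.0),
]


def row_store(rows):
    """Row layout: all fields of each record are stored together."""
    return [tuple(r) for r in rows]


def column_store(rows):
    """Column layout: each column's values are stored together."""
    ids, regions, amounts = (list(col) for col in zip(*rows))
    return {"id": ids, "region": regions, "amount": amounts}


def sum_amount_row(store):
    """Sum 'amount' from a row store, counting every field value read."""
    total, touched = 0.0, 0
    for record in store:
        touched += len(record)  # the whole record is read to reach one field
        total += record[2]
    return total, touched


def sum_amount_column(store):
    """Sum 'amount' from a column store; only that column is read."""
    amounts = store["amount"]
    return sum(amounts), len(amounts)
```

With three columns the row layout reads three times as many values for the same answer; on wide analytic tables with dozens or hundreds of columns, that ratio is the basic I/O argument for columnar storage.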
