Clustering

Analysis of products and issues in database clustering. Relates subjects include:

March 26, 2012

CodeFutures/dbShards update

I’ve been talking a fair bit with Cory Isaacson, CEO of my client CodeFutures, which makes dbShards. Business notes include:

Apparently, the figure of 6 dbShards customers in July, 2010 is more comparable to today’s 20ish contracts than to today’s 7-8 production users. About 4 of the original 6 are in production now.

NDA stuff aside, the main technical subject we talked about is something Cory calls “relational sharding”. The point is that dbShards’ transparent sharding can be done in such a way as to make many joins be single-server. Specifically:

dbShards can’t do cross-shard joins, but it can do distributed transactions comprising multiple updates. Cory argues persuasively that in almost all cases this is enough; but I see cross-shard joins as a feature that should someday be added to dbShards even so.

The real issue with dbShards’ transparent sharding is ensuring it’s really transparent. Cory regards as typical a customer with a couple thousand tables, who had to change a dozen or so SQL statements to implement dbShards. But there are near-term plans to automate the number of SQL changes needed down to 0. The essence of that change is this: Read more

March 21, 2012

DataStax Enterprise 2.0

Edit: Multiple errors in the post below have been corrected in a follow-on post about DataStax Enterprise and Cassandra.

My client DataStax is announcing DataStax Enterprise 2.0. The big point of the release is that there’s a bunch of stuff integrated together, including at least:

DataStax stresses that all this runs on the same cluster, with the same administrative tools and so on. For example, on a single cluster:

Read more

March 12, 2012

Kinds of data integration and movement

“Data integration” can mean many different things, to an extent that’s impeding me from writing about the area. So I’ll start by simply laying out some of the myriad ways that data can be brought to where it is needed, and worry about other subjects later. Yes, this is a massive wall of text, and incomplete even so — but that in itself is my central point.

There are two main paradigms for data integration:

Data movement and replication typically take one of three forms:

Beyond the core functions of movement, replication, and/or federation, there are other concerns closely connected to data integration. These include:

In particular, the following are largely different from each other. Read more

February 15, 2012

Quick notes on MySQL Cluster

According to the MySQL Cluster home page, today’s MySQL Cluster release has — give or take terminology details 🙂 —  added transparent sharding (Edit: Actually, please see the first comment below) and a memcached interface. My quick comments on all this to a reporter a couple of days ago were:

I don’t really know enough about MySQL Cluster right now to comment in more detail.

November 3, 2011

MarkLogic’s Hadoop connector

It’s time to circle back to a subject I skipped when I otherwise wrote about MarkLogic 5: MarkLogic’s new Hadoop connector.

Most of what’s confusing about the MarkLogic Hadoop Connector lies in two pairs of options it presents you:

Otherwise, the whole thing is just what you would think:

MarkLogic said that it wrote this Hadoop connector itself.

Read more

October 23, 2011

NoSQL notes

Last week I visited with James Phillips of Couchbase, Max Schireson and Eliot Horowitz of 10gen, and Todd Lipcon, Eric Sammer, and Omer Trajman of Cloudera. I guess it’s time for a round-up NoSQL post. 🙂

Views of the NoSQL market horse race are reasonably consistent, with perhaps some elements of “Where you stand depends upon where you sit.”

Read more

October 23, 2011

Transparent relational OLTP scale-out

There’s a perception that, if you want (relatively) worry-free database scale-out, you need a non-relational/NoSQL strategy. That perception is false. In the analytic case it’s completely ridiculous, as has been demonstrated by Teradata, Vertica, Netezza, and various other MPP (Massively Parallel Processing) analytic DBMS vendors. And now it’s false for short-request/OLTP (OnLine Transaction Processing) use cases as well.

My favorite relational OLTP scale-out choice these days is the SchoonerSQL/dbShards partnership. Schooner Information Technology (SchoonerSQL) and Code Futures (dbShards) are young, small companies, but I’m not too concerned about that, because the APIs they want you to write to are just MySQL’s. The main scenarios in which I can see them failing are ones in which they are competitively leapfrogged, either by other small competitors – e.g. ScaleBase, Akiban, TokuDB, or ScaleDB — or by Oracle/MySQL itself. While that could suck for my clients Schooner and Code Futures, it would still provide users relying on MySQL scale-out with one or more good product alternatives.

Relying on non-MySQL NewSQL startups, by way of contrast, would leave me somewhat more concerned. (However, if their code is open sourced. you have at least some vendor-failure protection.) And big-vendor scale-out offerings, such as Oracle RAC or DB2 pureScale, may be more complex to deploy and administer than the MySQL and NewSQL alternatives.

October 23, 2011

Schooner pivots further

Schooner Information Technology started out as a complete-system MySQL appliance vendor. Then Schooner went software-only, but continued to brag about great performance in configurations with solid-state drives. Now Schooner has pivoted further, and is emphasizing high availability, clustered performance, and other hardware-agnostic OLTP (OnLine Transaction Processing) features. Fortunately, Schooner has some interesting stuff in those areas to talk about.

The short form of the SchoonerSQL (as Schooner’s product is now called) story goes roughly like this:

Read more

September 19, 2011

Are there any remaining reasons to put new OLTP applications on disk?

Once again, I’m working with an OLTP SaaS vendor client on the architecture for their next-generation system. Parameters include:

So I’m leaning to saying:   Read more

August 13, 2011

Couchbase technical update

My Couchbase business update with Bob Wiederhold was very interesting, but it didn’t answer much about the actual Couchbase product. For that, I talked with Dustin Sallings. We jumped around a lot, and some important parts of the Couchbase product haven’t had their designs locked down yet anyway. But here’s at least a partial explanation of what’s up.

memcached is a way to cache data in RAM across a cluster of servers and have it all look logically like a single memory pool, extremely popular among large internet companies. The Membase product — which is what Couchbase has been selling this year — adds persistence to memcached, an obvious improvement on requiring application developers to write both to memcached and to non-transparently-sharded MySQL. The main technical points in adding persistence seem to have been:

Couchbase is essentially Membase improved by integrating CouchDB into it, with the main changes being:

Let’s drill down a bit into Membase/Couchbase clustering and consistency. Read more

← Previous PageNext Page →

Feed: DBMS (database management system), DW (data warehousing), BI (business intelligence), and analytics technology Subscribe to the Monash Research feed via RSS or email:

Login

Search our blogs and white papers

Monash Research blogs

User consulting

Building a short list? Refining your strategic plan? We can help.

Vendor advisory

We tell vendors what's happening -- and, more important, what they should do about it.

Monash Research highlights

Learn about white papers, webcasts, and blog highlights, by RSS or email.