Theory and architecture

Analysis of design choices in databases and database management systems. Related subjects include:

Any subcategory
Database diversity
Explicit support for specific data types
(in Text Technologies) Text search

May 13, 2010

SAP believes in database proliferation

For as long as we’ve had the concept of database management, there’s been a debate as to whether it is realistic for large enterprises to have a single Grand Unified Enterprise Storehouse Of All Information, or whether database proliferation actually makes sense. This argument has been particularly intense in the area of data warehouse/data marts. I’m generally on the side of data mart proliferation.

4 1/2 years ago, I noted that SAP believed strongly in database proliferation: Read more

Categories: Data warehousing, SAP AG, Theory and architecture

3 Comments

May 12, 2010

Quick reactions to SAP acquiring Sybase

SAP is acquiring Sybase. On the conference call SAP said Sybase would be run as a separate division of SAP (no surprise). Most of the focus was on Sybase’s mobile technology, which is forecast at >$400 million in 2010 revenues (which would be 30%ish of the total). My quick reactions include: Read more

Categories: Analytic technologies, ANTs Software, Business intelligence, Business Objects, Columnar database management, Data warehousing, In-memory DBMS, Memory-centric data management, OLTP, ParAccel, SAP AG, Sybase, Theory and architecture, Vertica Systems

13 Comments

May 12, 2010

The Clustrix story

After my recent post, the Clustrix guys raised their hands and briefed me. Takeaways included: Read more

Categories: Application areas, Clustrix, Emulation, transparency, portability, Games and virtual worlds, MySQL, NoSQL, OLTP, Parallelization, Solid-state memory

8 Comments

May 4, 2010

Clustrix may be doing something interesting

Clustrix launched without briefing me or, at least so far as I can tell, anybody else who knows much about database technology. But Clustrix did post a somewhat crunchy, no-registration-required, white paper. Based on that, I get the impression:

Clustrix is making an OLTP DBMS.
The core problem Clustrix tries to solve is scale-out, without necessarily giving up SQL. (I couldn’t immediately tell whether Clustrix supports NoSQL-style key-value interfaces enthusiastically, grudgingly, or not at all.)
Unlike Akiban or VoltDB, Clustrix makes database appliances. The Clustrix software seems to assume a Clustrix appliance.
A key feature of Clustrix’s database appliances is that they rely on solid-state memory. I’m guessing that Clustrix appliances don’t even have disks, or that if they do the disks store some software or something, not actual data. (As previously noted, I agree with Oracle in thinking that much of the progress in database technology this decade will come from proper design for solid-state memory.)
Clustrix talks of things that sound like compiled queries and attempts to avoid locks. However, it doesn’t sound as extreme in these regards as VoltDB.
Clustrix also talks of things that sound like consistent hashing.
The brand name “Sierra” also shows up along with the brand name “Clustrix.”
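For readers unfamiliar with consistent hashing, here is a minimal Python sketch of the general technique. It is not Clustrix's implementation (the white paper only sounds like it describes something similar); the class name `HashRing`, the node names, and the choice of 64 virtual nodes per server are all assumptions made for illustration.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # Stable 64-bit hash. MD5 is used only to spread keys evenly, not for security.
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Hypothetical consistent-hash ring with virtual nodes."""

    def __init__(self, nodes=(), vnodes=64):
        self._vnodes = vnodes
        self._points = []  # sorted hash positions on the ring
        self._owner = {}   # hash position -> node name
        for node in nodes:
            self.add(node)

    def add(self, node):
        # Each node is placed at several positions to even out the load.
        for i in range(self._vnodes):
            point = _hash(f"{node}#{i}")
            self._owner[point] = node
            bisect.insort(self._points, point)

    def remove(self, node):
        self._points = [p for p in self._points if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def lookup(self, key):
        # A key belongs to the first ring position clockwise from its hash.
        if not self._points:
            raise LookupError("ring has no nodes")
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


if __name__ == "__main__":
    ring = HashRing(["node-a", "node-b", "node-c"])
    keys = [f"user:{i}" for i in range(1000)]
    before = {k: ring.lookup(k) for k in keys}
    ring.add("node-d")
    moved = [k for k in keys if ring.lookup(k) != before[k]]
    print(f"{len(moved)} of {len(keys)} keys moved after adding node-d")
```

The property that matters for scale-out is that adding a node only moves keys onto the new node; keys never shuffle between the existing nodes, so growing the cluster does not force a full redistribution of data.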

Categories: Clustrix, Data warehouse appliances, DBMS product categories, NoSQL, Parallelization, Solid-state memory, Storage, Theory and architecture

2 Comments

May 2, 2010

Daniel Abadi on NoSQL design tradeoffs

In a thought-provoking post, Daniel Abadi points out NoSQL-related terminological problems similar to the ones I just railed against, and argues

To me, CAP should really be PACELC — if there is a partition (P) how does the system tradeoff between availability and consistency (A and C); else (E) when the system is running as normal in the absence of partitions, how does the system tradeoff between latency (L) and consistency (C)?

and goes on to say

For example, Amazon’s Dynamo (and related systems like Cassandra and SimpleDB) are PA/EL in PACELC — upon a partition, they give up consistency for availability; and under normal operation they give up consistency for lower latency. Giving up C in both parts of PACELC makes the design simpler — once the application is configured to be able to handle inconsistencies, it makes sense to give up consistency for both availability and lower latency.

However, I think Daniel’s improved formulation is still misleading, in at least two ways:

Daniel implicitly assumes any given NoSQL system makes a fixed set of tradeoffs, when actually — as he in fact notes in his post — some of them offer tradeoffs that are quite tunable.
I think Daniel is at best oversimplifying when he appears to assert that best-case network latency is an important design criterion for all that many NoSQL systems. Naively, anything that acknowledges reads or writes requires two hops. Two-phase commit (2PC) requires three hops. Going from three hops to two is a latency reduction of about 33%, and reductions of that size are not the kinds of goals that drive dramatic DBMS redesigns, even though tenths of seconds (i.e. 100s of milliseconds) matter in the kinds of environments where NoSQL is sprouting up.
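The hop arithmetic in the second point can be spelled out as a small Python sketch. The hop counts are the naive ones from the text; the per-hop latency of 50 ms is a hypothetical figure picked only to make the numbers concrete, and the function names are illustrative.

```python
def total_latency_ms(hops: int, per_hop_ms: float) -> float:
    """Latency of an operation that needs `hops` sequential network hops."""
    return hops * per_hop_ms


def relative_saving(hops_before: int, hops_after: int) -> float:
    """Fraction of latency saved by cutting hops, assuming equal per-hop cost."""
    return (hops_before - hops_after) / hops_before


ACKNOWLEDGED_HOPS = 2       # naive acknowledged read or write, per the text
TWO_PHASE_COMMIT_HOPS = 3   # naive two-phase commit, per the text
PER_HOP_MS = 50.0           # hypothetical per-hop latency, for illustration only

if __name__ == "__main__":
    slow = total_latency_ms(TWO_PHASE_COMMIT_HOPS, PER_HOP_MS)
    fast = total_latency_ms(ACKNOWLEDGED_HOPS, PER_HOP_MS)
    saving = relative_saving(TWO_PHASE_COMMIT_HOPS, ACKNOWLEDGED_HOPS)
    print(f"2PC: {slow} ms, acknowledged op: {fast} ms, saving: {saving:.0%}")
```

Whatever per-hop figure is plugged in, the relative saving stays at one third, which is the point: the gain is a constant factor, not an order of magnitude.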

Categories: Amazon and its cloud, Cassandra, NoSQL, Theory and architecture

2 Comments

May 1, 2010

Read-your-writes (RYW), aka immediate, consistency

In which we reveal the fundamental inequality of NoSQL, and why NoSQL folks are so negative about joins.

Discussions of NoSQL design philosophies tend to quickly focus in on the matter of consistency. “Consistency”, however, turns out to be a rather overloaded concept, and confusion often ensues.

In this post I plan to address one essential subject, while ducking various related ones as hard as I can. It’s what Werner Vogels of Amazon called read-your-writes consistency (a term to which I was actually introduced by Justin Sheehy of Basho). It’s either identical or very similar to what is sometimes called immediate consistency, and presumably also to what Amazon has recently called the “read my last write” capability of SimpleDB.

This is something every database-savvy person should know about, but most so far still don’t. I didn’t myself until a few weeks ago.

Considering the many different kinds of consistency outlined in the Werner Vogels link above or in the Wikipedia consistency models article (whose names may not always be used in, er, a wholly consistent manner), I don’t think there’s much benefit to renaming read-your-writes consistency yet again. Rather, let’s just call it RYW consistency, come up with a way to pronounce “RYW”, and have done with it. (I suggest “ree-ooh”, which evokes two syllables from the original phrase. Thoughts?)

Definition: RYW (Read-Your-Writes) consistency is achieved when the system guarantees that, once a record has been updated, any attempt to read the record will return the updated value.
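To make the definition concrete, here is a minimal Python sketch of one common way such a guarantee can be obtained: quorum reads and writes over N replicas, where a write reaches W replicas and a read consults R replicas and keeps the newest version. The quorum scheme is an assumption brought in for illustration, not something this excerpt describes, and `ReplicatedRegister` and `ryw_holds` are hypothetical names. The sketch checks every possible choice of replicas exhaustively.

```python
import itertools


class ReplicatedRegister:
    """Hypothetical single record stored on n replicas as (version, value)."""

    def __init__(self, n):
        self.replicas = [(0, None)] * n
        self.version = 0

    def write(self, value, replica_ids):
        # Only the listed replicas receive the update; the rest stay stale.
        self.version += 1
        for i in replica_ids:
            self.replicas[i] = (self.version, value)

    def read(self, replica_ids):
        # Ask the listed replicas and return the value with the newest version.
        newest = max((self.replicas[i] for i in replica_ids), key=lambda r: r[0])
        return newest[1]


def ryw_holds(n, w, r):
    """True if every read of r replicas sees a write that reached w replicas."""
    for write_set in itertools.combinations(range(n), w):
        for read_set in itertools.combinations(range(n), r):
            reg = ReplicatedRegister(n)
            reg.write("old", range(n))    # all replicas start in agreement
            reg.write("new", write_set)   # the update reaches only w of them
            if reg.read(read_set) != "new":
                return False              # a stale read: RYW is violated
    return True


if __name__ == "__main__":
    for w, r in [(1, 1), (2, 1), (2, 2), (3, 1), (1, 3)]:
        print(f"N=3 W={w} R={r}: RYW holds = {ryw_holds(3, w, r)}")
```

In this model RYW holds exactly when R + W > N, because any read set must then overlap the set of replicas that received the latest write. Smaller R or W lowers the number of replicas each operation waits on, at the price of possible stale reads.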

Categories: Amazon and its cloud, NoSQL, OLTP, Parallelization, Theory and architecture

26 Comments

April 29, 2010

Vertica update

Last month, Vertica’s CEO Ralph Breslauer quit,* and Vertica made it sound like there would be a new CEO late in April. And indeed, as of April 29, there was. He’s a guy I’ve never heard of before named Chris Lynch, apparently quite the sales machine builder. The most substance I’ve found is a pair of Mass High Tech articles — the latter exceedingly typo-ridden — to the general effect that:

Vertica plans to build a massive, world-conquering sales force.
If Vertica dips back into negative cash flow to do that and has to raise more venture capital, so be it.
“Triple-digit” revenue growth is expected for this year.

1 Comment

April 27, 2010

Gear6 seems to have failed in the memcached market too

As previously noted, I’ve briefly cut back on blogging (and research) due to some family health issues. The first casualty was a post about memcached. One of the two companies to be featured was my new client Northscale. The other was Gear6. What they had in common was:

Both Northscale and Gear6 offered distributions of memcached.
Both Northscale and Gear6 also wanted to sell persistent versions of memcached — in essence, simple DBMS with the memcached API in place of a substantial DML (Data Manipulation Language).

Categories: Clustering, Couchbase, memcached, NoSQL

1 Comment

April 18, 2010

Greenplum et alia’s BigDataNews.com site

Greenplum recently started a website, BigDataNews.com, and quickly signed up Aster Data as a co-sponsor. (Edit: As per a comment below, the decision to sign up additional sponsors was made by the site’s independent publisher.) It’s actually being run by Brett Sheppard, a former Gartner/DataQuest analyst who now gets involved in this kind of thing. (Brett and I may be working on another project soon, with Greenplum funding.)

The heart of the site is feeds* from a variety of high-profile blogs (DBMS2, Daniel Abadi’s, Joe Hellerstein’s, James Kobielus’, et al.), plus some additional posts written by Brett (primarily) or Greenplum folks. Highlights of Brett’s posts include:

What I am told was an unauthorized revelation that Greenplum Chorus is built on CouchDB and Erlang.
An impassioned defense of the integrity of Gartner’s analysis.

*At least in my case, that’s just a post title or snippet, plus a link back to the main post. The same goes for mapreduce.org, actually.

Categories: Analytic technologies, Data warehousing, Greenplum, NoSQL

2 Comments

April 12, 2010

Greenplum Chorus and Greenplum 4.0

Greenplum is making two product announcements this morning. Greenplum 4.0 is a revision of the core Greenplum database technology. In addition, Greenplum is announcing Greenplum Chorus, which is the first product release instantiating last year’s EDC (Enterprise Data Cloud) vision statement and marketing campaign.

Greenplum 4.0 highlights and related observations include: Read more

Categories: Analytic technologies, Benchmarks and POCs, Data integration and middleware, Data warehousing, EAI, EII, ETL, ELT, ETLT, Greenplum, Market share and customer counts, Petabyte-scale data management, Specific users, Telecommunications, Theory and architecture

5 Comments

