Eric Lai emailed today to ask what I thought about the NoSQL folks, and especially whether I thought their ideas were useful for enterprises in general, as opposed to just Web 2.0 companies. That was the first I'd heard of NoSQL, which seems to be a community discussing SQL alternatives popular among the cloud/big-web-company set, such as BigTable, Hadoop, Cassandra and so on. My short answers are:
- In most cases, no.
- Most of these technologies are designed for simple, high-volume OLTP (OnLine Transaction Processing). Most large enterprises have an established way of doing OLTP, probably via relational database management systems. Why change?
- MapReduce is an exception, in that it’s designed for analytics. MapReduce may be useful for enterprises. But where it is, it probably should be integrated into an analytic DBMS.
- There’s one big countervailing factor to all these generalities — schema flexibility.
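To make the MapReduce bullet a bit more concrete, here is a toy sketch of the map/shuffle/reduce pattern in plain Python, next to the SQL aggregate it roughly mirrors. This is not Hadoop or any real framework; the function names and the sample sales data are made up purely for illustration.

```python
from collections import defaultdict

# Hypothetical sample data: (region, sale_amount) records.
SALES = [("east", 100), ("west", 250), ("east", 50), ("north", 75), ("west", 25)]


def map_phase(records):
    """Emit one (key, value) pair per input record."""
    for region, amount in records:
        yield region, amount


def shuffle(pairs):
    """Group values by key, as a framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Collapse each key's values to a single result (here, a sum)."""
    return {key: sum(values) for key, values in groups.items()}


def total_sales_by_region(records):
    return reduce_phase(shuffle(map_phase(records)))


# Roughly equivalent SQL:
#   SELECT region, SUM(amount) FROM sales GROUP BY region;
print(total_sales_by_region(SALES))
```

The point of the comparison is that the analytic work is the same either way; what differs is whether you spell out the grouping procedurally or let a DBMS plan it from a declarative query.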
As for the longer form, let me start by noting that there are two main kinds of reasons for not liking SQL. First, you might be fine with the idea of a (somewhat) nonprocedural, schema-aware DML/DDL (Data Manipulation/Definition Language), but just think another kind is better, or more suited to your use case. If your reason is like that, you might favor alternatives such as:
- OLAP-based languages such as MDX.
- XML-oriented languages.
- “True” relational languages, because SQL deviated from the path of relational virtue under the corrupt influence of IBM — aka “Blue Babylon” — and the IT world has been languishing in sin ever since.
The second class of reason for avoiding SQL is that you don’t like the idea of a separate schema-aware DML at all. Possible reasons for this orientation include:
- You just like to program, and want to manipulate stored data the same way you do anything else. Thus, you are bothered by an “impedance mismatch” between SQL and your favorite programming languages. This is real. It also has been overcome by many, many enterprises around the world.
- You believe that more procedural alternatives are a better fit for cloud computing and extreme scale-out on failure-prone commodity hardware. Facebook made that case to me. However, I have trouble thinking of very many enterprise scenarios where it applies, especially when one considers electricity costs and the like.
- Your schemas change more quickly than your data architects can reasonably be expected to keep up with. Facebook made that case to me too. Enterprise examples might include marketing campaigns and M&A. I’ve long thought this to be a legitimate, looming concern. But I don’t know that stripped-down DBMS are the way to address it.
- You believe that SQL has severe processing overhead. In most enterprise use cases, that would just be bogus.
- You lack familiarity with SQL.
That last point is not a joke. One of the weirder database architectures I know of is the one underlying Guild Wars. Its developer — a brilliantly impressive guy — told me flat-out that he learned in college how to build a DBMS, but he didn’t learn how to develop for a conventional one. This was instrumental in his decision to build an unconventional data management architecture that uses SQL Server as little more than a smart file manager.
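Two of the reasons in the list above, the impedance mismatch and fast-changing schemas, are easier to see in code than in prose. Here is a minimal sketch using Python's built-in sqlite3 module. The tables, the `Customer` class, and the sample values are all hypothetical, and the key-value table is only a stand-in for the schema-less style, not a model of any particular NoSQL product.

```python
import json
import sqlite3
from dataclasses import dataclass


@dataclass
class Customer:
    id: int
    name: str


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("INSERT INTO customer (id, name) VALUES (?, ?)", (1, "Acme"))

    # Impedance mismatch: SQL hands back a tuple, and turning it into
    # an application object is the programmer's job.
    row = conn.execute(
        "SELECT id, name FROM customer WHERE id = ?", (1,)
    ).fetchone()
    customer = Customer(*row)

    # Schema change, relational style: a new attribute needs DDL first.
    conn.execute("ALTER TABLE customer ADD COLUMN campaign TEXT")
    conn.execute(
        "UPDATE customer SET campaign = ? WHERE id = ?", ("spring-promo", 1)
    )
    campaign = conn.execute(
        "SELECT campaign FROM customer WHERE id = ?", (1,)
    ).fetchone()[0]

    # Schema-less style: the value is an opaque blob, so a new field
    # needs no DDL, but the database no longer knows what is in it.
    conn.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT)")
    doc = {"name": "Acme", "campaign": "spring-promo", "acquired_from": "Initech"}
    conn.execute(
        "INSERT INTO kv (k, v) VALUES (?, ?)", ("customer:1", json.dumps(doc))
    )
    stored = json.loads(
        conn.execute("SELECT v FROM kv WHERE k = ?", ("customer:1",)).fetchone()[0]
    )

    conn.close()
    return customer, campaign, stored
```

Neither half is hard. The trade-off is between a schema change that somebody has to schedule and a blob whose structure lives only in application code.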
The questions of SQL performance and — often-unspecified — “overhead” are interesting to view through the lens of the H-Store/VoltDB project. Mike Stonebraker et al.:
- Are building a scale-out-oriented OLTP DBMS that is meant to run in RAM, preserving data through replication to other servers’ RAM more than through output to disk.
- Believe that 95% of what a typical SQL DBMS does to manage OLTP is wasteful overhead.
- Originally planned to not use SQL, but wound up going with SQL because alternatives were insufficiently performant.
Mike himself, of course, has been all over the spectrum on SQL-like languages. First he favored QUEL vigorously over SQL for mainstream relational DBMS. Then he led the charge to extend SQL in PostgreSQL, Illustra, et al. Then he actually staked out a contrarian position in complex event/stream processing by favoring a SQL-like language where other alternatives were better established. But that was at what turned into StreamBase, which now emphasizes visual programming over any kind of coding language.
I need to write much more about schema flexibility, but tonight, which will be my third straight night of far less than 8 hours of sleep, is not the time for that.