January 28, 2013

Attack of the Frankenschemas

In typical debates, the extremists on both sides are wrong. “SQL vs. NoSQL” is an example of that rule. For many traditional categories of database or application, it is reasonable to say:

Reasons to abandon SQL in any given area usually start:

Some would further say that NoSQL is cheaper, scales better, is cooler or whatever, but given the range of NewSQL alternatives, those claims are often overstated.

Sectors where these reasons kick in include but are not limited to:

Or to quote a 2008 post,

Conor O’Mahony, marketing manager for IBM’s DB2 pureXML, talks a lot about one of my favorite hobbyhorses — schema flexibility* — as a reason to use an XML data model. In a number of industries he sees use cases based around ongoing change in the information being managed:

  • Tax authorities change their rules and forms every year, but don’t want to do total rewrites of their electronic submission and processing software.
  • The financial services industry keeps inventing new products, which don’t just have different terms and conditions, but may also have different kinds of terms and conditions.
  • The same, to some extent, goes for the travel industry, which also keeps adding different kinds of offers and destinations.
  • The energy industry keeps adding new kinds of highly complex equipment it has to manage.

Conor also thinks market evidence shows that XML’s schema flexibility is important for data interchange. For example, hospitals (especially in the US) have disparate medical records and billing systems, which can make information interchange a chore.

*I now call that dynamic schemas.

So, for fear of Frankenschemas, should we flee from RDBMS altogether? Hardly. For social proof, please note:

More conceptually, I’d say that the advantages of a relational DBMS start:

Those aren’t chopped liver.

Comments

8 Responses to “Attack of the Frankenschemas”

  1. Sébastien Derivaux on January 28th, 2013 12:48 pm

    What about the hybrid solution of having XML or JSON or BLOB (like MMORPG as you mentioned) or a key-value store inside a relational database for those corner cases? I saw that many times and while probably not the most efficient it did the job.

  2. Curt Monash on January 28th, 2013 2:13 pm

    Sometimes that works. Oracle is probably world’s biggest vendor of enterprise search, or else was for many years. But RDBMS vendors have been surprisingly ineffective at narrowing the gap vs. more focused systems.

    Indeed, I expect the NoSQL/NewSQL winners to wind up being systems that can exhibit multiple “personalities” in terms of logical data model.

  3. aaron on January 29th, 2013 3:20 pm

    There seems to be a lot of dancing around the point here.
    - In general, structured data with a variety of use needs is amenable to relational model RDBMS.
    - There are edge cases relating to scale, model, etc. that NewSQLs are going after. ACID replication so far doesn’t seem one of these.
    - There are edge cases around complex read only that some BI centric stores are attacking.
    - There are database-trivial applications where KVPs allow fast, scalable, and redundant caching.
    - There are structured apps with “soft edges”, where parts of the app have evolving schemas. These are addressed by document stores and RDBMS with XML and/or blobs.
    - There are apps that are just going to evolve, such as ones where business goals are in flux. These are served by a variety of noSQLs which allow deferred and evolving schema definition which allow changing goals of the store.
    - There are uses where the critical factor is archival and support of scalable storage and scalable processing where noSQLs (and some doc stores) shine.

    Speaking as a practioner, preserving a single view of a schema is really hard. Having multiple views may work in a few areas, but is truly complex and is not for general use.

    It would be instructive to look at your customer examples over the past few years. Likely, the ones that succeeded slid into more traditional RDBMS (and some noSQL, especially Hadoop), with only edge cases in newSQL and other specialized niche products.

  4. Brian Andersen on January 29th, 2013 5:30 pm

    The one thing I will say in favor of RDBMS is that they actually give you a means to do some analysis on your data. The NoSQL’s give you storage and retrieval at scale but totally throw up their hands when you want to do some analysis. The ability to analyze the data in place is what makes the difference between a database and a datastore.

  5. Curt Monash on January 30th, 2013 12:29 am

    Brian,

    Let’s not be quite so fast in saying that. First, a lot of NoSQL systems have at least some primitive analytic capabilities — aggregates, range queries, perhaps a form of map/reduce. Second, for hard core analytics, why do it in a short-request data store of any kind anyway? Third, there’s HBase/Hadoop integration. Fourth, there’s Cassandra/Hadoop integration. Etc.

  6. Curt Monash on January 30th, 2013 12:32 am

    Aaron,

    NewSQL clearly hasn’t had a lot of traction to date. And the one client I pointed toward a Schooner/dbShards combo probably is looking for alternatives on the Schooner side right now. (Meanwhile, the individual who spearheaded that selection jumped to DataStax.)

    But there are plenty of robust NoSQL installations, and I don’t include generic Hadoop as NoSQL. (HBase is NoSQL, however, of course.)

  7. aaron on January 30th, 2013 12:43 pm

    I think Brian is pointing out the obvious why (and again your example is instructive). To a make a decision to go to a niche product implies a lot of trust of the vendor AND a willingness to look at a db API as a construction tool – rather than current practice of integrating vendor products. I can’t think of *any* examples I’ve seen of successful serious and unfrustrating BI against a newSQL, even ones with standard SQL JDBC or ODBC interfaces.

    It would be really instructive to review *how and why people choose the new products*. The product vendors are of course dealing with the chicken and egg of competitors against mature stacks – you can do better tech for purpose, but you compete against an entire industry.

    (And on Hadoop being a DBMS – Pig and Hive are pretty crude, but they seem to be among the largest use/largest growth BI tools right now, so it can be considered as much a db as SAS or Access.)

  8. Data model churn | DBMS 2 : DataBase Management System Services on August 4th, 2013 6:10 pm

    [...] for me. They do overlap a lot — and the whole discussion overlaps with my post about schema complexity last January, and more generally with what I’ve written about dynamic schemas for the past [...]

Leave a Reply




Feed: DBMS (database management system), DW (data warehousing), BI (business intelligence), and analytics technology Subscribe to the Monash Research feed via RSS or email:

Login

Search our blogs and white papers

Monash Research blogs

User consulting

Building a short list? Refining your strategic plan? We can help.

Vendor advisory

We tell vendors what's happening -- and, more important, what they should do about it.

Monash Research highlights

Learn about white papers, webcasts, and blog highlights, by RSS or email.