February 22, 2015

Data models

7-10 years ago, I repeatedly argued the following viewpoints:

Since then, however:

So it’s probably best to revisit all that in a somewhat organized way.

To make the subject somewhat manageable, I’ll focus on fielded data (i.e., data that represents values of something) rather than, for example, video or images. Fielded data always arrives as a string of bits whose meaning boils down to a set of <name, value> pairs. By “string of bits” I mainly mean a single record or document, although most of what I say can also apply to a whole stream of data.
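As a minimal illustration, assuming Python and two made-up encodings, the sketch below decodes two different strings of bits into the same set of <name, value> pairs. In the comma-separated record the field names are implicit: they come from a separate schema (here a hypothetical `SCHEMA` tuple), not from the bytes themselves. In the JSON document the names travel with the data. The helpers `decode_positional` and `decode_document` are illustrative only and are not taken from any particular product.

```python
import json

# Hypothetical schema for a positional record. The field names live here,
# not in the bytes, so they are implicit in the data itself.
SCHEMA = ("id", "name", "city")


def decode_positional(raw: bytes, schema=SCHEMA) -> dict:
    """Decode a comma-separated record whose meaning depends on an external schema."""
    values = raw.decode("utf-8").split(",")
    if len(values) != len(schema):
        raise ValueError("record does not match schema")
    # Pair each value with the name the schema assigns to its position.
    return dict(zip(schema, values))


def decode_document(raw: bytes) -> dict:
    """Decode a self-describing JSON document that carries its own field names."""
    return json.loads(raw.decode("utf-8"))


row = b"42,Ada,London"
doc = b'{"id": "42", "name": "Ada", "city": "London"}'

# Both strings of bits boil down to the same set of <name, value> pairs.
assert decode_positional(row) == decode_document(doc)
print(sorted(decode_positional(row).items()))
# prints [('city', 'London'), ('id', '42'), ('name', 'Ada')]
```

The trade-off the sketch hints at: the positional form is more compact, but it is meaningless without the schema, while the self-describing form can be interpreted on its own at the cost of repeating the names in every record.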

Important distinctions include:

Some major data models can be put into a fairly strict ordering of query desirability by noting:

Unsurprisingly, that ordering is reversed when it comes to writing data.

And so, for starters, most large enterprises will have important use cases for data stores in all of the obvious categories. In particular:

Beyond that:

And finally, I think in-memory data grids:

Comments

8 Responses to “Data models”

  1. Michael Hausenblas on February 22nd, 2015 11:36 pm

    Thanks for the write-up, Curt; excellent brain food! Funny you should mention it, as I had somewhat related thoughts over the weekend: http://datadventures.ghost.io/2015/02/22/principia-data/

    Cheers,
    Michael

  2. Chris W on February 23rd, 2015 9:07 pm

    Hi Curt,

    “In relational use cases field names tend to be implicit”

    Is that correct, or did you mean explicit?

    Thanks,
    Chris

  3. Venkat Krishnamurthy on February 23rd, 2015 9:29 pm

    For RDF (not necessarily representative of graphs), the same advantages of a declarative query language hold, with a few improvements: no explicit joins (there are no tables/relations), graph-path patterns via path regexes, and schema being expressible as data (i.e., in RDF). The last part means that schema is a perspective, not a fixed thing, and is substitutable over a given ‘base dataset’. This is pretty powerful, as schema information can be added or updated completely after the fact. It’s powerful in a head-exploding kind of way, which probably explains why many people don’t get it or use it.

  4. Curt Monash on February 23rd, 2015 11:53 pm

    Chris,

    I meant what I wrote. The field names aren’t directly carried with the data.

    To be fair, however — I wasn’t thinking of what the SQL would look like.

  5. David Gruzman on February 24th, 2015 2:50 am

    I also see a schema-less, document-oriented data model, in which we store arbitrary JSON documents and can search them by values in arbitrary fields.
    To clarify a bit: each document does have a very clear structure, ready to be queried, but different documents do not have to share the same structure.
    This model is used by two very popular systems: MongoDB and ElasticSearch.

  6. Curt Monash on February 24th, 2015 4:22 am

    Fair enough. Queries can be cross-column, especially if they are resolved by some kind of text search. Attivio’s founders tried to popularize that at FAST and then at Attivio, without a lot of success in either place that I could discern. RDBMS have in some cases added it, I think. I think MarkLogic is pretty good about it too.

  7. Neil Hepburn on March 3rd, 2015 2:52 pm

    From my perspective the biggest shift I am seeing is in the analytical/BI space.
    I believe the future of data warehousing is not in databases but rather in file systems. I don’t think you even need HDFS. I would argue that NTFS is actually more appropriate for the majority of organizations: frictionless access from pretty much any tool you can think of; most tables are not that big or can be easily broken down into small partitions; and NTFS has a more robust security model. But there’s not a big push here, due in part to the confluence of incentives and disincentives between vendors and IT staff, and the fact that Windows is generally seen as uncool.

    On the transaction processing side of things, I think key/value stores will continue to flourish, but their scope of business consistency is really only a single key/value pair. Relational sharding might not be as cheap to scale as key/value, but it does provide a much larger scope of controlled business consistency. For the time being, though, this fundamental trade-off between availability and consistency gets short shrift in IT and IT journal circles, or it’s argued in an obtuse, roundabout way that nobody understands. So I expect key/value stores will continue to flourish, but there may be a shift back to relational sharding once a broader awareness emerges of what the real trade-offs are, rather than the ‘straw-man’ trade-offs many vendors put out there.

  8. BI for NoSQL — some very early comments | DBMS 2 : DataBase Management System Services on March 15th, 2015 7:51 pm

    […] I noted in a recent post about data models, many databases — in particular SQL and NoSQL ones — can be viewed as collections of […]
