March 28, 2008

XML versus sparse columns in variable schemas

Simon Sabin makes an interesting point: If you can have 30,000 columns in a table without sparsity management blowing up, you can handle entities with lots of different kinds of attributes. (And in SQL Server you can now do just that.) The example he uses is products — different products can have different sets of possible colors, different kinds of sizes, and so on. An example I’ve used in the past is marketing information — different prospects can reveal different kinds of information, which may have been gathered via non-comparable marketing programs.

I’ve suggested this kind of variability as a reason to actually go XML — you’re constantly adding not just new information, but new kinds of information, so your fixed schema is never up to date. But I haven’t detected many actual application designers who agree with me …

Comments

3 Responses to “XML versus sparse columns in variable schemas”

  1. Ning on March 28th, 2008 8:33 pm

    Exactly. This is one of the sweet spots of XML databases: schema evolution and schema chaos.

    With relational views defined on top of XML data, application designers should not feel lost. I guess the major reasons that prevent them adopting XML by now are performance and operational completeness, which should be mature eventually.

  2. Daniel Weinreb on March 30th, 2008 12:13 pm

    To make this mode of operation useful, there would be one column that contains a value that says what type this row is, and there would be metadata somewhere that says which columns are relevant to this row. (For XML, your metadata would say even more, namely where each column corresponds to which element or attribute in an XML schema.) This is basically a cheesy way to do polymorphism, in RDBMS’s which don’t have this kind of polymorphism natively defined.

    Simon claims you have strongly-typed data, but what in the DBMS stops you from inserting a row representing socks but setting the cup size?

    And what if you have special sports socks, which have an additional attribute saying what sport they’re aimed at? This column should be empty for regular socks. That’s a simple example of inheritance. You can represent the rows just fine but you need the metadata somewhere, and I don’t see how it would be in the DBMS, so the DBMS’s ability to do integrity checking is limited.

  3. Travis Spencer on July 17th, 2008 3:46 am

    @Ning: Besides performance and features, the other barrier is cost. ATM, the native XML database of choice is MarkLogic and they know it. So, they charge you an arm and a leg for it. The list price is 6 figures IIRC.

    @Daniel: Simon’s example that includes typed metadata is a case of what some call EAV/CV (http://en.wikipedia.org/wiki/Entity-Attribute-Value_model). The metadata to create this type system is in the RDBMS as well, and your application essentially becomes the referential integrity engine. Fun times!

Leave a Reply




Feed: DBMS (database management system), DW (data warehousing), BI (business intelligence), and analytics technology Subscribe to the Monash Research feed via RSS or email:

Login

Search our blogs and white papers

Monash Research blogs

User consulting

Building a short list? Refining your strategic plan? We can help.

Vendor advisory

We tell vendors what's happening -- and, more important, what they should do about it.

Monash Research highlights

Learn about white papers, webcasts, and blog highlights, by RSS or email.