January 27, 2008

The 4 main approaches to datatype extensibility

Based on a variety of conversations – including some of the flames about my recent confession that mid-range DBMS aren’t suitable for everything — it seems as if a quick primer may be in order on the subject of datatype support. So here goes.

“Database management” usually deals with numeric or alphabetical data – i.e., the kind of stuff that goes nicely into tables. It commonly has a natural one-dimensional sort order, which is very useful for sort/merge joins, b-tree indexes, and the like. This kind of tabular data is what relational database management systems were invented for.

But ever more, there are important datatypes beyond character strings, numbers and dates. Leaving out generic BLOBs and CLOBs (Binary/Character Large OBjects), the big four surely are:

Numerous other datatypes are important as well, with the top runners-up probably being images, sound, video, time series (even though they’re numeric, they benefit from special handling).

Four major ways have evolved to manage data of non-tabular datatype, either on their own or within an essentially relational data management environment.

So how does this all play out in real-world examples? It’s all over the place.

Comments

10 Responses to “The 4 main approaches to datatype extensibility”

  1. Roland Bouman on January 27th, 2008 4:33 am

    “including some of the flames about my recent confession that mid-range DBMS aren’t suitable for everything”

    You confuse “being contradicted” with “being flamed”. Instead of being so touchy, you could just show some responsibility and correct the factual mistakes in the post you are referring to.

  2. Curt Monash on January 27th, 2008 4:57 am

    Why, thank you for showing up! This post was in part for you, because you’d expressed ignorance on the matter of datatypes.

    Otherwise — the words “slander” and “disgusting” are indeed flames. And you’re way out of line for claiming that your statements of opinion are factual at all, let alone that they point out somebody else’s “factual mistakes.”

    Unless you settle down and start thinking clearly, I have no interest in attempting substantive discussion with you.

    CAM

  3. Roland Bouman on January 27th, 2008 1:13 pm

    “This post was in part for you, because you’d expressed ignorance on the matter of datatypes.”

    Then thank you so much for enlightening me and the rest of us ignoramuses! I’m so glad you told us all about those fancy datatypes like images, I really should notify Frank Mash at fotolog immediately he must look out for another database that can handle them a.s.ap.

    “the words “slander” and “disgusting” are indeed flames”

    What can you expect? You started a smoke screen, it’s just natural that there must be a fire somewhere.

    “And you’re way out of line for claiming that your statements of opinion are factual at all, let alone that they point out somebody else’s “factual mistakes.””

    I and others have explained exactly and repeatedly where the factual mistakes are. You have been avoiding even an acknowledgement that you are aware the refutations exist, even though you have been given the pointers (and members of posterity can browse the past week of planetmysql.org archives to check for themselves). I’m really starting to believe you are doing this on purpose.

    “Unless you settle down and start thinking clearly, I have no interest in attempting substantive discussion with you.”

    Oh no…please don’t threaten me…does “thinking clearly” mean I can’t contradict you anymore? If so, that’s really going to be such a tough choice…oh the wretched dilemma!…am I really going to cut myself off of this world-class opportunity to have Curt attempt a substantive discussion with mere me?

  4. DBNews 2007 #3 | PettiNix on January 27th, 2008 2:04 pm

    […] di articoli apparsi sulla rete ne voglio evidenziare due di DBMS2, il primo riguardante i 4 approcci  principali all’estensibilità dei tipi di dati, il secondo invece elenca 14 motivi per cui non dovrebbe essere utilizzato un DBMS di fascia […]

  5. Curt Monash on January 27th, 2008 6:11 pm

    Roland,

    I’ll respond to the one on-topic thing you’ve said in this comment thread. I would guess strongly that the photos at Fotolog.com aren’t indexed directly. E.g., there’s no search that says “Show me all the photos with 3 or more faces.” (I’d be fascinated if this is wrong, by the way.) Rather, they most likely are indexed on the metadata, such as descriptions. (Nothing wrong with that; image indexing is still pretty primitive. But it illustrates my point.) I further note that there’s no native search of the descriptive text; they just use Google. (Nothing wrong with that either; I do it too.)

    Thus, Fotolog does not appear to be evidence that MySQL is being used to store anything but alphanumeric data and BLOBs/CLOBs.

    As for the off-topic stuff — we agree on one thing. There are plenty of other posts and comment threads, here and elsewhere, where the various opinions are noted.

    CAM

  6. Ning Zhang on February 26th, 2008 11:05 pm

    Curt,

    Great post and very informative about the big picture! Having been studying and working on XML databases for more than 5 years, I’ve seen good and bad usages of XML. In my personally opinion, the best thing about XML is that it doesn’t require a schema or it could define a very flexible schema, yet you can define relational views on top of it to “map” it to some schema. This saves a lot of headaches from schema evolutionand data/schema integration.

    Also regarding to data types or data models, I thik graph data model and graph querylanguages may play a significant role in the future database research and development. There are many applications that can be naturally modeled as a database problem ongraphs (examples include social networks, scientific data, and RDF). Queries (e.g., route queries) on graphs are usually not expressible in pure SQL. Using vendor-dependent extensions or UDFs usually results in unacceptable performance on large data sets. So I guess graphs should be the next data type that major DBMS vendors should support in the near future.

    Thanks,
    Ning

    Ning

  7. Curt Monash on February 27th, 2008 12:21 am

    Ning,

    Cogito had an interesting approach to graphs, which I wrote about here. Its assets have now been bought by another company; stay tuned.

    RDF is the other approach to graphs, of course. If you look at the RDF section here, you may find some thoughts congenial with yours. :)

    CAM

  8. Ning Zhang on February 27th, 2008 2:20 am

    Curt,

    Thanks for the pointers. I’ve read your Cogito white paper and their web site. Although there is not much detail about the GQL, storage and query processing, it seems a more interesting approach comparing to some of the relational approaches developed for graph databases — one simple reason is because SQL is not expressive enough to specify complex graph queries. Also clustering nodes based on their proximity is also more promising than set-oriented storage. Graph partitioning techniques seems to be one of the keys to the performance.

    Some technical questions about the MyFamily.com example mentioned in your white paper. Are there any details about the experiments? i.e., is the database RAM-based/disk-based? Are there any indexes? is the data set really a graph or a tree (seems tree is enough for the problem)?

    I looking forward to your update.

    Thanks,
    Ning

  9. Curt Monash on February 27th, 2008 4:08 pm

    Ning,

    The answer to most of your questions is “I don’t know.”

    That said, Cogito definitely was supporting graphs and not just trees. Whether they optimized clustering for general graphs or just for trees is of course a whole other matter, since the former is a much harder problem.

    As I write this, either my calendar is wrong or the CEO of the successor company is late giving me a phone call. As I said — stay tuned.;)

    EDIT: Oops. I was supposed to call him, not vice-versa …

    CAM

  10. 5 kinds of data structure and 16 kinds of data access method | DBMS2 -- DataBase Management System Services on June 16th, 2009 5:14 am

    […] recent post about datatype extensibility zoomed over at least one head, as per the comment thread. Since then I’ve googled, and come […]

Leave a Reply




Feed: DBMS (database management system), DW (data warehousing), BI (business intelligence), and analytics technology Subscribe to the Monash Research feed via RSS or email:

Login

Search our blogs and white papers

Monash Research blogs

User consulting

Building a short list? Refining your strategic plan? We can help.

Vendor advisory

We tell vendors what's happening -- and, more important, what they should do about it.

Monash Research highlights

Learn about white papers, webcasts, and blog highlights, by RSS or email.