Based on a variety of conversations – including some of the flames about my recent confession that mid-range DBMS aren’t suitable for everything — it seems as if a quick primer may be in order on the subject of datatype support. So here goes.
“Database management” usually deals with numeric or alphabetical data – i.e., the kind of stuff that goes nicely into tables. It commonly has a natural one-dimensional sort order, which is very useful for sort/merge joins, b-tree indexes, and the like. This kind of tabular data is what relational database management systems were invented for.
But ever more, there are important datatypes beyond character strings, numbers and dates. Leaving out generic BLOBs and CLOBs (Binary/Character Large OBjects), the big four surely are:
- Text. Text search is a huge business on the web, and a separate big business in enterprises. And text doesn’t fit well into the relational paradigm at all.
- Geospatial. Information about locations on the earth’s surface is essentially two-dimensional. Some geospatial apps use three dimensions.
- Object. There are two main reasons for using object datatypes. First, the data can have complex internal structures. Second, it can comprise a variety of simpler types. Object structures are well-suited for engineering and medical applications.
- XML. A great deal of XML is, at its heart, either relational/tabular data or text documents. Still, there are a variety of applications for which the most natural datatype truly is XML.
Numerous other datatypes are important as well, with the top runners-up probably being images, sound, video, time series (even though they’re numeric, they benefit from special handling).
Four major ways have evolved to manage data of non-tabular datatype, either on their own or within an essentially relational data management environment.
- Utterly standalone servers. There are lots of search engines, geospatial engines, object-oriented database management systems, and so on. Some may have ODBC/JDBC SQL interfaces, to handle metadata (which is commonly tabular in nature) if nothing else. But even so, there’s little relational about them.
- True RDBMS extensibility. In the 1990s, awkwardly named object-relational database management systems were introduced, boasting the awkwardly named feature abstract datatypes. Oracle, DB2, Informix, and PostgreSQL are now of this kind. They let one write data access methods for data that’s right in the basic relational table structure, and get at it through extensions to SQL.
- Tightly coupled servers. A close relative of RDBMS extension via new access methods is to create new servers for new datatypes, well-integrated with your RDBMS. Your parser and optimizer are in charge of federating them; the user just writes extended-SQL statements.
- User-defined functions. User-defined functions are like datatype extensions, but vastly easier to write, in that they don’t have any special access methods. When their performance is good enough, UDFs are often the best way to handle extended-datatype needs.
So how does this all play out in real-world examples? It’s all over the place.
- Enterprise text search is divided among three modes – integrated into the RDBMS (Oracle and IBM), tightly-couple server (Microsoft, pre-FAST acquisition), and standalone (Autonomy, FAST pre-acquisition, Google, and most other vendors).
- Geospatial datatypes are embedded into extensible DBMS – generally via technology from ESRI – for OLTP uses. But for data warehousing, where you don’t need pinpoint record retrieval, UDFs are generally believed to suffice. (E.g. Teradata, Netezza.)
- Intersystems seems to stand alone in getting nontrivial revenue from the standalone OODBMS market.
- The XML situation is really confused: Oracle has been late getting its native XML strategy together; the tightly-coupled DB2 Viper engine has been a performance disappointment; Microsoft’s integrated native XML isn’t heard from much either; and text/XML integrated engine Marklogic is getting some non-text business almost by default. In addition, every serious relational vendor has a capability to “shred” XML into relational tables, and can of course also just bulk-handle XML via BLOBs/CLOBs.