Relational DBMS versus text data
There seems to be tremendous confusion about “search,” “meaning,” “semantics,” the suitability of relational DBMS to manage text data, and similar subjects. Here are some observations that may help sort some of that out.
1. Relational database theorists like to talk about the “meaning” or “semantics” of data as being in the database (specifically its metadata, and more specifically its constraints). This is at best a very limited use of the words “meaning” or “semantics,” and has little to do with understanding the meaning of plain English (or other language) phrases, sentences, paragraphs, etc. that may be stored in the database. Hugh Darwen is right and his fellow relational theorists are confused.
2. The standard way to manage text is via a full-text index, designed like this: For hundreds of thousands of words, the index maintains a list of which documents the word appears in, and at what positions in the document it appears. This is a columnar, memory-centric approach, that doesn’t work well with the architecture of mainstream relational products. Oracle pulled off a decent single-server integration nonetheless, although performance concerns linger to this day. Others, like Sybase, which attempted a Verity integration, couldn’t make it work reasonably at all. Microsoft, which started from the Sybase architecture, didn’t even try, or if they tried it wasn’t for long; Microsoft’s text search strategy has been multi-server more or less from the getgo.
3. Notwithstanding point #2, Oracle, IBM, Microsoft, and others have SQL DBMS extended to handle text via the SQL3 (or SQL/MM ) standard. (Truth be told, I get the names and sequencing of the SQL standard versions mixed up.) From this standpoint, the full text of a document is in a single column, and one can write WHERE clauses on that column using a rich set of text search operators.
But while such SQL statements formally fit into the relational predicate logic model, the fit is pretty awkward. Text search functions aren’t two-valued binary yes/no types of things; rather, they give scores, e.g. with 101 possible values (the integers from 0 - 100). Compounding them into a two-valued function typically throws away information, especially since that compounding isn’t well understood (which is why it’s so hard to usefully federate text searches across different corpuses).
4. Something even trickier is going on. Text search can be carried out against many different kinds of things. One increasingly useful target is the tables of a relational database. Where a standard SQL query might have trouble finding all the references in a whole database to a particular customer organization or product line or whatever, a text search can do a better job. This kind of use is becoming increasingly frequent. And while it works OK against relational products, it doesn’t fit into the formal relational model at all (at least not without a tremendous amount of contortion).
5. Relational DBMS typically manage the data they index. Text search systems often don’t. But that difference is almost a small one compared with some of the others mentioned above, especially since it’s a checkmark item for leading RDBMS to have some sort of formal federation capability.
Comments
7 Responses to “Relational DBMS versus text data”
Leave a Reply

[...] One of the biggest is in the area of text. They fail to see how text data management is fundamentally different from tabular data management. Here’s a little article explaining why text doesn’t fit well into the relational model. • • • [...]
I agree with this entirely — wrote an article on the related topic of searching text and how DBMSs just tend to be the wrong model, http://www.searchtools.com/info/database-search.html. I will link to this as well, it as you have more of the database architecture details.
[...] Curt Monash has some observations about search, meaning, semantics, the suitability of relational DBMS to manage text data and similar subjects: [...]
[...] EDIT: Point #4 of my post on the mismatch between relational databases and text search is pretty relevant here. • • • [...]
[...] Text search is a huge business on the web, and a separate big business in enterprises. And text doesn’t fit well into the relational paradigm at [...]
Text Search and the Relational Model…
Since text search is one of my top areas of expertise, I hope I can explain to you why the relational model is perfectly capable of capturing the structure of text. I’ll start at the very bottom, explaining what text search really is….
[...] Po pierwsze, SQL/MM Full-Text na razie jest tylko w ułamkowej części implementowany przez niektóre silniki bazodanowe. Po drugie, relacyjne bazy danych mają swoje ograniczenia. Widoczne są one także na polu wyszukiwania “ze zrozumieniem” - bo w sumie do tego dążymy jak się chwilę zastanowić. Pisze o tym np. Curt Monash w artykule Relational DBMS versus text data. [...]