Data types

Analysis of data management technology optimized for specific datatypes, such as text, geospatial, object, RDF, or XML. Related subjects include:

November 7, 2008

Big scientific databases need to be stored somehow

A year ago, Mike Stonebraker observed that conventional DBMS don’t necessarily do a great job on scientific data, and further pointed out that different kinds of science might call for different data access methods. Even so, some of the largest databases around are scientific ones, and they have to be managed somehow. For example:

Long-term, I imagine that the most suitable DBMS for these purposes will be MPP systems with strong datatype extensibility — e.g., DB2, PostgreSQL-based Greenplum, PostgreSQL-based Aster nCluster, or maybe Oracle.

October 14, 2008

Teradata Geospatial, and datatype extensibility in general

As part of its 13.0 release this week, Teradata is productizing its geospatial datatype, which previously was just a downloadable library. (Edit: More precisely, Teradata announced 13.0, which will actually be shipped some time in 2009.) What Teradata Geospatial now amounts to is:

Teradata also intends in the future to implement actual geospatial indexing; candidates include R-trees and tessellation.

Hearing this was a good wake-up call for me, because in the past I’ve conflated two issues on datatype extensibility, namely:

But as Teradata just pointed out, those two issues can indeed be separated from each other.
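To make the tessellation candidate mentioned above a little more concrete, here is a minimal sketch of a grid-based spatial index. It is not Teradata's design, which the post does not describe; the class name `GridIndex`, the `cell_deg` parameter, and the sample coordinates are all assumptions made for illustration. The idea is simply that space is cut into fixed cells, each point is filed under its cell, and a bounding-box query only inspects the cells the box overlaps.

```python
from collections import defaultdict
from math import floor


class GridIndex:
    """Toy tessellation index (hypothetical): fixed-size lat/lon cells."""

    def __init__(self, cell_deg=1.0):
        self.cell_deg = cell_deg
        # (row, col) -> list of (point_id, lat, lon)
        self.cells = defaultdict(list)

    def _cell(self, lat, lon):
        # Map a coordinate to the integer cell that contains it.
        return (floor(lat / self.cell_deg), floor(lon / self.cell_deg))

    def insert(self, point_id, lat, lon):
        self.cells[self._cell(lat, lon)].append((point_id, lat, lon))

    def query_box(self, lat_min, lon_min, lat_max, lon_max):
        """Return ids of points inside the box, scanning only overlapping cells."""
        r0, c0 = self._cell(lat_min, lon_min)
        r1, c1 = self._cell(lat_max, lon_max)
        hits = []
        for r in range(r0, r1 + 1):
            for c in range(c0, c1 + 1):
                for pid, lat, lon in self.cells.get((r, c), ()):
                    # Cells on the edge of the box need an exact check.
                    if lat_min <= lat <= lat_max and lon_min <= lon <= lon_max:
                        hits.append(pid)
        return sorted(hits)


index = GridIndex(cell_deg=1.0)
index.insert("boston", 42.36, -71.06)
index.insert("nyc", 40.71, -74.01)
index.insert("sf", 37.77, -122.42)
print(index.query_box(40.0, -75.0, 43.0, -70.0))
```

A fixed grid like this is easy to build on top of ordinary B-tree-style lookups, which is part of its appeal; an R-tree instead adapts its bounding rectangles to the data, at the cost of a more specialized structure.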

October 5, 2008

Schema flexibility and XML data management

Conor O’Mahony, marketing manager for IBM’s DB2 pureXML, talks a lot about one of my favorite hobbyhorses, schema flexibility, as a reason to use an XML data model. In a number of industries he sees use cases based around ongoing change in the information being managed:

Conor also thinks market evidence shows that XML’s schema flexibility is important for data interchange.

Read more

October 5, 2008

Vertical market XML standards

Tracking the alphabet soup of vertical market XML standards is hard. So as a starting point, I’m splitting a list I got from IBM into a standalone post.

Among the most important or successful IBM pureXML-supported standards, in terms of downloads and other evidence of customer interest, are:

Read more

October 5, 2008

Overview of IBM DB2 pureXML

On August 29, I had a great call with IBM about DB2 pureXML (most of the IBM side of the talking was done by Conor O’Mahony and Qi Jin). I’m finally getting around to writing it up now. (The world of tabular data warehousing has kept me just a wee bit busy …)

As I write it, I see there are a considerable number of holes, but that’s the way it seems to go when researching XML storage. I’m also writing up a September call from which I finally figured out (I think) the essence of how MarkLogic Server works – but only after five months of trying. It turns out that MarkLogic works rather differently from DB2 pureXML. Not coincidentally, IBM and Mark Logic focus on rather different use cases for native XML storage.

What I understand so far about the basic DB2 pureXML architecture goes like this:

Read more

October 5, 2008

MarkLogic architecture deep dive

While I previously posted in great detail about how MarkLogic Server is an ACID-compliant XML-oriented DBMS with integrated text search that indexes everything in real time and executes range queries fairly quickly, I didn’t have a good feel for how all those apparently contradictory characteristics fit into a single product. But I finally had a call with Mark Logic Director of Engineering Ron Avnur, and think I have a better grasp of the MarkLogic architecture and story.

Ron described MarkLogic Server as a DBMS for trees.

Read more

September 26, 2008

Netezza and Teradata on analytic geospatial data management

Geospatial data management is one of the flavors of the month:

So I asked Netezza and Teradata what this geospatial analytics stuff is all about.

Read more

September 23, 2008

Oracle spotlights its datatype support

Oracle put out a flurry of press releases today in conjunction with Oracle OpenWorld. One, which was simply positioned as a report on some “mission-critical” customer apps, caught my eye because all four detailed examples involved nonstandard datatypes:

September 23, 2008

Peter Batty on Netezza Spatial

As previously noted, I’m not up to speed on Netezza Spatial. Phil Francisco of Netezza has promised we’ll fix that ASAP. In the meantime, I found a blog by a guy named Peter Batty, who evidently:

Batty offers a lot of detail in two recent posts, intermixed with some gollygeewhiz about Netezza in general. If you’re interested in this stuff, Batty’s blog is well worth checking out. Read more

September 15, 2008

Teradata decides to compete head-on as a data warehouse appliance vendor

In a press release today that is surely timed to impinge on the Netezza user conference news cycle, Teradata has come out swinging. Highlights include:

Read more

August 26, 2008

Known applications of MapReduce

Most of the actual MapReduce applications I’ve heard of fall into a few areas:

That covers all MapReduce apps I recall hearing about via commercial companies and users, and also includes most of what’s in the two big sources I found online.

Read more

June 28, 2008

Who is doing what in XML data management these days?

A comment thread to a post on a different subject has opened up a discussion of XML storage. Frankly, I haven’t kept up with my briefings on the subject, in part because XML support hasn’t proved to be very important yet to the big DBMS vendors, somewhat to my surprise. When last I looked, the situation wasn’t much different from what it was back in November, 2005. Unless I’ve missed something (and please tell me if I have!), here’s what’s going on: Read more

June 8, 2008

Detailed analysis of Perst and other in-memory object-oriented DBMS

Dan Weinreb — inspired by but not linking to my recent short post on McObject’s object-oriented in-memory DBMS Perst — has posted a detailed discussion of Perst on his own blog. For context, he compares it briefly to analogous products, most especially Progress’s — which used to be ObjectStore, of which Dan was the chief architect.

This was based on documentation and general sleuthing (Dan figured out who McObject got Perst from), rather than hands-on experience, so performance figures and the like aren’t validated. Still, if you’re interested in such technology, it’s a fascinating post.

June 6, 2008

Open source in-memory DBMS

I’ve gotten email about two different open source in-memory DBMS products/projects. I don’t know much about either, but in case you care, here are some pointers to more info.

First, the McObject guys — who also sell a relational in-memory product — have an object-oriented, apparently Java-centric product called Perst. They’ve sent over various press releases about same, the details of which didn’t make much of an impression on me. (Upon review, I see that one of the main improvements they cite in Perst 3.0 is that they added 38 pages of documentation.)

Second, I just got email about something called CSQL Cache. You can read more about CSQL Cache here, if you’re willing to navigate some fractured English. CSQL’s SourceForge page is here. My impression is that CSQL Cache is an in-memory DBMS focused on, you guessed it, caching. It definitely seems to talk SQL, but possibly its native data model is of some other kind (there are references both to “file-based” and “network”.)

May 13, 2008

McObject eXtremeDB — a solidDB alternative

McObject — vendor of memory-centric DBMS eXtremeDB — is a tiny, tiny company, without a development team of the size one would think needed to turn out one or more highly-reliable DBMS. So I haven’t spent a lot of time thinking about whether it’s a serious alternative to solidDB for embedded DBMS, e.g. in telecom equipment. However:

And they do seem to have some nice features, including Patricia tries (like solidDB), R-trees (for geospatial), and some kind of hybrid disk-centric/memory-centric operation.

May 8, 2008

Database blades are not what they used to be

In which we bring you another instantiation of Monash’s First Law of Commercial Semantics: Bad jargon drives out good.

When EnterpriseDB announced a partnership with Truviso for a “blade,” I naturally assumed they were using the term in a more-or-less standard way, and hence believed that it was more than a “Barney” press release.* Silly me. Rather than referring to something closely akin to “datablade,” EnterpriseDB’s “blade” program turns out to be just a catchall set of partnerships.

*A “Barney” announcement is one whose entire content boils down to “I love you; you love me.”

According to EnterpriseDB CTO Bob Zurek, the main features of the “blade” program include:

Read more

April 29, 2008

Truviso and EnterpriseDB blend event processing with ordinary database management

Truviso and EnterpriseDB announced today that there’s a Truviso “blade” for Postgres Plus. By email, EnterpriseDB CTO Bob Zurek endorsed my tentative summary of what this means technically, namely:

  • There’s data being managed transactionally by EnterpriseDB.

  • Truviso’s DML has all along included ways to talk to a persistent Postgres data store.

  • If, in addition, one wants to do stream processing things on the same data, that’s now possible, using Truviso’s usual DML.
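The arrangement summarized in the bullets above, stream processing over data that also lives in a transactional store, can be sketched roughly as follows. This is not Truviso's DML or EnterpriseDB's API; it is an assumption-laden illustration in which an in-memory `sqlite3` database stands in for the persistent Postgres store, and the `accounts` table, the `over_limit_alerts` function, and the event tuples are all hypothetical.

```python
import sqlite3
from collections import deque


def make_store():
    """Stand-in for the persistent, transactionally managed table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts (id INTEGER PRIMARY KEY, credit_limit REAL)"
    )
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)", [(1, 100.0), (2, 500.0)]
    )
    conn.commit()
    return conn


def over_limit_alerts(conn, events, window_seconds=60):
    """Yield (ts, account_id, total) when spend in the trailing window
    exceeds the limit stored for that account."""
    windows = {}  # account_id -> deque of (ts, amount)
    for ts, account_id, amount in events:
        window = windows.setdefault(account_id, deque())
        window.append((ts, amount))
        # Expire events that have fallen out of the sliding window.
        while window and window[0][0] <= ts - window_seconds:
            window.popleft()
        total = sum(a for _, a in window)
        # Join the streaming aggregate against the stored row.
        row = conn.execute(
            "SELECT credit_limit FROM accounts WHERE id = ?", (account_id,)
        ).fetchone()
        if row is not None and total > row[0]:
            yield (ts, account_id, total)


conn = make_store()
stream = [(0, 1, 60.0), (30, 1, 50.0), (100, 1, 60.0), (100, 2, 400.0)]
for alert in over_limit_alerts(conn, stream):
    print(alert)
```

The point of the sketch is only the shape of the computation: a windowed aggregate is maintained per event, and each result is checked against ordinary stored data.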

Read more

April 29, 2008

The Mark Logic story in XML database management

Mark Logic* has an interesting, complex story. They sell a technology stack based on an XML DBMS with text search designed in from the get go. They usually want to be known as a “content” technology provider rather than a DBMS vendor, but not quite always.

*Note: Product name = MarkLogic, company name = Mark Logic.

I’ve agreed to do a white paper and webcast for Mark Logic (sponsored, of course). But before I start serious work on those, I want to blog based on what I know. As always, feedback is warmly encouraged.

Some of the big differences between MarkLogic and other DBMS are:

Other architectural highlights include:

Read more

March 28, 2008

XML versus sparse columns in variable schemas

Simon Sabin makes an interesting point: If you can have 30,000 columns in a table without sparsity management blowing up, you can handle entities with lots of different kinds of attributes. (And in SQL Server you can now do just that.) The example he uses is products — different products can have different sets of possible colors, different kinds of sizes, and so on. An example I’ve used in the past is marketing information — different prospects can reveal different kinds of information, which may have been gathered via non-comparable marketing programs.

I’ve suggested this kind of variability as a reason to actually go XML — you’re constantly adding not just new information, but new kinds of information, so your fixed schema is never up to date. But I haven’t detected many actual application designers who agree with me …
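A small sketch may help show why sparsity management matters in this debate. It assumes nothing about SQL Server's actual sparse-column storage; the functions `to_wide` and `to_sparse` and the sample products are hypothetical. The sketch stores the same entities two ways: as a wide table whose column list grows with every new kind of attribute, and as per-entity records that keep only the attributes actually present.

```python
def to_wide(sparse_rows):
    """Build a wide table: one column per attribute seen anywhere."""
    columns = []
    for row in sparse_rows:
        for name in row:
            if name not in columns:
                columns.append(name)
    # Attributes an entity lacks become None (NULL) in its wide row.
    table = [tuple(row.get(name) for name in columns) for row in sparse_rows]
    return columns, table


def to_sparse(columns, table):
    """Keep only the non-null attributes of each wide row."""
    return [
        {name: value for name, value in zip(columns, row) if value is not None}
        for row in table
    ]


products = [
    {"name": "shirt", "color": "red", "size": "M"},
    {"name": "shoe", "size_eu": 42, "width": "wide"},
    {"name": "lamp", "wattage": 60},
]

columns, table = to_wide(products)
cells = len(columns) * len(table)
filled = sum(value is not None for row in table for value in row)
print(columns)
print(f"{filled} of {cells} cells are non-null")
```

Even with three products, most of the wide table is empty, and every new kind of attribute adds another mostly-null column. Whether that is better handled by sparse columns or by a more flexible model such as XML is exactly the judgment call discussed above.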

March 6, 2008

Microsoft SQL Server Data Services

As usual, Microsoft forgot to brief me, but Mary Jo Foley reports on Microsoft SQL Server Data Services. A look at the official site clarifies that this database-in-a-cloud offering uses “Microsoft SQL Server as a data storage node.” However, there seems to be a software layer on top of SQL Server providing scale-out and appropriate management.

In addition to the more-than-SQL-Server layer, there seems to be a less-than-SQL-Server aspect as well. In particular, Microsoft SQL Server Data Services boasts “Support for simple types: string, numeric, datetime, boolean.” XML is the “primary wire format,” and hints dropped about the schema philosophy sound XMLish too.

Interestingly, Foley reports that Microsoft plans to offer an on-premises version of Microsoft SQL Server Data Services as well.

February 16, 2008

Mike Stonebraker’s DBMS taxonomy

In a response to my recent five-part series on DBMS diversity, Mike Stonebraker has proposed his own taxonomy of data management technologies over on Vertica’s Database Column blog.

  1. OLTP DBMSs focused on fast, reliable transaction processing
  2. Analytic/Data Warehouse DBMSs focused on efficient load and ad-hoc query performance
  3. Science DBMSs — after all MatLab does not scale to disk-sized arrays
  4. RDF stores focused on efficiently storing semi-structured data in this format
  5. XML stores focused on semi-structured data in this format
  6. Search engines — the big players all use proprietary engines in this area
  7. Stream Processing Engines focused on real-time StreamSQL
  8. “Lean and Mean,” less-than-a-database engines focused on doing a small number of things very well (embedded databases are probably in this category)
  9. MapReduce and Hadoop — after all Google has enough “throw weight” to define a category

He goes on to say that each will be architected differently, except that — as he already convinced me back in July — RDF will be well-managed by specialty data warehouse DBMS. Read more

February 15, 2008

Database management system choices – beyond relational

This is the fifth of a five-part series on database management system choices. For the first post in the series, please click here.

Relational database management systems have three essential elements:

  1. Rows and columns. Theoretically, rows and columns may be inessential to the relational model. But in reality, they are built into the design of every real-world relational product. If you don’t have rows and columns, you’re not using the product to do what it was well-designed for.
  2. Predicate logic. Theoretically, everything can be fitted into a predicate Procrustean bed. But if you’re looking for relevancy rankings on a text search, binary logic is a highly convoluted way to get them.
  3. Fixed schemas. Database theorists commonly assume that databases have fixed schemas. If this means that 90%+ of all information is null or missing, they have elegant ways of dealing with that. Even so, as computing gets ever more concerned with individuals — each with his/her/its unique “profile(s)” — fixed schemas get ever harder to maintain.

If any of these three elements is missing or inappropriate, then a traditional relational database management system may not be the best choice.

Read more

February 1, 2008

Dan Weinreb on ObjectStore

Dan Weinreb was one of the key techies at Object Design, the company that made the object-oriented database management system ObjectStore. (Object Design later merged into Excelon, which was eventually sold to Progress, which has deemphasized but still supports ObjectStore.) Recently he wrote a pair of long and fascinating articles about Object Design, ObjectStore, and OODBMS, the first of which makes the case that “object-oriented database management systems succeeded.” Read more

February 1, 2008

CouchDB — lazy database design taken to excess?

I’ve run into a research/alpha/whatever project called CouchDB a couple of times now. It’s yet another “Who needs relational databases? Who needs schemas?” kind of idea. Rather, CouchDB is for taking random documents and banging them into databases, then calculating views on the fly as needed. It’s REST-friendly. Lucene and a web server are built in.

Damien Katz seems to be the driving force behind CouchDB, and his discussion of document-oriented development seems to be a good starting point. Read more
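The "bang documents in, calculate views as needed" idea can be illustrated with a minimal sketch. This is not CouchDB's API or storage design; the `DocumentStore` class, its `put` and `view` methods, and the sample documents are hypothetical, chosen only to show schema-less documents plus a view computed at query time from a user-supplied function.

```python
class DocumentStore:
    """Toy schema-less store (hypothetical): documents are plain dicts."""

    def __init__(self):
        self.docs = {}

    def put(self, doc_id, doc):
        # No schema check: any dict is accepted as-is.
        self.docs[doc_id] = doc

    def view(self, map_fn):
        """Compute a view on the fly.

        map_fn takes a document and yields zero or more (key, value) pairs;
        the result is a list of (key, doc_id, value) sorted by key, then id.
        """
        rows = []
        for doc_id, doc in self.docs.items():
            for key, value in map_fn(doc):
                rows.append((key, doc_id, value))
        return sorted(rows, key=lambda row: (row[0], row[1]))


def posts_by_author(doc):
    # Documents that are not posts simply contribute nothing to the view.
    if doc.get("type") == "post":
        yield (doc["author"], doc["title"])


store = DocumentStore()
store.put("post1", {"type": "post", "author": "damien", "title": "Hello"})
store.put("note1", {"text": "remember milk"})
store.put("post2", {"type": "post", "author": "alice", "title": "Views", "tags": ["db"]})

for row in store.view(posts_by_author):
    print(row)
```

Nothing about the documents is declared up front; all structure is imposed by the view function at read time, which is both the convenience and the "lazy design" risk the title alludes to.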

January 31, 2008

5 kinds of data structure and 16 kinds of data access method

My recent post about datatype extensibility zoomed over at least one head, as per the comment thread. Since then I’ve googled, and come to suspect that part of what I was assuming as common knowledge may not be so common after all. So I’m going to back up and explain a bit about data access methods, as well as the sub-topic of data structures. If you take nothing else away from this post, I hope it will at least remind you of the sheer variety of ways data can be stored on disk or in RAM.

First, let’s define the concept of data access method in three steps:

Read more
