Analysis of data management technology optimized for object data. Related subjects include:
My recent argument that the common terms “unstructured data” and “semi-structured data” are misnomers, and that a word like “multi-” or “poly-structured”* would be better, seems to have been well-received. But which is it: “multi-” or “poly-”?
*Everybody seems to like “poly-structured” better when it has a hyphen in it — including me.
The big difference between the two is that “multi-” just means there are multiple structures, while “poly-” further means that the structures are subject to change. Upon reflection, I think the “subject to change” part is essential, so poly-structured it is.
The definitions I’m proposing are:
- A database is poly-structured to the extent that its structure is apt to be changed in the ordinary course of query, update, or programming.
- Data is poly-structured to the extent that it is best represented in a poly-structured database.
- A DBMS is poly-structured to the extent that it is oriented to managing poly-structured databases.
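To make the first definition concrete, here is a minimal sketch, assuming a toy in-memory store. The `PolyStore` class and its `upsert` method are hypothetical names invented for illustration, not any product's API. The point is only that an ordinary update can extend the database's structure as a side effect.

```python
# Hypothetical illustration only: a toy store whose structure (the set of
# field names seen so far) changes in the ordinary course of updates.
class PolyStore:
    def __init__(self):
        self.records = {}          # key -> dict of field values
        self.known_fields = set()  # the store's current "structure"

    def upsert(self, key, **fields):
        """Merge fields into a record and return any field names that are new."""
        new_fields = set(fields) - self.known_fields
        self.known_fields |= new_fields
        self.records.setdefault(key, {}).update(fields)
        return new_fields


store = PolyStore()
first = store.upsert("u1", name="Ann")                    # introduces "name"
second = store.upsert("u2", name="Bob", twitter="@bob")   # introduces "twitter"
```

No schema-change step is needed for the second update; the structure simply grows, which is the "subject to change" property the definitions turn on.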
My clients at MarkLogic have a new CEO, Ken Bado, even though former CEO Dave Kellogg was quite successful. If you cut through all the happy talk and side issues, the reason for the change is surely that the board wants to see MarkLogic grow faster, and specifically to move beyond its traditional niches of publishing (especially technical publishing) and national intelligence.
So what other markets could MarkLogic pursue? Before Ken even started work, I sent over some thoughts. They included (but were not limited to): Read more
When people talk about document-oriented NoSQL or some similar term, they usually mean something like:
Or, if they really mean “the essence of whatever it is that CouchDB and MongoDB have in common,” well, that’s pretty much the same thing as what I said in the first place.
Of the various questions that might arise, three of the more definitional ones are:
- Why JSON rather than XML?
- What’s with this fluidity between the terms “document” and “object”?
- Are you serious about the lack of joins?
Let me take a crack at each. Read more
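As a rough illustration of the joins question, here is a sketch under my own assumptions about what "lack of joins" usually means in practice: related data is nested inside one JSON document rather than split across tables. The document shape and the `commenters` helper are hypothetical and not taken from CouchDB or MongoDB.

```python
import json

# Hypothetical blog post stored as one self-contained JSON document.
# In a relational design, the comments would likely sit in a second table
# and be joined to posts at query time.
post_doc = json.loads("""
{
  "title": "Why JSON?",
  "author": {"name": "Ann"},
  "comments": [
    {"by": "Bob", "text": "Nice"},
    {"by": "Cy", "text": "Hmm"}
  ]
}
""")


def commenters(doc):
    """Read nested comments straight from the document; no join is involved."""
    return [c["by"] for c in doc.get("comments", [])]


names = commenters(post_doc)
```

The `doc.get("comments", [])` default also hints at the document/object fluidity: a document with no comments at all is still a valid document.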
|Categories: CouchDB, MapReduce, MarkLogic, MongoDB and 10gen, NoSQL, Object, Structured documents||16 Comments|
Since posting last Wednesday morning that I’m looking into NoSQL and HVSP, I’ve had a lot of conversations, including with (among others):
- Dwight Merriman of 10gen (MongoDB)
- Damien Katz of Couchio (CouchDB)
- Matt Pfeil of Riptano (Cassandra)
- Todd Lipcon of Cloudera (HBase committer)
- Tony Falco of Basho (Riak)
- John Busch of Schooner
- Ori Herrnstadt of Akiban
In my discussion of Workday’s technology, I gave an estimate that Workday’s database, if relationally designed, would require “1000s” of tables. That estimate came from Workday, Inc. CTO Stan Swete, in a thoughtful email that made several points about Workday’s database strategy. Workday kindly gave me permission to quote it below.
|Categories: Data models and architecture, Object, OLTP, Software as a Service (SaaS), Specific users, Theory and architecture, Workday||3 Comments|
One of my coolest company visits in some time was to SaaS (Software as a Service) vendor Workday, Inc., earlier this month. Reasons included:
- Workday has forward-thinking ideas about SaaS enterprise applications and the integration of business intelligence into same.
- Workday has highly innovative ideas in how it manages data.
- Companies founded by Dave Duffield tend to feature smart, likeable people who talk to one pleasantly and forthrightly. Workday is no exception; CTO Stan Swete and the other Workday folks present were a delight to talk with.
- I’d invited Merv Adrian to come along with me. He asked great questions, which let me gather myself a bit despite how sleep-deprived I was for the first part of that trip.
The biggie for me was the data and object management part. Specifically: Read more
|Categories: Business intelligence, Data integration and middleware, Data models and architecture, EAI, EII, ETL, ELT, ETLT, NoSQL, Object, OLTP, Software as a Service (SaaS), Specific users, Theory and architecture, Workday||13 Comments|
I chatted Wednesday night with Darren Wood, the Australia-based lead developer of Objectivity’s Infinite Graph database product. Background includes:
- Objectivity is a profitable, decades-old object-oriented DBMS vendor with about 50 employees.
- Like some other object-oriented DBMS of its generation, Objectivity is as much a toolkit for building DBMS as it is a real finished DBMS product. Objectivity sales are typically for custom deals, where Objectivity helps with the programming.
- The way Objectivity works is basically:
  - You manage objects in memory, in the format of your choice.
  - Objectivity bangs them to disk, across a network.
  - Objectivity manages the (distributed) pointers to the objects.
  - You can, if you choose, hard code exactly which objects are banged to which node.
- Objectivity’s DML for reading data is very different from Objectivity’s DML for writing data. (I think the latter is more like the program code itself, while the former is more like regular DML.)
- The point of Objectivity is not so much to have fast I/O. Rather, it is to minimize the CPU cost of getting the data that comes across the wire into useful form.
- Darren got the idea of putting a generic graph DBMS front-end on Objectivity while doing a relationship analytics project for an Australian intelligence agency.
- Darren redoubled his efforts to sell the project internally at Objectivity after reading what I wrote about relationship analytics back in 2006 or so.
- There is now a team of five or so people developing Infinite Graph.
- Infinite Graph is just now going out to beta test.
Infinite Graph is an API or language binding on top of Objectivity that:
- Hides a lot of Objectivity’s complexity.
- Is suitable for graph/relationship analytics.
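The general shape of "a graph front end over an object store" can be sketched as follows. This is a toy under stated assumptions, not Objectivity's or Infinite Graph's actual API; `ToyObjectStore`, `ToyGraph`, and every method name are hypothetical. The store hands out object ids (standing in for pointers), and the graph layer hides them behind vertex names.

```python
from collections import deque


class ToyObjectStore:
    """Stand-in for an object store: hands out ids (the 'pointers') for objects."""

    def __init__(self):
        self._objects = {}
        self._next_id = 0

    def put(self, obj):
        oid = self._next_id
        self._next_id += 1
        self._objects[oid] = obj
        return oid

    def get(self, oid):
        return self._objects[oid]


class ToyGraph:
    """Graph-flavored front end that hides the store's ids behind vertex names."""

    def __init__(self, store):
        self._store = store
        self._ids = {}  # vertex name -> object id

    def add_vertex(self, name):
        if name not in self._ids:
            self._ids[name] = self._store.put({"name": name, "edges": []})

    def add_edge(self, a, b):
        """Add an undirected edge, creating either vertex if needed."""
        self.add_vertex(a)
        self.add_vertex(b)
        self._store.get(self._ids[a])["edges"].append(self._ids[b])
        self._store.get(self._ids[b])["edges"].append(self._ids[a])

    def within(self, start, hops):
        """Sorted names reachable from start in at most `hops` edges (breadth-first)."""
        seen = {self._ids[start]}
        frontier = deque([(self._ids[start], 0)])
        while frontier:
            oid, depth = frontier.popleft()
            if depth == hops:
                continue
            for nxt in self._store.get(oid)["edges"]:
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, depth + 1))
        return sorted(self._store.get(o)["name"] for o in seen)


g = ToyGraph(ToyObjectStore())
g.add_edge("alice", "bob")
g.add_edge("bob", "carol")
g.add_edge("carol", "dave")
two_hops = g.within("alice", 2)
```

A "who is within N hops of whom" query like `within` is the kind of thing relationship analytics tends to ask, and the caller never touches an object id directly.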
|Categories: Analytic technologies, Object, Objectivity and Infinite Graph, RDF and graphs, Surveillance and privacy||9 Comments|
Akiban responded quickly to my complaints about its communication style, and I chatted for a couple of hours with senior Akiban techies Ori Herrnstadt, Peter Beaman and Jack Orenstein. It’s still early days for Akiban product development, so some details haven’t been determined yet, and others I just haven’t yet pinned down. Still, I know a lot more than I did a day ago. Highlights of my talk with Akiban included: Read more
I talked with Robert Nagle of Intersystems last week, and it went better than at least one other Intersystems briefing I’ve had. Intersystems’ main product is Cache’, an object-oriented DBMS introduced in 1997 (before that Intersystems was focused on the fourth-generation programming language M, renamed from MUMPS). Unlike most other OODBMS, Cache’ is used for a lot of stuff one would think an RDBMS would be used for, across all sorts of industries. That said, there’s a distinct health-care focus to Intersystems, in that:
- MUMPS, the original Intersystems technology, was focused on health care.
- The reasons Intersystems went object-oriented have a lot to do with the structure of health-care records.
- Intersystems’ biggest and most visible ISVs are in the health-care area.
- Intersystems is actually beginning to sell an electronic health records system called TrakCare around the world (but not in the US, where it has lots of large competitive VARs).
Note: Intersystems Cache’ is sold mainly through VARs (Value-Added Resellers), aka ISVs/OEMs. I.e., it’s sold by people who write applications on top of it.
So far as I understand – and this is still pretty vague and apt to be partially erroneous – the Intersystems Cache’ technical story goes something like this: Read more
|Categories: Data models and architecture, Emulation, transparency, portability, Health care, Intersystems and Cache', Mid-range, Object, OLTP, Sybase, Theory and architecture||7 Comments|
Dan Weinreb — inspired by but not linking to my recent short post on McObject’s object-oriented in-memory DBMS Perst — has posted a detailed discussion of Perst on his own blog. For context, he compares it briefly to analogous products, most especially Progress’s — which used to be ObjectStore, of which Dan was the chief architect.
This was based on documentation and general sleuthing (Dan figured out who McObject got Perst from), rather than hands-on experience, so performance figures and the like aren’t validated. Still, if you’re interested in such technology, it’s a fascinating post.