Structured documents

Analysis of data management technology based on a structured-document model, or optimized for XML data.

January 10, 2012

Splunk update

Splunk is announcing the Splunk 4.3 point release. Before discussing it, let’s recall a few things about Splunk, starting with:

As in any release, a lot of Splunk 4.3 is about “Oh, you didn’t have that before?” features and Bottleneck Whack-A-Mole performance speed-up. One performance enhancement is Bloom filters, which are a very hot topic these days. More important is a switch from Flash to HTML5, so as to accommodate mobile devices with less server-side rendering. Splunk reports that its users (especially the non-IT ones) really want to get Splunk information on their tablet devices. While this somewhat contradicts what I wrote a few days ago pooh-poohing mobile BI, let me hasten to point out:

That’s pretty much the ideal scenario for mobile BI: Timeliness matters and prettiness doesn’t.

Read more

November 1, 2011

MarkLogic 5, and why you might care

MarkLogic is releasing MarkLogic 5. Key elements of the announcement are:

Also, MarkLogic is early with a feature that most serious DBMS vendors will soon have – support for tiered storage, with writes going first to solid-state storage, then being flushed to disk via a caching-style algorithm.* And as befits a sometime search-engine-substitute, MarkLogic has finally licensed a large set of document filters, from an Australian company called Isys. Apparently, the special virtue of the Isys filters is that they’re good at extracting not only text, but metadata as well.

*If there’s a caching algorithm that doesn’t contain a major element of LRU (Least Recently Used), I don’t recall ever hearing about it.
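To make the tiered-storage idea concrete, here is a minimal sketch of writes landing in a small fast tier and being flushed to a larger slow tier on an LRU basis. It is a toy under stated assumptions, not MarkLogic's implementation: the `TieredStore` class, its method names, and the in-memory dictionaries standing in for solid-state storage and disk are all hypothetical.

```python
from collections import OrderedDict


class TieredStore:
    """Toy two-tier store: a small fast tier in front of a large slow tier.

    Hypothetical illustration only; real products flush in batches,
    handle durability, and so on.
    """

    def __init__(self, fast_capacity):
        self.fast_capacity = fast_capacity
        self.fast = OrderedDict()  # stands in for solid-state storage
        self.slow = {}             # stands in for disk

    def write(self, key, value):
        # Writes always land in the fast tier first.
        self.fast[key] = value
        self.fast.move_to_end(key)  # mark as most recently used
        self._flush_if_full()

    def read(self, key):
        if key in self.fast:
            self.fast.move_to_end(key)  # a read also counts as a use
            return self.fast[key]
        return self.slow[key]

    def _flush_if_full(self):
        # Move least recently used entries down to the slow tier.
        while len(self.fast) > self.fast_capacity:
            old_key, old_value = self.fast.popitem(last=False)
            self.slow[old_key] = old_value
```

With a fast tier of capacity 2, writing `a` and `b`, reading `a`, and then writing `c` pushes `b` (the least recently used entry) down to the slow tier, while `a` and `c` stay in the fast tier.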

MarkLogic seems to have settled on a positioning that, although distressingly buzzword-heavy, is at least partly based upon reality. The real part includes:

Based on that reality, MarkLogic talks a lot about Volume, Velocity, Variety, Big Data, unstructured data, semi-structured data, and big data analytics.

Read more

September 30, 2011

Oracle NoSQL is unlikely to be a big deal

Alex Williams noticed that there will be a NoSQL session at Oracle OpenWorld next week, and is wondering whether this will be a big deal. I think it won’t be.

There really are three major points to NoSQL.

Oracle can address the latter two points as aggressively as it wishes via MySQL. It so happens I would generally recommend MySQL enhanced by dbShards, Schooner, and/or dbShards/Schooner, rather than Oracle-only MySQL … but that’s a detail. In some form or other, Oracle’s MySQL is a huge player in the scale-out, open source, short-request database management market.

So that leaves us with dynamic schemas. Oracle has at least four different sets of technology in that area:

If Oracle is now refreshing and rebranding one or more of these as “NoSQL”, there’s no reason to view that as a big deal at all.

*That’s Mike Olson’s former company, if you’re keeping score at home.

August 18, 2011

HP/Autonomy sound bites

HP has announced that:

On a high level, this means:

My coverage of Autonomy isn’t exactly current, but I don’t know of anything that contradicts long-time competitor* Dave Kellogg’s skeptical view of Autonomy. Autonomy is a collection of businesses involved in the management, search, and retrieval of poly-structured data, in some cases with strong market share, but even so not necessarily with the strongest of reputations for technology or technology momentum. Autonomy started from a text search engine and a Bayesian search algorithm on top of that, which did a decent job for many customers. But if there’s been much in the way of impressive enhancement over the past 8-10 years, I’ve missed the news.

*Dave, of course, was CEO of MarkLogic.

Questions obviously arise about how the Autonomy acquisition relates to other HP businesses. My early thoughts include:

Read more

August 13, 2011

Couchbase business update

I decided I needed some Couchbase drilldown, on business and technology alike, so I had solid chats with both CEO Bob Wiederhold and Chief Architect Dustin Sallings. Pretty much everything I wrote at the time Membase and CouchOne merged to form Couchbase (the company) still holds up. But I have more detail now. ;)

Context for any comments on customer traction includes:

That said,

Membase sales are concentrated in five kinds of internet-centric companies, which in declining order are:

Read more

July 31, 2011

Terminology: Dynamic- vs. fixed-schema databases

E. F. “Ted” Codd taught the computing world that databases should have fixed logical schemas (which protect the user from having to know about physical database organization).  But he may not have been as universally correct as he thought. Cases I’ve noted in which fixed schemas may be problematic include:

And if marketing profile analysis is ever done correctly, that will be a huge example for the list.

So what do we call those DBMS — for example NoSQL, object-oriented, or XML-based systems — that bake the schema into the applications or the records themselves? In the MongoDB post I went with “schemaless,” but I wasn’t really comfortable with that, so I took the discussion to Twitter. Comments from Vlad Didenko (in particular), Ryan Prociuk, Merv Adrian, and Roland Bouman favored the idea that schemas in such systems are changeable or late-bound, rather than entirely absent. I quickly agreed.
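As a rough illustration of what "late-bound" can mean, the sketch below treats each record as carrying its own field names, and only works out a schema when somebody asks for one. The function name `observed_schema` and the sample records are assumptions made up for this example; they are not taken from any of the systems mentioned.

```python
def observed_schema(records):
    """Map each field name to the set of type names seen for it.

    Nothing is declared up front; the "schema" is whatever the
    records happen to contain at the moment of the call.
    """
    schema = {}
    for record in records:
        for field, value in record.items():
            schema.setdefault(field, set()).add(type(value).__name__)
    return schema


# Hypothetical records: fields come and go, and "id" even changes type.
records = [
    {"id": 1, "name": "Ann"},
    {"id": 2, "name": "Bob", "twitter": "@bob"},
    {"id": "3", "tags": ["x"]},
]
```

Calling `observed_schema(records)` reports four fields, with `id` seen as both `int` and `str`. Adding a record with a new field changes the answer without any schema migration, which is the sense in which the schema is changeable rather than absent.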

Read more

May 17, 2011

Terminology: poly-structured data, databases, and DBMS

My recent argument that the common terms “unstructured data” and “semi-structured data” are misnomers, and that a word like “multi-” or “poly-structured”* would be better, seems to have been well-received. But which is it — “multi-” or “poly-”?

*Everybody seems to like “poly-structured” better when it has a hyphen in it — including me. :)

The big difference between the two is that “multi-” just means there are multiple structures, while “poly-” further means that the structures are subject to change. Upon reflection, I think the “subject to change” part is essential, so poly-structured it is.

The definitions I’m proposing are:

Read more

April 5, 2011

Whither MarkLogic?

My clients at MarkLogic have a new CEO, Ken Bado, even though former CEO Dave Kellogg was quite successful. If you cut through all the happy talk and side issues, the reason for the change is surely that the board wants to see MarkLogic grow faster, and specifically to move beyond its traditional niches of publishing (especially technical publishing) and national intelligence.

So what other markets could MarkLogic pursue? Before Ken even started work, I sent over some thoughts. They included (but were not limited to):

Read more

February 7, 2011

Notes on document-oriented NoSQL

When people talk about document-oriented NoSQL or some similar term, they usually mean something like:

Database management that uses a JSON model and gives you reasonably robust access to individual field values inside a JSON (JavaScript Object Notation) object.

Or, if they really mean,

The essence of whatever it is that CouchDB and MongoDB have in common.

well, that’s pretty much the same thing as what I said in the first place. :)
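For a sense of what "access to individual field values inside a JSON object" amounts to, here is a minimal sketch using only Python's standard `json` module. The helpers `get_field` and `find`, the dotted-path convention, and the sample documents are hypothetical; they are not the query syntax of CouchDB, MongoDB, or any other product.

```python
import json


def get_field(document, path):
    """Follow a dotted path such as 'author.name' into a parsed JSON object.

    Numeric path parts index into arrays, e.g. 'tags.0'.
    """
    value = document
    for part in path.split("."):
        if isinstance(value, list):
            value = value[int(part)]
        else:
            value = value[part]
    return value


def find(collection, path, expected):
    """Return the documents whose field at `path` equals `expected`."""
    matches = []
    for doc in collection:
        try:
            if get_field(doc, path) == expected:
                matches.append(doc)
        except (KeyError, IndexError, ValueError, TypeError):
            # Documents lacking the field simply do not match.
            pass
    return matches


# Hypothetical documents; the second one has no "author" field at all.
posts = [
    json.loads('{"title": "A", "author": {"name": "Ann"}, "tags": ["nosql"]}'),
    json.loads('{"title": "B"}'),
]
```

Here `find(posts, "author.name", "Ann")` returns only the first document, and the second is skipped rather than causing an error. A real document-oriented DBMS would typically answer the same question from an index rather than by scanning every document.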

Of the various questions that might arise, three of the more definitional ones are:

Let me take a crack at each.

Read more

November 29, 2010

Document-oriented DBMS without joins

When I talked with MarkLogic’s Ken Chestnut about MarkLogic 4.2, I was surprised to learn that MarkLogic really, truly doesn’t do anything like a join. Unlike some other non-SQL DBMS, MarkLogic has no SQL interface, no ODBC or JDBC. Nothing, nada. (MarkLogic has a Java interface for XQuery, but not for anything like SQL.)

Read more
