Analysis of data management technology optimized for object data. Related subjects include:
From time to time, I try to step back and build a little taxonomy for the variety in database technology. One effort was 4 1/2 years ago, in a pre-planned exchange with Mike Stonebraker (his side, alas, has since been taken down). A year ago I spelled out eight kinds of analytic database.
The angle I’ll take this time is to say that every sufficiently large enterprise needs to be cognizant of at least 7 kinds of database challenge. General notes on that include:
- I’m using the weasel words “database challenge” to evade questions as to what is or isn’t exactly a DBMS.
- One “challenge” can call for multiple products and technologies even within a single enterprise, let alone at different ones. For example, in this post the “eight kinds of analytic database” are reduced to just a single category.
- Even so, one product or technology may be well-suited to address a couple different kinds of challenges.
The Big Seven database challenges that almost any enterprise faces are: Read more
|Categories: Data integration and middleware, Data models and architecture, Database diversity, EAI, EII, ETL, ELT, ETLT, Hadoop, Memory-centric data management, NoSQL, Object, OLTP, RDF and graphs, Structured documents, Talend, Text||3 Comments|
I have a bunch of backlogged post subjects in or around short-request processing, based on ongoing conversations with my clients at Akiban, Cloudant, Code Futures (dbShards), DataStax (Cassandra) and others. Let’s start with Akiban. When I posted about Akiban two years ago, it was reasonable to say:
- Akiban is in the short-request DBMS business.
- MySQL compatibility is one way to access Akiban, but it’s not the whole story.
- Akiban’s main point of technical differentiation is to arrange data hierarchically on disk so that many joins are “zero-cost”.
- Walking the hierarchy isn’t a great way to get at data for every possible query; Akiban recognizes the need for other access techniques as well.
All of the above are still true. But unsurprisingly, plenty of the supporting details have changed. Read more
A reporter tweeted: ”Is there a simple plain English definition for NoSQL?” After reminding him of my cynical yet accurate Third Law of Commercial Semantics, I gave it a serious try, and came up with the following. More precisely, I tweeted the bolded parts of what’s below; the rest is commentary added for this post.
NoSQL is most easily defined by what it excludes: SQL, joins, strong analytic alternatives to those, and some forms of database integrity. If you leave all four out, and you have a strong scale-out story, you’re in the NoSQL mainstream. Read more
|Categories: Cassandra, dbShards and CodeFutures, MarkLogic, MySQL, Object, Open source, Petabyte-scale data management, Schooner Information Technology||7 Comments|
Alex Williams noticed that there will be a NoSQL session at Oracle OpenWorld next week, and is wondering whether this will be a big deal. I think it won’t be.
There really are three major points to NoSQL.
- Dynamic schemas. This is the only one of the three that truly depends on NoSQL.
- Scale-out short-request processing. If you want to scale out efficiently at high request volumes, you’re best off not using all the flexibility SQL/relational DBMS offer. (In particular, you don’t want to do cross-node joins). Not coincidentally, a number of the best scale-out offerings were built to be NoSQL.
- Open source. Doing a relational DBMS is a big project. It seems easier to build NoSQL ones.
Oracle can address the latter two points as aggressively as it wishes via MySQL. It so happens I would generally recommend MySQL enhanced by dbShards, Schooner, and/or dbShards/Schooner, rather than Oracle-only MySQL … but that’s a detail. In some form or other, Oracle’s MySQL is a huge player in the scale-out, open source, short-request database management market.
So that leaves us with dynamic schemas. Oracle has at least four different sets of technology in that area:
- As Workday noticed years ago, MySQL can be used as a functional, basic key-value store.
- Oracle also has XML-based Berkeley DB/SleepyCat kicking around.*
- The XML extensions to Oracle’s core DBMS could be alleged to have a dynamic schema/NoSQL flavor. (Blech.)
- A dynamic schema argument could also be made for object-oriented DBMS technology. While Oracle doesn’t to my knowledge exactly sell that, it does have the Tangosol Coherence line of technology, with a potentially similar programming model.
If Oracle is now refreshing and rebranding one or more of these as “NoSQL”, there’s no reason to view that as a big deal at all.
*That’s Mike Olson’s former company, if you’re keeping score at home.
|Categories: MySQL, NoSQL, Object, OLTP, Open source, Oracle, Parallelization, Schooner Information Technology, Structured documents||13 Comments|
salesforce.com, force.com, and database.com use exactly the same database infrastructure and architecture. That’s the good news. The bad news is that salesforce.com is somewhat obscure about technical details, for reasons such as:
- A long-ago marketing decision to not give infrastructure details, so as to convey a “Don’t worry; we’ll take care of everything” message.
- Even so, a long-ago and perhaps now-regretted marketing decision to disclose and even exaggerate salesforce.com’s reliance on Oracle, as part of an early-days attempt to prove salesforce was using enterprise-class technology.
- A desire to hide the recipe for salesforce.com’s secret sauce.
- Force of habit — I’m not sure salesforce even knows how to tell its technical story with any clarity.
Actually, salesforce.com has moved some kinds of data out of Oracle that previously used to be stored there. Besides Oracle, salesforce uses at least a file system and a RAM-based data store about which I have no details. Even so, much of salesforce.com’s data is stored in Oracle — a single instance of Oracle, which it believes may be the largest instance of Oracle in the world.
|Categories: Data models and architecture, Market share and customer counts, Memory-centric data management, Object, OLTP, Oracle, salesforce.com, Software as a Service (SaaS)||18 Comments|
E. F. “Ted” Codd taught the computing world that databases should have fixed logical schemas (which protect the user from having to know about physical database organization). But he may not have been as universally correct as he thought. Cases I’ve noted in which fixed schemas may be problematic include:
- “A bunch of apps in one, similar but not the same” (in my recent post on MongoDB).
- Out-of-control product catalogs (ditto).
- Analytic use cases in which one keeps enhancing the database with derived data.
And if marketing profile analysis is ever done correctly, that will be a huge example for the list.
So what do we call those DBMS — for example NoSQL, object-oriented, or XML-based systems — that bake the schema into the applications or the records themselves? In the MongoDB post I went with “schemaless,” but I wasn’t really comfortable with that, so I took the discussion to Twitter. Comments from Vlad Didenko (in particular), Ryan Prociuk, Merv Adrian, and Roland Bouman favored the idea that schemas in such systems are changeable or late-bound, rather than entirely absent. I quickly agreed.
I talked with McObject yesterday. McObject has two product lines, both of which are something like in-memory DBMS — eXtremeDB, which is the main one, and Perst. McObject has been around since at least 2003, probably has no venture capital, and probably has a very low double-digit number of employees.*
*I could be wrong in those guesses; as small companies go, McObject is unusually prone to secrecy games.
As best I understand:
- eXtremeDB is something like an in-memory object-oriented DBMS, designed to be embeddable.
- However, much as with Objectivity and other old-school OODBMS, eXtremeDB winds up being more of a toolkit with which to build DBMS than a full DBMS.
- eXtremeDB has a few indexing schemes. The main one is good old B-trees. One customer wanted Patricia tries, so they’re in there. (Perhaps not coincidentally, solidDB relies on Patricia tries.) At least one wanted R-trees, so they’re in there too.
- eXtremeDB has long had the option of persistent logs.
- eXtremeDB newly has a hybrid memory-centric option, in which you can have more data in the database than fits into RAM.
- eXtremeDB newly has multi-master two-phase-commit clustering.
My guess three years ago that eXtremeDB might emerge as an alternative to solidDB seems to have been borne out. McObject CEO Steve Graves says that the core of McObject’s business is OEMs, in sectors such as telecom equipment and defense/aerospace. That’s exactly solidDB’s traditional market, except that solidDB got acquired by IBM and deemphasized it.
I’ve said before that if I were starting a SaaS effort — and it wasn’t just focused on analytics — I’d look at using a memory-centric OODBMS. Perhaps eXtremeDB is worth looking at in such scenarios.
|Categories: In-memory DBMS, McObject, Memory-centric data management, Object, Objectivity and Infinite Graph, solidDB, Telecommunications||9 Comments|
Edit: I checked with Oracle, and it’s indeed TimesTen that’s supposed to be the basis of this new appliance, as per a comment below. That would be less cool, alas.
Oracle seems to have said on yesterday’s conference call Oracle OpenWorld (first week in October) will feature appliances based on Tangosol and Hadoop. As I post this, the Seeking Alpha transcript of Oracle’s call is riddled with typos. Bolded comments below are by me. Read more
|Categories: Data warehouse appliances, Hadoop, In-memory DBMS, MapReduce, Memory-centric data management, Object, Oracle||8 Comments|
There are plenty of viable alternatives to relational database management systems. For short-request processing, both document stores and fully object-oriented DBMS can make sense. Text search engines have an important role to play. E. F. “Ted” Codd himself once suggested that relational DBMS weren’t best for analytics.* Analysis of machine-generated log data doesn’t always have a naturally relational aspect. And I could go on with more examples yet.
*Actually, he didn’t admit that what he was advocating was a different kind of DBMS, namely a MOLAP one — but he was. And he was wrong anyway about the necessity for MOLAP. But let’s overlook those details.
Nonetheless, relational DBMS dominate the market. As I see it, the reasons for relational dominance cluster into four areas (which of course overlap):
- Data re-use. Ted Codd’s famed original paper referred to shared data banks for a reason.
- The benefits of normalization, which include:
- You only have to do programming work of writing something once …
- … and you don’t have to do the programming work of keeping multiple versions of the information consistent.
- You only have to do processing work of writing something once.
- You only have to buy storage to hold each fact once.
- Separation of concerns.
- Different people can worry about programming and “database stuff.”
- Indeed, even performance optimization can sometimes be separated from programming (i.e., when all you have to do to get speed is implement the correct indexes).
- Maturity and momentum, as reflected in the availability of:
- A broad variety of mature relational DBMS.
- Vast amounts of packaged software that “talks” SQL.
Generally speaking, I find the reasons for sticking with relational technology compelling in cases such as: Read more
|Categories: Analytic technologies, Data models and architecture, Database diversity, MOLAP, NoSQL, Object, Theory and architecture||18 Comments|
Since posting recently about Starcounter, I’ve had the chance to actually talk with the company (twice). Hence I know more than before. Starcounter:
- Has been around as a company since 2006.
- Has developed memory-centric object-oriented DBMS technology that has been OEMed by a few application software companies (especially in bricks-and-mortar retailing and in online advertising).
- Is planning to actually launch an OODBMS product sometime this summer.
- Has 14 employees (most or all of whom are in Sweden, which is also where I think Starcounter’s current customers are centered).
- Is planning to shift emphasis soon to the US market.
Starcounter’s value propositions are programming ease (no object/relational impedance mismatch) and performance. Starcounter believes its DBMS has 100X the performance of conventional DBMS at short-request transaction processing, and 10X the performance of other memory-centric and/or object-oriented DBMS (e.g. Oracle TimesTen, or Versant). That said, Starcounter has not yet tested VoltDB. Starcounter does not claim performance much beyond that of disk-based DBMS on analytic tasks such as aggregations.
The key technical aspect to Starcounter is integration between the DBMS and the virtual machine, so that the same copy of the data is accessed by both the DBMS and the application program, without any movement or transformation being needed. (Starcounter isn’t aware of any other object-oriented DBMS that work this way.) Transient and persistent data are handled in the same way, seamlessly.
Other Starcounter technical highlights include: Read more
|Categories: Data models and architecture, In-memory DBMS, Memory-centric data management, Object, OLTP, Starcounter, Theory and architecture||3 Comments|