April 1, 2013

Some notes on new-era data management, March 31, 2013

Hmm. I probably should have broken this out as three posts rather than one after all. Sorry about that.

Performance confusion

Discussions of DBMS performance are always odd, for starters because:

But in NoSQL/NewSQL short-request processing, performance claims seem particularly confused. Reasons include but are not limited to:

MongoDB and 10gen

I caught up with Ron Avnur at 10gen. Technical highlights included:

While this wasn’t a numbers-oriented conversation, business highlights included:

I can add that anecdotal evidence from other industry participants suggests there’s a lot of MongoDB mindshare.

Specific traditional-enterprise use cases we discussed focused on combining data from heterogeneous systems. Specifically mentioned were:

DBAs’ roles in development

A lot of marketing boils down to “We don’t need no stinking DBAs!!!” I’m thinking in particular of:

*See in particular the comments to that post.

The worst-case data warehousing scenario is indeed pretty bad. It could feature:

But if the goal is just to grab some data from an existing data warehouse, perhaps add in some additional data from the outside, and start analyzing it — well, then there are many attempted solutions to that problem, including from within the analytic RDBMS world. The question is whether the data warehouse administrators try to help — which usually means “Here’s your data; now go away and stop bothering me!” — or whether they focus on “business prevention”.

Meanwhile, on the NoSQL side:

It’s the old loose-/tight-coupling trade-off. Traditional relational practices offer a clean interface between database and code, but bundle the database characteristics for different applications tightly together. NoSQL tends to tie the database for any one app tightly to that app, at the cost of difficulties if multiple applications later try to use the same data. Either can make sense, depending on (for example):

Comments

8 Responses to “Some notes on new-era data management, March 31, 2013”

  1. Meng Mao on April 2nd, 2013 3:24 am

    “there’s an increasingly common choice in which data is written synchronously to RAM on 2 or more servers, than asynchronously to disk on each of them.”
    Here, you meant ‘then,’ not ‘than,’ right? I’m basing it on the context that follows.

  2. Curt Monash on April 2nd, 2013 3:41 am

    Thanks! Fixed.

  3. aaron on April 8th, 2013 3:46 pm

    DBAs….

    Your mention of DBAs follows some weak arguments in some vendors’ collateral. The issue is conflating data modelers with operational/release roles – then teasing about locally controlled data vs. enterprise data.

    The fundamental issue with the arguments is that they offer little argument about their own products’ value. It is instead an appeal for freedom for for-purpose development vs. the constraints of lifecycle and modeling the larger enterprise. (This strategy worked historically for MS in getting departmental MS SQL databases by offering rejection of central IT, despite a weak product.) So, there are always competing (often aspirational) goals and skills/capabilities between the central and managed, and new and changing.

    Classical DBAs don’t offer much for Hadoop and other big stores; they certainly have a role with newsql if the app wants it and both sides get along.

    There is also an implication that data is being pulled from DW or classic OLTP rdbms to Hadoop – I’ve never seen that. It’s more that they fit into broad app swaths. The places where they do overlap seem to be around analytic products or push of extracts into a common store.

  4. Curt Monash on April 8th, 2013 4:08 pm

    Aaron,

    I disagree on several fronts.

    First, MS-SQL succeeded in its “weak” days in large part due to price and also to the superiority of its tools. It really was much simpler to install and administer. Eventually, Dan Rosenberg came over from Borland to open Oracle’s UI lab, and Andy Mendelsohn made a major push to improve Oracle’s tools. But basically through the latter 1990s Microsoft indeed had a major administrative usability advantage.

    Second, the implication is not that data moves from DWs to Hadoop; I don’t know why you think that, especially as my argument to the contrary in a recent post was widely supported, notably by a range of Hadoop luminaries.

    Third, while you’re right that operational/release DBAing and modeling aren’t the same thing, they tend to point the same way when you’re selecting a DBMS architecture to buy into. A database that is complicated to model in the first place is likely to be complicated to administer as well, because it’s likely to have more tables, more indexes on those tables, and so on. The correlation isn’t 1.0, but it sure is positive.

  5. aaron on April 9th, 2013 6:37 pm

    I think we’re mostly agreeing on MS SQL – it was a tool that couldn’t get past enterprise rules and got in as part of departmental byways, where it was an adequate db, but with easier admin and rich bundled tools (the point of appealing to local non-top management and to developers seems very similar to many newer dbmss. Another point, IBM had several working dbmss that needed little admin and didn’t sell much of these at the same time.)

    My misunderstanding on the second point.

    The complex models/multiple stakeholders vs. for purpose is the big issue newer db(ms) are confronting. If they are that simple, they are much easier to deal with front to back. Admin sees them as filesystems or caches, there is no complexity of multiple stakeholders. They are probably only interesting if they provide a capability not seen yet – and that would be a great topic for discussion.

    You give an example of an in-memory ddbms. My experience with these is poor; testing against complex workloads these don’t dramatically perform better than traditional rdbms (even if in simple workloads they may do 1-2 orders of magnitude better; scaling tests are often worse as well.) This seems to argue that complex tech is hard to do, and many are successful at caching, and some at FS :-)

  6. Curt Monash on April 9th, 2013 11:02 pm

    A lot is positioning and focus. Informix SE was very competitive with Progress — and I definitely include the respective 4GLs in that — but Informix defocused on that business to compete at the high end.

  7. Ian Posner on April 12th, 2013 3:12 am

    SQL Server grew because Microsoft realised that DBAs are viewed as a cost and in most organisations, perform little more activity than server builds and setting up backup and maintenance jobs. Back in the day, they performed a whole host of additional mundane tasks which Microsoft realised could easily be automated.

    Furthermore the real money-making battleground was SMEs rather than corporates for whom the cost of a DBA was a greater percentage of overall IT budget.

    They therefore set about inventing as close to a “DBA-less” database product as they could, opting to make the database as self-tuning (e.g. auto-updating stats) and set-and-forget (e.g. auto-expanding files) as possible, whilst still leaving the “knobs-on” for people that wanted to dig deeper.

    Microsoft also leveraged its toolset integration across the development stack, so it all works together easily, making it the preferred choice for developers on the .NET platform.

    Making a product easy for developers encouraged 3rd party software package producers to develop for SQL Server, and this too played a big part in forcing SQL Server into reluctant corporates. To this day there are many corporates whose primary database may be DB2 or Oracle, but who have major SQL Server estates primarily due to the profusion of third party packaged software on that platform.

    In the financial world, they went after Sybase – and the effect has been DEVASTATING to Sybase sales, with bank-after-bank implementing Sybase to Microsoft migrations (to my knowledge, there is only one major investment bank that still has Sybase as a primary platform).

  8. Introduction to Deep Information Sciences and DeepDB | DBMS 2 : DataBase Management System Services on April 24th, 2013 5:42 pm

    [...] Tokutek has planted a small stake there too. [...]
