November 30, 2014

Thoughts and notes, Thanksgiving weekend 2014

I’m taking a few weeks defocused from work, as a kind of grandpaternity leave. That said, the venue for my Dances of Infant Calming is a small-but-nice apartment in San Francisco, so a certain amount of thinking about tech industries is inevitable. I even found time last Tuesday to meet or speak with my clients at WibiData, MemSQL, Cloudera, Citus Data, and MongoDB. And thus:

1. I’ve been sloppy in my terminology around “geo-distribution”, in that I don’t always make it easy to distinguish between:

The latter case can be subdivided further depending on whether multiple copies of the data can accept first writes (aka active-active, multi-master, or multi-active), or whether there’s a clear single master for each part of the database.

What made me think of this was a phone call with MongoDB in which I learned that the limit on the number of replicas had been raised from 12 to 50, to support the full-replication/latency-reduction use case.

2. Three years ago I posted about agile (predictive) analytics. One of the points was:

… if you change your offers, prices, ad placement, ad text, ad appearance, call center scripts, or anything else, you immediately gain new information that isn’t well-reflected in your previous models.

Subsequently I’ve been hearing more about predictive experimentation such as bandit testing. WibiData, whose views are influenced by a couple of Very Famous Department Store clients (one of which is Macy’s), thinks experimentation is quite important. And it could be argued that experimentation is one of the simplest and most direct ways to increase the value of your data.
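For readers who haven't run into bandit testing, here is a minimal sketch of one common variant, epsilon-greedy. It is not anything WibiData or its clients described to me; the function name, the offer conversion rates, and the parameter values are all made up for illustration. The idea is simply to show most visitors the offer that has performed best so far, while reserving a small fraction of traffic for continued experimentation.

```python
import random


def epsilon_greedy(true_rates, rounds=5000, epsilon=0.1, seed=0):
    """Simulate an epsilon-greedy bandit test over several offers.

    true_rates: hypothetical conversion rate of each offer (unknown to
                the algorithm; used here only to simulate responses).
    Returns (pulls, wins): how often each offer was shown and converted.
    """
    rng = random.Random(seed)
    n = len(true_rates)
    pulls = [0] * n  # times each offer was shown
    wins = [0] * n   # conversions observed for each offer

    for _ in range(rounds):
        # Explore: with probability epsilon, or while some offer is
        # still untried, show a randomly chosen offer.
        if rng.random() < epsilon or 0 in pulls:
            arm = rng.randrange(n)
        else:
            # Exploit: show the offer with the best observed rate so far.
            arm = max(range(n), key=lambda i: wins[i] / pulls[i])

        pulls[arm] += 1
        # Simulated customer response (assumption: independent trials).
        if rng.random() < true_rates[arm]:
            wins[arm] += 1

    return pulls, wins


if __name__ == "__main__":
    # Three hypothetical offers with made-up conversion rates.
    pulls, wins = epsilon_greedy([0.02, 0.05, 0.20])
    for i, (p, w) in enumerate(zip(pulls, wins)):
        print(f"offer {i}: shown {p} times, {w} conversions")
```

Unlike a fixed-split A/B test, this keeps shifting traffic toward whatever is working while the experiment is still running, which is why it fits the "you immediately gain new information" point quoted above.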

3. I’d further say that a number of developments, trends or possibilities I’m seeing are or could be connected. These include agile and experimental predictive analytics in general, as noted in the previous point, along with: 

Also, the flashiest application I know of for only-moderately-successful KXEN came when one or more large retailers decided to run separate models for each of thousands of stores.

4. MongoDB, the product, has been refactored to support pluggable storage engines. In connection with that, MongoDB does/will ship with two storage engines – the traditional one and a new one from WiredTiger (but not TokuMX). Both will be equally supported by MongoDB, the company, although there surely are some tiers of support that will get bounced back to WiredTiger.

WiredTiger has the same techie principals as Sleepycat – get the wordplay?! – which was Mike Olson’s company before Cloudera. When asked, Mike spoke of those techies in remarkably glowing terms.

I wouldn’t be shocked if WiredTiger wound up playing the role for MongoDB that InnoDB played for MySQL. What I mean is that there were a lot of use cases for which the MySQL/MyISAM combination was insufficiently serious, but InnoDB turned MySQL into a respectable DBMS.

5. Hadoop’s traditional data distribution story goes something like:

However, Cloudera has noticed that some large enterprises really, really like to have storage separate from processing. Hence its recent partnership to work with EMC Isilon. Other storage partnerships, as well as a better fit with S3/object storage kinds of environments, are sure to follow, but I have no details to offer at this time.

6. Cloudera’s count of Spark users in its customer base is currently around 60. That includes everything from playing around to full production.

7. Things still seem to be going well at MemSQL, but I didn’t press for any details that I would be free to report.

8. Speaking of MemSQL, one would think that at some point something newer would replace Oracle et al. in the general-purpose RDBMS world, much as Unix and Linux grew to overshadow the powerful, secure, reliable, cumbersome IBM mainframe operating systems. On the other hand:

Also, perhaps no replacement will be needed. If we subdivide the database management world into multiple categories including:

it’s not obvious that the general-purpose RDBMS category on its own requires any new entrants to ever supplant the current leaders.

All that said – if any of the current new entrants do pull off the feat, SAP HANA is probably the best (longshot) guess to do so, and MemSQL the second-best.

9. If you’re a PostgreSQL user with performance or scalability concerns, you might want to check what Citus Data is doing.

Comments

12 Responses to “Thoughts and notes, Thanksgiving weekend 2014”

  1. Evan on November 30th, 2014 11:27 pm

    When you say:

    “it’s not obvious that the general-purpose RDBMS category on its own requires any new entrants to ever supplant the current leaders.”

    What do you mean by “requires”? Are you saying the products can’t be dramatically improved?

  2. Curt Monash on December 1st, 2014 12:22 am

    Evan,

    The older, larger vendors in a sector have much more previous engineering effort invested in their products, and much more in the way of current engineering resources.

    The newer, smaller vendors can have products with more modern architectures and simpler code lines. Who wins?

    In the case of general-purpose RDBMS, the older vendors have the option of in essence federating their legacy products with newer ones. If they do that the easy(ier) way, they still keep old-fashioned disk-centric, lock-heavy architectures, and eventually they lose. If they do that the hard way, however, with a thinner/more modular form of commonality among the engines, they can win.

    They’ve been trying the easy(ier) way first, but there’s probably still time for them to do it right.

  3. David Gruzman on December 1st, 2014 4:04 am

    In the realm of big data I see a split of the monolithic (R)DBMS into two interchangeable parts:
    storage engine and query engine. HDFS and S3 are good examples of storage engines.
    Impala, Hive, and Spark are examples of query engines.
    I think this division gives users two serious advantages: they still own their data, since it is stored in a known format, and they can use several engines on the same data.

  4. Mark Callaghan on December 1st, 2014 9:51 pm

    I agree with Mike. The people at WiredTiger are very good at building database engines. WiredTiger performance is very impressive.

    There is a new effort to make PostgreSQL faster for complex query processing — http://vitessedata.com

  5. Hadoop successor sparks a data analysis evolution - CIO - TechReact - Reacting to Today's Information Technology Leadership on December 6th, 2014 3:28 am

    […] are getting the message. Hadoop distributor Cloudera, which also includes Spark in its releases, has about 60 enterprise customers using Spark in some form or another, according to Monash. Other Hadoop distributors, notably Hortonworks and MapR, also offer Spark in […]

  6. Ranko Mosic on December 8th, 2014 10:34 pm

    Is HANA a general-purpose or an analytical RDBMS?
    It looks like an analytical RDBMS that can’t cost-effectively handle big data, and an OLTP database that is not really meant to be row-oriented (columnar seems to be the main theme). The product positioning seems to be a bit fuzzy, I think.

  7. MongoDB 3.0 | DBMS 2 : DataBase Management System Services on February 12th, 2015 2:44 pm

    […] big news in MongoDB 3.0* is the WiredTiger storage engine. The top-level claims for that are that one should “typically” expect […]

  8. MongoDB 3.0 with a new storage engine | VietHiP on February 13th, 2015 12:01 pm

    […] big news in MongoDB 3.0* is the WiredTiger storage engine. The top-level claims for that are that one should “typically” expect […]

  9. Where the innovation is | DBMS 2 : DataBase Management System Services on March 1st, 2015 1:31 am

    […] Predictive experimentation. […]

  10. Which analytic technology problems are important to solve for whom? | DBMS 2 : DataBase Management System Services on April 12th, 2015 11:52 pm

    […] Various notes (November, 2014) […]

  11. Notes and links, December 12, 2014 | DBMS 2 : DataBase Management System Services on October 26th, 2015 2:39 pm

    […] predictive experimentation I wrote about over Thanksgiving calls naturally for some BI/dashboarding to monitor how it’s […]

  12. Notes on machine-generated data, year-end 2014 | DBMS 2 : DataBase Management System Services on January 25th, 2016 5:17 am

    […] Thanksgiving round-up post points to a lot of my prior comments on predictive […]
