February 2, 2014

Some stuff I’m thinking about (early 2014)

From time to time I like to do “what I’m working on” posts. From my recent blogging, you probably already know that includes:

Other stuff on my mind includes but is not limited to:

1. Certain categories of buying organizations are inherently leading-edge.

Fine. But what really intrigues me is when more ordinary enterprises also put leading-edge technologies into production. I pester everybody for examples of that.

2. In particular, I hope to figure out where Hadoop is or soon will be getting major adoption.

3. Analytic RDBMS and data warehouse appliance pricing is always a big deal. Hadoop’s great price advantage doesn’t have to be permanent, and in fact there are a number of fairly low-cost RDBMS offerings, such as petascale Vertica, the Teradata 1000 series, or Infobright.

Speaking of that, it turns out Teradata now publishes per-terabyte pricing. Please note that those are uncompressed prices; actual prices can be assumed to be lower, at least for databases that compress well.

Analytic RDBMS prices are still shaking out.
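
The compression point above can be made concrete with a little arithmetic. This is a minimal sketch, assuming "uncompressed" means the list price is quoted per terabyte of storage with no compression applied, so that a database compressing user data N-to-1 pays roughly 1/N of the list price per terabyte of user data. The function name and the dollar figures are hypothetical illustrations, not actual Teradata prices.

```python
def effective_price_per_tb(list_price_per_tb: float, compression_ratio: float) -> float:
    """Approximate price per terabyte of user data after compression.

    list_price_per_tb: quoted price per terabyte, assuming no compression.
    compression_ratio: user-data size divided by on-disk size (3.0 means 3:1).
    """
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    return list_price_per_tb / compression_ratio


# Hypothetical figures: a $12,000/TB list price and 3:1 compression.
example_price = effective_price_per_tb(12_000.0, 3.0)
print(f"${example_price:,.0f} per TB of user data")
```

A ratio of 1.0 reproduces the list price, which is the case for data that does not compress at all.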

4. As I previously noted, ensemble models have become the norm for machine learning. I want to learn more about the implications of that.

One conjecture — everything we learned in school about statistics is wrong, or at least it’s less important than we thought. Predictive modeling is not mainly about least squares, regressions, curve-fitting, etc. Rather, it’s first and foremost about data segmentation and clustering, with all the curve-fitting stuff being secondary.

Besides fitting — as it were — what I hear, this hypothesis also matches common sense. How do businesses use predictive modeling? For each customer/prospect/site-visitor/whatever, they decide which of a limited number of possible actions to take. At its core, that’s an exercise in segmentation.
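
The "limited number of possible actions" framing can be sketched in a few lines. This is only an illustration under stated assumptions: the visitor fields, the thresholds, the segment names, and the actions are all hypothetical, and a real system would derive its segments from data rather than from hand-written rules.

```python
from dataclasses import dataclass


@dataclass
class Visitor:
    visits_last_30_days: int
    lifetime_spend: float


# Hypothetical mapping: one action per segment.
ACTION_BY_SEGMENT = {
    "loyal": "offer_upgrade",
    "lapsing": "send_discount",
    "new": "show_welcome",
}


def segment(visitor: Visitor) -> str:
    """Assign a visitor to one of a small, fixed set of segments."""
    if visitor.lifetime_spend >= 500 and visitor.visits_last_30_days >= 4:
        return "loyal"
    if visitor.lifetime_spend > 0:
        return "lapsing"
    return "new"


def choose_action(visitor: Visitor) -> str:
    """The decision is a lookup on the segment, not a fitted curve."""
    return ACTION_BY_SEGMENT[segment(visitor)]


print(choose_action(Visitor(visits_last_30_days=5, lifetime_spend=800.0)))
```

Whatever modeling sits upstream, the output the business consumes is the segment label, which is why the curve-fitting can be treated as secondary.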

5. I think data integration is getting a lot smarter than it was. Hadoop-based transformation is the obvious example. But there’s also ClearStory’s data intelligence pitch. (And yes, I know I need to talk with Paxata. There’s been a lot of ball-dropping on that one, including by me.)

6. There’s a meta-theme in the above — stuff that’s not exactly a DBMS or DBMS-like data store. Streaming fits into that. So does smart data integration. So, arguably, does Spark. So do data grids, another of those topics I’d like to know more about but haven’t nailed down yet.

Data management is getting ever more complex.

Comments

3 Responses to “Some stuff I’m thinking about (early 2014)”

  1. Mark Callaghan on February 3rd, 2014 10:07 am

    Can you elaborate on “widespread Hadoop adoption at ordinary large enterprises”? For example, do you expect them to write map-reduce jobs? If their primary use is SQL, R and packaged apps, then do you see Hadoop getting better on ease of use/management faster than proprietary vendors get at reducing cost?

  2. Curt Monash on February 3rd, 2014 10:50 am

    Mark,

    1. People these days use a LOT of languages, programming frameworks, add-on execution frameworks, whatever. I wouldn’t want to anoint any 1 or 2 of those as the expected dominant winner, beyond such obvious points as there will be a lot of SQL for a long time.

    2. In particular, Hadoop doesn’t necessarily imply MapReduce. Spark, for example, has 15 primitives, 2 of which are Map and Reduce. Hadoop 2 has the capacity for multiple execution engines. Etc.

    3. Most of the major analytic RDBMS are now owned by big enterprise technology companies with classic enterprise technology cost structures. I’d still argue that Greenplum and Vertica, for example, should be offered at low prices. But I’m probably hearing fewer “those guys at Greenplum are buying business” complaints than I used to. (And I’m not going to say more about Vertica pricing at all, due to client confidentiality.)

  3. Asif Khan on February 8th, 2014 12:37 am

    Hi Curt,

    Have you come across implementations of enterprise data hubs or Hadoop-based transformations? What have been the lessons from such implementations?
