From time to time I like to do “what I’m working on” posts. From my recent blogging, you probably already know that my focus includes:
- Hadoop (always, and please see below).
- Analytic RDBMS (ditto).
- NoSQL and NewSQL.
- Specifically, SQL-on-Hadoop.
- Spark and other memory-centric technology, including streaming.
- Public policy, mainly but not only in the area of surveillance/privacy.
- General strategic advice for all sizes of tech company.
Other stuff on my mind includes but is not limited to:
1. Certain categories of buying organizations are inherently leading-edge.
- Internet companies have adopted Hadoop, NoSQL, NewSQL and all that en masse. Often, they won’t even look at things that are conventional or expensive.
- US telecom companies have been buying one each of every DBMS on the market since pre-relational days.
- Financial services firms — specifically algorithmic traders and broker-dealers — have been in their own technical world for decades …
- … as have national-security agencies …
- … as have pharmaceutical research departments.
Fine. But what really intrigues me is when more ordinary enterprises also put leading-edge technologies into production. I pester everybody for examples of that.
2. In particular, I hope to figure out where Hadoop is or soon will be getting major adoption.
- Widespread Hadoop adoption at ordinary large enterprises is, I think, inevitable and imminent. But it hasn’t quite happened yet.
- I think that part of the “enterprise data hub” story is a great bet to come true — Hadoop is becoming a key destination for data to land and be transformed. MapReduce was invented for data transformation; Hadoop was invented to do MapReduce; data transformation workloads have already been moving from expensive analytic RDBMS to cheaper Hadoop.
- I also think Hadoop — enhanced with Spark or whatever — will win as a platform for sophisticated predictive modeling; Hadoop’s (and Spark’s) flexibility is at least as useful for the purpose as RDBMS’ SQL execution speed.
- I’m still skeptical about ordinary enterprises’ adoption of Hadoop as a business intelligence platform, but it’s definitely another area to track.
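To make the data-transformation point concrete, here is a minimal, framework-free sketch of the map/shuffle/reduce pattern in plain Python. The log format, field names, and numbers are all made up for illustration; real Hadoop jobs distribute each phase across a cluster, but the contract is the same:

```python
from collections import defaultdict

def map_phase(line):
    """Map step: parse one raw log line and emit a (user, bytes) pair."""
    user, _url, nbytes = line.split()
    yield user, int(nbytes)

def shuffle(pairs):
    """Shuffle step: group mapped values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce step: aggregate each user's total bytes."""
    return key, sum(values)

# Hypothetical raw web-log lines "landing" before transformation.
logs = [
    "alice /home 200",
    "bob /search 150",
    "alice /cart 300",
]

mapped = (pair for line in logs for pair in map_phase(line))
totals = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(totals)  # {'alice': 500, 'bob': 150}
```

The point of the sketch is how little of it is query logic: most of a transformation workload is exactly this kind of parse-group-aggregate plumbing, which is why it ports readily from an expensive analytic RDBMS to cheaper Hadoop.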
3. Analytic RDBMS and data warehouse appliance pricing is always a big deal. Hadoop’s great price advantage doesn’t have to be permanent, and in fact there are a number of fairly low-cost RDBMS offerings, such as petascale Vertica, the Teradata 1000 series, or Infobright.
Speaking of that, it turns out Teradata now publishes per-terabyte pricing. Please note that those figures are quoted per terabyte of uncompressed data; effective prices per terabyte actually stored are lower, at least for databases that compress well.
Analytic RDBMS prices are still shaking out.
4. As I previously noted, ensemble models have become the norm for machine learning. I want to learn more about the implications of that.
One conjecture — everything we learned in school about statistics is wrong, or at least it’s less important than we thought. Predictive modeling is not mainly about least squares, regressions, curve-fitting, etc. Rather, it’s first and foremost about data segmentation and clustering, with all the curve-fitting stuff being secondary.
Besides fitting — as it were — what I hear, this hypothesis also matches common sense. How do businesses use predictive modeling? For each customer/prospect/site-visitor/whatever, they decide which of a limited number of possible actions to take. At its core, that’s an exercise in segmentation.
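To illustrate the segmentation-first view, here is a minimal k-means clustering sketch in plain Python. The customer data, the (recency, spend) features, and the per-segment actions are all hypothetical; the point is that the decision logic rides on segment membership, not on any fitted curve:

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Tiny k-means: returns final centers and the points in each segment."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    segments = [[] for _ in range(k)]
    for _ in range(iters):
        # Assignment step: each point joins its nearest center's segment.
        segments = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda c: sum((a - b) ** 2
                                      for a, b in zip(p, centers[c])))
            segments[i].append(p)
        # Update step: recompute each center as its segment's mean.
        for i, seg in enumerate(segments):
            if seg:
                centers[i] = tuple(sum(dim) / len(seg) for dim in zip(*seg))
    return centers, segments

# Hypothetical (recency_days, monthly_spend) pairs for six customers.
customers = [(2, 90), (3, 85), (40, 10), (45, 5), (4, 95), (50, 8)]
centers, segments = kmeans(customers, k=2)

# One action per segment -- the "limited number of possible actions" idea.
for center, seg in zip(centers, segments):
    action = "upsell offer" if center[1] > 50 else "win-back email"
    print(center, action, seg)
```

With well-separated data like this, the active, high-spend customers and the lapsed, low-spend ones fall into different segments, and the business decision reduces to picking one action per segment.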
5. I think data integration is getting a lot smarter than it was. Hadoop-based transformation is the obvious example. But there’s also ClearStory’s data intelligence pitch. (And yes, I know I need to talk with Paxata. There’s been a lot of ball-dropping on that one, including by me.)
6. There’s a meta-theme in the above — stuff that’s not exactly a DBMS or DBMS-like data store. Streaming fits into that. So does smart data integration. So, arguably, does Spark. So do data grids, another of those topics I’d like to know more about but haven’t nailed down yet.
Data management is getting ever more complex.