Some subjects just keep coming up. And so I keep saying things like:
Most generalizations about “Big Data” are false. “Big Data” is a horrific catch-all term, with many different meanings.
Most generalizations about Hadoop are false. Reasons include:
- Hadoop is a collection of disparate things, most particularly data storage and application execution systems.
- The transition from Hadoop 1 to Hadoop 2 will be drastic.
- For key aspects of Hadoop — especially file format and execution engine — there are or will be widely varied options.
Hadoop won’t soon replace relational data warehouses, if indeed it ever does. SQL-on-Hadoop is still very immature. And you can’t replace data warehouses unless you have the power of SQL.
Note: SQL isn’t the only way to provide “the power of SQL”, but alternative approaches are just as immature.
Most generalizations about NoSQL are false. Different NoSQL products are … different. It’s not even accurate to say that all NoSQL systems lack SQL interfaces. (For example, SQL-on-Hadoop often includes SQL-on-HBase.)
“Big Data” doesn’t create rapid IT growth. If we only had traditional kinds of data, IT growth would be drastically negative, since Moore’s Law swamps traditional data growth. Whole new categories of data are always needed to fill the gap. And these days, they’re all categorized as “Big Data”.
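To make that arithmetic concrete, here is a rough sketch. The numbers are purely illustrative assumptions, not measurements: traditional data is taken to grow 20% per year, and the cost of handling a unit of data is taken to halve every two years (one loose reading of Moore's Law). The function name and parameters are hypothetical.

```python
def relative_it_spend(years, data_growth=0.20, cost_halving_years=2.0):
    """Spend needed to handle one fixed category of data, relative to year 0.

    Both defaults are illustrative assumptions, not figures from any source.
    """
    volume = (1.0 + data_growth) ** years               # data keeps growing
    unit_cost = 0.5 ** (years / cost_halving_years)     # hardware keeps getting cheaper
    return volume * unit_cost


if __name__ == "__main__":
    for year in (0, 1, 5, 10):
        print(year, round(relative_it_spend(year), 3))
```

Under those assumptions the required spend shrinks every year, which is the sense in which growth would be negative without new categories of data. Plug in a data growth rate above roughly 41% per year and the conclusion flips.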
The single central database is a myth. Things are never that simple, at least at large enterprises. Hence, in particular, the ideal EDW (Enterprise Data Warehouse) is a myth.
Analytic RDBMS and appliances aren’t necessarily expensive. Deals can be had. Yes, most vendors want at least a few hundred thousand dollars for most sales, but there are plenty of exceptions even to that rule. And at either large or small scales, things get very cheap, for example:
- Various vendors’ free/“community” editions.
- The $2 million/petabyte hardware+software price I published for Vertica.
And Infobright is typically an economical option in between those extremes, if you’re cool with its focus on machine-generated data.
Columnar relational DBMS are relational. Examples include Sybase IQ, Vertica, ParAccel, Infobright and numerous others.
Yes, that’s a tautology. Even so, distressingly many people forget it, columnar RDBMS vendor employees not excepted.
Amazon Redshift proves very little about ParAccel. Amazon bought some stock in ParAccel, and got a cheap license to a subset of ParAccel’s code, perhaps in the same deal. Big whoop. Yes,
- It is claimed that there are a lot of Redshift users, although I presume they are low-end ones.
- ParAccel is fast.*
But none of that speaks to some profound, ongoing Amazon/ParAccel/Actian relationship.
*I hear that ParAccel is usually faster than Vertica and other alternatives in POCs (Proofs of Concept) and benchmarks. But I also hear that ParAccel’s installation complexity continues to be a POC problem.
New technology in old categories of application will only be adopted as quickly as firms replace their apps. Yes, that’s a tautology too. Even so, it puts an upper bound on, for example, the speed with which on-premises applications will be replaced by cloud alternatives.
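The upper bound can be sketched numerically. This assumes, hypothetically, that firms replace a fixed fraction of their remaining old apps each year, and that every replacement uses the new technology (the best case for the new technology). The 10% replacement rate is an illustrative assumption, not a reported figure.

```python
def max_adoption_share(years, annual_replacement_rate=0.10):
    """Best-case share of the installed base running on the new technology.

    Assumes every replaced app moves to the new technology, so real
    adoption can only be at or below this value.
    """
    still_on_old_apps = (1.0 - annual_replacement_rate) ** years
    return 1.0 - still_on_old_apps


if __name__ == "__main__":
    for year in (1, 5, 10):
        print(year, round(max_adoption_share(year), 3))
```

Even in that best case, a decade of 10% annual replacement leaves roughly a third of the old apps in place.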
SAP HANA is not yet a serious OLTP (OnLine Transaction Processing) DBMS. Yes,
- HANA has in some form been under development for a long time; its major antecedent is BI Accelerator, which shipped back in 2006.
- RAM-centric processing makes sense.
- HANA has a cool-sounding feature list.
- SAP claims lots of HANA sales, and not just in conjunction with a few new SAP apps that require HANA to run.
But the stories of HANA sales and deployment momentum sure seem concentrated on analytic use cases. And by the way — even among analytic DBMS vendors, I don’t hear much emphasis on competing vs. HANA.
Current BI trends reflect 1990s deja vu. The hottest business intelligence products and vendors are adopted by departments, on the strength of their snazzy interfaces and short adoption cycles.* That’s exactly how BI spread in the 1990s, only now the word “visualization” gets used more.
*A common phrase for that is land-and-expand.
I’m not impressed that your future products will in some small ways be superior to what your competitors have had in production for over a year.