There’s a lot of talk these days about transitioning to the cloud, by IT customers and vendors alike. Of course, I have thoughts on the subject, some of which are below.
1. The economies of scale of not running your own data centers are real. That’s the kind of non-core activity almost all enterprises should outsource. Of course, those considerations taken alone argue equally for true cloud, co-location or SaaS (Software as a Service).
2. When the (Amazon) cloud was newer, I used to hear that certain kinds of workloads didn’t map well to the architecture Amazon had chosen. In particular, shared-nothing analytic query processing was necessarily inefficient. But I’m not hearing nearly as much about that any more.
3. Notwithstanding the foregoing, not everybody loves Amazon pricing.
4. Infrastructure vendors such as Oracle would like to also offer their infrastructure to you in the cloud. As per the above, that could work. However:
- Is all your computing on Oracle’s infrastructure? Probably not.
- Do you want to move the Oracle part and the non-Oracle part to different clouds? Ideally, no.
- Do you like the idea of being even more locked in to Oracle than you are now? [Insert BDSM joke here.]
- Will Oracle do so much better of a job hosting its own infrastructure that you use its cloud anyway? Well, that’s an interesting question.
Actually, if we replace “Oracle” by “Microsoft”, the whole idea sounds better. While Microsoft doesn’t have a proprietary server hardware story like Oracle’s, many folks are content in the Microsoft walled garden. IBM has fiercely loyal customers as well, and so may a couple of Japanese computer manufacturers.
5. Even when running stuff in the cloud is otherwise a bad idea, there’s still: Read more
|Categories: Amazon and its cloud, Cloud computing, Emulation, transparency, portability, IBM and DB2, Microsoft and SQL*Server, Oracle, Pricing||6 Comments|
Parts of the business intelligence differentiation story resemble the one I just posted for data management. After all:
- Both kinds of products query and aggregate data.
- Both are offered by big “enterprise standard” behemoth companies and also by younger, nimbler specialists.
- You really, really, really don’t want your customer data to leak via a security breach in either kind of product.
That said, insofar as BI’s competitive issues resemble those of DBMS, they are those of DBMS-lite. For example:
- BI is less mission-critical than some other database uses.
- BI has done a lot less than DBMS to deal with multi-structured data.
- Scalability demands on BI are less than those on DBMS — indeed, they’re the ones that are left over after the DBMS has done its data crunching first.
And full-stack analytic systems — perhaps delivered via SaaS (Software as a Service) — can moot the BI/data management distinction anyway.
Of course, there are major differences between how DBMS and BI are differentiated. The biggest are in user experience. I’d say: Read more
|Categories: Business intelligence, Buying processes, ClearStory Data, Data mart outsourcing, Pricing, QlikTech and QlikView, Rocana, Tableau Software||1 Comment|
In the previous post I broke product differentiation into 6-8 overlapping categories, which may be abbreviated as:
- (Other) trustworthiness
- User experience
and sometimes also issues in adoption and administration.
Now let’s use this framework to examine two market categories I cover — data management and, in separate post, business intelligence.
Applying this taxonomy to data management:
|Categories: Buying processes, Clustering, Data warehousing, Database diversity, Microsoft and SQL*Server, Predictive modeling and advanced analytics, Pricing||2 Comments|
Obviously, a large fraction of what I write about involves technical differentiation. So let’s try for a framework where differentiation claims can be placed in context. This post will get through the generalities. The sequels will apply them to specific cases.
Many buying and design considerations for IT fall into six interrelated areas: Read more
Basho was on my (very short) blacklist of companies with whom I refuse to speak, because they have lied about the contents of previous conversations. But Tony Falco et al. are long gone from the company. So when Basho’s new management team reached out, I took the meeting.
- Basho management turned over significantly 1-2 years ago. The main survivors from the old team are 1 each in engineering, sales, and services.
- Basho moved its headquarters to Bellevue, WA. (You get one guess as to where the new CEO lives.) Engineering operations are very distributed geographically.
- Basho claims that it is much better at timely product shipments than it used to be. Its newest product has a planned (or at least hoped-for) 8-week cadence for point releases.
- Basho’s revenue is ~90% subscription.
- Basho claims >200 enterprise clients, vs. 100-120 when new management came in. Unfortunately, I forgot to ask the usual questions about divisions vs. whole organizations, OEM sell-through vs. direct, etc.
- Basho claims an average contract value of >$100K, typically over 2-3 years. $9 million of that (which would be close to half the total, actually), comes from 2 particular deals of >$4 million each.
Basho’s product line has gotten a bit confusing, but as best I understand things the story is:
- There’s something called Riak Core, which isn’t even a revenue-generating product. However, it’s an open source project with some big users (e.g. Goldman Sachs, Visa), and included in pretty much everything else Basho promotes.
- Riak KV is the key-value store previously known as Riak. It generates the lion’s share of Basho’s revenue.
- Riak S2 is an emulation of Amazon S3. Basho thinks that Riak KV loses efficiency when objects get bigger than 1 MB or so, and that’s when you might want to use Riak S2 in addition or instead.
- Riak TS is for time series, and just coming out now.
- Also in the mix are some (extra charge) connectors for Redis and Spark. Presumably, there are more of these to come.
- There’s an umbrella marketing term of “Basho Data Platform”.
Technical notes on some of that include: Read more
|Categories: Aerospike, Basho and Riak, Cassandra, Clustering, Couchbase, Databricks, Spark and BDAS, DataStax, HBase, Health care, Log analysis, MapR, Market share and customer counts, MongoDB, NoSQL, Pricing, Specific users, Splunk||Leave a Comment|
- My client Rocana is the renamed ScalingData, where Rocana is meant to signify ROot Cause ANAlysis.
- Rocana was founded by Omer Trajman, who I’ve referenced numerous times in the past, and who I gather is a former boss of …
- … cofounder Eric Sammer.
- Rocana recently told me it had 35 people.
- Rocana has a very small number of quite large customers.
Rocana portrays itself as offering next-generation IT operations monitoring software. As you might expect, this has two main use cases:
- Actual operations — figuring out exactly what isn’t working, ASAP.
Rocana’s differentiation claims boil down to fast and accurate anomaly detection on large amounts of log data, including but not limited to:
- The sort of network data you’d generally think of — “everything” except packet-inspection stuff.
- Firewall output.
- Database server logs.
- Point-of-sale data (at a retailer).
- “Application data”, whatever that means. (Edit: See Tom Yates’ clarifying comment below.)
|Categories: Business intelligence, Hadoop, Kafka and Confluent, Log analysis, Market share and customer counts, Petabyte-scale data management, Predictive modeling and advanced analytics, Pricing, Rocana, Splunk, Web analytics||1 Comment|
Occasionally I talk with an astute reporter — there are still a few left — and get led toward angles I hadn’t considered before, or at least hadn’t written up. A blog post may then ensue. This is one such post.
There is a group of questions going around that includes:
- Is Hadoop overhyped?
- Has Hadoop adoption stalled?
- Is Hadoop adoption being delayed by skills shortages?
- What is Hadoop really good for anyway?
- Which adoption curves for previous technologies are the best analogies for Hadoop?
To a first approximation, my responses are: Read more
|Categories: Application areas, Data warehousing, Databricks, Spark and BDAS, EAI, EII, ETL, ELT, ETLT, Hadoop, Hortonworks, MapR, MapReduce, Market share and customer counts, Open source, Pricing||6 Comments|
I talked with my clients at MemSQL about the release of MemSQL 4.0. Let’s start with the reminders:
- MemSQL started out as in-memory OTLP (OnLine Transaction Processing) DBMS …
- … but quickly positioned with “We also do ‘real-time’ analytic processing” …
- … and backed that up by adding a flash-based column store option …
- … before Gartner ever got around to popularizing the term HTAP (Hybrid Transaction and Analytic Processing).
- There’s also a JSON option.
The main new aspects of MemSQL 4.0 are:
- Geospatial indexing. This is for me the most interesting part.
- A new optimizer and, I suppose, query planner …
- … which in particular allow for serious distributed joins.
- Some rather parallel-sounding connectors to Spark. Hadoop and Amazon S3.
- Usual-suspect stuff including:
- More SQL coverage (I forgot to ask for details).
- Some added or enhanced administrative/tuning/whatever tools (again, I forgot to ask for details).
- Surely some general Bottleneck Whack-A-Mole.
There’s also a new free MemSQL “Community Edition”. MemSQL hopes you’ll experiment with this but not use it in production. And MemSQL pricing is now wholly based on RAM usage, so the column store is quasi-free from a licensing standpoint is as well.
While I don’t find the Open Data Platform thing very significant, an associated piece of news seems cooler — Pivotal is open sourcing a bunch of software, with Greenplum as the crown jewel. Notes on that start:
- Greenplum has been an on-again/off-again low-cost player since before its acquisition by EMC, but open source is basically a commitment to having low license cost be permanently on.
- In most regards, “free like beer” is what’s important here, not “free like speech”. I doubt non-Pivotal employees are going to do much hacking on the long-closed Greenplum code base.
- That said, Greenplum forked PostgreSQL a long time ago, and the general PostgreSQL community might gain ideas from some of the work Greenplum has done.
- The only other bit of newly open-sourced stuff I find interesting is HAWQ. Redis was already open source, and I’ve never been persuaded to care about GemFire.
Greenplum, let us recall, is a pretty decent MPP (Massively Parallel Processing) analytic RDBMS. Various aspects of it were oversold at various times, and I’ve never heard that they actually licked concurrency. But Greenplum has long had good SQL coverage and petabyte-scale deployments and a columnar option and some in-database analytics and so on; i.e., it’s legit. When somebody asks me about open source analytic RDBMS to consider, I expect Greenplum to consistently be on the short list.
Further, the low-cost alternatives for analytic RDBMS are adding up. Read more
|Categories: Amazon and its cloud, Citus Data, Data warehouse appliances, EAI, EII, ETL, ELT, ETLT, EMC, Greenplum, Hadoop, Infobright, MonetDB, Open source, Pricing||6 Comments|
MapR put out a press release aggregating some customer information; unfortunately, the release is a monument to vagueness. Let me start by saying:
- I don’t know for sure, but I’m guessing Derrick Harris was incorrect in suspecting that this release was a reaction to my recent post about Hortonworks’ numbers. For one thing, press releases usually don’t happen that quickly.
- And as should be obvious from the previous point — notwithstanding that MapR is a client, I had no direct involvement in this release.
- In general, I advise clients and other vendors to put out the kind of aggregate of customer success stories found in this release. However, I would like to see more substance than MapR offered.
Anyhow, the key statement in the MapR release is:
… the number of companies that have a paid subscription for MapR now exceeds 700.
Unfortunately, that includes OEM customers as well as direct ones; I imagine MapR’s direct customer count is much lower.
In one gesture to numerical conservatism, MapR did indicate by email that it counts by overall customer organization, not by department/cluster/contract (i.e., not the way Hortonworks does). Read more
|Categories: Hadoop, Health care, MapR, Market share and customer counts, Pricing, Telecommunications||3 Comments|