December 8, 2013

DataStax/Cassandra update

Cassandra’s reputation in many quarters is:

This has led competitors to use, and get away with, sales claims along the lines of “Well, if you really need geo-distribution and can’t wait for us to catch up — which we soon will! — you should use Cassandra. But otherwise, there are better choices.”

My friends at DataStax, naturally, don’t think that’s quite fair. And so I invited them — specifically Billy Bosworth and Patrick McFadin — to educate me. Here are some highlights of that exercise.

DataStax and Cassandra have some very impressive accounts, which don’t necessarily revolve around geo-distribution. Netflix, probably the flagship Cassandra user — since Cassandra inventor Facebook adopted HBase instead — actually hasn’t been using the geo-distribution feature. Confidential accounts include:

DataStax and Cassandra won’t necessarily win customer-brag wars versus MongoDB, Couchbase, or even HBase, but at least they’re strongly in the competition.

DataStax claims that simplicity is now a strength. There are two main parts to that surprising assertion. Read more

December 5, 2013

Vertica 7

It took me a bit of time, and an extra call with Vertica’s long-time R&D chief Shilpa Lawande, but I think I have a decent handle now on Vertica 7, code-named Crane. The two aspects of Vertica 7 I find most interesting are:

Other Vertica 7 enhancements include:

Overall, two recurring themes in our discussion were:

Read more

November 29, 2013

SaaS appliances, SaaS data centers, and customer-premises SaaS

Conclusions

I think that most sufficiently large enterprise SaaS vendors should offer an appliance option, as an alternative to the core multi-tenant service. In particular:

How I reached them

Core reasons for selling or using SaaS (Software as a Service) as opposed to licensed software start:

Conceptually, then, customer-premises SaaS is not impossible, even though one of the standard Big Three SaaS benefits is lost. Indeed:

But from an enterprise standpoint, that’s all (relatively) simple stuff. So we’re left with a more challenging question — does customer-premises SaaS make sense in the case of enterprise applications or other server software?

Read more

November 24, 2013

Thoughts on SaaS

Generalizing about SaaS (Software as a Service) is hard. To prune some of the confusion, let’s start by noting:

For smaller enterprises, the core outsourcing argument is compelling. How small? Well:

So except for special cases, an enterprise with less than $100 million or so in revenue may have trouble affording on-site data processing, at least at a mission-critical level of robustness. It may well be better to use NetSuite or something like that, assuming needed features are available in SaaS form.*

Read more

November 19, 2013

How Revolution Analytics parallelizes R

I talked tonight with Lee Edlefsen, Chief Scientist of Revolution Analytics, and now think I understand Revolution’s parallel R much better than I did before.

There are four primary ways that people try to parallelize predictive modeling:

One confusing aspect of this discussion is that it could reference several heavily-overlapping but not identical categories of algorithms, including:

  1. External memory algorithms, which operate on datasets too big to fit in main memory by, for starters, reading in and working on a part of the data at a time. Lee observes that these are almost always parallelizable.
  2. What Revolution markets as External Memory Algorithms, which are those external memory algorithms it has gotten around to implementing so far. These are all parallelized. They are also all in the category of …
  3. … algorithms that can be parallelized by:
    • Operating on data in parts.
    • Getting intermediate results.
    • Combining them in some way for a final result.
  4. Algorithms of the previous category, where the way of combining them specifically is in the form of summation, such as those discussed in the famous paper Map-Reduce for Machine Learning on Multicore. Not all of Revolution’s current parallel algorithms fall into this group.

To be clear, all Revolution’s parallel algorithms are in Category #2 by definition and Category #3 in practice. However, they aren’t all in Category #4.
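As a rough illustration of Categories #3 and #4, here is a minimal Python sketch of the chunk, intermediate-result, combine pattern, using mean and variance as the statistic. This is not Revolution's code or API; the function names (`chunks`, `partial_stats`, `combine`, `mean_and_variance`) are hypothetical, and an in-memory list stands in for data that would really be read from disk a block at a time.

```python
from functools import reduce


def chunks(data, size):
    """Yield successive pieces of the data (a stand-in for blocks read from disk)."""
    for start in range(0, len(data), size):
        yield data[start:start + size]


def partial_stats(chunk):
    """Intermediate result for one chunk: (count, sum, sum of squares)."""
    return (len(chunk), sum(chunk), sum(x * x for x in chunk))


def combine(a, b):
    """Combine two intermediate results by element-wise summation (Category #4)."""
    return tuple(x + y for x, y in zip(a, b))


def mean_and_variance(data, chunk_size):
    """Compute mean and population variance from per-chunk intermediate results."""
    n, s, ss = reduce(combine, map(partial_stats, chunks(data, chunk_size)), (0, 0, 0))
    mean = s / n
    variance = ss / n - mean * mean
    return mean, variance


if __name__ == "__main__":
    print(mean_and_variance([1, 2, 3, 4, 5, 6, 7, 8], chunk_size=3))
```

Because each call to `partial_stats` touches only its own chunk, the `map` step could in principle be handed to separate cores or machines, and because `combine` is plain summation, the intermediate results can be merged in any order. An algorithm in Category #3 but not Category #4 would keep the same overall shape but need a `combine` step that is something other than a sum.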

Read more

November 11, 2013

Cautionary tales

Before the advent of cheap computing power, statistics was a rather dismal subject. David Lax scared me off from studying much of it by saying that 90% of statistics was done on sets of measure 0.

The following cautionary tale also dates to that era. Other light verse below.  Read more

November 10, 2013

RDBMS and their bundle-mates

Relational DBMS used to be fairly straightforward product suites, which boiled down to:

Now, however, most RDBMS are sold as part of something bigger.

Read more

November 8, 2013

Comments on the 2013 Gartner Magic Quadrant for Operational Database Management Systems

The 2013 Gartner Magic Quadrant for Operational Database Management Systems is out. “Operational” seems to be Gartner’s term for what I call short-request, in each case the point being that OLTP (OnLine Transaction Processing) is a dubious term when systems omit strict consistency, and when even strictly consistent systems may lack full transactional semantics. As is usually the case with Gartner Magic Quadrants:

Anyhow:  Read more

October 31, 2013

Specialized business intelligence

A remarkable number of vendors are involved in what might be called “specialized business intelligence”. Some don’t want to call it that, because they think that “BI” is old and passé, and what they do is new and better. Still, if we define BI technology as, more or less:

then BI is indeed a big part of what they’re doing.

Why would vendors want to specialize their BI technology? The main reason would be to suit it for situations in which even the best general-purpose BI options aren’t good enough. The obvious scenarios are those in which the mismatch is one or both of:

For example, in no particular order: Read more

October 30, 2013

Splunk strengthens its stack

I’m a little shaky on embargo details — but I do know what was in my own quote in a Splunk press release that went out yesterday. :)

Splunk has been rolling out a lot of news. In particular:

I imagine there are some operationally-oriented use cases for which Splunk instantly offers the best Hadoop business intelligence choice available. But what I really think is cool is Splunk’s schema-on-need story, wherein:

That highlights a pretty serious and flexible vertical analytic stack. I like it.
