I wasn’t too impressed when I spoke with Revolution Analytics at the time of its relaunch last year. But a conversation Thursday evening was much clearer. And I even learned some cool stuff about general predictive modeling trends (see the bottom of this post).
Revolution Analytics business and business model highlights include:
- Revolution Analytics is an open-core vendor built around the R language. That is, Revolution Analytics offers proprietary code and support, with subscription pricing, that help in the use of open source software.
- Unlike most open-core vendors I can think of, Revolution Analytics takes little responsibility for the actual open source part. Some “grants” for developing certain open source R pieces seem to be the main exception. While this has caused some hard feelings, I don’t have an accurate sense for their scope or severity.
- Revolution Analytics also sells a single-user/workstation version of its product, freely admitting that this is mainly a lead generation strategy or, in my lingo, a “break-even leader.”
- Revolution Analytics boasts around 100 customers, split about 70-30 between the workstation seeding stuff and the real server product.
- Revolution Analytics has “about” 37 employees. Headquarters are at 101 University Avenue (do I have to say in what city?). There’s also a development office in Seattle and a sales office in New York.
- Revolution Analytics’ pricing is by size of server. “Small” servers — i.e. up to 12 cores — start at $25K/year.
- Unsurprisingly, adoption is more alongside SAS et al. than rip-and-replace.
Revolution Analytics’ top market sector by far appears to be financial services, both in trading/investment banks/hedge funds and in credit cards/risk analysis. Pharma/life sciences is second, but sales cycles are slow. There’s also been at least a little activity each in a variety of internet/media/entertainment/gaming/telecom sectors.
When I asked Revolution Analytics why one would use R rather than, say, SAS, Revolution cited three reasons that seemed to be driving customer interest:
- You can do more with R. That may be debatable, but what’s harder to dispute is that there are a bunch of things you can do straightforwardly in R, with its thousands of contributed routines, that would at best be more difficult in SAS.
- Students today are learning R, so you have access to (affordable?) talent. That’s pretty clearly correct, although I do note SPSS’ long history of academic social sciences use.
- R is cheaper. It’s hard to argue with that one.
Revolution Analytics’ parallelized-R story starts something like this:
- Although R is generally thought of as requiring all data to be in RAM, Revolution also offers external memory algorithms. (“External memory algorithms” seems to be the discipline-standard way of saying “Not all data has to be in RAM.”)
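The external-memory idea can be sketched in a few lines. This is purely illustrative (Python rather than R, and not Revolution's actual code): the point is that statistics like mean and variance can be accumulated chunk by chunk, so only one chunk plus a few running totals ever needs to be in memory.

```python
def chunked_mean_variance(chunks):
    """Compute mean and (population) variance from an iterable of data
    chunks, keeping only constant-size running totals in memory."""
    n = 0
    total = 0.0
    total_sq = 0.0
    for chunk in chunks:          # each chunk stands in for a block read from disk
        for x in chunk:
            n += 1
            total += x
            total_sq += x * x
    mean = total / n
    variance = total_sq / n - mean * mean
    return mean, variance

# Toy "dataset" split into three chunks.
mean, var = chunked_mean_variance([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
```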
- In principle, Revolution is willing to parallelize external memory algorithms for you any which way — MapReduce, MPI (Message Passing Interface), and more.
- Revolution parallelized for multi-core last fall. Multi-server scale-out is coming this summer.
- Revolution is working on Netezza support. Revolution expects to use nzMatrix in the effort.
- Yes, logistic regression is one of the algorithms Revolution parallelizes.
Like Netezza with nzMatrix or Greenplum (now EMC) with its sparse vector routine, Revolution has some useful underpinnings to help with parallelization/scale-out as well. The main one seems to be a variance/covariance matrix, which can be arbitrarily large and can be computed in a very distributed way. Revolution notes that you can use this not just on data but also, for example, on parameters.
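Why does a variance/covariance matrix distribute so well? Each worker only needs to compute partial sums over its own rows, and partial sums combine by simple addition. A hypothetical sketch (again Python, not Revolution's implementation), using the identity cov(x_i, x_j) = E[x_i x_j] - E[x_i] E[x_j]:

```python
def partial_sums(rows):
    """Per-partition accumulators: row count, per-column sums, and
    the matrix of cross-product sums sum(x_i * x_j)."""
    d = len(rows[0])
    sums = [sum(r[j] for r in rows) for j in range(d)]
    cross = [[sum(r[i] * r[j] for r in rows) for j in range(d)]
             for i in range(d)]
    return len(rows), sums, cross

def combine(parts):
    """Merge partition accumulators by addition, then finish:
    cov_ij = E[x_i x_j] - E[x_i] E[x_j]."""
    n = sum(p[0] for p in parts)
    d = len(parts[0][1])
    sums = [sum(p[1][j] for p in parts) for j in range(d)]
    cross = [[sum(p[2][i][j] for p in parts) for j in range(d)]
             for i in range(d)]
    means = [s / n for s in sums]
    return [[cross[i][j] / n - means[i] * means[j] for j in range(d)]
            for i in range(d)]

# Two "partitions" of a tiny 2-column dataset, as if split across servers.
part_a = partial_sums([[1.0, 2.0], [2.0, 4.0]])
part_b = partial_sums([[3.0, 6.0]])
cov = combine([part_a, part_b])
```

Nothing about `combine` depends on how many partitions there are or where they live, which is the property that lets the matrix be computed "in a very distributed way."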
One analytic approach — if not meta-approach — that Revolution sees as hot is ensemble learning. Specifically mentioned was Max Kuhn’s caret package, which evidently automates ensemble techniques. Also specifically mentioned was the Netflix Prize, which I gather was won by an ensemble approach. The idea behind ensemble techniques is that, rather than pick a particular kind of model, you throw a bunch against the wall. The first benefit is that you get to see what works best. The second benefit is that you can combine results and hopefully outperform any one of the models.
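The combine-the-results step can be as simple as a vote. A minimal sketch of the idea (toy Python, not the caret package): three hypothetical classifiers each predict a label, and the ensemble takes the majority.

```python
def majority_vote(models, x):
    """Combine several models' predictions for input x by majority vote."""
    votes = [m(x) for m in models]
    return max(set(votes), key=votes.count)

# Three hypothetical threshold classifiers that disagree on some inputs.
models = [
    lambda x: 1 if x > 0.3 else 0,
    lambda x: 1 if x > 0.5 else 0,
    lambda x: 1 if x > 0.7 else 0,
]

# At x = 0.6, two of the three models say 1, so the ensemble says 1.
print(majority_vote(models, 0.6))
```

Real ensemble methods weight, stack, or average model outputs rather than just voting, but the performance implication is the same: every input is scored by every model.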
Obviously, ensemble techniques can require vastly more performance than just running a single model. I wouldn’t be surprised if, going forward, they turned out to be one of analytics’ biggest performance challenges.