Hadoop notes
I visited California recently, and chatted with numerous companies involved in Hadoop — Cloudera, Hortonworks, MapR, DataStax, Datameer, and more. I’ll defer further Hadoop technical discussions for now — my target to restart them is later this month — but that still leaves some other issues to discuss, namely adoption and partnering.
The total number of enterprises in the world paying subscription and license fees that they would regard as being for “Hadoop or something Hadoop-related” probably is not much over 100 right now, but I’d expect to see pretty rapid growth. Beyond that, let’s divide customers into three groups:
- Internet businesses.
- Traditional enterprises’ internet operations.
- Traditional enterprises’ other operations.
Hadoop vendors, in different mixes, claim to be doing well in all three segments. Even so, almost all use cases involve some kind of machine-generated data, with one exception being a credit card vendor crunching a large database of transaction details. Multiple kinds of machine-generated data come into play — web/network/mobile device logs, financial trade data, scientific/experimental data, and more. In particular, pharmaceutical research got some mentions, which makes sense, in that it’s one area of scientific research that actually enjoys fat for-profit research budgets.
| Categories: Cloudera, Hadoop, Health care, Hortonworks, Investment research and trading, Log analysis, MapR, MapReduce, Market share and customer counts, Scientific research, Web analytics | 5 Comments |
Revolution Analytics update
I wasn’t too impressed when I spoke with Revolution Analytics at the time of its relaunch last year. But a conversation Thursday evening was much clearer. And I even learned some cool stuff about general predictive modeling trends (see the bottom of this post).
Revolution Analytics business and business model highlights include:
- Revolution Analytics is an open-core vendor built around the R language. That is, Revolution Analytics offers proprietary code and support, with subscription pricing, that help in the use of open source software.
- Unlike most open-core vendors I can think of, Revolution Analytics takes little responsibility for the actual open source part. Some “grants” for developing certain open source R pieces seem to be the main exception. While this has caused some hard feelings, I don’t have an accurate sense for their scope or severity.
- Revolution Analytics also sells a single-user/workstation version of its product, freely admitting that this is mainly a lead generation strategy or, in my lingo, a “break-even leader.”
- Revolution Analytics boasts around 100 customers, split about 70-30 between the workstation seeding stuff and the real server product.
- Revolution Analytics has “about” 37 employees. Headquarters are at 101 University Avenue (do I have to say in what city?). There are also a development office in Seattle and a sales office in New York.
- Revolution Analytics’ pricing is by size of server. “Small” servers (i.e., up to 12 cores) start at $25K/year.
- Unsurprisingly, adoption is more alongside SAS et al. than rip-and-replace.
| Categories: Health care, Investment research and trading, Open source, Parallelization, Predictive modeling and advanced analytics, Pricing, Revolution Analytics, SAS Institute | 2 Comments |
Privacy dangers — an overview
This post is the first of a series. The second one delves into the technology behind the most serious electronic privacy threats.
The privacy discussion has gotten more active, and more complicated as well. A year ago, I still struggled to get people to pay attention to privacy concerns at all, at least in the United States, with my first public breakthrough coming at the end of January. But much has changed since then.
On the commercial side, Facebook modified its privacy policies, garnering great press attention and an intense user backlash, leading to a quick partial retreat. The Wall Street Journal then launched a long series of articles — 13 so far — recounting multiple kinds of privacy threats. Other media joined in, from Forbes to CNet. Various forms of US government rule-making to inhibit advertising-related tracking have been proposed as an apparent result.
In the US, the government had a lively year as well. The Transportation Security Administration (TSA) rolled out what have been dubbed “porn scanners,” and backed them up with “enhanced patdowns.” For somebody who is, for example, female, young, a sex abuse survivor, and/or a follower of certain religions, those can be highly unpleasant, if not traumatic. Meanwhile, the Wikileaks/Cablegate events have spawned a government reaction whose scope is only beginning to be seen. A couple of “highlights” so far are some very nasty laptop seizures, and the recent demand for information on over 600,000 Twitter accounts. (Christopher Soghoian provided a detailed, nuanced legal analysis of same.)
At this point, it’s fair to say there are at least six different kinds of legitimate privacy fear. Read more
| Categories: Analytic technologies, Facebook, GIS and geospatial, Health care, Liberty and privacy, Telecommunications, Web analytics | 4 Comments |
The privacy discussion is heating up
Internet privacy issues are getting more and more attention. Frankly, I think we’re getting past the point where the only big risk is loss of liberty. More and more, the risk of an excessive backlash is upon us as well. (In the medical area, I’d say it’s already more than a risk — it’s a life-wrecking reality. But now the problem is poised to become wider-spread.) Read more
| Categories: Health care, Liberty and privacy, Web analytics | 2 Comments |
Another medical records rant
I’ve previously ranted about the medical information problems in connection with my father’s care at Friendship Village of Dublin and Riverside Methodist Hospital (among others). Well, they’re getting worse. Read more
| Categories: Health care | 1 Comment |
A few notes from XLDB 4
As much as I believe in the XLDB conferences, I only found time to go to (a big) part of one day of XLDB 4 myself. In general: Read more
| Categories: Analytic technologies, Health care, Liberty and privacy, Michael Stonebraker, MySQL, Open source, Parallelization, Petabyte-scale data management, Scientific research | 2 Comments |
Notes and links October 10 2010
More quick-hit notes, links, and so on: Read more
| Categories: Analytic technologies, Aster Data, Data warehousing, Greenplum, Health care, Liberty and privacy, XtremeData | Leave a Comment |
A rant about medical records
It is very difficult to convey utterly tedious frustration without — well, without thoroughly boring one’s audience. And hence I will not try to explain the full awfulness of modern medical records and information compartmentalization. But I was personally present 5 times in one recent week while Linda gave detailed information about her contact information, medical history, etc. — and all 5 times it was to the same hospital.
In our case, that just costs time. But the information flow in my father’s case upsets me more. Read more
| Categories: Health care, Liberty and privacy | 2 Comments |
Reconciling medical privacy and elder care
In a previous post, I outlined how Friendship Village of Dublin has mishandled my father’s medical information, to the detriment of his medical care. Expanding on that story, here are some other complications or screw-ups in the same series of medical events. In these other cases, the blame clearly falls more on the information-flow system itself, rather than on some particular medical care provider such as Friendship Village of Dublin, Riverside Methodist Hospital, or the paramedics who transported my father from one to the other.
| Categories: Health care, Liberty and privacy | 2 Comments |
Intersystems Cache’ highlights
I talked with Robert Nagle of Intersystems last week, and it went better than at least one other Intersystems briefing I’ve had. Intersystems’ main product is Cache’, an object-oriented DBMS introduced in 1997 (before that Intersystems was focused on the fourth-generation programming language M, renamed from MUMPS). Unlike most other OODBMS, Cache’ is used for a lot of stuff one would think an RDBMS would be used for, across all sorts of industries. That said, there’s a distinct health-care focus to Intersystems, in that:
- MUMPS, the original Intersystems technology, was focused on health care.
- The reasons Intersystems went object-oriented have a lot to do with the structure of health-care records.
- Intersystems’ biggest and most visible ISVs are in the health-care area.
- Intersystems is actually beginning to sell an electronic health records system called TrakCare around the world (but not in the US, where it has lots of large competitive VARs).
Note: Intersystems Cache’ is sold mainly through VARs (Value-Added Resellers), aka ISVs/OEMs. I.e., it’s sold by people who write applications on top of it.
So far as I understand – and this is still pretty vague and apt to be partially erroneous – the Intersystems Cache’ technical story goes something like this: Read more
