Couchbase update
I checked in with James Phillips for a Couchbase update, and I understand better what’s going on. In particular:
- Give or take minor tweaks, what I wrote in my August, 2010 Couchbase updates still applies.
- Couchbase now and for the foreseeable future has one product line, called Couchbase.
- Couchbase 2.0, the first version of Couchbase (the product) to use CouchDB for persistence, has slipped …
- … because more parts of CouchDB had to be rewritten for performance than Couchbase (the company) had hoped.
- Think mid-year or so for the release of Couchbase 2.0, hopefully sooner.
- In connection with the need to rewrite parts of CouchDB, Couchbase has:
- Gotten out of the single-server CouchDB business.
- Donated its proprietary single-server CouchDB intellectual property to the Apache Foundation.
- The 150ish new customers in 2011 Couchbase brags about are real, subscription customers.
- Couchbase has 60ish people, headed to >100 over the next few months.
Categories: Basho and Riak, Cassandra, Couchbase, CouchDB, DataStax, Market share and customer counts, MongoDB, NoSQL, Open source, Parallelization, Web analytics, Zynga | 7 Comments |
Microsoft SQL Server 2012 and enterprise database choices in general
Microsoft is launching SQL Server 2012 on March 7. An IM chat with a reporter resulted, and went something like this.
Reporter: [Care to comment]?
CAM: SQL Server is an adequate product if you don’t mind being locked into the Microsoft stack. For example, the ColumnStore feature is very partial, given that it can’t be updated; but Oracle doesn’t have columnar storage at all.
Reporter: Is the lock-in overall worse than IBM DB2, Oracle?
CAM: Microsoft locks you into an operating system, so yes.
Reporter: Is this release something larger Oracle or IBM shops could consider as a lower-cost alternative in a co-habitation scenario, in the event they’re mulling whether to buy more Oracle or IBM licenses?
CAM: If they have a strong Microsoft-stack investment already, sure. Otherwise, why?
Reporter: [How about] just cost?
CAM: DB2 works just as well to keep Oracle honest as SQL Server does, and without a major operating system commitment. For analytic databases you want an analytic DBMS or appliance anyway.
Best is to have one major vendor of OLTP/general-purpose DBMS, a web DBMS, a DBMS for disposable projects (that may be the same as one of the first two), plus however many different analytic data stores you need to get the job done.
By “web DBMS” I mean MySQL, NewSQL, or NoSQL. Actually, you might need more than one product in that area.
Categories: Data warehousing, IBM and DB2, Microsoft and SQL*Server, Mid-range, MySQL, NoSQL, Oracle | 9 Comments |
Notes from the Couch blogs
Couchbase in general, and CouchDB project founder Damien Katz in particular, are to some extent walking away from CouchDB. That is:
- The Couchbase product will not be upward compatible with CouchDB.
- Couchbase will no longer offer a CouchDB distribution, and is doing the natural and responsible thing, namely …
- … donating to the Apache Foundation the previously proprietary aspects of that distribution.
Even so:
- All — or at least “all” — the code Couchbase offers will, at least for now, be open source.
The story unfolded in a bombshell post by Damien, and clarification follow-ups by Damien and by Couchbase CEO Bob Wiederhold. The meatiest of the three was probably Damien’s follow-up, in which he said, among other things:
Read more
Categories: Couchbase, CouchDB, Market share and customer counts, Open source | 1 Comment |
KXEN clarifies its story
I frequently badger my clients to tell their story in the form of a company blog, where they can say what needs saying without being restricted by the rules of other formats. KXEN actually listened, and put up a pair of CTO posts that make the company story a lot clearer.
Excerpts from the first post include (with minor edits for formatting, including added emphasis):
Back in 1995, Vladimir Vapnik … changed the machine learning game with his new ‘Statistical Learning Theory’: he provided the machine learning guys with a mathematical framework that allowed them finally to understand, at the core, why some techniques were working and some others were not. All of a sudden, a new realm of algorithms could be written that would use mathematical equations instead of engineering data science tricks (don’t get me wrong here: I am an engineer at heart and I know the value of “tricks,” but tricks cannot overcome the drawbacks of a bad mathematical framework). Here was a foundation for automated data mining techniques that would perform as well as the best data scientists deploying these tricks. Luck is not enough though; it was because we knew a lot about statistics and machine learning that we were able to decipher the nuggets of gold in Vladimir’s theory.
Categories: KXEN, Predictive modeling and advanced analytics | 1 Comment |
Has illuminate Solutions joined the choir invisible?
A correspondent today asked about illuminate Solutions, noting that its website is down.
I put the question out to Twitter, and was messaged by an extremely reliable source, who had heard that illuminate has shut down and is in receivership.
illuminate’s website and CTO blog that I previously linked both appear to be rather dead sites. Archive.org emphatically confirms that perception.
I can’t find anybody on LinkedIn who says they’ve worked at illuminate more recently than May, 2011.
It would seem that illuminate Solutions is no more, has ceased to be, has kicked the bucket, has joined the choir invisible, and is an ex-company.
Categories: illuminate Solutions | 1 Comment |
Notes on the Oracle Big Data Appliance
Oracle announced its Big Data Appliance. Specs may be found in the Oracle Big Data Appliance press release. Beyond that:
- The most important software on the Oracle Big Data Appliance is a full set of Cloudera Enterprise code. Oracle will do Tier 1 Cloudera/Hadoop support, while Cloudera handles Tiers 2 and 3.
- The key spec ratios are 1 core/4 GB RAM/3 TB raw disk. That’s reasonably in line with Cloudera figures I published in June, 2010.
- This is really Oracle’s multi-structured big data appliance. Oracle’s relational big data appliance is Exadata, which has been out for years and has comparable capacity to Oracle’s new “Big Data Appliance.” (Chris Preimesberger made a similar point.)
- The Oracle Big Data Appliance list price is $450,000 for 18 12-core servers, plus $54,000/year maintenance.
- That’s around $25,000 per server (and associated storage).
- That’s also around $2,000/core.
- That’s also around $700/TB of raw spinning disk (216 cores at 3 TB each), before compression.
- None of those per-unit figures sounds ridiculous …
- … but because of Oracle’s appliance configuration there’s indeed a hefty minimum initial purchase.
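The per-unit arithmetic can be checked with a quick sketch. It assumes that total raw disk follows from the 1 core/3 TB spec ratio quoted above, applied to 18 servers of 12 cores each; the variable names are illustrative only.

```python
# Per-unit pricing for the Oracle Big Data Appliance, from the figures quoted above.
LIST_PRICE_USD = 450_000
MAINTENANCE_USD_PER_YEAR = 54_000
SERVERS = 18
CORES_PER_SERVER = 12
RAW_TB_PER_CORE = 3  # assumption: taken from the 1 core / 4 GB RAM / 3 TB ratio

total_cores = SERVERS * CORES_PER_SERVER       # 216 cores
total_raw_tb = total_cores * RAW_TB_PER_CORE   # 648 TB raw disk

per_server = LIST_PRICE_USD / SERVERS
per_core = LIST_PRICE_USD / total_cores
per_raw_tb = LIST_PRICE_USD / total_raw_tb
maintenance_rate = MAINTENANCE_USD_PER_YEAR / LIST_PRICE_USD

print(f"per server: ${per_server:,.0f}")
print(f"per core:   ${per_core:,.0f}")
print(f"per raw TB: ${per_raw_tb:,.0f}")
print(f"maintenance as a share of list price: {maintenance_rate:.0%}")
```

The minimum purchase is the whole rack, so these per-unit numbers describe how the price divides, not what a smaller configuration would cost.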
A couple of links explaining Cloudera Manager
Predictably, I wasn’t pre-briefed on the details of Oracle’s Big Data Appliance announcement today, and an inquiry to partner Cloudera doesn’t happen to have been immediately answered.* But anyhow, it’s clear from coverage by Larry Dignan and Derrick Harris that Oracle’s Big Data Appliance includes:
- Some version of Cloudera Manager (I’m guessing more or less the best one).*
- Some version of Apache Hadoop (I’m guessing the same distribution that Cloudera prefers to use).*
- Some kind of support.
In other words, it’s a lot like getting Cloudera Enterprise,* plus some hardware, plus some other stuff.
*Edit: About 2 minutes after I posted this, I got email from Cloudera CEO Mike Olson. Yes, the Oracle Big Data Appliance bundles Cloudera Enterprise.
That raises an anyway recurring question: What exactly is Cloudera Manager? Read more
Splunk update
Splunk is announcing the Splunk 4.3 point release. Before discussing it, let’s recall a few things about Splunk, starting with:
- Splunk is first and foremost an analytic DBMS …
- … used to manage logs and similar multistructured data.
- Splunk’s DML (Data Manipulation Language) is based on text search, not on SQL.
- Splunk has extended its DML in natural ways (e.g., you can use it to do calculations and even some statistics).
- Splunk bundles some (very) basic, Splunk-specific business intelligence capabilities.
- The paradigmatic use of Splunk is to monitor IT operations in real time. However:
- There also are plenty of non-real-time uses for Splunk.
- Splunk is proudest of its growth in non-IT quasi-real-time uses, such as the marketing side of web operations.
As in any release, a lot of Splunk 4.3 is about “Oh, you didn’t have that before?” features and Bottleneck Whack-A-Mole performance speed-up. One performance enhancement is Bloom filters, which are a very hot topic these days. More important is a switch from Flash to HTML5, so as to accommodate mobile devices with less server-side rendering. Splunk reports that its users, especially the non-IT ones, really want to get Splunk information on their tablet devices. While this somewhat contradicts what I wrote a few days ago pooh-poohing mobile BI, let me hasten to point out:
- Splunk is used for a lot of (quasi) real-time monitoring.
- Splunk’s desktop user interfaces are, by BI standards, quite primitive.
That’s pretty much the ideal scenario for mobile BI: Timeliness matters and prettiness doesn’t.
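For readers unfamiliar with the Bloom filters mentioned above, here is a minimal generic sketch. It is not a description of how Splunk implements them; the class name, the sizes, and the "one filter per block of log events" usage are all hypothetical. The idea is a compact bit array that can say "definitely not present" cheaply, so a search can skip data that cannot contain the term.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: may give false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # False means the item was never added; True means "possibly added".
        return all(self.bits[pos] for pos in self._positions(item))


# Hypothetical use: one filter per block of indexed log events.
block_filter = BloomFilter()
for term in ["error", "timeout", "login"]:
    block_filter.add(term)

print(block_filter.might_contain("timeout"))  # True: added terms are always found
```

A term that was never added will usually come back `False`, letting the search skip the block; occasionally it comes back `True` by coincidence, which costs an unnecessary read but never a wrong answer.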
Categories: Business intelligence, Data models and architecture, Data warehousing, Log analysis, Specific users, Splunk, Structured documents, Web analytics | 3 Comments |
Big data terminology and positioning
Recently, I observed that Big Data terminology is seriously broken. It is reasonable to reduce the subject to two quasi-dimensions:
- Bigness — Volume, Velocity, size
- Structure — Variety, Variability, Complexity
given that
- High-velocity “big data” problems are usually high-volume as well.*
- Variety, variability, and complexity all relate to the simply-structured/poly-structured distinction.
But the conflation should stop there.
*Low-volume/high-velocity problems are commonly referred to as “event processing” and/or “streaming”.
When people claim that bigness and structure are the same issue, they oversimplify into mush. So I think we need four pieces of terminology, reflective of a 2×2 matrix of possibilities. For want of better alternatives, my suggestions are:
- Relational big data is data of high volume that fits well into a relational DBMS.
- Multi-structured big data is data of high volume that doesn’t fit well into a relational DBMS. Alternative: Poly-structured big data.
- Conventional relational data is data of not-so-high volume that fits well into a relational DBMS. Alternatives: Ordinary/normal/smaller relational data.
- Smaller poly-structured data is data for which dynamic schema capabilities are important, but which doesn’t rise to “big data” volume.
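The 2×2 matrix can be written down as a small lookup. This is only an illustrative sketch: the two boolean inputs stand in for the bigness and structure quasi-dimensions, and the function and enum names are made up for the example.

```python
from enum import Enum


class DataCategory(Enum):
    RELATIONAL_BIG = "Relational big data"
    MULTI_STRUCTURED_BIG = "Multi-structured big data"
    CONVENTIONAL_RELATIONAL = "Conventional relational data"
    SMALLER_POLY_STRUCTURED = "Smaller poly-structured data"


def categorize(high_volume: bool, fits_relational: bool) -> DataCategory:
    """Map the two quasi-dimensions (bigness, structure) to one of four terms."""
    matrix = {
        (True, True): DataCategory.RELATIONAL_BIG,
        (True, False): DataCategory.MULTI_STRUCTURED_BIG,
        (False, True): DataCategory.CONVENTIONAL_RELATIONAL,
        (False, False): DataCategory.SMALLER_POLY_STRUCTURED,
    }
    return matrix[(high_volume, fits_relational)]


# Example: high-volume data that does not fit well into a relational DBMS.
print(categorize(high_volume=True, fits_relational=False).value)
```

Keeping the two inputs separate is the point: volume and structure vary independently, so neither can be inferred from the other.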
Some issues in business intelligence
In November I wrote two parts of a planned multi-post series on issues in analytic technology. Then I got caught up in year-end things and didn’t blog for a month. Well … Happy New Year! I’m back. Let’s survey a few BI-related topics.
Mobile business intelligence — real business value or just a snazzy demo?
I discussed some mobile BI use cases in July 2010, but I’m still not convinced the whole area is a legitimate big deal. BI has a long history of snazzy, senior-exec-pleasing demos that have little to do with substantive business value. For now, I think mobile BI is another of those; few people will gain deep analytic insights staring into their iPhones. I don’t see anything coming that’s going to change the situation soon.
BI-centric collaboration — real business value or just a snazzy demo?
I’m more optimistic about collaborative business intelligence. QlikView’s direct sharing of dashboards will, I think, be a feature competitors must and will imitate. Social media BI collaboration is still in the “mainly a demo” phase, but I think it meets a broader and deeper need than does mobile BI. Over the next few years, I expect numerous enterprises to establish strong cultures of analytic chatter (and then give frequent talks about same at industry conferences). Read more