A number of recent posts have had good comments. This time, I won’t call them out individually.
Evidently Mike Olson of Cloudera is still telling the machine-generated data story, exactly as he should be. The Information Arbitrage/IA Ventures folks said something similar, focusing specifically on “sensor data” …
… and, even better, went on to say:
Privacy is dead.
What do we consider to be the boundaries of privacy, especially with respect to items like medical data? In a data privacy-free world, should we be regulating data usage instead? How do we deal with asymmetric access to our personal data, e.g., how is it that insurance companies claim the right to our personal information?
Obviously, my answer to the second question is Yes!!!!
Also from Hadoop World — Dave Menninger, now an analyst, reports on some Hadoop metrics:
How big is “big data”? In his opening remarks, Mike shared some statistics from a survey of attendees. The average Hadoop cluster among respondents was 66 nodes and 114 terabytes of data. However there is quite a range. The largest in the survey responses was a cluster of 1,300 nodes and more than 2 petabytes of data. (Presenters from eBay blew this away, describing their production cluster of 8,500 nodes and 16 petabytes of storage.) Over 60 percent of respondents had 10 terabytes or less, and half were running 10 nodes or less.
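As a quick sanity check on those figures, here is a minimal sketch of the implied data-per-node ratios. It assumes decimal units (1 PB = 1,000 TB), which the survey doesn't specify. The names `clusters` and `tb_per_node` are mine, purely for illustration. Dividing an average data size by an average node count is not the same as averaging per-cluster ratios, so treat the first number as a rough indication only.

```python
# Rough per-node ratios from the figures quoted above.
# Assumption: decimal units (1 PB = 1000 TB); the survey does not say.
TB_PER_PB = 1000

# (nodes, terabytes) for each cluster mentioned in the quote
clusters = {
    "survey average": (66, 114),
    # "more than 2 petabytes", so this is a lower bound
    "survey largest": (1300, 2 * TB_PER_PB),
    # quoted as storage, not necessarily data actually held
    "eBay production": (8500, 16 * TB_PER_PB),
}


def tb_per_node(nodes, terabytes):
    """Return terabytes divided by node count."""
    return terabytes / nodes


ratios = {name: tb_per_node(n, tb) for name, (n, tb) in clusters.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f} TB per node")
```

Under those assumptions, all three clusters land between roughly 1.5 and 1.9 TB per node, so the per-node figure looks fairly stable across very different cluster sizes.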
A while back, Doug Henschen noted that Netezza flagship reference Catalina Marketing is now at 2.5 petabytes. Most of that is in one 600-billion-row table. Oddly, the article talks of the Netezza/SAS partnership accelerating model-building via in-database scoring (not modeling) technology. Doug also wrote of a lot of analytic DBMS replacements, including:
- Microsoft by ParAccel
- Oracle by Aster Data, IBM, Oracle Exadata, probably Netezza, and probably Hadoop
- Netezza by Greenplum
- IBM by Teradata
Carl Olofson pointed out on Twitter that DataScaler was an in-memory database technology just bought by Oracle. This inspired me to google them, and I found a sparse DataScaler CEO blog. I link it because of an amusing juxtaposition: the second-to-last post says, in effect, “We make appliances and we recommend all these awesome technology design partners who helped us design the hardware,” while the very last post says “Designing our own hardware was a mistake.”
Fred Holahan is now VP of Marketing at VoltDB, which is a lesson to me about giving free consulting … Anyhow, Fred tells me that VoltDB has about a dozen users on their way to production, some of whom are headed toward being paying VoltDB customers and some of whom are not.