As much as I believe in the XLDB conferences, I only found time to attend (a big) part of one day of XLDB 4 myself. In general:
- XLDB 4 had a good crowd, including Phil Bernstein (quiet), Mike Stonebraker (not quiet), Martin Kersten (ditto), Luke Lonergan (ditto), Todd Walter (almost unrecognizable without his usual cowboy gear), Oliver Ratzesberger, and a bunch of actual science types.
- XLDB 4 had one weakness — panels with lots of participants, but only a single microphone among them. That tends to make for serial declamations more than true interactive discussion, at least until the audience starts chiming in, which thankfully it tends to eventually do. (I had the same problem in spades while moderating the Boston Big Data Summit panel last year; at least at XLDB 4 nobody was TRYING to filibuster.)
My notes have unfortunately disappeared, but from memory:
- Mike Stonebraker asserted that SciDB outperforms sharded MySQL by two orders of magnitude for some classes of scientific application. One of the big reasons was that SciDB lets you overlap partitions, so that for any feature you want to extract, you can be confident there’s at least one partition that actually contains it.
- I chatted with Peter Breunig of Chevron about analytic issues in the oil & gas industry. I got the impression:
  - Refineries are generally well-instrumented with sensors.
  - Oil wells may not be, especially the less valuable/lower producing ones.
  - He’d love to scatter passive sensors all around, waiting for natural tremors (as opposed to just geologist-set explosions) to provide more insight into what’s under the ground.
  - 50-100 TB geological data sets are common. Processing them takes 2-3 weeks. As the technology gets better, so do the results (rather than the time being shortened).
  - All this suggests that there’s a huge need for better technology in reservoir analysis.
  - His other big unmet analytic desire is refinery simulation.
- Kevin Winsen talked about the proposed radio astronomy project ASKAP, which will have raw data volumes that make the LSST’s look small. (More precisely, ASKAP is the name proposed by Australia, one of the two finalists for the location; South Africa presumably has a different name for it.) A figure of 8 petabytes/day was mentioned, although most of this will be rapidly discarded. That could be the largest unclassified data acquisition rate out there, although it’s known that there’s a classified one at >10 PB/day (image data).
- Health care researchers repeatedly complained that privacy regulations get in the way of them using clinical data for medical research. Just more grist for my “HIPAA must die so that people can live” mill.
- Mike Stonebraker is pushing the idea of a “science benchmark.” (A paper on same has been posted.) The idea is that the existence of said benchmark should provide a spur for DBMS vendors to make their products run faster for scientific purposes, in line with the supposed salutary effects of TPC-A, TPC-B, and TPC-C. Notwithstanding that attendees included Oracle, Microsoft, EMC/Greenplum, Teradata, and Aster Data — with Greenplum, IBM, and Aster Data also being sponsors — I am skeptical because: