My clients at Odiago, vendors of WibiData, have changed their company name simply to WibiData. Even better, they blogged with more detail as to how WibiData works, in what is essentially a follow-on to my original WibiData post last October. Among other virtues, WibiData turns out to be a poster child for my views on derived data and the corresponding schema evolution.
Interesting quotes include:
WibiData is designed to store … transactional data side-by-side with profile and other derived data attributes.
… the ability to add new ad-hoc columns to a table enables more flexible analysis: output data that is the result of one analytic pipeline is stored adjacent to its input data, meaning that you can easily use this as input to second- or third-order derived data as well.
schemas can vary over time; you can easily add a field to a record, or delete a field. … But even though you start collecting that new data, your existing analysis pipelines can treat records like they always did; programs that don’t yet know about the new cookie are still compatible with both the old records already collected, and the new records with the additional field. New programs fill in default values for old data recorded before a field was added, applying the new schema at read time.
schemas for every column are stored in a data dictionary that matches column names with their schemas, as well as human-readable descriptions of the data.
Interesting aspects of the post that don’t lend themselves as well to being excerpted include:
- How the Produce-Gather “analysis calculus” — i.e. framework — works.
- How this all ties into Apache projects (and sub-projects) such as Hadoop, HBase, and Avro.