Christophe Bisciglia and Aaron Kimball have a new company.
- It’s called Odiago, and it is one of my tiny clients, a group that is gratifyingly growing more numerous.
- Odiago’s product line is called WibiData, after the justly popular We Be Sushi restaurants.
- We’ve agreed on a split exclusive de-stealthing launch. You can read about the company/founder/investor stuff on TechCrunch. But this is the place for — well, for the tech crunch.
WibiData is designed for management of, investigative analytics on, and operational analytics on consumer internet data, the main examples of which are web site traffic and personalization and their analogues for games and/or mobile devices. The core WibiData technology, built on HBase and Hadoop,* is a data management and analytic execution layer. That’s where the secret sauce resides. Also included are:
- REST APIs for interactive access.
- Import/export tools, including JDBC access.
- Management tools.
- Analytic libraries — data mining, predictive analytics, machine learning, and so on.
The whole thing is in beta, with about three (paying) beta customers.
*And Avro and so on.
The core ideas of WibiData include:
- ALL data pertaining to a single user (or mobile device) is kept in a single, possibly very long, HBase row.
- There are two primary operators in WibiData, Produce and Gather.
- Produce operates on single rows. It can operate on one row at HBase speed (milliseconds) if you need to inform an interactive user response. Or it can operate on the whole database in batch via Hadoop MapReduce.
- It is reasonable to think of Produce as mainly doing two things. One is the aforementioned serving of data out of WibiData into interactive applications. The other is scoring, classifying, recommending, etc. on individual users (i.e. rows), in line with an analytic model.
- Gather typically operates on all your rows at once, and emits suitable input for a MapReduce Reduce step. It is reasonable to think of Gather as being a key cog in the training of analytic models.
- HBase schema management is done at the WibiData system level, not directly in applications. There’s a WibiData HBase data dictionary, powered by a set of system tables, that specifies cell data types/record types and, in effect, primitive schemas.
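To make the Produce/Gather split concrete, here is a minimal Python sketch. It is not WibiData's API, which Odiago has not published in detail. The in-memory `table`, the function names `produce`, `gather`, and `reduce_counts`, and the column names are all hypothetical stand-ins, assuming only what is described above: one row per user, a per-row Produce step, and a Gather step that emits input for a Reduce.

```python
from collections import defaultdict

# Hypothetical in-memory stand-in for a one-row-per-user HBase table.
table = {
    "user:1": {"page_views": ["/home", "/pricing", "/pricing"], "country": "US"},
    "user:2": {"page_views": ["/home"], "country": "DE"},
    "user:3": {"page_views": ["/pricing", "/docs"], "country": "US"},
}

def produce(row):
    """Produce-style step: derive a value from a single user's row alone."""
    views = row.get("page_views", [])
    return {"pricing_interest": views.count("/pricing") / len(views) if views else 0.0}

def gather(row):
    """Gather-style step: emit (key, value) pairs for a later reduce."""
    for page in row.get("page_views", []):
        yield (row["country"], page), 1

def reduce_counts(pairs):
    """Reduce step: sum the values emitted for each key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

# Interactive use: Produce on one row.
single = produce(table["user:1"])

# Batch use: Produce over every row.
scores = {key: produce(row) for key, row in table.items()}

# Gather over all rows, then reduce.
counts = reduce_counts(pair for row in table.values() for pair in gather(row))
```

The point of the sketch is that `produce` never looks outside its own row, which is what lets the same logic run either on one row interactively or across the whole table in batch.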
WibiData-enhanced HBase differs from relational DBMSs in most of the ways you would imagine, both good and bad. In particular:
- Depending on how you look at it, WibiData-enhanced HBase either has no DML (Data Manipulation Language) at all, or else has one that’s a lot less rich than SQL.
- WibiData-enhanced HBase schemas are much more dynamic than SQL schemas.
- WibiData-enhanced HBase schemas can have nested or recursive data structures, such as array-valued cells.
To expand on each of those points in turn:
WibiData’s underlying one-giant-table philosophy notwithstanding, there are times you manage multiple tables with it. (For example, you ingest data into WibiData however you can, and then run transformations, typically in batch, until the data is in the preferred structure.) While WibiData does have ways to simulate joins, foreign keys, and so on, there’s nothing resembling referential integrity or foreign key constraints.
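As a rough illustration of what a simulated join without referential integrity implies for application code, consider the following Python sketch. The tables, keys, and the `join_segment` helper are hypothetical and are not taken from WibiData. The only assumption is the one stated above: nothing stops a row from pointing at a key that does not exist.

```python
# Hypothetical lookup tables; nothing enforces that segment_id exists in segments.
users = {
    "user:1": {"segment_id": "seg:a"},
    "user:2": {"segment_id": "seg:missing"},  # dangling reference, silently allowed
}
segments = {"seg:a": {"name": "trial"}}

def join_segment(user_row):
    """Application-side lookup join; a dangling key yields None, not an error."""
    segment = segments.get(user_row.get("segment_id"))
    return {**user_row, "segment": segment}

joined = {key: join_segment(row) for key, row in users.items()}
```

In a relational DBMS a foreign key constraint would have rejected the second row at write time; here the reader of the data has to cope with the gap.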
WibiData takes single-table schema flexibility to an extreme. Not only can different rows in the same table have different associated columns — something that relational systems can in effect also do via NULL values — but schemas can even change over the life of a column. If you have an array-valued cell storing the results of a marketing campaign, and you start recording more data partway through the campaign, then different rows in the table will, in the same column, hold different-sized arrays.
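The marketing campaign case might look something like this in Python. This is a sketch under assumptions, not WibiData's actual storage format: the field list, the column name `campaign`, and the `read_campaign` helper are all invented for illustration.

```python
# Hypothetical field order; "purchased" was added partway through the campaign.
FIELDS = ["opened", "clicked", "purchased"]

rows = {
    "user:1": {"campaign": [True, False]},       # written before the change
    "user:2": {"campaign": [True, True, True]},  # written after the change
    "user:3": {},                                # column absent for this row
}

def read_campaign(row):
    """Pad shorter (older) arrays with None so every row reads the same way."""
    values = row.get("campaign", [])
    padded = list(values) + [None] * (len(FIELDS) - len(values))
    return dict(zip(FIELDS, padded))

decoded = {key: read_campaign(row) for key, row in rows.items()}
```

The same column holds a two-element array in one row, a three-element array in another, and nothing at all in a third, and the reading code rather than the schema smooths over the difference.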
That nesting can also get pretty serious; where you’d have a single value in a relational table, you might have the equivalent of a whole relational table (or at least selection/view) in WibiData-enhanced HBase. For example, if a user visits the same web page ten times, and each time 50 attributes are recorded (including a timestamp), all 500 data – to use the word “data” in its original “plural of datum” sense – would likely be stored in the same WibiData cell.
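The page-visit arithmetic can be sketched in Python as follows. The column name `visits:/pricing`, the attribute names, and the timestamp values are hypothetical placeholders. The only things taken from the example above are the counts: ten visits, 50 attributes each, one of which is a timestamp.

```python
def make_visit(n):
    """One visit record: a timestamp plus 49 other attributes, 50 in total."""
    visit = {"timestamp": 1_320_000_000 + n}  # placeholder epoch seconds
    visit.update({f"attr_{i}": None for i in range(1, 50)})
    return visit

# One user's row; a single cell holds what would be a ten-row relational table.
row = {"visits:/pricing": [make_visit(n) for n in range(10)]}

cell = row["visits:/pricing"]
total_data = sum(len(visit) for visit in cell)  # 10 visits x 50 attributes
```

Here `total_data` comes to 500, all of it living inside one cell of one row.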
That’s about all Odiago is disclosing about WibiData right now. Christophe will also be talking at Hadoop World next week, and presumably can be hit up with any burning questions then.