Christophe Bisciglia and Aaron Kimball have a new company.
- It’s called Odiago, and is one of my gratifyingly more numerous tiny clients.
- Odiago’s product line is called WibiData, after the justly popular We Be Sushi restaurants.
- We’ve agreed on a split exclusive de-stealthing launch. You can read about the company/founder/investor stuff on TechCrunch. But this is the place for — well, for the tech crunch.
WibiData is designed for management of, investigative analytics on, and operational analytics on consumer internet data, the main examples of which are web site traffic and personalization and their analogues for games and/or mobile devices. The core WibiData technology, built on HBase and Hadoop,* is a data management and analytic execution layer. That’s where the secret sauce resides. Also included are:
- REST APIs for interactive access.
- Import/export tools, including JDBC access.
- Management tools.
- Analytic libraries — data mining, predictive analytics, machine learning, and so on.
The whole thing is in beta, with about three (paying) beta customers.
*And Avro and so on.
The core ideas of WibiData include:
- ALL data pertaining to a single user (or mobile device) is kept in a single, possibly very long, HBase row.
- There are two primary operators in WibiData, Produce and Gather.
- Produce operates on single rows. It can operate on one row at HBase speed (milliseconds) if you need to inform an interactive user response. Or it can operate on the whole database in batch via Hadoop MapReduce.
- It is reasonable to think of Produce as mainly doing two things. One is the aforementioned serving of data out of WibiData into interactive applications. The other is scoring, classifying, recommending, etc. on individual users (i.e. rows), in line with an analytic model.
- Gather typically operates on all your rows at once, and emits suitable input for a MapReduce Reduce step. It is reasonable to think of Gather as being a key cog in the training of analytic models.
- HBase schema management is done at the WibiData system level, not directly in applications. There’s a WibiData HBase data dictionary, powered by a set of system tables, that specifies cell data types/record types and, in effect, primitive schemas.
|Categories: Data models and architecture, Hadoop, HBase, NoSQL, Predictive modeling and advanced analytics, Web analytics, WibiData||14 Comments|