November 2, 2011

The cool aspects of Odiago WibiData

Christophe Bisciglia and Aaron Kimball have a new company.

It’s called Odiago, and is one of my gratifyingly more numerous tiny clients.
Odiago’s product line is called WibiData, after the justly popular We Be Sushi restaurants.
We’ve agreed on a split exclusive de-stealthing launch. You can read about the company/founder/investor stuff on TechCrunch. But this is the place for — well, for the tech crunch.

WibiData is designed for management of, investigative analytics on, and operational analytics on consumer internet data, the main examples of which are web site traffic and personalization and their analogues for games and/or mobile devices. The core WibiData technology, built on HBase and Hadoop,* is a data management and analytic execution layer. That’s where the secret sauce resides. Also included are:

- REST APIs for interactive access.
- Import/export tools, including JDBC access.
- Management tools.
- Analytic libraries: data mining, predictive analytics, machine learning, and so on.

The whole thing is in beta, with about three (paying) beta customers.

*And Avro and so on.

The core ideas of WibiData include:

- ALL data pertaining to a single user (or mobile device) is kept in a single, possibly very long, HBase row.
- There are two primary operators in WibiData, Produce and Gather.
  - Produce operates on single rows. It can operate on one row at HBase speed (milliseconds) if you need to inform an interactive user response. Or it can operate on the whole database in batch via Hadoop MapReduce.
  - It is reasonable to think of Produce as mainly doing two things. One is the aforementioned serving of data out of WibiData into interactive applications. The other is scoring, classifying, recommending, etc. on individual users (i.e. rows), in line with an analytic model.
  - Gather typically operates on all your rows at once, and emits suitable input for a MapReduce Reduce step. It is reasonable to think of Gather as being a key cog in the training of analytic models.
- HBase schema management is done at the WibiData system level, not directly in applications. There’s a WibiData HBase data dictionary, powered by a set of system tables, that specifies cell data types/record types and, in effect, primitive schemas.
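
To make the Produce/Gather split a bit more concrete, here is a minimal Python sketch. It is not WibiData's actual API, which Odiago has not disclosed; the function names (`produce`, `gather`, `run_gather_job`), the toy scoring rule, and the in-memory dict standing in for an HBase table are all assumptions made purely for illustration.

```python
from collections import defaultdict

# Hypothetical stand-in for an HBase table: one row per user, keyed by user ID.
# Each row holds that user's whole history.
table = {
    "user1": {"page_views": ["home", "sports", "sports"]},
    "user2": {"page_views": ["home", "finance"]},
    "user3": {"page_views": ["sports"]},
}

def produce(row):
    """Produce-style step: look at ONE row and classify that user."""
    views = row["page_views"]
    # Toy "model" (an assumption): the most-viewed page is the user's segment.
    # Sorting first makes tie-breaking deterministic.
    return max(sorted(set(views)), key=views.count)

def gather(row):
    """Gather-style step: emit (key, value) pairs from one row for a reducer."""
    for page in row["page_views"]:
        yield page, 1

def run_gather_job(rows):
    """Run gather over every row, then reduce by summing the values per key."""
    totals = defaultdict(int)
    for row in rows.values():
        for key, value in gather(row):
            totals[key] += value
    return dict(totals)

# Interactive use: Produce on a single row.
segment = produce(table["user1"])

# Batch use: Produce over every row in the table.
segments = {user: produce(row) for user, row in table.items()}

# Gather over all rows, feeding an aggregate that a model could train on.
page_counts = run_gather_job(table)
```

The point of the sketch is only the shape of the two operators: `produce` never needs to see more than one row, so it can run either interactively or in batch, while `gather` exists to feed a cross-row reduce step.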

WibiData-enhanced HBase differs from relational DBMS in most of the ways you would imagine, both good and bad. In particular:

- Depending on how you look at it, WibiData-enhanced HBase either has no DML (Data Manipulation Language) at all, or else has one that’s a lot less rich than SQL.
- WibiData-enhanced HBase schemas are much more dynamic than SQL schemas.
- WibiData-enhanced HBase schemas can have nested or recursive data structures, such as array-valued cells.

To expand on each of those points in turn:

WibiData’s underlying one-giant-table philosophy notwithstanding, there are times you manage multiple tables with it. (For example, you ingest data into WibiData however you can, and then run transformations, typically in batch, until the data is in the preferred structure.) While WibiData does have ways to simulate joins, foreign keys, and so on, there’s nothing resembling referential integrity or foreign key constraints.
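
How WibiData simulates joins has not been disclosed. The Python sketch below only illustrates the general consequence of having key lookups without referential integrity. The table names, columns, and the `join_user_campaign` helper are hypothetical.

```python
# Two hypothetical tables, each a plain dict keyed by row key.
users = {
    "user1": {"name": "Ann", "last_campaign": "c42"},
    "user2": {"name": "Bob", "last_campaign": "c99"},  # "c99" has no matching row
}
campaigns = {
    "c42": {"title": "Fall sale"},
}

def join_user_campaign(user_key):
    """Application-level 'join': follow a key from one table into another."""
    user = users[user_key]
    # Nothing enforces that the referenced campaign exists, so the lookup
    # may come back empty and the caller has to cope with that.
    campaign = campaigns.get(user["last_campaign"])
    return {"name": user["name"], "campaign": campaign}

joined = [join_user_campaign(key) for key in sorted(users)]
```

Here the second joined record carries `None` for its campaign. In a relational DBMS, a foreign key constraint would normally have rejected the dangling reference at write time.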

WibiData takes single-table schema flexibility to an extreme. Not only can different rows in the same table have different associated columns — something that relational systems can in effect also do via NULL values — but schemas can even change over the life of a column. If you have an array-valued cell storing the results of a marketing campaign, and you start recording more data partway through the campaign, then different rows in the table will, in the same column, hold different-sized arrays.
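
As a rough illustration of what that means for whoever reads the data, here is a small Python sketch. The field names, the `campaign` column, and the `read_campaign` helper are assumptions for the example, not anything from WibiData itself.

```python
# Hypothetical layout of the array-valued "campaign" cell.
# "conversions" is assumed to have been added partway through the campaign.
FIELDS = ["impressions", "clicks", "conversions"]

rows = {
    "user1": {"campaign": [3, 1]},     # written before the change: 2 values
    "user2": {"campaign": [5, 2, 1]},  # written after the change: 3 values
    "user3": {},                       # this row has no campaign column at all
}

def read_campaign(row):
    """Decode the campaign cell, tolerating shorter arrays and missing cells."""
    values = row.get("campaign", [])
    # Pad with None so every row decodes to the same set of field names.
    padded = list(values) + [None] * (len(FIELDS) - len(values))
    return dict(zip(FIELDS, padded))

decoded = {user: read_campaign(row) for user, row in rows.items()}
```

The burden of reconciling old and new shapes falls on the reading code (or on a system-level data dictionary), rather than on a migration that rewrites every existing row.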

That nesting can also get pretty serious; where you’d have a single value in a relational table, you might have the equivalent of a whole relational table (or at least selection/view) in WibiData-enhanced HBase. For example, if a user visits the same web page ten times, and each time 50 attributes are recorded (including a timestamp), all 500 data – to use the word “data” in its original “plural of datum” sense – would likely be stored in the same WibiData cell.
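
The arithmetic in that example can be sketched in Python as follows. The cell is modeled as a list of records; the column name `visits:/home`, the attribute names, and the `make_visit` helper are made up for illustration and say nothing about how WibiData actually encodes cells.

```python
from datetime import datetime, timedelta

def make_visit(n):
    """Build one visit record with 50 attributes, one of which is a timestamp."""
    record = {"timestamp": datetime(2011, 11, 2) + timedelta(minutes=n)}
    # 49 further placeholder attributes, for 50 in total.
    record.update({f"attr_{i}": i * n for i in range(1, 50)})
    return record

# One cell: ten visits by the same user to the same page.
cell = [make_visit(n) for n in range(10)]

# The cell lives in a single column of the user's single row.
row = {"visits:/home": cell}

# 10 visits x 50 attributes each = 500 data in one cell.
total_data = sum(len(visit) for visit in cell)

# The nested records can be queried much like a small table,
# e.g. to find the most recent visit.
latest = max(cell, key=lambda visit: visit["timestamp"])
```

In relational terms the cell behaves like a ten-row, fifty-column table nested inside a single field.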

That’s about all Odiago is disclosing about WibiData right now. Christophe will also be talking at Hadoop World next week, and presumably can be hit up with any burning questions then.

Categories: Data models and architecture, Hadoop, HBase, NoSQL, Predictive modeling and advanced analytics, Web analytics, WibiData


Comments

14 Responses to “The cool aspects of Odiago WibiData”

Aaron Kimball on November 2nd, 2011 11:38 am

Regarding your description of our system as designed for “consumer internet data:”

WibiData has enjoyed the most traction to date in the high-tech industry, but works well with any type of user- or customer-centric data: finance, retail, etc.

For more info, come see our Hadoop World talk!

Vlad Rodionov on November 2nd, 2011 2:35 pm

Hadoop county, California, 2011. Gold Rush. Thinking about opening liqueur store and saloon over there.

Below the surface of Cloudera founder’s new project — Cloud Computing News on November 2nd, 2011 3:11 pm

[…] industry analyst Curt Monash delved into that issue on his DBMS2 blog, explaining how WibiData does what it does. Here’s how Monash describes the […]

Investigative Analytics: Cloudera Founder Launches New Startup Backed by Eric Schmidt | SiliconANGLE on November 2nd, 2011 3:31 pm

[…] Odiago this morning, giving to the business side scoop to TechCrunch and the technical details to Curt Monash. The company is launching a product called Wibidata (“we be data”) specializing in data […]

Cloudera founder’s new project shows Hadoop’s future - Actualidad on November 2nd, 2011 4:39 pm

[…] industry analyst Curt Monash delved into that issue on his DBMS2 blog, explaining how WibiData does what it does. Here’s how Monash describes the […]

Richard Tibbetts on November 2nd, 2011 6:31 pm

I’d be interested to understand how the interactive store compares to Mongo for app development. I’ve been thinking integration between mongo apps and an analytic backend or replica would be attractive.

UW CSE News » TechCrunch on UW CSE alum Christophe Bisciglia’s startup Odiago on November 3rd, 2011 1:23 am

[…] Read the TechCrunch post here. A more technical description of Odiago’s WibiData offering appears in DBMS2 here. […]

The cool aspects of Odiago WibiData — Functionals on November 3rd, 2011 3:00 am

[…] built on HBase and Hadoop,* is a data management and analytic execution layer. More details here. Cancel […]

Cloudera Founder Launches Odiago and Big Data Product WibiData | Cloud Computing Today on November 5th, 2011 12:48 pm

[…] Curt Monash provides a terrific summary of the technical components of WibiData in his blog DBMS2. […]

The cool aspects of Odiago WibiData « Another Word For It on November 5th, 2011 7:42 pm

[…] The cool aspects of Odiago WibiData […]

How WibiData Works | WibiData on February 6th, 2012 8:38 pm

[…] Monash, author of DBMS2 also does a great job of summarizing some of the highlights of WibiData, as well as helping clarify how WibiData fits into the taxonomy of investigative, operational, and […]

How WibiData Works | Data Radar on February 15th, 2012 9:25 pm

[…] Monash, author of DBMS2 also does a great job of summarizing some of the highlights of WibiData, as well as helping clarify how WibiData fits into the taxonomy of investigative, operational, and […]

Strata Week: Cloudera founder has a new data product - O'Reilly Radar on August 29th, 2012 7:00 pm

[…] Hbase to analyze consumer web data. Database industry analyst Curt Monash describes WibiData on his DBMS2 blog: WibiData is designed for management of, investigative analytics on, and operational analytics on […]

WibiData and its Kiji technology | DBMS 2 : DataBase Management System Services on June 30th, 2013 10:46 pm

[…] Spring — running over Hadoop/HBase. Except for some newfound modularity, it is much like what I described at the time of WibiData’s launch or what WibiData further disclosed a few months later. Key aspects […]

