October 19, 2011

What those nested data structures are about

As I’ve noted before, the very big web companies have an issue with nested data structures. The subject came up in XLDB talks yesterday too, so my big goal for lunch was to finally understand what was being talked about. Sitting at a table full of eBay and LinkedIn folks turned out to be a good tactic.

The explanation was led by Oliver Ratzesberger, late of eBay* and progenitor of eBay’s Singularity project. In simplest terms, one event can spawn a lot of event attribute information, perhaps in the form of name-value pairs, which it then makes sense to store together in some way. The example Oliver dwelled on was that, on any given web page, there can be 100+ pieces of information to record, including:

*Edit: Oliver subsequently moved on to Sears and then Teradata.

There are several reasons why one might wish to store this information in ways that grieve relational purists. First, reconstructing all this information via joins would be brutally expensive. What’s more, reconstructing it via joins could be impractical or even impossible. Some of it comes from third-party ad servers, which might not reproduce the same ads upon demand. Some of it is in the form of rankings, which can’t always be reliably reproduced from one query to the next. (That’s just one of several reasons text search and relational DBMS are an awkward fit.)
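To make the "store it together" idea concrete, here is a minimal Python sketch, not a description of what eBay or anyone else actually does. It records the ads and rankings as they were served, nested inside a single event record, so nothing has to be re-derived later. Every field name and value below (event_id, ads_shown, ranked_items and so on) is a hypothetical chosen for illustration, and JSON is just one convenient serialization.

```python
import json

# Hypothetical page-view event. All field names and values are illustrative
# assumptions, not taken from any real system.
page_view = {
    "event_id": "e-1001",
    "page_type": "search_results",
    "attributes": {
        "query": "vintage camera",
        # Ads as returned by a third-party ad server at the time of the event.
        "ads_shown": [
            {"slot": "top", "ad_id": "ad-77", "source": "third_party"},
            {"slot": "side", "ad_id": "ad-12", "source": "third_party"},
        ],
        # Rankings as they were actually displayed to the user.
        "ranked_items": [
            {"rank": 1, "item_id": "i-305"},
            {"rank": 2, "item_id": "i-118"},
        ],
    },
}


def store(event):
    """Serialize the whole nested event as a single value."""
    return json.dumps(event)


def load(blob):
    """Recover the nested event exactly as it was recorded."""
    return json.loads(blob)


restored = load(store(page_view))
top_item = restored["attributes"]["ranked_items"][0]["item_id"]
```

Reading the event back needs no joins and no second call to the ad server or the ranking engine; what the user saw is simply part of the record.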

Also, there’s a strong dynamic schema flavor to these databases. The list of attributes for one web click might be very different in kind from the list for the next click. Forcing that kind of variability into a fixed relational schema, while theoretically possible, doesn’t necessarily make a lot of sense.
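A small Python sketch of that variability, under made-up assumptions: the two events and all of their attribute names are hypothetical, and attribute_paths is a helper written for this illustration. It lists the attribute paths each event carries and compares them with the column set a single fixed table would need to hold both.

```python
def attribute_paths(value, prefix=""):
    """Return the set of dotted attribute paths found in a nested value.

    Dict keys extend the path with ".key"; list elements extend it with "[]".
    """
    if isinstance(value, dict):
        paths = set()
        for key, child in value.items():
            child_prefix = f"{prefix}.{key}" if prefix else key
            paths |= attribute_paths(child, child_prefix)
        return paths
    if isinstance(value, list):
        paths = set()
        for child in value:
            paths |= attribute_paths(child, prefix + "[]")
        return paths
    return {prefix}


# Two hypothetical click events with very different attribute lists.
search_click = {
    "page_type": "search",
    "query": "vintage camera",
    "results": [{"rank": 1, "item_id": "i-305"}],
}
checkout_click = {
    "page_type": "checkout",
    "cart": {"item_count": 2, "total_cents": 4598},
}

search_paths = attribute_paths(search_click)
checkout_paths = attribute_paths(checkout_click)

shared = search_paths & checkout_paths       # attributes both events have
all_columns = search_paths | checkout_paths  # columns one fixed table would need
```

Here the two events share only page_type, while a single fixed table covering both would need six columns, most of them empty for any given row. With 100+ attributes per page and many page types, that sparsity grows quickly.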

Comments

6 Responses to “What those nested data structures are about”

  1. Michael McIntire on October 20th, 2011 6:42 pm

    Well Curt, here’s your answer… It’s Sears Holdings.

    http://www.linkedin.com/in/oliverratzesberger

  2. Curt Monash on October 21st, 2011 9:47 am

    Hi Michael!

    That was my first guess. Oliver was only modestly taken aback when I mentioned it. :)

    Never mind Sears; KMart had an industry-leading, publicly visible CIO in the 90s or so, Dave Carlson. Oliver is stepping into a rich tradition.

  3. Alessandro on January 29th, 2013 1:24 pm

    This might be a naïve question, but regarding “reconstructing all this information via joins would be brutally expensive,” what about materialized joins? I’ve never actually had the chance to use materialized joins (my current company uses Mongo and previous places I’ve worked never had this problem), so I have no idea how practical they are. But when I read about them, they seemed like a possible solution to this problem.

  4. DBMS development and other subjects | DBMS 2 : DataBase Management System Services on March 18th, 2013 1:32 am

    […] the nested data structure story? (It seems there is […]

  5. Impala and Parquet | DBMS 2 : DataBase Management System Services on January 31st, 2014 9:11 am

    […] addition to ordinary tables, Parquet can handle nested data structures, ala Dremel. That is, a field can be array-valued, a cell in the array can itself be array-valued, […]

  6. NoSQL vs. NewSQL vs. traditional RDBMS | DBMS 2 : DataBase Management System Services on March 28th, 2014 1:19 pm

    […] model. Increasingly often, dynamic schemas seem preferable to fixed ones. Internet-tracking nested data structures are just one of the […]
