July 8, 2008

Google has thousands of internal data formats, mostly simple ones

In connection with the release of Protocol Buffers, Kenton Varda of Google wrote:

At Google, our mission is organizing all of the world’s information. We use literally thousands of different data formats to represent networked messages between servers, index records in repositories, geospatial datasets, and more. Most of these formats are structured, not flat. This raises an important question: How do we encode it all?

That sounds like a lot. On the other hand, if “data format” is just a synonym for “table structure,” “file structure,” and/or “schema,” it sounds more plausible. Varda goes on to say:

a simple lists-and-records model … solves the majority of problems

Come to think of it, that sounds very consistent with the idea that MapReduce solves a large fraction of Google’s data management issues.
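To make the “lists-and-records” idea concrete, here is a minimal sketch in Python. It is not Protocol Buffers' actual API or wire format. The record names (Person, PhoneNumber) and their fields are hypothetical, and JSON serves only as a stand-in encoding. The point is simply that a record whose fields are scalars, other records, or lists of these covers a lot of ground.

```python
from dataclasses import dataclass, field, asdict
from typing import List
import json


@dataclass
class PhoneNumber:
    # A record with two scalar fields (hypothetical example type).
    number: str
    kind: str = "home"


@dataclass
class Person:
    # A record holding scalars plus a list of nested records.
    name: str
    id: int
    phones: List[PhoneNumber] = field(default_factory=list)


person = Person(name="Ada", id=1, phones=[PhoneNumber("555-0100", "mobile")])

# asdict() recursively turns the nested records into plain dicts and lists,
# which any generic encoder can then serialize. JSON is used here purely
# for illustration.
encoded = json.dumps(asdict(person), sort_keys=True)
decoded = json.loads(encoded)
```

If most of the “thousands of formats” are definitions of this kind, each one is closer to a small schema than to a bespoke file format.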


2 Responses to “Google has thousands of internal data formats, mostly simple ones”

  1. Daniel Weinreb on July 9th, 2008 7:19 am

    The printed representation looks an awful lot like JSON (http://en.wikipedia.org/wiki/JSON). I wonder why not just use JSON, which is well-known and precisely specified? Anyway, this and JSON are very useful for many applications.

    I agree that it’s another IDL. It’s not all THAT simple. But I haven’t used IDLs too much in practice and probably it’s simpler than CORBA’s IDL! So, it looks nice; no major breakthrough or anything like that, just an incremental improvement on what we all know about. That’s fine; incremental improvements are perfectly respectable.

  2. Curt Monash on July 11th, 2008 4:37 am


    They addressed the JSON point directly, albeit briefly, in the comment thread.
