At Google, our mission is organizing all of the world’s information. We use literally thousands of different data formats to represent networked messages between servers, index records in repositories, geospatial datasets, and more. Most of these formats are structured, not flat. This raises an important question: How do we encode it all?
That sounds like a lot. On the other hand, if “data format” is just a synonym for “table structure,” “file structure,” and/or “schema,” it sounds more plausible. Varda goes on to say
a simple lists-and-records model … solves the majority of problems
Come to think of it, that sounds very consistent with the idea that MapReduce solves a large fraction of Google’s data management issues.