Derived data

The management of data that is derived, augmented, enhanced, adjusted, or cooked — as opposed to just the raw stuff.

May 30, 2011

Another category of derived data

Six months ago, I argued the importance of derived analytic data, saying

… there’s no escaping the importance of derived/augmented/enhanced/cooked/adjusted data for analytic data processing. The five areas I have in mind are, loosely speaking:

  • Aggregates, when they are maintained, generally for reasons of performance or response time.
  • Calculated scores, commonly based on data mining/predictive analytics.
  • Text analytics.
  • The kinds of ETL (Extract/Transform/Load) Hadoop and other forms of MapReduce are commonly used for.
  • Adjusted data, especially in scientific contexts.

Probably there are yet more examples that I am at the moment overlooking.

Well, I did overlook at least one category. :)

A surprisingly important kind of derived data is metadata, especially for large, poly-structured data sets. For example, CERN has vastly quantities of experiment sensor data, stored as files; just the metadata alone fills over 10 terabytes in an Oracle database. MarkLogic is big on storing derived metadata, both on the publishing/media and intelligence sides of the business.

Read more

November 29, 2010

Data that is derived, augmented, enhanced, adjusted, or cooked

On this food-oriented weekend, I could easily go on long metaphorical flights about the distinction between “raw” and “cooked” data. I’ll spare you that part — reluctantly, given my fondness for fresh fruit, sushi, and steak tartare — but there’s no escaping the importance of derived/augmented/enhanced/cooked/adjusted data for analytic data processing. The five areas I have in mind are, loosely speaking:

Read more

October 6, 2010

eBay followup — Greenplum out, Teradata > 10 petabytes, Hadoop has some value, and more

I chatted with Oliver Ratzesberger of eBay around a Stanford picnic table yesterday (the XLDB 4 conference is being held at Jacek Becla’s home base of SLAC, which used to stand for “Stanford Linear Accelerator Center”). Todd Walter of Teradata also sat in on the latter part of the conversation. Things I learned included:  Read more

← Previous Page

Feed: DBMS (database management system), DW (data warehousing), BI (business intelligence), and analytics technology Subscribe to the Monash Research feed via RSS or email:

Login

Search our blogs and white papers

Monash Research blogs

User consulting

Building a short list? Refining your strategic plan? We can help.

Vendor advisory

We tell vendors what's happening -- and, more important, what they should do about it.

Monash Research highlights

Learn about white papers, webcasts, and blog highlights, by RSS or email.