June 20, 2011

Temporal data, time series, and imprecise predicates

I’ve been confused about temporal data management for a while, because there are several different things going on.

In essence, the point of time series/event series SQL functionality is to do SQL against incomplete, imprecise, or derived data.* For example, suppose in one time series events happen at times 3.00, 3.01, 3.03, and 3.05; in another time series events happen at times 3.00, 3.02, 3.03, 3.04, and 3.05; and you want to join the time series together. Then you can do an event series join — i.e., you can join on each of the times 3.00, 3.01, 3.02, 3.03, 3.04, and 3.05, using interpolated values to check WHERE conditions. Vertica says that the only interpolation methods anybody ever wants are “first value in the interval,” “last value in the interval,” and “linear average of the endpoint values” (I forget whether that’s weighted by time-distance from the endpoints, or is a simple arithmetic mean).

*This is a limited counterexample to my dictum that you should explicitly store derived data because it’s too much trouble to keep re-deriving it on the fly.
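To make the event series join concrete, here is a minimal Python sketch of the idea, not Vertica's implementation. The helper names (`interpolate`, `event_series_join`) and the sample values are hypothetical, times are stored as integer hundredths (300 means 3.00) to keep the arithmetic exact, and the "linear" option assumes a time-weighted average, which is exactly the detail I said I'd forgotten.

```python
from bisect import bisect_right

def interpolate(series, t, method="prev"):
    """Value of a sorted [(time, value), ...] series at time t.

    "prev" carries the last observed value forward; "linear" is a
    time-weighted average of the neighboring points (an assumption).
    """
    times = [ts for ts, _ in series]
    i = bisect_right(times, t)          # number of points with time <= t
    if i == 0:
        return None                     # nothing observed yet at time t
    t0, v0 = series[i - 1]
    if t0 == t or method == "prev" or i == len(series):
        return v0
    t1, v1 = series[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)

def event_series_join(a, b, method="prev"):
    """Join two series on the union of their timestamps, filling gaps."""
    all_times = sorted({t for t, _ in a} | {t for t, _ in b})
    return [(t, interpolate(a, t, method), interpolate(b, t, method))
            for t in all_times]

# Hypothetical values; the timestamps mirror the example in the text.
a = [(300, 10.0), (301, 11.0), (303, 13.0), (305, 15.0)]
b = [(300, 100.0), (302, 102.0), (303, 103.0), (304, 104.0), (305, 105.0)]

joined_prev = event_series_join(a, b, method="prev")
joined_linear = event_series_join(a, b, method="linear")
```

Each joined row carries a value from both series at every one of the six timestamps, so a WHERE-style filter can be applied to rows where one side had no actual event.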

Also cool is the “CONDITIONAL_TRUE_EVENT” syntax Vertica has had since Release 4.0, which generalized SQL 99 windowing; you can now look at all the rows that meet a specific criterion, defined via an arbitrary expression, rather than being restricted to a row count or strict time duration. Vertica says it’s gone further in the direction of event series pattern matching in Vertica 5.0; I didn’t grasp the details, but it sounded philosophically akin to Aster Data’s nPath, albeit without the arbitrary-language procedural extensibility.
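A rough Python sketch of that kind of condition-driven windowing follows. It assumes the function behaves as a running counter that starts at 0 and increments on every row where the expression is true, so that rows sharing a counter value form one window; the function name, the tick data, and the threshold are all made up for illustration.

```python
from itertools import groupby

def conditional_true_event(rows, predicate):
    """Assign each row a window id that bumps whenever predicate(row) is true."""
    window_id = 0
    ids = []
    for row in rows:
        if predicate(row):
            window_id += 1              # a true row opens a new window
        ids.append(window_id)
    return ids

# Hypothetical (sequence number, bid) rows, already ordered by time.
ticks = [(1, 10.24), (2, 10.20), (3, 10.28), (4, 10.21), (5, 10.30)]

# Open a new window every time the bid exceeds 10.25.
window_ids = conditional_true_event(ticks, lambda r: r[1] > 10.25)

# Collect the rows of each window; equal ids are always adjacent.
windows = {
    wid: [row for _, row in grp]
    for wid, grp in groupby(zip(window_ids, ticks), key=lambda p: p[0])
}
```

The window boundaries here come from the data itself rather than from a fixed row count or time span, which is the generalization being described.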

Finally, Vertica also gave me an imprecise-SQL example that has little to do with time series. Vertica has a concept of “range join,” implemented so that telecom firms can save space by storing partial IP addresses. I’ve noted before that while we should retain all human-generated data, it will never be practical to retain all machine-generated data (because its volume will keep going up based on the same technological factors that keep storage cost per unit of data going down). This sounds like one interesting (if specialized) approach to storing machine-generated data in summarized form.
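As an illustration of what a range join does, here is a small Python sketch. It assumes a "partial IP address" means a prefix such as 10.1.0.0/16 that expands to a low/high address range, and that the ranges don't overlap; the function names, labels, and sample events are hypothetical, and this says nothing about how Vertica actually implements the join.

```python
import ipaddress
from bisect import bisect_right

def prefix_to_range(prefix):
    """Expand a partial address (a CIDR prefix) into integer (low, high) bounds."""
    net = ipaddress.ip_network(prefix)
    return int(net.network_address), int(net.broadcast_address)

def range_join(events, ranges):
    """Match each (ip, payload) event to the (low, high, label) range containing it.

    Assumes the ranges do not overlap, so each event matches at most one.
    """
    ranges = sorted(ranges)
    starts = [lo for lo, _, _ in ranges]
    matched = []
    for ip_str, payload in events:
        ip = int(ipaddress.ip_address(ip_str))
        i = bisect_right(starts, ip) - 1    # last range starting at or before ip
        if i >= 0 and ip <= ranges[i][1]:
            matched.append((ip_str, payload, ranges[i][2]))
    return matched

ranges = [
    (*prefix_to_range("10.1.0.0/16"), "region-a"),
    (*prefix_to_range("10.2.0.0/16"), "region-b"),
]
events = [("10.1.4.7", "call-1"), ("10.2.200.9", "call-2"), ("192.168.0.1", "call-3")]

matched = range_join(events, ranges)
```

The join predicate is "address falls between low and high" rather than equality, so one stored prefix row stands in for every full address beneath it.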
