June 20, 2011

Temporal data, time series, and imprecise predicates

I’ve been confused about temporal data management for a while, because there are several different things going on.

- Date arithmetic. This of course has been around for a very long ... er, for a very long time.
- Time-series-aware compression. This has been around for quite a while too.
- “Time travel”/snapshotting: preserving the state of the database at previous points in time. This is a matter of exposing (and not throwing away) the information you capture via MVCC (Multi-Version Concurrency Control) and/or append-only updates (as opposed to update-in-place). Those update strategies are increasingly popular for pretty much anything except update-intensive OLTP (OnLine Transaction Processing) DBMS, so time-travel/snapshotting is an achievable feature for most vendors.
- Bitemporal data access. This occurs when a fact has both a transaction timestamp and a separate validity duration. A Wikipedia article seems to cover the subject pretty well, and I touched on Teradata’s bitemporal plans back in 2009.
- Time series SQL extensions. Vertica explained its version of these to me a few days ago. I imagine Sybase IQ and other serious financial-trading market players have similar features.

In essence, the point of time series/event series SQL functionality is to do SQL against incomplete, imprecise, or derived data.* For example, suppose in one time series events happen at times 3.00, 3.01, 3.03, and 3.05; in another time series events happen at times 3.00, 3.02, 3.03, 3.04, and 3.05; and you want to join the time series together. Then you can do an event series join — i.e., you can join on each of the times 3.00, 3.01, 3.02, 3.03, 3.04, and 3.05, using interpolated values to check WHERE conditions. Vertica says that the only interpolation methods anybody ever wants are “first value in the interval,” “last value in the interval,” and “linear average of the endpoint values” (I forget whether that’s weighted by time-distance from the endpoints, or is a simple arithmetic mean).

*This is a limited counterexample to my dictum that you should explicitly store derived data because it’s too much trouble to keep re-deriving it on the fly.
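
To make the event series join above concrete, here is a minimal Python sketch. It is not Vertica's implementation. The function names (`value_at`, `event_series_join`) and the sample values are made up for illustration, and times are written in hundredths (300 means 3.00) to keep the arithmetic exact. Two interpolation choices are shown: carrying the previous value forward, and a linear interpolation that is assumed here to be weighted by time-distance from the endpoints.

```python
from bisect import bisect_right

def value_at(series, t, method="previous"):
    """Return the (possibly interpolated) value of a sorted series at time t.

    series is a list of (time, value) pairs sorted by time.
    Returns None if t is earlier than the first event.
    """
    if method not in ("previous", "linear"):
        raise ValueError(f"unknown method: {method}")
    times = [ts for ts, _ in series]
    i = bisect_right(times, t)          # number of events at or before t
    if i == 0:
        return None
    t0, v0 = series[i - 1]
    # Exact hit, no later event, or "previous" mode: use the last known value.
    if method == "previous" or t0 == t or i == len(series):
        return v0
    # Linear mode (assumption: weighted by time-distance between endpoints).
    t1, v1 = series[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)

def event_series_join(a, b, method="previous"):
    """Join two series on the union of their timestamps."""
    times = sorted({t for t, _ in a} | {t for t, _ in b})
    return [(t, value_at(a, t, method), value_at(b, t, method)) for t in times]

# Hypothetical data matching the timestamps in the example above.
series_a = [(300, 10.0), (301, 11.0), (303, 13.0), (305, 15.0)]
series_b = [(300, 1.0), (302, 2.0), (303, 3.0), (304, 4.0), (305, 5.0)]

for row in event_series_join(series_a, series_b, method="linear"):
    print(row)
```

A WHERE condition would then be evaluated against each joined row, including the rows where one side's value was interpolated rather than stored.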

Also cool is the “CONDITIONAL_TRUE_EVENT” syntax Vertica has had since Release 4.0, which generalizes SQL 99 windowing; you can now look at all the rows that meet a specific criterion, expressed as an arbitrary expression, rather than being restricted to a row count or strict time duration. Vertica says it’s gone further in the direction of event series pattern matching in Vertica 5.0; I didn’t grasp the details, but it sounded philosophically akin to Aster Data’s nPath, albeit without the arbitrary-language procedural extensibility.
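
The idea of condition-based windows can be sketched in a few lines of Python. This is an illustration of the concept, not Vertica's code: it assumes the window number starts at 0 and goes up by one on each row where the condition is true, and the helper name `conditional_true_event` and the sample prices are hypothetical.

```python
def conditional_true_event(rows, predicate):
    """Assign a window number to each row, in order.

    Assumed semantics: numbering starts at 0 and increments by 1 on every
    row for which predicate(row) is true; that row belongs to the new window.
    """
    window = 0
    numbers = []
    for row in rows:
        if predicate(row):
            window += 1
        numbers.append(window)
    return numbers

# Hypothetical bid prices, already ordered by time.
prices = [10.0, 10.4, 10.7, 10.5, 10.9, 10.2]
windows = conditional_true_event(prices, lambda p: p > 10.6)
print(list(zip(prices, windows)))
```

Rows sharing a window number can then be grouped or aggregated together, which is what makes the windows condition-driven rather than fixed by row count or duration.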

Finally, Vertica also gave me an imprecise-SQL example that has little to do with time series. Vertica has a concept of “range join,” implemented so that telecom firms can save space by storing partial IP addresses. I’ve noted before that while we should retain all human-generated data, it will never be practical to retain all machine-generated data (because its volume will keep going up based on the same technological factors that keep storage cost per unit volume going down). This sounds like one interesting (if specialized) approach to storing machine-generated data in summarized form.
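
As a rough illustration of what a range join does, the Python sketch below matches full IP addresses against stored prefixes. It says nothing about how Vertica implements the feature. It assumes the stored ranges are non-overlapping IPv4 CIDR blocks, and the names `build_ranges` and `range_join`, along with the sample labels, are invented for the example.

```python
import ipaddress
from bisect import bisect_right

def build_ranges(prefixes):
    """Turn {CIDR string: label} into a sorted list of (start, end, label).

    Assumption: the prefixes do not overlap.
    """
    ranges = []
    for cidr, label in prefixes.items():
        net = ipaddress.ip_network(cidr)
        ranges.append((int(net.network_address), int(net.broadcast_address), label))
    ranges.sort()
    return ranges

def range_join(addresses, ranges):
    """Pair each address with the label of the range containing it, if any."""
    starts = [start for start, _, _ in ranges]
    matches = []
    for addr in addresses:
        n = int(ipaddress.ip_address(addr))
        i = bisect_right(starts, n) - 1   # last range starting at or before n
        if i >= 0 and n <= ranges[i][1]:
            matches.append((addr, ranges[i][2]))
    return matches

# Hypothetical partial addresses (prefixes) and full addresses to match.
ranges = build_ranges({"10.1.0.0/16": "region-a", "192.168.5.0/24": "region-b"})
print(range_join(["10.1.42.7", "192.168.5.200", "8.8.8.8"], ranges))
```

The space saving comes from the left-hand table: one stored prefix stands in for every full address it covers.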

Categories: Analytic technologies, Data types, Investment research and trading, Log analysis, Sybase, Telecommunications, Theory and architecture, Vertica Systems
