Much of modern analytic technology deals with what might be called an entity-centric sequence of events. For example:
- You receive and open various emails.
- You click on and look at various web sites and pages.
- Specific elements are displayed on those pages.
- You study various products, and even buy some.
Analytic questions are asked along the lines of “Which sequences of events are most productive in terms of leading to the events we really desire, such as product sales?” Another major area is sessionization, along with data preparation tasks that boil down to arranging data into meaningful event sequences in the first place.
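To make sessionization concrete, here is a minimal sketch in Python. It assumes events arrive as (entity_id, timestamp, event_type) tuples and that a session ends after 30 minutes of inactivity. The tuple layout, the 30-minute cutoff, the `sessionize` name, and the sample data are all illustrative assumptions rather than anything a particular product does.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def sessionize(events, gap=timedelta(minutes=30)):
    """Group (entity_id, timestamp, event_type) tuples into per-entity sessions.

    A new session starts whenever two consecutive events for the same
    entity are more than `gap` apart. The 30-minute default is an assumption.
    """
    by_entity = defaultdict(list)
    for entity_id, ts, event_type in events:
        by_entity[entity_id].append((ts, event_type))

    sessions = []
    for entity_id, rows in by_entity.items():
        rows.sort(key=lambda r: r[0])  # order each entity's events by time
        current = [rows[0]]
        for prev, row in zip(rows, rows[1:]):
            if row[0] - prev[0] > gap:
                # Inactivity exceeded the cutoff: close the running session.
                sessions.append((entity_id, current))
                current = []
            current.append(row)
        sessions.append((entity_id, current))
    return sessions


# Made-up sample data: two entities, events deliberately out of order.
sample_events = [
    ("u1", datetime(2013, 1, 1, 9, 0), "open_email"),
    ("u2", datetime(2013, 1, 1, 9, 10), "page_view"),
    ("u1", datetime(2013, 1, 1, 9, 5), "page_view"),
    ("u1", datetime(2013, 1, 1, 10, 0), "purchase"),
]
sessions = sessionize(sample_events)
for entity_id, rows in sessions:
    print(entity_id, [event_type for _, event_type in rows])
```

Grouping by entity before sorting is what makes this entity-centric: the same code works whether the key is a person, a device, or a cookie.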
A number of my clients are focused on such scenarios, including WibiData, Teradata Aster (e.g. via nPath), Platfora (in the imminent Platfora 3), and others. And so I get involved in naming exercises. The term entity-centric came along a while ago, because “user-centric” is too limiting. (E.g., the data may not be about a person, but rather specifically about the actions taken on her mobile device.) Now I’m adding the term event series to cover the whole scenario, rather than the “event sequence(s)” I might appear to have been hinting at above.
I decided on “event series” earlier this week, after noting that:
- “Time series” isn’t quite right, because it generally refers to a collection of time-stamped data of a single datatype.
- “Event stream” isn’t quite right, because it connotes the immediacy of complex event/stream processing.
- “Series” sounds better than “sequence”. While “sequence” would be the more accurate term from a strict mathematical standpoint, that ship sailed when time series weren’t called “time sequences” instead.
And that was even before I recalled hearing the term from Vertica a couple of years ago.
Analyzing event series is tricky even when all the events are of the same kind, and hence naturally fit into the same database table. For example:
- Even the most specific of pattern-matches can, in SQL, require several nestings of time-stamp range sub-queries. (How else do you ensure that Event 2 happened after Event 1 but before Event 3?)
- The most common end-user business intelligence UIs aren’t well suited to such analyses; specific new ones are being invented instead. I think the new ones are already OK for static views – trees, funnels, etc. – but I haven’t seen anything yet that seems great for navigation, or for human real-time interaction.
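For contrast with the nested SQL sub-queries mentioned in the first bullet, the "Event 1, then Event 2, then Event 3" check is a single pass when written procedurally. The sketch below assumes one entity's events are (timestamp, event_type) pairs with comparable timestamps, and that "after" means a strictly later timestamp. The function name and data layout are hypothetical.

```python
def matches_in_order(events, pattern):
    """Return True if the event types in `pattern` occur in that order.

    `events` is an iterable of (timestamp, event_type) pairs for a single
    entity, in any order. Other events may occur in between; each matched
    event must have a strictly later timestamp than the previous match.
    """
    ordered = sorted(events, key=lambda e: e[0])
    i = 0           # index of the next pattern element to find
    last_ts = None  # timestamp of the most recent matched event
    for ts, event_type in ordered:
        if i == len(pattern):
            break
        if event_type == pattern[i] and (last_ts is None or ts > last_ts):
            last_ts = ts
            i += 1
    return i == len(pattern)


# Illustrative usage with integer timestamps.
clicks = [(3, "purchase"), (1, "open_email"), (2, "page_view")]
print(matches_in_order(clicks, ["open_email", "page_view", "purchase"]))
```

This only answers whether the pattern occurs at all. Counting matches, bounding the time between steps, or excluding certain intervening events would each need more machinery.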
When you’re correlating events from multiple database columns or tables – or their nested data structure equivalents – things get hairier yet.
I also think that predictive modeling on event series, a huge subject for consumer internet companies, still has a long way to go. How exactly do you characterize the independent variables? For that matter, how do you characterize the dependent ones?
Bottom line: Event series are likely to be a major subject of data management and analytics innovation for a number of years to come.