October 18, 2013

Entity-centric event series analytics

Much of modern analytic technology deals with what might be called an entity-centric sequence of events. For example:

Analytic questions are asked along the lines of “Which sequences of events are most productive in terms of leading to the events we really desire?”, where the desired events are things such as product sales. Another major area is sessionization, along with data preparation tasks that boil down to arranging data into meaningful event sequences in the first place.
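To make sessionization and the "which sequences lead to the desired event" question concrete, here is a minimal Python sketch. It is not taken from any of the products mentioned in this post. The `(entity_id, timestamp, event_type)` tuple layout, the 30-minute inactivity gap, the event names, and the function names `sessionize` and `paths_to` are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def sessionize(events, gap=timedelta(minutes=30)):
    """Group (entity_id, timestamp, event_type) tuples into per-entity sessions.

    A new session starts whenever the time since the entity's previous
    event exceeds `gap`. The 30-minute default is an assumed convention.
    """
    by_entity = defaultdict(list)
    for entity_id, ts, event_type in events:
        by_entity[entity_id].append((ts, event_type))

    sessions = defaultdict(list)
    for entity_id, series in by_entity.items():
        series.sort(key=lambda e: e[0])  # order each entity's events by time
        current, last_ts = [], None
        for ts, event_type in series:
            if last_ts is not None and ts - last_ts > gap:
                sessions[entity_id].append(current)
                current = []
            current.append((ts, event_type))
            last_ts = ts
        if current:
            sessions[entity_id].append(current)
    return dict(sessions)


def paths_to(sessions, target="purchase"):
    """Count the event-type sequences that precede the first `target` in a session."""
    counts = defaultdict(int)
    for entity_sessions in sessions.values():
        for session in entity_sessions:
            types = [t for _, t in session]
            if target in types:
                counts[tuple(types[: types.index(target)])] += 1
    return dict(counts)


# Hypothetical sample data: two devices, one desired event type ("purchase").
t0 = datetime(2013, 10, 18, 9, 0)
events = [
    ("device-1", t0, "view"),
    ("device-1", t0 + timedelta(minutes=5), "add_to_cart"),
    ("device-1", t0 + timedelta(minutes=7), "purchase"),
    ("device-1", t0 + timedelta(hours=3), "view"),
    ("device-2", t0, "view"),
    ("device-2", t0 + timedelta(minutes=2), "purchase"),
]
sessions = sessionize(events)
paths = paths_to(sessions)
```

With this sample data, `device-1` ends up with two sessions, and `paths` counts one session in which a view and an add-to-cart preceded the purchase and one in which a single view did. Real systems have to do the same grouping at far larger scale and usually at interactive speed, which is where much of the difficulty lies.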

A number of my clients are focused on such scenarios, including WibiData, Teradata Aster (e.g. via nPath), Platfora (in the imminent Platfora 3), and others. And so I get involved in naming exercises. The term entity-centric came along a while ago, because “user-centric” is too limiting. (E.g., the data may not be about a person, but rather specifically about the actions taken on her mobile device.) Now I’m adding the term event series to cover the whole scenario, rather than the “event sequence(s)” I might appear to have been hinting at above.

I decided on “event series” earlier this week, after noting that: 

And that was even before I recalled hearing the term from Vertica a couple of years ago.

Analyzing event series is tricky even when all the events are of the same kind, and hence naturally fit into the same database table. For example:

When you’re correlating events from multiple database columns or tables – or their nested data structure equivalents – things get hairier yet.

I also think that predictive modeling on event series, a huge subject for consumer internet companies, still has a long way to go. How exactly do you characterize the independent variables? For that matter, how do you characterize the dependent ones?
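As one illustration of the questions above, and not a claim about how any particular company does it, a common starting point is to pick a cutoff time, summarize each entity's events before the cutoff as the independent variables, and define the dependent variable as whether a target event occurs in a window after the cutoff. The sketch below assumes a series of `(timestamp, event_type)` pairs for a single entity. The function name `build_example`, the 7-day horizon, the feature names, and the `"purchase"` target are all hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta


def build_example(series, cutoff, horizon=timedelta(days=7), target="purchase"):
    """Turn one entity's event series into a (features, label) pair.

    Features use only events strictly before `cutoff`. The label is 1 if
    a `target` event occurs in [cutoff, cutoff + horizon), else 0.
    """
    past = [(ts, t) for ts, t in series if ts < cutoff]
    future = [(ts, t) for ts, t in series if cutoff <= ts < cutoff + horizon]

    # Independent variables: per-type counts plus a simple recency measure.
    counts = Counter(t for _, t in past)
    features = {f"count_{t}": n for t, n in counts.items()}
    if past:
        last_ts = max(ts for ts, _ in past)
        features["hours_since_last_event"] = (cutoff - last_ts).total_seconds() / 3600

    # Dependent variable: did the desired event happen within the horizon?
    label = int(any(t == target for _, t in future))
    return features, label


# Hypothetical series for a single entity.
series = [
    (datetime(2013, 10, 16, 12, 0), "view"),
    (datetime(2013, 10, 17, 12, 0), "view"),
    (datetime(2013, 10, 17, 18, 0), "add_to_cart"),
    (datetime(2013, 10, 19, 10, 0), "purchase"),
]
features, label = build_example(series, cutoff=datetime(2013, 10, 18))
```

Counts and recency discard the ordering of events, which is exactly the information an event series carries. That loss is one reason the characterization questions remain open.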

Bottom line: Event series are likely to be a major subject of data management and analytics innovation for a number of years to come.

Comments

13 Responses to “Entity-centric event series analytics”

  1. Morgan Goeller on October 18th, 2013 8:30 am

    An interesting new engine for real time is Kibana (http://demo.kibana.org). A very beautiful and dynamic tool that really shows the power of NoSQL and text search for analytics in human real time. All JavaScript and rendered in the browser as well.

    The combo of Elasticsearch+Kibana is a really nice one-stop shop for building practical analytics on event series data, especially if there is fuzzy matching involved. We have seen a lot of success here, especially with identity resolution.

  2. Thomas Bernhardt on October 18th, 2013 12:24 pm

    I think event series analysis is simply CEP and there is no need for another term. Or perhaps CEP could be renamed to event series analysis.

    I see CEP as not necessarily about “streaming” or “immediacy”, since it's about providing domain-specific analysis for correlation and pattern detection on events, whether historical, recorded or arriving sooner or later.

    Esper CEP customers have been using our EPL (event processing language) for performing event series analysis for 6+ years.

    Product Info + docs:
    http://esper.codehaus.org
    Company info:
    http://www.espertech.com

    Thomas Bernhardt
    CTO EsperTech

  3. David Gruzman on October 21st, 2013 5:11 pm

    While MapReduce is not efficient for SQL emulation, I think it is a perfect match for this kind of processing.
    It is natural in Hadoop to take multiple inputs, group them by some key and then write custom logic in the reducer to analyze them.
    I do not think the logic of processing a series of events can be expressed in SQL. If so – why should I pay for a DB license to run my own logic?

  4. Curt Monash on October 21st, 2013 5:47 pm

    Hi David!

    I agree that this kind of processing is in most cases embarrassingly parallel (certain explorations in predictive modeling might be exceptions), which in most respects would make it a great candidate for MapReduce. However, in the ideal case a lot of it happens at interactive speeds, the most obvious example being the web page that is informed by your actions a moment ago on its predecessor. Those speeds aren’t a great fit for MapReduce — hence, for example, the Produce/Gather alternative from WibiData.

  5. Kyle Wild on October 21st, 2013 9:37 pm

    I like your thoughts on this, Curt. Thanks for sharing!

    I thought you or your audience may be interested in this further reading:

    How to Think About Event Data
    https://keen.io/blog/53958349217/analytics-for-hackers-how-to-think-about-event-data

  6. Julian Hyde on October 25th, 2013 5:02 pm

    Event series seem to be a new spin on an old idea. The “fact tables” that are central to the traditional data warehouse contain not entities but facts. A fact, so the definition goes, is an occurrence of a business event.

    Are event series different from facts? If not, can we please just stick with the old term?

    I don’t disagree that data management systems can do more to manage event data. DW practitioners have been struggling with problems like sessionization for years.

  7. Curt Monash on October 25th, 2013 7:01 pm

    Julian,

    I guess we could say, at a very high level of abstraction, that everything in a database records a fact and also that everything in a database records an event.

    But unless you want to say that therefore all of database management is trivial, because all databases in principle serve the same purposes, I don’t understand your point.

  8. Julian Hyde on October 26th, 2013 2:02 am

    Curt,

    “Event-series analysis” as you describe it is a data modeling technique. Of course all data modeling techniques ultimately boil down to storing real-world information as records on disk, but that doesn’t mean they’re all the same.

    Entity-relationship modeling (ER) is the best known modeling technique. Entities are generally represented as tables, and relationships by either foreign keys or intersection tables. An entity in the real world, such as a person, has identity, and its state changes over time. The corresponding record in the database has a key, and is updated as its attributes change.

    Dimensional modeling (DM) [1] is a different paradigm. In dimensional modeling, a fact represents an occurrence of a business process, not an entity. It doesn’t have a unique identifier, but it does have a timestamp. It is not generally updated, except to make corrections. A case can be made that facts can be aggregated to represent higher-level processes such as “monthly sales”.

    My point was that “event-series analysis” has a lot in common with DM, and in particular your “event” is very similar to DM’s “fact” concept. I value your insights into how people are using data in the real world, and this particular kind of analysis could be an interesting trend. But if you want to coin a new term, you have to explain to old dogs like me why the old one will not suffice.

    Julian

    [1] http://www.kimballgroup.com/1997/08/02/a-dimensional-modeling-manifesto/

  9. Curt Monash on October 26th, 2013 4:30 am

    Julian,

    “Event-series analysis” as you describe it is a data modeling technique.

    Inaccurate. For starters, I’m looking at full analytic stacks. Hence my references to BI UIs and to predictive modeling.

    Also, efforts may be made to accelerate certain operations that are awkward with standard RDBMS and SQL semantics. Hence my references to Aster nPath and to Vertica.

    My point was that “event-series analysis” has a lot in common with DM, and in particular your “event” is very similar to DM’s “fact” concept. I value your insights into how people are using data in the real world, and this particular kind of analysis could be an interesting trend. But if you want to coin a new term, you have to explain to old dogs like me why the old one will not suffice.

    Nor is the modeling necessarily traditional. For example, in a typical relational model, a fact table will have like facts. If there are different kinds of facts stored, they will be stored in different tables, which of course may share common keys. (And if the database is big enough to be parallelized, it is often wise to distribute all the tables on that common key.) By way of contrast, somebody who’s serious about event series analytics may arrange data differently. For example, WibiData/Kiji stores all facts about a single entity in a single HBase row, likely with lots of nesting.

    I think links supporting all of the above were in my post. If not, I’ll gladly post them down here.

  10. Rules for names | Strategic Messaging on November 3rd, 2013 1:33 am

    […] them for you — are sort of like time series but also somewhat like event streams. “Event series” was the winning […]

  11. An idealized log management and analysis system — from whom? | DBMS 2 : DataBase Management System Services on September 11th, 2014 1:58 pm

    […] more players are doing product management with an explicit conception either of log management or event-series analytics, so for this post I’ll share that focus […]

  12. Customer Journey Analytics | Chronomics on October 18th, 2014 1:13 pm

    […] custom methodology, algorithms, and software. Monash has a nice coverage of event stream analytics here. Most people confuse event stream analytics with complex event processing, or time series analysis […]

  13. Datameer at the time of Datameer 5.0 | DBMS 2 : DataBase Management System Services on October 26th, 2014 4:42 am

    […] Datameer does have a bit in the way of event series visualization, it seems […]
