It should surprise nobody that web analytics – and specifically clickstream data – is one of the biggest areas for high-end data warehousing. For example:
- I believe that both of the previously mentioned petabyte+ databases on Greenplum will feature clickstream data.
- Aster Data’s largest disclosed database, by almost two orders of magnitude, is at MySpace.
- Clickstream analytics is a big application area for Vertica Systems.
- Clickstream analytics is a big application area for Netezza.
- Infobright’s customer success stories appear to be concentrated in clickstream analytics.
- Coral8 tells me that CEP is also being used for clickstream data, although I suspect that a lot of Coral8’s evidence in that regard comes from a single flagship account. Edit: Actually, Coral8 has a bunch of clickstream customers.
But what surprised me a bit was to discover that clickstream data is joined at the hip to more general network event data. I hadn’t heard much about that until recently. But over the past month or so, Greenplum, Aster, Vertica, Coral8, and some very big users have all told me the same thing:
Where there’s clickstream data, there’s usually also network event data – and the latter is in even higher volumes.
It’s obvious what one does with clickstream data (or at least what one tries to do – current web analytics are still highly primitive, for reasons I’ll lay out in another post). But what one does with network event data is a little murkier, and how one integrates clickstream and network event data is very unclear. I hear phrases like “All that TIBCO data is just falling on the floor” from users and vendors alike. (Come to think of it, I’ve heard that exact phrasing from one vendor and one user, and the user is that vendor’s biggest customer … coincidence?) I hear the general sentiment that network event data is useful for optimizing network operations. But I haven’t gotten a lot more detail than that. Vertica did mention some network event management VARs – e.g., OpenService and nMetrics are on Vertica’s customer list – but I got the sense those might be network event management pure plays, not outfits looking at the integration of clickstream and network data.
The one clear application message I’ve gotten so far for clickstream/network event integration came from Coral8. The idea is:
- You track where on your websites there is anomalous user data – e.g., unusually high rates of abandonment – which might be indicative of technical difficulties.
- You track network anomalies, the vast majority of which are false alarms.
- If a website anomaly and a network anomaly do match up – boom, the network jocks spring into action and fix the problem as fast as they can.
Of course, coming from Coral8, that was a real-time story. But it seems as if similar scenarios would make sense on a data warehousing basis as well.
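To make the matching step in that scenario concrete, here is a minimal batch-style sketch in Python. It is not Coral8's method or anyone's product. The `Anomaly` record, the `component` key (some identifier that both the website side and the network side can be mapped to) and the five-minute window are all assumptions of mine for illustration. The only idea taken from the scenario above is that a network anomaly becomes interesting when a website anomaly lines up with it.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Anomaly:
    """One detected anomaly. All field names here are hypothetical."""
    source: str       # "web" (e.g. abandonment spike) or "network"
    component: str    # assumed shared key, e.g. a service or host name
    at: datetime      # when the anomaly was observed


def correlate(web, network, window=timedelta(minutes=5)):
    """Pair each website anomaly with network anomalies on the same
    component that occurred within `window` of it.

    Network anomalies with no matching website anomaly are simply
    left out, on the assumption that most of them are false alarms.
    """
    pairs = []
    for w in web:
        for n in network:
            same_place = w.component == n.component
            close_in_time = abs(w.at - n.at) <= window
            if same_place and close_in_time:
                pairs.append((w, n))
    return pairs


if __name__ == "__main__":
    noon = datetime(2008, 8, 1, 12, 0)
    web = [Anomaly("web", "checkout", noon)]
    network = [
        Anomaly("network", "checkout", noon + timedelta(minutes=3)),
        Anomaly("network", "checkout", noon + timedelta(hours=1)),
        Anomaly("network", "search", noon + timedelta(minutes=1)),
    ]
    for w, n in correlate(web, network):
        print(f"escalate: {n.component} network anomaly at {n.at} "
              f"matches web anomaly at {w.at}")
```

The nested loop is fine for a sketch. At warehouse volumes the same pairing would more plausibly be a join on the component key with a time-range condition, and in the real-time version it would be a windowed pattern match over two event streams.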