Edit: Unfortunately, this post and its sequel rely on Aster Data posts that Aster’s buyer Teradata no longer makes easily available.
At the same time as it rolled out its cloud story, Aster Data told of nPath, a MapReduce-based feature in nCluster. As best I understand it, the core idea of nPath is that it preprocesses sequential data via MapReduce so that you can then do ordinary SQL on it. (Steve Wooledge’s blog post about nPath outlines why that might be needed. Point 1 in Mayank Bawa’s August 2008 post is much more concise.) Now, that might seem to contradict the syntax, which is all about MapReduce being invoked via SQL. Still, it’s what’s really going on.
That leads to two obvious questions: What is nPath used (or useful) for? And how is the preprocessing actually done? Steve offers some ideas for the former in his post, but the most obvious applications are analyses of actual paths, most notably in web clickstreams. Since a large fraction of Aster’s current customers use it for web analytics, it seems natural that that’s what nPath is being used for today.
The answer to the latter question was plain old regular expressions, with the rest being up to you. That’s hardly surprising, as MapReduce’s much-vaunted support for text analytics seems to boil down to regex as well. Still, I suspect the world needs some higher-level MapReduce libraries for this kind of application really to take flight.
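To make the regex idea concrete, here is a minimal Python sketch of the general technique as I understand it, not Aster's actual nPath syntax or implementation. It assumes a clickstream of (session, timestamp, page) tuples; the `SYMBOLS` mapping, the `find_paths` function, and the sample data are all hypothetical names invented for illustration. Each session's ordered events are encoded as a string of one-letter symbols, a regular expression is run over that string, and every match becomes a flat row.

```python
import re
from collections import defaultdict

# Hypothetical mapping from page type to a one-character symbol.
SYMBOLS = {"home": "H", "search": "S", "product": "P", "checkout": "C"}


def find_paths(clicks, pattern):
    """Return (session_id, start_ts, end_ts, path) rows for each regex match.

    clicks: iterable of (session_id, timestamp, page) tuples, in any order.
    pattern: a regular expression over the symbols defined in SYMBOLS.
    """
    # "Map"-like step: partition the events by session.
    sessions = defaultdict(list)
    for session_id, ts, page in clicks:
        sessions[session_id].append((ts, page))

    regex = re.compile(pattern)
    rows = []
    # "Reduce"-like step: order each session, encode it, and match.
    for session_id in sorted(sessions):
        events = sorted(sessions[session_id])  # sorts by timestamp first
        encoded = "".join(SYMBOLS.get(page, "?") for _, page in events)
        for m in regex.finditer(encoded):
            if m.start() == m.end():
                continue  # ignore empty matches
            first, last = events[m.start()], events[m.end() - 1]
            rows.append((session_id, first[0], last[0], m.group()))
    return rows


clicks = [
    ("a", 1, "home"), ("a", 2, "search"), ("a", 3, "product"), ("a", 4, "checkout"),
    ("b", 1, "home"), ("b", 2, "product"),
    ("c", 2, "search"), ("c", 1, "home"), ("c", 3, "search"),
    ("c", 4, "product"), ("c", 5, "checkout"),
]

# One or more searches, then a product view, then a checkout.
rows = find_paths(clicks, r"S+PC")
for row in rows:
    print(row)
```

The output rows are plain tuples, so they could be loaded into a table and queried with ordinary SQL, which is the point of the preprocessing step. In a real MapReduce setting the partitioning by session would be spread across nodes rather than done in one in-memory dictionary.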