I’ll be speaking Monday, June 20 at IBM Netezza’s Enzee Universe conference. Thus, as is my custom:
- I’m posting draft slides.
- I’m encouraging comment (especially in the short time window before I have to actually give the talk).
- I’m offering links below to more detail on various subjects covered in the talk.
The talk concept started out as “advanced analytics” (as opposed to fast query, a subject amply covered in the rest of any Netezza event), positioned as a lunch break in what is otherwise a detailed “best practices” session. So I suggested we constrain the subject by focusing on a specific application area — customer acquisition and retention, something of importance to almost any enterprise, and one that exploits most areas of analytic technology. Then I actually prepared the slides — and guess what? The mix of subjects will be skewed somewhat more toward generalities than I first intended, specifically in the areas of investigative analytics and derived data. And, as always when I speak, I’ll try to raise consciousness about the issues of liberty and privacy, our options as a society for addressing them, and the crucial role we play as an industry in helping policymakers deal with these technologically intense subjects.
Slide 3 refers back to a post I made last December, saying there are six useful things you can do with analytic technology:
- Operational BI/Analytically-infused operational apps: You can make an immediate decision.
- Planning and budgeting: You can plan in support of future decisions.
- Investigative analytics (multiple disciplines): You can research, investigate, and analyze in support of future decisions.
- Business intelligence: You can monitor what’s going on, to see when it is necessary to decide, plan, or investigate.
- More BI: You can communicate, to help other people and organizations do these same things.
- DBMS, ETL, and other “platform” technologies: You can provide support, in technology or data gathering, for one of the other functions.
Slide 4 observes that investigative analytics:
- Is the most rapidly advancing of the six areas …
- … because it most directly exploits performance & scalability.
Slide 5 gives my simplest overview of investigative analytics technology to date:
- Fast query
- Persistent storage (any data volume)
- RAM (10s–100s of gigabytes, or more)
- Fast analytics
- Predictive modeling
Slide 6 points out that this is all supported by cheap data creation and acquisition, specifically in the area of machine-generated data, which gets the full benefit of Moore’s Law.
Slides 7-13 point out how the example problem domain involves lots of analytic tasks performed on lots of kinds of data. Specific examples cited include text analytics and graph/relationship analytics.
Slide 14 contains the punch line, so I’ll quote it in full:
- You can’t keep re-analyzing all that in raw form …
- … so don’t.
If you have one takeaway from this session, let it be the utter importance of derived data.
Slide 16 lists kinds of derived data that are important in the single application of reducing telco churn:
- Normalized data
- Parsed/sessionized logs
- Text/sentiment highlights
- Social network graph(s)
- Web de-anonymization
- Household matching
- Scores and buckets
- Offer hot buttons
- Credit/fraud risk
- Lifetime customer value
- Influence on others!
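To make “scores and buckets” concrete, here is a minimal sketch of reducing raw usage events to per-customer derived data. Every field name, weight, and threshold below is invented for illustration; a real telco churn model would be far richer.

```python
# Hypothetical sketch: turn raw usage events into "scores and buckets"
# derived data. All field names, weights, and thresholds are invented.

def churn_score(events):
    """Toy churn score in [0, 1]: fewer calls and more dropped
    calls push the score higher."""
    calls = sum(1 for e in events if e["type"] == "call")
    dropped = sum(1 for e in events if e.get("dropped"))
    drop_rate = dropped / calls if calls else 1.0
    usage_factor = max(0.0, 1.0 - calls / 100.0)  # light users score higher
    return 0.5 * drop_rate + 0.5 * usage_factor

def bucket(score):
    """Coarse buckets a retention campaign might key on."""
    if score >= 0.7:
        return "high-risk"
    if score >= 0.4:
        return "watch"
    return "stable"

events_by_customer = {
    "c1": [{"type": "call"} for _ in range(120)],
    "c2": [{"type": "call", "dropped": True}] * 5 + [{"type": "call"}] * 5,
}

# The derived table — one small row per customer — is what gets
# re-queried, instead of re-analyzing the raw event stream each time.
derived = {cid: bucket(churn_score(evts))
           for cid, evts in events_by_customer.items()}
```

The point of the sketch is the shape of the result: a compact derived table keyed by customer, cheap to query repeatedly, standing in for the raw event data.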
And finally, Slide 17 is my first pass at best practices for dealing with derived data:
- Evolving data warehouse schema
- Data marts
- Physical or virtual
- Inputs/outputs to “EDW”
- “Data science”
- Research != production
- Multiple processing pipelines
- Log parsing
- Predictive analytics
- Generic ETL
- Streaming “ETL”
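As a hedged illustration of the “log parsing” pipeline item, here is a sketch of sessionizing clickstream records. The record format and the 30-minute inactivity timeout are my own illustrative assumptions, not anything prescribed by Netezza or the slides.

```python
# Hypothetical sketch of one log-parsing pipeline stage:
# sessionizing raw (user, timestamp, url) clickstream records.
from collections import defaultdict

SESSION_GAP = 30 * 60  # seconds of inactivity that close a session (assumed)

def sessionize(records):
    """Group (user, timestamp_seconds, url) tuples into per-user sessions."""
    by_user = defaultdict(list)
    for user, ts, url in sorted(records, key=lambda r: (r[0], r[1])):
        sessions = by_user[user]
        # Extend the current session if the gap since the last event
        # is small enough; otherwise start a new session.
        if sessions and ts - sessions[-1][-1][1] <= SESSION_GAP:
            sessions[-1].append((user, ts, url))
        else:
            sessions.append([(user, ts, url)])
    return dict(by_user)

raw = [
    ("alice", 0, "/home"),
    ("alice", 600, "/plans"),
    ("alice", 10_000, "/support"),  # long gap, so a new session starts
    ("bob", 50, "/home"),
]
sessions = sessionize(raw)
```

The output — sessions rather than raw clicks — is exactly the kind of derived data the talk argues you should compute once and reuse, rather than re-deriving from raw logs on every analysis.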
That last list looks like a starting point for a whole set of interesting future posts.