1. A couple years ago I wrote skeptically about integrating predictive modeling and business intelligence. I’m less skeptical now.
- The predictive experimentation I wrote about over Thanksgiving calls naturally for some BI/dashboarding to monitor how it’s going.
- If you think about Nutonian’s pitch, it can be approximated as “Root-cause analysis so easy a business analyst can do it.” That could be interesting to jump to after BI has turned up anomalies. And it should be pretty easy to whip up a UI for choosing a data set and objective function to model on, since those are both things that the BI tool would know how to get to anyway.
I’ve also heard a couple of ideas about how predictive modeling can support BI. One is via my client Omer Trajman, whose startup ScalingData is still semi-stealthy, but says they’re “working at the intersection of big data and IT operations”. The idea goes something like this:
- Suppose we have lots of logs about lots of things.* Machine learning can help:
- Notice what’s an anomaly.
- Group* together things that seem to be experiencing similar anomalies.
- That can inform a BI-plus interface for a human to figure out what is happening.
Makes sense to me.
* The word “cluster” could have been used here in a couple of different ways, so I decided to avoid it altogether.
Finally, I’m hearing a variety of “smart ETL/data preparation” and “we recommend what columns you should join” stories. I don’t know how much machine learning there’s been in those to date, but it’s usually at least on the roadmap to make the systems (yet) smarter in the future. The end benefit is usually to facilitate BI.
2. Discussion of graph DBMS can get confusing. For example: Read more
|Categories: Business intelligence, Greenplum, Hadoop, Hortonworks, Log analysis, Neo Technology and Neo4j, Nutonian, Predictive modeling and advanced analytics, RDF and graphs, WibiData||3 Comments|
I’ve been talking some with the Neo Technology/Neo4j guys, including Emil Eifrem (CEO/cofounder), Johan Svensson (CTO/cofounder), and Philip Rathle (Senior Director of Products). Basics include:
- Neo Technology came up with Neo4j, open sourced it, and is building a company around the open source core product in the usual way.
- Neo4j is a graph DBMS.
- Neo4j is unlike some other graph DBMS in that:
- Neo4j is designed for OLTP (OnLine Transaction Processing), or at least as a general-purpose DBMS, rather than being focused on investigative analytics.
- To every node or edge managed by Neo4j you can associate an arbitrary collection of (name,value) pairs — i.e., what might be called a document.
Numbers and historical facts include:
- > 50 paying Neo4j customers.
- Estimated 1000s of production Neo4j users of open source version.*
- Estimated 1/3 of paying customers and free users using Neo4j as a “system of record”.
- >30,000 downloads/month, in some sense of “download”.
- 35 people in 6 countries, vs. 25 last December.
- $13 million in VC, most of it last October.
- Started in 2000 as the underpinnings for a content management system.
- A version of the technology in production in 2003.
- Neo4j first open-sourced in 2007.
- Big-name customers including Cisco, Adobe, and Deutsche Telekom.
- Pricing of either $6,000 or $24,000 per JVM per year for two different commercial versions.
|Categories: Market share and customer counts, Neo Technology and Neo4j, Open source, Pricing, RDF and graphs, Structured documents, Telecommunications||10 Comments|
This post is part of a series on managing and analyzing graph data. Posts to date include:
- Graph data model basics (this post)
- Relationship analytics definition
- Relationship analytics applications
- Analysis of large graphs
Interest in graph data models keeps increasing. But it’s tough to discuss them with any generality, because “graph data model” encompasses so many different things. Indeed, just as all data structures can be mapped to relational ones, it is also the case that all data structures can be mapped to graphs.
Formally, a graph is a collection of (node, edge, node) triples. In the simplest case, the edge has no properties other than existence or maybe direction, and the triple can be reduced to a (node, node) pair, unordered or ordered as the case may be. It is common, however, for edges to encapsulate additional properties, the canonical examples of which are:
- Weight. Usually, the intuition here is that the weight is a number indicating the strength of the connection. This is generally derived from more basic data.
- Kind. The edge can encapsulate one or more descriptors indicating the kind of relationship between the nodes.
Many of the graph examples I can think of fit into four groups: Read more