This post is part of a series on managing and analyzing graph data. Posts to date include:
- Graph data model basics (this post)
- Relationship analytics definition
- Relationship analytics applications
- Analysis of large graphs
Interest in graph data models keeps increasing. But it’s tough to discuss them with any generality, because “graph data model” encompasses so many different things. Indeed, just as all data structures can be mapped to relational ones, it is also the case that all data structures can be mapped to graphs.
Formally, a graph is a collection of (node, edge, node) triples. In the simplest case, the edge has no properties other than existence or maybe direction, and the triple can be reduced to a (node, node) pair, unordered or ordered as the case may be. It is common, however, for edges to encapsulate additional properties, the canonical examples of which are:
- Weight. Usually, the intuition here is that the weight is a number indicating the strength of the connection. This is generally derived from more basic data.
- Kind. The edge can encapsulate one or more descriptors indicating the kind of relationship between the nodes.
Many of the graph examples I can think of fit into four groups:
- Networks of people, aka social networks. Three (overlapping) areas of particular importance are:
- One canonical example is influencer-finding in telecommunications customer bases. The nodes are subscribers; the edges are call details (raw or aggregated).
- Other examples may be found as subgraphs of our next category, namely …
- … people/places/things.
- This is the classic structure for anti-terrorism, law enforcement, or anti-fraud use cases.
- Nodes are people, buildings/addresses, cars, businesses, etc. , except that …
- … nodes can actually be ordered pairs (tangible thing, timestamp). After all, it’s more interesting if two people were, not just in the same place, but in the same place at the same time.
- Similarly, there are use cases in which various people have social network connections, and then also recommend products of some kind.
- Edges can carry information about the evident strength of the social network connection …
- … but also about apparent similarities in taste.
- Graphs of IT objects. Various sets of conceptual IT objects can be viewed as graphs. For example:
- I visited Workday recently. They refer to their Java object model as a “graph.”
- Neo Technology (the Neo4j guys) started out doing a content management system, and eventually decided that what they really wanted underneath it was a graph-oriented DBMS.
- Now one of Neo’s major application areas is MDM (Master Data Management). Edit: I’m not wholly persuaded by the way they use that term.
- Most dramatically, there’s Tim Berners-Lee’s “Semantic Web”, which is built on RDF, which models things as “a directed, labeled graph”. SPARQL, OWL and so on are in the mix as well. To date, the Semantic Web has been a lot of hot air, only without the hot aspect; still, it’s obviously influenced many people’s thinking about graphs.
- Edit: Please see Marie’s comment below for a rather major example I left out.
- Taxonomies, ontologies, and/or semantic networks.
- To a large extent this overlaps with my previous category …
- … but I’m particularly fond of the example of straightforward taxonomies of words, e.g. WordNet. The nodes are the words themselves, or more precisely word senses (i.e., specific meanings of a word); edges are typically chosen from a limited set of alternatives such as is_a, is_part_of, or entails.
- Finally, there are representations of physical graphs. Examples might include telecom networks, utility grids, or locations and routes for physical deliveries.
My main reason for reciting these diverse examples is to illustrate that, for any really interesting technical discussion, it is necessary to focus on a subset of the possible use cases.
This post is intended to start a short series. When the next one goes up — focusing on a particular set of use cases — this footer will be edited accordingly.