RDF and graphs

Analysis of data management technology optimized for RDF-formatted and/or graph data.

July 2, 2012

Introduction to Yarcdata

Cray’s strategy these days seems to be:

At the moment, the main diversifications are:

The last of the three is what Cray subsidiary Yarcdata is all about. Read more

May 13, 2012

Notes on the analysis of large graphs

This post is part of a series on managing and analyzing graph data. Posts to date include:

My series on graph data management and analytics got knocked off-stride by our website difficulties. Still, I want to return to one interesting set of issues — analyzing large graphs, specifically ones that don’t fit comfortably into RAM on a single server. By no means do I have the subject figured out. But here are a few notes on the matter.

How big can a graph be? That of course depends on:

*Even if your graph has 10 billion nodes, those can be tokenized in 34 bits, so the main concern is edges. Edges can include weights, timestamps, and so on, but how many specifics do you really need? At some point you can surely rely on a pointer to full detail stored elsewhere.

The biggest graph-size estimates I’ve gotten are from my clients at Yarcdata, a division of Cray. (“Yarc” is “Cray” spelled backwards.) To my surprise, they suggested that graphs about people could have 1000s of edges per node, whether in:

Yarcdata further suggested that bioinformatics use cases could have node counts higher yet, characterizing Bio2RDF as one of the “smaller” ones at 22 billion nodes. In these cases, the nodes/edge average seems lower than in people-analysis graphs, but we’re still talking about 100s of billions of edges.

Recalling that relationship analytics boils down to finding paths and subgraphs, the naive relational approach to such tasks would be: Read more

May 7, 2012

Relationship analytics application notes

This post is part of a series on managing and analyzing graph data. Posts to date include:

In my recent post on graph data models, I cited various application categories for relationship analytics. For most applications, it’s hard to get a lot of details. Reasons include:

Even so, it’s fairly safe to say:

Read more

May 7, 2012

Terminology: Relationship analytics

This post is part of a series on managing and analyzing graph data. Posts to date include:

In late 2005, I encountered a company called Cogito that was using a graphical data manager to analyze relationships. They called this “relational analytics”, which I thought was a terrible name for something that they were trying to claim should NOT be done in a relational DBMS. On the spot, I coined relationship analytics as an alternative. A business relationship ensued, which included a short white paper. Cogito didn’t do so well, however, and for a while the term “relationship analytics” faltered too. But recently it’s made a bit of a comeback, having been adopted by Objectivity, Qlik Tech, Yarcdata and others.

“Relationship analytics” is not a perfect name, both because it’s longish and because it might over-connote a social-network focus. But then, no other term would be perfect either. So we might as well stick with it.

In that case, “relationship analytics” could use an actual definition, preferably one a little heftier than just:

Analytics on graphs.

Read more

May 4, 2012

Notes on graph data management

This post is part of a series on managing and analyzing graph data. Posts to date include:

Interest in graph data models keeps increasing. But it’s tough to discuss them with any generality, because “graph data model” encompasses so many different things. Indeed, just as all data structures can be mapped to relational ones, it is also the case that all data structures can be mapped to graphs.

Formally, a graph is a collection of (node, edge, node) triples. In the simplest case, the edge has no properties other than existence or maybe direction, and the triple can be reduced to a (node, node) pair, unordered or ordered as the case may be. It is common, however, for edges to encapsulate additional properties, the canonical examples of which are:

Many of the graph examples I can think of fit into four groups: Read more

April 4, 2012

IBM DB2 10

Shortly before Tuesday’s launch of DB2 10, IBM’s Conor O’Mahony checked in for a relatively non-technical briefing.* More precisely, this is about DB2 for “distributed” systems, aka LUW (Linux/Unix/Windows); some of the features have already been in the mainframe version of DB2 for a while. IBM is graciously permitting me to post the associated DB2 10 announcement slide deck.

*I hope any errors in interpretation are minor.

Major aspects of DB2 10 include new or improved capabilities in the areas of:

Of course, there are various other enhancements too, including to security (fine-grained access control), Oracle compatibility, and DB2 pureScale. Everything except the pureScale part is also reflected in IBM InfoSphere Warehouse, which is a near-superset of DB2.*

*Also, the data ingest part isn’t in base DB2.

Read more

September 8, 2011

Aster Data business trends

Last month, I reviewed with the Aster Data folks which markets they were targeting and selling into, subsequent to acquisition by their new orange overlords. The answers aren’t what they used to be. Aster no longer focuses much on what it used to call frontline (i.e., low-latency, operational) applications; those are of course a key strength for Teradata. Rather, Aster focuses on investigative analytics — they’ve long endorsed my use of the term — and on the batch run/scoring kinds of applications that inform operational systems.

Read more

June 20, 2011

Vertica as an analytic platform

Vertica 5.0 is coming out today, and delivering the down payment on Vertica’s analytic platform strategy. In Vertica lingo, there’s now a Vertica SDK (Software Development Kit), featuring Vertica UDT(F)s* (User-Defined Transform Functions). Vertica UDT syntax basics start:  Read more

June 19, 2011

Investigative analytics and derived data: Enzee Universe 2011 talk

I’ll be speaking Monday, June 20 at IBM Netezza’s Enzee Universe conference. Thus, as is my custom:

The talk concept started out as “advanced analytics” (as opposed to fast query, a subject amply covered in the rest of any Netezza event), as a lunch break in what is otherwise a detailed “best practices” session. So I suggested we constrain the subject by focusing on a specific application area — customer acquisition and retention, something of importance to almost any enterprise, and which exploits most areas of analytic technology. Then I actually prepared the slides — and guess what? The mix of subjects will be skewed somewhat more toward generalities than I first intended, specifically in the areas of investigative analytics and derived data. And, as always when I speak, I’ll try to raise consciousness about the issues of liberty and privacy, our options as a society for addressing them, and the crucial role we play as an industry in helping policymakers deal with these technologically-intense subjects.

Slide 3 refers back to a post I made last December, saying there are six useful things you can do with analytic technology:

Slide 4 observes that investigative analytics:

Slide 5 gives my simplest overview of investigative analytics technology to date:  Read more

April 5, 2011

Whither MarkLogic?

My clients at MarkLogic have a new CEO, Ken Bado, even though former CEO Dave Kellogg was quite successful. If you cut through all the happy talk and side issues, the reason for the change is surely that the board wants to see MarkLogic grow faster, and specifically to move beyond its traditional niches of publishing (especially technical publishing) and national intelligence.

So what other markets could MarkLogic pursue? Before Ken even started work, I sent over some thoughts. They included (but were not limited to):  Read more

← Previous PageNext Page →

Feed: DBMS (database management system), DW (data warehousing), BI (business intelligence), and analytics technology Subscribe to the Monash Research feed via RSS or email:

Login

Search our blogs and white papers

Monash Research blogs

User consulting

Building a short list? Refining your strategic plan? We can help.

Vendor advisory

We tell vendors what's happening -- and, more important, what they should do about it.

Monash Research highlights

Learn about white papers, webcasts, and blog highlights, by RSS or email.