RDF and graphs

Analysis of data management technology optimized for RDF-formatted and/or graph data.

December 12, 2014

Notes and links, December 12, 2014

1. A couple years ago I wrote skeptically about integrating predictive modeling and business intelligence. I’m less skeptical now.

For starters:

I’ve also heard a couple of ideas about how predictive modeling can support BI. One is via my client Omer Trajman, whose startup ScalingData is still semi-stealthy, but says they’re “working at the intersection of big data and IT operations”. The idea goes something like this:

Makes sense to me.

* The word “cluster” could have been used here in a couple of different ways, so I decided to avoid it altogether.

Finally, I’m hearing a variety of “smart ETL/data preparation” and “we recommend what columns you should join” stories. I don’t know how much machine learning there’s been in those to date, but it’s usually at least on the roadmap to make the systems (yet) smarter in the future. The end benefit is usually to facilitate BI.

2. Discussion of graph DBMS can get confusing. For example: Read more

February 2, 2014

Spark and Databricks

I’ve heard a lot of buzz recently around Spark. So I caught up with Ion Stoica and Mike Franklin for a call. Let me start by acknowledging some sources of confusion.

The “What is Spark?” question may soon be just as difficult as the ever-popular “What is Hadoop?” That said — and referring back to my original technical post about Spark and also to a discussion of prominent Spark user ClearStory — my try at “What is Spark?” goes something like this:

Read more

October 10, 2013

Aster 6, graph analytics, and BSP

Teradata Aster 6 has been preannounced (beta in Q4, general release in Q1 2014). The general architectural idea is:

There’s much more, of course, but those are the essential pieces.

Just to be clear: Teradata Aster 6, aka the Teradata Aster Discovery Platform, includes HDFS compatibility, native MapReduce and ways of invoking Hadoop MapReduce on non-Aster nodes or clusters — but even so, you can’t run Hadoop MapReduce within Aster over Aster’s version of HDFS.

The most dramatic immediate additions are in the graph analytics area.* The new SQL-Graph is supported by something called BSP (Bulk Synchronous Parallel). I’ll start by observing (and some of this is confusing):

Use cases suggested are a lot of marketing, plus anti-fraud.

*Pay no attention to Aster’s previous claims to do a good job on graph — and not only via nPath — in SQL-MR.

So far as I can infer from examples I’ve seen, the semantics of Teradata Aster SQL-Graph start:

Within those functions, the core idea is:  Read more

August 25, 2013

Cloudera Hadoop strategy and usage notes

When we scheduled a call to talk about Sentry, Cloudera’s Charles Zedlewski and I found time to discuss other stuff as well. One interesting part of our discussion was around the processing “frameworks” Cloudera sees as most important.

HBase was artificially omitted from this “frameworks” discussion because Cloudera sees it as a little bit more of a “storage” system than a processing one.

Another good subject was offloading work to Hadoop, in a couple different senses of “offload”: Read more

June 13, 2013

How is the surveillance data used?

Over the past week, discussion has exploded about US government surveillance. After summarizing, as best I could, what data the government appears to collect, now I ‘d like to consider what they actually do with it. More precisely, I’d like to focus on the data’s use(s) in combating US-soil terrorism. In a nutshell:

Consider the example of Tamerlan Tsarnaev:

In response to this 2011 request, the FBI checked U.S. government databases and other information to look for such things as derogatory telephone communications, possible use of online sites associated with the promotion of radical activity, associations with other persons of interest, travel history and plans, and education history.

While that response was unsuccessful in preventing a dramatic act of terrorism, at least they tried.

As for actual success stories — well, that’s a bit tough. In general, there are few known examples of terrorist plots being disrupted by law enforcement in the United States, except for fake plots engineered to draw terrorist-leaning individuals into committing actual crimes. One of those examples, that of Najibullah Zazi, was indeed based on an intercepted email — but the email address itself was uncovered through more ordinary anti-terrorism efforts.

As for machine learning/data mining/predictive modeling, I’ve never seen much of a hint of it being used in anti-terrorism efforts, whether in the news or in my own discussions inside the tech industry. And I think there’s a great reason for that — what would they use for a training set? Here’s what I mean.  Read more

November 19, 2012

Incremental MapReduce

My clients at Cloudant, Couchbase, and 10gen/MongoDB (Edit: See Alex Popescu’s comment below) all boast the feature incremental MapReduce. (And they’re not the only ones.) So I feel like making a quick post about it. For starters, I’ll quote myself about Cloudant:

The essence of Cloudant’s incremental MapReduce seems to be that data is selected only if it’s been updated since the last run. Obviously, this only works for MapReduce algorithms whose eventual output can be run on different subsets of the target data set, then aggregated in a simple way.

These implementations of incremental MapReduce are hacked together by teams vastly smaller than those working on Hadoop, and surely fall short of Hadoop in many areas such as performance, fault-tolerance, and language support. That’s a given. Still, if the jobs are short and simple, those deficiencies may be tolerable.

A StackOverflow thread about MongoDB’s version of incremental MapReduce highlights some of the implementation challenges.

But all practicality aside, let’s return to the point that incremental MapReduce only works for some kinds of MapReduce-based algorithms, and consider how much of a limitation that really is. Looking at the Map steps sheds a little light: Read more

August 6, 2012

Notes, links and comments August 6, 2012

I haven’t done a notes/link/comments post for a while. Time for a little catch-up.

1. MySQL now has a memcached integration story. I haven’t checked the details. The MySQL team is pretty hard to talk with, due to the heavy-handedness of Oracle’s analyst relations.

2. The Large Hadron Collider offers some serious numbers, including:

3. One application area we don’t talk about much for analytic technologies is education. However: Read more

July 15, 2012

Memory-centric data management when locality matters

Ron Pressler of Parallel Universe/SpaceBase pinged me about a data grid product he was open sourcing, called Galaxy. The idea is that a distributed RAM grid will allocate data, not randomly or via consistent hashing, but rather via a locality-sensitive approach. Notes include:

The whole thing is discussed in considerable detail in a blog post and a especially in a Hacker News comment thread. There’s also an error-riddled TechCrunch article. Read more

July 8, 2012

Database diversity revisited

From time to time, I try to step back and build a little taxonomy for the variety in database technology. One effort was 4 1/2 years ago, in a pre-planned exchange with Mike Stonebraker (his side, alas, has since been taken down). A year ago I spelled out eight kinds of analytic database.

The angle I’ll take this time is to say that every sufficiently large enterprise needs to be cognizant of at least 7 kinds of database challenge. General notes on that include:

The Big Seven database challenges that almost any enterprise faces are: Read more

July 5, 2012

Introduction to Neo Technology and Neo4j

I’ve been talking some with the Neo Technology/Neo4j guys, including Emil Eifrem (CEO/cofounder), Johan Svensson (CTO/cofounder), and Philip Rathle (Senior Director of Products). Basics include:

Numbers and historical facts include:

Read more

Next Page →

Feed: DBMS (database management system), DW (data warehousing), BI (business intelligence), and analytics technology Subscribe to the Monash Research feed via RSS or email:

Login

Search our blogs and white papers

Monash Research blogs

User consulting

Building a short list? Refining your strategic plan? We can help.

Vendor advisory

We tell vendors what's happening -- and, more important, what they should do about it.

Monash Research highlights

Learn about white papers, webcasts, and blog highlights, by RSS or email.