Complex event processing (CEP)

Discussion of complex event processing (CEP), aka event processing or stream processing – i.e., of technology that executes queries before data is ever stored on disk. Related subjects include:

February 2, 2014

Spark and Databricks

I’ve heard a lot of buzz recently around Spark. So I caught up with Ion Stoica and Mike Franklin for a call. Let me start by acknowledging some sources of confusion.

The “What is Spark?” question may soon be just as difficult as the ever-popular “What is Hadoop?” That said — and referring back to my original technical post about Spark and also to a discussion of prominent Spark user ClearStory — my try at “What is Spark?” goes something like this:

Read more

August 25, 2013

Cloudera Hadoop strategy and usage notes

When we scheduled a call to talk about Sentry, Cloudera’s Charles Zedlewski and I found time to discuss other stuff as well. One interesting part of our discussion was around the processing “frameworks” Cloudera sees as most important.

HBase was artificially omitted from this “frameworks” discussion because Cloudera sees it as a little bit more of a “storage” system than a processing one.

Another good subject was offloading work to Hadoop, in a couple different senses of “offload”: Read more

February 13, 2013

It’s hard to make data easy to analyze

It’s hard to make data easy to analyze. While everybody seems to realize this — a few marketeers perhaps aside — some remarks might be useful even so.

Many different technologies purport to make data easy, or easier, to an analyze; so many, in fact, that cataloguing them all is forbiddingly hard. Major claims, and some technologies that make them, include:

*Complex event/stream processing terminology is always problematic.

My thoughts on all this start:  Read more

November 5, 2012

Real-time confusion

I recently proposed a 2×2 matrix of BI use cases:

Let me now introduce another 2×2 matrix of analytic scenarios:

My point is that there are at least three different cool things people might think about when they want their analytics to be very fast:

There’s also one slightly boring one that however drives a lot of important applications: Read more

August 20, 2012

In-memory, (hybrid) memory-centric DBMS — three analytic glossary draft entries

These are three closely-related draft entries for the DBMS2 analytic glossary. Please comment with any ideas you have for their improvement!

1. We coined the term memory-centric data management to comprise several kinds of technology that manage data in RAM (Random Access Memory), including:

Related link

2. An in-memory DBMS is a DBMS designed under the assumption that substantially all database operations will be performed in RAM (Random Access Memory). Thus, in-memory DBMS form a subcategory of memory-centric data management systems.

Ways in which in-memory DBMS are commonly different from those that query and update persistent storage include: Read more

July 15, 2012

Memory-centric data management when locality matters

Ron Pressler of Parallel Universe/SpaceBase pinged me about a data grid product he was open sourcing, called Galaxy. The idea is that a distributed RAM grid will allocate data, not randomly or via consistent hashing, but rather via a locality-sensitive approach. Notes include:

The whole thing is discussed in considerable detail in a blog post and a especially in a Hacker News comment thread. There’s also an error-riddled TechCrunch article. Read more

April 7, 2012

Many kinds of memory-centric data management

I’m frequently asked to generalize in some way about in-memory or memory-centric data management. I can start:

Getting more specific than that is hard, however, because:

Consider, for example, some of the in-memory data management ideas kicking around. Read more

November 28, 2011

Terminology: Data mustering

I find myself in need of a word or phrase that means bring data together from various sources so that it’s ready to be used, where the use can be analysis or operations. The first words I thought of were “aggregation” and “collection,” but they both have other meanings in IT. Even “data marshalling” has a specific meaning different from what I want. So instead, I’ll go with data mustering.

I mean for the term “data mustering” to encompass at least three scenarios:

Let me explain what I mean by each.  Read more

November 10, 2011

StreamBase LiveView — push-based real-time BI

My clients at StreamBase are coming out with a new product line called LiveView, and I agreed they could launch it via this blog. Key points about StreamBase LiveView Version 1.0 include:

The basic StreamBase LiveView pipeline goes something like:   Read more

November 10, 2011

StreamBase catchup

While I was cryptic in my general CEP/streaming catchup, I’ll say a bit more regarding StreamBase in particular. At the highest level, non-technically:

Read more

Next Page →

Feed: DBMS (database management system), DW (data warehousing), BI (business intelligence), and analytics technology Subscribe to the Monash Research feed via RSS or email:

Login

Search our blogs and white papers

Monash Research blogs

User consulting

Building a short list? Refining your strategic plan? We can help.

Vendor advisory

We tell vendors what's happening -- and, more important, what they should do about it.

Monash Research highlights

Learn about white papers, webcasts, and blog highlights, by RSS or email.