Memory-centric data management

Analysis of technologies that manage data entirely or primarily in random-access memory (RAM). Related subjects include:

September 23, 2013

Thoughts on in-memory columnar add-ons

Oracle announced its in-memory columnar option Sunday. As usual, I wasn’t briefed; still, I have some observations. For starters:

I’d also add that Larry Ellison’s pitch “build columns to avoid all that index messiness” sounds like 80% bunk. The physical overhead should be at least as bad, and the main saving in administrative overhead should be that, in effect, you’re indexing ALL columns rather than picking and choosing.

Anyhow, this technology should be viewed as applying to traditional business transaction data, much more than to — for example — web interaction logs, or other machine-generated data. My thoughts around that distinction start:

Read more

September 8, 2013

Layering of database technology & DBMS with multiple DMLs

Two subjects in one post, because they were too hard to separate from each other

Any sufficiently complex software is developed in modules and subsystems. DBMS are no exception; the core trinity of parser, optimizer/planner, and execution engine merely starts the discussion. But increasingly, database technology is layered in a more fundamental way as well, to the extent that different parts of what would seem to be an integrated DBMS can sometimes be developed by separate vendors.

Major examples of this trend — where by “major” I mean “spanning a lot of different vendors or projects” — include:

Other examples on my mind include:

And there are several others I hope to blog about soon, e.g. current-day PostgreSQL.

In an overlapping trend, DBMS increasingly have multiple data manipulation APIs. Examples include:  Read more

August 25, 2013

Cloudera Hadoop strategy and usage notes

When we scheduled a call to talk about Sentry, Cloudera’s Charles Zedlewski and I found time to discuss other stuff as well. One interesting part of our discussion was around the processing “frameworks” Cloudera sees as most important.

HBase was artificially omitted from this “frameworks” discussion because Cloudera sees it as a little bit more of a “storage” system than a processing one.

Another good subject was offloading work to Hadoop, in a couple different senses of “offload”: Read more

August 17, 2013

Aerospike 3

My clients at Aerospike are coming out with their Version 3, and as several of my clients do, have encouraged me to front-run what otherwise would be the Monday embargo.

I encourage such behavior with arguments including:

Aerospike 2’s value proposition, let us recall, was:

… performance, consistent performance, and uninterrupted operations …

  • Aerospike’s consistent performance claims are along the lines of sub-millisecond latency, with 99.9% of responses being within 5 milliseconds, and even a node outage only borking performance for some 10s of milliseconds.
  • Uninterrupted operation is a core Aerospike design goal, and the company says that to date, no Aerospike production cluster has ever gone down.

The major support for such claims is Aerospike’s success in selling to the digital advertising market, which is probably second only to high-frequency trading in its low-latency demands. For example, Aerospike’s CMO Monica Pal sent along a link to what apparently is:

Read more

August 12, 2013

Things I keep needing to say

Some subjects just keep coming up. And so I keep saying things like:

Most generalizations about “Big Data” are false. “Big Data” is a horrific catch-all term, with many different meanings.

Most generalizations about Hadoop are false. Reasons include:

Hadoop won’t soon replace relational data warehouses, if indeed it ever does. SQL-on-Hadoop is still very immature. And you can’t replace data warehouses unless you have the power of SQL.

Note: SQL isn’t the only way to provide “the power of SQL”, but alternative approaches are just as immature.

Most generalizations about NoSQL are false. Different NoSQL products are … different. It’s not even accurate to say that all NoSQL systems lack SQL interfaces. (For example, SQL-on-Hadoop often includes SQL-on-HBase.)

Read more

July 20, 2013

The refactoring of everything

I’ll start with three observations:

As written, that’s probably pretty obvious. Even so, it’s easy to forget just how pervasive the refactoring is and is likely to be. Let’s survey some examples first, and then speculate about consequences. Read more

June 16, 2013

Webinar Wednesday, June 26, 1 pm EST — Real-Time Analytics

I’m doing a webinar Wednesday, June 26, at 1 pm EST/10 am PST called:

             Real-Time Analytics in the Real World

The sponsor is MemSQL, one of my numerous clients to have recently adopted some version of a “real-time analytics” positioning. The webinar sign-up form has an abstract that I reviewed and approved … albeit before I started actually outlining the talk. 😉

Our plan is:

*MemSQL is debuting pretty high in my rankings of content sponsors who are cool with vendor neutrality. I sent them a draft of my slides mentioning other tech vendors and not them, and they didn’t blink.

In other news, I’ll be in California over the next week. Mainly I’ll be visiting clients — and 2 non-clients and some family — 10:00 am through dinner, but I did set aside time to stop by GigaOm Structure on Wednesday. I have sniffles/cough/other stuff even before I go. So please don’t expect a lot of posts until I’ve returned, rested up a bit, and also prepared my webinar deck.

April 14, 2013

Introduction to Deep Information Sciences and DeepDB

I talked Friday with Deep Information Sciences, makers of DeepDB. Much like TokuDB — albeit with different technical strategies — DeepDB is a single-server DBMS in the form of a MySQL engine, whose technology is concentrated around writing indexes quickly. That said:

*For reasons that do not seem closely related to product reality, DeepDB is marketed as if it supports “unstructured” data today.

Other NewSQL DBMS seem “designed for big data and the cloud” to at least the same extent DeepDB is. However, if we’re interpreting “big data” to include multi-structured data support — well, only half or so of the NewSQL products and companies I know of share Deep’s interest in branching out. In particular:

Edit: MySQL has some sort of an optional NoSQL interface, and hence so presumably do MySQL-compatible TokuDB, GenieDB, Clustrix, and MemSQL.

Also, some of those products do not today have the transparent scale-out that Deep plans to offer in the future.

Read more

April 1, 2013

Some notes on new-era data management, March 31, 2013

Hmm. I probably should have broken this out as three posts rather than one after all. Sorry about that.

Performance confusion

Discussions of DBMS performance are always odd, for starters because:

But in NoSQL/NewSQL short-request processing performance claims seem particularly confused. Reasons include but are not limited to:

MongoDB and 10gen

I caught up with Ron Avnur at 10gen. Technical highlights included: Read more

March 26, 2013

Platfora at the time of first GA

Well-resourced Silicon Valley start-ups typically announce their existence multiple times. Company formation, angel funding, Series A funding, Series B funding, company launch, product beta, and product general availability may not be 7 different “news events”, but they’re apt to be at least 3-4. Platfora, no exception to this rule, is hitting general availability today, and in connection with that I learned a bit more about what they are up to.

In simplest terms, Platfora offers exploratory business intelligence against Hadoop-based data. As per last weekend’s post about exploratory BI, a key requirement is speed; and so far as I can tell, any technological innovation Platfora offers relates to the need for speed. Specifically, I drilled into Platfora’s performance architecture on the query processing side (and associated data movement); Platfora also brags of rendering 100s of 1000s of “marks” quickly in HTML5 visualizations, but I haven’t a clue as to whether that’s much of an accomplishment in itself.

Platfora’s marketing suggests it obviates the need for a data warehouse at all; for most enterprises, of course, that is a great exaggeration. But another dubious aspect of Platfora marketing actually serves to understate the product’s merits — Platfora claims to have an “in-memory” product, when what’s really the case is that Platfora’s memory-centric technology uses both RAM and disk to manage larger data marts than could reasonably be fit into RAM alone. Expanding on what I wrote about Platfora when it de-stealthedRead more

← Previous PageNext Page →

Feed: DBMS (database management system), DW (data warehousing), BI (business intelligence), and analytics technology Subscribe to the Monash Research feed via RSS or email:

Login

Search our blogs and white papers

Monash Research blogs

User consulting

Building a short list? Refining your strategic plan? We can help.

Vendor advisory

We tell vendors what's happening -- and, more important, what they should do about it.

Monash Research highlights

Learn about white papers, webcasts, and blog highlights, by RSS or email.