Parallelization

Analysis of issues in parallel computing, especially parallelized database management. Related subjects include:

June 14, 2017

Cloudera Altus

I talked with Cloudera before the recent release of Altus. In simplest terms, Cloudera’s cloud strategy aspires to:

In other words, Cloudera is porting its software to an important new platform.* And this port isn’t complete yet, in that Altus is geared only for certain workloads. Specifically, Altus is focused on “data pipelines”, aka data transformation, aka “data processing”, aka new-age ETL (Extract/Transform/Load). (Other kinds of workload are on the roadmap, including several different styles of Impala use.) So what about that is particularly interesting? Well, let’s drill down.

*Or, if you prefer, improving on early versions of the port.

Read more

July 31, 2016

Notes on Spark and Databricks — generalities

I visited Databricks in early July to chat with Ion Stoica and Reynold Xin. Spark also comes up in a large fraction of the conversations I have. So let’s do some catch-up on Databricks and Spark. In a nutshell:

I shall explain below. I also am posting separately about Spark evolution, especially Spark 2.0. I’ll also talk a bit in that post about Databricks’ proprietary/closed-source technology.

Spark is the replacement for Hadoop MapReduce.

This point is so obvious that I don’t know what to say in its support. The trend is happening, as originally decreed by Cloudera (and me), among others. People are rightly fed up with the limitations of MapReduce, and — niches perhaps aside — there are no serious alternatives other than Spark.

The greatest use for Spark seems to be the same as the canonical first use for MapReduce: data transformation. Also in line with the Spark/MapReduce analogy:  Read more

December 10, 2015

Readings in Database Systems

Mike Stonebraker and Larry Ellison have numerous things in common. If nothing else:

I mention the latter because there’s a new edition of Readings in Database Systems, aka the Red Book, available online, courtesy of Mike, Joe Hellerstein and Peter Bailis. Besides the recommended-reading academic papers themselves, there are 12 survey articles by the editors, and an occasional response where, for example, editors disagree. Whether or not one chooses to tackle the papers themselves — and I in fact have not dived into them — the commentary is of great interest.

But I would not take every word as the gospel truth, especially when academics describe what they see as commercial market realities. In particular, as per my quip in the first paragraph, the data warehouse market has not yet gone to the extremes that Mike suggests,* if indeed it ever will. And while Joe is close to correct when he says that the company Essbase was acquired by Oracle, what actually happened is that Arbor Software, which made Essbase, merged with Hyperion Software, and the latter was eventually indeed bought by the giant of Redwood Shores.**

*When it comes to data warehouse market assessment, Mike seems to often be ahead of the trend.

**Let me interrupt my tweaking of very smart people to confess that my own commentary on the Oracle/Hyperion deal was not, in retrospect, especially prescient.

Mike pretty much opened the discussion with a blistering attack against hierarchical data models such as JSON or XML. To a first approximation, his views might be summarized as:  Read more

November 25, 2015

Splunk engages in stupid lawyer tricks

Using legal threats as an extension of your marketing is a bad idea. At least, it’s a bad idea in the United States, where such tactics are unlikely to succeed, and are apt to backfire instead. Splunk seems to actually have had some limited success intimidating Sumo Logic. But it tried something similar against Rocana, and I was set up to potentially be collateral damage. I don’t think that’s working out very well for Splunk.

Specifically, Splunk sent a lawyer letter to Rocana, complaining about a couple of pieces of Rocana marketing collateral. Rocana responded publicly, and posted both the Splunk letter and Rocana’s lawyer response. The Rocana letter eviscerated Splunk’s lawyers on matters of law, clobbered them on the facts as well, exposed Splunk’s similar behavior in the past, and threw in a bit of snark at the end.

Now I’ll pile on too. In particular, I’ll note that, while Splunk wants to impose a duty of strict accuracy upon those it disagrees with, it has fewer compunctions about knowingly communicating falsehoods itself.

1. Splunk’s letter insinuates that Rocana might have paid me to say what I blogged about them. Those insinuations are of course false.

Splunk was my client for a lot longer, and at a higher level of annual retainer, than Rocana so far has been. Splunk never made similar claims about my posts about them. Indeed, Splunk complained that I did not write about them often or favorably enough, and on at least one occasion seemed to delay renewing my services for that reason.

2. Similarly, Splunk’s letter makes insinuations about quotes I gave Rocana. But I also gave at least one quote to Splunk when they were my client. As part of the process — and as is often needed — I had a frank and open discussion with them about my quote policies. So Splunk should know that their insinuations are incorrect.

3. Splunk’s letter actually included the sentences  Read more

October 26, 2015

Differentiation in data management

In the previous post I broke product differentiation into 6-8 overlapping categories, which may be abbreviated as:

and sometimes also issues in adoption and administration.

Now let’s use this framework to examine two market categories I cover — data management and, in separate post, business intelligence.

Applying this taxonomy to data management:
Read more

October 15, 2015

Cassandra and privacy requirements

For starters:

But when I made that connection and checked in accordingly with my client Patrick McFadin at DataStax, I discovered that I’d been a little confused about how multi-data-center Cassandra works. The basic idea holds water, but the details are not quite what I was envisioning.

The story starts:

In particular, a remote replication factor for Cassandra can = 0. When that happens, then you have data sitting in one geographical location that is absent from another geographical location; i.e., you can be in compliance with laws forbidding the export of certain data. To be clear (and this contradicts what I previously believed and hence also implied in this blog):

Read more

October 15, 2015

Basho and Riak

Basho was on my (very short) blacklist of companies with whom I refuse to speak, because they have lied about the contents of previous conversations. But Tony Falco et al. are long gone from the company. So when Basho’s new management team reached out, I took the meeting.

For starters:

Basho’s product line has gotten a bit confusing, but as best I understand things the story is:

Technical notes on some of that include:  Read more

October 15, 2015

Couchbase 4.0 and related subjects

I last wrote about Couchbase in November, 2012, around the time of Couchbase 2.0. One of the many new features I mentioned then was secondary indexing. Ravi Mayuram just checked in to tell me about Couchbase 4.0. One of the important new features he mentioned was what I think he said was Couchbase’s “first version” of secondary indexing. Obviously, I’m confused.

Now that you’re duly warned, let me remind you of aspects of Couchbase timeline.

Technical notes on Couchbase 4.0 — and related riffs 🙂 — start: Read more

June 10, 2015

Hadoop generalities

Occasionally I talk with an astute reporter — there are still a few left 🙂 — and get led toward angles I hadn’t considered before, or at least hadn’t written up. A blog post may then ensue. This is one such post.

There is a group of questions going around that includes:

To a first approximation, my responses are:  Read more

June 8, 2015

Teradata will support Presto

At the highest level:

Now let’s make that all a little more precise.

Regarding Presto (and I got most of this from Teradata)::

Daniel Abadi said that Presto satisfies what he sees as some core architectural requirements for a modern parallel analytic RDBMS project:  Read more

Next Page →

Feed: DBMS (database management system), DW (data warehousing), BI (business intelligence), and analytics technology Subscribe to the Monash Research feed via RSS or email:

Login

Search our blogs and white papers

Monash Research blogs

User consulting

Building a short list? Refining your strategic plan? We can help.

Vendor advisory

We tell vendors what's happening -- and, more important, what they should do about it.

Monash Research highlights

Learn about white papers, webcasts, and blog highlights, by RSS or email.