June 23, 2013

Hadoop news and rumors, June 23, 2013


*Of course, there will always be exceptions. E.g., some formats can be updated on a short-request basis, while others can only be written to via batch conversions.

Everybody else

February 27, 2013

Hadoop distributions

Elephants! Elephants!
One elephant went out to play
Sat on a spider’s web one day.
They had such enormous fun
Called for another elephant to come.

Elephants! Elephants!
Two elephants went out to play
Sat on a spider’s web one day.
They had such enormous fun
Called for another elephant to come.

Elephants! Elephants!
Three elephants went out to play

–  Popular children’s song

It’s Strata week, with much Hadoop news, some of which I’ve been briefed on and some of which I haven’t. Rather than delve into fine competitive details, let’s step back and consider some generalities. First, about Hadoop distributions and distro providers:

Most of the same observations could apply to Hadoop appliance vendors.

Read more

July 2, 2012

Catching up with Cray

Cray is a legendary name in supercomputing hardware. Cray CTO Bill Blake (Netezza’s early-rise VP Development) seem to be there in part because of Cray’s name and history. I’m now consulting to Cray largely because of Bill Blake, specifically to Cray subsidiary Yarcdata. Along the way, I’ve picked up enough about Cray in general — largely from Bill and from Cray president Pete Ungaro — to perhaps be worth splitting out as a separate post.

Cray business highlights include:

I haven’t sorted through all the details in Cray’s SEC filings, but huge government contracts play a big role, as do the associated revenue recognition delays.

At the highest level, Cray’s technical story looks like: Read more

February 28, 2011

Updating our vendor client disclosures

Edit: This disclosure has been superseded by a March, 2012 version.

From time to time, I disclose our vendor client lists. Another iteration is below. To be clear:

With that said, our vendor client disclosures at this time are:

Read more

November 7, 2007

Clarifying SAS-in-the-DBMS, and other SAS tidbits

I followed up with Keith Collins of SAS today about SAS-in-the-database, expanding on what I learned or thought I did when we talked last month. Here’s the scoop:

SAS users do a lot of data filtering, aka data preparation, in SAS. These have WHERE clauses, just like SQL. However, only some of them map to actual SQL WHERE clauses. SAS is now implementing many of the rest as UDFs (User-Defined Functions), one DBMS at a time, starting with Teradata. In addition, SAS users can write custom filters that get registered as UDFs. This capability will be released with SAS 9.2. (The timing on SAS 9.2 is in line with the comment thread to my prior post on SAS-in-the-DBMS.) Read more

June 28, 2006

Good DATallegro/Intel white paper

I really like this short white paper, which carries the personal byline of Stuart Frost. Stuart is DATallegro’s CEO, and also the guy who does analyst relations for them (at least in my case). Part of it just does a concise job of spelling out some of the DATallegro story. But the rest is about the comparison between Intel’s new dual-core “Woodcrest” Xeons and their single-core predecessors. Not only does it give credible statistics, it gives understanding of the reasons behind them.

Read more

May 8, 2006

Memory-centric data management whitepaper

I have finally finished and uploaded the long-awaited white paper on memory-centric data management.

This is the project for which I origially coined the term “memory-centric data management,” after realizing that the prevalent “in-memory DBMS” creates all sorts of confusion about how and whether data persists on disk. The white paper clarifies and updates points I have been making about memory-centric data management since last summer. Sponsors included:

If there’s one area in my research I’m not 100% satisfied with, it may be the question of where the true hardware bottlenecks to memory-centric data management lie (it’s obvious that the bottleneck to disk-centric data management is random disk access). Is it processor interconnect (around 1 GB/sec)? Is it processor-to-cache connections (around 5 GB/sec)? My prior pronouncements, the main body of the white paper, and the Intel Q&A appendix to the white paper may actually have slightly different spins on these points.

And by the way — the current hard limit on RAM/board isn’t 2^64 bytes, but a “mere” 2^40. But don’t worry; it will be up to 2^48 long before anybody actually puts 256 gigabytes under the control of a single processor.

Feed: DBMS (database management system), DW (data warehousing), BI (business intelligence), and analytics technology Subscribe to the Monash Research feed via RSS or email:


Search our blogs and white papers

Monash Research blogs

User consulting

Building a short list? Refining your strategic plan? We can help.

Vendor advisory

We tell vendors what's happening -- and, more important, what they should do about it.

Monash Research highlights

Learn about white papers, webcasts, and blog highlights, by RSS or email.