May 6, 2014

Notes and comments, May 6, 2014

After visiting California recently, I made a flurry of posts, several of which generated considerable discussion.

Here is a catch-all post to complete the set.  Read more

April 30, 2014

Hardware and storage notes

My California trip last week focused mainly on software — duh! — but I had some interesting hardware/storage/architecture discussions as well, especially in the areas of:

I also got updated as to typical Hadoop hardware.

If systems are designed at the whole-rack level or higher, then there can be much more flexibility and efficiency in terms of mixing and connecting CPU, RAM and storage. The Google/Facebook/Amazon cool kids are widely understood to be following this approach, so others are naturally considering it as well. My most interesting of several mentions of that point was when I got the chance to talk with Berkeley computer architecture guru Dave Patterson, who’s working on plans for 100-petabyte/terabit-networking kinds of systems, for usage after 2020 or so. (If you’re interested, you might want to contact him; I’m sure he’d love more commercial sponsorship.)

One of Dave’s design assumptions is that Moore’s Law really will end soon (or at least greatly slow down), if by Moore’s Law you mean that every 18 months or so one can get twice as many transistors onto a chip of the same area and cost than one could before. However, while he thinks that applies to CPU and RAM, Dave thinks flash is an exception. I gathered that he thinks the power/heat reasons for Moore’s Law to end will be much harder to defeat than the other ones; note that flash, because of what it’s used for, has vastly less power running through it than CPU or RAM do.

Read more

April 30, 2014

The Intel investment in Cloudera

Intel recently made a huge investment in Cloudera, stated facts about which start:

Give or take stock preferences, etc., that’s around a $4.1 billion valuation post-money, but Cloudera does say it now has “most of $1 billion” in the bank.

Cloudera further told me when I visited last Friday that the majority of the Intel investment is net new money. (I presume that the rest of the round is net-new as well.) Hence, I conclude that previous investors sold in the aggregate less than 10% of total holdings to Intel. While I’m pretty sure Mike Olson is buying himself a couple of nice toys, in most respects it’s business-as-usual at Cloudera, with the same investors, directors and managers they had before. By way of contrast, many of the “cashing-out” rumors going around are OBVIOUSLY absurd, unless you think Intel acquired a much larger fraction of Cloudera than it actually did.

That said, Intel spent a lot of money, and in connection with the investment there’s a tight Cloudera/Intel partnership. In particular, Read more

June 23, 2013

Hadoop news and rumors, June 23, 2013

Cloudera

*Of course, there will always be exceptions. E.g., some formats can be updated on a short-request basis, while others can only be written to via batch conversions.

Everybody else

February 27, 2013

Hadoop distributions

Elephants! Elephants!
One elephant went out to play
Sat on a spider’s web one day.
They had such enormous fun
Called for another elephant to come.

Elephants! Elephants!
Two elephants went out to play
Sat on a spider’s web one day.
They had such enormous fun
Called for another elephant to come.

Elephants! Elephants!
Three elephants went out to play
Etc.

—  Popular children’s song

It’s Strata week, with much Hadoop news, some of which I’ve been briefed on and some of which I haven’t. Rather than delve into fine competitive details, let’s step back and consider some generalities. First, about Hadoop distributions and distro providers:

Most of the same observations could apply to Hadoop appliance vendors.

Read more

July 2, 2012

Catching up with Cray

Cray is a legendary name in supercomputing hardware. Cray CTO Bill Blake (Netezza’s early-rise VP Development) seem to be there in part because of Cray’s name and history. I’m now consulting to Cray largely because of Bill Blake, specifically to Cray subsidiary Yarcdata. Along the way, I’ve picked up enough about Cray in general — largely from Bill and from Cray president Pete Ungaro — to perhaps be worth splitting out as a separate post.

Cray business highlights include:

I haven’t sorted through all the details in Cray’s SEC filings, but huge government contracts play a big role, as do the associated revenue recognition delays.

At the highest level, Cray’s technical story looks like: Read more

February 28, 2011

Updating our vendor client disclosures

Edit: This disclosure has been superseded by a March, 2012 version.

From time to time, I disclose our vendor client lists. Another iteration is below. To be clear:

With that said, our vendor client disclosures at this time are:

Read more

November 7, 2007

Clarifying SAS-in-the-DBMS, and other SAS tidbits

I followed up with Keith Collins of SAS today about SAS-in-the-database, expanding on what I learned or thought I did when we talked last month. Here’s the scoop:

SAS users do a lot of data filtering, aka data preparation, in SAS. These have WHERE clauses, just like SQL. However, only some of them map to actual SQL WHERE clauses. SAS is now implementing many of the rest as UDFs (User-Defined Functions), one DBMS at a time, starting with Teradata. In addition, SAS users can write custom filters that get registered as UDFs. This capability will be released with SAS 9.2. (The timing on SAS 9.2 is in line with the comment thread to my prior post on SAS-in-the-DBMS.) Read more

June 28, 2006

Good DATallegro/Intel white paper

I really like this short white paper, which carries the personal byline of Stuart Frost. Stuart is DATallegro’s CEO, and also the guy who does analyst relations for them (at least in my case). Part of it just does a concise job of spelling out some of the DATallegro story. But the rest is about the comparison between Intel’s new dual-core “Woodcrest” Xeons and their single-core predecessors. Not only does it give credible statistics, it gives understanding of the reasons behind them.

Read more

May 8, 2006

Memory-centric data management whitepaper

I have finally finished and uploaded the long-awaited white paper on memory-centric data management.

This is the project for which I origially coined the term “memory-centric data management,” after realizing that the prevalent “in-memory DBMS” creates all sorts of confusion about how and whether data persists on disk. The white paper clarifies and updates points I have been making about memory-centric data management since last summer. Sponsors included:

If there’s one area in my research I’m not 100% satisfied with, it may be the question of where the true hardware bottlenecks to memory-centric data management lie (it’s obvious that the bottleneck to disk-centric data management is random disk access). Is it processor interconnect (around 1 GB/sec)? Is it processor-to-cache connections (around 5 GB/sec)? My prior pronouncements, the main body of the white paper, and the Intel Q&A appendix to the white paper may actually have slightly different spins on these points.

And by the way — the current hard limit on RAM/board isn’t 2^64 bytes, but a “mere” 2^40. But don’t worry; it will be up to 2^48 long before anybody actually puts 256 gigabytes under the control of a single processor.

Feed: DBMS (database management system), DW (data warehousing), BI (business intelligence), and analytics technology Subscribe to the Monash Research feed via RSS or email:

Login

Search our blogs and white papers

Monash Research blogs

User consulting

Building a short list? Refining your strategic plan? We can help.

Vendor advisory

We tell vendors what's happening -- and, more important, what they should do about it.

Monash Research highlights

Learn about white papers, webcasts, and blog highlights, by RSS or email.