July 6, 2011

Hadapt update

I met with the Hadapt guys today.  I think I can be a bit crisper than before in positioning Hadapt and its use cases, namely:

Other evolution from what I wrote about Hadapt a few months ago includes:

In other news, Hadapt is our newest client.

July 6, 2011

Petabyte-scale Hadoop clusters (dozens of them)

I recently learned that there are 7 Vertica clusters with a petabyte (or more) each of user data. So I asked around about other petabyte-scale clusters. It turns out that there are several dozen such clusters (at least) running Hadoop.

Cloudera can identify 22 CDH (Cloudera Distribution [of] Hadoop) clusters holding one petabyte or more of user data each, at 16 different organizations. This does not count Facebook or Yahoo, who are huge Hadoop users but not, I gather, running CDH. Meanwhile, Eric Baldeschwieler of Hortonworks tells me that Yahoo’s latest stated figures are:

Read more

July 6, 2011

Hadoop hardware and compression

A month ago, I posted about typical Hadoop hardware. After talking today with Eric Baldeschwieler of Hortonworks, I have an update. I also learned some things from Eric and from Brian Christian of Zettaset about Hadoop compression.

First the compression part. Eric thinks 6-10X compression is common for “curated” Hadoop data — i.e., the data that actually gets used a lot. Brian used an overall figure of 6-8X, and told of a specific customer who had 6X or a little more. By way of comparison, it sounds as if the kinds of data involved are like what Vertica claimed 10-60X compression for almost three years ago.
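To make those ratios concrete, here is a minimal arithmetic sketch of what they imply for on-disk footprint. The function name and the 1 PB example volume are illustrative assumptions, not figures from Eric or Brian, and the sketch treats 1 PB as 1,000 TB.

```python
def compressed_size_tb(raw_tb: float, ratio: float) -> float:
    """Return the on-disk size in TB for `raw_tb` of data at `ratio`:1 compression."""
    if ratio <= 0:
        raise ValueError("compression ratio must be positive")
    return raw_tb / ratio


if __name__ == "__main__":
    raw_tb = 1000.0  # hypothetical 1 PB of uncompressed data (assumption)
    # 6X, 8X and 10X span the ranges quoted above.
    for ratio in (6, 8, 10):
        size = compressed_size_tb(raw_tb, ratio)
        print(f"{ratio}X compression: about {size:.0f} TB on disk")
```

Under these assumptions, the gap between 6X and 10X compression on a petabyte of raw data is roughly 67 TB of disk.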

Eric also made an excellent point about low-value machine-generated data. I was suggesting that as Moore’s Law made sensor networks ever more affordable:  Read more

July 5, 2011

Eight kinds of analytic database (Part 2)

In Part 1 of this two-part series, I outlined four variants on the traditional enterprise data warehouse/data mart dichotomy, and suggested what kinds of DBMS products you might use for each. In Part 2 I’ll cover four more kinds of analytic database — even newer, for the most part, with a use case/product short list match that is even less clear.  Read more

July 5, 2011

Eight kinds of analytic database (Part 1)

Analytic data management technology has blossomed, leading to many questions along the lines of “So which products should I use for which category of problem?” The old EDW/data mart dichotomy is hopelessly outdated for that purpose, and adding a third category for “big data” is little help.

Let’s try eight categories instead. While no categorization is ever perfect, these each have at least some degree of technical homogeneity. Figuring out which types of analytic database you have or need — and in most cases you’ll need several — is a great early step in your analytic technology planning.  Read more

June 27, 2011

What colleges should teach in analytics

Based on a Teradata press release calling attention to the small amount of explicit university instruction in business intelligence, I was asked:

Does BI really need a dedicated undergrad track? What sort of BI and analytics-related skills should students look to obtain now in order to be viable in the job marketplace five years out?

My answers were (slightly edited):

Of course, there are more specialized skills also worth teaching, in a number of areas, starting with statistics and other predictive modeling technologies. But it’s OK to go through life not knowing those.

June 24, 2011

Observations on Oracle pricing

A couple of months ago, Oracle asked me to pull some observations on pricing until after the earnings call that just occurred, and I grudgingly acquiesced. In the interim, more information on Oracle pricing has emerged (including in the comment thread to that post). The original notes are:

Oracle disputes some common claims about its cost and pricing. In particular, Oracle software maintenance costs a fixed 22% of your annual license price, so if you get a discount on your licenses, it ripples through to your maintenance. This is true even if you have an all-you-can-eat ULA (Unlimited License Agreement).
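A minimal sketch of how that ripple works, assuming maintenance is simply 22% of the discounted (net) license price. The function name, list price, and discount levels are hypothetical illustrations, not Oracle figures.

```python
MAINTENANCE_RATE = 0.22  # the fixed 22% figure cited above


def annual_maintenance(list_price: float, discount: float) -> float:
    """Yearly maintenance on a license bought at `discount` (0.0 to <1.0) off list."""
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must be in the range [0.0, 1.0)")
    net_license_price = list_price * (1.0 - discount)
    return MAINTENANCE_RATE * net_license_price


if __name__ == "__main__":
    list_price = 1_000_000.0  # hypothetical list price (assumption)
    for discount in (0.0, 0.25, 0.50):
        fee = annual_maintenance(list_price, discount)
        print(f"{discount:.0%} discount: maintenance of about {fee:,.0f} per year")
```

In this sketch a 50% license discount also halves the yearly maintenance bill, which is the ripple effect Oracle describes.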

June 24, 2011

Forthcoming Oracle appliances

Edit: I checked with Oracle, and it’s indeed TimesTen that’s supposed to be the basis of this new appliance, as per a comment below. That would be less cool, alas.

Oracle seems to have said on yesterday’s conference call that Oracle OpenWorld (first week in October) will feature appliances based on Tangosol and Hadoop. As I post this, the Seeking Alpha transcript of Oracle’s call is riddled with typos. Bolded comments below are by me.  Read more

June 22, 2011

Citrusleaf RTA

Citrusleaf has released an add-on product called Citrusleaf RTA (Real-Time Attribution). It’s to be used when:

The metrics envisioned are:

A consistent relational schema is NOT assumed.

Citrusleaf’s solution is:

The downside is that when you do read 100 objects/records per person, you might need to do 100 seeks.

June 21, 2011

It’s official — the grand central EDW will never happen

I pointed out last year that the grand central enterprise data warehouse couldn’t happen; the post started:

An enterprise data warehouse should:

  • Manage data to high standards of accuracy, consistency, cleanliness, clarity, and security.
  • Manage all the data in your organization.

Pick ONE.

IBM’s main theme at the Enzee Universe conference has been to say the same thing.

Merv Adrian’s talk at the same conference made it clear that Gartner feels the same way, as does he personally. Indeed, like me, he’s racked up multiple decades of industry experience without ever finding a single theoretically ideal grand central EDW.

Forrester Research has been a little less clear on the point, but generally seems to be on the correct side of the issue as well.

If somebody is still saying that one central enterprise data warehouse can hold all the information or data you need on which to base your business decisions, they’re probably not somebody you should be listening to very hard.

Is that clear, or should I hammer home the point even harder? 😀
