August 6, 2012

Notes, links and comments August 6, 2012

I haven’t done a notes/links/comments post for a while. Time for a little catch-up.

1. MySQL now has a memcached integration story. I haven’t checked the details. The MySQL team is pretty hard to talk with, due to the heavy-handedness of Oracle’s analyst relations.

2. The Large Hadron Collider offers some serious numbers, including:

3. One application area we don’t talk about much for analytic technologies is education. However: Read more

August 6, 2012

People’s facility with statistics — extremely difficult to predict

My recent post on broadening the usefulness of statistics presupposed two things about the statistical sophistication of business intelligence tool users:

Let me now say a little more on the subject. My basic message is — people’s facility with statistics is extremely difficult to predict.

If you DO have to make a point estimate, however, you could do worse than just putting quotation marks around the last four words of that sentence …

Suppose we measure people’s statistical understanding on a 5-point scale:

  1. People who haven’t a clue what a p-value is.
  2. People who think a p-value of .05 signifies a 95% chance of truth.
  3. People who know better than that, but who still think that “statistically significant” is pretty close to the same as “true”.
  4. People who know better yet, but aren’t fluent in using statistical techniques correctly.
  5. People who are fluent in statistics.
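To see why level 2 on that scale is a mistake, here is a minimal arithmetic sketch. It assumes a world in which only some tested hypotheses are really true; the prior, power, and significance threshold below are illustrative assumptions, not figures from any study. Under those assumptions, the share of "significant" results that are actually true follows from Bayes' rule, and it is generally not 95%.

```python
def share_of_significant_results_that_are_true(prior, power, alpha):
    """Bayes' rule for P(effect is real | result is significant).

    prior: assumed fraction of tested hypotheses that are really true
    power: assumed chance a real effect yields a significant result
    alpha: significance threshold, i.e. the chance a null effect
           yields a significant result anyway
    """
    true_positives = power * prior
    false_positives = alpha * (1.0 - prior)
    return true_positives / (true_positives + false_positives)


if __name__ == "__main__":
    # Illustrative assumption: 1 in 10 tested ideas is actually right.
    share = share_of_significant_results_that_are_true(
        prior=0.10, power=0.80, alpha=0.05
    )
    print(f"{share:.0%} of significant results are true")
```

With these made-up inputs the answer works out to 64%, and it moves a lot as the assumed prior changes, which is exactly what the "95% chance of truth" reading ignores.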

Just knowing somebody’s job description, can you confidently predict their ranking to within, say, +/- 1 point? I suggest you can’t. People differ wildly in general numeracy and in specific statistical knowledge.

Even our guesses about average knowledge may be off, not least because education is changing things. Read more

July 31, 2012

Integrating statistical analysis into business intelligence

Business intelligence tools have been around for two decades.* In that time, many people have had the idea of integrating statistical analysis into classical BI. Yet I can’t think of a single example that was more than a small, niche success.

*Or four decades, if you count predecessor technologies.

The first challenge, I think, lies in the paradigm. Three choices that come to mind are:

But the first of those approaches requires too much intelligence from the software, while the third requires too much numeracy from the users. So only the second option has a reasonable chance to work, and even that one may be hard to pull off unless vendors focus on one vertical market at a time.

The challenges in full automation start: Read more

July 30, 2012

DBMS2.com is back up!

After several hours of DBMS 2 being down, I put out a “We’re broken” note from another blog. Naturally, the next fix I tried seems to have worked. My joy in that far outweighs my embarrassment. 🙂 This kind of thing just happens once in a while when one has business-critical software that isn’t good at having a test-to-production staging cycle.

In case anybody ever runs into the same problems, the short form of the story is:

1. DBMS2.com came down due to a corrupted automatic upgrade of WordPress.

2. The fix was to do an automatic install of WordPress to a dummy domain, then copy over the files in the domain’s root and wp-includes directories.

3. The one file that needed to be copied back from the old installation was wp-config.php. (Once it occurred to me to start reading from index.php, it took me about 1 minute to figure that out …)
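For anybody who would rather script those steps than do them by hand, here is a rough sketch of the file copy. It is not what I actually ran. The function name and both directory arguments are hypothetical, and it assumes the site's settings live in the configuration file of a standard WordPress install (wp-config.php), which it leaves untouched instead of overwriting it and copying it back.

```python
import shutil
from pathlib import Path


def restore_core_files(fresh_dir, broken_dir, keep=("wp-config.php",)):
    """Copy root-level files and wp-includes from a fresh install.

    fresh_dir:  hypothetical path to the clean WordPress install
    broken_dir: hypothetical path to the damaged site
    keep:       file names in broken_dir that must not be overwritten
    Returns the names of everything that was copied.
    """
    fresh, broken = Path(fresh_dir), Path(broken_dir)
    copied = []

    # Root-level files only; other directories are left alone.
    for item in sorted(fresh.iterdir()):
        if item.is_file() and item.name not in keep:
            shutil.copy2(item, broken / item.name)
            copied.append(item.name)

    # Overwrite wp-includes in place (needs Python 3.8+ for dirs_exist_ok).
    includes = fresh / "wp-includes"
    if includes.is_dir():
        shutil.copytree(includes, broken / "wp-includes", dirs_exist_ok=True)
        copied.append("wp-includes/")

    return copied
```

Note that this overwrites files but does not delete stale ones left in the damaged site's wp-includes directory, and you would want a backup of the damaged directory before running anything like it.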

 

July 28, 2012

Some Vertica 6 features

Vertica 6 was recently announced, and so it seemed like a good time to catch up on Vertica features. The main topics I want to address are:

Also:

In general, the main themes of Vertica 6 appear to be:

Let’s do the analytic functionality first. Notes on that include:

I’ll also take this opportunity to expand on something I wrote about a few vendors — including Vertica — at the end of my post on approximate query results. When I probed how customers of Vertica and other RDBMS-based analytic platform vendors used vendor-proprietary advanced analytic SQL and other analytic capabilities, answers included: Read more

July 25, 2012

SQL Server to MySQL migration — why?

Oracle wants to help you migrate from Microsoft SQL Server to MySQL. I was asked for comment, and replied:

Am I missing anything?

July 25, 2012

Thoughts on the next releases of Oracle and Exadata

A reporter asked me to speculate about the next releases of Oracle and Exadata. He and I agreed:

My answers mixed together thoughts on what Oracle should and will emphasize (which aren’t the same thing but hopefully bear some relationship to each other ;)). They were (lightly edited):

July 25, 2012

The eternal bogosity of performance marketing

Chris Kanaracus uncovered a case of Oracle actually pulling an ad after having been found “guilty” of false advertising. The essence seems to be that Oracle claimed 20X hardware performance vs. IBM, based on a comparison done against 6-year-old hardware running an earlier version of the Oracle DBMS. My quotes in the article were:

Another example of Oracle exaggeration was around the Exadata replacement of Teradata at Softbank. But the bogosity flows both ways. Netezza used to make a flat claim of 50X better performance than Oracle, while Vertica’s standard press release boilerplate long boasted

50x-1000x faster performance at 30% the cost of traditional solutions

Of course, reality is a lot more complicated. Even if you assume apples-to-apples comparisons in terms of hardware and software versions, performance comparisons can vary greatly depending upon queries, databases, or use cases. For example:

And so, vendor marketing claims about across-the-board performance should be viewed with the utmost suspicion.

Related links

July 24, 2012

Notes on Datameer

In a short October, 2011 post about Datameer, I wrote:

Datameer is designed to let you do simple stuff on large amounts of data, where “large amounts of data” typically means data in Hadoop, and “simple stuff” includes basic versions of a spreadsheet, of BI, and of EtL (Extract/Transform/Load, without much in the way of T).

That’s all still mainly true, although with the recent Datameer 2.0:

In essence, Datameer has two positionings.

Read more

July 23, 2012

Hadoop YARN — beyond MapReduce

A lot of confusion seems to have built up around the facts:

Here’s my best effort to make sense of all that, helped by a number of conversations with various Hadoop companies, but most importantly a chat Friday with Arun Murthy and other Hortonworks folks.

Read more
