Data warehousing

Analysis of issues in data warehousing, with extensive coverage of database management systems and data warehouse appliances that are optimized to query large volumes of data. Related subjects include:

May 22, 2012

Kognitio’s story today

I had dinner tonight with the Kognitio folks. So far as I can tell:

Kognitio believes that this story is appealing, especially to smaller venture-capital-backed companies, and backs that up with some frieNDA pipeline figures.

Between that success claim and SAP’s HANA figures, it seems that the idea of using an in-memory DBMS to accelerate analytics has legs. This makes sense, as the BI vendors — Qlik Tech excepted — don’t seem to be accomplishing much with their proprietary in-memory alternatives. But I’m not sure that Kognitio would be my first choice to fill that role. Rather, if I wanted to buy an unsuccessful analytic RDBMS to use as an in-memory accelerator, I might consider ParAccel, which is columnar, has an associated compression story, has always had a hybrid memory-centric flavor much as Kognitio has, and is well ahead of Kognitio in the analytic platform derby. That said, I’ll confess to not having talked with or heard much about ParAccel for a while, so I don’t know if they’ve been able maintain technical momentum any more than Kognitio has.

May 17, 2012

Thoughts on “data science”

Teradata is paying me to join a panel on “data science” in downtown Boston, Tuesday May 22, at 3:00 pm. A planning phone call led me to jot down a few notes on the subject, which I’m herewith adapting into a blog post.

For starters, I have some concerns about the concepts of data science and data scientist. Too often, the term “data scientist” is used to suggest that one person needs to have strong skills both in analytics and in data management. But in reality, splitting those roles makes perfect sense. Further:

The leader in raising these issues is probably Neil Raden.

But there’s one respect in which I think the term “data science” is highly appropriate. In conventional science, gathering data is just as much of an accomplishment as analyzing it. Indeed, most Nobel Prizes are given for experimental results. Similarly, if you’re doing data science, you should be thinking hard about how to corral ever more useful data. Techniques include but are not limited to:

 

May 3, 2012

Big Data hype?

A reporter wrote in to ask whether investor interest in “Big Data” was justified or hype. (More precisely, that’s how I reinterpreted his questions. :) ) His examples were Splunk’s IPO, Teradata’s stock price increase, and Birst’s financing. In a nutshell:

1. A great example of hype is that anybody is calling Birst a “Big Data” or “Big Data analytics” company. If anything, Birst is a “little data” analytics company that claims, as a differentiating feature, that it can handle ordinary-sized data sets as well. Read more

April 7, 2012

Many kinds of memory-centric data management

I’m frequently asked to generalize in some way about in-memory or memory-centric data management. I can start:

Getting more specific than that is hard, however, because:

Consider, for example, some of the in-memory data management ideas kicking around. Read more

April 5, 2012

Human real-time

I first became an analyst in 1981. And so I was around for the early days of the movement from batch to interactive computing, as exemplified by:

Of course, wherever there is interactive computing, there is a desire for interaction so fast that users don’t notice any wait time. Dan Fylstra, when he was pitching me the early windowing system VisiOn, characterized this as response so fast that the user didn’t tap his fingers waiting.* And so, with the move to any kind of interactive computing at all came a desire that the interaction be quick-response/low-latency. Read more

April 4, 2012

IBM DB2 10

Shortly before Tuesday’s launch of DB2 10, IBM’s Conor O’Mahony checked in for a relatively non-technical briefing.* More precisely, this is about DB2 for “distributed” systems, aka LUW (Linux/Unix/Windows); some of the features have already been in the mainframe version of DB2 for a while. IBM is graciously permitting me to post the associated DB2 10 announcement slide deck.

*I hope any errors in interpretation are minor.

Major aspects of DB2 10 include new or improved capabilities in the areas of:

Of course, there are various other enhancements too, including to security (fine-grained access control), Oracle compatibility, and DB2 pureScale. Everything except the pureScale part is also reflected in IBM InfoSphere Warehouse, which is a near-superset of DB2.*

*Also, the data ingest part isn’t in base DB2.

Read more

March 16, 2012

Juggling analytic databases

I’d like to survey a few related ideas:

Here goes. Read more

March 9, 2012

Hardware and components — lessons from Teradata

I love talking with Carson Schmidt, chief of Teradata’s hardware engineering (among other things), even if I don’t always understand the details of what he’s talking about. It had been way too long since our last chat, so I requested another one. We were joined by Keith Muller, who I presume is pictured here. Takeaways included:

Read more

February 26, 2012

SAP HANA today

SAP HANA has gotten much attention, mainly for its potential. I finally got briefed on HANA a few weeks ago. While we didn’t have time for all that much detail, it still might be interesting to talk about where SAP HANA stands today.

The HANA section of SAP’s website is a confusing and sometimes inaccurate mess. But an IBM whitepaper on SAP HANA gives some helpful background.

SAP HANA is positioned as an “appliance”. So far as I can tell, that really means it’s a software product for which there are a variety of emphatically-recommended hardware configurations — Intel-only, from what right now are eight usual-suspect hardware partners. Anyhow, the core of SAP HANA is an in-memory DBMS. Particulars include:

SAP says that the row-store part is based both on P*Time, an acquisition from Korea some time ago, and also on SAP’s own MaxDB. The IBM white paper mentions only the MaxDB aspect. (Edit: Actually, see the comment thread below.) Based on a variety of clues, I conjecture that this was an aspect of SAP HANA development that did not go entirely smoothly.

Other SAP HANA components include:  Read more

February 8, 2012

Comments on SAS

A reporter interviewed me via IM about how CIOs should view SAS Institute and its products. Naturally, I have edited my comments (lightly) into a blog post. They turned out to be clustered into three groups, as follows:

Next Page →

Feed: DBMS (database management system), DW (data warehousing), BI (business intelligence), and analytics technology Subscribe to the Monash Research feed via RSS or email:

Login

Search our blogs and white papers

Monash Research blogs

User consulting

Building a short list? Refining your strategic plan? We can help.

Vendor advisory

We tell vendors what's happening -- and, more important, what they should do about it.

Monash Research highlights

Learn about white papers, webcasts, and blog highlights, by RSS or email.