December 7, 2009

A framework for thinking about data warehouse growth

There are only three ways that the amount of data stored in data warehouses can grow:

Read more

December 2, 2009

Webinar on MapReduce for complex analytics (Thursday, December 3, 10 am and 2 pm Eastern)

The second in my two-webinar series for Aster Data will occur tomorrow, twice (both live), at 10 am and 2 pm Eastern time. The other presenters will be Jonathan Goldman, who was a Principal Scientist at LinkedIn but now has joined Aster himself, and Steve Wooledge of Aster (playing host). Key links are:

The main subjects of the webinar will be:

Arguably, aspects of data transformation fit into each of those three categories, which may help explain why data transformation has been so prominent among the early applications of MapReduce.

As you can see from Aster’s title for the webinar (which they picked while I was on vacation), at least their portion will be focused on customer analytics, e.g. web analytics.

November 25, 2009

New England Database Summit (January 28, 2010)

New England Database Day has now, in its third year, become a “Summit.”  It’s a nice event, providing an opportunity for academics and business folks to mingle.  The organizers are basically the local branch of the Mike Stonebraker research tree, with this year’s programming head being Daniel Abadi. It will be on Thursday, January 28, 2010, once again in the Stata Center at MIT. It would be reasonable to park in the venerable 4/5 Cambridge Center parking lot, especially if you’d like to eat at Legal Seafood afterwards.

So far there are two confirmed speakers — Raghu Ramakrishnan of Yahoo and me.  My talk title will be something like “Database and analytic technology: The state of the union”, with all wordplay intended.

There’s more information at the official New England Database Summit website. There’s also a post with similar information on Daniel Abadi’s DBMS Musings blog.

Edit after the event:

Posts based on my January, 2010 New England Database Summit keynote address

November 23, 2009

Comments on a fabricated press release quote

My clients at Kickfire put out a press release last week quoting me as saying things I neither said nor believe. The press release is about a “Queen For A Day” kind of contest announced way back in April, in which users were invited to submit stories of their data warehouse problems, with the biggest sob stories winning free Kickfire appliances. The fabricated “quote” reads: Read more

November 23, 2009

Boston Big Data Summit keynote outline

Last month, Bob Zurek asked me to give a talk on “Big Data”, where “big” is anything from a few terabytes on up, then moderate a panel on cloud computing. We agreed that I could talk just from notes, without slides. So, since I have them typed up, I’m posting them below.

Read more

November 7, 2009

Calpont’s InfiniDB

Since its inception, Calpont has gone through multiple management teams, strategies, and investor groups. What it hadn’t done, ever, is actually shipped a product. Last week, however, Calpont introduced a free/open source DBMS, InfiniDB, with technical details somewhat reminiscent of what Calpont was promising last April. Highlights include:

Being on vacation, I’ll stop there for now. (If it weren’t for Tropical Storm/ depression Ida, I might not even be posting this much until I get back.)

October 30, 2009

Aster Data 4.0 and the evolution of “advanced analytic(s) servers”

Since Linda and I are leaving on vacation in a few hours, Aster Data graciously gave me permission to morph its “12:01 am Monday, November 2” embargo into “late Friday night.”

Aster Data is officially announcing the 4.0 release of nCluster. There are two big pieces to this announcement:

In addition, Aster has matured nCluster in various ways, for example cleaning up a performance problem with single-row updates.

Highlights of the Aster “Data-Application Server” story include: Read more

October 30, 2009

A question on MDX performance

An enterprise user wrote in with a question that boils down to:

What are reasonable MDX performance expectations?

MDX doesn’t come up in my life very much, and I don’t have much intuition about it. E.g., I don’t know whether one can slap an MDX-to-SQL converter on top of a fast analytic RDBMS and go to town. What’s more, I’m heading off on vacation and don’t feel like researching the matter myself in the immediate future. 🙂

So here’s the long form of the question. Any thoughts?

I have a general question on assessing the performance of an OLAP technology using a set of MDX queries. I would be interested to know if there are any benchmark MDX performance tests/results comparing different OLAP technologies (which may be based on different underlying DBMS’s if appropriate) on similar hardware setup, or even comparisons of complete appliance solutions. More generally, I want to determine what performance limits I could reasonably expect on what I think are fairly standard servers.

In my own work, I have set up a star schema model centered on a Fact table of 100 million rows (approx 60 columns), with dimensions ranging in cardinality from 5 to 10,000. In ad hoc analytics, is it expected that any query against such a dataset should return a result within a minute or two (i.e. before a user gets impatient), regardless of whether that query returns 100 cells or 50,000 cells (without relying on any aggregate table or caching mechanism)? Or is that level of performance only expected with a high end massively parallel software/hardware solution? The server specs I’m testing with are: 32-bit 4 core, 4GB RAM, 7.2k RPM SATA drive, running Windows Server 2003; 64-bit 8 core, 32GB RAM, 3 Gb/s SAS drive, running Windows Server 2003 (x64).

I realise that caching of query results and pre-aggregation mechanisms can significantly improve performance, but I’m coming from the viewpoint that in purely exploratory analytics, it is not possible to have all combinations of dimensions calculated in advance, in addition to being maintained.

October 27, 2009

Teradata’s nebulous cloud strategy

As the pun goes, Teradata’s cloud strategy is – well, it’s somewhat nebulous. More precisely, for the foreseeable future, Teradata’s cloud strategy is a collection of rather disjointed parts, including:

Teradata openly admits that its direction is heavily influenced by Oliver Ratzesberger at eBay. Like Teradata, Oliver and eBay favor virtual data marts over physical ones. That is, Oliver and eBay believe that the ideal scenario is that every piece of data is only stored once, in an integrated Teradata warehouse. But eBay believes and Teradata increasingly agrees that users need a great deal of control over their use of this data, including the ability to import additional data into private sandboxes, and join it to the warehouse data already there. Read more

October 25, 2009

Teradata hardware strategy and tactics

In my opinion, the most important takeaways about Teradata’s hardware strategy from the Teradata Partners conference last week are:

In addition, some non-SSD componentry tidbits from Carson Schmidt include:

Let’s get back now to SSDs, because over the next few years they’re the potential game-changer. Read more

← Previous PageNext Page →

Feed: DBMS (database management system), DW (data warehousing), BI (business intelligence), and analytics technology Subscribe to the Monash Research feed via RSS or email:

Login

Search our blogs and white papers

Monash Research blogs

User consulting

Building a short list? Refining your strategic plan? We can help.

Vendor advisory

We tell vendors what's happening -- and, more important, what they should do about it.

Monash Research highlights

Learn about white papers, webcasts, and blog highlights, by RSS or email.