Comments on: Data mining is driving much of data warehousing

By: Will Dwinnell

Will Dwinnell — Mon, 26 Feb 2007 10:36:55 +0000

The data I deal with comes from several sources. Much of it would have been recorded anyway: administrative items and transactions. Some of it comes from other sources, and is purchased largely for analytical purposes: credit bureau data, etc.

I’m not sure where you’re going with this, though, since the answer would be the same whether we used our current process, KXEN or any other analytical tool.

By: Greg

Greg — Thu, 21 Dec 2006 16:28:28 +0000

Late contribution, but I have to ask: is this from the technical journal of “Duh!” or “Dee dee dee!”

Why would anyone store such voluminous amounts of data, in such ridiculous formats, if they did not intend to mine the data store?

By: Curt Monash

Curt Monash — Wed, 22 Nov 2006 18:43:05 +0000

Will,

Yes, you’re right about KXEN. They’re trying to be classic “disruptors.” And KDD2006 — well, that was a conference for the putative disruptees. I don’t recall talking with or about KXEN there, but I have zero difficulty believing everything you’re suggesting about their reception there.

As for your core point — I see what you mean. But let me throw another set of questions back at you: Where did those several hundred candidate predictors come from? Are they ALL from transactional data that HAD to be recorded in the ordinary cost of business anyway? Or is there a cost to accumulating the info in the first place?

Thanks for the good discussion,

CAM

By: Will Dwinnell

Will Dwinnell — Wed, 22 Nov 2006 12:12:03 +0000

From my conversations with the folks at KXEN, I’d say your point ‘B’ is their real goal, although it is questionable to what extent this is possible. I can tell you that KXEN representatives who pushed point ‘B’ got a skeptical reception at KDD2006 (including from myself).

I work at an international bank where I construct statistical models of customer behavior. My last project involved 200,000 records with several hundred candidate predictors. The raw data was housed in a relational database and a local data warehouse, which were accessed via SQL and related querying tools. I completed this project using my tool of choice, MATLAB, running on a Windows PC sporting an AMD Athlon64 FX-53 and 2GB RAM (I have recently upgraded to a Windows PC with an Intel Core 2 Extreme X6800 and 4GB RAM).

As to the cost issue, which was my original point: Neither PC I mentioned cost over US$3,000 at the time of purchase (and that includes the monitor). I use MATLAB, which costs a little less than $2,000 new (less than $3,000 with typical analytical options), and much less than that (a few hundred bucks) for the annual subscription. The database would have been there anyway, so its incremental cost is $0. The data extraction software I wouldn’t count, since any business analyst would have it anyway, but I admit that could be debated. The most expensive part of this process? Me. Hiring a qualified nerd to do this work is not cheap.

By: Curt Monash

Curt Monash — Thu, 09 Nov 2006 02:20:19 +0000

Will,

The issue isn’t so much whether the traditional tools and data sets of full-time data miners happen to fit on desktop machines in a certain enterprise. (Although I’m curious — what tools do you use, on what kinds of data sets, and what kind of warehouse/mart did they emerge from?)

Rather, KXEN is trying to do a few things:

A. Divide problems into more, smaller, simpler models.
B. Make data mining accessible to people who aren’t data mining experts.
C. Take out a lot of steps from the data mining process, such as variable reduction.

And Verix is trying to outsource the whole data mining task altogether.

By: Will Dwinnell

Will Dwinnell — Thu, 09 Nov 2006 01:24:00 +0000

“…to change the rules of data mining. (Verix currently runs on Oracle, by the way.) KXEN, in particular, would like data mining to be done in a lot more, but probably a lot smaller, processing runs than it is today.”

You have been reading too much literature from bloated companies selling over-priced tools. Data mining has been done on the desktop for years. I work as a data miner for an international bank and spend much more time on my (admittedly, beefy) PC than I do on the UNIX machine.

By: The Monash Report » Blog Archive » The problem with dashboards, and business intelligence segmented

Fri, 06 Oct 2006 01:03:06 +0000

[…] Deep analysis and decision support. Routine, scheduled reporting was covered in my first two categories. But this third one is where the bulk of ad hoc query and data mining fall. Generally, it’s where lots of specialized and/or calculation-intensive analytic technology comes into play. It’s also where the drilldown aspect of standard reporting shows up. Also, this is the area that is driving much of the recent transformation and disruption in the data warehouse market, because different kinds of BI need different kinds of data warehousing technology. […]

By: DBMS2 — DataBase Management System Services » Blog Archive » SAS Intelligence Storage

Wed, 04 Oct 2006 10:22:26 +0000

[…] It sounds as if the product is optimized for data mining and generic OLAP alike. Indeed, SAS Intelligence Storage is used to power both SAS’s data mining and other advanced analytics, and also its more conventional BI suite. […]