September 12, 2008

Tesco kerfuffle

Netezza evidently put out a press release bragging of a competitive replacement of Teradata at UK retailing giant Tesco. That press release can no longer be found on Netezza’s site, but it lives on elsewhere. Meanwhile, Teradata has put out a press release in which Tesco is quoted emphatically contradicting what it is quoted as saying in the Netezza press release. While I haven’t discussed this with Netezza, my guess is that somebody there got a little overenthusiastic in advance of their user conference next week and thought they’d gotten a permission they really hadn’t.

Beyond that, I’d note that the Netezza quote made reference to around 25 heavy analytical users, while the Teradata quote talked of 8000 people across more than 2000 suppliers.

Categories: Data warehouse appliances, Data warehousing, Memory-centric data management, Netezza, Oracle, Specific users, Teradata

2 Comments

September 8, 2008

The layered messaging marketing model as applied to Netezza

I just put up a post claiming that enterprise IT marketing arguments commonly boil down to one of two layered messaging templates. Let’s test how that claim applies to one of the most innovative technology companies of this decade: Netezza.

Categories: Netezza

2 Comments

September 6, 2008

SANs vs. DAS in MPP data warehousing

Generally speaking:

SANs (Storage Area Networks) are pulling ahead of DAS (Direct Attached Storage).
Much of the growth in storage is due to data warehousing.
MPP (Massively Parallel Processing) is pulling ahead of SMP (Symmetric MultiProcessing) for high-end data warehousing.
MPP architectures are commonly shared-nothing.
Shared-nothing entails DAS.

But if you think about it, those facts don’t exactly add up. Read more

Categories: Calpont, Parallelization, Storage, Vertica Systems

24 Comments

September 5, 2008

Dividing the data warehousing work among MPP nodes

I talk with lots of vendors of MPP data warehouse DBMS. I’ve now heard enough different approaches to MPP architecture that I think it might be interesting to contrast some of the alternatives.

Categories: Aster Data, Calpont, Exasol, Greenplum, Parallelization, Theory and architecture, Vertica Systems

22 Comments

September 5, 2008

More on known MapReduce application areas

In surveying MapReduce applications to date, I said that they fell mainly into three overlapping categories:

Text tokenization, indexing, and search
Creation of other kinds of data structures (e.g., graphs)
Data mining and machine learning

and really should have included a fourth:

Data transformation

Nokia just released another MapReduce implementation, Disco, and its list of applications to date fits right into that template. The relevant quote is:

This far Disco has been succesfully used, for instance, in parsing and reformatting data, data clustering, probabilistic modelling, data mining, full-text indexing, and log analysis with hundreds of gigabytes of real-world data.
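For readers who haven't seen the pattern up close, here is a minimal single-process sketch of the map/shuffle/reduce flow, applied to a toy log-analysis job of the kind in that list. It is not Disco's API, nor Hadoop's; every function name and the sample log lines are my own assumptions for illustration, and a real implementation would spread the map and reduce steps across many nodes.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Apply the mapper to every record, yielding (key, value) pairs."""
    for record in records:
        yield from mapper(record)


def shuffle(pairs):
    """Group all values by key, as the framework would do between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Apply the reducer once per key to that key's list of values."""
    return {key: reducer(key, values) for key, values in groups.items()}


def run_job(records, mapper, reducer):
    """Run map, shuffle, and reduce in sequence on one machine."""
    return reduce_phase(shuffle(map_phase(records, mapper)), reducer)


def tokenize(line):
    """Hypothetical mapper: emit (token, 1) for each whitespace-separated token."""
    for token in line.lower().split():
        yield token, 1


def count(key, values):
    """Hypothetical reducer: sum the counts emitted for one token."""
    return sum(values)


# Made-up log lines, purely for illustration.
log_lines = ["GET /index.html", "GET /about.html", "POST /index.html"]
word_counts = run_job(log_lines, tokenize, count)
```

Swapping in a different mapper and reducer (say, one that emits a document ID per token) would turn the same skeleton into a crude full-text indexer, which is roughly why these application areas overlap so much.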

Categories: MapReduce

Three different implementations of MapReduce

So far as I can see, there are three implementations of MapReduce that matter for enterprise analytic use – Hadoop, Greenplum’s, and Aster Data’s.* Hadoop has of course been available for a while, and used for a number of different things, while Greenplum’s and Aster Data’s versions of MapReduce – both in late-stage beta – have far fewer users.

*Perhaps Nokia’s Disco or another implementation will at some point join the list.

Earlier this evening I posted some Mike Stonebraker criticisms of MapReduce. It turns out that they aren’t all accurate across all MapReduce implementations. So this seems like a good time for me to stop stalling and put up a few notes about specific features of different MapReduce implementations. Here goes. Read more

Categories: Aster Data, Greenplum, MapReduce

3 Comments

September 4, 2008

Mike Stonebraker’s counterarguments to MapReduce’s popularity

In response to my recent posts about MapReduce, Mike Stonebraker just got on the phone to give me his views. His core claim, more or less, is that anything you can do in MapReduce you could already do in a parallel database that complies with SQL-92 and/or has PostgreSQL underpinnings. In particular, Mike says: Read more

Categories: Data warehousing, MapReduce, Michael Stonebraker, PostgreSQL

5 Comments

September 4, 2008

More data on data warehouse sizes and issues

I spoke today with Paul Barth and Randy Bean of consultancy NewVantage Partners. The core of NewVantage’s business seems to be helping large enterprises (especially financial services) with their data warehouse strategies. Takeaways — none of which should shock regular readers of DBMS2 — included:

Administrative cost and difficulty are often the single biggest issue in selecting analytic DBMS products.
Oracle hits a wall around 10 terabytes of user data. The one customer NewVantage can think of with an Oracle data warehouse over 10 terabytes is fleeing Oracle for Netezza.
NewVantage says that very specialized data warehouses on Oracle could conceivably be larger than that.
NewVantage does have a customer on DB2/UDB in the 30-40 terabyte range. That customer does a lot of careful tuning to make it work.
About 15% of NewVantage’s customers use Netezza. Few if any use newer analytic DBMS (but I got the sense more will soon). The rest rely on “traditional” DBMS, a group that includes Teradata.

Categories: Data warehousing, IBM and DB2, Netezza, Oracle

1 Comment

September 3, 2008

Head to head blog debate between EMC, NetApp, and HP

Chuck Hollis of EMC started a fierce debate with a blog post on how to measure effective storage capacity. Competitors from NetApp and HP responded in often sarcastic detail in the comment thread, Hollis shot back, and the volleying continued for quite a while.

I’m not a storage maven, and I don’t understand all the details of that stuff. If you’re like me in that regard, you may find the post worth skimming just to see what some of the choices, trade-offs, and complications are in designing and measuring storage systems. Stephen Foskett’s related post is also worth a look in that regard.

My recent foray into measuring disk storage pales by comparison.

Categories: Storage, Theory and architecture

3 Comments

September 2, 2008

Introduction to Aster Data and nCluster

I’ve been writing a lot about Greenplum since a recent visit. But on the same trip I met with Aster Data, and have talked with them further since. Let me now redress the balance and outline some highlights of the Aster Data story.

Categories: Analytic technologies, Aster Data, Data warehousing, Parallelization, Specific users

4 Comments

