August 4, 2009

Please ping me if one of your comments doesn’t appear

I just found two comments that Akismet had wrongly sent to spam. I found one because its author (Marcin Zukowski) pinged me, and the other because I searched my spam folder on “Netezza” and there it was.

If one of your comments doesn’t go up, please ping me, and also suggest a keyword I could search on to find it.

I’m sorry for any inconvenience!

Categories: About this blog

FlexStore and the rest of Vertica 3.5

Today, Vertica is announcing its 3.5 release, timed in line with a TDWI conference. Vertica 3.5 is scheduled to go into beta test in mid-August and be released to general availability in early October. Vertica 3.5 highlights include:

- Vertica/MapReduce integration, which I’m covering in a separate post.
- A new storage architecture called Vertica FlexStore, which seems to boil down essentially to three things:
  - A sort of row/column hybridization (Vertica would probably prefer to call it something like a column clustering feature) that I’m also covering in a separate post.
  - The beginnings of a multi-temperature capability, somewhat akin to Teradata Virtual Storage.
  - Enhancements to Vertica’s WOS (Write-Optimized Store, the in-memory part of Vertica that first receives updates). I don’t understand WOS architecture well enough to write about that yet.
- Load-balancing, to route queries evenly among Vertica nodes (probably just round-robin), rather than having them just be processed by whichever node happens to receive them.
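
To make the round-robin guess concrete, here is a minimal sketch of what such routing could look like. It is only an illustration of the general technique: the `RoundRobinRouter` class, the node names, and the query strings are all hypothetical, and nothing here reflects how Vertica actually implements load-balancing.

```python
from itertools import cycle


class RoundRobinRouter:
    """Hand out node names in a fixed rotation (hypothetical helper, not a Vertica API)."""

    def __init__(self, nodes):
        if not nodes:
            raise ValueError("at least one node is required")
        # cycle() repeats the node list forever, in order.
        self._rotation = cycle(list(nodes))

    def route(self, query):
        # The query text is ignored: plain round-robin looks only at arrival order.
        return next(self._rotation)


# Assumed three-node cluster; the names are made up for illustration.
router = RoundRobinRouter(["node1", "node2", "node3"])
assignments = [router.route(f"SELECT {i}") for i in range(6)]
print(assignments)
```

The point of the design is that no node does more routing-driven work than any other, regardless of which node a client happens to connect to. It does not account for how expensive each query is.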

Categories: Columnar database management, Data warehousing, Vertica Systems

11 Comments

August 4, 2009

PAX Analytica? Row- and column-stores begin to come together

Column-store proponents are prone to argue, in effect, that the only reason to implement an analytic DBMS with row-based storage is laziness. Their case generally runs along the lines:

- Analytic queries commonly return only a fraction of all possible columns.
- Only returning the columns needed
  - Saves I/O
  - Saves cache space
  - Reduces processing
  - Facilitates compression
- Presumably all those row-based MPP vendors just went row-based because they had a fine row-based DBMS (usually but not always PostgreSQL) to build on.
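
The I/O part of that argument can be put in back-of-the-envelope form. The sketch below assumes a made-up table (100 fixed-width columns, 1,000,000 rows) and a deliberately naive model in which a row store scans whole rows and a column store scans only the referenced columns. It ignores compression, indexes, and everything else real products do, so the numbers are illustrative only.

```python
# Hypothetical table: 100 columns of 8 bytes each, 1,000,000 rows.
NUM_ROWS = 1_000_000
NUM_COLUMNS = 100
BYTES_PER_VALUE = 8


def bytes_scanned_row_store(columns_needed):
    # In this naive model a row store reads whole rows,
    # so every column comes along no matter how few are needed.
    return NUM_ROWS * NUM_COLUMNS * BYTES_PER_VALUE


def bytes_scanned_column_store(columns_needed):
    # In this naive model a column store reads only the referenced columns.
    return NUM_ROWS * columns_needed * BYTES_PER_VALUE


# A query that touches 5 of the 100 columns.
row_bytes = bytes_scanned_row_store(5)
col_bytes = bytes_scanned_column_store(5)
print(row_bytes, col_bytes, row_bytes // col_bytes)
```

Under these assumptions the advantage is simply the ratio of total columns to columns referenced, which is why the argument is strongest for wide tables and narrow queries.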

Pushbacks to this argument from row-based vendors include:

Yes, but it’s harder to update a column store
Yes, but there are more steps to retrieving a bunch of columns than there are to retrieving the same information from row stores

Categories: Analytic technologies, Columnar database management, Data warehousing, Theory and architecture, VectorWise, Vertica Systems

11 Comments

August 4, 2009

Vertica’s version of MapReduce integration

I talked with Omer Trajman of Vertica Monday night about Vertica’s MapReduce integration, part of its Vertica 3.5 release. Highlights included:

- By “integrating Vertica and MapReduce,” Vertica means “integrating Vertica and Hadoop.”
- Vertica’s Hadoop integration is based on Cloudera’s DBInputFormat.
- Omer called out for me several features of Vertica’s Hadoop integration that didn’t just come from Cloudera, namely:
  - Cloudera’s DBInputFormat assumes the database runs on a single computer, or a single head node of an MPP system. Vertica’s technology, however, runs on peer parallel nodes with no head, and so Vertica adapted the DBInputFormat technology accordingly.
  - Vertica lets you push down Map functions to the database. Omer reports a roughly even division among users and prospects between those who want to do this and those who don’t.
  - Vertica lets you do Reduce functions (or Map functions, if you don’t push them down to the database) on a different cluster from the one running the database software. Vertica asserts that its customers and prospects all want to do this. Right here is the big difference between Vertica’s MapReduce integration and Aster’s or Greenplum’s. (Aster would also say that Vertica’s weaker MapReduce/SQL programming integration is a big difference as well.)
  - Indeed, Vertica lets you Reduce into a different DBMS than Vertica, if you choose.
  - Vertica gives you flexibility on the size of the Map and Reduce clusters. Omer agreed with me when I said there were some limits on how fast one can add or subtract nodes in a Vertica grid, because there’s data redistribution involved. But one can add/change/delete Hadoop clusters extremely quickly.

Apparently, the use cases for Vertica/Hadoop integration to date lie in algorithmic trading and two kinds of web analytics. Specifically: Read more

Categories: Analytic technologies, Cloudera, Columnar database management, Data warehousing, Hadoop, Investment research and trading, MapReduce, Parallelization, Theory and architecture, VectorWise, Vertica Systems, Web analytics

5 Comments

August 4, 2009

VectorWise, Ingres, and MonetDB

I talked with Peter Boncz and Marcin Zukowski of VectorWise last Wednesday, but didn’t get around to writing about VectorWise immediately. Since then, VectorWise and its partner Ingres have gotten considerable coverage, especially from an enthusiastic Daniel Abadi. Basic facts that you may already know include:

VectorWise, the product, will be an open-source columnar analytic DBMS. (But that’s not quite true. Pending productization, it’s more accurate to call the VectorWise technology a row/column hybrid.)
VectorWise is due to be introduced in 2010. (Peter Boncz said that to me more clearly than I’ve seen in other coverage.)
VectorWise and Ingres have a deal in which Ingres will at least be the exclusive seller of the VectorWise technology, and hopefully will buy the whole company.
Notwithstanding that it was once named something like “MonetDB,” VectorWise actually is not the same thing as MonetDB, another open source columnar analytic DBMS from the same research group.
The MonetDB and VectorWise research groups consist in large part of academics in Holland, specifically at CWI (Centrum voor Wiskunde en Informatica). But Ingres has a research group working on the project too. (Right now there are about seven “highly experienced” people each on the VectorWise and Ingres sides, although at least the VectorWise folks aren’t all full-time. More are being added.)
Ingres and VectorWise haven’t agreed exactly how VectorWise and Ingres Classic will play together in the Ingres product line. (All of the obvious possibilities are still on the table.)
VectorWise is shared-everything, just as Ingres is. But plans — still tentative — are afoot to integrate VectorWise with MapReduce in Daniel Abadi’s HadoopDB project.

Categories: Actian and Ingres, Analytic technologies, Columnar database management, Data warehousing, Database compression, MonetDB, Open source, Theory and architecture, VectorWise

12 Comments

August 4, 2009

The Boston Globe had an article on VoltDB

The Boston Globe article has more detail than Vertica and VoltDB have ever OKed me to put out, and some business details they’ve never given me.

Categories: In-memory DBMS, Memory-centric data management, OLTP, Vertica Systems, VoltDB and H-Store

Teradata 13 focuses on advanced analytic performance

Last October I wrote about the Teradata 13 release of Teradata’s database management software. Teradata 13, which will be used across the various Teradata product lines, has now been announced for GCA (General Customer Availability)*. So far as I can tell, there were two main points of emphasis for Teradata 13:

Performance (of course, performance is a point of emphasis for almost any release of any analytic DBMS product), especially but not only in the areas of aggregates, ETL (Extract/Transform/Load), and UDFs.
UDFs (User Defined Functions), especially but not only in the areas of data mining and geospatial analysis.

To put it even more concisely, the focus of Teradata 13 is on advanced analytic performance, although there of course are some enhancements in simple query performance and in analytic functionality as well. Read more

Categories: Analytic technologies, Data types, Data warehouse appliances, Data warehousing, EAI, EII, ETL, ELT, ETLT, GIS and geospatial, Parallelization, SAS Institute, Teradata, Theory and architecture

6 Comments

July 30, 2009

“The Netezza price point”

Over the past couple of years, quite a few data warehouse appliance or DBMS vendors have talked to me directly in terms of “Netezza’s price point,” or some similar phrase. Some have indicated that they’re right around the Netezza price point, but think their products are superior to Netezza’s. Others have stressed the large gap between their price and Netezza’s. But one way or the other, “Netezza’s price” has been an industry metric.

One reason everybody talks about the “Netezza (list) price” is that it hasn’t been changing much, seemingly staying stable at $50-60K/terabyte for a long time. And thus Teradata’s 2550 and Oracle’s larger-disk Exadata configuration — both priced more or less in the same range — have clearly been price-competitive with Netezza since their respective introductions.

That just changed. Netezza is cutting its pricing to the $20K/terabyte range imminently, with further cuts to come. So where does that leave competitors?

The Teradata 1550 is in the Netezza price range (still a little below, actually).
Oracle basically has nothing price-competitive with Netezza.
Microsoft has stated it plans to introduce Madison below the old DATAllegro price points. Conceivably, that could be competitive with Netezza’s new pricing, although I haven’t checked how much it now costs simply to buy a lot of SQL Server licenses. (That cost presumably would be a lower bound on Madison’s price and, hardware aside, might be the whole price, since Microsoft likes to create large product bundles.)
XtremeData just launched in the new Netezza price range.
Troubled Dataupia is hard to judge. While on the surface Dataupia’s prices sound very low, you can’t use a Dataupia box unless you also have a brand-name DBMS (license and hardware) alongside it. That obviously affects total cost significantly.
Kickfire seems unaffected, as it doesn’t and most likely won’t compete with Netezza (different database size ranges).
For the most part, software-only vendors are free to adapt or not as they choose. Hardware prices generally don’t need to be over $10K/terabyte, and in some cases could be a lot less. So the question is how far they’re willing to discount their software.

Categories: Analytic technologies, Data warehouse appliances, Data warehousing, Dataupia, Exadata, Kickfire, Oracle, Pricing, Teradata, XtremeData

14 Comments

July 30, 2009

Netezza’s worldwide show-and-tell

In this economy, conference attendance is way down. Accordingly, a number of vendors have reevaluated whether it makes sense to have a traditional big-bang user conference, or whether it might make more sense to do a tour, bringing their message to multiple geographical areas. Netezza has opted for the latter course, something I’ve been well aware of for two reasons:

Planning for the conferences and for Netezza’s product roll-out is of course coordinated, and product roll-out is something I advise my clients on.
Netezza engaged me to speak at six different versions of the event (i.e., America and Europe, but not the Far East). There’s still time to contribute suggestions about my talk here.

Apparently, I’ll be talking late morning each time. My dates are:

September 2, Boston
September 9, Washington, DC
September 15, Milan
September 17, London
September 24, San Francisco
September 29, Chicago

The brand name of the events is Enzee Universe. Locations, registration information, and other particulars may be found on the Enzee Universe website.

Categories: Analytic technologies, Data warehouse appliances, Data warehousing, Netezza, Presentations

2 Comments

July 30, 2009

Netezza is changing its hardware architecture and slashing prices accordingly

Netezza is about to make its biggest product announcement in years. In particular:

Netezza is cutting prices to under $20K/terabyte of user data, with even lower numbers promised for the near future.
Netezza is replacing its PowerPC chips with Intel-based IBM blades.
There will be substantial changes in how data flows between the various parts of a Netezza node.
Netezza claims this will all produce an immediate 10-15X increase in price-performance, based on a 3X cut in price/terabyte and a 3-5X improvement in mixed workload performance. (Edit: Netezza now agrees that it shouldn’t have phrased things that way.)
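
For what it’s worth, the arithmetic behind that last claim is easy to check. The sketch below assumes, as the claim seems to, that price-performance is just the price-cut factor multiplied by the performance-gain factor. The function name is my own and purely illustrative.

```python
def price_performance_gain(price_cut_factor, performance_gain_factor):
    # Assumes the two factors are independent and simply multiply.
    return price_cut_factor * performance_gain_factor


# Figures quoted above: a 3X price cut and a 3-5X performance improvement.
low = price_performance_gain(3, 3)
high = price_performance_gain(3, 5)
print(low, high)
```

Multiplying the quoted factors gives a range of 9-15X, which is close to, though at the low end slightly under, the 10-15X figure.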

Allow me to explain. Read more

Categories: Analytic technologies, Data warehouse appliances, Data warehousing, Netezza, Pricing, Theory and architecture

35 Comments


Monash Research blogs

DBMS 2 covers database management, analytics, and related technologies.
Text Technologies covers text mining, search, and social software.
Strategic Messaging analyzes marketing and messaging strategy.
The Monash Report examines technology and public policy issues.
Software Memories recounts the history of the software industry.

User consulting

Building a short list? Refining your strategic plan? We can help.

Vendor advisory

We tell vendors what's happening -- and, more important, what they should do about it.

Monash Research highlights

Learn about white papers, webcasts, and blog highlights, by RSS or email.
