Netezza on concurrency and workload management
I visited Netezza Friday for what was mainly an NDA meeting. But while I was there I asked where Netezza stood on concurrency, workload management, and rapid data mart spin-out. Netezza’s claims in those regards turned out to be surprisingly strong.
In the biggest surprise, Netezza claimed at least one customer had >5,000 simultaneous users, and a second had >4,000. Both are household names. Other unspecified Netezza customers apparently also have >1,000 simultaneous users. Read more
Categories: Data warehouse appliances, Data warehousing, Netezza, Teradata, Theory and architecture, Workload management | 13 Comments |
Vertica customer notes
Dave Menninger of Vertica called to discuss NDA product futures, as vendors tend to do in the weeks before a TDWI conference. So we also talked a bit about the Vertica customer base, which is listed as 86 at the end of Q2, up from 74 in Q1. That’s notably slower growth than in Q1, which Dave didn’t fully explain. But then, off the top of his head, he recalled the Q1 figure as being lower than that 74, so maybe there’s a reporting glitch in the loop somewhere.
Vertica’s two biggest customer segments are telecommunications and financial services, and Dave drew an interesting distinction between what the two groups care about. Telecom companies care about data warehouses that are big and 24/7 reliable, but don’t do particularly complex analytics. Financial services — by which he presumably means mainly proprietary traders — are most focused on complex and competitively innovative analytics.
Also mentioned in various contexts were web-based outfits such as data mart outsourcers, social networkers, and open-source software providers.
Vertica also offers customer win stories in other segments, but most actual discussion about what Vertica does revolves around the application areas mentioned above, just as it has been in the past.
Similar (not necessarily identical) generalizations would be true of many other analytic DBMS vendors.
Update on Microsoft’s Madison and Fast Track data warehouse products
I chatted with Stuart Frost of Microsoft yesterday. Stuart is and remains GM of Microsoft’s data warehouse product unit, which covers roughly $1 billion of revenue. While rumors of Stuart’s departure from Microsoft are clearly exaggerated, it does seem that his role is more one of coordination than actual management.
Microsoft Madison availability remains scheduled for H1 2010. Nothing new there. Tangible progress includes a few customer commitments of various sorts, including one outright planned purchase (driven by internal customer considerations around using up a budget). At the moment, various Microsoft Madison technology “previews” are going on, which seem to amount to proofs of concept that:
- Start with actual customer data (some from Microsoft, some from outside)
- Generate larger synthesized data sets based on those (database sizes seem to be 10-100 TB; see the sketch after this list)
- Run in Microsoft data centers or “technology centers”, rather than on customer premises.
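For what it’s worth, the scale-up step in the second bullet can be illustrated with a minimal sketch: bootstrap-resampling real rows with a bit of jitter. This is my own illustration of the general idea, with made-up field names; it is not Microsoft’s actual data-generation tooling.

```python
# Hypothetical illustration only -- not Microsoft's tooling. The general idea:
# grow a seed dataset by resampling real rows and perturbing numeric fields.
import random

def synthesize(seed_rows, scale_factor, jitter=0.05):
    """Return roughly scale_factor times as many rows as seed_rows."""
    out = []
    for _ in range(int(len(seed_rows) * scale_factor)):
        base = random.choice(seed_rows)                      # resample a real row
        noisy = base["amount"] * (1 + random.uniform(-jitter, jitter))
        out.append({**base, "amount": round(noisy, 2)})      # keep shape, vary values
    return out

seed = [{"customer": "c1", "amount": 100.0}, {"customer": "c2", "amount": 250.0}]
print(len(synthesize(seed, scale_factor=10)))                # 20 rows from 2 seeds
```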
The basic Microsoft Madison product distribution strategy seems to be: Read more
Groovy Corp
Groovy Corp sent over a press release and apparently suggested I write about the company’s wonderfulness immediately. This was without any kind of briefing. I don’t do that kind of thing.
However, a Twitter check revealed that Tony Bain is familiar with Groovy Corp and the Groovy SQL Switch (apparently they started out in Australia, where he lives and works, and he evidently knows the guys). Tony’s take, in summary, is (emphasis mine):
- They are an in memory RDBMS
- They have worked with Intel to architect from the ground up for large multi processor concurrency
- Initially they are launching as a multi-core appliance
- They claim 200,000 sql operations per second from a single box
- They are proprietary (not built on MySQL or any other open source database) which means they have had a lot of control around their architecture
- They are a pretty cool company with some interesting people
There’s a little more detail at the above link.
Categories: DBMS product categories, Groovy Corporation, In-memory DBMS, Memory-centric data management, OLTP | 3 Comments |
Oracle cites Exadata wins
A couple of weeks ago, Oracle put out a press release about Exadata wins. Highlights include:
- 20 names of actual customers.
- One quote citing a competitive win (over Netezza)
- One quote citing a ~50X speedup of one query “without manual tuning”
- One quote citing consistent 10-72X query performance speedups
- One quote citing a speedup from “days” to “minutes”
Unless I missed it, none of the quotes implied Exadata was actually in production, and none compared hardware between the old/slow/production and Exadata/fast/test systems.
Categories: Data warehouse appliances, Data warehousing, Exadata, Market share and customer counts, Netezza, Oracle | Leave a Comment |
While I’m venting about benchmarks
Late last year, Vertica made hoo-hah about what it called a world-record data warehouse load speed benchmark. I wrote at the time that this showed Vertica wasn’t painfully slow at loading, always a concern with column stores. But otherwise I mocked the idea that there was something useful to be learned from the whole exercise.
Well, guess what? In a throwaway line in a comment on Daniel Abadi’s blog, Barry Zane of ParAccel pointed out
we posted a load rate of almost 9TB/hour, which is, of course record breaking on its own
Quite right.
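For context, a quick back-of-envelope calculation shows why a headline TB/hour figure says little on its own. The 9 TB/hour number is Barry’s; the node count below is a purely hypothetical assumption, since no configuration is specified here.

```python
# Back-of-envelope only. 9 TB/hour is the figure from Barry Zane's comment;
# the node count is a hypothetical assumption for illustration.
TB_PER_HOUR = 9
HYPOTHETICAL_NODES = 40

gb_per_second = TB_PER_HOUR * 1000 / 3600                    # cluster-wide rate
mb_per_second_per_node = gb_per_second * 1000 / HYPOTHETICAL_NODES

print(f"Cluster-wide: {gb_per_second:.1f} GB/s")             # ~2.5 GB/s
print(f"Per node ({HYPOTHETICAL_NODES} assumed): {mb_per_second_per_node:.0f} MB/s")  # ~62 MB/s
# Tens of MB/s per node is modest next to commodity disk and network bandwidth,
# which is one reason raw load-rate records don't prove much by themselves.
```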
I hope the nonsense stops there, but I’m not optimistic …
Categories: Benchmarks and POCs, Columnar database management, Data integration and middleware, EAI, EII, ETL, ELT, ETLT, Vertica Systems | Leave a Comment |
Progress in figuring out what ParAccel is doing
Barry Zane of ParAccel has — finally! — started a blog. Barry’s first post, probably in connection with ParAccel’s recent TPC-H submission and the subsequent brouhaha, consisted mainly of metaphor + very elementary and well-known arguments for column stores. Barry’s second post, however, was in direct response to Daniel Abadi’s speculation about ParAccel’s architecture. That post also promises a follow-up addressing the TPC-H in a more substantive way.
(Edit: As of October, 2010, those links have been redirected away from the original posts, which seem to have been taken down.)
Barry’s points include:
- ParAccel never used the row-oriented Postgres execution engine. This is contrary to Daniel’s speculation.
- ParAccel previously used an adaptation of the Postgres cost-based optimizer, but has now written a new one from scratch.
- ParAccel has designed its optimizer to handle lots and lots of joins. One reason Barry offers is that ParAccel wants to run customers’ old schemas unaltered, whether or not those are really optimal for the ParAccel DBMS. That approach is somewhat in contrast to Vertica, which originally focused entirely on star schemas. And it goes well with ParAccel’s interest in appealing to customers who at least think they want to run ParAccel in Oracle or SQL Server emulation mode. (The sketch below illustrates the star-schema vs. many-join contrast.)
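To make that schema point concrete, here is a minimal sketch of the contrast, using SQLite from Python purely for illustration; the tables are hypothetical and have nothing to do with any actual ParAccel or Vertica customer. The same report needs only two joins against a star schema, but a longer chain against a more normalized schema run “unaltered.”

```python
# Hypothetical schemas, illustrating star vs. normalized ("run it unaltered") layouts.
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Star schema: one fact table joined directly to a handful of dimensions.
cur.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_store   (store_id   INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE fact_sales  (product_id INTEGER, store_id INTEGER, amount REAL);
INSERT INTO dim_product VALUES (1, 'widget', 'hardware');
INSERT INTO dim_store   VALUES (1, 'east');
INSERT INTO fact_sales  VALUES (1, 1, 9.99);
""")

# Two joins suffice for a category-by-region report.
star_sql = """
SELECT p.category, s.region, SUM(f.amount)
FROM fact_sales f
JOIN dim_product p USING (product_id)
JOIN dim_store   s USING (store_id)
GROUP BY p.category, s.region;
"""
print(cur.execute(star_sql).fetchall())

# A normalized schema snowflakes the product hierarchy, so the same report
# requires a longer join chain for the optimizer to plan.
cur.executescript("""
CREATE TABLE category    (category_id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE subcategory (subcategory_id INTEGER PRIMARY KEY, category_id INTEGER);
CREATE TABLE product     (product_id INTEGER PRIMARY KEY, subcategory_id INTEGER);
INSERT INTO category    VALUES (10, 'hardware');
INSERT INTO subcategory VALUES (100, 10);
INSERT INTO product     VALUES (1, 100);
""")
snowflake_sql = """
SELECT c.category, s.region, SUM(f.amount)
FROM fact_sales f
JOIN product     pr ON pr.product_id = f.product_id
JOIN subcategory sc ON sc.subcategory_id = pr.subcategory_id
JOIN category    c  ON c.category_id = sc.category_id
JOIN dim_store   s  ON s.store_id = f.store_id
GROUP BY c.category, s.region;
"""
print(cur.execute(snowflake_sql).fetchall())   # same answer, more joins
```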
Also in the post, Barry:
- Makes an extremely silly marketing exaggeration by referring to “the only other vendor that was able to run the 30TB TPC-H” (emphasis mine).
- Makes the more excusable marketing exaggeration “Publishing the benchmark with unmatched performance is simply one way to demonstrate robustness and flexibility. Nothing more, nothing less.”
- Makes the very clear marketing claim “For customers, the real test will be their own bake-offs, where our performance has never been beaten.” (Emphasis mine.) That last one directly contradicts what I’ve been told by at least two ParAccel competitors, so I’ll be curious to see what they come up with to substantiate their version of the story.
Anyhow, it’s great to see ParAccel retreating from its obsessive secrecy, which in my opinion has been even worse than Netezza’s used to be.
Categories: Columnar database management, Data warehousing, ParAccel | 2 Comments |
Infobright metrics
Merv Adrian posted about Infobright, and included some company-supplied metrics. Most looked familiar from a post I did in April, but Infobright’s latest figure for # of paying customers seems to be “>60”, up from “>50”. Pricing aside, that’s Vertica/Greenplum territory — behind Netezza, Teradata, and the big OLTP DBMS vendors, but ahead of everybody else I think of as a modern analytic DBMS vendor.
Hasso Plattner calls for in-memory OLTP column stores
Former SAP CEO Hasso Plattner has written a paper called A Common Database Approach for OLTP and OLAP Using an In-Memory Column Database, in association with a SIGMOD keynote address.* The approach Plattner advocates is an MPP in-memory column store, presumably somewhat akin to SAP’s frequently renamed Business Warehouse Accelerator/Business Intelligence Accelerator/BWA/BIA/Son-of-TREX technology. There also are strong similarities to the MPP in-memory row store project H-Store/VoltDB, although I don’t know whether Plattner would go so far as to adopt the H-Store view that all transactions should run in stored procedures. Unsurprisingly, SAP applications are used as the OLTP paradigm throughout.
*Thanks to Dave Kellogg for tipping me off to Plattner’s paper. I only went to two SIGMOD sessions, neither of which was Plattner’s. Nobody actually mentioned Plattner’s talk to me when I was down at SIGMOD.
Perhaps the most interesting part is Plattner’s claim that what’s demanding about OLTP isn’t database updating per se, but rather maintaining aggregates for quick-response analytics. In his main example of that point, Plattner cites a real-life schema of more than 18 tables, of which 2 are base tables and (most of?) the rest are materialized views that his proposed database architecture dispenses with (because analytic performance is sufficiently good without them). Thus, Plattner’s core columnar argument seemingly is
columnar –> natively fast analytics –> no need to maintain aggregates –> much lower update burden.
That said — if Plattner’s paper contained a clear statement of how much more expensive it is to insert or update a single row in a columnar vs. row-based system, I overlooked it. Instead, Plattner seems to be arguing that the volume of base-table updates is low enough that — whatever it may be — column-store update overhead is an acceptable price to pay. (At one point he claims that only 5% of the data inserted in a financial application ever gets changed.) That may actually be true in a financial accounting system, but seems more questionable in a sufficiently large application that gets its updates from automatic devices, or from the consumer web.
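As a toy illustration of that chain (plain Python lists standing in for a real engine; nothing here is taken from Plattner’s paper), the columnar layout pays one append per attribute on insert, but an aggregate over a single column is a cheap scan, so the separately maintained total can simply be dropped.

```python
# Toy illustration, not Plattner's actual design: row layout plus a maintained
# aggregate vs. column layout with aggregates computed on the fly.
from collections import defaultdict

# Row layout plus a maintained aggregate (the "materialized view" stand-in).
rows = []
totals_by_account = defaultdict(float)           # must be updated on every insert

def insert_row_store(account, amount, memo):
    rows.append({"account": account, "amount": amount, "memo": memo})
    totals_by_account[account] += amount         # extra write to keep analytics fast

# Column layout: each attribute is its own array; no maintained aggregate.
accounts, amounts, memos = [], [], []

def insert_column_store(account, amount, memo):
    # One append per column -- the per-row update overhead the argument accepts.
    accounts.append(account)
    amounts.append(amount)
    memos.append(memo)

def total_for(account):
    # Aggregate computed on the fly by scanning just the two relevant columns.
    return sum(a for acct, a in zip(accounts, amounts) if acct == account)

for i in range(1000):
    insert_row_store("acct-%d" % (i % 10), 1.0, "txn")
    insert_column_store("acct-%d" % (i % 10), 1.0, "txn")

print(totals_by_account["acct-3"], total_for("acct-3"))      # both print 100.0
```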
Other highlights include: Read more