July 18, 2009

Netezza on concurrency and workload management

I visited Netezza Friday for what was mainly an NDA meeting. But while I was there I asked where Netezza stood on concurrency, workload management, and rapid data mart spin-out. Netezza’s claims in those regards turned out to be surprisingly strong.

In the biggest surprise, Netezza claimed at least one customer had >5,000 simultaneous users, and a second had >4,000. Both are household names. Other unspecified Netezza customers apparently also have >1,000 simultaneous users. (Perhaps one is Ross Stores, given how long ago it was said to be in the many 100s, but I didn’t think to ask.) I did not probe as to how demanding a typical user was, so these numbers may not really indicate what they appear to, but anyhow they’re vastly bigger than what I’ve heard from any analytic DBMS vendor newer than Netezza.

On the data mart spin-out side, another household-name Netezza customer has been rapidly spinning out virtual data marts, in a manner somewhat akin to the virtual data mart/“analytics-as-a-service” strategy eBay has pursued since 2004.* However, the whole thing isn’t necessarily as slick as what eBay has going. This Netezza customer’s virtual data marts are more like trials; the ones that prove really useful eventually get instantiated physically on separate Netezza equipment.

*Actually, it’s not just eBay. Teradata told me earlier this week that a large fraction of its high-end customers spin out virtual data marts.

Both of these factoids lead naturally to questions along the lines of “Oh really? Well, what have you got in workload management?” It turns out that Netezza has 3 layers of workload management:

  1. Things (Queries? Workloads? Users?) can be labeled as high/medium/low priority
  2. Beyond their priority level, things can get guaranteed resource allocation – i.e., an assured minimum share of disk (for temp space?), CPU, etc.
  3. Netezza software has a “short query bias” — i.e., shorter queries generally get higher priority.
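To make the three layers concrete, here is a minimal sketch of how priority levels, guaranteed minimum shares and a short-query bias could fit together. It is not Netezza’s implementation, which the post does not describe. The priority weights (4/2/1), the 2-second “short query” cutoff, and the names Group, Query, allocate_shares and next_query are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass

# Hypothetical weights: the post does not say how Netezza maps
# high/medium/low priority onto resources.
PRIORITY_WEIGHT = {"high": 4, "medium": 2, "low": 1}
SHORT_QUERY_SECONDS = 2.0  # hypothetical cutoff for a "short" query


@dataclass
class Group:
    name: str
    priority: str     # "high", "medium" or "low"
    min_share: float  # guaranteed minimum fraction of CPU (0.0 to 1.0)


@dataclass
class Query:
    qid: int
    priority: str
    est_seconds: float  # estimated run time, e.g. from the optimizer


def allocate_shares(groups):
    """Layers 1 and 2: give each group its guaranteed minimum, then split
    whatever is left over in proportion to priority weight."""
    reserved = sum(g.min_share for g in groups)
    if reserved > 1.0:
        raise ValueError("guaranteed minimums exceed 100% of the system")
    spare = 1.0 - reserved
    total_weight = sum(PRIORITY_WEIGHT[g.priority] for g in groups)
    return {
        g.name: g.min_share + spare * PRIORITY_WEIGHT[g.priority] / total_weight
        for g in groups
    }


def next_query(queue):
    """Layer 3: short-query bias. Queries estimated to be short jump ahead
    of long ones; ties are broken by priority, then by estimated cost."""
    def sort_key(q):
        is_long = q.est_seconds > SHORT_QUERY_SECONDS
        return (is_long, -PRIORITY_WEIGHT[q.priority], q.est_seconds, q.qid)
    return min(queue, key=sort_key)


if __name__ == "__main__":
    groups = [Group("dashboards", "high", 0.2),
              Group("etl", "medium", 0.1),
              Group("adhoc", "low", 0.0)]
    print(allocate_shares(groups))
    queue = [Query(1, "high", 30.0), Query(2, "low", 0.5), Query(3, "medium", 1.0)]
    print(next_query(queue).qid)
```

With these made-up numbers the three groups end up with roughly 60%, 30% and 10% of CPU, and query 3 runs next even though query 1 has higher priority, because query 1 is estimated to be long. Treating “short” as an absolute jump ahead is only one way to read “bias”; a real scheduler might merely boost short queries by a priority level.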

Netezza further says that it’s working to enhance its workload management tools significantly.

Off the top of my head, I don’t recall how much workload management Teradata includes in its non-55xx products, which are the ones — especially the Teradata 2550 — Teradata positions as comparable to Netezza’s 10-xxx series.

Comments

13 Responses to “Netezza on concurrency and workload management”

  1. Greg Rahn on July 18th, 2009 10:56 am

    I think it is worth clarifying that concurrent users is not the same as concurrent executing queries. Supporting thousands of connections to a database is not nearly as impressive as supporting thousands of executing, in-flight, queries.

  2. Curt Monash on July 18th, 2009 1:31 pm

    Absolutely true.

    On the other hand, it’s pretty hard to get live-customer metrics on concurrent queries, and it’s also hard to think of a lot of use cases where that capability is needed. What’s more, the slower a system the more queries it may have to do at once. 😉
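The point that a slower system has to juggle more queries at once is Little’s law: the average number of queries in flight equals the arrival rate times the average response time. The sketch below assumes a hypothetical population of 6,000 users who each submit one query a minute; these figures are illustrative, not Netezza customer data.

```python
def arrival_rate(users, seconds_between_queries):
    """Queries per second generated by `users` people who each submit one
    query every `seconds_between_queries` seconds (an assumed usage pattern)."""
    return users / seconds_between_queries


def avg_queries_in_flight(arrivals_per_sec, avg_response_sec):
    """Little's law, L = lambda * W: the long-run average number of queries
    in the system equals arrival rate times average time spent there."""
    return arrivals_per_sec * avg_response_sec


if __name__ == "__main__":
    lam = arrival_rate(6000, 60.0)  # hypothetical: 6,000 users, one query a minute each
    for w in (0.5, 5.0):            # hypothetical average response times in seconds
        print(w, avg_queries_in_flight(lam, w))
```

The same user population yields 50 concurrent queries at a 0.5-second average response time but 500 at 5 seconds, so a high count of in-flight queries can reflect slowness as much as capacity.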

  3. Greg Rahn on July 18th, 2009 5:57 pm

    These high concurrent user/login numbers are actually a bit disturbing. OLTP database folks figured this out years ago by leveraging connection pooling. Why would there be so many concurrent users/logins on a DW system? Is this the result of some very bad custom programming with no middle tier connection multiplexing? I find it difficult to believe that any enterprise BI tool would require a 1:1 ratio of users to connections.
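Connection pooling, as Greg describes it, multiplexes many users onto a small, fixed number of database connections. A minimal sketch of the idea follows, assuming Python threads stand in for users and a sleep stands in for a database call; ConnectionPool and simulate are hypothetical names, not the API of any BI tool or driver.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class ConnectionPool:
    """Hypothetical pool: at most `size` callers hold a connection at once;
    everyone else waits until one is released."""

    def __init__(self, size):
        self._slots = threading.BoundedSemaphore(size)
        self._lock = threading.Lock()
        self.in_use = 0
        self.peak = 0  # highest number of connections ever in use at once

    def run(self, query_fn):
        with self._slots:  # blocks until a connection is free
            with self._lock:
                self.in_use += 1
                self.peak = max(self.peak, self.in_use)
            try:
                return query_fn()
            finally:
                with self._lock:
                    self.in_use -= 1


def simulate(users, pool_size, query_seconds=0.01):
    """Let `users` concurrent callers each run one fake query through the pool
    and report the peak number of connections actually used."""
    pool = ConnectionPool(pool_size)

    def fake_query():
        time.sleep(query_seconds)  # stands in for a real database round trip

    with ThreadPoolExecutor(max_workers=users) as ex:
        list(ex.map(lambda _: pool.run(fake_query), range(users)))
    return pool.peak


if __name__ == "__main__":
    print(simulate(users=200, pool_size=10))
```

Two hundred simultaneous callers never need more than 10 connections here; the cost is queueing time in the middle tier rather than extra sessions on the warehouse.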

  4. Curt Monash on July 19th, 2009 12:03 pm

    Greg,

    I’m sorry. Who said anything about concurrent log-ins and the like? I’m referring to the number of human beings who have access to analytics against the same database at the same time.

  5. Greg Rahn on July 19th, 2009 8:46 pm

    @Curt

    Based on your most recent comment, it seems I was misled by your (or Netezza’s) wording of “simultaneous users”. Given your clarification, “simultaneous users” is probably not the appropriate terminology. I would say “named users” would be much more appropriate, as “simultaneous users” implies users are simultaneously doing something; hence my comments. As you mentioned, what is meant is how many users have access to the platform, not how many users are actively using it at any one point in time.

  6. Curt Monash on July 19th, 2009 10:43 pm

    Greg,

    One of the users they gave me was a US-only company. So I think pretty much all the named users are probably at their desks at the same time. 🙂

    But yes, we’re talking about named users.

  7. Shawn Fox on July 20th, 2009 3:28 pm

    In my opinion a question of ‘how many simultaneous queries are running’ is a silly question anyway. As Curt implied in his reply to his own post, a large query backlog just indicates the system is not performing fast enough to keep up with the query workload.

    The number of “concurrent users” which can be supported is a matter of how many queries can be executed in a certain time period, not how many queries can be executed at the same time.

    A properly designed MPP database should be able to give close to 100% of the system resources to a query if it is running by itself, 50% each if 2 queries are running, etc. A workload management system has to exist primarily to keep the big analytic queries from getting in the way of the small/fast queries. In the end the system should have enough performance so that very little query backlog ever exists.
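Shawn’s “100% for one query, 50% each for two” rule is an idealized processor-sharing model. The sketch below, which assumes every query arrives at time 0 and capacity divides perfectly, compares it with strict first-in-first-out execution to show why sharing (or a workload manager) keeps a big analytic query from blocking a small one. The function names and work sizes are hypothetical.

```python
def processor_sharing(work):
    """Completion times when all queries start at t=0 and the system splits
    its capacity equally among whatever is still running."""
    order = sorted(range(len(work)), key=lambda i: work[i])
    finish = [0.0] * len(work)
    t = 0.0
    done = 0.0  # work completed so far by each still-running query
    remaining = len(work)
    for i in order:
        # Each of the `remaining` queries advances at rate 1/remaining.
        t += (work[i] - done) * remaining
        done = work[i]
        finish[i] = t
        remaining -= 1
    return finish


def fifo(work):
    """Completion times when queries run one at a time in arrival order."""
    t, finish = 0.0, []
    for w in work:
        t += w
        finish.append(t)
    return finish


if __name__ == "__main__":
    jobs = [100.0, 1.0]  # a big analytic query arrives just ahead of a small one
    print("sharing:", processor_sharing(jobs))
    print("fifo:   ", fifo(jobs))
```

Under sharing the small query finishes at 2 time units instead of 101, while the big one finishes at 101 either way; a lone query still gets the whole machine.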

    Claims of being able to run 100s of queries at the same time are a straw man because they miss the point entirely. I don’t care if my data warehouse only runs 1 query at a time. As long as it manages to complete all of the queries fast enough to satisfy all of the SLAs everyone will be happy.

  8. Shawn Fox on July 20th, 2009 3:37 pm

    The Netezza customer which has the very high number of concurrent users is Corporate Express. I remembered a press release about it and looked it up.

    http://www.netezza.com/releases/2007/release073107.htm

  9. Curt Monash on July 20th, 2009 5:18 pm

    Corporate Express was neither of the two names I got, unless there’s a subsidiary-of relationship that would surprise me.

  10. Sean on July 22nd, 2009 11:35 pm

    Curt,

    You categorized Analytics technology into the following groups. What is your reasoning behind that?
    Analytics technology
    * Business intelligence
    * Data mart outsourcing
    * Data warehousing
    * MOLAP

    Shouldn’t BI be the umbrella name? I feel you need to provide some explanation to avoid misunderstanding, i.e.:

    Business intelligence
    * Analytics
    * Data mart outsourcing
    * Data warehousing
    * MOLAP

  11. Curt Monash on July 23rd, 2009 1:18 am

    Sean,

    I don’t think database management is a subset of “business intelligence”. And while you’re probably not alone in disagreeing with me, I think you’re in a small minority. So I’m not terribly concerned about confusion on that score.

    Anyhow, blog category names are hardly meant to comprise a precise or complete industry taxonomy.

  12. A partial overview of Netezza database software technology | DBMS2 -- DataBase Management System Services on August 18th, 2010 4:42 am

    […] with its workload management capabilities for queries, but nonetheless keeps adding features. Workload management has not yet been extended to cover all the non-query parts of the analytic […]

  13. Some thoughts on the announcement that IBM is buying Netezza | DBMS 2 : DataBase Management System Services on September 20th, 2010 4:40 pm

    […] Enterprise data warehouse (EDW) for medium-sized enterprises. (E.g. — I think — Ross Stores.) […]
