June 14, 2010

Best practices for analytic DBMS POCs

When you are selecting an analytic DBMS or appliance, most of the evaluation boils down to two questions:

And so, in undertaking such a selection, you need to start by addressing three issues:

Key elements of cost* include:

*Assuming a classical in-house IT shop, where products are typically bought rather than leased/rented. With outsourced and/or monthly-fee structures, the details change but the principles remain the same.

Most of that can be evaluated pretty well via a spreadsheet, although things can get a bit tricky when you get to people costs, which are a large fraction of the whole. In particular, different analytic DBMS product suites have great, high-performance support for different (and often rapidly growing) sets of functionality – basic and advanced SQL, statistics, and more. Figuring out which ones will be best for your programmers, and how significant the differences are — well, that’s a lot like any other programming language evaluation, and those are rarely neat or clean-cut.
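To make the spreadsheet part of that concrete, here’s a minimal cost roll-up sketch in Python. Every category and figure below is a made-up placeholder for illustration, not anybody’s real pricing; the point is just that once you list the cost elements, the arithmetic is simple, and the people-related lines are a large share of the total.

```python
# Minimal, illustrative total-cost roll-up for an analytic DBMS purchase.
# All figures are made-up placeholders, not real vendor pricing.

YEARS = 3  # evaluation horizon

costs = {
    "hardware": 400_000,                     # servers, storage, networking (one-time)
    "software_license": 600_000,             # up-front license (one-time)
    "annual_maintenance": 120_000 * YEARS,   # support/maintenance over the period
    "dba_and_admin": 1.5 * 150_000 * YEARS,  # 1.5 FTEs, fully loaded, over the period
    "developer_ramp_up": 100_000,            # retraining / rewriting existing workloads
}

total = sum(costs.values())

# Print the line items largest-first, with each item's share of the total.
for item, amount in sorted(costs.items(), key=lambda kv: -kv[1]):
    print(f"{item:>20}: ${amount:>12,.0f}  ({amount / total:5.1%} of total)")
print(f"{'TOTAL':>20}: ${total:>12,.0f} over {YEARS} years")
```

Even in a toy roll-up like this, the hard-to-quantify people lines are big enough that the programmer-productivity questions deserve real attention.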

But when it comes to evaluating speed, there’s no substitute for a well-designed proof of concept (POC). Many analytic DBMS and appliance vendors are happy to let you do a POC, on your own premises (or remotely if you prefer), under your control, at no cost to you. And that’s great. It is crucial that a POC be run either by you, by a consultant* answerable to you, or – if you decide the vendor must run it for you – at least with you watching every step of the way and knowing exactly what is being done. Appliance vendors do find it cheaper to run POCs on their own premises, so a certain reluctance to ship you a box is understandable. But make no compromises about the transparency of a POC, or about your control of exactly what it is that gets tested.

*Since I sell consulting services for users evaluating analytic DBMS, I naturally am biased to think that consultants can be very useful in the process. 🙂 But whether you should use them a little (sanity check), a medium amount (work with you through the process), or heavily (actually drive the process for you and/or execute the POCs) is very dependent upon your specific situation.

So far as I’ve been able to tell:

Most of the criticisms I’ve heard of vendors’ POC practices have been directed at Oracle or ParAccel.

For most POCs, it’s a good conceptual template to form and then test a hypothesis to the effect of:

Sometimes absolute throughput and price/performance are important secondary considerations; sometimes they’re less germane. But either way, it’s almost always right to focus primarily on the questions of “What do I want this system to do?” and “What do I think we’re going to have to invest in it?” By way of contrast, it’s often misleading to focus too much on questions like “What’s the one number that best describes the performance of this system?” (even if you customize that calculation for your environment) or, even worse, “How much speed-up can I get on my single worst Query from Hell?”

The fundamental rule of POC construction is: Model your entire use case as best you can. That means you need to consider, at a minimum:
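To make “model your entire use case” a little more concrete, here’s a minimal harness sketch in Python. The query mix, concurrency level, and SQLite stand-in database are all assumptions for illustration only; in a real POC you would point something like this at the candidate system, loaded with production-scale data, and replay your own workload, ugly queries included, not just the flattering ones.

```python
# Minimal POC harness sketch: replay a representative query mix at a
# chosen concurrency level and record per-query wall-clock timings.
# SQLite is used here only as a runnable stand-in for the candidate DBMS.
import sqlite3
import time
from concurrent.futures import ThreadPoolExecutor

CONCURRENCY = 4  # placeholder: match your expected number of simultaneous users

# Placeholder query mix; in a real POC this would be your own workload.
QUERY_MIX = [
    "SELECT COUNT(*) FROM sales",
    "SELECT region, SUM(amount) FROM sales GROUP BY region",
    "SELECT region, AVG(amount) FROM sales GROUP BY region ORDER BY 2 DESC",
] * 5  # repeat to simulate a stream of work

def setup_demo_db(path="poc_demo.db"):
    """Create a tiny stand-in dataset so the sketch runs end to end."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS sales (region TEXT, amount REAL)")
    conn.execute("DELETE FROM sales")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("east", 10.0), ("west", 20.0), ("north", 5.0)] * 1000,
    )
    conn.commit()
    conn.close()
    return path

def run_query(db_path, sql):
    """Run one query on its own connection; return (sql, elapsed_seconds)."""
    conn = sqlite3.connect(db_path)
    start = time.perf_counter()
    conn.execute(sql).fetchall()
    elapsed = time.perf_counter() - start
    conn.close()
    return sql, elapsed

if __name__ == "__main__":
    db_path = setup_demo_db()
    with ThreadPoolExecutor(max_workers=CONCURRENCY) as pool:
        results = list(pool.map(lambda q: run_query(db_path, q), QUERY_MIX))
    for sql, elapsed in results:
        print(f"{elapsed:8.4f}s  {sql}")
    print(f"Total queries: {len(results)}, "
          f"worst case: {max(e for _, e in results):.4f}s")
```

The specific code matters much less than the shape of the test: your queries, your data, your concurrency, measured end to end.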

Of course, that’s not as easy as it sounds. Presumably, the main reason you’re getting a new analytic DBMS is that you want to do new kinds of analysis. By the very nature of analytics, you won’t know what analytic operations are most useful until you try them out and see what their results are. On the other hand – if you haven’t done considerable thinking about how you’re going to use your new analytic database, how did you ever get funding for the project in the first place? 😉

Seriously, I could write multiple posts, each as long as this one (but more application-oriented), about how to upgrade your analytic capabilities (and which fool’s gold to avoid). But this has gotten pretty long already, so for now I’ll just stop here.

Note: My clients at Netezza asked me to write something short about POCs they could use as a kind of foreword to some collateral, where by “short” they meant single-paragraph or something like that. They’re great clients, so I said yes, under the condition I could also use it as a blog post. Except … this post didn’t turn out to be nearly as short as they envisioned. Oops. 🙂

Comments

7 Responses to “Best practices for analytic DBMS POCs”

  1. Ramakrishna Vedantam on June 16th, 2010 10:08 am

    Will POCs be a fair evaluation if performance parameters are tweaked for better performance only for selected queries? We do not see wide TPC evaluation participation nowadays.

  2. Randy Lea on June 16th, 2010 1:21 pm

    You offer some great advice in your POC best practices. Your “Model your entire use case as best you can” advice is spot on, but it is also where many customers fall short in their evaluations. This isn’t easy, as you say, but without dedicating the time and effort to do so, POCs typically don’t produce results meaningful enough to support a good decision. At Teradata we review the pros and cons of customer on-site versus vendor-location benchmarks and let the customer decide; currently we’re doing more than half of our POCs on-site at customer locations. Due to the challenges of doing effective POCs, and the tricks some vendors play, we also encourage customers to get real-world customer references. POCs and customer references are two key areas in which customers should challenge their vendors, as hesitation by the vendor in either area typically means the vendor’s marketing hype falls far short of reality.

    Randy Lea
    Teradata

  3. Curt Monash on June 16th, 2010 2:52 pm

    @Ramakrishna,

    I’d say relying on TPC results is pretty much a “worst practice” for doing an evaluation.

  4. zedware on June 18th, 2010 11:23 pm

    As far as I know, software providers often play tricks to get great performance. Making POCs reliable is difficult due to limited investment.

  5. Curt Monash on June 19th, 2010 5:23 am

    @zedware,

    Yes, which is why I’m reminding people to do “real” POCs.

  6. Building Testing Competency | My Views on Testing on September 5th, 2010 12:59 pm

    […] Best practices for analytic DBMS POCs (dbms2.com) […]

  7. Comments on the analytic DBMS industry and Gartner’s Magic Quadrant for same : DBMS 2 : DataBase Management System Services on February 18th, 2012 6:27 pm

    […] Oracle Exadata users who say that the product works; Gartner also has stopped beating Oracle up for its previous policy of almost never doing onsite POCs (Proofs of Concept); both parts of that ring true with me. But Gartner also rightly dings Oracle for various issues in […]
