When you are selecting an analytic DBMS or appliance, most of the evaluation boils down to two questions:
- How quickly and cost-effectively does it execute SQL?
- What analytic functionality, SQL or otherwise, does it do a good job of executing?
And so, in undertaking such a selection, you need to start by addressing three issues:
- What does “speed” mean to you?
- What does “cost” mean to you?
- What analytic functionality do you need anyway?
Key elements of cost* include:
- Software license and maintenance
- Hardware purchase cost, maintenance, electric power, and computer room burden
- Database and system administration
- (For some use cases) Programming
*Assuming a classical in-house IT shop, where products are typically bought rather than leased/rented. With outsourced and/or monthly-fee structures, the details change but the principles remain the same.
Most of that can be evaluated pretty well via a spreadsheet, although things can get a bit tricky when you get to people costs, which are a large fraction of the whole. In particular, different analytic DBMS product suites have great, high-performance support for different (and often rapidly growing) sets of functionality – basic and advanced SQL, statistics, and more. Figuring out which ones will be best for your programmers, and how significant the differences are — well, that’s a lot like any other programming language evaluation, and those are rarely neat or clean-cut.
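To make the spreadsheet point concrete, here's a minimal sketch of such a cost model in Python, with line items taken from the list above. Every dollar figure is hypothetical; you'd substitute your own vendor quotes and loaded staff costs.

```python
# Hypothetical multi-year total-cost-of-ownership sketch for an analytic DBMS.
# All figures below are made up for illustration; plug in your own numbers.

ONE_TIME_COSTS = {
    "software_license": 300_000,
    "hardware_purchase": 250_000,
}

ANNUAL_COSTS = {
    "software_maintenance": 60_000,   # typically a percentage of license
    "hardware_maintenance": 25_000,
    "power_and_facilities": 15_000,   # electric power + computer room burden
    "dba_and_sysadmin":     120_000,  # fraction of staff time, loaded cost
    "programming":          80_000,   # only relevant for some use cases
}

def total_cost(years: int) -> int:
    """Sum one-time costs plus recurring costs over the given horizon."""
    return sum(ONE_TIME_COSTS.values()) + years * sum(ANNUAL_COSTS.values())

if __name__ == "__main__":
    for horizon in (1, 3, 5):
        print(f"{horizon}-year TCO: ${total_cost(horizon):,}")
```

Note that in this made-up example the recurring (largely people) costs overtake the purchase costs within two years, which is why the people-cost estimates deserve the most scrutiny.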
But when it comes to evaluating speed, there’s no substitute for a well-designed proof of concept (POC). Many analytic DBMS and appliance vendors are happy to let you do a POC, on your own premises (or remotely if you prefer), under your control, at no cost to you. And that’s great. It is crucial that a POC be run either by you, by a consultant* answerable to you, or – if you decide the vendor must run it for you – at least with you watching every step of the way and knowing exactly what is being done. Appliance vendors do find it cheaper to run POCs on their own premises, so a certain reluctance to ship you a box is understandable. But make no compromises about the transparency of a POC, or about your control of exactly what it is that gets tested.
*Since I sell consulting services for users evaluating analytic DBMS, I naturally am biased to think that consultants can be very useful in the process. But whether you should use them a little (sanity check), a medium amount (work with you through the process), or heavily (actually drive the process for you and/or execute the POCs) is very dependent upon your specific situation.
So far as I’ve been able to tell:
- Netezza loves to ship boxes to prospects for POCs, and to have prospects set up the boxes and do the POCs themselves. That’s a big reason why Netezza wants to call attention to this subject.
- Oracle has generally been pretty reluctant to ship Exadata boxes out for POCs. That’s the other reason Netezza wants to call attention to the issue.
- Open source vendors make it easy for you to download and test at least their community editions.
- Vertica makes it pretty easy for you to test its software too (download or cloud).
- ParAccel has generally insisted on running POCs itself, although it will do so on your premises if you insist.
- Teradata naturally tries to do POCs on its own premises, but doesn’t insist too hard. (Edit: Randy Lea of Teradata says that Teradata is now doing over half its POCs onsite.)
Most of the criticisms I’ve heard of vendors’ POC practices have been directed at Oracle or ParAccel.
For most POCs, it’s a good conceptual template to form and then test a hypothesis to the effect of:
- For a given technology product assemblage (brand of DBMS, number of nodes, etc.), and
- For a given level of human effort (e.g., administrative effort), you can
- Run a given workload, with
- Satisfactory and satisfactorily consistent response times
Sometimes absolute throughput and price/performance are important secondary considerations; sometimes they’re less germane. But either way, it’s almost always right to focus primarily on the questions of “What do I want this system to do?” and “What do I think we’re going to have to invest in it?” By way of contrast, it’s often misleading to focus too much on questions like “What’s the one number that best describes the performance of this system?” — even if you customize that calculation for your environment – or, even worse, “How much speed-up can I get on my single worst Query from Hell?”
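One way to operationalize “satisfactory and satisfactorily consistent response times” (as opposed to a single summary number) is to look at percentiles and worst cases rather than averages. Here's a minimal sketch, assuming you've already captured per-query wall-clock timings from a POC run; the sample timings below are hypothetical.

```python
# Summarize POC query timings by percentile rather than by a single average.
# The sample timings are hypothetical; in practice you'd load measured
# wall-clock times from your POC harness's logs.
import statistics

def summarize(timings_sec):
    """Return median, ~p95, and max response times for one query class."""
    ordered = sorted(timings_sec)
    # Index of the ~95th-percentile observation (nearest-rank style).
    p95_index = max(0, round(0.95 * len(ordered)) - 1)
    return {
        "median": statistics.median(ordered),
        "p95": ordered[p95_index],
        "max": ordered[-1],
    }

sample = [1.2, 1.3, 1.1, 1.4, 9.8, 1.2, 1.3, 1.5, 1.2, 1.3]
stats = summarize(sample)
# A median near 1.3 seconds alongside a worst case near 10 seconds signals
# inconsistent response times -- exactly what a throughput average hides.
print(stats)
```

The point of the design is that a single outlier query (the 9.8-second run above) barely moves the mean but shows up immediately in the max, and it's the outliers that blow SLAs.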
The fundamental rule of POC construction is: Model your entire use case as best you can. That means you need to consider, at a minimum:
- Your whole concurrent query, other analytic, and low-latency update workload (peak).
- Your whole query, analytic, load, backup, and maintenance workload (ongoing).
- Partial-failure scenarios.
- Your core SLAs (Service-Level Agreements).
Of course, that’s not as easy as it sounds. Presumably, the main reason you’re getting a new analytic DBMS is that you want to do new kinds of analysis. By the very nature of analytics, you won’t know what analytic operations are most useful until you try them out and see what their results are. On the other hand – if you haven’t done considerable thinking about how you’re going to use your new analytic database, how did you ever get funding for the project in the first place?
Seriously, I could write multiple posts, each as long as this one (but more application-oriented), about how to upgrade your analytic capabilities (and which fool’s gold to avoid). But this has gotten pretty long already, so for now I’ll just stop here.
Note: My clients at Netezza asked me to write something short about POCs they could use as a kind of foreword to some collateral, where by “short” they meant single-paragraph or something like that. They’re great clients, so I said yes, under the condition I could also use it as a blog post. Except … this post didn’t turn out to be nearly as short as they envisioned. Oops.
My February 2009 slide deck on how to select an analytic DBMS is in many parts still pretty current.