June 28, 2008

Response to Rita Sallam of Oracle

In a comment thread on Seth Grimes’ blog, Rita Sallam of Oracle engaged in a passionate defense of her data warehousing software. I’d like to take it upon myself to respond to a few of her points here.

“If a shared disk architecture is not scalable, then how is it that Oracle consistently has more customers in the Winter Group top ten data warehouses than any other vendor?”

Unfortunately, the Winter Corporation list is a joke, which may be why it hasn’t been updated since 2005. (I mean it — Dick Winter seems like too good a guy to keep publishing something so misleading indefinitely.) It counts not just user data, but indices, aggregates, and everything else. Based on that, I’d guess the largest Oracle site listed there to be at 10-20 terabytes of user data, and all the others to be at 5 TB or less. Even assuming 3-5X database growth since the list was compiled, that puts Oracle behind Teradata, DATAllegro, Netezza, Dataupia, and probably SAS — just counting ones I can think of quickly — none of whom are actually represented on the list. Teradata in particular blows Oracle away on warehouse size.

And by the way — the largest Oracle warehouse by far on that list is at Yahoo. But Oracle isn’t Yahoo’s major data warehouse software provider.

“If a shared disk architecture is not scalable, then how is it that Oracle is the leader in Data Warehouse Performance. It is the TPC-H leader in the 300GB, 1TB, 3TB, and 30TB categories.”

TPCs are a joke too. Oracle’s third-longest-serving exec (or maybe second-longest — I always forget whether he or Ken Jacobs has been there longer) e-mailed me a few years ago, asking for my help in making them go away. Be that as it may:

“A shared disk architecture (Oracle) is more flexible. Since all processing units can see all data the system can at runtime decide what the degree of parallelism should be. In addition, some queries may be more efficiently run in serial (simple index lookups) in which case parallelism isn’t even used.”

Data warehouse appliances (at least the row-based ones) excel at fast table scans.

“Also, if individual servers in a cluster contain many CPUs (or cores) the parallelism can be co-located on the node. Hence, statements may run in parallel but do not require the interconnect to ship data.”

Appliance makers use multicore systems too. Everybody does, these days.

“Oracle’s ‘Shared Everything’ architecture provides the ability to dynamically optimize each query requirement. The current workload is examined and the degree of parallelism is adjusted rather than blindly starting with the same degree of parallelism every time. Therefore, the degree of parallelism is optimized for every query and there is no requirement for a minimal degree of parallelism across all nodes. Operations can run in parallel using one, some or all nodes of a Real Application cluster depending on the current workload, the characteristics and importance of the query.”

That’s all irrelevant to the chief benefit of parallelism. Parallelism isn’t about optimizing the use of CPUs. Parallelism is about optimizing the system where it’s actually bottlenecked, which is getting data off of disk.
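
To put rough numbers on that point, here is a back-of-envelope sketch of how full-scan time depends on the number of disks read in parallel. The 10 TB table size and the 100 MB/s per-disk sequential throughput are illustrative assumptions, not measurements from any vendor, and `scan_seconds` is a made-up helper name.

```python
def scan_seconds(table_tb, disks, mb_per_sec_per_disk):
    """Estimate the time to scan a table when `disks` drives are read at once.

    Assumes the scan is purely I/O bound and that data is spread evenly
    across the disks (both simplifications for illustration).
    """
    table_mb = table_tb * 1_000_000  # 1 TB taken as 1,000,000 MB
    return table_mb / (disks * mb_per_sec_per_disk)


# Hypothetical figures: a 10 TB table, 100 MB/s of sequential read per disk.
one_disk = scan_seconds(10, 1, 100)          # 100000.0 seconds, over a day
hundred_disks = scan_seconds(10, 100, 100)   # 1000.0 seconds, under 20 minutes

print(one_disk, hundred_disks)
```

Under these assumptions the scan time falls in direct proportion to the number of disks feeding the query, no matter how cleverly CPU parallelism is tuned.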

“Parallelism is not related to the partitioning strategy of the data as in a shared-nothing environment.”

Parallelism isn’t particularly related to partitioning strategy in a shared-nothing environment either. Kognitio offers a competitive shared-nothing system with no partitioning whatsoever. And many queries on most vendors’ systems relate to partitioning only in that the data is distributed so that approximately equal amounts of data may be found on each node.

True, since that’s done by hash partitioning, you try to pick hash keys so that you get lucky as often as possible, and benefit from the hash key when doing a hash join. And further partitioning can be added as an optimization. But that’s hardly a disadvantage for shared-nothing systems vs. Oracle.
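
As a minimal sketch of the hash distribution just described, the snippet below spreads two tables across nodes by hashing a key. When both tables are hashed on the join key, matching rows land on the same node, so the join needs no data shipped between nodes. The node count, table shapes, and the use of CRC32 as the hash are all assumptions chosen for illustration; no vendor's actual scheme is implied.

```python
import zlib
from collections import defaultdict

NUM_NODES = 4  # hypothetical cluster size


def node_for(key, num_nodes=NUM_NODES):
    """Map a key to a node with a stable hash (CRC32, purely for illustration)."""
    return zlib.crc32(str(key).encode()) % num_nodes


def distribute(rows, key_index, num_nodes=NUM_NODES):
    """Place each row on the node chosen by hashing the column at key_index."""
    nodes = defaultdict(list)
    for row in rows:
        nodes[node_for(row[key_index], num_nodes)].append(row)
    return nodes


# Made-up tables: customers (customer_id, name), orders (order_id, customer_id, amount).
customers = [(cid, f"customer-{cid}") for cid in range(1000)]
orders = [(oid, oid % 1000, 10.0) for oid in range(5000)]

# Both tables are hashed on customer_id, the join key.
customer_nodes = distribute(customers, 0)
order_nodes = distribute(orders, 1)


def local_join(node):
    """Join orders to customers using only the rows stored on one node."""
    by_id = {c[0]: c for c in customer_nodes[node]}
    return [(o, by_id[o[1]]) for o in order_nodes[node] if o[1] in by_id]


joined = [pair for n in range(NUM_NODES) for pair in local_join(n)]
print(len(joined), {n: len(rows) for n, rows in sorted(customer_nodes.items())})
```

Every order finds its customer locally because both rows hashed to the same node. A join on some other column would not get that lucky and would need rows redistributed first.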

“With Oracle’s shared disk approach, fail over is built-in and the configuration remains balanced.”

If your disks are failing often enough for that to be more than a tiny benefit, you might want to consider changing your storage supplier.

“Oracle does not require re-organization of data. Oracle’s hash partitioning is also automatic and does not require re-distribution of data. The Oracle Optimizer automatically tunes queries. In addition the Oracle Database 10g ADDM, (Automatic Database Diagnostic Monitor) runs automatically to make performance recommendations. Index management is very simple in Oracle. The ADDM tool recommends indexes, generates script to create indexes and will run them with the DBA’s approval. Oracle also supports all types of data including stars, normalized and de-normalized data. Oracle supports Join Indexes and Aggregate Join Indexes.”

Somebody please remind me to start an international Scrabulous tournament for Oracle DBAs, since they have nothing else to occupy their time.

“In addition, Oracle supports superior concurrency and parallelism, Oracle can execute several queries at the same time (in parallel) without performance degradation. With Oracle’s model, there are several checkout counters that customers can use, which parallelizes the process and provides a higher throughput. Even if a customer at one counter takes a long time to checkout, other customers are not affected. Once all checkout counters are full, Oracle queues the remaining customers (queries) until the next checkout counter opens up and sends the next customer in line to the open counter. If this starts occurring consistently, and the processing of customers (queries) slows down, Oracle allows for more checkout counters to be added dynamically using RAC or by simply adding more CPUs to an environment.”

That one’s probably real.
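
The checkout-counter analogy in the quoted passage can be sketched as a toy queueing model: a fixed number of slots, first-come-first-served admission, and every query arriving at time zero. This illustrates the analogy only and is not a model of how Oracle actually schedules work; the function name and the durations are invented.

```python
import heapq


def run_queries(durations, counters):
    """Return each query's finish time under FIFO admission to `counters` slots.

    All queries are assumed to arrive at time 0, in list order.
    """
    free_at = [0.0] * counters  # min-heap of times at which each slot frees up
    heapq.heapify(free_at)
    finish = []
    for d in durations:
        start = heapq.heappop(free_at)  # earliest available counter
        end = start + d
        finish.append(end)
        heapq.heappush(free_at, end)
    return finish


# One long-running query followed by four short ones (arbitrary time units).
workload = [100, 1, 1, 1, 1]

print(run_queries(workload, counters=1))  # short queries wait behind the long one
print(run_queries(workload, counters=2))  # short queries use the second counter
```

With one counter the short queries finish at 101 through 104. With two they finish at 1 through 4 while the long query occupies the other slot, which is the behavior the analogy describes.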

Comments

10 Responses to “Response to Rita Sallam of Oracle”

  1. Serge Rielau on June 29th, 2008 9:31 am

    Some comments:
    * Checking out TPC-H I find that Exasol seems to be very active now in the lower end. So it’s not true that TPC-H is a MS SQL Server, Oracle, IBM private party.
    * In the thread you point to comments were made that NETEZZA has pulled out of TPC-H. I was unaware that they had ever pulled in to begin with…
    * Perhaps you should ask Jeff Jones to discuss measurable marketshare numbers between Teradata and DB2 Warehousing. The result might surprise you.
    The spinning off of Teradata was quite revealing.
    * I did find it fascinating that Oracle points to TPC-H to shore up RAC scalability claims as it is really rarely used in that benchmark and the bigger the system the less so. E.g. the 30TB system is not RAC.
    * I see TPC-H as one of those benchmarks where the numbers say less than not having any.
    If a system is any good it should “at least” be able to do decent TPC-H. So if a company doesn’t show up it means they are either: Too small to fund (or get it funded by a hardware vendor) or just can’t. Either way raises questions for customers.
    It’s a bit like claiming the fastest swimmers but never showing up at the Olympics.
    * From my very own experience I can assert that DB2 wins most warehouse engagements where we are allowed to participate via proof of concepts. Vendors tend to come to DB2 after they fail with their default Oracle choice carried over from their OLTP deployments.
    W.r.t. NETEZZA the moment of truth tends to be multi-user. All that fast table-scanning doesn’t help when users compete for the spindles/FPGA.

    Just my two cents
    Serge Rielau
    DB2 Benchmark and Solutions Development
    IBM Toronto Lab

  2. Curt Monash on June 29th, 2008 10:47 am

    Serge,

    DB2 clearly has a significant warehousing presence somewhere, although I run across it less often than the numbers would suggest.

    As for concurrency — yep, the new guys are still improving significantly every release. But if I have 10X+ the single-query performance, I don’t need to reduce my multi-user penalty quite as low as I do if my base speed is nothing to write home about.

  3. Curt Monash on June 29th, 2008 10:53 am

    And you’re right, to my knowledge, about Netezza. I’m sure they evaluated participating, but I don’t recall them getting particularly far along.

    CAM

  4. Stuart Frost on June 30th, 2008 1:25 pm

    Curt,

    Great post! You surpassed even your usual high standards and brightened my Monday morning.

    One thing the major vendors don’t like to be reminded of wrt TPC-H is that they’ve spent millions of dollars over several years to optimize their DBMS for those particular queries.

    The end result is that TPC-H just isn’t representative of any real world installation. Even the TPC regards TPC-H as obsolete and has a major refresh (TPC-DS) in the works. The problem with that is the major vendors are spending huge amounts of time and effort to ensure it suits their architectures over those of the newer vendors.

    The only way to really judge a DW platform is to run a PoC with the customer’s own data and queries.

    Stuart Frost
    CEO, DATAllegro

  5. Infology.Ru » Blog Archive » Microsoft buys DATAllegro on August 14th, 2008 3:58 pm

    [...] Why Oracle’s counterarguments don’t hold water [...]

  6. Infology.Ru » Blog Archive » How will Oracle save its data warehouse business? on August 14th, 2008 4:01 pm

    [...] Response to Rita Sallam of Oracle [...]

  7. Oracle Exadata and Oracle data warehouse appliance sound bites | DBMS2 -- DataBase Management System Services on September 25th, 2008 2:10 pm

    [...] long denying it, Oracle has finally admitted that putting more than 10 TB on Oracle had been an extremely [...]

  8. Infology.Ru » Blog Archive » Shots of the day - Oracle Exadata and Oracle's data warehouse appliance on September 27th, 2008 4:40 am

    [...] long denials, Oracle has finally admitted that working with volumes [...]

  9. Winter Corporation on Exadata | DBMS2 -- DataBase Management System Services on February 3rd, 2009 9:54 pm

    [...] at least since Aberdeen pulled back from the “You pay; we say” business — is Winter Corporation’s list of large data warehouses. (Failings include that it only lists warehouses run by software from certain vendors, primarily [...]

  10. Advice for some non-clients | DBMS2 -- DataBase Management System Services on July 30th, 2010 11:52 am

    [...] making progress against your reputation for untruthfulness. Oh, I’ve dinged you for some past slip-ups, but on the whole they’ve been no worse than other vendors.’ But recently you pulled a [...]
