October 1, 2008

Automatic redistribution of data warehouse data

In a recent Oracle Exadata FAQ, Kevin Closson writes:

Q. […] don’t some of the DW vendors split the data up in a shared nothing method. Thus when the data has to be repartitioned it gets expensive. Whereas here you just add another cell and ASM goes to work in the background. (depending upon the ASM power level you set.)
A. All the DW Appliance vendors implement shared-nothing so, yes, the data is chopped up into physical partitions. If you add hardware to increase performance of queries against your current dataset, the data will have to be reloaded into the new partitioning scheme. As has always been the case with ASM, adding new disks (and therefore Exadata Storage Server cells) will cause the existing data to be redistributed automatically over all (including the new) drives. This ASM data redistribution is an online function.

Hmm. That sounds much like the story I’ve heard from various other data warehousing DBMS vendors as well.

Rather than try to speak for them, however, I’ll just post this and see whether they choose to add anything to the comment thread.
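
For readers who want a feel for what is at stake in the quoted exchange, here is a toy Python sketch. It compares two ways of absorbing a new node: recomputing a naive "row number modulo node count" placement, versus moving only enough rows to even out the load. Everything in it is an assumption made for illustration (the function names, the row counts, the modulo scheme); it is not how ASM, Teradata, or any other product actually places data, and real shared-nothing systems may use smarter hash maps that move far less than the naive case.

```python
from collections import Counter


def mod_placement(num_rows: int, num_nodes: int) -> list[int]:
    """Assign row i to node i % num_nodes (a naive stand-in for hash partitioning)."""
    return [row % num_nodes for row in range(num_rows)]


def moved_by_repartition(num_rows: int, old_nodes: int, new_nodes: int) -> int:
    """Count rows whose node changes when the modulo scheme is recomputed."""
    old = mod_placement(num_rows, old_nodes)
    new = mod_placement(num_rows, new_nodes)
    return sum(1 for a, b in zip(old, new) if a != b)


def rebalance_onto_new_node(
    placement: list[int], new_node: int, total_nodes: int
) -> tuple[list[int], int]:
    """Move rows from over-full nodes to new_node until it holds an even share.

    Returns the new placement and the number of rows moved.
    """
    target = len(placement) // total_nodes  # even share per node
    counts = Counter(placement)
    result = list(placement)
    moved = 0
    for i, node in enumerate(result):
        if counts[new_node] >= target:
            break  # the new node has its share; stop moving data
        if counts[node] > target:
            counts[node] -= 1
            counts[new_node] += 1
            result[i] = new_node
            moved += 1
    return result, moved


if __name__ == "__main__":
    rows, old_n, new_n = 1200, 4, 5
    naive = moved_by_repartition(rows, old_n, new_n)
    _, minimal = rebalance_onto_new_node(mod_placement(rows, old_n), old_n, new_n)
    print(f"naive repartition moves {naive} of {rows} rows")
    print(f"even-out rebalance moves {minimal} of {rows} rows")
```

In this toy setup, going from four nodes to five, the naive scheme relocates most of the rows, while the even-out approach moves only about one fifth of them. Whether a given product behaves more like the first case or the second, and whether it can do so online, is exactly what the vendors are invited to clarify.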

Comments

7 Responses to “Automatic redistribution of data warehouse data”

  1. Sanjay on October 2nd, 2008 8:05 am

    I think the key here is the automatic redistribution. If, as they claim, the data distribution is an online function and is automatic, that is definitely different from, say, Teradata, where you have to take an outage to redistribute the data.

  2. David Aldridge on October 2nd, 2008 11:34 am

    No expertise, but here’s what the docs say:

    “Rebalancing a disk group moves data between disks to ensure that every file is evenly spread across all of the disks in a disk group. When all of the files are evenly dispersed, all of the disks are evenly filled to the same percentage; this ensures load balancing. Rebalancing does not relocate data based on I/O statistics nor is rebalancing started as a result of statistics. ASM rebalancing operations are controlled by the size of the disks in a disk group.

    “ASM automatically initiates a rebalance after storage configuration changes, such as when you add, drop, or resize disks. The power setting parameter determines the speed with which rebalancing operations occur.

    “You can manually start a rebalance to change the power setting of a running rebalance. A rebalance is automatically restarted if the instance on which the rebalancing is running stops; databases can remain operational during rebalancing operations. A rebalance has almost no effect on database performance because only one megabyte at a time is locked for relocation and only writes are blocked.”

    http://download.oracle.com/docs/cd/B28359_01/server.111/b31107/asmcon.htm#CJHGGECE

  3. Ajeet Singh on October 2nd, 2008 4:05 pm

    Hi Curt,

    I would like to mention our online scalability capabilities in this context. Aster nCluster provides online scaling, not only for storage but also for the whole system. When adding a new server to the cluster, the administrator only needs to input the MAC address of the first network interface and power on the bare-metal machine. The system automatically gets the software (including the operating system), formats the drives, configures the network, and balances the existing data and workload. All this is done in the background and the system continues to be available to users during this process. Similarly, servers can be taken out of nCluster and repurposed for other use with a single click on the Aster Management Console without incurring any system downtime.

    http://www.asterdata.com/product/management.html has an overview of our manageability features.

    Thanks,
    Ajeet

  4. Curt Monash on October 2nd, 2008 4:55 pm

    Hi Ajeet,

    I was guessing Aster might be the first vendor to respond to this thread. 😉

    Best,

    CAM

  5. Stu Greenberg on October 6th, 2008 9:38 am

    Hi Curt,

    We’ll be the second vendor to respond to this thread.

    EXASolution also has an automatic redistribution feature. No reloading is necessary. You integrate new servers by specifying the MAC address and booting the server. The system remains accessible at all times, and it redistributes the data in the background.

    Since EXASolution is based on an SPMD architecture, the new servers will increase performance linearly.

    Regards,
    Stu Greenberg

  6. Curt Monash on October 6th, 2008 11:43 am

    Stu,

    No shock there, either.

    I’m not sure, however, that I buy the claim SPMD = “pure linear scalability with no exceptions”, absent further elucidation. 😉

    Best,

    CAM

  7. Jacky on November 19th, 2008 11:16 am

    I can confirm that Teradata has to take the system offline to do the redistribution. It is a critical operation that could lead to a lot of trouble if something crashes in the middle of it!
