April 7, 2011

Introduction to Syncsort and DMExpress

Let’s start with some Syncsort basics.

One of Syncsort’s favorite value propositions is to contrast the cost of doing ETL in Syncsort, on commodity hardware, to the cost of doing ELT (Extract/Load/Transform) on high-end Teradata gear.

*I forget whether Syncsort actually bothered to say “almost” when making those claims, but one should of course assume the word is in there.

Syncsort general highlights include:

The high-level technology picture for Syncsort DMExpress is:

Syncsort DMExpress competitive claims include:

Syncsort estimates that one DMExpress customer is loading 1,000 records/second/machine on 500 machines, around the clock. That would be about 1.8 billion records/hour, which is not implausible given who the customer is. Syncsort also told a story of an unnamed customer for whom Oracle utterly choked on joining 5 tables of 1 terabyte each. (27 days to run with clever workarounds.) DMExpress did the join in 6 hours and the whole load in 15.
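For anyone who wants to check the throughput arithmetic, here is a minimal sketch in Python. The inputs are simply Syncsort's estimates as quoted above, and the variable names are my own, chosen for illustration.

```python
# Back-of-the-envelope check of the quoted load rate.
# Inputs are Syncsort's estimates, not measured figures.
RECORDS_PER_SECOND_PER_MACHINE = 1_000
MACHINES = 500
SECONDS_PER_HOUR = 3_600

# Aggregate rate across the whole cluster.
records_per_second = RECORDS_PER_SECOND_PER_MACHINE * MACHINES

# Hourly volume, assuming the rate is sustained around the clock.
records_per_hour = records_per_second * SECONDS_PER_HOUR

print(f"{records_per_second:,} records/second across all machines")
print(f"{records_per_hour:,} records/hour")
```

The hourly figure comes out a little under 2 billion, which is where the rounded number comes from.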

By the way, I gather that Syncsort DMExpress is sometimes nicknamed “DMX”.

Syncsort became a client since the last time I posted a vendor client list.

Comments

9 Responses to “Introduction to Syncsort and DMExpress”

  1. Keith Kohl on April 7th, 2011 3:04 pm

    Curt – thanks for the post and for nicely capturing the main points from the recent conversation (disclosure for DBMS2 readers: I lead DI product management for Syncsort).

    I think it is also important to point out that many of our customers use DMExpress to augment their existing PowerCenter or DataStage environments and address performance issues. We see waning performance as a byproduct of the large DI vendors competing against each other feature for feature. Because DMExpress leverages the industry standard for metadata interchange (MITI), we can easily import slow running PowerCenter or DataStage routines and run almost immediately. We also maintain lineage when exporting the mapping.

    Under Syncsort’s new management, we’ve also focused on simplifying our licensing and packaging model to make it easier for customers to get more value from their DMExpress investments.

  2. Paul Johnson on April 8th, 2011 7:30 am

    How exactly do we measure the “cost of doing ELT (Extract/Load/Transform) on high-end Teradata gear”?

    Given that we must already have the Teradata server for query processing, where does the ELT cost come from?

    Adding ETL software and servers into the flow into Teradata adds to the cost, surely?

    I don’t doubt DMX has its capabilities; I just don’t think the attempt to contrast ETL vs. ELT cost adds to the value proposition message.

    If I had a penny for every MVS Syncsort job I’d run “back in the day”…

  3. Curt Monash on April 8th, 2011 7:32 am

    The contention, correct or otherwise, is that Teradata machines that would otherwise have insufficient throughput work just fine if some of their duties are offloaded.

  4. George Chen on April 8th, 2011 2:04 pm

    Paul Johnson has a good comment. Now Syncsort claims to compete with Teradata?

  5. Curt Monash on April 8th, 2011 7:27 pm

    Offloading a particular kind of functionality is a limited kind of competition.

  6. Keith Kohl on April 9th, 2011 1:26 pm

    We are not claiming to compete with Teradata and actually see ourselves as quite complementary to them. What we are seeing with our customers is that they have had to push processing into Teradata (or other databases – source and target warehouse) because their ETL engine couldn’t handle the throughput requirements as well as a scalable database like Teradata.

    Many of these customers have made a large investment (many times more than once) in their database environments and have not realized a linear gain in ELT capacity with the investment made. They are also being asked to 1) shorten batch windows, 2) add sources & reports, and 3) provide intra-day updates to the warehouse while the end users are using it. As customers point out, there is the double whammy that once transformations are pushed to the database by the ETL engine, the often expensive ETL software simply becomes a scheduler executing the pushed down SQL. Customers are even telling us they’re writing the SQL in Teradata, copying it into query objects of major ETL tools for subsequent pushdown and scheduling. Needless to say, this is a huge waste of expensive ETL software and a huge labor cost.

    We are suggesting that customers are better off putting the “T” back where it belongs and letting the warehouse service the business users. We’ve seen too many instances where pushing the “T” into the database creates management, agility and metadata governance challenges since these transformations are represented as SQL. When we explain DMExpress capabilities (minimal or no tuning, extreme efficiency, throughput at I/O rates, extreme performance, etc.) customers ask us to take on the transformation requirements, freeing up capacity on Teradata to fulfill its mission: service user queries and reports.

    We believe that we offer a unique and efficient processing layer that reduces the cost structure and labor costs associated with managing transformations in the face of exploding data volumes. We also understand how Teradata is primarily focused on support for user queries contained in analytic and reporting applications.

  7. Syncsort extends Hadoop MapReduce | DBMS 2 : DataBase Management System Services on May 29th, 2013 7:24 am

    […] an ETL (Extract/Transform/Load) vendor, whose flagship product DMExpress was evidently renamed to […]

  8. Jorge Torres on July 2nd, 2013 12:58 pm

    I want to know more about the life support of the product. Do you have primary support? For how long?

  9. Alvin Limpin on January 28th, 2015 4:39 am

    I’m using DMExpress DMXMMSRT 14.%-m.%-d SunOS 5.10 SPARC 64-bit. It uses two files namely:
    adp1340_sort01.jcl:
    //SORT01 EXEC PGM=SYNCSORT,PARM=EQUALS
    //SORTIN DD DSN=IGNORED,
    // DCB=(LRECL=499,RECFM=FB)
    //SORTOUT DD DSN=IGNORED
    //SYSIN DD *
    SORT FIELDS=(1,6,A,32,3,A,46,28,A,44,2,A,35,9,A,74,16,A),FORMAT=CH
    END
    /*

    adp1340_sort01.map
    SORTIN_DSN=$DATADIR/$REGION/adp1340/gasserv.ppmf1340
    SORTOUT_DSN=$DATADIR/$REGION/adp1340/gasserv.ppmf1340.sort

    But the result uses the ASCII collating sequence. Is it possible to change it to the EBCDIC collating sequence? Thanks.
