November 29, 2012

Notes on Microsoft SQL Server

I’ve been known to gripe that covering big companies such as Microsoft is hard. Still, Doug Leland of Microsoft’s SQL Server team checked in for phone calls in August and again today, and I think I got enough to be worth writing about, albeit at a survey level only,

Subjects I’ll mention include:

One topic I can’t yet comment about is MOLAP/ROLAP, which is a pity; if anybody can refute my claim that ROLAP trumps MOLAP, it’s either Microsoft or Oracle.

Microsoft’s slides mentioned Yahoo refining a 6 petabyte Hadoop cluster into a 24 terabyte SQL Server “cube”, which was surprising in light of Yahoo’s history as an Oracle reference.

But first we need some housekeeping. As best I understood Microsoft’s lingo:

*I.e. Azure; pay no attention to dictionaries and poets, who say that skies are azure, while clouds are puffy white.

Microsoft’s Hadoop/HDInsight story starts with what you’d expect:

The first level of HDInsight management tools will be based on Ambari and donated back to Apache open source, but you might want to integrate the use of those with Microsoft’s long-standing proprietary management suites.

Notes on SQL Server Parallel Data Warehouse include:

What sounds like it might be cool is PolyBase, a PDW extension comparable to Hadapt or Teradata Aster SQL-H. Notes on that start:

Technically, I gather:

I didn’t ask whether HDFS and SQL Server live on the same nodes, ala Hadapt, or different ones, ala Teradata Aster — but I’m guessing the latter, based on Microsoft’s PolyBase page.

And by the way — if SQL Server has significant analytic platform capabilities, nobody’s ever briefed me on them. To the extent it doesn’t, PolyBase/Hadoop might evolve into a partial substitute.

Microsoft SQL Server has for a while had a columnar capability, kludged from its indexing system. The big limitations were:

Both those restrictions are being lifted — initially just in PDW appliances, but later in the “box” products as well. Naturally, Microsoft reports that compression is great, calling it “10X” just like the other cool columnar kids now do. At one point there were hasty mentions of “vector processing” and something that sounds like Netezza zone maps, but I didn’t get details of either.

Actually, I suspect there’s a bit of kludge left in there somewhere, as the no-row-based-version feature is “optional”, and the column store is being described as a “clustered index”.

That takes us to Hekaton, which is already in “preview” with about 100 customers even though it won’t be generally available until the next major SQL Server release a few years out. As on other subjects, I lack detail, but I gather that Hekaton has some serious in-memory DBMS design features. Specifically mentioned were the absence of locking and latching.

A key point is that you only have to move some of your tables into Hekaton; you can manage the rest on disk as you always did. This may be regarded as somewhere in between storage tiering and full federation, in that SQL Server is one DBMS, but can invoke several very different storage engines.

And that’s all I have for now. Greater substance may or may not follow.

Related links

Comments

10 Responses to “Notes on Microsoft SQL Server”

  1. Al DeLosSantos on November 29th, 2012 9:58 am

    Thanks for the update Curt. The SAN versus DAS link provides very helpful background information and industry discussion…

    Regards,
    Al D.

  2. Mark Callaghan on November 29th, 2012 11:10 am

    When you write “(Parallel Data Warehouse, nee’ DATAllegro)” does this mean that DATAllegro technology made it into the PDW product? I am skeptical that much would be useful from a product that re-wrote SQL before sending it to mostly unmodified Ingres nodes and then collected/summarized the results.

  3. Curt Monash on November 29th, 2012 11:20 am

    Mark,

    The details on that are unknowable from the outside. Microsoft told me in the mid-1990s that they’d already celebrate “Sybase Liberation Day”, because the last lines of Sybase code were gone. A decade later that still wasn’t wholly accurate. More to the point, Sybase “technology” persists in the product long after the code does.

    Anyhow, the row-based new analytic DBMS — Netezza, Greenplum, DATAllegro, Aster — followed similar paths. The question is how far each got, when, and how many scratches they picked up from brambles along the way. So while you’re surely correct to imply heavy rewriting and replacement, that doesn’t mean the DATAllegro seed isn’t still in there somewhere.

    Anyhow, porting is the least of it; the major issues were the sophistication needed that DATAllegro hadn’t gotten around to.

  4. Mark Callaghan on November 29th, 2012 12:46 pm

    Regardless, it is nice to see Microsoft innovate to keep up in the world of data management. Thanks for the summary.

  5. David DeWitt on November 29th, 2012 5:29 pm

    Mark,

    1) There is no DatAllegro code left in PDW. None. As Naughton’s characterizes the system, a classic example of the “stone soap” parable.

    2) For details on how queries are executed in PDW works see our SIGMOD 2012 paper on query optimization. We still do send SQL fragments to individual nodes as was done in DA but the code that does this is entirely new. The reason we have retained this approach is that it avoids forcing all nodes to execute exactly the same plan step for the SQL fragment (a portion of entire SQL query). THis has proven to be very useful when the data on the different nodes is highly skewed. It has some inefficiencies but frankly in a data warehousing environment the overhead of the approach is noise.

    David

  6. Mark Callaghan on November 29th, 2012 6:27 pm

    I guess I need to read the paper now.

    Using SQL for query fragments is very interesting. I encountered a DBMS that migrated from using SQL to something else to do the opposite of what PDW wants. It wanted to guarantee that the same plans were used on all nodes. But this was before the days of big data.

    Is there a lot of research that needs to be done for PQ optimization? Using different plans can be a problem when a plan is blocking on one node and not on other nodes, or when a plan produces ordered results on when node and not on other nodes.

  7. Stuart Bunby on November 30th, 2012 5:42 am

    Thanks for the update Curt – an interesting read

    Is PDW a recommended platform for running OLTP workloads?

    Or would Microsoft typically advocate SQL Server on a non-appliance platform while Hekaton is being developed?

  8. Curt Monash on November 30th, 2012 5:49 pm

    Stuart,

    I would think the latter. Since PDW seems to plan every database operation twice (once globally and once at every individual node), it’s not ideal for transactional updates.

  9. Layering of database technology & DBMS with multiple DMLs | DBMS 2 : DataBase Management System Services on September 8th, 2013 4:52 am

    [...] e.g. Greenplum (especially early on), Aster (ditto), DATAllegro, DATAllegro’s offspring Microsoft PDW (Parallel Data Warehouse), or [...]

  10. NoSQL vs. NewSQL vs. traditional RDBMS | DBMS 2 : DataBase Management System Services on March 28th, 2014 1:21 pm

    [...] Microsoft has now launched Hekaton, about which I long ago wrote: I lack detail, but I gather that Hekaton has some serious in-memory DBMS design features. [...]

Leave a Reply




Feed: DBMS (database management system), DW (data warehousing), BI (business intelligence), and analytics technology Subscribe to the Monash Research feed via RSS or email:

Login

Search our blogs and white papers

Monash Research blogs

User consulting

Building a short list? Refining your strategic plan? We can help.

Vendor advisory

We tell vendors what's happening -- and, more important, what they should do about it.

Monash Research highlights

Learn about white papers, webcasts, and blog highlights, by RSS or email.