May 20, 2015

MemSQL 4.0

I talked with my clients at MemSQL about the release of MemSQL 4.0. Let’s start with the reminders:

The main new aspects of MemSQL 4.0 are:

There’s also a new free MemSQL “Community Edition”. MemSQL hopes you’ll experiment with this but not use it in production. And MemSQL pricing is now wholly based on RAM usage, so the column store is quasi-free from a licensing standpoint as well.

Before MemSQL 4.0, distributed joins were restricted to the easy cases:

Now arbitrary tables can be joined, with data reshuffling as needed. Notes on MemSQL 4.0 joins include:
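The post doesn't say how MemSQL does the reshuffling. The textbook technique is a repartition (shuffle) join: hash-partition both tables on the join key so that matching rows are guaranteed to land on the same node, then join each partition locally. Below is a minimal single-process sketch under that assumption; the buckets stand in for leaf nodes, and the function names are hypothetical rather than anything from MemSQL.

```python
from collections import defaultdict


def partition(rows, key, n):
    """Hash-partition rows (dicts) on `key` into n buckets, one per hypothetical leaf.

    Python's hash() of strings is salted per process but stable within a run,
    which is all this sketch needs: equal keys always go to the same bucket.
    """
    buckets = [[] for _ in range(n)]
    for row in rows:
        buckets[hash(row[key]) % n].append(row)
    return buckets


def shuffle_join(left, right, left_key, right_key, n=4):
    """Reshuffle both inputs on the join key, then hash-join each bucket locally."""
    left_parts = partition(left, left_key, n)
    right_parts = partition(right, right_key, n)
    out = []
    for lp, rp in zip(left_parts, right_parts):
        # Build a hash table on this bucket's right-hand rows...
        index = defaultdict(list)
        for r in rp:
            index[r[right_key]].append(r)
        # ...and probe it with the same bucket's left-hand rows.
        for l in lp:
            for r in index.get(l[left_key], []):
                out.append({**l, **r})
    return out


orders = [{"order_id": 1, "cust": 10}, {"order_id": 2, "cust": 20}, {"order_id": 3, "cust": 10}]
customers = [{"cust_id": 10, "name": "a"}, {"cust_id": 20, "name": "b"}]
joined = shuffle_join(orders, customers, "cust", "cust_id")
```

Real systems also skip the reshuffle for a table that is already partitioned on the join key, or broadcast a small table instead; this sketch always reshuffles both sides.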

To understand the Spark/MemSQL connector, recall that MemSQL has “leaf” nodes, which store data, and “aggregator” nodes, which combine query results and ship them back to the requesting client. The Spark/MemSQL connector manages to skip the aggregation step, instead shipping data directly from the various MemSQL leaf nodes to a Spark cluster. In the other direction, a Spark RDD can be saved into MemSQL as a table. This is also somehow parallel, and can be configured either as a batch update or as an append; intermediate “conflict resolution” policies are possible as well.
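The post doesn't describe the connector's internals, so the following is only a toy model of the data flow described above, not the connector's API; every name in it (LEAVES, scan_leaf, and so on) is hypothetical. Reading through an aggregator funnels all shards into one result stream, while the connector-style path keeps one partition per leaf and fetches them in parallel, presumably so that Spark starts with the data already split into parallel partitions.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a MemSQL cluster: each leaf node holds one shard of a table.
LEAVES = {
    "leaf-1": [(1, "a"), (4, "d")],
    "leaf-2": [(2, "b"), (5, "e")],
    "leaf-3": [(3, "c")],
}


def scan_leaf(name):
    """Stand-in for querying a single leaf node directly."""
    return list(LEAVES[name])


def read_through_aggregator():
    """Classic path: an aggregator gathers every shard into a single result stream."""
    rows = []
    for name in LEAVES:
        rows.extend(scan_leaf(name))
    return [rows]  # one partition handed back to the client


def read_direct_from_leaves():
    """Connector-style path: one partition per leaf, fetched in parallel."""
    with ThreadPoolExecutor(max_workers=len(LEAVES)) as pool:
        return list(pool.map(scan_leaf, LEAVES))
```

Both paths return the same rows; the difference is whether they arrive as one stream or as one partition per leaf.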

In other connectivity notes:

Other application areas cited for streaming/lambda kinds of architectures are — you guessed it! — ad-tech and “anomaly detection”.

And now to the geospatial stuff. I thought I heard:

Given that Earth’s surface area is a little over 500,000,000 square meters, I’d think 2^50 would be a better figure, but fortunately that discrepancy doesn’t matter to the rest of the discussion. (Edit: As per a comment below, that’s actually square kilometers, so unless I made further errors we’re up to the 2^70 range.)
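The size of the jump in the edit note follows from the unit conversion alone: a square kilometer is a million square meters, and a million is close to 2^20, so the earlier 2^50 estimate grows by roughly 20 powers of two.

```latex
\begin{aligned}
1\,\mathrm{km}^2 &= 10^{6}\,\mathrm{m}^2, \qquad \log_2 10^{6} = 6\log_2 10 \approx 19.93,\\
2^{50} \times 10^{6} &\approx 2^{50} \times 2^{20} = 2^{70}.
\end{aligned}
```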

Anyhow, if the two popular alternatives for geospatial indexing are R-trees and space-filling curves, MemSQL favors the latter. (One issue MemSQL sees with R-trees is concurrency.) Notes on space-filling curves start:

*You could say it’s true except in edge cases … but then you’d deserve to be punished.
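The post doesn't say which space-filling curve MemSQL uses, so here is a hedged illustration of the general idea using the simplest one, the Z-order (Morton) curve, which may well differ from MemSQL's choice. A point is quantized onto a grid, and the bits of its x and y cell numbers are interleaved into a single integer that an ordinary one-dimensional ordered index can store. That is one plausible way to sidestep the R-tree concurrency concern, since the spatial key is just an integer. The function names and the plain longitude/latitude grid are assumptions made for the sketch.

```python
def interleave_bits(x: int, y: int, bits: int = 16) -> int:
    """Z-order (Morton) code of grid cell (x, y).

    Bit i of x goes to position 2*i and bit i of y to position 2*i + 1.
    """
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code


def to_cell(lon: float, lat: float, bits: int = 16) -> tuple:
    """Quantize longitude/latitude onto a 2^bits x 2^bits grid.

    Hypothetical equirectangular grid; real systems often project differently.
    """
    n = 1 << bits
    x = min(int((lon + 180.0) / 360.0 * n), n - 1)
    y = min(int((lat + 90.0) / 180.0 * n), n - 1)
    return x, y


def point_key(lon: float, lat: float, bits: int = 16) -> int:
    """One-dimensional index key for a point."""
    x, y = to_cell(lon, lat, bits)
    return interleave_bits(x, y, bits)
```

What makes this indexable is that every aligned 2^k by 2^k block of cells occupies one contiguous run of codes, so a bounding-box query becomes a handful of range scans. The flip side is that some neighboring cells sit on opposite sides of a block boundary and get codes that are far apart.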

Given all that, my understanding of the way MemSQL indexes geospatial stuff — specifically points and polygons — is:

As for company metrics — MemSQL cites >50 customers and >60 employees.


Comments

10 Responses to “MemSQL 4.0”

  1. Heiko Korndorf on May 20th, 2015 6:33 am

    Curt,
    small correction: “… the Earth’s surface area is a little over 500,000,000 square _kilometers_ …”
    Regards,
    Heiko

  2. Curt Monash on May 20th, 2015 8:38 am

    Thanks, Heiko! Corrected above.

  3. David Gruzman on May 21st, 2015 5:17 am

    I believe that “in memory” data can be efficient for both OLTP and OLAP, something disk cannot give us… Having said that, it is a challenge to build a reliable system that allows data mutation.

  4. John on May 21st, 2015 8:34 am

    Vertica already has an in-memory row store and a disk-based column store. There doesn’t seem to be anything new here. Most OLTP workloads are still not outgrowing current architectures as much as analytical workloads did. I suspect that MemSQL use cases are more analytical.
    I am also a bit skeptical about running OLTP and analytics on the same box even if the data is in memory. Being in memory vs. on SSD isn’t a big difference in scan speeds for large data sets. The MemSQL column store isn’t in memory anyway.
    I agree with what Curt says about the geospatial stuff being the interesting part.

  5. David Gruzman on May 21st, 2015 8:52 am

    Running analytics and OLTP on the same box does require a very good level of resource management. At the same time, it is very valuable for customers, since it saves them one more system to support, data replication, etc.

  6. Curt Monash on May 21st, 2015 9:18 am

    John,

    Zynga is a flagship Vertica customer. They then became a flagship MemSQL customer as well.

    I’d also note that the architecture is quite different. MemSQL has some tables in the row store and others in the column store. It’s like Teradata in that regard, except that in Teradata one wouldn’t expect either part to be in-memory-only, and one wouldn’t be surprised at either part being on spinning disk, and one wouldn’t choose to update Teradata at OLTP speeds in any case.

    Vertica, by way of contrast, has the same tables in both, with the row store being a kind of write cache for the column store.

  7. Hans on May 26th, 2015 3:09 am

    I don’t understand how they can claim to support “sophisticated analytical SQL queries” if they don’t even support window functions (_very_ important for analytical queries), common table expressions or recursive queries.

  8. Curt Monash on May 26th, 2015 10:12 pm

    Hans,

    I agree that that claim should be counted against MemSQL’s marketing BS budget, for your reasons.

    On the plus side, I have reasonable confidence in MemSQL when it comes to taking care of obvious roadmap items.

  9. Are analytic RDBMS and data warehouse appliances obsolete? | DBMS 2 : DataBase Management System Services on August 28th, 2016 9:29 pm

    […] Processing), your OLTP RDBMS vendor surely has a story worth listening to. Memory-centric offerings MemSQL and SAP HANA are also pitched that […]

  10. “Real-time” is getting real | DBMS 2 : DataBase Management System Services on September 6th, 2016 1:08 pm

    […] short-request-capable data stores to also capture some analytic workloads. E.g., this is central to MemSQL’s pitch, and to some NoSQL applications as […]
