I stopped by MemSQL last week, and got a range of new or clarified information. For starters:
- Even though MemSQL (the product) was originally designed for OLTP (OnLine Transaction Processing), MemSQL (the company) is now focused on analytic use cases …
- … which was the point of introducing MemSQL’s flash-based columnar option.
- One MemSQL customer has a 100 TB “data warehouse” installation on Amazon.
- Another has “dozens” of terabytes of data spread across 500 machines, which together have 36 TB of RAM.
- At customer Shutterstock, 1000s of non-MemSQL nodes are monitored by 4 MemSQL machines.
- A couple of MemSQL’s top references are also Vertica flagship customers; one of course is Zynga.
- MemSQL reports encountering Clustrix and VoltDB in a few competitive situations, but not NuoDB. MemSQL believes that VoltDB is still hampered by its traditional issues — Java, reliance on stored procedures, etc.
On the more technical side:
- Some MemSQL users are running 7- or 8-way joins and other long-ish SQL statements.
- But MemSQL doesn’t yet have fully peer-to-peer data redistribution.
- MemSQL “leaves” only talk to MemSQL “aggregator nodes,” not each other …
- … but note the plural on “aggregator nodes”, which should immunize MemSQL against the worst of “fat head” bottlenecks.
- Of course, you can sometimes get join locality by sharding multiple tables on the same key …
- … or by broadcast-replicating tables that are sufficiently small.
- Better SQL coverage — e.g. SQL Windowing — is coming soon.
- MemSQL believes it has an aggressive data skipping story.
- MemSQL doesn’t yet have a true workload management story; they’re still at the stage of “Our queries run so fast that not many of them have to be active at once, and if things nevertheless get too busy we have some throttling capabilities.” But MemSQL at least sounds aware of the difference between that and true workload management, which puts them ahead of some other vendors I talk with.
- MemSQL doesn’t have stored procedures. In particular, since MemSQL (the product) generates code on the fly, MemSQL (the company) doesn’t think the performance benefits of stored procedure pre-compilation are needed.
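The join-locality points above (sharding multiple tables on the same key, and broadcast-replicating small tables) can be illustrated with a minimal Python sketch. This is a toy model, not MemSQL's implementation or API: the leaf count, the `leaf_for` hash function, and the three example tables are all hypothetical. The idea is only that when two tables are sharded on the same key, matching rows land on the same leaf, so each leaf can join its own rows without talking to any other leaf.

```python
from collections import defaultdict
from zlib import crc32

NUM_LEAVES = 4  # hypothetical cluster size, chosen for illustration


def leaf_for(key):
    """Map a shard key to a leaf number with a deterministic hash (assumed scheme)."""
    return crc32(str(key).encode("utf-8")) % NUM_LEAVES


def shard(rows, key_index):
    """Distribute rows across leaves by hashing the column at key_index."""
    leaves = defaultdict(list)
    for row in rows:
        leaves[leaf_for(row[key_index])].append(row)
    return leaves


def local_join(orders_by_leaf, users_by_leaf, countries):
    """Join orders to users leaf by leaf; no leaf reads another leaf's rows."""
    results = []
    for leaf in range(NUM_LEAVES):
        # Index only the users stored on this leaf.
        users = {u[0]: u for u in users_by_leaf.get(leaf, [])}
        for order_id, user_id, amount in orders_by_leaf.get(leaf, []):
            user = users.get(user_id)
            if user is None:
                continue  # would only happen if the tables were sharded on different keys
            _, name, country_code = user
            # 'countries' is small, so a full copy is assumed to live on every leaf.
            results.append((order_id, name, countries[country_code], amount))
    return results


users = [(1, "ann", "US"), (2, "bob", "DE"), (3, "cy", "US")]          # (user_id, name, country)
orders = [(100, 1, 25.0), (101, 2, 40.0), (102, 1, 5.0), (103, 3, 12.5)]  # (order_id, user_id, amount)
countries = {"US": "United States", "DE": "Germany"}                    # broadcast-replicated table

users_by_leaf = shard(users, 0)    # shard users on user_id
orders_by_leaf = shard(orders, 1)  # shard orders on the same key, user_id
joined = sorted(local_join(orders_by_leaf, users_by_leaf, countries))
for row in joined:
    print(row)
```

If `orders` were instead sharded on `order_id`, some orders would sit on a different leaf than their user, and the join would need data redistribution between nodes, which is the case the aggregator-only topology makes more expensive.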
And finally, MemSQL’s column-store compression story — which I mangled in a previous post — goes like this:
- There are numerous compression algorithm choices, both columnar (e.g. dictionary/tokenization, run-length encoding) and block (Lempel-Ziv, I presume in multiple variations).
- Compression is chosen block-by-block, an approach I hear about more commonly these days than Vertica’s alternative of global compression choices.
- The choice of compression scheme is automagic for each block, unless you give explicit hints.
- Default block size for the columnar store is 10 million rows.
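As a rough illustration of per-block scheme selection, here is a minimal Python sketch. It is not MemSQL's algorithm: the size estimate (length of a compact JSON encoding), the candidate set, the `hint` parameter, and the tiny block size are all assumptions made for the example, and `zlib` stands in for whatever Lempel-Ziv variants are actually used. The sketch encodes each block three ways and keeps the smallest, unless an explicit hint overrides the choice.

```python
import json
import zlib


def rle_encode(values):
    """Run-length encode a block as a list of [value, run_length] pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return runs


def dict_encode(values):
    """Dictionary (token) encode: the distinct values plus one integer code per row."""
    dictionary = sorted(set(values))
    code = {v: i for i, v in enumerate(dictionary)}
    return {"dict": dictionary, "codes": [code[v] for v in values]}


def encoded_size(obj):
    """Crude size estimate: bytes in a compact JSON encoding (an assumption of this sketch)."""
    return len(json.dumps(obj, separators=(",", ":")).encode("utf-8"))


def choose_scheme(block, hint=None):
    """Return (scheme_name, estimated_bytes) for one block of column values."""
    raw = json.dumps(block, separators=(",", ":")).encode("utf-8")
    candidates = {
        "rle": encoded_size(rle_encode(block)),
        "dictionary": encoded_size(dict_encode(block)),
        "lz": len(zlib.compress(raw)),  # zlib's DEFLATE as a stand-in for Lempel-Ziv
    }
    if hint is not None:
        return hint, candidates[hint]  # an explicit hint overrides the automatic choice
    best = min(candidates, key=candidates.get)
    return best, candidates[best]


def compress_column(values, block_size, hints=None):
    """Split a column into fixed-size blocks and pick a scheme for each block independently."""
    hints = hints or {}
    choices = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        choices.append(choose_scheme(block, hints.get(start // block_size)))
    return choices


# Toy column: one long run followed by more varied values.
column = ["US"] * 1000 + ["US", "DE", "FR", "JP"] * 250
for i, (scheme, size) in enumerate(compress_column(column, block_size=1000)):
    print(i, scheme, size)
```

The design point is that different stretches of the same column can favor different schemes, which a single global choice per column cannot exploit.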