April 23, 2013

MemSQL scales out

The third of my three MySQL-oriented clients I alluded to yesterday is MemSQL. When I wrote about MemSQL last June, the product was an in-memory single-server MySQL workalike. Now scale-out has been added, with general availability today.

MemSQL’s flagship reference is Zynga, across hundreds of servers. Beyond that, the company claims (to quote a late draft of the press release):

Enterprises are already using distributed MemSQL in production for operational analytics, network security, real-time recommendations, and risk management.

All four of those use cases fit MemSQL’s positioning in “real-time analytics”. Besides Zynga, MemSQL cites penetration into traditional low-latency markets — financial services (various subsectors) and ad-tech.

Highlights of MemSQL’s new distributed architecture start:

- There are two kinds of MemSQL node: “aggregator” and “leaf”.
  - Aggregators are a kind of head node. You can have a bunch of them.
  - Leafs run full single-server MemSQL. You can have a bunch of them too.
- MemSQL has two query optimizers. One kind runs on the aggregator nodes, and thinks about the whole cluster. The other runs on the leafs, and only thinks about its own node.
- Much of the join and aggregation work is done on the aggregator nodes, but I didn’t pursue that issue in much detail.
- It is good policy, and supported, to replicate small dimension/reference tables across the cluster. These are replicated to aggregator and leaf nodes alike. (This tells us that some joins are indeed done on the leafs. ;))
- MemSQL replication can be synchronous or asynchronous. It can be used for high availability.
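
To make the aggregator/leaf split concrete, here is a toy scatter-gather sketch in Python. It is not MemSQL code, and nothing in it reflects MemSQL's actual internals; the `Leaf` and `Aggregator` classes, the `COUNTRY_NAMES` table, and the sample rows are all hypothetical. It only illustrates the general pattern described above: a small reference table is copied to every leaf so the join can run locally, each leaf computes a partial aggregate, and the aggregator merges the partials.

```python
from collections import defaultdict

# Hypothetical small reference table, replicated in full to every leaf.
COUNTRY_NAMES = {1: "US", 2: "DE"}


class Leaf:
    """Holds one shard of a fact table plus a full copy of the reference table."""

    def __init__(self, rows, reference):
        self.rows = rows                  # list of (country_id, amount) pairs
        self.reference = dict(reference)  # local copy of the replicated table

    def partial_sum_by_country(self):
        # Join against the local reference copy, then aggregate this shard only.
        totals = defaultdict(int)
        for country_id, amount in self.rows:
            totals[self.reference[country_id]] += amount
        return dict(totals)


class Aggregator:
    """Fans a query out to every leaf and merges the partial results."""

    def __init__(self, leaves):
        self.leaves = leaves

    def sum_by_country(self):
        merged = defaultdict(int)
        for leaf in self.leaves:
            for name, subtotal in leaf.partial_sum_by_country().items():
                merged[name] += subtotal
        return dict(merged)


leaves = [
    Leaf([(1, 10), (2, 5)], COUNTRY_NAMES),
    Leaf([(1, 7), (2, 3), (2, 1)], COUNTRY_NAMES),
]
result = Aggregator(leaves).sum_by_country()
print(result)
```

Because every leaf holds the whole reference table, no fact rows have to cross the network for the join; only the small per-leaf subtotals travel to the aggregator.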

Also:

- MemSQL writes (whether primary or replicated) go to a buffer. The buffer size can be 0 or positive, in a tradeoff of durability vs. the likelihood of a disk I/O bottleneck.
- MemSQL has many virtual nodes on each physical (leaf) node. (This is pretty much an industry-standard best practice, as it helps with elasticity, recovery from node failure, and so on.)
- Compression is still a future feature.
- So is online schema change.
- Leaf nodes have cost-based optimizers.
- MemSQL’s aggregator (cluster-wide) optimizer is mainly heuristic, but is supposed to get more cost-based in future releases.
- In some releases it will be possible to keep MemSQL running while upgrading the software. But that’s not a promise for releases that change how replication works.
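
The elasticity argument for virtual nodes can be sketched in a few lines of Python. This is a generic illustration, not MemSQL's partitioning scheme: the partition count, the CRC32 hash, the round-robin placement, and the `add_leaf` helper are all assumptions made for the example. The point is that keys map to a fixed set of partitions, so adding a leaf means reassigning some partitions rather than rehashing every row.

```python
import zlib
from collections import Counter

NUM_PARTITIONS = 64  # hypothetical number of virtual nodes (partitions)


def partition_of(key: str) -> int:
    """Map a row key to a partition; this mapping never changes as leaves come and go."""
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


def initial_assignment(leaves):
    """Spread the partitions over the leaves round-robin."""
    return {p: leaves[p % len(leaves)] for p in range(NUM_PARTITIONS)}


def add_leaf(assignment, new_leaf):
    """Hand partitions from over-loaded leaves to new_leaf until it holds its share."""
    updated = dict(assignment)
    load = Counter(updated.values())
    target = len(updated) // (len(load) + 1)  # fair share per leaf after the addition
    moved = []
    for partition in sorted(updated):
        if len(moved) == target:
            break
        owner = updated[partition]
        if load[owner] > target:
            load[owner] -= 1
            updated[partition] = new_leaf
            moved.append(partition)
    return updated, moved


before = initial_assignment(["leaf-0", "leaf-1", "leaf-2"])
after, moved = add_leaf(before, "leaf-3")
unchanged = [p for p in before if before[p] == after[p]]
print(f"{len(moved)} of {NUM_PARTITIONS} partitions moved; {len(unchanged)} stayed put")
print(Counter(after.values()))
```

The same bookkeeping works in reverse for node failure: the partitions of a lost leaf can be spread over the survivors instead of landing on a single replacement.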

And which not-easily-parallelized aggregate did MemSQL implement first? The same one Platfora did — COUNT DISTINCT.
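
Why COUNT DISTINCT resists simple parallelization is easy to show with made-up data. The Python sketch below is only an illustration, with hypothetical per-leaf values, and says nothing about how MemSQL or Platfora actually implement it. Row counts from each leaf can simply be added, but per-leaf distinct counts cannot, because the same value may appear on several leaves.

```python
# Hypothetical values of one column, as stored on three different leaves.
leaf_values = [
    ["alice", "bob", "carol"],
    ["bob", "dave"],
    ["alice", "dave", "erin"],
]

# COUNT(*) parallelizes trivially: the partial counts just add up.
total_rows = sum(len(values) for values in leaf_values)

# Adding per-leaf distinct counts overcounts values that live on several leaves.
naive_distinct = sum(len(set(values)) for values in leaf_values)

# An exact answer needs the distinct values themselves (or a sketch of them)
# to be merged in one place before counting.
exact_distinct = len(set().union(*leaf_values))

print(total_rows, naive_distinct, exact_distinct)
```

The merge step is where the cost lies: each leaf has to ship its set of distinct values rather than a single number, which is why approximate structures are often used for this aggregate in distributed systems.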

Categories: Clustering, Database compression, Emulation, transparency, portability, Games and virtual worlds, Investment research and trading, Log analysis, MemSQL, MySQL, NewSQL, Transparent sharding, Zynga


Comments

6 Responses to “MemSQL scales out”

Analytic application themes | DBMS 2 : DataBase Management System Services on April 25th, 2013 4:42 am

[…] data” included as well. I hear variants of that positioning from NewSQL vendors (e.g. MemSQL), NoSQL vendors (e.g. AeroSpike), BI stack vendors (e.g. Platfora), application-stack vendors (e.g. […]

GregW on June 7th, 2013 1:22 pm

I looked into memsql in part based on your post.

I do appreciate hearing about new options or improvements such as you mentioned; MPP was of interest. I have a strong interest in MPP SQL databases of the open source or closed-but-MySQL-compatible variety for scaling out data warehouse ETLs.

But I was a bit disillusioned that they didn’t support more than a 2-way join. (See http://developers.memsql.com/docs/1b/sql/join.html .) For a new project that can be worked around, but for migrating existing work, that’s a pretty big non-starter or important caveat I thought worth mentioning to you or your readers.

Ryan on June 7th, 2013 7:08 pm

Hi Greg,
Sorry you found an outdated version of our documentation. If you go here – http://developers.memsql.com/docs/2.0/sql/join.html – you should find what you need. If you’re interested in banging away on MemSQL, you can download a free trial at http://www.memsql.com/download.
Thanks,
Ryan

GregW on June 10th, 2013 3:27 pm

Thanks Ryan.

Webinar Tuesday, June 26, 1 pm EST — Real-Time Analytics | DBMS 2 : DataBase Management System Services on June 16th, 2013 2:47 am

[…] sponsor is MemSQL, one of my numerous clients to have recently adopted some version of a “real-time […]

MemSQL 3.0 | DBMS 2 : DataBase Management System Services on February 10th, 2014 5:32 pm

[…] MemSQL has historically been an in-memory row store, which as of last year scales out. […]
