Once again, I’m working with an OLTP SaaS vendor client on the architecture for their next-generation system. Parameters include:
- 100s of gigabytes of data at first, growing to >1 terabyte over time.
- High peak loads.
- Public cloud portability (but they have private data centers they can use today).
- Simple database design — not a lot of tables, not a lot of columns, not a lot of joins, and everything can be distributed on the same customer_ID key.
- Stream the data to a data warehouse that will grow to a few terabytes. (Keeping only one year of OLTP data online actually makes sense in this application, but of course everything should go into the DW.)
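To make the last two parameters concrete, here is a minimal sketch of what distributing on a single customer_ID key and keeping a one-year online window might look like. It is illustrative only: the shard count, the hash choice, and the function names `shard_for` and `keep_online` are my own assumptions, not anything the client or any vendor has specified.

```python
import hashlib
from datetime import date, timedelta

NUM_SHARDS = 8  # hypothetical shard count, chosen only for illustration


def shard_for(customer_id: int, num_shards: int = NUM_SHARDS) -> int:
    """Map a customer_ID to a shard number using a stable hash.

    A cryptographic digest is used instead of Python's built-in hash()
    so the mapping is identical across processes and restarts.
    """
    digest = hashlib.md5(str(customer_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def keep_online(row_date: date, today: date, retention_days: int = 365) -> bool:
    """Return True if a row is still inside the online OLTP window.

    Rows outside the window would live only in the data warehouse,
    which receives everything regardless of age.
    """
    return today - row_date <= timedelta(days=retention_days)
```

Because every table shares the same key, all of one customer's rows land on one shard, so the few joins the design needs never have to cross servers. Simple modulo routing like this does force data movement when the shard count changes, which is one reason to look at a product that handles the topology for you.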
So I’m leaning toward saying:
- They should go with a scalable, MySQL-based solution.
  - Lots of third-party software works with MySQL, in case that’s helpful.
  - Yes, any one vendor is small and not yet firmly established, but there are numerous vendors around with interesting MySQL scaling stories.
  - In a vendor emergency, just going with Oracle’s MySQL stuff would probably work …
  - … especially because there are these lovely things in the world called solid-state drives.
  - There’s also good escapability if one wants to move away from MySQL, because everybody knows how to handle MySQL data.
- The first product to look at is dbShards, because it meets all the topology needs.
- The first analytic DBMS to look at is Infobright.
  - Yes, I know Infobright is focused more on machine-generated data these days, but this client’s analytic needs are so straightforward Infobright should pass with flying colors.
  - The MySQL-to-MySQL aspect should make ETL dead simple.
  - Again, there’s escapability.
Mainly, this is all fine. But I’m getting pushback on the solid-state aspect, for fear that it will compromise public cloud portability.
Am I missing something here? As far as I’m concerned, if you’re planning an OLTP system with a many-year lifespan today, of course you should assume solid-state storage. Maybe you scale out just as far as you would with disk, striping indexes or entire databases across the RAM of multiple servers. In that case, having solid-state backing reduces the risk of bottlenecks. Maybe you don’t scale out as far as you would with disk. In that case, solid-state backing saves you money.
As for public-cloud support for solid-state storage, that’s coming fast, right? (Actually, I have data points in support of that theory, but they’re a bit tenuous.) A large fraction of web businesses with private data centers seem to be using solid-state storage — from Facebook on down — or so the NoSQL/NewSQL/short-request DBMS guys tell me. Surely a number of public cloud vendors are close behind.