September 19, 2011

Are there any remaining reasons to put new OLTP applications on disk?

Once again, I’m working with an OLTP SaaS vendor client on the architecture for their next-generation system. Parameters include:

So I’m leaning to saying:  

Mainly, this is all fine. But I’m getting pushback on the solid-state aspect, for fear that it will compromise public cloud portability.

Am I missing something here? As far as I’m concerned, if you’re planning an OLTP system with a many-year lifespan today, of course you should assume solid-state storage. Maybe you scale out just as far as you would with disk, striping indexes or entire databases across the RAM of multiple servers. In that case, having solid-state backing reduces the risk of bottlenecks. Maybe you don’t scale out as far as you would with disk. In that case, solid-state backing saves you money.

As for public-cloud support for solid-state storage, that’s coming fast, right? (Actually, I have data points in support of that theory, but they’re a bit tenuous.) A large fraction of web businesses with private data centers seem to be using solid-state storage — from Facebook on down — or so the NoSQL/NewSQL/short-request DBMS guys tell me. Surely a number of public cloud vendors are close behind.

Comments

13 Responses to “Are there any remaining reasons to put new OLTP applications on disk?”

  1. anon on September 20th, 2011 11:31 am

    Curt,
    The cost in ETL is usually not a function of whether the source and destination DBs are the same platform. It’s usually a function of the logical transformation needed from the source schema to the destination schema. If your schemas are identical, ETL is free (or nearly so), independent of source and destination DB types.

    As for sharding things by ‘customer_id’, do all their large tables have this column? (Something like a product catalog, for example, may not.)

    Finally, on disk vs. flash, I would really love to understand more. If SSDs were not an option, how would you change the above architecture?

  2. Fred Holahan on September 20th, 2011 12:54 pm

    @Curt

    You’ve described a good use case for VoltDB.

    Thanks.
    Fred

  3. Curt Monash on September 20th, 2011 1:46 pm

    Fred,

    I don’t think VoltDB’s programming model would work well in this situation.

  4. Curt Monash on September 20th, 2011 1:58 pm

    Anon,

    On the ETL side, you are in essence right. But what I didn’t say is that there’s some doubt as to exactly what workload should go against the OLTP database and what should go against the analytic one. Using the “same” DBMS for both simplifies changing one’s mind.

    We’ll see. As of yesterday, I’m not wholly convinced a true analytic DBMS has to be in the mix at all.

    As for the SSD part — right now the app runs on a single SMP box, running a name-brand DBMS. That, we’re assuming, is unsustainable, and certainly doesn’t transition well to the public cloud. So scale-out is the obvious way to go.

    But MySQL scale-out vendors are small: small enough to create non-trivial vendor risk. It’s already easier for me to recommend them because there are several, so if one disappoints, you can port to another. But using vanilla MySQL with SSDs is another nice option to have.

    Conversely, while the “scale-out” vendors have advantages even without splitting your database — at least dbShards and Schooner do — if you can get away from splitting and just use better storage, that might be more cost-effective. And, as always, cost matters.

    As for the data distribution on a single customer_ID key — off the top of my head, I haven’t thought of ANY join that would be done other than on that key. If I’m missing something, I’m pretty confident it would involve tiny tables, easily replicated. Indeed, the SQL vs. NoSQL choice is not wholly obvious here — even if I am pretty sure that SQL is actually the way to go.
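The distribution scheme described above, with every join on the customer_ID key and any exceptions involving tiny replicated tables, can be illustrated with a toy in-memory model. This is a minimal sketch, not anything the client or any vendor actually runs: the shard count, the table names (`orders`, `order_status`), and the helpers `shard_for` and `ShardedStore` are all hypothetical. Rows of per-customer tables are placed by a stable hash of customer_id, and small reference tables are copied to every shard, so a single customer's join never leaves one node.

```python
import hashlib
from collections import defaultdict

NUM_SHARDS = 4  # hypothetical shard count


def shard_for(customer_id: int, num_shards: int = NUM_SHARDS) -> int:
    """Map a customer_id to a shard using a stable hash (Python's built-in hash() is salted per process)."""
    digest = hashlib.sha1(str(customer_id).encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedStore:
    """Toy model: per-customer tables are partitioned; tiny reference tables are replicated."""

    def __init__(self, num_shards: int = NUM_SHARDS):
        self.num_shards = num_shards
        # Each shard maps table name -> list of row dicts.
        self.shards = [defaultdict(list) for _ in range(num_shards)]

    def insert(self, table: str, row: dict) -> None:
        # Every partitioned table must carry customer_id, the distribution key.
        self.shards[shard_for(row["customer_id"], self.num_shards)][table].append(row)

    def insert_replicated(self, table: str, row: dict) -> None:
        # Small lookup tables (hypothetical: order_status) are copied to all shards.
        for shard in self.shards:
            shard[table].append(row)

    def orders_with_status_names(self, customer_id: int) -> list:
        # The join runs entirely on one shard: this customer's orders
        # joined to that shard's local copy of the replicated status table.
        shard = self.shards[shard_for(customer_id, self.num_shards)]
        names = {s["status_id"]: s["name"] for s in shard["order_status"]}
        return [
            {**o, "status_name": names[o["status_id"]]}
            for o in shard["orders"]
            if o["customer_id"] == customer_id
        ]
```

For example, after `store.insert_replicated("order_status", {"status_id": 1, "name": "open"})` and `store.insert("orders", {"customer_id": 42, "order_id": 7, "status_id": 1})`, calling `store.orders_with_status_names(42)` touches only customer 42's shard. The design choice being illustrated is that a query needing data from two customers, or a join on some other key, would have to cross shards; the scheme is attractive only if, as suggested above, no such joins exist.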

  5. Cloud Database on September 20th, 2011 7:37 pm

    Sounds like a good fit for ScaleDB. My sources say SSD will soon be very available at Public Clouds.

  6. Zman on September 20th, 2011 8:01 pm

    Drizzle (a fork of MySQL) would be a good option to look at if your OLTP database doesn’t require stored procedures, triggers, or views. It has advantages over MySQL in the area of concurrency.

  7. Fred Holahan on September 20th, 2011 8:30 pm

    @Curt

    FYI, we have recently added JDBC support, so developers can now choose to use that or our Java stored procedure interface (or both in combination).

    That said, my prior comment was more about the use case you described than the specific client opportunity. Apologies if that was unclear.

    Fred

  8. Curt Monash on September 20th, 2011 9:40 pm

    Zman,

    I haven’t heard much lately about Drizzle, other than grumbling as to its momentum (technical, adoption, whatever).

    I really would like there to be healthy MySQL front-end clones, given how I sometimes encourage people to use specialized MySQL engines and the like. But I haven’t heard much favorable rumbling from Drizzle or the Monty Project for a while.

  9. Curt Monash on September 20th, 2011 9:41 pm

    Fred,

    Is VoltDB’s performance the same (or very similar) using both interfaces?

  10. Curt Monash on September 20th, 2011 10:54 pm

    Fred,

    More to the point, have you relaxed the essential limitations of the VoltDB programming model, such as perfect commutativity? If I understand correctly, that’s what the JSP-only programming model was supposed to enforce.

  11. Fred Holahan on September 21st, 2011 10:17 am

    @Curt

    While there is some interesting academic research on commutativity and H-Store style systems, VoltDB has never made commutativity a requirement. Our transactions can perform an arbitrary mix of SQL and computation with few restrictions.

    Fundamentally, we require transactions to have a single round trip between the client app and the database. This enables our serialized, no-waits execution model. It requires deterministic transactions and necessitates stored procedures. We leverage this determinism extensively to achieve the same state in parallel on all database nodes, as well as to allow for deterministic replay of transactions in the event of a global failure.

    To your original question, VoltDB’s core programming model has not changed in 2.0. While you can do ad hoc database operations via JDBC, the way to achieve uber-high throughput and scale is by executing Java stored procedures via your chosen client API (JDBC or a number of other available language interfaces). So, to clarify my prior comment, we added a JDBC interface in 2.0 to lower the bar for developers who are familiar with that API, as well as to support third party monitoring, reporting and developer tools.

    I hope this information is helpful.

    Fred
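Fred's point about deterministic, single-round-trip stored procedures can be illustrated with a toy model of why determinism matters for replication and replay. This is a minimal sketch under stated assumptions, not VoltDB's API or implementation: the procedure names `deposit` and `transfer`, the `Replica` class, and the in-memory log are all hypothetical. Each replica applies the same totally ordered log of procedure invocations serially; because each procedure depends only on the current state and its arguments, every replica ends in the same state, and replaying the log from scratch reconstructs it.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, int]
Procedure = Callable[[State, tuple], None]


def deposit(state: State, args: tuple) -> None:
    # Hypothetical procedure: add funds to an account.
    account, amount = args
    state[account] = state.get(account, 0) + amount


def transfer(state: State, args: tuple) -> None:
    # Hypothetical procedure. Deterministic: the outcome depends only on
    # state and args, never on clocks, randomness, or other nodes.
    src, dst, amount = args
    if state.get(src, 0) >= amount:
        state[src] = state.get(src, 0) - amount
        state[dst] = state.get(dst, 0) + amount


PROCEDURES: Dict[str, Procedure] = {"deposit": deposit, "transfer": transfer}
Log = List[Tuple[str, tuple]]


class Replica:
    def __init__(self) -> None:
        self.state: State = {}

    def execute(self, name: str, args: tuple) -> None:
        # One call is one whole transaction, run to completion with no
        # waits (serial execution, no client round trips mid-transaction).
        PROCEDURES[name](self.state, args)


def run(log: Log, replicas: List[Replica]) -> None:
    # Every replica sees the same invocations in the same order.
    for name, args in log:
        for replica in replicas:
            replica.execute(name, args)


def replay(log: Log) -> Replica:
    # Recovery sketch: rebuild state by re-executing the logged invocations.
    fresh = Replica()
    run(log, [fresh])
    return fresh
```

With `log = [("deposit", ("a", 100)), ("transfer", ("a", "b", 30)), ("transfer", ("b", "c", 50))]`, two replicas driven by `run(log, ...)` both end with `{"a": 70, "b": 30}` (the last transfer is rejected for insufficient funds), and `replay(log)` reproduces that state. Calling the wall clock or a random-number generator inside a procedure would break this: replicas executing the same log could then diverge.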

  12. Curt Monash on September 21st, 2011 2:52 pm

    Thanks, Fred. That clears up a lot!

  13. Highlights of a busy news week | DBMS 2 : DataBase Management System Services on September 26th, 2011 12:51 am

    [...] best comment thread of the week was probably on my post about scale-out relational OLTP choices, in which people discussed the merits of various particular [...]
