OLTP
Analysis of database management systems designed with a focus on OLTP (OnLine Transaction Processing) uses.
The Naming of the Foo
Let’s start from some reasonable premises.
- No technology category name is ever perfect.
- It’s particularly hard to describe NoSQL (Not Only SQL) accurately, given the basic confusion as to what NoSQL is all about.
- That said, it seems pretty clear that NoSQL is about making big websites (and perhaps other cloud-like installations) run and scale.
- Dwight Merriman (founder/CEO of MongoDB vendor 10gen) is heading in the right direction when he says that the unifying ideas of NoSQL are that you do away with transactions and joins. But if he’s ever said something like “NoSQL is Foo without joins and transactions,” I don’t know what Foo is.
- Actually, I do know what Foo is – Foo is what happens when lots of people want to get small amounts each of information in or out of a database at the same time. I just don’t know what Foo is called.
- Obviously, Foo is a lot like OLTP (OnLine Transaction Processing). However, it would be pretty silly for Foo to actually be OLTP, given that one of the core points of NoSQL is that you don’t have transactions.
- It’s not just the “T” part of OLTP that’s fried. Calling something “OnLine” only makes sense as long as offline is an option, and offline transaction processing has been obsolete for a very long time.*
*Sure, if you strain you can talk yourself into exceptions. But the point stands.
So we need a name for Foo, where Foo is what happens when lots of people want to get small amounts each of information in or out of a database at the same time. Thus, three major subcategories of more-or-less disk-based Foo are:
- No-compromises ACID-compliant relational OLTP
- Sharded MySQL
- NoSQL
There may be some more purely memory-centric versions too, but let’s put those aside for the moment.
Absent a better idea, I can squeeze Foo into yet another four-letter acronym:
HVSP (High-Volume Simple Processing)
That’s as imperfect as any other category name, and an awkward mouthful to boot. So I’d love to hear a better one; if you have such, please share it! In the meantime, I think “HVSP” has merit because:
- The “Processing” part should be noncontroversial.
- “High-Volume” is inherent to the challenge. If RDBMS scale well enough for your use case, using something less powerful is probably silly.* Similarly, while Oracle shines at high-volume OLTP workloads, there are many cheaper DBMS that do a fine job of OLTP at lower volumes.
- “Simple” is the core principle of NoSQL systems, which drop joins and transactions as being too much foofarah. That only makes sense at all under the assumption that you have bone-simple queries and updates, so that programming around the lack of joins and transactions isn’t all that much of a burden.
- Something similar is true of sharded MySQL.
- Less obviously, “simple” is a core principle of relational OLTP as well. The point of the relational model is to cap the complexity of data operations, or more precisely to hide that complexity from programmers.
- And overloading the word “simple” a bit, it’s fair to say that if you’re reading or writing one record at a time, you’re doing something relatively simple, at least as opposed to what you do in analytic processing. The OLTP vs. OLAP distinction is preserved in this name change.
- The whole thing matches my definition above, namely “what happens when lots of people want to get small amounts each of information in or out of a database at the same time.”
*Assuming, of course, that rows-and-tables are a good metaphor for your data structure in the first place.
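To make “programming around the lack of joins” concrete, here is a minimal sketch of an application-side join. It assumes two key-value collections, stood in for by plain Python dicts; the names `users`, `orders`, and `order_with_user` are hypothetical and come from no particular NoSQL product.

```python
from typing import Optional

# Hypothetical key-value "collections", stood in for by plain dicts.
users = {"u1": {"name": "Ann"}, "u2": {"name": "Bob"}}
orders = {
    "o1": {"user_id": "u1", "total": 25},
    "o2": {"user_id": "u2", "total": 40},
}


def order_with_user(order_id: str) -> Optional[dict]:
    """Do by hand what a SQL join would do: two single-record lookups."""
    order = orders.get(order_id)
    if order is None:
        return None
    # Second lookup, keyed by the reference stored in the order record.
    user = users.get(order["user_id"], {})
    return {
        "order_id": order_id,
        "total": order["total"],
        "user_name": user.get("name"),
    }
```

When every request touches one or two records like this, the extra application code is small. The burden grows quickly once queries span many records or need to be updated atomically.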
Systems I’m leaving out of the HVSP and hence also NoSQL categories include:
- Hadoop and other batch-oriented MapReduce. Hadoop isn’t part of NoSQL. I’m pretty sure that Cloudera CEO Mike Olson agrees with me.
- More generally, non-SQL data stores that don’t meet the HVSP criteria. Dave Kellogg stretches things when he claims that MarkLogic is a NoSQL system. (But then, that was in a post where he seemingly praised a train wreck of an article.)
But hey – what good is a categorization if it doesn’t leave some things out?
| Categories: Data models and architecture, Database diversity, Hadoop, MapReduce, Mark Logic, NoSQL, OLTP, Theory and architecture | 2 Comments |
Cassandra and the NoSQL scalable OLTP argument
Todd Hoff put up a provocative post on High Scalability called MySQL and Memcached: End of an Era? The post itself focuses on observations like:
- Facebook invented and is adopting Cassandra.
- Twitter is adopting Cassandra.
- Digg is adopting Cassandra.
- LinkedIn invented and is adopting Voldemort.
- Gee, it seems as if the super-scalable website biz has moved beyond MySQL/Memcached.
But in addition, he provides a lot of useful links, which DBMS-oriented folks such as myself might have previously overlooked. Read more
| Categories: Cassandra, Data models and architecture, NoSQL, OLTP, Open source, Parallelization, Specific users, Theory and architecture | 10 Comments |
Two cornerstones of Oracle’s database hardware strategy
After several months of careful optimization, Oracle managed to pick the most inconvenient* day possible for me to get an Exadata update from Juan Loaiza. But the call itself was long and fascinating, with the two main takeaways being:
- Oracle thinks flash memory is the most important hardware technology of the decade, one that could lead to Oracle being “bumped off” if they don’t get it right.
- Juan believes the “bulk” of Oracle’s business will move over to Exadata-like technology over the next 5-10 years. Numbers-wise, this seems to be based more on Exadata being a platform for consolidating an enterprise’s many Oracle databases than it is on Exadata running a few Especially Big Honking Database management tasks.
And by the way, Oracle doesn’t make its storage-tier software available to run on anything other than Oracle-designed boxes. At the moment, that means Exadata Versions 1 and 2. Since Exadata is by far Oracle’s best DBMS offering (at least in theory), that means Oracle’s best database offering only runs on specific Oracle-sold hardware platforms. Read more
Intersystems Cache’ highlights
I talked with Robert Nagle of Intersystems last week, and it went better than at least one other Intersystems briefing I’ve had. Intersystems’ main product is Cache’, an object-oriented DBMS introduced in 1997 (before that Intersystems was focused on the fourth-generation programming language M, renamed from MUMPS). Unlike most other OODBMS, Cache’ is used for a lot of stuff one would think an RDBMS would be used for, across all sorts of industries. That said, there’s a distinct health-care focus to Intersystems, in that:
- MUMPS, the original Intersystems technology, was focused on health care.
- The reasons Intersystems went object-oriented have a lot to do with the structure of health-care records.
- Intersystems’ biggest and most visible ISVs are in the health-care area.
- Intersystems is actually beginning to sell an electronic health records system called TrakCare around the world (but not in the US, where it has lots of large competitive VARs).
Note: Intersystems Cache’ is sold mainly through VARs (Value-Added Resellers), aka ISVs/OEMs. I.e., it’s sold by people who write applications on top of it.
So far as I understand – and this is still pretty vague and apt to be partially erroneous – the Intersystems Cache’ technical story goes something like this: Read more
| Categories: Data models and architecture, Emulation, transparency, portability, Intersystems and Cache', Mid-range, OLTP, Object, Sybase, Theory and architecture | 2 Comments |
Boston Big Data Summit keynote outline
Last month, Bob Zurek asked me to give a talk on “Big Data”, where “big” is anything from a few terabytes on up, then moderate a panel on cloud computing. We agreed that I could talk just from notes, without slides. So, since I have them typed up, I’m posting them below.
Thoughts on the integration of OLTP and data warehousing, especially in Exadata 2
Oracle is pushing Exadata 2 as being a great system for any of OLTP (OnLine Transaction Processing), data warehousing or, presumably, the integration of same. This claim rests on a few premises, namely: Read more
| Categories: Analytic technologies, Data warehouse appliances, Data warehousing, Exadata, OLTP, Oracle, Solid-state memory, Theory and architecture | 35 Comments |
Notes on the Oracle Database 11g Release 2 white paper
The Oracle Database 11g Release 2 white paper I cited a couple of weeks ago has evidently been edited, given that a phrase I quoted last month is no longer to be found. Anyhow, here are some quotes from and comments on what evidently is the latest version. Read more
The Boston Globe had an article on VoltDB
The Boston Globe article has more detail than Vertica and VoltDB have ever OKed me to put out, and some business details they’ve never given me.
| Categories: In-memory DBMS, Memory-centric data management, OLTP, Vertica Systems, VoltDB and H-Store | Leave a Comment |
Groovy Corp puts out a ridiculous press release
I knew Groovy Corp’s press release today would be bad, as it was pitched in advance as being about an awe-inspiring benchmark. That part met my very low expectations, emphasizing how the Groovy SQL Switch massively outperformed MySQL* in a benchmark, and how this supposedly shows the Groovy SQL Switch would outperform every other competitive RDBMS by at least similar margins.
*While a few use cases are exceptions, being “better than MySQL” for a DBMS is basically like being “better than Pabst Blue Ribbon” for a beer. Unless price is your top consideration, why are you even making the comparison?
Even worse, the press release, from its subhead and very first sentence, emphasizes the claim “the Groovy SQL Switch’s ability to significantly outperform relational databases.” As CEO Joe Ward quickly agreed by email, that’s not accurate. As you would expect from the “SQL” in its name, the Groovy SQL Switch is just as relational as the products it’s being contrasted to. Unfortunately for Joe, who I gather aspires to edit it to say something more sensible, the press release is out already in multiple places.
More favorably, Renee Blodgett has a short, laudatory post about Groovy, with some kind of embedded video.
| Categories: Groovy Corporation, In-memory DBMS, Memory-centric data management, MySQL, OLTP | 16 Comments |
What are the best choices for scaling Postgres?
I have a client who wants to build a new application with peak update volume of several million transactions per hour. (Their base business is data mart outsourcing, but now they’re building update-heavy technology as well.) They have a small budget. They’ve been a MySQL shop in the past, but would prefer to contract (not eliminate) their use of MySQL rather than expand it.
My client actually signed a deal for EnterpriseDB’s Postgres Plus Advanced Server and GridSQL, but unwound the transaction quickly. (They say EnterpriseDB was very gracious about the reversal.) There seem to have been two main reasons for the flip-flop. First, it seems that EnterpriseDB’s version of Postgres isn’t up to PostgreSQL’s 8.4 feature set yet, although EnterpriseDB’s timetable for catching up might have been tolerable. But GridSQL apparently is further behind yet, with no timetable for up-to-date PostgreSQL compatibility. That was the dealbreaker.
The current base-case plan is to use generic open source PostgreSQL, with scale-out achieved via hand sharding, Hibernate, or … ??? Experience and thoughts along those lines would be much appreciated.
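For readers unfamiliar with hand sharding, a minimal sketch of the routing step follows. It assumes a fixed number of PostgreSQL shards and hash-modulo placement, which is only one of several possible schemes and not necessarily what this client would choose. The connection strings and the names `SHARD_DSNS`, `shard_for`, and `dsn_for` are hypothetical.

```python
import hashlib

# Hypothetical shard list; these DSNs are placeholders, not real hosts.
SHARD_DSNS = [
    "postgresql://app@shard0.example/appdb",
    "postgresql://app@shard1.example/appdb",
    "postgresql://app@shard2.example/appdb",
    "postgresql://app@shard3.example/appdb",
]


def shard_for(key: str, shard_count: int = len(SHARD_DSNS)) -> int:
    """Map a record key to a shard index with a stable hash."""
    # hashlib is used rather than the built-in hash(), which is salted
    # per process for strings and so is not stable across restarts.
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def dsn_for(key: str) -> str:
    """Return the connection string of the shard that owns this key."""
    return SHARD_DSNS[shard_for(key)]
```

The application would call `dsn_for(record_key)` before each single-record read or update and send the statement to that database. The known weakness of plain modulo placement is that changing the shard count remaps most keys, so adding hardware means moving data.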
Another option for OLTP performance and scale-out is of course a memory-centric product such as VoltDB or the Groovy SQL Switch. But this client’s database is terabyte-scale, so hardware costs could be an issue, as of course could be product maturity.
By the way, a large fraction of these updates will be actual changes, as opposed to new records, in case that matters. I expect that the schema being updated will be very simple — i.e., clearly simpler than in a classic order entry scenario.
