Analysis of open source DBMS vendor MySQL (recently acquired by Sun Microsystems), its products, and other products in the MySQL ecosystem. Related subjects include:
According to the MySQL Cluster home page, today’s MySQL Cluster release has — give or take terminology details — added transparent sharding (Edit: Actually, please see the first comment below) and a memcached interface. My quick comments on all this to a reporter a couple of days ago were:
- Persistent memcached is a useful thing. Couchbase’s sales illustrate that point: http://www.dbms2.com/2012/02/01/couchbase-update/
- MySQL has always given good performance when used just as a key-value store, e.g. http://www.dbms2.com/2010/08/22/workday-technology-stack/ . So it’s reasonable to hope the memcached interface will have good performance out of the box.
- MySQL’s clustering capabilities have long been weak, providing a window of opportunity for companies and products such as Schooner Information and dbShards. The gold standard for clustering is:
- Efficient transparent sharding: http://www.dbms2.com/2011/02/24/transparent-sharding/
- Synchronous replication at much better than two-phase-commit speeds. http://www.dbms2.com/2011/10/23/schooner-pivots-further/
I don’t really know enough about MySQL Cluster right now to comment in more detail.
Microsoft is launching SQL Server 2012 on March 7. An IM chat with a reporter resulted, and went something like this.
Reporter: [Care to comment]?
CAM: SQL Server is an adequate product if you don’t mind being locked into the Microsoft stack. For example, the ColumnStore feature is very partial, given that it can’t be updated; but Oracle doesn’t have columnar storage at all.
Reporter: Is the lock-in overall worse than IBM DB2, Oracle?
CAM: Microsoft locks you into an operating system, so yes.
Reporter: Is this release something larger Oracle or IBM shops could consider as a lower-cost alternative a co-habitation scenario, in the event they’re mulling whether to buy more Oracle or IBM licenses?
CAM: If they have a strong Microsoft-stack investment already, sure. Otherwise, why?
Reporter: [How about] just cost?
CAM: DB2 works just as well to keep Oracle honest as SQL Server does, and without a major operating system commitment. For analytic databases you want an analytic DBMS or appliance anyway.
Best is to have one major vendor of OTLP/general-purpose DBMS, a web DBMS, a DBMS for disposable projects (that may be the same as one of the first two), plus however many different analytic data stores you need to get the job done.
By “web DBMS” I mean MySQL, NewSQL, or NoSQL. Actually, you might need more than one product in that area.
|Categories: Data warehousing, IBM and DB2, Microsoft and SQL*Server, Mid-range, MySQL, NoSQL, Oracle||9 Comments|
In a comedy of briefing errors, I’m not too clear on the details of my client salesforce.com’s new PostgreSQL-as-a-service offering, nor exactly on what my clients at VMware are bringing to the PostgreSQL virtualization/cloud party. That said:
- PostgreSQL is good technology.
- MySQL is narrowing the gap, but PostgreSQL is still ahead of MySQL in some ways. (Database extensibility if nothing else.)
- PostgreSQL has a lot of users. (Many of them in academia and/or Russia.)
- Neither EnterpriseDB (which now calls itself “The enterprise PostgreSQL company”) nor the PostgreSQL community leadership have covered themselves with stewardship glory.
- A significant number of interesting DBMS products can be regarded as PostgreSQL forks (e.g. Greenplum, Aster Data nCluster, Netezza if you squint, and Vertica if you stand on your head*).
- PostgreSQL advancement is not dead. For example, Hadapt beta users are running actual PostgreSQL on many nodes each.
- There’s no assurance that Oracle will be a benevolent MySQL steward forever. (Specifically, Oracle’s “Play nicely with others” antitrust commitments expire in 2014.)
So I think it would be cool if one or the other big company put significant wood behind the PostgreSQL arrow.
*While Vertica was originally released using little or no PostgreSQL code — reports varied — it featured high degrees of PostgreSQL compatibility.
|Categories: Aster Data, EnterpriseDB and Postgres Plus, Greenplum, MySQL, Netezza, Open source, salesforce.com, Vertica Systems||8 Comments|
There’s a perception that, if you want (relatively) worry-free database scale-out, you need a non-relational/NoSQL strategy. That perception is false. In the analytic case it’s completely ridiculous, as has been demonstrated by Teradata, Vertica, Netezza, and various other MPP (Massively Parallel Processing) analytic DBMS vendors. And now it’s false for short-request/OLTP (OnLine Transaction Processing) use cases as well.
My favorite relational OLTP scale-out choice these days is the SchoonerSQL/dbShards partnership. Schooner Information Technology (SchoonerSQL) and Code Futures (dbShards) are young, small companies, but I’m not too concerned about that, because the APIs they want you to write to are just MySQL’s. The main scenarios in which I can see them failing are ones in which they are competitively leapfrogged, either by other small competitors – e.g. ScaleBase, Akiban, TokuDB, or ScaleDB — or by Oracle/MySQL itself. While that could suck for my clients Schooner and Code Futures, it would still provide users relying on MySQL scale-out with one or more good product alternatives.
Relying on non-MySQL NewSQL startups, by way of contrast, would leave me somewhat more concerned. (However, if their code is open sourced. you have at least some vendor-failure protection.) And big-vendor scale-out offerings, such as Oracle RAC or DB2 pureScale, may be more complex to deploy and administer than the MySQL and NewSQL alternatives.
|Categories: Clustering, dbShards and CodeFutures, IBM and DB2, MySQL, NewSQL, NoSQL, OLTP, Open source, Oracle, Parallelization, Schooner Information Technology, Transparent sharding||2 Comments|
Schooner Information Technology started out as a complete-system MySQL appliance vendor. Then Schooner went software-only, but continued to brag about great performance in configurations with solid-state drives. Now Schooner has pivoted further, and is emphasizing high availability, clustered performance, and other hardware-agnostic OLTP (OnLine Transaction Processing) features. Fortunately, Schooner has some interesting stuff in those areas to talk about.
The short form of the SchoonerSQL (as Schooner’s product is now called) story goes roughly like this:
- SchoonerSQL replicates data — synchronously if the replication target is local, asynchronously if it is remote.
- Local synchronous replication provides high availability; remote asynchronous replication provides disaster recovery.
- SchoonerSQL’s local synchronous replication also provides read scale-out.
- Schooner has a partnership with Code Futures/dbShards to provide write scale-out via transparent sharding.
- SchoonerSQL has some secret sauce in replication performance. This has the effect of significantly increasing write performance (assuming you were going to replicate anyway), because otherwise you might have to slow down the master server’s write performance so that the slaves can keep up with it.
- Schooner believes it still has some single-server performance advantages as well.
|Categories: Clustering, dbShards and CodeFutures, MySQL, OLTP, Oracle, Parallelization, Schooner Information Technology||3 Comments|
A reporter tweeted: “Is there a simple plain English definition for NoSQL?” After reminding him of my cynical yet accurate Third Law of Commercial Semantics, I gave it a serious try, and came up with the following. More precisely, I tweeted the bolded parts of what’s below; the rest is commentary added for this post.
NoSQL is most easily defined by what it excludes: SQL, joins, strong analytic alternatives to those, and some forms of database integrity. If you leave all four out, and you have a strong scale-out story, you’re in the NoSQL mainstream. Read more
|Categories: Cassandra, dbShards and CodeFutures, MarkLogic, MySQL, Object, Open source, Petabyte-scale data management, Schooner Information Technology||7 Comments|
Alex Williams noticed that there will be a NoSQL session at Oracle OpenWorld next week, and is wondering whether this will be a big deal. I think it won’t be.
There really are three major points to NoSQL.
- Dynamic schemas. This is the only one of the three that truly depends on NoSQL.
- Scale-out short-request processing. If you want to scale out efficiently at high request volumes, you’re best off not using all the flexibility SQL/relational DBMS offer. (In particular, you don’t want to do cross-node joins). Not coincidentally, a number of the best scale-out offerings were built to be NoSQL.
- Open source. Doing a relational DBMS is a big project. It seems easier to build NoSQL ones.
Oracle can address the latter two points as aggressively as it wishes via MySQL. It so happens I would generally recommend MySQL enhanced by dbShards, Schooner, and/or dbShards/Schooner, rather than Oracle-only MySQL … but that’s a detail. In some form or other, Oracle’s MySQL is a huge player in the scale-out, open source, short-request database management market.
So that leaves us with dynamic schemas. Oracle has at least four different sets of technology in that area:
- As Workday noticed years ago, MySQL can be used as a functional, basic key-value store.
- Oracle also has XML-based Berkeley DB/SleepyCat kicking around.*
- The XML extensions to Oracle’s core DBMS could be alleged to have a dynamic schema/NoSQL flavor. (Blech.)
- A dynamic schema argument could also be made for object-oriented DBMS technology. While Oracle doesn’t to my knowledge exactly sell that, it does have the Tangosol Coherence line of technology, with a potentially similar programming model.
If Oracle is now refreshing and rebranding one or more of these as “NoSQL”, there’s no reason to view that as a big deal at all.
*That’s Mike Olson’s former company, if you’re keeping score at home.
|Categories: MySQL, NoSQL, Object, OLTP, Open source, Oracle, Parallelization, Schooner Information Technology, Structured documents||13 Comments|
It turns out that Oracle’s new small appliance isn’t really an Exadata Mini-Me. Rather, the Oracle Database Appliance is — well, it seems to be a box with an Oracle DBMS in it. (Plus Oracle RAC and so on.) The whole thing is priced for and targeted at the SMB (Small & Medium Business) market, whatever that means to Oracle.
I’m not hugely optimistic about the Oracle Database Appliance. Rather, my thoughts — lightly edited from a chat with a reporter — include:
- This doesn’t solve Oracle’s SMB problems, which include:
- Oracle software is too difficult and costly to administer. The appliance will make a modest dent in that one, but it’s not any kind of game-changer, because the issues relate to the antique design of the Oracle DBMS. (I.e., I think ongoing database administration is a bigger deal than, say, one-time system set-up.)
- SMBs use third-party applications whenever they can, with an increasing preference for SaaS. Application and SaaS vendors prefer non-Oracle alternatives when they are feasible.
- Thus, Oracle is not well positioned to thrive in the SMB market … except maybe through its MySQL subsidiary, but that has a long way to go too.
- Clayton Christensen’s The Innovator’s Solution teaches us that Oracle should focus on selling a thick stack of technology to its highest-end customers, and that’s exactly what Oracle does focus on.
Once again, I’m working with an OLTP SaaS vendor client on the architecture for their next-generation system. Parameters include:
- 100s of gigabytes of data at first, growing to >1 terabyte over time.
- High peak loads.
- Public cloud portability (but they have private data centers they can use today).
- Simple database design — not a lot of tables, not a lot of columns, not a lot of joins, and everything can be distributed on the same customer_ID key.
- Stream the data to a data warehouse, that will grow to a few terabytes. (Keeping only one year of OLTP data online actually makes sense in this application, but of course everything should go into the DW.)
So I’m leaning to saying: Read more
|Categories: Analytic technologies, Cloud computing, Clustering, Data warehousing, dbShards and CodeFutures, Facebook, Infobright, MySQL, OLTP, Open source, Parallelization, Software as a Service (SaaS), Solid-state memory||13 Comments|
My Couchbase business update with Bob Wiederhold was very interesting, but it didn’t answer much about the actual Couchbase product. For that, I talked with Dustin Sallings. We jumped around a lot, and some important parts of the Couchbase product haven’t had their designs locked down yet anyway. But here’s at least a partial explanation of what’s up.
memcached is a way to cache data in RAM across a cluster of servers and have it all look logically like a single memory pool, extremely popular among large internet companies. The Membase product — which is what Couchbase has been selling this year — adds persistence to memcached, an obvious improvement on requiring application developers to write both to memcached and to non-transparently-sharded MySQL. The main technical points in adding persistence seem to have been:
- A persistent backing store (duh), namely SQLite.
- A change to the hashing algorithm, to avoid losing data when the cluster configuration is changed.
Couchbase is essentially Membase improved by integrating CouchDB into it, with the main changes being:
- Changing the backing store to CouchDB (duh). This will be in the first Couchbase release.
- Adding cross data center replication on CouchDB’s consistency model. This will not, I believe, be in the first Couchbase release.
- Offering CouchDB’s programming and query interfaces as an option. So far as I can tell, this will be implemented straightforwardly in the first Couchbase release, with elegance planned for later down the road.
Let’s drill down a bit into Membase/Couchbase clustering and consistency. Read more
|Categories: Cache, Clustering, Couchbase, memcached, Memory-centric data management, MySQL, Parallelization, Solid-state memory||7 Comments|