Analysis of the open source database management system PostgreSQL and other products in the PostgreSQL ecosystem. Related subjects include:
When the Oracle/MySQL deal was first announced, I wrote:
I can probably come up with business practices that could make things very hard on Oracle/MySQL competitors … but I haven’t found a compelling antitrust trigger on my first pass over the subject.
Now the European Commission is delaying the Oracle/Sun deal, explicitly because of Oracle/MySQL antitrust fears. That is, the European Commission wants to be reassured that an Oracle takeover of MySQL won’t unduly impinge upon the future availability of open source/low cost DBMS alternatives. This raises the natural question:
What could Oracle do to assure concerned parties that its ownership of MySQL won’t unduly hamper open-source-based DBMS competition?
I think that’s indeed the crucial question. The Oracle/Sun deal has enough momentum at this point that it both should and will be allowed to happen — perhaps with safeguards — rather than banned outright. If you have concerns about Oracle’s pending acquisition of MySQL, you should speak up and outline what kinds of regulatory safeguards would alleviate the problems you foresee.
More or less obvious possibilities include:
- Divest MySQL. This is obviously an extreme measure, but it surely would work.
- Provide some money and trademark rights to MySQL forkers. If MariaDB and Drizzle were put into strong competitive positions with MySQL today, it’s hard to see how regulators could object to any future maneuverings Oracle might envision with the GPLed side of MySQL.
- Offer a standard, attractive, long-term deal to MySQL bundlers. The commercial/non-GPL version of MySQL is a requirement for appliance vendors (surely), OEM vendors (probably), and storage engine vendors (maybe — I disagree, but I’m evidently in the minority).
- Strengthen PostgreSQL. Realistically, that’s not going to be part of any Oracle/MySQL resolution, so I’ll leave it as a subject for another time.
Robert Hodges, CTO of my client Continuent, put up a blog post laying out his and Continuent’s views on database clustering. Continuent offers Tungsten, its third try at database clustering technology, targeted at MySQL, PostgreSQL, and perhaps Oracle. Unlike Continuent’s more ambitious second-generation product, Tungsten offers single-master replication, which in Robert’s view allows for great ease of deployment and administration (he likes the phrase “bone-simple”).
The downside to Continuent Tungsten’s stripped-down architecture is that it doesn’t solve the most extreme performance scale-out problems. Instead, Continuent focuses on the other big benefits of keeping your data in more than one place, namely high availability and data loss prevention (i.e., backup).
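To make the tradeoff concrete, here is a toy model of single-master replication in general. It is not Tungsten's implementation, and every class and method name in it is hypothetical. All writes go through one node and an ordered log, which is why the approach is simple but doesn't scale out writes; the replicas exist to survive the loss of the master.

```python
class Node:
    """Toy database node: a key-value store plus a count of applied log entries."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.applied = 0  # how many log entries this node has applied

    def apply(self, entry):
        key, value = entry
        self.data[key] = value
        self.applied += 1


class SingleMasterCluster:
    """Hypothetical single-master cluster: one writer, any number of replicas."""

    def __init__(self, master, replicas):
        self.master = master
        self.replicas = list(replicas)
        self.log = []  # ordered (key, value) writes accepted by the master

    def write(self, key, value):
        # Every write is serialized through the single master.
        entry = (key, value)
        self.log.append(entry)
        self.master.apply(entry)

    def replicate(self):
        # Ship each replica the log entries it has not yet applied.
        for replica in self.replicas:
            for entry in self.log[replica.applied:]:
                replica.apply(entry)

    def fail_over(self):
        # Promote the most caught-up replica; writes it never saw are lost.
        new_master = max(self.replicas, key=lambda r: r.applied)
        self.replicas.remove(new_master)
        self.log = self.log[:new_master.applied]
        self.master = new_master
        return new_master
```

In this sketch, adding replicas adds copies of the data but no write capacity, and any write made after the last `replicate()` call disappears on failover. That is the usual price of asynchronous single-master designs.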
Continuent has been around for a number of years. It started out in Finland but is now based in Silicon Valley. For most purposes, however, it’s reasonable to think of Continuent and Tungsten as start-up efforts.
As you might guess from the references to Finland and MySQL, Continuent’s products are open source, or at least have open source versions. I’m still a little fuzzy as to which features are open sourced and which are not. For that matter, I’m still unclear as to Tungsten’s feature list overall …
March, 2011 edit: In its quaintness, this post is a reminder of just how fast Short Request Processing DBMS technology has been moving ahead. If I had to do it all over again, I’d suggest they use one of the high-performance MySQL options like dbShards, Schooner, or both together. I actually don’t know what they finally decided on in that area. (I do know that for analytic DBMS they chose Vertica.)
I have a client who wants to build a new application with peak update volume of several million transactions per hour. (Their base business is data mart outsourcing, but now they’re building update-heavy technology as well.) They have a small budget. They’ve been a MySQL shop in the past, but would prefer to contract (not eliminate) their use of MySQL rather than expand it.
My client actually signed a deal for EnterpriseDB’s Postgres Plus Advanced Server and GridSQL, but unwound the transaction quickly. (They say EnterpriseDB was very gracious about the reversal.) There seem to have been two main reasons for the flip-flop. First, it seems that EnterpriseDB’s version of Postgres isn’t up to PostgreSQL’s 8.4 feature set yet, although EnterpriseDB’s timetable for catching up might have been tolerable. But GridSQL apparently is further behind yet, with no timetable for up-to-date PostgreSQL compatibility. That was the dealbreaker.
The current base-case plan is to use generic open source PostgreSQL, with scale-out achieved via hand sharding, Hibernate, or … ??? Experience and thoughts along those lines would be much appreciated.
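For readers unfamiliar with the term, hand sharding just means the application itself decides which database instance owns each row. A minimal sketch follows, with loud assumptions: the shard names, the choice of key, and the 3.6 million/hour figure used afterward are all illustrative and are not the client's actual design.

```python
import hashlib

# Hypothetical shard list; in practice each entry would be the connection
# string for a separate PostgreSQL instance.
SHARDS = ["pg-shard-0", "pg-shard-1", "pg-shard-2", "pg-shard-3"]


def shard_for(key: str, shards=SHARDS) -> str:
    """Map a sharding key to one shard using a stable hash.

    hashlib is used instead of the built-in hash(), which is salted per
    process for strings and would route the same key inconsistently.
    """
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


def per_shard_tps(transactions_per_hour: float, shard_count: int) -> float:
    """Average transactions per second per shard, assuming an even spread."""
    return transactions_per_hour / 3600 / shard_count
```

If "several million transactions per hour" is taken to mean 3.6 million, that is 1,000 transactions per second overall, and `per_shard_tps(3_600_000, 4)` puts each of four shards at 250 per second on average. The catch with plain modulo routing is that changing the number of shards remaps most keys, so resharding has to be planned up front.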
Another route to OLTP performance and scale-out is, of course, a memory-centric option such as VoltDB or the Groovy SQL Switch. But this client’s database is terabyte-scale, so hardware costs could be an issue, as of course could product maturity.
By the way, a large fraction of these updates will be actual changes, as opposed to new records, in case that matters. I expect that the schema being updated will be very simple — i.e., clearly simpler than in a classic order entry scenario.
|Categories: Cache, Clustering, Data mart outsourcing, EnterpriseDB and Postgres Plus, In-memory DBMS, Memory-centric data management, MySQL, OLTP, Open source, Parallelization, PostgreSQL, Software as a Service (SaaS), Vertica Systems||30 Comments|
I visited Greenplum in early April, and talked with them again last night. As I noted in a separate post, there are a couple of subjects I won’t write about today. But that still leaves me free to cover a number of other points about Greenplum, including: Read more
|Categories: Data warehousing, Database compression, EAI, EII, ETL, ELT, ETLT, Greenplum, MapReduce, Market share and customer counts, Parallelization, PostgreSQL, Pricing||11 Comments|
The issue of MySQL forks and their possible effect on closed-source storage engine vendors continues to get attention. The underlying question is:
Suppose Oracle wants to make life difficult for third-party storage engine vendors via its incipient control of MySQL? Can the storage engine vendors insulate themselves from this risk by working with a MySQL fork?
- And during the week of the MySQL conference, too.
- In the must-read slide presentation, Oracle says all the right things about being committed to all product lines and technologies. On the whole, this is believable.
- Oracle says it’s focusing Sun hardware sales on existing Oracle/Sun customers. Makes sense.
- Oracle mentions OpenStorage prominently. Makes sense. Integrating DBMS with storage is Oracle’s high-end DBMS future. (E.g., Exadata.)
- HP can’t be happy.
- MySQL and InnoDB are reunited.
- MySQL is apt to get decent, much as it would have under IBM.
- Even so, if you really believe in open source’s freedom, it’s time to look at PostgreSQL …
- … or EnterpriseDB’s Postgres Plus, although my recent dealings with EnterpriseDB underscore the importance of being VERY careful about counting your fingers after you shake hands with that company.
- And I wouldn’t be surprised if another shoe dropped soon on the EnterpriseDB front. (Please excuse the mixed metaphor.)
- I used to laugh at how many different app servers Sun had acquired. Oracle acquired a number too. Together it’s quite a pile of them.
- Oracle says acquiring Java is a great big deal. I’m not sure I see why that would really be true.
More later. I have a radio interview in a few minutes on a very different subject.
|Categories: EnterpriseDB and Postgres Plus, HP and Neoview, MySQL, Open source, Oracle, PostgreSQL||20 Comments|
I talked with Ingres today. Much of the call was fluff — open-source rah-rah, plus some numbers showing purported success, but so finely parsed as to be pretty meaningless. (To Ingres’ credit, they did offer to let me talk w/ their CFO, even if they offered no promises as to whether he’d offer any more substantive information.) Highlights included: Read more
|Categories: Actian and Ingres, Data warehousing, EnterpriseDB and Postgres Plus, MySQL, Open source, Oracle, PostgreSQL, Sybase||6 Comments|
Reported or rumored merger discussions between IBM and Sun are generating huge amounts of discussion today (some links below). Here are some quick thoughts around the subject of how the IBM/Sun deal — if it happens — might affect the database management system industry. Read more
|Categories: Actian and Ingres, Data warehousing, EnterpriseDB and Postgres Plus, Greenplum, IBM and DB2, Infobright, Kickfire, Kognitio, Microsoft and SQL*Server, Mid-range, MySQL, Open source, ParAccel, PostgreSQL, solidDB||10 Comments|
I often find it hard to write about ParAccel’s technology, for a variety of reasons:
- With occasional exceptions, ParAccel is reluctant to share detailed information.
- With occasional exceptions, ParAccel is reluctant to say anything for attribution.
- In ParAccel’s version of an “agile” development approach, product details keep changing, as do plans and schedules. (The gibe that ParAccel’s product plans are whatever their current sales prospect wants them to be — while of course highly exaggerated — isn’t wholly unfounded.)
- ParAccel has sold very few copies of its products, so it’s hard to get information from third parties.
ParAccel is quick, however, to send email if I post anything about them they think is incorrect.
All that said, I did get careless when I neglected to doublecheck something I already knew. Read more
A year ago, Mike Stonebraker observed that conventional DBMS don’t necessarily do a great job on scientific data, and further pointed out that different kinds of science might call for different data access methods. Even so, some of the largest databases around are scientific ones, and they have to be managed somehow. For example:
- Microsoft just put out an overwrought press release. The substance seems to be that Pan-STARRS — a Jim Gray legacy also discussed in an August, 2008 Computerworld article — is adding 1.4 terabytes of image data per night, and one not so new database adds 15 terabytes per year of some kind of computer simulation output used to analyze protein folding. Both run on SQL Server, of course.
- Kognitio has an astronomical database too, at Cambridge University, adding half a terabyte of data per night.
- Oracle is used for a McGill University proteomics database called CellMapBase. A figure of 50 terabytes of “mass storage” is cited, which excludes tape backup and so on.
- The Large Hadron Collider, once it actually starts functioning, is projected to generate 15 petabytes of data annually, which will be initially stored on tape and then distributed to various computing centers around the world.
- Netezza is proud of its ability to serve images and the like quickly, although off the top of my head I’m not thinking of a major customer it has in that area. (But then, if you just sell software, your academic discount can approach 100%; but if like Netezza you have an actual cost of goods sold, that’s not as appealing an option.)
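The figures above mix per-night and per-year rates, so a quick back-of-the-envelope conversion helps in comparing them. The sketch below assumes data arrives all 365 nights of the year (telescopes surely do worse than that) and that a petabyte is 1,000 terabytes; the labels are just shorthand for the items in the list.

```python
TB_PER_PB = 1000      # assumption: decimal units
DAYS_PER_YEAR = 365   # assumption: data arrives every single night


def nightly_to_annual_tb(tb_per_night: float) -> float:
    """Convert a per-night rate to terabytes per year."""
    return tb_per_night * DAYS_PER_YEAR


# Annual growth in terabytes, using the figures quoted in the list above.
annual_tb = {
    "Pan-STARRS images": nightly_to_annual_tb(1.4),
    "Cambridge astronomy (Kognitio)": nightly_to_annual_tb(0.5),
    "Protein folding simulation": 15.0,
    "Large Hadron Collider (projected)": 15 * TB_PER_PB,
}

# Print smallest to largest, with the equivalent daily rate.
for name, tb in sorted(annual_tb.items(), key=lambda item: item[1]):
    print(f"{name}: {tb:,.1f} TB/year, {tb / DAYS_PER_YEAR:,.2f} TB/day")
```

On those assumptions, Pan-STARRS comes to roughly 500 terabytes per year, and the projected Large Hadron Collider volume is about 30 times that.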
Long-term, I imagine that the most suitable DBMS for these purposes will be MPP systems with strong datatype extensibility — e.g., DB2, PostgreSQL-based Greenplum, PostgreSQL-based Aster nCluster, or maybe Oracle.
|Categories: Aster Data, Data types, Greenplum, IBM and DB2, Kognitio, Microsoft and SQL*Server, Netezza, Oracle, Parallelization, PostgreSQL, Scientific research||1 Comment|