Analysis of the open source database management system PostgreSQL and other products in the PostgreSQL ecosystem.
Despite a thoughtful heads-up from Daniel Abadi at the time of his original posting about HadoopDB, I’m just getting around to writing about it now. HadoopDB is a research project carried out by a couple of Abadi’s students. Further research is definitely planned. But it seems too early to say that HadoopDB will ever get past the “research and oh by the way the code is open sourced” stage and become a real code line — whether commercialized, open source, or both.
The basic idea of HadoopDB is to put copies of a DBMS at different nodes of a grid, and use Hadoop to parcel work among them. Major benefits when compared with massively parallel DBMS are said to be:
- Query fault-tolerance
- The related concept of tolerating node degradation that isn’t an outright node failure.
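To make the basic idea concrete, here is a minimal sketch of a coordinator that parcels one query out to per-node databases, retries on a replica when a node fails, and merges the partial results. It is not HadoopDB's code. It assumes SQLite in-memory databases as stand-ins for the node-level DBMS and a plain Python loop as a stand-in for Hadoop, and the names `Node`, `run_query`, and `total_by_region` are hypothetical.

```python
import sqlite3


class Node:
    """Hypothetical worker node holding one data partition in a local DBMS."""

    def __init__(self, name, rows, healthy=True):
        self.name = name
        self.healthy = healthy
        # SQLite stands in for the per-node DBMS (an assumption for this sketch).
        self.db = sqlite3.connect(":memory:")
        self.db.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
        self.db.executemany("INSERT INTO sales VALUES (?, ?)", rows)

    def run(self, sql):
        if not self.healthy:
            raise ConnectionError(f"{self.name} is down")
        return self.db.execute(sql).fetchall()


def run_query(partitions, sql):
    """Run sql once per partition, falling back to another replica on failure."""
    partials = []
    for replicas in partitions:
        for node in replicas:
            try:
                partials.extend(node.run(sql))
                break  # this partition is done; move on to the next one
            except ConnectionError:
                continue  # only this piece of work is retried, not the whole query
        else:
            raise RuntimeError("all replicas of a partition failed")
    return partials


def total_by_region(partitions):
    """Merge per-node partial sums into a global result (the 'reduce' step)."""
    sql = "SELECT region, SUM(amount) FROM sales GROUP BY region"
    totals = {}
    for region, subtotal in run_query(partitions, sql):
        totals[region] = totals.get(region, 0) + subtotal
    return totals


part_a = [("east", 10), ("west", 5)]
part_b = [("east", 7), ("west", 1)]
partitions = [
    [Node("a1", part_a, healthy=False), Node("a2", part_a)],  # a1 has failed
    [Node("b1", part_b)],
]
result = total_by_region(partitions)
```

The point of the sketch is that a failed node costs only a retry of its own piece of work rather than a restart of the whole query.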
HadoopDB has actually been built with PostgreSQL. That version achieved performance well below that of a commercial DBMS “DBX”, where X=2. Column-store guru Abadi has repeatedly signaled his intention to try out HadoopDB with VectorWise at the nodes instead. (Recall that VectorWise is shared-everything.) It will be interesting to see how that configuration performs.
The real opportunity for HadoopDB, however, in my opinion may lie elsewhere.
When the Oracle/MySQL deal was first announced, I wrote:
I can probably come up with business practices that could make things very hard on Oracle/MySQL competitors … but I haven’t found a compelling antitrust trigger on my first pass over the subject.
Now the European Commission is delaying the Oracle/Sun deal, explicitly because of Oracle/MySQL antitrust fears. That is, the European Commission wants to be reassured that an Oracle takeover of MySQL won’t unduly impinge upon the future availability of open source/low cost DBMS alternatives. This raises the natural question:
What could Oracle do to assure concerned parties that its ownership of MySQL won’t unduly hamper open-source-based DBMS competition?
I think that’s indeed the crucial question. The Oracle/Sun deal has enough momentum at this point that it both should and will be allowed to happen — perhaps with safeguards — rather than banned outright. If you have concerns about Oracle’s pending acquisition of MySQL, you should speak up and outline what kinds of regulatory safeguards would alleviate the problems you foresee.
More or less obvious possibilities include:
- Divest MySQL. This is obviously an extreme measure, but it surely would work.
- Provide some money and trademark rights to MySQL forkers. If MariaDB and Drizzle were put into strong competitive positions with MySQL today, it’s hard to see how regulators could object to any future maneuverings Oracle might envision with the GPLed side of MySQL.
- Offer a standard, attractive, long-term deal to MySQL bundlers. The commercial/non-GPL version of MySQL is a requirement for appliance vendors (surely), OEM vendors (probably), and storage engine vendors (maybe — I disagree, but I’m evidently in the minority).
- Strengthen PostgreSQL. Realistically, that’s not going to be part of any Oracle/MySQL resolution, so I’ll leave it as a subject for another time.
Robert Hodges, CTO of my client Continuent, put up a blog post laying out his and Continuent’s views on database clustering. Continuent offers Tungsten, its third try at database clustering technology, targeted at MySQL, PostgreSQL, and perhaps Oracle. Unlike Continuent’s more ambitious second-generation product, Tungsten offers single-master replication, which in Robert’s view allows for great ease of deployment and administration (he likes the phrase “bone-simple”).
The downside to Continuent Tungsten’s stripped-down architecture is that it doesn’t solve the most extreme performance scale-out problems. Instead, Continuent focuses on the other big benefits of keeping your data in more than one place, namely high availability and data loss prevention (i.e., backup).
Continuent has been around for a number of years, starting out in Finland but now being based in Silicon Valley. For most purposes, however, it’s reasonable to think of Continuent and Tungsten as start-up efforts.
As you might guess from the references to Finland and MySQL, Continuent’s products are open source, or at least have open source versions. I’m still a little fuzzy as to which features are open sourced and which are not. For that matter, I’m still unclear as to Tungsten’s feature list overall …
March, 2011 edit: In its quaintness, this post is a reminder of just how fast Short Request Processing DBMS technology has been moving ahead. If I had to do it all over again, I’d suggest they use one of the high-performance MySQL options like dbShards, Schooner, or both together. I actually don’t know what they finally decided on in that area. (I do know that for analytic DBMS they chose Vertica.)
I have a client who wants to build a new application with peak update volume of several million transactions per hour. (Their base business is data mart outsourcing, but now they’re building update-heavy technology as well.) They have a small budget. They’ve been a MySQL shop in the past, but would prefer to contract (not eliminate) their use of MySQL rather than expand it.
My client actually signed a deal for EnterpriseDB’s Postgres Plus Advanced Server and GridSQL, but unwound the transaction quickly. (They say EnterpriseDB was very gracious about the reversal.) There seem to have been two main reasons for the flip-flop. First, it seems that EnterpriseDB’s version of Postgres isn’t up to PostgreSQL’s 8.4 feature set yet, although EnterpriseDB’s timetable for catching up might have been tolerable. Second, GridSQL apparently is further behind yet, with no timetable for up-to-date PostgreSQL compatibility. That was the dealbreaker.
The current base-case plan is to use generic open source PostgreSQL, with scale-out achieved via hand sharding, Hibernate, or … ??? Experience and thoughts along those lines would be much appreciated.
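For readers less familiar with the term, hand sharding usually means the application itself decides which database instance owns each row. The following is a minimal sketch of hash-based routing, not a description of what this client built. The connection strings, the `customer-N` keys, and the helper names `shard_for` and `group_by_shard` are all hypothetical.

```python
import hashlib
from collections import defaultdict

# Hypothetical connection strings, one per PostgreSQL shard.
SHARD_DSNS = [
    "host=pg-shard-0 dbname=app",
    "host=pg-shard-1 dbname=app",
    "host=pg-shard-2 dbname=app",
    "host=pg-shard-3 dbname=app",
]


def shard_for(key, n_shards=len(SHARD_DSNS)):
    """Map a record key to a shard index with a stable hash.

    hashlib is used instead of the built-in hash(), which is randomized
    per process for strings and so would route inconsistently.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_shards


def group_by_shard(updates):
    """Batch (key, new_value) updates by the shard that owns each key."""
    batches = defaultdict(list)
    for key, new_value in updates:
        batches[shard_for(key)].append((key, new_value))
    return dict(batches)


updates = [("customer-1", 10), ("customer-2", 20), ("customer-1", 30)]
batches = group_by_shard(updates)
# Each batch would then be applied over a connection to SHARD_DSNS[index].
```

Simple modulo routing like this keeps all updates for one key on one server, but changing the number of shards remaps most keys, which is one reason hand sharding gets painful as a system grows.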
Other options for OLTP performance and scale-out are of course memory-centric products such as VoltDB or the Groovy SQL Switch. But this client’s database is terabyte-scale, so hardware costs could be an issue, as of course could be product maturity.
By the way, a large fraction of these updates will be actual changes, as opposed to new records, in case that matters. I expect that the schema being updated will be very simple — i.e., clearly simpler than in a classic order entry scenario.
I visited Greenplum in early April, and talked with them again last night. As I noted in a separate post, there are a couple of subjects I won’t write about today. But that still leaves me free to cover a number of other points about Greenplum, including:
The issue of MySQL forks and their possible effect on closed-source storage engine vendors continues to get attention. The underlying question is:
Suppose Oracle wants to make life difficult for third-party storage engine vendors via its incipient control of MySQL? Can the storage engine vendors insulate themselves from this risk by working with a MySQL fork?
- And during the week of the MySQL conference, too.
- In the must-read slide presentation, Oracle says all the right things about being committed to all product lines and technologies. On the whole, this is believable.
- Oracle says it’s focusing Sun hardware sales on existing Oracle/Sun customers. Makes sense.
- Oracle mentions OpenStorage prominently. Makes sense. Integrating DBMS with storage is Oracle’s high-end DBMS future. (E.g., Exadata.)
- HP can’t be happy.
- MySQL and InnoDB are reunited.
- MySQL is apt to get decent, much as it would have under IBM.
- Even so, if you really believe in open source’s freedom, it’s time to look at PostgreSQL …
- … or EnterpriseDB’s Postgres Plus, although my recent dealings with EnterpriseDB underscore the importance of being VERY careful about counting your fingers after you shake hands with that company.
- And I wouldn’t be surprised if another shoe dropped soon on the EnterpriseDB front. (Please excuse the mixed metaphor.)
- I used to laugh at how many different app servers Sun had acquired. Oracle acquired a number too. Together it’s quite a pile of them.
- Oracle says acquiring Java is a great big deal. I’m not sure I see why that would really be true.
More later. I have a radio interview in a few minutes on a very different subject.
I talked with Ingres today. Much of the call was fluff: open-source rah-rah, plus some numbers showing purported success, but so finely parsed as to be pretty meaningless. (To Ingres’ credit, they did offer to let me talk w/ their CFO, even if they made no promises as to whether he’d provide any more substantive information.) Highlights included:
Reported or rumored merger discussions between IBM and Sun are generating huge amounts of discussion today (some links below). Here are some quick thoughts around the subject of how the IBM/Sun deal, if it happens, might affect the database management system industry.
I often find it hard to write about ParAccel’s technology, for a variety of reasons:
- With occasional exceptions, ParAccel is reluctant to share detailed information.
- With occasional exceptions, ParAccel is reluctant to say anything for attribution.
- In ParAccel’s version of an “agile” development approach, product details keep changing, as do plans and schedules. (The gibe that ParAccel’s product plans are whatever their current sales prospect wants them to be — while of course highly exaggerated — isn’t wholly unfounded.)
- ParAccel has sold very few copies of its products, so it’s hard to get information from third parties.
ParAccel is quick, however, to send email if I post anything about them they think is incorrect.
All that said, I did get careless when I neglected to double-check something I already knew.