Analysis of open source database management system PostgreSQL and other products in the PostgreSQL ecosystem. Related subjects include:
I haven’t done a pure notes/links/comments post for a while. Let’s fix that now. (A bunch of saved-up links, however, did find their way into my recent privacy threats overview.)
First and foremost, the fourth annual New England Database Summit (nee “Day”) is next week, specifically Friday, January 28. As per my posts in previous years, I think well of the event, which has a friendly, gathering-of-the-clan flavor. Registration is free, but the organizers would prefer that you register online by the end of this week, if you would be so kind.
The two things potentially wrong with the New England Database Summit are parking and the rush hour drive home afterwards. I would listen with interest to any suggestions about dinner plans.
One thing I hope to figure out at the Summit or before is what the hell is going on on Vertica’s blog or, for that matter, at Vertica. The recent Mike Stonebraker post that spawned a lot of discussion and commentary has disappeared. Meanwhile, Vertica has had three consecutive heads of marketing leave the company since June, and I don’t know who to talk to there any more. Read more
|Categories: About this blog, Analytic technologies, Data warehousing, GIS and geospatial, Investment research and trading, MongoDB and 10gen, OLTP, Open source, PostgreSQL, Vertica Systems||4 Comments|
I talked yesterday w/ Cory Isaacson, who runs CodeFutures, makers of dbShards. dbShards is a software layer that turns an ordinary DBMS (currently MySQL or PostgreSQL) into an MPP shared-nothing ACID-compliant OLTP DBMS. Technical highlights included: Read more
There’s a point I keep making in speeches, and used to keep making in white papers, yet have almost never spelled out in this blog. Let me now (somewhat) correct the oversight.
Analytic technology isn’t only for you. It’s also for your customers, citizens, and other stakeholders.
I am not referring here to what is well understood to be an important, fast-growing activity — providing data and its analysis to customers as your primary or only business — nor to the related business of taking people’s data, crunching it for them, and giving them results. That combined sector — which I am pretty alone in aggregating into one and calling data mart outsourcing — is one of the top several vertical markets for a lot of the analytic DBMS vendors I write about. Rather, I’m talking about enterprises that gather data for some primary purpose, and have discovered that a good secondary use of the data is to reflect it back to stakeholders, often the same ones who provided or created it in the first place.
For now I’ll call this category stakeholder-facing analytics, as the shorter phrase “stakeholder analytics” would be ambiguous.* I first picked up the idea early this decade from Information Builders, for whom it had become something of a specialty. I’ve been asking analytics vendors for examples of stakeholder-facing analytics ever since, and a number have been able to comply. But the whole thing is in its early days even so; almost any sufficiently large enterprise should be more active in stakeholder-facing analytics than it currently is.
|Categories: Analytic technologies, Business intelligence, Data mart outsourcing, Fox and MySpace, PostgreSQL||4 Comments|
The past few years have seen a spate of startups in the analytic DBMS business. Netezza, Vertica, Greenplum, Aster Data and others are all reasonably prosperous, alongside older specialty product vendors Teradata and Sybase (the Sybase IQ part). OLTP (OnLine Transaction Processing) and general purpose DBMS startups, however, have not yet done as well, with such success as there has been (MySQL, Intersystems Cache’, solidDB’s exit, etc.) generally accruing to products that originated in the 20th Century.
Nonetheless, OLTP/general-purpose data management startup activity has recently picked up, targeting what I see as some very real opportunities and needs. So as a jumping-off point for further writing, I thought it might be interesting to collect a few observations about the market in one place. These include:
- Big-brand OLTP/general-purpose DBMS have more “stickiness” than analytic DBMS.
- By number, most of an enterprise’s OLTP/general-purpose databases are low-volume and low-value.
- Most interesting new OLTP/general-purpose data management products are either MySQL-based or NoSQL.
- It’s not yet clear whether MySQL will prevail over MySQL forks, or vice-versa, or whether they will co-exist.
- The era of silicon-centric relational DBMS is coming.
- The emphasis on scale-out and reducing the cost of joins spans the NoSQL and SQL-based worlds.
- Users’ instance on “free” could be a major problem for OLTP DBMS innovation.
I shall explain. Read more
Greenplum is announcing today that you can run Greenplum software on a single 8-core commodity server, free. First and foremost, that’s a strong statement that Greenplum wants enterprises to pay it for Greenplum’s parallelization/”private cloud” capabilities. Second, it may be an attractive gift to a variety of folks who want to extract insight from terabyte-scale databases of various kinds.
Greenplum Single-Node Edition:
- Is free of charge, although you can buy support.
- Has no restrictions on use, production or otherwise.
- Has no restrictions on database size.
- Is closed-source.
For those who want free, terabyte-scale data warehousing software, Greenplum Single-Node Edition may be quite appealing, considering that the main available alternatives are:
- General-purpose open-source DBMS, such as PostgreSQL and MySQL (lacking analytic DBMS performance and features)
- Infobright Community Edition (the other best choice – Infobright’s commercial sales success indicates the solidity of Infobright’s technology)
- Rough research-project code and other other questionable open source offerings
- Crippleware from other commercial analytic DBMS vendors (e.g., Teradata)
For example, comparing PostgreSQL-based Greenplum with PostgreSQL itself, Greenplum offers:
- The ability to scale out queries across all cores in your box (and no, pgpool is not a serious alternative)
- Storage alternatives such as columnar (I am told that EnterpriseDB recently stopped funding a project for a PostgreSQL columnar option)
|Categories: Analytic technologies, Data warehousing, EnterpriseDB and Postgres Plus, Greenplum, Infobright, Open source, PostgreSQL, Pricing, Scientific research||12 Comments|
Despite a thoughtful heads-up from Daniel Abadi at the time of his original posting about HadoopDB, I’m just getting around to writing about it now. HadoopDB is a research project carried out by a couple of Abadi’s students. Further research is definitely planned. But it seems too early to say that HadoopDB will ever get past the “research and oh by the way the code is open sourced” stage and become a real code line — whether commercialized, open source, or both.
The basic idea of HadoopDB is to put copies of a DBMS at different nodes of a grid, and use Hadoop to parcel work among them. Major benefits when compared with massively parallel DBMS are said to be:
- Query fault-tolerance
- The related concept of tolerating node degradation that isn’t an outright node failure.
HadoopDB has actually been built with PostgreSQL. That version achieved performance well below that of a commercial DBMS “DBX”, where X=2. Column-store guru Abadi has repeatedly signaled his intention to try out HadoopDB with VectorWise at the nodes instead. (Recall that VectorWise is shared-everything.) It will be interesting to see how that configuration performs.
The real opportunity for HadoopDB, however, in my opinion may lie elsewhere. Read more
When the Oracle/MySQL deal was first announced, I wrote:
I can probably come up with business practices that could make things very hard on Oracle/MySQL competitors … but I haven’t found a compelling antitrust trigger on my first pass over the subject.
Now that the European Commission is delaying the Oracle/Sun deal, explicitly because of Oracle/MySQL antitrust fears. That is, the European Commission wants to be reassured that an Oracle takeover of MySQL won’t unduly impinge upon the future availability of open source/low cost DBMS alternatives. This raises that natural question:
What could Oracle do to assure concerned parties that its ownership of MySQL won’t unduly hamper open-source-based DBMS competition?
I think that’s indeed the crucial question. The Oracle/Sun deal has enough momentum at this point that it both should and will be allowed to happen — perhaps with safeguards — rather than banned outright. If you have concerns about Oracle’s pending acquisition of MySQL, you should speak up and outline what kinds of regulatory safeguards would alleviate the problems you foresee.
More or less obvious possibilities include:
- Divest MySQL. This is obviously an extreme measure, but it surely would work.
- Provide some money and trademark rights to MySQL forkers. If MariaDB and Drizzle were put into strong competitive positions with MySQL today, it’s hard to argue how regulators could object to any future Oracle maneuverings Oracle might envision with the GPLed side of MySQL.
- Offer a standard, attractive, long-term deal to MySQL bundlers. The commercial/non-GPL version of MySQL is a requirement for appliance vendors (surely), OEM vendors (probably), and storage engine vendors (maybe — I disagree, but I’m evidently in the minority).
- Strengthen PostgreSQL. Realistically, that’s not going to be part of any Oracle/MySQL resolution, so I’ll leave it as a subject for another time.
Robert Hodges, CTO of my client Continuent, put up a blog post laying out his and Continuent’s views on database clustering. Continuent offers Tungsten, its third try at database clustering technology, targeted at MySQL, PostgreSQL, and perhaps Oracle. Unlike Continuent’s more ambitious. second-generation product, Tungsten offers single-master replication, which in Robert’s view allows for great ease of deployment and administration (he likes the phrase “bone-simple”).
The downside to Continuent Tungsten ‘s stripped down architecture is that it doesn’t solve the most extreme performance scale-out problems. Instead, Continuent focuses on the other big benefits of keeping your data in more than one place, namely high availability and data loss prevention (i.e., backup).
Continuent has been around for a number of years, starting out in Finland but now being based in Silicon Valley. For most purposes, however, it’s reasonable to think of Continuent and Tungsten as start-up efforts.
As you might guess from the references to Finland and MySQL, Continuent’s products are open source, or at least have open source versions. I’m still a little fuzzy as to which features are open sourced and which are not. For that matter, I’m still unclear as to Tungsten’s feature list overall …
March, 2011 edit: In its quaintness, this post is a reminder of just how fast Short Request Processing DBMS technology has been moving ahead. If I had to do it all over again, I’d suggest they use one of the high-performance MySQL options like dbShards, Schooner, or both together. I actually don’t know what they finally decided on in that area. (I do know that for analytic DBMS they chose Vertica.)
I have a client who wants to build a new application with peak update volume of several million transactions per hour. (Their base business is data mart outsourcing, but now they’re building update-heavy technology as well. ) They have a small budget. They’ve been a MySQL shop in the past, but would prefer to contract (not eliminate) their use of MySQL rather than expand it.
My client actually signed a deal for EnterpriseDB’s Postgres Plus Advanced Server and GridSQL, but unwound the transaction quickly. (They say EnterpriseDB was very gracious about the reversal.) There seem to have been two main reasons for the flip-flop. First, it seems that EnterpriseDB’s version of Postgres isn’t up to PostgreSQL’s 8.4 feature set yet, although EnterpriseDB’s timetable for catching up might have tolerable. But GridSQL apparently is further behind yet, with no timetable for up-to-date PostgreSQL compatibility. That was the dealbreaker.
The current base-case plan is to use generic open source PostgreSQL, with scale-out achieved via hand sharding, Hibernate, or … ??? Experience and thoughts along those lines would be much appreciated.
Another option for OLTP performance and scale-out is of course memory-centric options such as VoltDB or the Groovy SQL Switch. But this client’s database is terabyte-scale, so hardware costs could be an issue, as of course could be product maturity.
By the way, a large fraction of these updates will be actual changes, as opposed to new records, in case that matters. I expect that the schema being updated will be very simple — i.e., clearly simpler than in a classic order entry scenario.
|Categories: Cache, Clustering, Data mart outsourcing, EnterpriseDB and Postgres Plus, In-memory DBMS, Memory-centric data management, MySQL, OLTP, Open source, Parallelization, PostgreSQL, Software as a Service (SaaS), Vertica Systems||30 Comments|
I visited Greenplum in early April, and talked with them again last night. As I noted in a separate post, there are a couple of subjects I won’t write about today. But that still leaves me free to cover a number of other points about Greenplum, including: Read more
|Categories: Data warehousing, Database compression, EAI, EII, ETL, ELT, ETLT, Greenplum, MapReduce, Market share and customer counts, Parallelization, PostgreSQL, Pricing||11 Comments|