PostgreSQL
Analysis of the open source database management system PostgreSQL and other products in the PostgreSQL ecosystem. Related subjects include:
Big scientific databases need to be stored somehow
A year ago, Mike Stonebraker observed that conventional DBMS don’t necessarily do a great job on scientific data, and further pointed out that different kinds of science might call for different data access methods. Even so, some of the largest databases around are scientific ones, and they have to be managed somehow. For example:
- Microsoft just put out an overwrought press release. The substance seems to be that Pan-STARRS (a Jim Gray legacy also discussed in an August, 2008 Computerworld article) is adding 1.4 terabytes of image data per night, and that another, not-so-new database adds 15 terabytes per year of some kind of computer simulation output used to analyze protein folding. Both run on SQL Server, of course.
- Kognitio has an astronomical database too, at Cambridge University, adding 1/2 a terabyte of data per night.
- Oracle is used for a McGill University proteomics database called CellMapBase. A figure of 50 terabytes of “mass storage” is cited, which doesn’t include tape backup and so on.
- The Large Hadron Collider, once it actually starts functioning, is projected to generate 15 petabytes of data annually, which will be initially stored on tape and then distributed to various computing centers around the world.
- Netezza is proud of its ability to serve images and the like quickly, although off the top of my head I can’t think of a major customer it has in that area. (But then, if you just sell software, your academic discount can approach 100%; if, like Netezza, you have an actual cost of goods sold, that’s a less appealing option.)
Long-term, I imagine that the most suitable DBMS for these purposes will be MPP systems with strong datatype extensibility — e.g., DB2, PostgreSQL-based Greenplum, PostgreSQL-based Aster nCluster, or maybe Oracle.
| Categories: Aster Data, Data types, Greenplum, IBM and DB2, Kognitio, Microsoft and SQL*Server, Netezza, Oracle, Parallelization, PostgreSQL, Scientific research | 1 Comment |
Has there been any progress on SAP over Postgres?
Peter Eisentraut discouragingly reported in January:
What I hear from my acquaintances at SAP, however, is this:
- SAP doesn’t need fancy database features, since the software doesn’t use them.
- Those who don’t want to buy Oracle can use MaxDB; it’s free.
- PostgreSQL doesn’t support in-place upgrades, which makes it unsuitable for multiple terabyte installations typically used by SAP customers.
Has anything changed since then?
And as a trivia challenge, does anybody recognize my science fiction reference in the comment thread there?
Hint: The dialogue referenced did not occur on the planet Arrakis.
| Categories: PostgreSQL | 2 Comments |
Top DBMS on Linux
I was looking up George Crump’s blogs in connection with his recent post on SSDs, and I stumbled upon one that outlines at great length what features Linux backup systems should have. I won’t claim to have read it word for word, but what did catch my eye were a couple of comments on DBMS market share, which boiled down to:
- Oracle
- MySQL
- PostgreSQL
| Categories: IBM and DB2, Market share, MySQL, Oracle, PostgreSQL | Leave a Comment |
Mike Stonebraker’s counterarguments to MapReduce’s popularity
In response to my recent posts about MapReduce, Mike Stonebraker just got on the phone to give me his views. His core claim, more or less, is that anything you can do in MapReduce you could already do in a parallel database that complies with SQL-92 and/or has PostgreSQL underpinnings. In particular, Mike says:
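Mike’s claim is easiest to see with the canonical MapReduce example, counting words. The Python sketch below uses made-up input rows and runs a toy map/shuffle/reduce pipeline next to the equivalent of a SQL `GROUP BY` count over the same data, then checks that they agree. It is single-process and purely illustrative, an assumption-laden rendering of the equivalence argument rather than Mike’s or any vendor’s code; in an MPP DBMS the `GROUP BY` would typically be spread across nodes by hashing on the grouping key, which plays roughly the role of the shuffle.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical input: one document per row, as it might sit in a table.
documents = ["the quick brown fox", "the lazy dog", "the fox"]

def map_phase(doc):
    # Map: emit a (key, 1) pair for every word in the document.
    return [(word, 1) for word in doc.split()]

def shuffle(pairs):
    # Shuffle: group emitted values by key, as the framework does between nodes.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: sum the values collected for each key.
    return {key: sum(values) for key, values in groups.items()}

def mapreduce_word_count(docs):
    return reduce_phase(shuffle(chain.from_iterable(map_phase(d) for d in docs)))

def group_by_word_count(docs):
    # Equivalent of: SELECT word, COUNT(*) FROM words GROUP BY word;
    # where "words" is the document table exploded to one row per word.
    counts = {}
    for word in (w for d in docs for w in d.split()):
        counts[word] = counts.get(word, 0) + 1
    return counts
```

The two functions return the same word-to-count mapping, which is the whole point: the map step corresponds to row-level extraction, and the reduce step to an aggregate over groups.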
| Categories: Data warehousing, MapReduce, Michael Stonebraker, PostgreSQL | 4 Comments |
Greenplum is in the big leagues
After a March, 2007 call, I didn’t talk again with Greenplum until earlier this month. That changed fast. I flew out to see Greenplum last week and spent over a day with president/co-founder Scott Yara, CTO/co-founder Luke Lonergan, marketing VP Paul Salazar, and product management/marketing director Ben Werther. Highlights – besides some really great sushi at Sakae in Burlingame – start with an eye-opening set of customer proof points, such as:
| Categories: Analytic technologies, Data warehouse appliances, Data warehousing, Greenplum, PostgreSQL | 15 Comments |
EnterpriseDB update
I had lunch today with CTO Bob Zurek of EnterpriseDB, who turns out to live in almost the same town I do (they technically separated in 1783, but share a high school today). DBMS-related highlights included:
- EnterpriseDB thinks PostgreSQL training and certification are a big deal for increasing PostgreSQL adoption.
- EnterpriseDB’s business focus right now (at least, one of them) is moving developers from interest to download to deployment and payment — i.e., the standard funnel for open source and open-source-inspired products.
- EnterpriseDB finds it important to be a good PostgreSQL community citizen. This makes a lot of sense, as EnterpriseDB doesn’t control the core PostgreSQL engine, even if it does employ some of the core PostgreSQL developers.
- But “open source” is not the same as “free”.
- I got the impression that the GridSQL technology EnterpriseDB acquired is being used to go after general read-mostly, horizontally scaling applications (i.e., MySQL’s sweet spot). By contrast, I did not get the impression that EnterpriseDB is out to play catch-up in MPP data warehousing (e.g., with Greenplum).
- Bob pointed out that something like “Vacuum,” to clean up the database periodically, is needed in an MVCC (MultiVersion Concurrency Control) engine, since updates and deletes leave old row versions behind that must eventually be reclaimed. He thinks PostgreSQL’s autovacuum is good but not ideal.
- Bob draws this as yet another two-dimensional positioning graph, but in essence he thinks PostgreSQL and Postgres Plus are well-suited for a large space that’s above MySQL and below Oracle. I don’t think he really contradicted Kee Kwan’s opinion that there are good times to use PostgreSQL and good times to use MySQL.
- I was wrong when I previously said EnterpriseDB now offers MySQL portability. It just offers MySQL migration.
- The Elastra/EnterpriseDB cloud offering isn’t generally available yet.
- Stay tuned for developments in replication/high availability.
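To make Bob’s Vacuum point concrete, here is a toy model of why an MVCC engine accumulates dead row versions. It is a simplified sketch under my own assumptions (every write commits immediately, integer transaction IDs double as snapshot points, and a single “oldest active transaction” horizon decides what is reclaimable); it is not PostgreSQL’s implementation, although PostgreSQL tuples do carry analogous xmin/xmax fields.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class RowVersion:
    key: str
    value: str
    created_by: int                    # txid that wrote this version (like xmin)
    deleted_by: Optional[int] = None   # txid that superseded it (like xmax)

class ToyMVCCTable:
    """Hypothetical MVCC table: updates append new versions instead of overwriting."""

    def __init__(self):
        self.versions: List[RowVersion] = []
        self.next_txid = 1

    def begin(self):
        txid = self.next_txid
        self.next_txid += 1
        return txid

    def write(self, txid, key, value):
        # Never overwrite in place: mark the live version dead, append a new one.
        for v in self.versions:
            if v.key == key and v.deleted_by is None:
                v.deleted_by = txid
        self.versions.append(RowVersion(key, value, created_by=txid))

    def read(self, snapshot_txid, key):
        # A snapshot sees versions created at or before it and not yet superseded.
        for v in self.versions:
            if (v.key == key and v.created_by <= snapshot_txid
                    and (v.deleted_by is None or v.deleted_by > snapshot_txid)):
                return v.value
        return None

    def vacuum(self, oldest_active_txid):
        # Reclaim versions that no running or future snapshot can still see.
        before = len(self.versions)
        self.versions = [v for v in self.versions
                         if v.deleted_by is None or v.deleted_by > oldest_active_txid]
        return before - len(self.versions)
```

While an old reader is still running, its version of the row must survive; only once the horizon moves past the superseding transaction can Vacuum reclaim the space. Deciding when and how aggressively to do that is exactly what autovacuum has to get right.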
| Categories: EnterpriseDB and Postgres Plus, Mid-range, Open source, PostgreSQL | 1 Comment |
Microsoft is buying DATAllegro
I’ve long argued that:
- Oracle and Microsoft are doomed in the data warehouse market unless they acquire MPP/shared-nothing data warehouse DBMS and/or data warehouse appliances.
- DATAllegro is the ideal acquisition for either of them.
Microsoft has now validated my claim by agreeing to buy DATAllegro. As you probably know, we’ve been covering DATAllegro extensively, as per the links listed below.
Basic deal highlights include:
Pushback on the PostgreSQL vs. MySQL comparison
It should come as no surprise that not everybody agrees with EnterpriseDB’s views on the PostgreSQL/MySQL comparison. In particular, the High Availability MySQL blog offers a detailed rebuttal post, with more in the comment thread. According to MySQL fans, EnterpriseDB got its facts wrong on several matters regarding MySQL and InnoDB, especially in the areas of triggers and locking. And of course they disagree with EnterpriseDB’s general conclusion.
| Categories: MySQL, Open source, PostgreSQL | Leave a Comment |
PostgreSQL vs. MySQL, as per EnterpriseDB
EnterpriseDB put out a white paper arguing for the superiority of PostgreSQL over MySQL, even without EnterpriseDB’s own Postgres Plus extensions. Highlights of EnterpriseDB’s opinion include:
- EnterpriseDB asserts that MyISAM is the only MySQL storage engine with decent performance.
- EnterpriseDB then bashes MyISAM for all sorts of well-deserved reasons, especially ACID-noncompliance.
- EnterpriseDB asserts that row-level triggers, lacking in MySQL but present in PostgreSQL, are the most important kind of trigger.
- EnterpriseDB claims PostgreSQL is superior in procedural language support to MySQL.
- EnterpriseDB claims PostgreSQL is superior in authentication support to MySQL.
| Categories: EnterpriseDB and Postgres Plus, Mid-range, MySQL, Open source, PostgreSQL | 10 Comments |
Yahoo scales its web analytics database to petabyte range
Information Week has an article with details on what sounds like Yahoo’s core web analytics database. Highlights include:
- The Yahoo web analytics database is over 1 petabyte. They claim it will be in the tens of petabytes by 2009.
- The Yahoo web analytics database is based on PostgreSQL. So much for MySQL fanboys’ claims of Yahoo validation for their beloved toy … uh, let me rephrase that. The highly-regarded MySQL, although doing a great job for some demanding and impressive applications at Yahoo, evidently wasn’t selected for this one in particular. OK. That’s much better now.
- But the Yahoo web analytics database doesn’t actually use PostgreSQL’s storage engine. Rather, Yahoo wrote something custom and columnar.
- Yahoo is processing 24 billion “events” per day. The article doesn’t clarify whether these are sent straight to the analytics store, or whether there’s an intermediate storage engine. Most likely the system fills blocks in RAM and then just appends them to the single persistent store. If commodity boxes occasionally crash and lose a few megs of data — well, in this application, that’s not a big deal at all.
- Yahoo thinks commercial column stores aren’t ready yet for more than 100 terabytes of data.
- Yahoo says it got great performance advantages from a custom system by optimizing for its specific application. I don’t know exactly what that would be, but I do know that database architectures for high-volume web analytics are still in pretty bad shape. In particular, there’s no good way yet to analyze the specific, variable-length paths users take through websites.
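My guess above about Yahoo’s ingest path (fill blocks in RAM, then append them to the persistent store) can be sketched in a few lines. Everything here is hypothetical: the `BlockAppender` name, the block size, and the plain list standing in for the store are mine, not anything reported about Yahoo’s system. The sketch also shows the trade-off I mentioned: events still sitting in the in-memory buffer when a box crashes are simply lost.

```python
class BlockAppender:
    """Hypothetical ingest buffer: accumulate events in RAM, append whole blocks."""

    def __init__(self, block_size, store):
        self.block_size = block_size
        self.store = store    # stands in for an append-only persistent store
        self.buffer = []

    def add(self, event):
        self.buffer.append(event)
        if len(self.buffer) >= self.block_size:
            self.flush()

    def flush(self):
        # One sequential append per block, rather than one write per event.
        if self.buffer:
            self.store.append(tuple(self.buffer))
            self.buffer = []

    def crash(self):
        # Simulated node failure: whatever was not yet flushed is gone.
        lost = len(self.buffer)
        self.buffer = []
        return lost
```

With a block size of 4 and 10 incoming events, two full blocks reach the store and a crash loses the remaining 2 events, which for aggregate web analytics is usually an acceptable price for cheap sequential writes.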
| Categories: Analytic technologies, Columnar database management, Data warehousing, MySQL, PostgreSQL, Specific users, Theory and architecture, Yahoo | 9 Comments |
Database blades are not what they used to be
In which we bring you another instantiation of Monash’s First Law of Commercial Semantics: Bad jargon drives out good.
When EnterpriseDB announced a partnership with Truviso for a “blade,” I naturally assumed they were using the term in a more-or-less standard way, and hence believed that it was more than a “Barney” press release.* Silly me. Rather than referring to something closely akin to “datablade,” EnterpriseDB’s “blade” program turns out to be just a catchall set of partnerships.
*A “Barney” announcement is one whose entire content boils down to “I love you; you love me.”
According to EnterpriseDB CTO Bob Zurek, the main features of the “blade” program include:
| Categories: Data types, Emulation, transparency, portability, EnterpriseDB and Postgres Plus, Open source, PostgreSQL | 3 Comments |
Truviso and EnterpriseDB blend event processing with ordinary database management
Truviso and EnterpriseDB announced today that there’s a Truviso “blade” for Postgres Plus. By email, EnterpriseDB CTO Bob Zurek endorsed my tentative summary of what this means technically, namely:
- There’s data being managed transactionally by EnterpriseDB.
- Truviso’s DML has all along included ways to talk to a persistent Postgres data store.
- If, in addition, one wants to do stream processing things on the same data, that’s now possible, using Truviso’s usual DML.
Supporting evidence for the DBMS disruption story
As previously announced, I did a webcast this afternoon discussing database diversity. The title of the talk was taken directly from a post, “What leading DBMS vendors don’t want you to realize,” that argued mid-range DBMS are suitable for a broad variety of tasks. The overriding theme was a Clayton Christensen-style “disruption” narrative.
The sponsor was EnterpriseDB, which is fitting. While not the biggest DBMS industry disrupter in terms of revenue or visible impact (MySQL and Netezza say “Hi”), the Postgres family in general and EnterpriseDB in particular epitomize the disruption threat like nobody else, because of how broadly they substitute for market-leading database managers.
As I promised on the call, below are links to further research backing up the points I made. They’re numbered to match some of the presentation slides, which you can find at this link.
3. Much of the discussion of database diversity comes from a series of posts I coordinated with Mike Stonebraker.
4. At various times, starting on Slide 4, I made reference to datatype extensibility, a key feature of Oracle and DB2 – and a key advantage of Postgres over MySQL.
10. Capping off the database diversity discussion, Slide 10 mirrors this 11-point version of a data management software taxonomy.
13-14. I’ve posted many times about data warehousing DBMS and related technologies, including this overview of major analytic DBMS products, another recent overview of data warehouse specialty technologies, and an attempt to distinguish between data warehouse appliance myths and realities. Of particular interest for further research may be our sections on data warehouse appliances and columnar DBMS.
15. I do most of my posting about text search over on Text Technologies, specifically in the search category. Vendors I specifically mentioned as blending search with other kinds of data retrieval were Mark Logic and Attivio.
16. There’s a section here on native XML database management.
17. We also have a section on managing RDF and other graphical data models.
18. Ditto complex event/stream processing.
19. The only embeddable DBMS I’ve written much about recently is solidDB. And frankly, even in that case I’ve focused more on mid-tier caching uses, the now-canceled MySQL relationship, or general technology than on embedded uses specifically.
22-24. Back in February, 2007, I made what is probably still my clearest post explaining why I think market-leading DBMS vendors are in the process of getting disrupted.
| Categories: EnterpriseDB and Postgres Plus, Mid-range, MySQL, Open source, Oracle, PostgreSQL | Leave a Comment |
EnterpriseDB unveils Postgres Plus
EnterpriseDB is making a series of moves and announcements. Highlights include:
- Renaming/repositioning the product as “Postgres Plus.” The free product is now Postgres Plus, while the version you pay EnterpriseDB for is now Postgres Plus Advanced Server.
- Repackaging the products, so that Postgres Plus Advanced Server is a strict superset of Postgres Plus.
- New features added to Postgres Plus Advanced Server.
- Features newly migrated from Advanced Server down to Postgres Plus.
- A strategic investment by IBM.
- Stressing Postgres in EnterpriseDB marketing, and dropping the tag-line defining themselves as “the Oracle-compatible database company.”
So far as I can tell, most of the technical differences between Advanced Server and regular Postgres Plus lie in three areas:
| Categories: Cache, Emulation, transparency, portability, EnterpriseDB and Postgres Plus, Mid-range, MySQL, OLTP, Open source, PostgreSQL | 1 Comment |
PostgreSQL can be used in a lot of different ways
The relational DBMS industry is filled with startups. In some way or other, most of them are based on or make use of the open source project PostgreSQL. (Not all, of course; exceptions include DATAllegro and Infobright, which are based on Ingres and MySQL respectively.) But how they use PostgreSQL varies greatly.
| Categories: EnterpriseDB and Postgres Plus, Greenplum, Open source, PostgreSQL, Vertica Systems | 9 Comments |
Database management system choices — mid-range-relational
This is the fourth of a five-part series on database management system choices.
The other threat to the high-end relational DBMS vendors aims squarely at the heart of their business. It’s the mid-range relational database management systems, which are doing an ever-larger fraction of what their high-end cousins can. That said, different products do different things well. So if you’re not blindly paying up for the security of an all-things-to-all-people high-end DBMS, there are a number of factors you might want to consider.
| Categories: Database diversity, EnterpriseDB and Postgres Plus, Mid-range, MySQL, OLTP, PostgreSQL, Theory and architecture | 2 Comments |
PostgreSQL speeds up OLTP
The Register reports on PostgreSQL 8.3, and emphasizes OLTP speedups and reductions in administrative burden:
Among the changes, Heap Only Tuples (HOT) that may cut the maintenance overhead of frequently updated tables by up to 75 per cent, spread checkpoints and background writer autotuning to reduce the impact of check points on response times, and an asynchronous commit option that also speeds the response times of certain transactions.
I wonder how EnterpriseDB compares on these features.
Edit: Slashdot has discussion and links. And here’s a PostgreSQL feature matrix.
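The asynchronous commit option is the easiest of these to misunderstand, so here is a toy model of the trade-off. In PostgreSQL 8.3 it is governed by the `synchronous_commit` setting; the Python class below is a hypothetical simulation, not PostgreSQL code. A synchronous commit waits until its log record is flushed before reporting success; an asynchronous one is acknowledged right away and flushed later, possibly batched with others, so a crash can lose the most recently acknowledged transactions.

```python
class ToyWAL:
    """Hypothetical write-ahead log contrasting synchronous and asynchronous commit."""

    def __init__(self):
        self.buffer = []    # commit records still only in memory
        self.durable = []   # commit records flushed to "disk"
        self.flushes = 0    # count of (expensive) flush operations

    def commit(self, txid, synchronous=True):
        self.buffer.append(txid)
        if synchronous:
            # Synchronous: the client waits for this flush before seeing success.
            self.flush()
        # Asynchronous: acknowledge now; a background writer flushes later.
        return "committed"

    def flush(self):
        if self.buffer:
            self.durable.extend(self.buffer)
            self.buffer = []
            self.flushes += 1

    def crash(self):
        # Acknowledged but unflushed asynchronous commits are lost.
        lost = list(self.buffer)
        self.buffer = []
        return lost
```

The speedup comes from replacing one flush per transaction with one flush per batch; the cost is a small window of acknowledged-but-not-durable work, which is why the option suits only certain transactions.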
| Categories: EnterpriseDB and Postgres Plus, Mid-range, OLTP, Open source, PostgreSQL | 1 Comment |
What hard-core transactional applications have actually been built in MySQL, PostgreSQL, EnterpriseDB, or FileMaker?
And here’s the biggie.
Question of the day #3
What complex, high-volume transactional applications have actually been built in mid-range DBMS such as MySQL, PostgreSQL, FileMaker, or EnterpriseDB?
I’ve been flamed for suggesting that MySQL or FileMaker aren’t fully equal to Oracle and DB2 in supporting hard-core transactional applications. (Which is ironic, because I’ve also been flamed for suggesting hard-core transactional support isn’t as big a deal for DBMS selection as some relational purists insist. But I digress …) So I’m putting the question out there: what impressive transactional applications do the stand-alone mid-range DBMS actually support?
| Categories: EnterpriseDB and Postgres Plus, FileMaker, Mid-range, MySQL, OLTP, Open source, PostgreSQL | 20 Comments |
14 reasons not to use MySQL or other mid-range database management systems
I may argue for the use of open source and other mid-range database management systems, but a lot of industry sentiment remains on the other side. Vendors of high-end RDBMS naturally advocate enterprise-wide single-vendor adoption. Many CIOs and industry analysts, overwhelmed by product proliferation, think that’s a neat idea as well.
And in fairness, they’re not entirely wrong. Here are 14 reasons for using high-end relational database management systems, even on applications for which mid-range DBMS would suffice.
| Categories: Microsoft and SQL*Server, Mid-range, MySQL, OLTP, Open source, Oracle, PostgreSQL | 19 Comments |
What leading DBMS vendors don’t want you to realize
For very high-end applications, the list of viable database management systems is short. Scalability can be a problem. (The rankings of most scalable alternatives differ in the OLTP and data warehouse realms.) Extreme levels of security can be had from only a few DBMS. (Oracle would have you believe there’s only one choice.) And if you truly need 99.99% uptime, there are only a few DBMS you should even consider.
But for most applications at any enterprise – and for all applications at most enterprises – super high-end DBMS aren’t required. There are relatively few applications that wouldn’t run perfectly well on PostgreSQL or EnterpriseDB today. Ingres and Progress OpenEdge aren’t far behind (they’re a little lacking in datatype support). Ditto InterSystems Caché, although the nonrelational architecture will be off-putting to many. And to varying degrees, you can also do fine with MySQL, Pervasive PSQL, MaxDB, or a variety of other products – or for that matter with the cheap or free crippled versions of Oracle, SQL Server, DB2, and Informix.
What’s more, these mid-range database management systems can have significant advantages over their high-end brethren.
The blogosphere writes about Sun buying MySQL
More from me soon, but first here is a survey of what other people are saying about Sun’s billion-dollar deal to acquire MySQL:
- Jeremy Cole, evidently a very experienced high-end MySQL user, itemizes some serious problems with MySQL — optimizer, memory management, replication, and so on. (Uh, Jeremy — what part of the product do you like?) He also echoes a theme I’ve seen elsewhere, and to some extent noticed myself; MySQL has had a lot of management issues as a company.
- Jeffrey McManus calls out Sun’s promise to continue to support non-Java programming languages in MySQL. Kaj Arnö of MySQL makes the point emphatically, reciting a list of operating systems and development environments/languages MySQL will continue to support.
- Matt Asay quite reasonably interprets Sun’s move as a bid for overall leadership and development of the open source software platform industry. I would add that Sun CEO Jonathan Schwartz came up through the software side of the business. I would further add that Sun has a dismal track record with closed-source software acquisitions, including Forté, NetDynamics, and the enterprise side of Netscape.
- Matt also has selected quotes from the press conference, including Sun saying the coopetitionally obvious “Yeah, we’ll continue serious support for PostgreSQL and Oracle too.” Brian Aker also supports the PostgreSQL point.
- Zack Urlocker of MySQL implies that Jonathan Schwartz was very involved in the deal personally. That makes all kinds of sense.
- 451 Group has some interesting links, and don’t miss the short comment thread.
- The official MySQL and Sun company lines are summarized in this Zack Urlocker post on Infoworld and this post from Jonathan Schwartz of Sun (as well as in some of the links above).
| Categories: MySQL, Open source, PostgreSQL | 2 Comments |
The world according to Derek Rodner of EnterpriseDB
If you’re interested in the world of mid-range, OLTP, and/or open source database management systems, Derek Rodner’s blog is worth checking out. His 2007 Year in Review post deserves a look, even though it’s about as unbiased and spin-free as Bill O’Reilly’s TV show, in that it combines multiple shots each at Oracle and MySQL with some plugs for EnterpriseDB. I’ve already praised his post from a month ago listing large numbers of EnterpriseDB successes. Of course there are multiple heartfelt arguments on behalf of Postgres (too many to link to specifically). And he even has a great set of tips, which I hereby recommend to all my vendor clients, on how best to use Google AdWords.
| Categories: EnterpriseDB and Postgres Plus, Mid-range, OLTP, Open source, PostgreSQL | 1 Comment |
Elastra - somewhat more sensible Amazon-based DBMS option
Elastra is a startup offering MySQL and PostgreSQL SaaS instances in the Amazon S3/EC2 cloud. Its board includes John Hummer, which I generally regard as a good thing, although it’s hardly a guarantee of success. High Scalability raises some doubts about Elastra’s pricing, but I think that may be missing the point.
| Categories: Amazon and its cloud, Cloud computing, Elastra, MySQL, OLTP, Open source, PostgreSQL, Software as a Service (SaaS) | 2 Comments |
The Netezza Developer Network
Netezza has officially announced the Netezza Developer Network. Associated with that is a set of technical capabilities, which basically boil down to programming user-defined functions or other capabilities straight onto the Netezza nodes (aka SPUs). And this is specifically onto the FPGAs, not the PowerPC processors. In C. Technically, I think what this boils down to is:
| Categories: Data types, Data warehouse appliances, Data warehousing, GIS and geospatial, Native XML, Netezza, PostgreSQL, SAS Institute | 8 Comments |
StreamBase and Truviso
StreamBase is a decently-established startup, possibly the largest company in its area. Truviso, in the process of changing its name from Amalgamated Insight, has a dozen employees, one referenceable customer, and a product not yet in general availability. Both have ambitious plans for conquering the world, based on similar stories. And the stories make a considerable amount of sense.
Both companies’ core product is a memory-centric SQL engine designed to execute queries without ever writing data to disk. Of course, they both have persistence stories too: Truviso by being tightly integrated into open-source PostgreSQL, StreamBase more via “yeah, we can hand the data off to a conventional DBMS.” But the basic idea is to route data through a whole lot of different in-memory filters, to see what queries it satisfies, rather than executing many queries in sequence against disk-based data.
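The inversion described above, where queries stand still and data flows past them, can be sketched in a few lines. The query names, predicates, and trade records below are hypothetical, and real engines such as StreamBase and Truviso compile SQL into far more elaborate operator networks with windows and aggregates; this toy only illustrates the routing idea.

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]

class StandingQueryRouter:
    """Toy continuous-query engine: each arriving record is tested against every registered filter."""

    def __init__(self):
        self.queries: Dict[str, Callable[[Record], bool]] = {}
        self.results: Dict[str, List[Record]] = {}

    def register(self, name, predicate):
        # A standing query is registered once, then sees every record that arrives.
        self.queries[name] = predicate
        self.results[name] = []

    def push(self, record):
        # Route one in-memory record through all filters; nothing touches disk.
        matched = []
        for name, predicate in self.queries.items():
            if predicate(record):
                self.results[name].append(record)
                matched.append(name)
        return matched

# Hypothetical standing queries over a stream of trades.
router = StandingQueryRouter()
router.register("big_trades", lambda r: r["qty"] >= 1000)
router.register("ibm_trades", lambda r: r["symbol"] == "IBM")
```

Pushing a 5,000-share IBM trade through the router matches both standing queries at once, whereas a disk-based approach would store the trade first and then run each query against the table in turn.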
