Pushback on the PostgreSQL vs. MySQL comparison
It should come as no surprise that not everybody agrees with EnterpriseDB’s views on the PostgreSQL/MySQL comparison. In particular, the High Availability MySQL blog offers a detailed rebuttal post, with more in the comment thread. According to MySQL fans, EnterpriseDB got its facts wrong on several matters regarding MySQL and InnoDB, especially in the areas of triggers and locking. And of course they disagree with EnterpriseDB’s general conclusion.
| Categories: MySQL, Open source, PostgreSQL | Leave a Comment |
How is MySQL’s join performance these days?
In a comment thread on a recent post comparing MySQL to Postgres, Jonathon Moore chimed in based on experience with both products. His characterization of some MySQL problems: Read more
| Categories: Infobright, MySQL, Open source | 5 Comments |
Google has thousands of internal data formats, mostly simple ones
In connection with the release of Protocol Buffers, Kenton Varda of Google wrote: Read more
Another Cognos scandal in Massachusetts
I already posted about the Boston Globe’s reporting on a deal, since investigated and rescinded, to supply the whole Massachusetts state government with Cognos software.
The Globe now reports that a multimillion dollar deal the prior year with the Massachusetts Department of Education was equally dubious. Lowlights include: Read more
| Categories: Business intelligence, Cognos | Leave a Comment |
EnterpriseDB’s itemized claims of Oracle compatibility
Obviously, I’m poking around EnterpriseDB’s site this morning (in connection with their status as my client, actually). Anyhow, we all know that one of EnterpriseDB’s core claims is great Oracle-compatibility, but what exactly do they mean by that? I found a fairly clearly laid-out answer, as of last year, in this white paper and, even more simply, in this blog post summarizing the white paper.
PostgreSQL vs. MySQL, as per EnterpriseDB
EnterpriseDB put out a white paper arguing for the superiority of PostgreSQL over MySQL, even without EnterpriseDB’s own Postgres Plus extensions. Highlights of EnterpriseDB’s opinion include:
- EnterpriseDB asserts that MyISAM is the only MySQL storage engine with decent performance.
- EnterpriseDB then bashes MyISAM for all sorts of well-deserved reasons, especially ACID-noncompliance.
- EnterpriseDB asserts that row-level triggers, lacking in MySQL but present in PostgreSQL, are the most important kind of trigger.
- EnterpriseDB claims PostgreSQL is superior in procedural language support to MySQL.
- EnterpriseDB claims PostgreSQL is superior in authentication support to MySQL.
| Categories: EnterpriseDB and Postgres Plus, Mid-range, MySQL, Open source, PostgreSQL | 10 Comments |
Declaration of Data Independence (humor)
The data warehouse appliance industry has a well-developed funny bone. Dataupia’s contribution is a Declaration of Data Independence, which begins:
When in the Course of an increasingly competitive global economy it becomes necessary for one data set to dissolve its connections to a constraining environment, the separate but inherently unequal station to which the Laws of Whose budget is larger prevails.
Related links:
- Cartoons from DATAllegro
- April Fool press release from Netezza
| Categories: Analytic technologies, Data warehouse appliances, Data warehousing, Dataupia | Leave a Comment |
Three cartoons from DATAllegro
[Three cartoon images, not reproduced here.]
Related links:
- Humor from Netezza
- Another gerbil-based solution
| Categories: Analytic technologies, DATAllegro, Data warehousing, Humor | 1 Comment |
Event processing vs. data-driven processing
Marco Seiriö offers a distinction between event processing and data-driven processing. Specifically, he says that if an event has an ID, then it’s true event processing; if it doesn’t, and what you’re doing looks somewhat like event processing anyway, then you’re doing data-driven processing. Read more
| Categories: Complex event processing (CEP) | Leave a Comment |
The IRS data warehouse
According to a recent Eric Lai Computerworld story and a 2006 Sybase.com success story,
- The IRS has a data warehouse called the CDW (Compliance Data Warehouse). It runs on Sybase IQ and has 500 named users. (Computerworld)
- By some metric, it’s a 150 TB warehouse. (Computerworld)
- By some metric, they add 15-20 TB/year, with a 4 hour load time. (Computerworld)
- As of 2006, there were 20-25 TB of “input data”, with a “70% compression rate”. (Sybase)
I can’t entirely reconcile those numbers, but in any case the database sounds plenty big.
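To make the mismatch concrete, here is a rough arithmetic sketch using only the figures quoted above. The two readings of "70% compression rate" (70% of the space saved, or compressed data being 70% of the input size) are assumptions for illustration; neither source says which is meant.

```python
# Figures quoted above; both readings of "70% compression rate" are assumptions.
INPUT_TB_2006 = (20, 25)        # "input data" as of 2006 (Sybase)
WAREHOUSE_TB = 150              # "by some metric" (Computerworld)
GROWTH_TB_PER_YEAR = (15, 20)   # "by some metric" (Computerworld)


def stored_size(input_tb, rate=0.70, rate_means_savings=True):
    """Size on disk if `rate` is the fraction saved, else the fraction remaining."""
    return input_tb * (1 - rate) if rate_means_savings else input_tb * rate


for input_tb in INPUT_TB_2006:
    saved = stored_size(input_tb, rate_means_savings=True)
    kept = stored_size(input_tb, rate_means_savings=False)
    print(f"{input_tb} TB input -> {saved:.1f} TB (70% saved) or {kept:.1f} TB (70% remaining)")

# Years of growth needed to get from the 2006 input figure to 150 TB.
years_fast = (WAREHOUSE_TB - INPUT_TB_2006[1]) / GROWTH_TB_PER_YEAR[1]
years_slow = (WAREHOUSE_TB - INPUT_TB_2006[0]) / GROWTH_TB_PER_YEAR[0]
print(f"{years_fast:.1f} to {years_slow:.1f} years of growth at the quoted rates")
```

At the quoted growth rates it would take roughly six to nine years to get from the 2006 input figure to 150 TB, so the various metrics are presumably measuring different things.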
Computerworld also said:
the research division also uses Microsoft Corp.’s SQL Server to store all of the metadata for the data warehouse and the rest of the agency. Managing and cleaning all of that metadata — 10,000 labels for 150 databases — is a huge task in itself,
| Categories: Analytic technologies, Data warehousing, Specific users, Sybase | 2 Comments |
Jerry Held on cloud data warehousing and how business intelligence will be transformed by it
Vertica Chairman Jerry Held has a pair of blog posts on analytics and data warehousing in the cloud. The first lays out a number of potential benefits and consequences of cloud data warehousing, under the heading of “Transforming BI”: Read more
| Categories: Analytic technologies, Business intelligence, Cloud computing, Data mart outsourcing, Data warehousing, Software as a Service (SaaS), Vertica Systems | Leave a Comment |
Cognos/State of Massachusetts scandal
I assumed this had been reported widely outside of Massachusetts, but a web search suggests otherwise.
The story is this: Cognos sold 20,000 seats of software to Massachusetts for $13 million. There were technical violations of purchase procedures, and other aspects of the deal that didn’t pass the smell test. After IBM bought Cognos, the deal was rescinded, and is being rebid. Read more
| Categories: Analytic technologies, Business intelligence, Cognos, Pricing | 1 Comment |
Unreliable web MySQL application (Technorati/Wordpress)
Technorati yesterday exposed an application error, to wit (in what presumably should be a blog content region): Read more
| Categories: MySQL | 5 Comments |
Response to Rita Sallam of Oracle
In a comment thread on Seth Grimes’ blog, Rita Sallam of Oracle engaged in a passionate defense of her data warehousing software. I’d like to take it upon myself to respond to a few of her points here. Read more
| Categories: Data warehousing, Oracle | 4 Comments |
Oracle Optimized Warehouse Initiative
Oracle’s response to data warehouse appliances — and to IBM’s BCUs (Balanced Configuration Units) — so far is the Oracle Optimized Warehouse Initiative (OOW, not to be confused with Oracle Open World). A small amount of information about Oracle Optimized Warehouse can be found on Oracle’s website. Another small amount can be found in this recent long and breathless TDWI article, full of such brilliancies as attributing to the data warehouse appliance vendors the “claim that relational databases simply aren’t cut out for analytic workloads.” (Uh, what does he think they’re running — CODASYL DBMS?)
So far as I can tell, what Oracle Optimized Warehouse — much like IBM’s BCU — boils down to is the same old Oracle DBMS, but with recommended hardware configuration and tuning parameters. Thus, a lot of the hassle is taken out of ordering and installing an Oracle data warehouse, which is surely a good thing. But I doubt it does much to solve Oracle’s problems with price, price/performance, or the inevitable DBA hassles derived from a poorly-performing DBMS.
| Categories: Data warehouse appliances, Data warehousing, Oracle | 2 Comments |
Who is doing what in XML data management these days?
A comment thread to a post on a different subject has opened up a discussion of XML storage. Frankly, I haven’t kept up with my briefings on the subject, in part because XML support hasn’t proved to be very important yet to the big DBMS vendors, somewhat to my surprise. When last I looked, the situation wasn’t much different from what it was back in November, 2005. Unless I’ve missed something (and please tell me if I have!), here’s what’s going on: Read more
| Categories: IBM and DB2, Intersystems and Cache', Mark Logic, Microsoft and SQL*Server, Native XML, Oracle | 4 Comments |
Oracle’s hefty price increases
Jeff Jones of IBM wrote in to point out that Oracle is slathering on the price increases. I quote: Read more
| Categories: Dataupia, Emulation, transparency, portability, EnterpriseDB and Postgres Plus, Oracle | 5 Comments |
Derek Rodner blasts ANTs Software
Derek Rodner got snarky, and blasted ANTs Software. Highlights include (emphasis mine):
I have never seen more thinly veiled attempts to make themselves bigger than they are. … In 2005, they did almost a half million dollars in revenue. That’s right, I said a half million, or $467,000 to be exact. In 2006, it got worse at $288,000 in revenue and last year they did $360,000. Yet, they continue to drone on about their “consortium” which, from the outside simply looks like a beta program. Its no consortium. … And, they continue to mention a major deal with IBM that COULD be worth millions over time. You can read about it in every SEC filing. But, it has never materialized. … They announced a major Oracle partnership, but Oracle never acknowledges their existence. I think they simply signed up for the partner program at oracle and paid the $1500. … Sybase is paying them $1.4 million to do whatever they want with the entire product line from ANTs. … This means that Sybase can do whatever they want with the product, including reselling it without paying another dime to ANTs.
| Categories: ANTs Software | 2 Comments |
Detailed analysis of Perst and other in-memory object-oriented DBMS
Dan Weinreb — inspired by but not linking to my recent short post on McObject’s object-oriented in-memory DBMS Perst — has posted a detailed discussion of Perst on his own blog. For context, he compares it briefly to analogous products, most especially Progress’s — which used to be ObjectStore, of which Dan was the chief architect.
This was based on documentation and general sleuthing (Dan figured out who McObject got Perst from), rather than hands-on experience, so performance figures and the like aren’t validated. Still, if you’re interested in such technology, it’s a fascinating post.
| Categories: In-memory DBMS, McObject, Memory-centric data management, Object | Leave a Comment |
Open source in-memory DBMS
I’ve gotten email about two different open source in-memory DBMS products/projects. I don’t know much about either, but in case you care, here are some pointers to more info.
First, the McObject guys — who also sell a relational in-memory product — have an object-oriented, apparently Java-centric product called Perst. They’ve sent over various press releases about same, the details of which didn’t make much of an impression on me. (Upon review, I see that one of the main improvements they cite in Perst 3.0 is that they added 38 pages of documentation.)
Second, I just got email about something called CSQL Cache. You can read more about CSQL Cache here, if you’re willing to navigate some fractured English. CSQL’s SourceForge page is here. My impression is that CSQL Cache is an in-memory DBMS focused on, you guessed it, caching. It definitely seems to talk SQL, but possibly its native data model is of some other kind (there are references both to “file-based” and “network”.)
| Categories: Cache, DBMS product categories, In-memory DBMS, McObject, Memory-centric data management, OLTP, Object, Open source | 5 Comments |
ANTs bails out of the DBMS market
ANTs Data Server — i.e., the ANTs DBMS — has been sold off to a company called 4Js. It is now to be called Genero DB. Actually, 4Js has been selling or working on a version of the product called Genero DB since 2006, specifically an Informix-compatible one.
I’m not totally clear on why an Informix-compatible DBMS is needed in a world that already has Informix SE, but maybe IBM is overcharging for maintenance even on the low-end version of the product.
Meanwhile, ANTs, which had originally tried to get enterprises to migrate away from Oracle, is now focused on middleware called the ANTs Compatibility Server to help them migrate to Oracle, specifically/initially from Sybase.
| Categories: ANTs Software, Emulation, transparency, portability, IBM and DB2, Oracle, Sybase | 2 Comments |
Yahoo scales its web analytics database to petabyte range
Information Week has an article with details on what sounds like Yahoo’s core web analytics database. Highlights include:
- The Yahoo web analytics database is over 1 petabyte. They claim it will be in the tens of petabytes by 2009.
- The Yahoo web analytics database is based on PostgreSQL. So much for MySQL fanboys’ claims of Yahoo validation for their beloved toy … uh, let me rephrase that. The highly-regarded MySQL, although doing a great job for some demanding and impressive applications at Yahoo, evidently wasn’t selected for this one in particular. OK. That’s much better now.
- But the Yahoo web analytics database doesn’t actually use PostgreSQL’s storage engine. Rather, Yahoo wrote something custom and columnar.
- Yahoo is processing 24 billion “events” per day. The article doesn’t clarify whether these are sent straight to the analytics store, or whether there’s an intermediate storage engine. Most likely the system fills blocks in RAM and then just appends them to the single persistent store. If commodity boxes occasionally crash and lose a few megs of data — well, in this application, that’s not a big deal at all.
- Yahoo thinks commercial column stores aren’t ready yet for more than 100 terabytes of data.
- Yahoo says it got great performance advantages from a custom system by optimizing for its specific application. I don’t know exactly what that would be, but I do know that database architectures for high-volume web analytics are still in pretty bad shape. In particular, there’s no good way yet to analyze the specific, variable-length paths users take through websites.
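For a sense of scale, the sketch below converts the 24 billion events/day figure into an average per-second rate, and then illustrates the "fill blocks in RAM, then append" guess made above. The `BlockAppender` class, the block size, and the list used as a store are all hypothetical; nothing here reflects how Yahoo actually built its system.

```python
SECONDS_PER_DAY = 24 * 60 * 60
EVENTS_PER_DAY = 24_000_000_000  # figure from the article

events_per_second = EVENTS_PER_DAY / SECONDS_PER_DAY
print(f"average ingest rate: {events_per_second:,.0f} events/second")


class BlockAppender:
    """Hypothetical buffer: collect events in RAM, append full blocks to one store."""

    def __init__(self, store, block_size):
        self.store = store          # any append-only sink; a list here (assumption)
        self.block_size = block_size
        self.buffer = []

    def add(self, event):
        self.buffer.append(event)
        if len(self.buffer) >= self.block_size:
            self.flush()

    def flush(self):
        if self.buffer:
            self.store.append(tuple(self.buffer))  # one append per block
            self.buffer = []


store = []
appender = BlockAppender(store, block_size=4)
for i in range(10):
    appender.add({"seq": i})

# A crash at this point would lose only the events still in appender.buffer.
print(len(store), "blocks persisted;", len(appender.buffer), "events still in RAM")
```

The point of the design is that a crash costs only whatever sits in the in-memory buffer, which is the trade-off described above as "not a big deal at all" for this kind of application.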
| Categories: Analytic technologies, Columnar database management, Data warehousing, MySQL, PostgreSQL, Specific users, Theory and architecture, Yahoo | 5 Comments |
DATAllegro on compression
DATAllegro CEO Stuart Frost has been blogging quite a bit recently (and not before time!). A couple of his posts have touched on compression. In one he gave actual numbers for compression, namely:
DATAllegro compresses between 2:1 and 6:1 depending on the content of the rows, whereas column-oriented systems claim 4:1 to 10:1.
In another recent post, Stuart touched on architecture, saying:
Due to the way our compression code works, DATAllegro’s current products are optimized for performance under heavy concurrency. The end result is that we don’t use the full power of the platform when running one query at a time.
| Categories: Analytic technologies, DATAllegro, Data warehouse appliances, Data warehousing, Database compression | Leave a Comment |
Data warehouse appliance power user TEOCO
If you had to name super-high-end users of data warehouse technology, your list might start with a few retailers, credit data processors, and telcos, plus the US intelligence establishment. Well, it turns out that TEOCO runs outsourced data warehouses for several of the top US telcos, making it one of the top data warehouse technology users around.
A few weeks ago, I had a fascinating chat with John Devolites of TEOCO. Highlights included:
- TEOCO runs a >200 TB DATAllegro warehouse for a major US telco. (When we hear about a big DATAllegro telco site that’s been in production for a while, that’s surely the one they’re talking about.)
- TEOCO runs around 450 TB total of DATAllegro databases across its various customers. (When Stuart Frost blogs of >400 TB “systems,” that may be what he’s talking about.)
- TEOCO likes DATAllegro better than Netezza, although the margin is now small. This is mainly for financial reasons, specifically price-per-terabyte. When TEOCO spends its own money without customer direction as to appliance brand, it buys DATAllegro.
- TEOCO runs at least one 50 TB Netezza system — originally due to an acquisition of a Netezza user — with more coming. There also is more DATAllegro coming.
- TEOCO feels 15-30 concurrent users is the current practical limit for both DATAllegro and Netezza. That’s greater than it used to be.
- Netezza is a little faster than DATAllegro on a few esoteric queries, but the difference is not important to TEOCO’s business.
- Official price lists notwithstanding, TEOCO sees prices as being in the $10K/TB range. DATAllegro’s price advantage has shrunk greatly, as others have come down to more or less match. However, since John stated his price preference for DATAllegro as being in the present tense, I presume the price match isn’t perfect.
- Teradata was never a serious consideration, for price reasons.
- In the original POC a few years ago, the incumbent Oracle — even after extensive engineering — couldn’t get an important query down under 8 hours of running time. DATAllegro and Netezza both handled it in 2-3 minutes. Similarly, Oracle couldn’t get the load time for 100 million call detail records (CDRs) below 24 hours.
- Applications sound pretty standard for telecom: Lots of CDR processing — 550 million/day on the big DATAllegro system cited above. Pricing and fraud checking. Some data staging for legal reasons (giving the NSA what it subpoenas and no more).
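A quick back-of-the-envelope check on the POC and workload numbers above, using only the quoted figures. Note that the POC was a few years before the 550 million/day figure, so comparing the two is illustrative rather than like-for-like.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Query time: more than 8 hours on Oracle vs. 2-3 minutes on the appliances.
oracle_minutes = 8 * 60
speedup_low = oracle_minutes / 3    # against the 3-minute figure
speedup_high = oracle_minutes / 2   # against the 2-minute figure
print(f"query speedup: at least {speedup_low:.0f}x to {speedup_high:.0f}x")

# Load rate: 100 million CDRs took Oracle more than 24 hours.
oracle_cdrs_per_second = 100_000_000 / SECONDS_PER_DAY
# Daily volume on the big DATAllegro system.
required_cdrs_per_second = 550_000_000 / SECONDS_PER_DAY
print(f"Oracle load ceiling in the POC: under {oracle_cdrs_per_second:,.0f} CDRs/second")
print(f"550 million/day averages {required_cdrs_per_second:,.0f} CDRs/second")
```

In other words, the appliances were at least two orders of magnitude faster on that query, and the current daily CDR volume averages more than five times the load rate Oracle failed to reach in the POC.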
| Categories: Analytic technologies, DATAllegro, Data mart outsourcing, Data warehouse appliances, Data warehousing, Netezza, Specific users, TEOCO | Leave a Comment |
Netezza on compression
Phil Francisco put up a nice post on Netezza’s company blog about a month ago, explaining the Netezza compression story. Highlights include:
- Like other row-based vendors, Netezza compresses data on a column-by-column basis, then stores the results in rows. This is obviously something of a limitation — no run-length encoding for them — but can surely accommodate several major compression techniques.
- The Netezza “Compress Engine” compresses data on a block-by-block basis. This is a disadvantage for row-based systems vs. columnar ones in the area of compression, because columnar systems have more values per block to play with, and that yields higher degrees of compression. And among row-based systems, typical block size is an indicator of compression success. Thus, DATAllegro probably does a little better at compression than Netezza, and Netezza does a lot better at compression than Teradata.
- Netezza calls its compression “compilation.” The blog post doesn’t make the reason clear. And the one reason I can recall confuses me. Netezza once said the compression extends at least somewhat to columns with calculated values. But that seems odd, as Netezza only has a very limited capability for materialized views.
- Netezza pays the processing cost of compression in the FPGA, not the microprocessor. And so Netezza spins the overhead of the Compress Engine as being zero or free. That’s actually not ridiculous, since Netezza seems to have still-unused real estate on the FPGA for new features like compression.
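The points about run-length encoding and block size can be illustrated with a toy example. This is a minimal sketch, not Netezza's Compress Engine or any vendor's actual algorithm; the table, its values, and the block size of 4 are all made up for illustration.

```python
from itertools import groupby


def rle(values):
    """Run-length encode a sequence into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]


def rle_in_blocks(values, block_size):
    """Encode each fixed-size block independently, as a block-by-block compressor would."""
    runs = []
    for start in range(0, len(values), block_size):
        runs.extend(rle(values[start:start + block_size]))
    return runs


# Hypothetical table of (state, order_id) rows, sorted by state.
rows = [("MA", 1), ("MA", 2), ("MA", 3), ("MA", 4), ("MA", 5),
        ("NY", 6), ("NY", 7), ("NY", 8)]

state_column = [state for state, _ in rows]          # columnar layout of one column
row_major = [field for row in rows for field in row]  # row layout: fields interleaved

print(rle(state_column))                    # [('MA', 5), ('NY', 3)]
print(len(rle(row_major)))                  # 16: interleaving leaves no runs to encode
print(len(rle_in_blocks(state_column, 4)))  # 3: the MA run is split at the block boundary
```

Stored contiguously, the state column collapses to two runs. Interleaved with other fields in row order, it has no runs at all, which is why run-length encoding is off the table for row stores. And when each block is encoded separately, smaller blocks cut runs at their boundaries, so fewer values per block means less compression.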
