Is MapReduce a good underpinning for next-gen scientific DBMS?
Back in November, Mike Stonebraker suggested that there’s a need for database management advances to serve “big science”. He said:
Obviously, the best solution to these … problems would be to put everything in a next-generation DBMS — one capable of keeping track of data, metadata, and lineage. Supporting the latter would require all operations on the data to be done inside the DBMS with user-defined functions — Postgres-style.
Categories: Data types, MapReduce, Scientific research | Leave a Comment |
A passionate defense of MapReduce
Mark Chu-Carroll has weighed in with a passionate defense of MapReduce. I only see one thing he got wrong, which was to overlook the great shared-nothing parallelism of today’s data warehouse appliances and specialty data warehouse DBMS. But that doesn’t detract from his overall point, which is that MapReduce is designed to help with parallel computing in general, not database querying in particular.
He also has the best version I know of an old observation, namely:
… [relational database] people have found the most beautiful, wonderful, perfect hammer in the whole world. It’s perfectly balanced – not too heavy, not too light, and swings just right to pound in a nail just right every time. The grip is custom-made, fitted to the shape of the owners hand, so that they can use it all day without getting any blisters. It’s also beautifully decorated – encrusted with gemstones and gold filigree – but only in places that won’t detract from how well it works as a hammer. It really is the greatest hammer ever. Relational database guys love their hammer. It’s just such a wonderful tool! And when they make something with it, it really comes out great. In fact, they like it so much that they think it’s the only tool they need. If you give them a screw, they’ll just pound it in like it’s a nail. And when you point out to them that dammit, it’s a screw, not a nail, they’ll say “I know that. But you can’t expect me to use a crappy little screwdriver when I have a magnificent hammer like this!”
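To make the "parallel computing in general" point concrete, here is a minimal single-machine sketch of the map/shuffle/reduce pattern, applied to a word count rather than to anything resembling a database query. The function names (`map_phase`, `shuffle`, `reduce_phase`, `map_reduce`) and the thread pool are my own illustrative assumptions, not Google's implementation or any real framework's API.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_phase(chunk):
    """Emit (word, 1) pairs for one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(mapped_chunks):
    """Group emitted values by key, as a framework would between the phases."""
    groups = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values seen for one key into a single result."""
    return key, sum(values)


def map_reduce(chunks, workers=4):
    """Run the map step and the reduce step on a pool of workers.

    Hypothetical stand-in for a cluster: each chunk is mapped
    independently, and each key is reduced independently.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_phase, chunks))
        groups = shuffle(mapped)
        reduced = pool.map(lambda item: reduce_phase(*item), groups.items())
        return dict(reduced)


if __name__ == "__main__":
    chunks = ["a hammer is a tool", "a screwdriver is a tool too"]
    print(map_reduce(chunks))
```

Nothing here knows about tables, schemas, or SQL. The only contract is that the map step works on each chunk independently and the reduce step works on each key independently, which is what lets the work spread across many machines.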
Categories: Data models and architecture, Database diversity, MapReduce, Parallelization | Leave a Comment |
14 reasons not to use MySQL or other mid-range database management systems
I may argue for the use of open source and other mid-range database management systems, but a lot of industry sentiment remains on the other side. Vendors of high-end RDBMS naturally advocate enterprise-wide single-vendor adoption. Many CIOs and industry analysts, overwhelmed by product proliferation, think that’s a neat idea as well.
And in fairness, they’re not entirely wrong. Here are 14 reasons for using high-end relational database management systems, even on applications for which mid-range DBMS would suffice. Read more
Categories: Microsoft and SQL*Server, Mid-range, MySQL, OLTP, Open source, Oracle, PostgreSQL | 25 Comments |
Is Teradata bringing out a low-end data warehouse appliance?
Edit: This post is superseded by our analysis of the new Teradata 2500 data warehouse appliance.
One of Teradata’s competitors believes they got an accurate leak about a new low-end Teradata appliance. Teradata is neither confirming nor denying. I believe the leak.
I’m not going to give product or pricing details, which in any case could be subject to change before a final product release. But the general idea is:
- Commodity Dell servers.
- Some of the higher-end software stripped out.
- Limit on the number of nodes, leading to a database size limit somewhere in the tens of terabytes.
It will be interesting to see whether Teradata can come out with something that’s closely competitive in price, performance, and administrative ease to what the newer data warehouse appliance vendors offer, yet upgrades cleanly to full-sophistication Teradata systems for those who choose to pursue that path.
Categories: Data warehouse appliances, Data warehousing, Teradata | 1 Comment |
What leading DBMS vendors don’t want you to realize
For very high-end applications, the list of viable database management systems is short. Scalability can be a problem. (The rankings of most scalable alternatives differ in the OLTP and data warehouse realms.) Extreme levels of security can be had from only a few DBMS. (Oracle would have you believe there’s only one choice.) And if you truly need 99.99% uptime, there are only a few DBMS you should even consider.
But for most applications at any enterprise – and for all applications at most enterprises – super high-end DBMS aren’t required. There are relatively few applications that wouldn’t run perfectly well on PostgreSQL or EnterpriseDB today. Ingres and Progress OpenEdge aren’t far behind (they’re a little lacking in datatype support). Ditto InterSystems Caché, although the nonrelational architecture will be off-putting to many. And to varying degrees, you can also do fine with MySQL, Pervasive PSQL, MaxDB, or a variety of other products – or for that matter with the cheap or free crippled versions of Oracle, SQL Server, DB2, and Informix.
What’s more, these mid-range database management systems can have significant advantages over their high-end brethren. Read more
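For a sense of how demanding the 99.99% uptime figure mentioned above is, the arithmetic can be sketched as follows. This is plain calculation, assuming a 365-day year and counting all downtime, planned or not; the function name is my own.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def allowed_downtime_minutes(availability_pct):
    """Minutes of downtime per year permitted at a given availability level."""
    return (1 - availability_pct / 100) * MINUTES_PER_YEAR


if __name__ == "__main__":
    for pct in (99.0, 99.9, 99.99):
        minutes = allowed_downtime_minutes(pct)
        print(f"{pct}% uptime allows about {minutes:.1f} minutes of downtime per year")
```

At 99.99%, the budget works out to less than an hour per year, which is why so few products are credible at that level.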
Variants on SimpleDB
Ralf describes SimpleDB, a project for an open source/desktop equivalent, a .NET version, and so on. Who knew that there was so much need for a database manager that could easily lose your data forever (with simple programming errors) and that is a lead-pipe cinch to repeatedly misplace it for a while (the built-in latency issues)?
To wit: Read more
Categories: Amazon and its cloud | 1 Comment |
Will Brighthouse become the MySQL data warehouse of choice?
As I’ve previously noted:
- Infobright is about to make more noise about its MySQL-based data warehouse software, Brighthouse.
- Brighthouse has some very interesting technical features.
- A Sun/Infobright partnership would make a lot of sense.
Talking with Infobright today, I was again struck by how close their relationship with MySQL (the company) is. Stay tuned.
Categories: Analytic technologies, Data warehousing, Infobright, MySQL | Leave a Comment |
Infobright is gearing up for a press push
There’s another TDWI conference coming up, so it’s time for data warehouse-related press rollouts. Infobright (one of my many clients in this area) will be doing one of them, and ran an early version by me. Customer announcements, vendor partnerships, and so on are still being finalized, but anyhow Infobright has 7 revenue-recognized customers and a bunch more that are sold and in the implementation cycle. There’s a Release 3 of Brighthouse coming up. As one would expect, Release 3’s major claims to fame are the general addition of features (including some which elicit a “You didn’t have that already?” reaction), plus huge performance improvements in some queries (i.e., the biggest bottlenecks in Brighthouse Release 2).
On that level, it’s all standard stuff, as is Infobright’s core pitch — ease, simplicity, low cost, etc., and the benefits of same. But drilling down, there are some rather unique technical claims. Read more
Categories: Analytic technologies, Data warehousing, Infobright | 1 Comment |
More Google reliability woes
Google’s reliability issues are ever worse. As I previously pointed out, this is evidence against the notion that MapReduce is a replacement for established DBMS.
Categories: Google | 2 Comments |
MapReduce for data mining? Maybe for variable-schema analytics.
Rich Skrenta is quite a successful entrepreneur, so it’s likely that he doesn’t really mean the more ridiculous parts of this rant on the MapReduce debate. E.g., he cheerfully disregards the fact that the data warehouse appliance vendors have ALREADY disrupted the market he’s focusing on. Index-light row-based and columnar systems are both super fast at data mining extracts.
But let’s go straight to the one interesting thing he said, Read more
Categories: Analytic technologies, MapReduce, Parallelization, SAS Institute | 2 Comments |