MapReduce links
For whatever reason, I seem to be making the peripheral posts about MapReduce tonight before getting to the meat of the issues. So be it. There’s a rich set of links out there about MapReduce, and here are some of the best of them:
- Aster Data introduced MapReduce integrated into its SQL data warehouse DBMS tonight. Aster’s site features an excellent white paper.
- Exactly the same is true of Greenplum.
- Google Labs offers the seminal MapReduce research paper. It also has a broken link to an associated slide presentation, which fortunately is available here.
- One can get a good sense of MapReduce by reading up on the open source implementation Hadoop.
- In particular, this list of Hadoop applications is the longest list of MapReduce applications I know of (ahead even of Google’s long internal list).
- Joel Spolsky explained the core MapReduce concept a couple of years ago.
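For readers who would rather see the core idea than read about it, here is a minimal single-machine sketch of the map, shuffle, and reduce steps, using the usual word-count example. The function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical and chosen for illustration; this is not the API of Hadoop, Google's implementation, Aster Data, or Greenplum, and it does none of the parallel distribution that makes real MapReduce interesting.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map step: emit one (word, 1) pair for every word in a document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: collapse the values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence over a list of documents.

    In a real system each map call and each reduce call could run on a
    different node; here everything runs in one process for clarity.
    """
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["the quick brown fox", "the lazy dog"]))
```

The point of the structure is that each `map_phase` call depends only on its own document and each `reduce_phase` call depends only on its own key, which is what lets a real implementation spread the work across many machines.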
Categories: MapReduce, Parallelization | 8 Comments |
MapReduce sound bites
Last Thursday, Greenplum and Aster Data (the two most recent of my numerous data warehouse specialist customers) both told me of the same major innovation. Each was rushing to announce it first, before anybody else did. This led to considerable tap dancing, with the upshot being that both are releasing the information tonight or tomorrow morning.
What’s going on is that Aster Data and Greenplum have both integrated MapReduce into their respective MPP shared-nothing data warehouse DBMS. Read more
Categories: Analytic technologies, Aster Data, Greenplum, MapReduce, Parallelization | 11 Comments |
Greenplum’s single biggest customer
Greenplum offered a bit of clarification regarding the usage figures I posted last night. Everything on the list is in production, except that:
- One Greenplum customer is at 400 terabytes now, and upgrading to >1 petabyte “as we speak.”
- Greenplum’s other soon-to-be >1 petabyte customer isn’t in production yet. (Greenplum previously told me that customer was in the process of loading data right now.)
Categories: Data warehousing, Fox and MySpace, Greenplum, Petabyte-scale data management, Specific users | 3 Comments |
Greenplum is in the big leagues
After a March, 2007 call, I didn’t talk again with Greenplum until earlier this month. That changed fast. I flew out to see Greenplum last week and spent over a day with president/co-founder Scott Yara, CTO/co-founder Luke Lonergan, marketing VP Paul Salazar, and product management/marketing director Ben Werther. Highlights – besides some really great sushi at Sakae in Burlingame – start with an eye-opening set of customer proof points, such as: Read more
Categories: Analytic technologies, Data warehouse appliances, Data warehousing, Greenplum, Petabyte-scale data management, PostgreSQL | 19 Comments |
My current customer list among the data warehouse specialists
One of my favorite pages on the Monash Research website is the list of many current and a few notable past customers. (Another favorite page is the one for testimonials.) For a variety of reasons, I won’t undertake to be more precise about my current customer list than that. But I don’t think it would hurt anything to list the data warehouse DBMS/appliance specialists in the group. They are:
- Aster Data
- Calpont
- DATAllegro
- Greenplum
- Infobright
- Netezza
- ParAccel
- Teradata
- Vertica
All of those are Monash Advantage members.
If you care about all this, you may also be interested in the rest of my standards and disclosures.
Categories: About this blog, Aster Data, Calpont, Data warehousing, DATAllegro, Greenplum, Infobright, Netezza, ParAccel, Teradata, Vertica Systems | 3 Comments |
Kevin Closson doesn’t like MPP
Kevin Closson of Oracle offers a long criticism of the popularity of MPP. Key takeaways include:
- TPC-H benchmarks that show Oracle as somewhat superior to DB2 are highly significant.
- TPC-H benchmarks in which MPP vendors destroy Oracle are too unimportant to even mention.
- SMP did better than MPP the last time he was in a position to judge (which evidently was some time during the Clinton Administration), so it surely must still be superior for all purposes today.
Categories: Data warehousing, Oracle, Parallelization | 20 Comments |
The Explosion in DBMS Choice
If there’s one central theme to DBMS2, it’s that modern DBMS alternatives should in many cases be used instead of the traditional market leaders. So it was only a matter of time before somebody sponsored a white paper on that subject. The paper, sponsored by EnterpriseDB, is now posted along with my other recent white papers. Its conclusion — summarizing what kinds of database management system you should use in which circumstances — is reproduced below.
Many new applications are built on existing databases, adding new features to already-operating systems. But others are built in connection with truly new databases. And in the latter cases, it’s rare that a market-leading product is the best choice. Mid-range DBMS (for OLTP) or specialty data warehousing systems (for analytics) are usually just as capable, and much more cost-effective. Exceptions arise mainly in three kinds of cases:
- Small enterprises with very limited staff.
- Large enterprises that have negotiated heavily-discounted deals for a market-leading product.
- Super-high-end OLTP apps that need absolute top throughput (or security certifications, etc.)
Otherwise, the less costly products are typically the wiser choice. Read more
Categories: Database diversity | 7 Comments |
Three happy 100 terabyte-plus customers for DATAllegro
Over on my Network World blog, I asked the question “So who are DATAllegro’s actual current customers?” As regular readers know, that’s a fairly hard question to answer. TEOCO is widely known as DATAllegro’s flagship reference, but after that the list gets thin in a hurry.
As a by-the-by to other discussions, DATAllegro’s Stuart Frost undertook to respond in part himself. Specifically, he gave me the names of two other happy customers that are or imminently will be running DATAllegro against 100+ terabytes of user data. Read more
Categories: Data warehouse appliances, Data warehousing, DATAllegro, DBMS product categories | Leave a Comment |
Exasol technical briefing
It took 5 ½ months after my non-technical introduction, but I finally got a briefing from Exasol’s technical folks (specifically, the very helpful Mathias Golombek and Carsten Weidmann). Here are some highlights: Read more
Categories: Analytic technologies, Benchmarks and POCs, Columnar database management, Data warehousing, Exasol, In-memory DBMS, Memory-centric data management, Pricing | 1 Comment |
A NoteWorthy win for Intersystems Cache’
A small Microsoft SQL Server-based medical application vendor called NoteWorthy Medical Systems bought a small Intersystems Cache’-based medical application vendor called Mars Medical Systems. NoteWorthy then decided to rebuild its product line on Intersystems Cache’. A press release ensued.*
*In general, my criticisms of Intersystems’ stealth marketing are beginning to be relaxed. On the other hand, if you want to be technical, I still haven’t actually talked with the company for years …
I spoke briefly with Mark Conner, founder of Mars Medical and now EVP of NoteWorthy, about why he so loves Cache’. (I asked what he disliked about the product; his response was an emphatic “Nothing”.) It basically boils down to two reasons:
- Mark thinks hierarchical data models are a great fit for medical applications. For example, the application’s UI (and local schema) look quite different depending on which particular complaints or diagnoses apply to particular patient visits.
- Cache’ just runs and runs w/o DBA intervention. Mark cited a figure of two support engineers for Mars Medical, supporting over 1,000 medical (largely group) practices, almost none of which have DBAs.
The latter feature is crucial to small ISVs selling application software to even smaller users, and is a big part of why Progress and Intersystems have large share in that market. More generally, it’s the most important and common technical advantage that mid-range database management systems enjoy versus the market leaders. (The other big advantage, of course, is pricing.)