Analytic technologies
Discussion of technologies related to information query and analysis. Related subjects include:
- Business intelligence
- Data warehousing
- (in Text Technologies) Text mining
- (in The Monash Report) Data mining
- (in The Monash Report) General issues in analytic technology
Why MapReduce matters to SQL data warehousing
Greenplum and Aster Data have both just announced the integration of MapReduce into their SQL MPP data warehouse products. So why do I think this could be a big deal? The short answer is “Because MapReduce offers dramatic performance gains in analytic application areas that still need great performance speed-up.” The long answer goes something like this.
The core ideas of MapReduce are: Read more
| Categories: Analytic technologies, Data warehousing, MapReduce, Parallelization | 24 Comments |
MapReduce sound bites
Last Thursday, Greenplum and Aster Data (the two most recent of my numerous data warehouse specialist customers) both told me of the same major innovation. Each was rushing to announce it first. This led to considerable tap dancing, with the upshot being that both are releasing the information tonight or tomorrow morning.
What’s going on is that Aster Data and Greenplum have both integrated MapReduce into their respective MPP shared-nothing data warehouse DBMS. Read more
| Categories: Analytic technologies, Aster Data, Greenplum, MapReduce, Parallelization | 11 Comments |
Greenplum’s single biggest customer
Greenplum offered a bit of clarification regarding the usage figures I posted last night. Everything on the list is in production, except that:
- One Greenplum customer is at 400 terabytes now, and upgrading to >1 petabyte “as we speak.”
- Greenplum’s other soon-to-be >1 petabyte customer isn’t in production yet. (Greenplum previously told me that customer was in the process of loading data right now.)
| Categories: Data warehousing, Fox and MySpace, Greenplum, Petabyte-scale data management, Specific users | 3 Comments |
Greenplum is in the big leagues
After a March, 2007 call, I didn’t talk again with Greenplum until earlier this month. That changed fast. I flew out to see Greenplum last week and spent over a day with president/co-founder Scott Yara, CTO/co-founder Luke Lonergan, marketing VP Paul Salazar, and product management/marketing director Ben Werther. Highlights – besides some really great sushi at Sakae in Burlingame – start with an eye-opening set of customer proof points, such as: Read more
| Categories: Analytic technologies, Data warehouse appliances, Data warehousing, Greenplum, Petabyte-scale data management, PostgreSQL | 19 Comments |
My current customer list among the data warehouse specialists
One of my favorite pages on the Monash Research website is the list of many current and a few notable past customers. (Another favorite page is the one for testimonials.) For a variety of reasons, I won’t undertake to be more precise about my current customer list than that. But I don’t think it would hurt anything to list the data warehouse DBMS/appliance specialists in the group. They are:
- Aster Data
- Calpont
- DATAllegro
- Greenplum
- Infobright
- Netezza
- ParAccel
- Teradata
- Vertica
All of those are Monash Advantage members.
If you care about all this, you may also be interested in the rest of my standards and disclosures.
| Categories: About this blog, Aster Data, Calpont, Data warehousing, DATAllegro, Greenplum, Infobright, Netezza, ParAccel, Teradata, Vertica Systems | 3 Comments |
Kevin Closson doesn’t like MPP
Kevin Closson of Oracle offers a long criticism of the popularity of MPP. Key takeaways include:
- TPC-H benchmarks that show Oracle as somewhat superior to DB2 are highly significant.
- TPC-H benchmarks in which MPP vendors destroy Oracle are too unimportant to even mention.
- SMP did better than MPP the last time he was in a position to judge (which evidently was some time during the Clinton Administration), so it surely must still be superior for all purposes today.
| Categories: Data warehousing, Oracle, Parallelization | 20 Comments |
Three happy 100 terabyte-plus customers for DATAllegro
Over on my Network World blog, I asked the question “So who are DATAllegro’s actual current customers?” As regular readers know, that’s a fairly hard question to answer. TEOCO is widely known as DATAllegro’s flagship reference, but after that the list gets thin in a hurry.
As a by-the-by to other discussions, DATAllegro CEO Stuart Frost undertook to respond in part himself. Specifically, he gave me the names of two other happy customers that are or imminently will be running DATAllegro against 100+ terabytes of user data. Read more
| Categories: Data warehouse appliances, Data warehousing, DATAllegro, DBMS product categories | Leave a Comment |
Exasol technical briefing
It took 5 ½ months after my non-technical introduction, but I finally got a briefing from Exasol’s technical folks (specifically, the very helpful Mathias Golombek and Carsten Weidmann). Here are some highlights: Read more
| Categories: Analytic technologies, Benchmarks and POCs, Columnar database management, Data warehousing, Exasol, In-memory DBMS, Memory-centric data management, Pricing | 1 Comment |
Patent nonsense in the data warehouse DBMS market
There are two recent patent lawsuits in the data warehouse DBMS market. In one, Sybase is suing Vertica. In another, an individual named Cary Jardin (techie founder of XPrime, a sort of predecessor company to ParAccel) is suing DATAllegro. Naturally, there’s press coverage of the DATAllegro case, due in part to its surely non-coincidental timing right after the Microsoft acquisition was announced and in part to a vigorous PR campaign around it. And the Sybase case so excited a troll who calls himself Bill Walters that he posted identical references to it on about 12 different threads in this blog, as well as to a variety of Vertica-related articles in the online trade press. But I think it’s very unlikely that either of these cases turns out to matter much. Read more
| Categories: Columnar database management, Data warehousing, Database compression, DATAllegro, Sybase, Vertica Systems | 7 Comments |
Compare/contrast of Vertica, ParAccel, and Exasol
I talked with Exasol today (at 5:00 am!) and of course want to blog about it. For clarity, I’d like to start by comparing/contrasting the fundamental data structures at Vertica, ParAccel, and Exasol. And it feels like that should be a separate post. So here goes.
- Exasol, Vertica, and ParAccel all store data in columnar formats.
- Exasol, Vertica, and ParAccel all compress data heavily.
- Exasol, Vertica, and ParAccel all (perhaps to varying extents) operate on in-memory data in compressed formats.
- ParAccel and Exasol write data to what amounts to the in-memory part of their basic data structures; the data then gets persisted to disk. Vertica, however, has a separate in-memory data structure to accept data and write it to disk.
- Vertica is a disk-centric system that doesn’t rely on there being a lot of RAM.
- ParAccel can be described that way too; however, in some cases (including on the TPC-H benchmarks), ParAccel recommends loading all your data into RAM for maximum performance.
- Exasol is totally optimized for the assumption that queries will be run against data that has already been loaded into RAM.
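To make the "columnar", "compressed", and "operate on in-memory data in compressed formats" points a bit more concrete, here is a minimal Python sketch. It is purely illustrative: run-length encoding is just a convenient stand-in, the sample rows are made up, and nothing here reflects the actual storage formats or code of Vertica, ParAccel, or Exasol.

```python
from itertools import groupby

def rle_encode(column):
    """Run-length encode a column into (value, run_length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(column)]

def count_equal(encoded, target):
    """Count rows equal to target by reading runs, without decompressing."""
    return sum(length for value, length in encoded if value == target)

def column_sum(encoded):
    """Sum a numeric column directly from its run-length encoding."""
    return sum(value * length for value, length in encoded)

# Hypothetical row-oriented data: (region, amount).
rows = [("east", 10), ("east", 10), ("east", 20), ("west", 20), ("west", 20)]

# Columnar layout: each column is stored (and compressed) on its own.
region_col = [region for region, _ in rows]
amount_col = [amount for _, amount in rows]
region_rle = rle_encode(region_col)
amount_rle = rle_encode(amount_col)

# Queries touch only the columns they need, and work on the runs themselves.
west_rows = count_equal(region_rle, "west")
total_amount = column_sum(amount_rle)
```

A system that decompresses on arrival in RAM would instead expand each column back to one entry per row before scanning it; the sketch shows why skipping that step can save both memory and work when columns contain long runs.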
Beyond the above, I plan to discuss in a separate post how Exasol does MPP shared-nothing software-only columnar data warehouse database management differently than Vertica and ParAccel do shared-nothing software-only columnar data warehouse database management. 🙂
