The TPC-H benchmark is a blight upon the industry
ParAccel has released a 30,000-gigabyte TPC-H benchmark, and no less a sage than Merv Adrian paid attention. Now, the TPCs may have had some use in the 1990s. Indeed, Merv was my analyst relations contact for a visit to my clients at Sybase around the time, 1996 or so, when I was advising Sybase on how to market against its poor benchmark results. But TPCs are worthless today.
It’s not just that TPC results are highly tuned. (ParAccel’s claim of “load-and-go” is laughable. Edit: Looking at Appendix A of the full disclosure report, maybe it’s more justified than I thought.) It’s also not just that different analytic database management products perform very differently on different workloads, making the TPC-H not much of an indicator of anything real-life. The biggest problem is: Most TPC benchmarks are run on absurdly unrealistic hardware configurations.
For example, if you look at some details, the ParAccel 30-terabyte benchmark ran on 43 nodes, each with 64 gigabytes of RAM and 24 terabytes of disk. That’s 961,124.9 gigabytes of disk, officially, for a 32:1 disk/data ratio. By way of contrast, real-life analytic DBMS with good compression often have disk/data ratios of well under 1:1.
Meanwhile, the RAM:data ratio is around 1:11. It’s clear that ParAccel’s early TPC-H benchmarks ran entirely in RAM; indeed, ParAccel even admits that. And so I conjecture that ParAccel’s latest TPC-H benchmark ran (almost) entirely in RAM as well. Once again, this would illustrate that the TPC-H is irrelevant to judging an analytic DBMS’s real-world performance.
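To make the arithmetic explicit, here is the back-of-envelope behind both ratios, using only the configuration figures disclosed above. Written as a query, it runs as-is on, for example, PostgreSQL:

```sql
-- Ratio arithmetic for the ParAccel 30-terabyte TPC-H configuration:
-- 43 nodes, 64 GB RAM per node, 961,124.9 GB of disk, 30,000 GB of data.
SELECT
    961124.9 / 30000.0    AS disk_to_data,   -- ~32.0, i.e. the 32:1 disk/data ratio
    43 * 64               AS total_ram_gb,   -- 2,752 GB of RAM across the cluster
    30000.0 / (43 * 64)   AS data_to_ram;    -- ~10.9, i.e. the roughly 1:11 RAM:data ratio
```

At roughly 11 gigabytes of data per gigabyte of RAM, an (almost) entirely in-RAM run would seem to imply compression on the order of 11:1 for whatever data the queries touch, which fits the conjecture above.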
More generally — I would not advise anybody to consider ParAccel’s product, for any use, except after a proof-of-concept in which ParAccel was not given the time and opportunity to perform extensive off-site tuning. I tend to feel that way about all analytic DBMS, but it’s a particular concern in the case of ParAccel.
Categories: Analytic technologies, Benchmarks and POCs, Buying processes, Columnar database management, Data warehousing, Database compression, ParAccel
H-Store is now VoltDB
I’ve always honored more of an NDA about the H-Store project and its commercialization than I really felt obligated to, given how freely information was being bandied about to others. I’m still doing so. 🙂
But I think I’ll at least say that the H-Store project is now named VoltDB. The VoltDB website names two individuals — Mike Stonebraker and Andy Palmer — both of whom are founders of Vertica. Job listings on the site are for field engineer and trainer, but not developer, so that suggests something about the project’s/product’s maturity level.
If you have an extreme OLTP need, you should talk to VoltDB. If you don’t have access to Mike or Andy directly, I can hook you up with a key VoltDB marketing/outreach guy. Price may not be as much of a barrier as you’d initially fear.
If anybody from VoltDB wants to be less cloak-and-daggery and say more in the comment thread, I’d be pleased.
And yes — an open-secret working name for H-Store/VoltDB was, for a while, “Horizontica.”
Categories: In-memory DBMS, Memory-centric data management, OLTP, Vertica Systems, VoltDB and H-Store
Apparent turmoil at EnterpriseDB
EnterpriseDB seems to be facing a string of management departures:
- Bob Zurek, EnterpriseDB’s well-regarded CTO, is gone. (He landed at Infobright, after a stint of independent consulting.)
- Multiple rumors have founder Andy Astor leaving EnterpriseDB and stepping back to an advisory role. One version has Tuesday, June 16 as Andy’s last day. Update: As of Wednesday, June 17, Andy Astor is no longer listed as being on EnterpriseDB’s management team.
- Fred Holahan, who was briefly VP of Marketing, is not listed on EnterpriseDB’s management team web page. And EnterpriseDB announced a new VP of Marketing and Product Management on May 21.
- Other rumors point to turmoil at EnterpriseDB as well.
And by the way, EnterpriseDB, which used to call itself “the Oracle-compatible database company,” recently licensed out what used to be its core differentiating technology.
Now, this isn’t all bad news. EnterpriseDB’s Oracle-compatibility focus had to be changed anyway. And Fred Holahan was the proximate cause for me writing:
my recent dealings with EnterpriseDB underscore the importance of being VERY careful about counting your fingers after you shake hands with that company.
Still, these aren’t exactly indicators of a company executing on a smooth-running plan.
Categories: EnterpriseDB and Postgres Plus, Open source
Aster Data on parallelism
Aster Data’s core claim boils down to “We do parallelism better.” Aster has shied away from saying that for marketing purposes, for fear of the response “Yeah, right, everybody says that.” But when I talked with Mayank Bawa, Steve Wooledge, et al. yesterday, I focused discussions on just that point. Based on that chat and others before it, here are some highlights (as I understand them) of what Aster claims, believes, or believes to be differentiated about its nCluster technology.
Categories: Analytic technologies, Aster Data, Data warehousing, MapReduce, Parallelization, Theory and architecture
An example of what’s wrong with big vendors’ approaches to BI (SAP in this case)
I just found Chris Kanaracus’ article about SAP’s rollout last month of its “clear enterprises” strategy. The money quote comes from Sara Lee, the user SAP seems to have trotted out:
But Sara Lee has not yet decided to purchase the software, and there are substantial underlying tasks to perform as well, he added.
“This is giving us the horsepower [to analyze data] but we need to have harmonized and structured data underneath it.”
This is from the leading test user of the product?
Business intelligence and the associated data management processes need to be reimagined, and I’m increasingly coming to suspect that the big BI conglomerates aren’t up to the task.
Categories: Analytic technologies, Business intelligence, SAP AG, Specific users, Theory and architecture
Google Fusion Tables
Google has announced an experimental cloud-based data management system called Fusion Tables. A press article and Slashdot thread ensued, based on some bizarre-sounding analyst quotes that I will not attempt to parse.
What Fusion Tables really seems to be is a spreadsheet without the formulae. That is, it’s a place to dump data in a grid of cells, comment on it, version it, and do elementary data manipulation. This could, I guess, be useful as an alternative to traditional RDBMS — assuming, of course, that you want to have a row-by-row debate about 100 megs of data.
Seriously, while Google Fusion Tables bears some vague resemblance to what I’m thinking about for the future of both business intelligence and data marts, it sounds as if it has a long way to go before it’s something most enterprises should spend time looking at.
Categories: Analytic technologies, Google, Theory and architecture
Two lessons from Dataupia’s troubles
I’ve been beating my head against the wall trying to convince startups of two well-established truisms:
- Experience consistently shows that the demand for transparency/emulation features isn’t as great as entrepreneurs hope.
- If a startup’s competitors sell directly to enterprises, an indirect sales strategy rarely succeeds.
Maybe one startup or another will learn from Dataupia’s example.
Dataupia’s troubles are now confirmed
Todd Fin pointed me yesterday to an article by Wade Roush that confirmed in detail layoffs and other troubles at Dataupia. The article quotes Dataupia marketing VP Samantha Stone as saying Dataupia is down to 23 employees, and that some of the layoffs were in engineering. This is consistent with what I’d been hearing for a while, namely that other analytic DBMS vendors were seeing a flood of Dataupia resumes, especially technical ones.
The article goes on to discuss difficulties Dataupia has had in raising another round of financing. During Dataupia’s very long CEO search — which I kept hearing about from people who’d been approached for the job — it was obvious money wouldn’t come in until a CEO was found. But it seems that even with a new CEO, existing investors are reluctant to re-up without a new investor as well, and that new investment is slow in happening.
On the plus side, the article quotes Samantha as saying founder Foster Hinshaw is recovering well from his heart surgery.
Categories: Data warehouse appliances, Data warehousing, Dataupia, Emulation, transparency, portability
Netezza Q1 earnings call transcript
I finally read the Netezza Q1 earnings call transcript, put out by Seeking Alpha. Highlights included:
- Netezza got 14 new-name accounts and 21 follow-on deals. Average sale in both groups was right around $1 million.
- The economy is tough, deals are slipping, and nobody knows for sure what will happen.
- Netezza’s main head-to-head competitors are Oracle and Teradata. Netezza claims good but not perfect win rates against each, but concedes that those vendors (especially Oracle) of course get other deals Netezza never sees.
- Netezza characterizes Teradata as offering multiple product lines, trying to upsell many customers from cheaper to more expensive ones, and being selectively aggressive about pricing. None of this is surprising to me.
- 80% of Netezza’s Q1 revenue, and perhaps even a higher fraction of new-name accounts, was in four vertical markets: “Digital media,” telecom, government, and financial services.
- Some time over the next few months, Netezza will give at least some more clarity about future products.
One tip for the Netezza folks, by the way, from this former stock analyst — you should never use the word “certainly” about a deal you haven’t closed yet. “Almost surely” could be OK, but “certainly” — well, it certainly was not the thing to say.
Aster Data sticks by its SQL/MapReduce guns
Aster Data continues to think that MapReduce, integrated with SQL, is an important technology. For example:
- Aster announced today that it’s providing .NET support for SQL/MapReduce. Perhaps not coincidentally, Aster’s biggest customer is MySpace, which is apparently a big Microsoft shop. (And MySpace parent Fox Interactive Media is a SQL/MapReduce fan, albeit running on Greenplum.)
- Aster generally puts more emphasis on MapReduce than SQL/MapReduce rival Greenplum. That’s a non-trivial comparison, because Greenplum is making progress in SQL/MapReduce itself.
- When talking with Aster folks, I hear a lot about SQL/MapReduce.
I was a big fan of SQL/MapReduce when it was first announced last August. Notwithstanding persuasive examples favoring pure DBMS or pure MapReduce over DBMS/MapReduce integration, I continue to think the SQL/MapReduce idea has great potential. But I do wish more successful production examples would become visible …
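Since these posts keep referring to SQL/MapReduce without showing it, here is a sketch of what an invocation looks like, modeled on the sessionization example in Aster's published SQL/MapReduce paper. The clickstream table and its columns are hypothetical, and argument names may differ across product versions:

```sql
-- Hypothetical SQL/MapReduce invocation, patterned on Aster's published
-- sessionization example. Table and column names are invented for illustration.
SELECT user_id, click_time, session_id
FROM sessionize(
    ON clickstream              -- the input rows fed to the MapReduce function
    PARTITION BY user_id        -- each user's clicks are processed together
    ORDER BY click_time         -- within a partition, rows arrive in time order
    TIMECOLUMN('click_time')    -- function argument: which column holds the timestamp
    TIMEOUT(1800)               -- function argument: inactivity gap (presumably in seconds) that ends a session
);
```

The appeal of the pattern is that sessionize runs in parallel across the PARTITION BY groups, while everything around it stays ordinary SQL.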