Data warehousing
Analysis of issues in data warehousing, with extensive coverage of database management systems and data warehouse appliances that are optimized to query large volumes of data. Related subjects include:
Notes on EMC’s Greenplum subsidiary
I spent considerable time last week with my clients at both Greenplum and EMC (if we ignore the fact that the deal has closed and they’re now the same company). I also had more of a hardcore engineering discussion than I’ve had with Greenplum for quite a while (I should have been pushier about that earlier). Takeaways included:
- This is starting off as a honeymoon deal. Everything Greenplum was planning to do is being continued. Additional resources are being poured into Greenplum to do more.
- Some Greenplum execs seem to envision staying long term, some seem to envision moving on to their next startups. The ones who envision moving on are, however, going to work hard first to make the merger a success.
- Greenplum has, for quite a while, had more of an advanced analytics/embedded predictive modeling story than I realized. Bad on them for not fleshing it out more in marketing and product packaging alike.
- Greenplum both denies the concurrency problems I previously noted and also has a very credible story as to how it will eliminate them. 🙂 Seriously, Greenplum tells of one customer that routinely runs 150 simultaneous queries – on what I think is not a terribly big system — and a number of POCs (Proofs of Concept) that simulated similar levels of concurrency.
Categories: Analytic technologies, Data warehousing, EMC, Greenplum | 1 Comment |
Teradata, Xkoto Gridscale (RIP), and active-active clustering
Having gotten a number of questions about Teradata’s acquisition of Xkoto, I leaned on Teradata for an update, and eventually connected with Scott Gnau. Takeaways included:
- Teradata is discontinuing Xkoto’s existing product Gridscale, which Scott characterized as being too OLTP-focused to be a good fit for Teradata. Teradata hopes and expects that existing Xkoto Gridscale customers won’t renew maintenance. (I’m not sure that they’ll even get the option to do so.)
- The point of Teradata’s technology + engineers acquisition of Xkoto is to enhance Teradata’s active-active or multi-active data warehousing capabilities, which it has had in some form for several years.
- In particular, Teradata wants to tie together different products in the Teradata product line. (Note: Those typically all run pretty much the same Teradata database management software, except insofar as they might be on different releases.)
- Scott rattled off all the plausible areas of enhancement, with multiple phrasings – performance, manageability, ease of use, tools, features, etc.
- Teradata plans to have one or two releases based on Xkoto technology in 2011.
Frankly, I’m disappointed at the struggles of clustering efforts such as Xkoto Gridscale or Continuent’s pre-Tungsten products, but if the DBMS vendors meet the same needs themselves, that’s OK too.
The logic behind active-active database implementations actually seems pretty compelling: Read more
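The core appeal is easy to illustrate: in an active-active setup every node is live and accepting queries, so the workload can be load-balanced across them, and losing a node merely shrinks capacity instead of causing an outage. Here is a deliberately minimal, hypothetical sketch of that routing idea in Python — it is not how Teradata, Xkoto Gridscale, or any real clustering product is implemented, just the shape of the concept:

```python
import itertools

class ActiveActiveRouter:
    """Toy query router over active-active replicas: all replicas accept
    queries, so a node failure degrades capacity rather than availability."""

    def __init__(self, replicas):
        self.replicas = list(replicas)
        self.healthy = set(self.replicas)
        self._cycle = itertools.cycle(self.replicas)

    def mark_down(self, replica):
        # Simulate a node failure; traffic silently shifts to survivors.
        self.healthy.discard(replica)

    def route(self, query):
        # Round-robin over healthy replicas only.
        for _ in range(len(self.replicas)):
            node = next(self._cycle)
            if node in self.healthy:
                return node, query
        raise RuntimeError("no healthy replicas")

router = ActiveActiveRouter(["warehouse_a", "warehouse_b"])
print(router.route("SELECT ...")[0])  # either replica can serve the query
router.mark_down("warehouse_a")
print(router.route("SELECT ...")[0])  # all traffic now lands on warehouse_b
```

The same idea extends naturally to routing by workload type or replica freshness, which is presumably where the harder engineering (and the Xkoto talent) comes in.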
Categories: Clustering, Continuent, Data warehousing, Solid-state memory, Teradata, Theory and architecture, Xkoto | 9 Comments |
Advice for some non-clients
Edit: Any further anonymous comments to this post will be deleted. Signed comments are permitted as always.
Most of what I get paid for is in some form or other consulting. (The same would be true for many other analysts.) And so I can be a bit stingy with my advice toward non-clients. But my non-clients are a distinguished and powerful group, including in their number Oracle, IBM, Microsoft, and most of the BI vendors. So here’s a bit of advice for them too.
Oracle. On the plus side, you guys have been making progress against your reputation for untruthfulness. Oh, I’ve dinged you for some past slip-ups, but on the whole they’ve been no worse than other vendors’. But recently you pulled a doozy. The analyst reports section of your website fails to distinguish between unsponsored and sponsored work.* That is a horrible ethical stumble. Fix it fast. Then put processes in place to ensure nothing that dishonest happens again for a good long time.
*Merv Adrian’s “report” listed high on that page is actually a sponsored white paper. That Merv himself screwed up by not labeling it clearly as such in no way exonerates Oracle. Besides, I’m sure Merv won’t soon repeat the error — but for Oracle, this represents a whole pattern of behavior.
Oracle. And while I’m at it, outright dishonesty isn’t your only unnecessary credibility problem. You’re also playing too many games in analyst relations.
HP. Neoview will never succeed. Admit it to yourselves. Go buy something that can. Read more
Microstrategy technology notes
Earlier this week, Microstrategy made Mark LaRow available to talk about technology. The proximate reason was my recent mention of Microstrategy’s mobile BI emphasis, but we also touched on Microstrategy’s approach to in-memory business intelligence and some other subjects. We didn’t go into the depth of a similar conversation I had recently with Qlik Technologies, but I found it quite interesting even so.
Highlights of the in-memory BI discussion included:
- Microstrategy’s in-memory BI data structure is some kind of simple array, redundantly called a “vector array.” A more precise description was not available.
- While early versions of the capability have been around since 2002, Microstrategy’s in-memory BI capability only got serious with Microstrategy 9, which was released in Q1 of 2009. In particular, Microstrategy 9 was the first time in-memory BI had full security.
- Mark says a core reason for having their own in-memory BI is that Microstrategy has more smarts to predict which aggregates will or won’t be needed. Strictly speaking, that can’t be argued with. Vendors like Infobright would argue they come close enough to that ideal as to make little practical difference – but I’m also cheating by naming Infobright, which is particularly focused in that direction.
- Microstrategy in-memory BI compresses data by about 2X. Mark didn’t know which compression algorithm was used.
- The limitation on what’s in-memory is, of course, how much RAM you can fit on an SMP box. Microstrategy has seen up to ½ terabyte deployments.
- In-memory Microstrategy data structures are typically built during the batch window, for performance reasons. This is not, strictly speaking, mandatory, but I didn’t get a sense that Microstrategy was being used for much that resembled real-time business intelligence.
- Mark said Microstrategy has no interest in using solid-state memory to expand the reach of its in-memory BI. Frankly, if Microstrategy doesn’t change that stance, its in-memory BI capabilities are unlikely to stay significant for too many years.
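To see why a precomputed in-memory aggregate structure pays off, consider this hypothetical Python sketch. It is emphatically not MicroStrategy’s actual “vector array” (which, as noted, they wouldn’t describe precisely) — just an illustration of the general technique: aggregate once during the batch window into a flat typed array, then answer dashboard queries with an index lookup instead of a scan:

```python
from array import array
from collections import defaultdict

# Raw fact rows: (region_id, revenue). In a BI tool these would be
# pulled from the data warehouse during the batch window.
facts = [(0, 120.0), (1, 75.5), (0, 30.0), (2, 210.0), (1, 14.5)]

# Pre-aggregate once into a flat, typed array indexed by region_id --
# the kind of simple in-memory structure that makes lookups O(1).
totals = defaultdict(float)
for region, revenue in facts:
    totals[region] += revenue
agg = array("d", (totals[i] for i in range(max(totals) + 1)))

# Dashboard-time "query": a single array index, no scan of fact rows.
print(agg[0])  # 150.0 -- total revenue for region 0
```

The catch, of course, is guessing in advance which aggregates to build — which is exactly the “smarts” Mark was claiming — and fitting the result in RAM on one SMP box.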
Another key subject we discussed was Microstrategy’s view of dashboards. Read more
Categories: Business intelligence, Data warehousing, Memory-centric data management, MicroStrategy | Leave a Comment |
False-positive alerts, non-collaborative BI, inaccurate metrics, and what to do about them
I’ve been hinting at some points for quite a long time, without really spelling them out in written form. So let’s fix that. I believe:
- “Push” alerting technology could be much more granular and useful, but is being held back by the problem of false positives.
- Metrics passed down from on high didn’t work too well in Stalin’s USSR, and haven’t improved sufficiently since.
- A large, necessary piece of the solution to both problems is a great engine for setting and modifying metrics definitions.
I shall explain. Read more
Categories: Analytic technologies, Business intelligence, Data warehousing, MicroStrategy, Theory and architecture | 10 Comments |
Breakthrough: Exadata now has as many reference accounts as Aster Data!
According to Bob Evans of Information Week, there now are 15 disclosed Exadata reference accounts. Coincidentally, there are exactly 15 logos on Aster Data’s customer page. So on its own, that’s not a particularly impressive piece of information.
But other highlights of his column include:
- Some of those accounts are rather big-name. However, I’m not at all sure whether they’re actual production references.
- Andy Mendelsohn characterizes the sweet spot of Exadata’s market as “virtual private cloud.” That matches what Juan Loaiza told me six months ago.
- Oracle claims numerous competitive wins for Exadata. Let me hasten to note that one vendor’s “competitive win” is another vendor’s “our salesman read the deal as an unfavorable one and chose not to compete,” or even sometimes “Huh? We never heard about that deal.” That said, what I’m hearing is that Exadata is indeed a much stronger competitor than it used to be.
- Oracle claims a near $1 billion sales run rate for Exadata. No doubt, a large majority of those are hardware upgrades for existing Oracle database customers, often from non-Sun/Oracle hardware. Even so, some of those are surely deals that would have migrated away from Oracle in the pre-Exadata past.
Categories: Aster Data, Data warehousing, Exadata, Market share and customer counts, Oracle | 1 Comment |
How I’m planning to package user services
On the Monash Research business website right now, you can find multiple pages explaining and extolling our vendor consulting services. We even have posted standard contracts that:
- Are concise.
- Are priced in terms of units of work, yet do not require me to meter services at precise hourly or daily rates.
- Have a minimum scope that allows me to feel comfortable I’m spending enough time with a client to do good work.
- Extend over time, mimicking the subscription model of analyst services.*
- Do not contain any concept of “work for hire,” transfer of intellectual property, or “we own your brain.”
- Don’t have any other features that are stunningly inappropriate for our business.
By way of contrast, the user services portion of our site is only a few lines long, and that’s beginning to hurt. Read more
Categories: About this blog, Analytic technologies, Business intelligence, Data warehousing | 6 Comments |
More on Greenplum and EMC
I talked with Ben Werther of Greenplum for about 40 minutes, which was my first post-merger Greenplum/EMC briefing. “Historical” highlights include:
- Ben says Greenplum wasn’t being shopped, by which he means Greenplum was out raising more capital and the fund-raising was going well. Note: Half or so of Greenplum’s deals were subscription-priced, so it had weaker cash flow than it would have if it were doing equally well selling perpetual licenses.
- However, joint engineering was also going well with, e.g., Greenplum CTO Luke Lonergan spending time at EMC facilities in Cork, Ireland. And one thing led to another …
- Greenplum has ~ 140 customers, vs. ~65 five quarters ago, 100+ at year-end, and an acquisition rate of 12-15/quarter last fall.
- A typical “small” paying customer for Greenplum starts with 10-20 TB of data.
- Greenplum Chorus isn’t generally available yet, with rollout energy being focused on Greenplum 4.0. Note: As important as it is for overall industry direction, Greenplum Chorus is a product which won’t be a terribly big deal in Release 1 anyway.
Highlights looking forward include: Read more
Categories: Data warehouse appliances, Data warehousing, EMC, Greenplum, Market share and customer counts | 7 Comments |
Will a data warehouse DBMS consolidation happen?
Naturally, people are wondering whether the Greenplum/EMC deal will trigger further consolidation in the analytic DBMS industry. Here is a lightly edited version of an IM chat I just had on the subject.
CurtMonash: I think consolidation is inevitable, and this deal is just a piece of it. That’s more like a “Yes” than a “No”, but I think “trigger” is overstated.
CurtMonash: Participants with good reasons for surviving include Oracle, Microsoft, IBM, Sybase, Teradata, Netezza, Greenplum, Vertica, Aster, and more. That’s too many to all remain as independent companies. (Edit: Infobright becomes a full member of that list if its Release 4 goes well.)
CurtMonash: Some will buy each other. HP needs to buy somebody at some point. Dell and Cisco are the ones who might feel a bit pushed to make acquisitions if their competitors’ stacks are too successful.
CurtMonash: I think successful vendors will feel embarrassed if they can’t beat the price DATAllegro got. 😉
CurtMonash: I also think ParAccel, Kickfire, and Calpont would be worth more acquired than independent.
CurtMonash: I don’t think the EMC/ParAccel deal was significant enough for ParAccel to have much to lose. 😉 (Edit: But everything is relative.)
CurtMonash: Kickfire laid off its salespeople. It needs to be bought soon.
Categories: Data warehousing | Leave a Comment |
Why analytic DBMS increasingly need to be storage-aware
In my quick reactions to the EMC/Greenplum announcement, I opined
I think that even software-only analytic DBMS vendors should design their systems in an increasingly storage-aware manner
promising to explain what I meant later on. So here goes. Read more