Microsoft and SQL*Server
Microsoft’s efforts in the database management, analytics, and data connectivity markets. Related subjects include:
- DATAllegro, which is being bought by Microsoft
- (in Text Technologies) Microsoft in the search, online media, and social software markets
- (in The Monash Report) Strategic issues for Microsoft, and Microsoft Office
- (in Software Memories) Historical notes on Microsoft
My current customer list among the analytic DBMS specialists
(This is an updated version of an August, 2008 post.)
One of my favorite pages on the Monash Research website is the list of many current and a few notable past customers. (Another favorite page is the one for testimonials.) For a variety of reasons, I won’t undertake to be more precise about my current customer list than that. But I don’t think it would hurt anything to list the analytic/data warehouse DBMS/appliance specialists in the group. They are:
- Aster Data
- Greenplum
- Infobright
- Kickfire
- Kognitio
- Microsoft
- Netezza (my biggest client this year, probably, because of all the Enzee Universe appearances)
- Sybase
- Teradata
- Vertica
- Attivio, which may or may not be construed as being in the analytic DBMS business
- Clearpace, ditto
All of those are Monash Advantage members.
If you care about all this, you may also be interested in the rest of my standards and disclosures.
| Categories: About this blog, Aster Data, Data warehousing, Greenplum, Infobright, Kickfire, Microsoft and SQL*Server, Netezza, Sybase, Teradata, Vertica Systems | 2 Comments |
The future of data marts
Greenplum is announcing today a long-term vision, under the name Enterprise Data Cloud (EDC). Key observations around the concept — mixing mine and Greenplum’s together — include:
- Data marts aren’t just for performance (or price/performance). They also exist to give individual analysts or small teams control of their analytic destiny.
- Thus, it would be really cool if business users could have their own analytic “sandboxes” — virtual or physical analytic databases that they can manipulate without breaking anything else.
- In any case, business users want to analyze data when they want to analyze it. It is often unwise to ask business users to postpone analysis until after an enterprise data model can be extended to fully incorporate the new data they want to look at.
- Whether or not you agree with that, it’s an empirical fact that enterprises have many legacy data marts (or even, especially due to M&A, multiple legacy data warehouses). Similarly, it’s an empirical fact that many business users have the clout to order up new data marts as well.
- Consolidating data marts onto one common technological platform has important benefits.
In essence, Greenplum is pitching the story:
- Thesis: Enterprise Data Warehouses (EDWs)
- Antithesis: Data Warehouse Appliances
- Synthesis: Greenplum’s Enterprise Data Cloud vision
When put that starkly, it’s overstated, not least because
Specialized Analytic DBMS != Data Warehouse Appliance
But basically it makes sense, for two main reasons:
- Analysis is performed on all sorts of novel data, from sources far beyond an enterprise’s core transactions. This data neither has to fit nor particularly benefits from being tightly fitted into the core enterprise data model. Requiring it to do so is just an unnecessary and painful bureaucratic delay.
- On the other hand, consolidation can be a good idea even when systems don’t particularly interoperate. Data marts, which commonly do in part interoperate with central data stores, have all the more reason to be consolidated onto a central technology platform/stack.
Reinventing business intelligence
I’ve felt for quite a while that business intelligence tools are due for a revolution. But I’ve found the subject daunting to write about because — well, because it’s so multifaceted and big. So to break that logjam, here are some thoughts on the reinvention of business intelligence technology, with no pretense of being in any way comprehensive.
Natural language and classic science fiction
Actually, there’s a pretty well-known example of BI near-perfection — the Star Trek computers, usually voiced by the late Majel Barrett Roddenberry. They didn’t have a big role in the recent movie, which was so fast-paced nobody had time to analyze very much, but were a big part of the Star Trek universe overall. Star Trek’s computers integrated analytics, operations, and authentication, all with a great natural language/voice interface and visual displays. That example is at the heart of a 1998 article on natural language recognition I just re-posted.
As for reality: For decades, dating back at least to Artificial Intelligence Corporation’s Intellect, there have been offerings that provided “natural language” command, control, and query against otherwise fairly ordinary analytic tools. Such efforts have generally fizzled, for reasons outlined at the link above. Wolfram Alpha is the latest try; fortunately for its prospects, natural language is really only a small part of the Wolfram Alpha story.
A second theme has more recently emerged — using text indexing to get at data more flexibly than a relational schema would normally allow, either by searching on data values themselves (stressed by Attivio) or more by searching on the definitions of pre-built reports (the Google OneBox story). SAP’s Explorer is the latest such view, but I find Doug Henschen’s skepticism about SAP Explorer more persuasive than Cindi Howson’s cautiously favorable view. Partly that’s because I know SAP (and Business Objects); partly it’s because of difficulties such as those I already noted.
Flexibility and data exploration
It’s a truism that each generation of dashboard-like technology fails because it’s too inflexible. Users are shown the information that will provide them with the most insight. They appreciate it at first. But eventually it’s old hat, and when they want to do something new, the baked-in data model doesn’t support it.
The latest attempts to overcome this problem lie in two overlapping trends — cool data exploration/visualization tools, and in-memory analytics.
| Categories: Analytic technologies, Business intelligence, Google, Memory-centric data management, Microsoft and SQL*Server, SAP AG | 12 Comments |
Notes on CEP application development
While performance may not be all that great a source of CEP competitive differentiation, event processing vendors find plenty of other bases for technological competition, including application development, analytics, packaged applications, and data integration. In particular:
- Most independent CEP vendors have some kind of application story in the capital markets vertical, such as packaged applications, ISV partners with packaged applications, application frameworks, and so on.
- CEP vendors offer lots of connectors to specific financial industry price/quote/trade feeds, as well as the usual other kinds of database connectivity (SQL, XML, etc.)
- Aleri/Coral8 (separately and now together) like to call attention to their business intelligence/analytics offerings. Analytics is front-and-center on Truviso’s web site too, not that Truviso does much to call attention to itself, period. (Roman Bukary once said he’d outline Truviso’s new strategy to me in 6-8 weeks or so … it’s now 14 months and counting.)
So far as I can tell, the areas of applications and analytics are fairly uncontroversial. Different CEP vendors have implemented different kinds of things, no doubt focusing on those they thought they would find easiest to build and then sell. But these seem to be choices in business execution, not in core technical philosophy.
In CEP application development, however, real philosophical differences do seem to arise. There are at least three different CEP application development paradigms:
| Categories: Aleri and Coral8, Business intelligence, Complex event processing (CEP), Microsoft and SQL*Server, Progress, Apama, and DataDirect, StreamBase | 5 Comments |
Microsoft announced CEP this week too
Microsoft still hasn’t worked out all the kinks regarding when and how intensely to brief me. So most of what I know about their announcement earlier this week of a CEP/stream processing product* is what I garnered on a consulting call in March. That said, I sent Microsoft my notes from that call, they responded quickly and clearly to my question as to what remained under NDA, and for good measure they included a couple of clarifying comments that I’ll copy below.
*“in the SQL Server 2008 R2 timeframe,” about which Microsoft wrote “the first Community Technology Preview (CTP) of SQL Server 2008 R2 will be available for download in the second half of 2009 and the release is on track to ship in the first half of calendar year 2010.”
Perhaps it is more than coincidence that IBM rushed out its own announcement of an immature CEP technology — due to be more mature in a 2010 release — immediately after Microsoft revealed its plans. Anyhow, taken together, these announcements support my theory that the small independent CEP/stream processing vendors are more or less ceding broad parts of the potential stream processing market.
The main use cases Microsoft talks about for CEP are in the area of sensor data.
| Categories: Analytic technologies, Application areas, Complex event processing (CEP), Microsoft and SQL*Server | 6 Comments |
Database implications if IBM acquires Sun
Reported or rumored merger discussions between IBM and Sun are generating huge amounts of discussion today (some links below). Here are some quick thoughts around the subject of how the IBM/Sun deal — if it happens — might affect the database management system industry.
DATAllegro sales price: $275 million
According to a press release announcing a venture capitalist’s job change,
Microsoft purchased DATAllegro for $275 million
Technically, that needn’t shut down the rumor mill altogether, since given the way deals are structured and reported, it’s unlikely that Microsoft actually cut checks to DATAllegro stockholders in the aggregate amount of $275 million promptly after the close of the acquisition.
Still, it’s a data point of some weight.
Hat tip to Mark Myers.
Closing the book on the DATAllegro customer base
I’m prepared to call an end to the “Guess DATAllegro’s customers” game. Bottom line is that there are three in all, two of which are TEOCO and Dell, and the third of which is a semi-open secret. I wrote last week:
The number of DATAllegro production references is expected to double imminently, from one to two. Few will be surprised at the identity of the second reference. I imagine the number will then stay at two, as DATAllegro technology is no longer being sold, and the third known production user has never been reputed to be particularly pleased with it.
Dell did indeed disclose at TDWI that it was a large DATAllegro user, notwithstanding that Dell is a huge Teradata user as well. No doubt, Dell is gearing up to be a big user of Madison too.
Also at TDWI, I talked with some former DATAllegro employees who now work for rival vendors. None thinks DATAllegro has more than three customers. Neither do I.
| Categories: DATAllegro, Data warehouse appliances, Data warehousing, Market share, Microsoft and SQL*Server, Specific users | 8 Comments |
Data warehousing business trends
I’ve talked with a whole lot of vendors recently, some here at TDWI, as well as users, fellow analysts, and so on. Repeated themes include:
| Categories: Analytic technologies, Application areas, Data mart outsourcing, Data warehousing, Microsoft and SQL*Server, MySQL, Oracle, Teradata, eBay | Leave a Comment |
Microsoft SQL Server Fast Track
Stuart Frost of Microsoft (né DATAllegro) checked in, with Microsoft’s TDWI-timed announcements. The news part was something called “SQL Server Fast Track”, which is the Microsoft SQL Server equivalent to Oracle’s “recommended configurations” or IBM’s “BCUs.” SQL Server Fast Track is further being portrayed as an incremental step toward Madison, Microsoft’s future high-end data warehousing offering.
| Categories: Data warehousing, Microsoft and SQL*Server, Pricing | 4 Comments |
Draft slides on how to select an analytic DBMS
I need to finalize an already-too-long slide deck on how to select an analytic DBMS by late Thursday night. Anybody see something I’m overlooking, or just plain got wrong?
Edit: The slides have now been finalized.
Gartner’s 2009 Magic Quadrant for Business Intelligence
A few days ago I tore into the Gartner Magic Quadrant for Data Warehouse DBMS. Well, the 2009 Gartner Magic Quadrant for Business Intelligence Platforms is out too. (Link here. Last year’s here. Hat tip for both to Doug Henschen.) Unlike the data warehouse MQ, Gartner’s BI MQ clusters its “Leaders” together tightly. But while less bold, the Business Intelligence Magic Quadrant’s claims are just as questionable as those in data warehousing.
Of course, some parts do make sense. E.g.:
Gartner’s 2008 data warehouse database management system Magic Quadrant is out
Gartner’s annual Magic Quadrant for data warehouse DBMS is out. Thankfully, vendors don’t seem to be taking it as seriously as usual, so I didn’t immediately hear about it. (I finally noticed it in a Greenplum pay-per-click ad.) Links to Gartner MQs tend to come and go, but as of now here are two working links to the 2008 Gartner Data Warehouse Database Management System MQ. My posts on the 2007 and 2006 MQs have also been updated with working links.
Beyond query
I sometimes describe database management systems as “big SQL interpreters,” because that’s the core of what they do. But it’s not all they do, which is why I describe them as “electronic file clerks” too. File clerks don’t just store and fetch data; they also put a lot of work into neatening, culling, and generally managing the health of their information hoards.
Already 15 years ago, online backup was as big a competitive differentiator in the database wars as any particular SQL execution feature. Security became important in some market segments. Reliability and availability have been important from the get-go. And manageability has been crucial ever since Microsoft lapped Oracle in that regard, back when SQL Server had little else to recommend it except price.*
*Before Oracle10g, the SQL Server vs. Oracle manageability gap was big.
Now data warehousing is demanding the same kinds of infrastructure richness.*
| Categories: Data warehousing, Microsoft and SQL*Server, Oracle | 1 Comment |
Big scientific databases need to be stored somehow
A year ago, Mike Stonebraker observed that conventional DBMS don’t necessarily do a great job on scientific data, and further pointed out that different kinds of science might call for different data access methods. Even so, some of the largest databases around are scientific ones, and they have to be managed somehow. For example:
- Microsoft just put out an overwrought press release. The substance seems to be that Pan-STARRS — a Jim Gray legacy also discussed in an August, 2008 Computerworld article — is adding 1.4 terabytes of image data per night, and one not so new database adds 15 terabytes per year of some kind of computer simulation output used to analyze protein folding. Both run on SQL Server, of course.
- Kognitio has an astronomical database too, at Cambridge University, adding half a terabyte of data per night.
- Oracle is used for a McGill University proteomics database called CellMapBase. A figure of 50 terabytes of “mass storage” is included, which doesn’t include tape backup and so on.
- The Large Hadron Collider, once it actually starts functioning, is projected to generate 15 petabytes of data annually, which will be initially stored on tape and then distributed to various computing centers around the world.
- Netezza is proud of its ability to serve images and the like quickly, although off the top of my head I’m not thinking of a major customer it has in that area. (But then, if you just sell software, your academic discount can approach 100%; but if like Netezza you have an actual cost of goods sold, that’s not as appealing an option.)
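To put the growth figures above on a common footing, here is a back-of-the-envelope sketch in Python. It assumes data arrives on every night of a 365-day year (which surely overstates telescope uptime) and uses decimal units, with 1 petabyte = 1,000 terabytes. The function and variable names are mine, purely for illustration.

```python
# Rough annualized growth for the figures quoted in the list above.
# Assumptions (not from the sources): 365 ingest nights per year, 1 PB = 1000 TB.
NIGHTS_PER_YEAR = 365
TB_PER_PB = 1000


def annual_tb(tb_per_night: float, nights: int = NIGHTS_PER_YEAR) -> float:
    """Convert a nightly ingest rate in terabytes to terabytes per year."""
    return tb_per_night * nights


growth_tb_per_year = {
    "Pan-STARRS images (SQL Server)": annual_tb(1.4),
    "Protein-folding simulation output (SQL Server)": 15.0,
    "Cambridge astronomy (Kognitio)": annual_tb(0.5),
    "Large Hadron Collider (projected)": 15 * TB_PER_PB,
}

# Print from slowest-growing to fastest-growing.
for name, tb in sorted(growth_tb_per_year.items(), key=lambda kv: kv[1]):
    print(f"{name}: about {tb:,.0f} TB/year")
```

Under those assumptions, the nightly telescope feeds land in the hundreds of terabytes per year, well above the simulation database and well below the collider's projected output.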
Long-term, I imagine that the most suitable DBMS for these purposes will be MPP systems with strong datatype extensibility — e.g., DB2, PostgreSQL-based Greenplum, PostgreSQL-based Aster nCluster, or maybe Oracle.
| Categories: Aster Data, Data types, Greenplum, IBM and DB2, Kognitio, Microsoft and SQL*Server, Netezza, Oracle, Parallelization, PostgreSQL, Scientific research | 1 Comment |
Multiple approaches to memory-centric analytics
Memory-centric analytic processing is in the spotlight.
- Microsoft’s big analytics announcement for the week (one of them, anyway), is “Gemini,” which evidently amounts to some kind of in-memory, cube-based analytics, but with columns rather than true cubes as the in-memory data structure.
- That sounds a lot like SAP’s BI Accelerator, which is a way to manifest SAP InfoCubes in-memory in a columnar architecture.
- QlikTech is going gangbusters with memory-centric business intelligence.
- IBM/Cognos’ Applix, which has a rather unique approach to memory-centric cubes, has never lived up to its potential. But now people are being reminded it exists.
- Exasol has made some sales with a highly memory-centric approach to data warehousing. Kognitio’s story is somewhat disk/RAM hybrid (disk is certainly involved, but the best parts of the technology deal with what happens once the data gets into RAM).
- Most of what the CEP (Complex Event Processing, aka event/stream processing) industry does is memory-centric analytics, both via tight integration with operational apps and for conventional BI.
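The "columns rather than true cubes" distinction in the first two bullets can be illustrated with a toy sketch. It is only an illustration under my own assumptions: the sample data and the function names (`build_cube`, `columnar_sum`) are hypothetical, and nothing here reflects how Gemini, BI Accelerator, or any other product named above is actually implemented.

```python
from collections import defaultdict

# Toy fact table held in memory column by column (one list per column).
# The data is made up for illustration.
columns = {
    "region":  ["East", "East", "West", "West", "East"],
    "product": ["A",    "B",    "A",    "B",    "A"],
    "sales":   [100,    150,    200,    50,     25],
}


def build_cube(cols, dims, measure):
    """Cube-style: precompute one total per combination of the chosen dimensions."""
    cube = defaultdict(int)
    for i, value in enumerate(cols[measure]):
        key = tuple(cols[d][i] for d in dims)
        cube[key] += value
    return dict(cube)


def columnar_sum(cols, group_by, measure):
    """Column-style: scan only the two columns needed and aggregate at query time."""
    totals = defaultdict(int)
    for key, value in zip(cols[group_by], cols[measure]):
        totals[key] += value
    return dict(totals)


cube = build_cube(columns, ("region", "product"), "sales")
by_region = columnar_sum(columns, "region", "sales")
by_product = columnar_sum(columns, "product", "sales")
```

In this sketch the cube can only answer questions about the dimensions chosen when it was built, while the columnar scan can group by any column on demand, at the cost of doing the aggregation work at query time.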
| Categories: Analytic technologies, Memory-centric data management, Microsoft and SQL*Server | 3 Comments |
Advance sound bites on the Microsoft/DATAllegro announcement
Microsoft said they’d prebrief me on at least the DATAllegro part of tomorrow’s SQL Server announcements, but that didn’t turn out to happen (at least as of 9 pm Eastern time Sunday night). An embargoed press release did just arrive, but it’s so concise and high-level as to contain almost nothing of interest.
So I might as well post sound bites in advance. Here goes:
- With the DATAllegro acquisition, Microsoft leapfrogged Oracle. But with Exadata, Oracle leapfrogged Microsoft back. Exadata is actually shipping.
- There’s no assurance that the first DATAllegro/Microsoft release will inherit SQL Server’s level of concurrency. After all, DATAllegro/Ingres wasn’t as concurrent as plain Ingres.
- Porting DATAllegro from Ingres to SQL Server is likely to be straightforward. If they screw up it will be because they tried to do too much else at the same time, not because the basic port failed.
- Porting DATAllegro from Linux to Windows should also be OK. DATAllegro doesn’t stress the operating system in the areas where Windows remains weak.
- Earlier this year, DATAllegro had exactly one customer known to be in production, but I’ve spoken with that one. It’s TEOCO, which has a multi-hundred terabyte DATAllegro installation. TEOCO is a very price-oriented buyer.
- DATAllegro reports that two more customers are in production with large systems now. Neither of those is believed by industry sources to be especially in love with DATAllegro. Otherwise, nobody seems able and willing to identify other DATAllegro customers.
I’m going to be pretty busy Monday anyway. Linda is having a bit of oral surgery. And if I get back from that in time, I have calls set up with a couple of clients.
| Categories: DATAllegro, Data warehouse appliances, Data warehousing, Microsoft and SQL*Server | 2 Comments |
Microsoft/DATAllegro time frame announced
Edit: Actually, an email did eventually wend its way to me about a day later, which evidently had run into major congestion somewhere in the intertubes.
My resolve to eschew scathing sarcasm is being sorely tested tonight. The latest trial is my discovery that nobody thought to so much as email me a press release, let alone brief me, on Microsoft’s announcement of a timetable for DATAllegro/SQL Server integration. Per Ina Fried (with a hat tip to anonymous commenter L.J.), Microsoft says:
The final version of that product is slated for the first half of 2010, though Microsoft said it will begin giving customers and partners access to early “community technology preview” releases within the next 12 months.
| Categories: DATAllegro, Data warehousing, Microsoft and SQL*Server | Leave a Comment |
A NoteWorthy win for Intersystems Cache’
A small Microsoft SQL Server-based medical application vendor called NoteWorthy Medical Systems bought a small Intersystems Cache’-based medical application vendor called Mars Medical Systems. NoteWorthy then decided to rebuild its product line on Intersystems Cache’. A press release ensued.*
*In general, my criticisms of Intersystems’ stealth marketing are beginning to be relaxed. On the other hand, if you want to be technical, I still haven’t actually talked with the company for years …
I spoke briefly with Mark Conner, founder of Mars Medical and now EVP of NoteWorthy, about why he so loves Cache’. (I asked what he disliked about the product; his response was an emphatic “Nothing”.) It basically boils down to two reasons:
- Mark thinks hierarchical data models are a great fit for medical applications. For example, the application’s UI (and local schema) look quite different depending on which particular complaints or diagnoses apply to particular patient visits.
- Cache’ just runs and runs w/o DBA intervention. Mark cited a figure of two support engineers for Mars Medical, supporting over 1,000 medical (largely group) practices, almost none of which have DBAs.
The latter feature is crucial to small ISVs selling application software to even smaller users, and is a big part of why Progress and Intersystems have large share in that market. More generally, it’s the most important and common technical advantage that mid-range database management systems generally enjoy versus the market leaders. (The other big advantage, of course, is pricing.)
| Categories: Intersystems and Cache', Microsoft and SQL*Server, Mid-range | 2 Comments |
Further thoughts on DATAllegro/Microsoft
My first, biggest thought about DATAllegro’s acquisition by Microsoft is “Why the ____ did it have to happen while I was trying to relax on my annual Cayman vacation???” Not coincidentally, I don’t plan to neatly cross-link all my posts and so on about DATAllegro/Microsoft until I get back to Acton this weekend.
One linking screwup is that I previously forgot to mention that, in addition to the numerous posts here, I also made several DATAllegro/Microsoft-related posts on my Network World blog A World of Bytes. They include:
| Categories: Analytic technologies, DATAllegro, Data warehousing, Microsoft and SQL*Server | 7 Comments |
Other early coverage of Microsoft/DATAllegro
- Here’s the official press release on DATAllegro’s site, and Microsoft’s.
- Doug Henschen of Intelligent Enterprise has a good article. He got quotes from Microsoft claiming that SQL Server on its own would be able to handle 10s of terabytes of data in the next release, but DATAllegro was needed to get up to the 100s of terabytes. That said, the quotes don’t say whether that’s user data or total disk usage — the latter frankly seems more plausible.
- James Kobielus of Forrester has a long post on the Microsoft/DATAllegro deal, emphasizing product packaging issues and glossing over technological differentiators. (Edit: The post seems down as of Friday midday.)
- This is a few weeks old, but Kevin Closson is extremely skeptical of some of DATAllegro’s technical claims. (Not that it matters much if he’s right — more nodes = more throughput, no matter how much Oracle folks rant.)
- Eric Lai of Computerworld gets it right.
- Larry Dignan thinks the acquisition is part of an overall strong Microsoft enterprise push.
- William McKnight thinks Microsoft usually does a good job of integrating acquisitions.
- DATAllegro CEO Stuart Frost is happy.
- David Hunter thinks Microsoft will blithely continue with DATAllegro’s limited-hardware-support strategy. He’s almost certainly wrong.
- Philip Howard says almost nothing I agree with, although I can’t argue with the part:
Conversely, it’s bad news for Ingres, bad news for Oracle, bad news for IBM, bad news for Teradata and bad news for HP, all for obvious reasons. As for the other appliance vendors: they will not be too happy either. In particular, we now have to consider who can survive on their own, who might be acquired, who might do the acquiring, and who is going to disappear.
| Categories: DATAllegro, Data warehousing, Microsoft and SQL*Server | 15 Comments |
DATAllegro could provide Microsoft with a true enterprise data warehouse sooner than you think
Jim Ericson of DM Review emailed the excellent questions:
Does DATAllegro give MSFT full-service high end data warehousing capability? If not, what is missing?
My quick answers are:
- No.
- Two things:
  - Hard-core multi-user concurrency.
  - Support for more esoteric analytic tools and functionality.
Both are largely a matter of product maturity, and as a young company DATAllegro isn’t quite there yet.
That said, integration with Microsoft SQL Server is apt to be a big help in addressing both issues.
How will Oracle save its data warehouse business?
By acquiring DATAllegro, Microsoft has seriously leapfrogged Oracle in data warehouse technology. All doubts about maturity and versatility notwithstanding, DATAllegro has a 10X or better size advantage (actually, I think it’s more like 20-40X) versus Oracle in warehouses its technology can straightforwardly handle. Oracle cannot afford to let this move go unanswered.
It’s of course possible that Oracle has been successfully developing comparable data warehouse technology internally. But it’s unlikely. Oracle hasn’t done anything that radical, internally and successfully, for about 15 years, RAC (Real Application Clusters) excepted. (I.e., since the object/relational extensibility framework started in Release 7.) So in all likelihood, the answer will come via acquisition. I think there are four candidates that make the most sense: Teradata, Vertica, ParAccel, and Greenplum. Kognitio (controlled by former Oracle honcho Geoff Squire) might be in the mix as well. Netezza is probably a non-starter because of its hardware-centric strategy.
Here’s why I’m emphasizing Teradata, Vertica, ParAccel, and Greenplum:
| Categories: Analytic technologies, DATAllegro, Data warehouse appliances, Data warehousing, Greenplum, Microsoft and SQL*Server, Oracle, ParAccel, Teradata, Vertica Systems | 13 Comments |
Microsoft is buying DATAllegro
I’ve long argued that:
- Oracle and Microsoft are doomed in the data warehouse market unless they acquire MPP/shared-nothing data warehouse DBMS and/or data warehouse appliances.
- DATAllegro is the ideal acquisition for either of them.
Microsoft has now validated my claim by agreeing to buy DATAllegro. As you probably know, we’ve been covering DATAllegro extensively, as per the links listed below.
Basic deal highlights include:
Who is doing what in XML data management these days?
A comment thread to a post on a different subject has opened up a discussion of XML storage. Frankly, I haven’t kept up with my briefings on the subject, in part because XML support hasn’t proved to be very important yet to the big DBMS vendors, somewhat to my surprise. When last I looked, the situation wasn’t much different from what it was back in November, 2005. Unless I’ve missed something (and please tell me if I have!), here’s what’s going on:
