Multiple approaches to memory-centric analytics
Memory-centric analytic processing is in the spotlight.
- Microsoft’s big analytics announcement for the week (one of them, anyway) is “Gemini,” which evidently amounts to some kind of in-memory, cube-based analytics, but with columns rather than true cubes as the in-memory data structure. (There’s a quick sketch of the columnar-layout idea after this list.)
- That sounds a lot like SAP’s BI Accelerator, which is a way to manifest SAP InfoCubes in-memory in a columnar architecture.
- QlikTech is going gangbusters with memory-centric business intelligence.
- IBM/Cognos’ Applix, which has a rather unusual approach to memory-centric cubes, has never lived up to its potential. But now people are being reminded it exists.
- Exasol has made some sales with a highly memory-centric approach to data warehousing. Kognitio’s story is something of a disk/RAM hybrid (disk is certainly involved, but the best parts of the technology deal with what happens once the data gets into RAM).
- Most of what the CEP (Complex Event Processing, aka event/stream processing) industry does is memory-centric analytics, both via tight integration with operational apps and for conventional BI.
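To make the columnar-layout point a bit more concrete, here’s a minimal, hypothetical sketch (not Gemini, BI Accelerator, or any other vendor’s actual design) of why keeping analytic data as per-column arrays in memory favors scans and aggregates:

```python
# Hypothetical illustration only -- not any product's actual design.
# A per-column layout lets an aggregate read just the columns it needs,
# instead of walking entire rows.

from array import array

# Row layout: each sale is one tuple of (region, product, amount).
rows = [("east", "widget", 10.0), ("west", "widget", 12.5), ("east", "gadget", 7.0)]

# Column layout: one contiguous array per attribute.
region = ["east", "west", "east"]
product = ["widget", "widget", "gadget"]
amount = array("d", [10.0, 12.5, 7.0])

# Total revenue in the east region touches only two columns: amount and region.
total_east = sum(a for a, r in zip(amount, region) if r == "east")
print(total_east)  # 17.0
```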
| Categories: Analytic technologies, Memory-centric data management, Microsoft and SQL*Server | 3 Comments |
Advance sound bites on the Microsoft/DATAllegro announcement
Microsoft said they’d prebrief me on at least the DATAllegro part of tomorrow’s SQL Server announcements, but that didn’t turn out to happen (at least as of 9 pm Eastern time Sunday night). An embargoed press release did just arrive, but it’s so concise and high-level as to contain almost nothing of interest.
So I might as well post sound bites in advance. Here goes:
- With the DATAllegro acquisition, Microsoft leapfrogged Oracle. But with Exadata, Oracle leapfrogged Microsoft back. Exadata is actually shipping.
- There’s no assurance that the first DATAllegro/Microsoft release will inherit SQL Server’s level of concurrency. After all, DATAllegro/Ingres wasn’t as concurrent as plain Ingres.
- Porting DATAllegro from Ingres to SQL Server is likely to be straightforward. If they screw up it will be because they tried to do too much else at the same time, not because the basic port failed.
- Porting DATAllegro from Linux to Windows should also be OK. DATAllegro doesn’t stress the operating system in the areas where Windows remains weak.
- Earlier this year, DATAllegro had exactly one customer known to be in production, but I’ve spoken with that one. It’s TEOCO, which has a multi-hundred terabyte DATAllegro installation. TEOCO is a very price-oriented buyer.
- DATAllegro reports that two more customers are in production with large systems now. Neither of those is believed by industry sources to be especially in love with DATAllegro. Otherwise, nobody seems able and willing to identify other DATAllegro customers.
I’m going to be pretty busy Monday anyway. Linda is having a bit of oral surgery. And if I get back from that in time, I have calls set up with a couple of clients.
| Categories: Data warehouse appliances, Data warehousing, DATAllegro, Microsoft and SQL*Server | 3 Comments |
Schema flexibility and XML data management
Conor O’Mahony, marketing manager for IBM’s DB2 pureXML, talks a lot about one of my favorite hobbyhorses — schema flexibility — as a reason to use an XML data model. In a number of industries he sees use cases based around ongoing change in the information being managed:
- Tax authorities change their rules and forms every year, but don’t want to do total rewrites of their electronic submission and processing software.
- The financial services industry keeps inventing new products, which don’t just have different terms and conditions, but may also have different kinds of terms and conditions.
- The same, to some extent, goes for the travel industry, which also keeps adding different kinds of offers and destinations.
- The energy industry keeps adding new kinds of highly complex equipment it has to manage.
Conor also thinks market evidence shows that XML’s schema flexibility is important for data interchange. Read more
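To make the schema-flexibility argument concrete, here’s a minimal, illustrative sketch. It uses plain Python rather than DB2 pureXML itself, and the product documents are invented, but the core point is the same: two financial products whose terms have different shapes can sit side by side and be queried with the same path expressions, with no ALTER TABLE or schema migration in between.

```python
# Illustrative only -- not DB2 pureXML code. pureXML would store documents like
# these in an XML column and query them with XQuery/SQL, but the flexibility
# argument is the same: new kinds of terms don't force a schema change.

import xml.etree.ElementTree as ET

products = [
    """<product id="1"><name>Fixed-rate note</name>
         <terms><coupon>4.5</coupon><maturity>2030-01-01</maturity></terms>
       </product>""",
    # A newer product type adds a kind of term the first one never had:
    """<product id="2"><name>Callable note</name>
         <terms><coupon>5.1</coupon><maturity>2033-06-30</maturity>
           <callSchedule><call date="2028-06-30" price="101.0"/></callSchedule>
         </terms>
       </product>""",
]

for doc in products:
    root = ET.fromstring(doc)
    name = root.findtext("name")
    coupon = root.findtext("terms/coupon")
    calls = root.findall("terms/callSchedule/call")  # simply empty for product 1
    print(name, coupon, [c.get("date") for c in calls])
```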
| Categories: Data models and architecture, EAI, EII, ETL, ELT, ETLT, IBM and DB2, pureXML, Structured documents | 3 Comments |
Vertical market XML standards
Tracking the alphabet soup of vertical market XML standards is hard. So as a starting point, I’m splitting a list I got from IBM into a standalone post.
Among the most important or successful IBM pureXML–supported standards, in terms of downloads and other evidence of customer interest, are: Read more
| Categories: Application areas, EAI, EII, ETL, ELT, ETLT, IBM and DB2, pureXML, Structured documents | 2 Comments |
Overview of IBM DB2 pureXML
On August 29, I had a great call with IBM about DB2 pureXML (most of the IBM side of the talking was done by Conor O’Mahony and Qi Jin). I’m finally getting around to writing it up now. (The world of tabular data warehousing has kept me just a wee bit busy …)
As I write it, I see there are a considerable number of holes, but that’s the way it seems to go when researching XML storage. I’m also writing up a September call from which I finally figured out (I think) the essence of how MarkLogic Server works – but only after five months of trying. It turns out that MarkLogic works rather differently from DB2 pureXML. Not coincidentally, IBM and Mark Logic focus on rather different use cases for native XML storage.
What I understand so far about the basic DB2 pureXML architecture goes like this: Read more
| Categories: EAI, EII, ETL, ELT, ETLT, IBM and DB2, pureXML, Structured documents | 7 Comments |
MarkLogic architecture deep dive
While I previously posted in great detail about how MarkLogic Server is an ACID-compliant XML-oriented DBMS with integrated text search that indexes everything in real time and executes range queries fairly quickly, I didn’t have a good feel for how all those apparently contradictory characteristics fit into a single product. But I finally had a call with Mark Logic Director of Engineering Ron Avnur, and think I have a better grasp of the MarkLogic architecture and story.
Ron described MarkLogic Server as a DBMS for trees. Read more
| Categories: MarkLogic, Structured documents, Text | 5 Comments |
History, focus, and technology of HP Neoview
On the basis of market impact to date, HP Neoview is just another data warehouse market participant – a dozen sales or so, a few systems in production, some evidence that it can handle 100 TB+ workloads, and so on. But HP’s BI Group CTO Greg Battas thinks Neoview is destined for greater things, because: Read more
| Categories: Data warehouse appliances, Data warehousing, HP and Neoview | 12 Comments |
HP Neoview in the market to date
I evidently got HP’s attention with a recent post in which I questioned its stance on the relative positioning of the Exadata-based HP Oracle data warehouse appliance and the HP Neoview data warehouse appliance. A conversation with Greg Battas and John Miller (respectively CTO and CMO of HP’s BI group) quickly ensued. Mainly we talked about Neoview product goals and architecture. But before I get to that in a separate post, here are some Neoview market-presence highlights, so far as I’ve been able to figure them out: Read more
| Categories: Data warehouse appliances, Data warehousing, HP and Neoview | 1 Comment |
Automatic redistribution of data warehouse data
In a recent Oracle Exadata FAQ, Kevin Closson writes:
Q. […] don’t some of the DW vendors split the data up in a shared nothing method. Thus when the data has to be repartitioned it gets expensive. Whereas here you just add another cell and ASM goes to work in the background. (depending upon the ASM power level you set.)
A. All the DW Appliance vendors implement shared-nothing so, yes, the data is chopped up into physical partitions. If you add hardware to increase performance of queries against your current dataset the data will have to be reloaded into the new partitioning scheme. As has always been the case with ASM, adding new disks (and therefore Exadata Storage Server cells) will cause the existing data to be redistributed automatically over all (including the new) drives. This ASM data redistribution is an online function.
Hmm. That sounds much like the story I’ve heard from various other data warehousing DBMS vendors as well.
Rather than try to speak for them, however, I’ll just post this and see whether they choose to add anything to the comment thread.
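For readers who want the intuition behind the question, here’s a minimal, hypothetical sketch of why adding a node to a hash-partitioned, shared-nothing cluster forces most rows to move unless the system redistributes data for you. It uses naive modulo hashing purely for illustration; no vendor named here actually partitions this simplistically.

```python
# Hypothetical illustration: in a shared-nothing system that assigns each row to
# a node via hash(key) % node_count, changing node_count reassigns most rows,
# which is why adding hardware normally implies redistributing (or reloading) data.

def partition(keys, node_count):
    """Map each key to a node by simple modulo hashing."""
    return {k: hash(k) % node_count for k in keys}

keys = [f"row-{i}" for i in range(100_000)]

before = partition(keys, 8)   # original 8-node cluster
after = partition(keys, 9)    # one node added

moved = sum(1 for k in keys if before[k] != after[k])
print(f"{moved / len(keys):.0%} of rows change nodes")  # typically around 85-90%
```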
| Categories: Data warehouse appliances, Data warehousing, Exadata, Oracle | 7 Comments |
Greenplum pricing
Edit: Actually, this post is completely incorrect. The $20K/terabyte is for software only. So far, my attempts to get Greenplum to estimate hardware costs have been unsuccessful.
Greenplum’s Scott Yara was recently quoted citing a $20K/terabyte figure for Greenplum pricing. That naturally raises the question:
Greenplum charges around $20K/terabyte of what?
| Categories: Data warehouse appliances, Data warehousing, Greenplum, Pricing | 4 Comments |
