IBM and DB2
Analysis of IBM and various of its product lines in database management, analytics, and data integration.
- Cognos
- solidDB
- (in The Monash Report) Operational and strategic issues for IBM
- (in Text Technologies) IBM in the text analytics market
- (in Software Memories) Historical notes on IBM
- (in Software Memories) Historical notes on Informix
Xkoto Gridscale highlights
I talked yesterday with cofounders Albert Lee and Ariff Kassam of Xkoto. Highlights included:
| Categories: Clustering, IBM and DB2, Market share, Microsoft and SQL*Server, Parallelization, Pricing, Xkoto | 7 Comments |
Initial reactions to IBM acquiring SPSS
IBM is acquiring SPSS. My initial thoughts (questions by Eric Lai of Computerworld) include:
1) good buy for IBM? why or why not?
Yes. The integration of predictive analytics with other analytic or operational technologies is still ahead of us, so there was a lot of value to be gained from SPSS beyond what it had standalone. (That said, I haven’t actually looked at the numbers, so I have no comment on the price.)
By the way, SPSS coined the phrase “predictive analytics”, with the rest of the industry then coming around to use it. As with all successful marketing phrases, it’s somewhat misleading, in that it’s not wholly focused on prediction.
2) how does it position IBM vs. competitors?
IBM’s ownership immediately makes SPSS a stronger competitor to SAS. Any advantage to the rest of IBM depends on the integration roadmap and execution.
3) How does this particularly affect SAP and SAS and Oracle, IBM’s closest competitors by revenue according to IDC’s figures?
If one of Oracle or SAP had bought SPSS, it would have given them a competitive advantage against the other, in the integration of predictive analytics with packaged operational apps. That’s a missed opportunity for each.
One notable point is that SPSS is more SQL-oriented than SAS. Thus, SPSS has gotten performance benefits from Oracle’s in-database data mining technology that SAS apparently hasn’t.
IBM’s done a good job of keeping its acquired products working well with Oracle and other competitive DBMS in the past, and SPSS will surely be no exception.
Obviously, if IBM does a good job of Cognos/SPSS integration, that’s bad for competitors, starting with Oracle and SAP/Business Objects. So far business intelligence/predictive analytics integration has been pretty minor, because nobody’s figured out how to do it right, but some day that will change. Hmm — I feel another “Future of … ” post coming on.
4) Do you predict further M&A?
Always.
Related links
- Official word from SPSS and IBM
- Blog posts from Larry Dignan and James Taylor
- James Kobielus’s post, which includes the obvious point that Oracle, unlike SAP, has pretty decent data mining of its own
- Eric Lai’s actual article
| Categories: Analytic technologies, Cognos, IBM and DB2, Oracle, SAP AG, SAS Institute | 7 Comments |
Notes on CEP performance
I’ve been talking to CEP vendors on and off for a few years. So what I hear about performance is fairly patchwork. On the other hand, maybe 1-2+ year-old figures of per-core performance are still meaningful today. After all, Moore’s Law is being reflected more in core count than per-core performance, and it seems CEP vendors’ development efforts haven’t necessarily been concentrated on raw engine speed.
So anyway, what do you guys have to add to the following observations?
- Super-low-latency financial services industry tasks are often “embarrassingly parallel.” Thus, near-linear scale-out is common.
- That said, good parallelism seems fairly new in CEP engines (of course, CEP engines are fairly new themselves — for all I know, some have been parallel since inception).
- I’ve heard claims of up to 400,000 messages/second/core for simple queries or patterns.
- I’ve heard claims of 70,000 messages/second/core for not-so-simple queries or patterns, and probably higher than that depending on what the meaning of “simple” is.
- IBM just disclosed >15,000 messages/second/core on a pretty low-powered processor.
- I’ve heard that Coral8, Apama, and StreamBase rarely lost deals due to performance or throughput problems. I’ve heard that the same is not as true of Aleri.
- StreamBase proudly says it’s been fully multithreaded since academic research-project days. For Apama multithreading is evidently a more recent feature. But does it matter much?
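For a rough sense of what those per-core figures imply, here is a minimal back-of-the-envelope sketch. It reads all three quoted figures as messages per second per core, and it assumes the near-linear scale-out mentioned above. The target rate, the `scaling_efficiency` knob, and the function name are illustrative assumptions, not anything a vendor supplied.

```python
import math

# Per-core throughput claims quoted in the list above,
# read here as messages/second/core (an assumption).
CLAIMED_RATES = {
    "simple queries or patterns": 400_000,
    "not-so-simple queries or patterns": 70_000,
    "IBM, low-powered processor": 15_000,
}


def cores_needed(target_msgs_per_sec, per_core_rate, scaling_efficiency=1.0):
    """Cores required to sustain a target rate.

    scaling_efficiency is a hypothetical fudge factor: 1.0 means perfectly
    linear scale-out, 0.8 means each added core delivers 80% of its
    standalone throughput.
    """
    if not 0 < scaling_efficiency <= 1:
        raise ValueError("scaling_efficiency must be in (0, 1]")
    effective_rate = per_core_rate * scaling_efficiency
    return math.ceil(target_msgs_per_sec / effective_rate)


if __name__ == "__main__":
    target = 1_000_000  # hypothetical workload, messages/second
    for label, rate in CLAIMED_RATES.items():
        print(f"{label}: {cores_needed(target, rate)} cores")
```

Even at the slowest quoted rate, a million messages per second fits in a modest cluster under these assumptions, which is consistent with performance rarely being the deciding factor in deals.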
| Categories: Aleri and Coral8, Complex event processing (CEP), IBM and DB2, Memory-centric data management, Progress, Apama, and DataDirect, StreamBase | 13 Comments |
Followup on IBM System S/InfoSphere Streams
After posting about IBM’s System S/InfoSphere Streams CEP offering, I sent three followup questions over to Jeff Jones. It seems simplest to just post the Q&A verbatim.
1. Just how many processors or cores does it take to get those 5 million messages/sec through? A little birdie says 4,000 cores.
| Categories: Analytic technologies, Complex event processing (CEP), IBM and DB2, Investment research and trading | 7 Comments |
IBM System S Streams, aka InfoSphere Streams, aka stream processing, aka “please don’t call it CEP”
IBM has hastily announced System S Streams, a product that was supposed to be called InfoSphere Streams and introduced only in 2010. Apparently, the rush is because senior management wanted to talk about it later this week, and perhaps also because it was implicitly baked into some of IBM’s advertising already. Scrambling ensued. Even so, Jeff Jones and team got to me fast and briefed me, fairly non-technically, unfortunately, but otherwise how I like it, namely on a harmless embargo* and without any NDAs. That’s more than can be said for my clients at Microsoft, who also introduced CEP this week, but I digress …
*Indeed, as I draft this post-Celtics-game, the embargo is already expired.
Marketing aside, IBM System S/InfoSphere Streams is indeed a CEP/stream processing engine + language (with an Eclipse-based development environment). Apparently, IBM thinks InfoSphere Streams (if that’s what it winds up being renamed to) is or will be differentiated from other CEP packages in:
- Scale-out. (That’s the one that appears to be real today. In fact, there’s a prototype running on Blue Gene.)
- Support for complex datatypes such as XML, text, voice, video, etc.
- Security and general industrial-strengthness.
| Categories: Analytic technologies, Application areas, Complex event processing (CEP), IBM and DB2, Investment research and trading, Scientific research | 3 Comments |
Oracle’s hardware strategy
Larry Ellison stated clearly in an email interview with Reuters (links here and here) that Oracle intends to keep Sun’s hardware business and indeed intends to invest in the SPARC chip. Naturally, I have a few thoughts about this.
As Stephen O’Grady points out, Sun’s main strength lay in selling to the large enterprise market. Well, that’s Oracle’s overwhelming focus too. As I noted two years ago:
One Oracle response is to provide lots of add-on technologies for high-end customers, on the database and middle tiers alike. In app servers it’s done surprisingly well against BEA. It’s sold a lot of clustering. And it’s bought into and tried to popularize niche technologies like TimesTen and Tangosol’s.
This all makes perfect sense – it’s a great fit for Oracle’s best customers, and a way to get thousands of extra dollars per server from enterprises that may already have bought all-you-can-eat licenses to the Oracle DBMS. And being so sensible, it fits into the Clayton Christensen disruption story in two ways:
- Oracle may be helpless against mid-tier competition, but it sure has the high-end core of its market locked up.
- As one type of technology is commoditized, value is created in other parts of the technology stack.
Oracle’s ongoing acquisition spree in system software, application software, and now hardware just supports that story. MySQL, embedded Java, and so on may be welcome to Oracle as yet more opportunities to tap additional markets — but Oracle’s emphasis is and surely will remain on the large enterprise market.
The next notable point may be found in Larry’s key quote:
| Categories: Data warehouse appliances, Data warehousing, Exadata, HP and Neoview, IBM and DB2, Oracle | 8 Comments |
Some DB2 highlights
I chatted with IBM Thursday, about recent and imminent releases of DB2 (9.5 through 9.7). Highlights included:
- DB2 is getting Oracle emulation, which I posted about separately.
- IBM says that it had >50 new DB2 data warehouse customers last year. I neglected to ask how many of these had been general-purpose DB2 customers all along.
- By “data warehouse customer” I mean a user of InfoSphere Warehouse, which was previously called DB2’s DPF (Data Partitioning Feature). Apparently, this includes both logical and physical partitioning. E.g., DB2 isn’t shared-nothing without this feature.
- IBM is proud of DB2’s compression, which it claims commonly reaches 70-80%. It calls this “industry-leading” in comparison to Oracle, SQL Server, and other general-purpose relational DBMS.
- DB2 compression’s overall effect on performance stems from a trade-off between I/O (lessened) and CPU burden (increased). For OLTP workloads, this is about a wash. For data warehousing workloads, IBM says 20% performance improvement from compression is average.
- DB2 now has its version of one of my favorite Oracle security features, called Label Based Access Control. A label-control feature can make it much easier to secure data on a row-by-row, value-by-value basis. The obvious big user is national intelligence, followed by financial services. IBM says the health care industry also has interest in LBAC.
- Also in the security area, IBM reworked DB2’s audit feature for 9.5.
- I think what I heard in our discussion of DB2 virtualization is:
- Increasingly, IBM is seeing production use of VMware, rather than just test/development.
- IBM believes it is a much closer partner to VMware than Oracle or Microsoft is, because it’s not pushing its own competing technology.
- Generally, virtualization is more important for OLTP workloads than data warehousing ones, because OLTP apps commonly only need part of the resources of a node while data warehousing often wants the whole node.
- AIX data warehousing is an exception. I think this is because AIX equates to big SMP boxes, and virtualization lets you spread out the data warehousing processing across more nodes, with the usual parallel I/O benefits.
- When IBM talks of new autonomic/self-tuning features in DB2, they’re used mainly for databases under 1 terabyte in size. Indeed, the self-tuning feature set doesn’t work with InfoSphere Warehouse.
- Even with the self-tuning feature it sounds as if you need at least a couple of DBA hours per instance per week, on average.
- DB2 on Linux/Unix/Windows has introduced some enhanced workload management features analogous to those long found in mainframe DB2. For example, resource allocation rules can be scheduled by time. (The point of workload management is to allocate resources such as CPU or I/O among the simultaneous queries or other tasks that contend for them.) Workload management rules can have thresholds for amounts of resources consumed, after which the priority for a task can go up (“Get it over with!”) or down (“Stop hogging my system!”).
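The workload management idea in the last bullet (time-scheduled rules with resource thresholds that push a task’s priority up or down) can be sketched roughly as follows. This is a toy model only, not DB2 syntax or DB2’s actual mechanism; every class, field, and resource name here is hypothetical.

```python
from dataclasses import dataclass
from datetime import time


@dataclass
class ThresholdRule:
    """Hypothetical rule: once a task has consumed more than `limit` of
    `resource`, shift its priority by `priority_delta`."""
    resource: str
    limit: float
    priority_delta: int  # positive: "Get it over with!"; negative: "Stop hogging my system!"
    active_from: time = time(0, 0)
    active_until: time = time(23, 59, 59)

    def is_active(self, now: time) -> bool:
        # Rules can be scheduled by time of day.
        return self.active_from <= now <= self.active_until


@dataclass
class Task:
    name: str
    priority: int
    consumed: dict  # resource name -> amount consumed so far


def apply_rules(task: Task, rules: list, now: time) -> int:
    """Return the task's priority after applying every active rule
    whose threshold the task has exceeded."""
    priority = task.priority
    for rule in rules:
        if rule.is_active(now) and task.consumed.get(rule.resource, 0) > rule.limit:
            priority += rule.priority_delta
    return priority


if __name__ == "__main__":
    # During business hours, demote anything that has burned over 60 CPU-seconds.
    rules = [ThresholdRule("cpu_seconds", 60, -1, time(8, 0), time(18, 0))]
    report = Task("monthly_report", priority=5, consumed={"cpu_seconds": 120})
    print(apply_rules(report, rules, time(10, 0)))  # rule active, task demoted
    print(apply_rules(report, rules, time(22, 0)))  # rule inactive, priority unchanged
```

The same structure handles the opposite policy: a positive `priority_delta` promotes a long-running task so it finishes and frees its resources.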
| Categories: Application areas, Data warehousing, Database compression, IBM and DB2, Market share, OLTP, Parallelization | 2 Comments |
IBM’s Oracle emulation strategy reconsidered
I’ve now had a chance to talk with IBM about its recently-announced Oracle emulation strategy for DB2. (This is for DB2 9.7, which I gather was quasi-announced in April, will be re-announced in May, and will be re-re-announced as being in general availability in June.)
Key points include:
- This really is more like Oracle emulation than it is transparency, a term I carelessly used before.
- IBM’s Oracle emulation effort is focused on two technological goals:
- Making it easy for an Oracle application to be ported to DB2.
- Making it easy for an Oracle developer to develop for DB2.
- The initial target market for DB2’s Oracle emulation is ISVs (Independent Software Vendors) much more than it is enterprises. IBM suggested there were a couple hundred early adopters, and those are primarily in the ISV area.
Because of Oracle’s market share, many ISVs focus on Oracle as the underlying database management system for their applications, whether or not they actually resell it along with their own software. IBM proposed three reasons why such ISVs might want to support DB2:
| Categories: Data types, Emulation, transparency, portability, EnterpriseDB and Postgres Plus, GIS and geospatial, IBM and DB2, Market share, Native XML, Oracle, Pricing, Text | 9 Comments |
DBMS transparency layers never seem to sell well
A DBMS transparency layer, roughly speaking, is software that makes things that are written for one brand of database management system run unaltered on another.* These never seem to sell well. ANTs has failed in a couple of product strategies. EnterpriseDB’s Oracle compatibility only seems to have netted it a few sales, and only a small fraction of its total business. ParAccel’s and Dataupia’s transparency strategies have produced even less.
*The looseness in that definition highlights a key reason these technologies don’t sell well — it’s hard to be sure that what you’re buying will do a good job of running your particular apps.
This subject comes to mind for two reasons. One is that IBM seems to have licensed EnterpriseDB’s Oracle transparency layer for DB2. The other is that a natural upgrade path from MySQL to Oracle might be a MySQL transparency layer on top of an Oracle base.
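To make the transparency-layer idea and its weakness concrete, here is a deliberately naive sketch that rewrites two Oracle-specific spellings into more portable SQL. Real products work very differently and far more thoroughly. The rule list and the `rewrite` function are hypothetical, and the mappings (`NVL` to `COALESCE`, `SYSDATE` to `CURRENT_TIMESTAMP`) are only rough equivalents.

```python
import re

# Hypothetical rewrite rules: Oracle-specific spelling -> more portable SQL.
# These are approximate equivalents, not exact semantic matches.
REWRITE_RULES = [
    (re.compile(r"\bNVL\s*\(", re.IGNORECASE), "COALESCE("),
    (re.compile(r"\bSYSDATE\b", re.IGNORECASE), "CURRENT_TIMESTAMP"),
]


def rewrite(sql: str) -> str:
    """Apply each textual rewrite rule in turn to a SQL statement."""
    for pattern, replacement in REWRITE_RULES:
        sql = pattern.sub(replacement, sql)
    return sql


if __name__ == "__main__":
    # The easy case works.
    print(rewrite("SELECT NVL(bonus, 0), SYSDATE FROM emp"))
    # A string literal that merely contains the keyword gets mangled too.
    print(rewrite("SELECT 'SYSDATE' FROM emp"))
```

The second call shows the problem in miniature: the rewriter cannot tell a keyword from a string literal, so whether a given application survives translation depends on details the buyer can’t easily check in advance.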
| Categories: ANTs Software, Dataupia, Emulation, transparency, portability, EnterpriseDB and Postgres Plus, IBM and DB2, Market share, MySQL, Oracle, ParAccel | 9 Comments |
Database implications if IBM acquires Sun
Reported or rumored merger discussions between IBM and Sun are generating huge amounts of discussion today (some links below). Here are some quick thoughts around the subject of how the IBM/Sun deal, if it happens, might affect the database management system industry.
