Analytic technologies
Discussion of technologies related to information query and analysis. Related subjects include:
- Business intelligence
- Data warehousing
- (in Text Technologies) Text mining
- (in The Monash Report) Data mining
- (in The Monash Report) General issues in analytic technology
IBM DB2 10
Shortly before Tuesday’s launch of DB2 10, IBM’s Conor O’Mahony checked in for a relatively non-technical briefing.* More precisely, this is about DB2 for “distributed” systems, aka LUW (Linux/Unix/Windows); some of the features have already been in the mainframe version of DB2 for a while. IBM is graciously permitting me to post the associated DB2 10 announcement slide deck.
*I hope any errors in interpretation are minor.
Major aspects of DB2 10 include new or improved capabilities in the areas of:
- Compression.
- Analytic query performance.
- Data ingest.
- Multi-temperature data management.
- Workload management.
- Graph management/relationship analytics.
- Time-travel, bitemporal features, and bitemporal time-travel.
Of course, there are various other enhancements too, including to security (fine-grained access control), Oracle compatibility, and DB2 pureScale. Everything except the pureScale part is also reflected in IBM InfoSphere Warehouse, which is a near-superset of DB2.*
*Also, the data ingest part isn’t in base DB2.
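Bitemporal time travel is the most abstract item on the list above. Here is a rough illustration only, a minimal Python sketch of what the idea means. It is not DB2's syntax or implementation; DB2 itself handles this in SQL. The row layout and the names `Version`, `as_of` and `FOREVER` are hypothetical. Each row carries two periods: a business-time period (when a fact was true in the world) and a system-time period (when the database held that belief). A query can pin down both.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical "open-ended" marker for both time dimensions.
FOREVER = date.max


@dataclass(frozen=True)
class Version:
    key: str
    value: int
    valid_from: date  # business time: when the fact starts being true in the world
    valid_to: date    # exclusive end of business time
    sys_from: date    # system time: when the database started believing this row
    sys_to: date      # exclusive end of system time; FOREVER means "still current"


def as_of(versions, key, valid_at, known_at):
    """Bitemporal time travel: what did the database believe on `known_at`
    about the value that was in effect on `valid_at`?"""
    for v in versions:
        if (v.key == key
                and v.valid_from <= valid_at < v.valid_to
                and v.sys_from <= known_at < v.sys_to):
            return v.value
    return None


history = [
    # Recorded Jan 1: price is 100 from Jan 1 onward. Superseded on Mar 1.
    Version("widget", 100, date(2012, 1, 1), FOREVER, date(2012, 1, 1), date(2012, 3, 1)),
    # On Mar 1 we learn the price actually changed to 120 on Feb 1,
    # so the old belief is closed out and replaced by two corrected rows.
    Version("widget", 100, date(2012, 1, 1), date(2012, 2, 1), date(2012, 3, 1), FOREVER),
    Version("widget", 120, date(2012, 2, 1), FOREVER, date(2012, 3, 1), FOREVER),
]

# Mid-February price, as the database saw it on Feb 20 (before the correction).
before_fix = as_of(history, "widget", date(2012, 2, 15), date(2012, 2, 20))
# The same mid-February price, as the database sees it after the correction.
after_fix = as_of(history, "widget", date(2012, 2, 15), date(2012, 4, 1))
```

Here `before_fix` is 100 and `after_fix` is 120. The correction never overwrites the old belief. It only closes out that belief's system-time period, so both answers stay queryable.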
| Categories: Data warehousing, Database compression, IBM and DB2, RDF and graphs, Solid-state memory, Workload management |
Notes on the ClearStory Data launch, including an inaccurate quote from me
ClearStory Data launched, with nice coverage in the New York Times, Computerworld, and elsewhere. But from my standpoint, there were some serious problems:
- (Bad.) I was planning to cover the launch as well, in a split exclusive, but that plan was changed, costing me considerable wasted work.
- (Worse.) I wasn’t told of the change as soon as it was known. Indeed, I wasn’t told at all; I was left to infer it from the fact that I was now being asked to talk with other reporters.
- (Horrific.) I was quoted in the ClearStory launch press release, but while the sentiments were reasonably in line with my own, the quote was incorrect.*
I’m utterly disgusted with this whole mess, although after talking with her a lot I’m fine with CEO Sharmila Mulligan’s part in it, which is to say with ClearStory’s part in general.
*I avoid the term “platform” as much as possible; indeed, I still don’t really know what the “new platforms” part was supposed to refer to. The Frankenquote wound up with some odd grammar as well.
Actually, in principle I’m a pretty close adviser to ClearStory (for starters, they’re one of my stealth-mode clients). That hasn’t really ramped up yet; in particular, I haven’t had a technical deep dive. So for now I’ll just say:
| Categories: Business intelligence, ClearStory Data, Data integration and middleware, Data mart outsourcing |
Juggling analytic databases
I’d like to survey a few related ideas:
- Enterprises should each have a variety of different analytic data stores.
- Vendors — especially but not only IBM and Teradata — are acknowledging and marketing around the point that enterprises should each have a number of different analytic data stores.
- In addition to having multiple analytic data management technology stacks, it is also desirable to have an agile way to spin out multiple virtual or physical relational data marts using a single RDBMS. Vendors are addressing that need.
- Some observers think that the real essence of analytic data management will be in data integration, not the actual data management.
Here goes.
Hardware and components — lessons from Teradata
I love talking with Carson Schmidt, chief of Teradata’s hardware engineering (among other things), even if I don’t always understand the details of what he’s talking about. It had been way too long since our last chat, so I requested another one. We were joined by Keith Muller. Takeaways included:
- Teradata performance growth was slow in the early 2000s, but has accelerated since then; Intel gets a lot of the credit (and blame) for that.
- Carson hopes for a performance “discontinuity” with Intel Ivy Bridge.
- Teradata is not afraid to use niche special-purpose chips.
- Teradata’s views can be taken as well-informed endorsements of InfiniBand and SAS 2.0.
| Categories: Data warehouse appliances, Data warehousing, Database compression, Solid-state memory, Storage, Teradata |
Translucent modeling, and the future of internet marketing
There’s a growing consensus that consumers require limits on the predictive modeling that is done about them. That’s a theme of the Obama Administration’s recent work on consumer data privacy; it’s central to other countries’ data retention regulations; and it’s specifically borne out by the recent Target-pursues-pregnant-women example. Whatever happens legally, I believe this also calls for a technical response, namely:
Consumers should be shown key factual and psychographic aspects of how they are modeled, and be given the chance to insist that marketers disregard any or all of those aspects.
I further believe that the resulting technology should be extended so that
information holders can collaborate by exchanging estimates for such key factors, rather than exchanging the underlying data itself.
To some extent this happens today, for example with attribution/de-anonymization or with credit scores; but I think it should be taken to another level of granularity.
My name for all this is translucent modeling, rather than “transparent”, the idea being that key points must be visible, but the finer details can be safely obscured.
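One way to make translucent modeling concrete is a model built from a handful of named, human-readable factors. The consumer can see each factor's contribution and switch any of them off. The Python sketch below assumes a simple linear score. The class name, factor names and weights are all hypothetical and illustrative, not a description of any marketer's actual system.

```python
from dataclasses import dataclass


@dataclass
class TranslucentModel:
    """Hypothetical linear score over a few named, human-readable factors."""
    weights: dict
    bias: float = 0.0

    def explain(self, factors):
        """Per-factor contributions: the 'visible' part shown to the consumer."""
        return {name: self.weights[name] * value
                for name, value in factors.items() if name in self.weights}

    def score(self, factors, suppressed=frozenset()):
        """Score, disregarding any factors the consumer has asked to suppress."""
        contributions = self.explain(factors)
        return self.bias + sum(c for name, c in contributions.items()
                               if name not in suppressed)


# Illustrative factor names and weights, not taken from any real system.
model = TranslucentModel(
    weights={"likely_new_parent": 2.0, "outdoor_enthusiast": 0.5}, bias=0.1)
profile = {"likely_new_parent": 0.9, "outdoor_enthusiast": 0.4}

shown_to_consumer = model.explain(profile)
full_score = model.score(profile)
# The consumer insists that the parenthood-related estimate be disregarded.
honored_score = model.score(profile, suppressed={"likely_new_parent"})
```

Under the collaboration idea above, the per-factor estimates in `profile` would be what information holders exchange, rather than the raw behavioral data used to derive them.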
Examples of dialog I think marketers should have with consumers include:
| Categories: Liberty and privacy, Predictive modeling and advanced analytics, Web analytics |
The latest privacy example — pregnant potential Target shoppers
Charles Duhigg of the New York Times wrote a very interesting article, based on a forthcoming book of his, on two related subjects:
- The force of habit on our lives, and how we can/do deal with it. (That’s the fascinating part.)
- A specific case of predictive modeling. (That’s the part that’s getting all the attention. It’s interesting too.)
The predictive modeling part is that Target determined:
- People only change their shopping habits occasionally
- One of those occasions is when they get pregnant
- Hence, it would be a Really Good Idea to market aggressively to pregnant women
and then built a marketing strategy around early indicators of a woman’s pregnancy.
| Categories: Liberty and privacy, Predictive modeling and advanced analytics, Specific users |
SAP HANA today
SAP HANA has gotten much attention, mainly for its potential. I finally got briefed on HANA a few weeks ago. While we didn’t have time for all that much detail, it still might be interesting to talk about where SAP HANA stands today.
The HANA section of SAP’s website is a confusing and sometimes inaccurate mess. But an IBM whitepaper on SAP HANA gives some helpful background.
SAP HANA is positioned as an “appliance”. So far as I can tell, that really means it’s a software product for which there are a variety of emphatically-recommended hardware configurations — Intel-only, from what right now are eight usual-suspect hardware partners. Anyhow, the core of SAP HANA is an in-memory DBMS. Particulars include:
- Mainly, HANA is an in-memory columnar DBMS, based on SAP’s confusingly-renamed BI Accelerator/BW Accelerator. Analytics and most OLTP (OnLine Transaction Processing) go against the columnar part of HANA.
- The HANA DBMS also has an in-memory row storage option, used to store metadata, small tables, and so on.
- SAP HANA talks both SQL and MDX.
- The HANA DBMS is shared-nothing across blades or rack servers. I imagine that within an individual blade it’s shared everything. The usual-suspect data distribution or partitioning strategies are available — hash, range, round-robin.
- SAP HANA has what sounds like a natural disk-based persistence strategy: logs, snapshots, and so on. SAP says that this is synchronous enough to give ACID compliance. For some hardware partners, those “disks” are actually Fusion-io cards.
- HANA is fault-tolerant “across servers”.
- Text support is “coming soon”, which makes sense, given that BI Accelerator was based on the TREX search engine in the first place. Inxight is also in the HANA text mix.
- You can put data into SAP HANA in a variety of obvious ways:
  - Writing it directly.
  - Trigger-based replication (perhaps from the DBMS that runs your SAP apps).
  - Log-based replication (based on Sybase Replication Server).
  - SAP Business Objects’ ETL tool.
SAP says that the row-store part is based both on P*Time, an acquisition from Korea some time ago, and also on SAP’s own MaxDB. The IBM white paper mentions only the MaxDB aspect. (Edit: Actually, see the comment thread below.) Based on a variety of clues, I conjecture that this was an aspect of SAP HANA development that did not go entirely smoothly.
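The three data distribution strategies listed above for HANA’s shared-nothing layer (hash, range, round-robin) are generic techniques. The minimal Python sketch below shows each one. It says nothing about how HANA actually implements them. The function names, the row format, and the use of CRC-32 as the hash function are all illustrative assumptions.

```python
import bisect
import itertools
import zlib


def hash_node(key, n_nodes):
    # crc32 is stable across processes, unlike Python's salted hash() for strings.
    return zlib.crc32(str(key).encode("utf-8")) % n_nodes


def range_node(key, boundaries):
    # Sorted boundaries split the key space: node 0 gets keys < boundaries[0],
    # node i gets boundaries[i-1] <= key < boundaries[i], the last node the rest.
    return bisect.bisect_right(boundaries, key)


def distribute(rows, n_nodes, strategy, key=lambda row: row[0], boundaries=None):
    """Assign each row to one of n_nodes shared-nothing nodes."""
    if strategy == "range" and (boundaries is None or len(boundaries) != n_nodes - 1):
        raise ValueError("range partitioning needs n_nodes - 1 sorted boundaries")
    nodes = [[] for _ in range(n_nodes)]
    turn = itertools.cycle(range(n_nodes))  # used only by round-robin
    for row in rows:
        if strategy == "hash":
            node = hash_node(key(row), n_nodes)
        elif strategy == "range":
            node = range_node(key(row), boundaries)
        elif strategy == "round_robin":
            node = next(turn)  # ignores the key; deals rows out in turn
        else:
            raise ValueError(f"unknown strategy: {strategy}")
        nodes[node].append(row)
    return nodes


rows = [(5, "a"), (150, "b"), (250, "c"), (99, "d")]
by_range = distribute(rows, 3, "range", boundaries=[100, 200])
by_turn = distribute(rows, 3, "round_robin")
by_hash = distribute(rows, 3, "hash")
```

The trade-offs are the usual ones. Hashing spreads keys fairly evenly while keeping each key on a predictable node. Range partitioning keeps neighboring keys together, which helps range-restricted queries at the risk of skew. Round-robin balances row counts but gives the system no idea which node holds a given key.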
Other SAP HANA components include:
Third-party analytics
This is one of a series of posts on business intelligence and related analytic technology subjects, keying off the 2011/2012 version of the Gartner Magic Quadrant for Business Intelligence Platforms. The four posts in the series cover:
- Overview comments about the 2011/2012 Gartner Magic Quadrant for Business Intelligence Platforms, as well as a link to the actual document.
- Business intelligence industry trends — some of Gartner’s thoughts but mainly my own.
- Company-by-company comments based on the 2011/2012 Gartner Magic Quadrant for Business Intelligence Platforms.
- (This post) Third-party analytics, pulling together and expanding on some points I made in the first three posts.
I’ve written a lot this weekend about various areas of business intelligence and related analytics. A recurring theme has been what we might call third-party analytics, i.e., anything other than buying analytic technology and deploying it in your own enterprise. The four main areas are:
- Business intelligence software OEMed to packaged operational application vendors.
- Business intelligence software OEMed to SaaS (Software as a Service) application vendors.
- Business intelligence software bundled into information-selling businesses.
- Stakeholder-facing analytics, which usually is just BI allowing customers (or suppliers, investors, citizens, etc.) to look into one of your databases.
| Categories: Business intelligence, Business Objects, Information Builders, Intersystems and Cache', Jaspersoft, Pentaho, Software as a Service (SaaS) |
The 2011/2012 Gartner Magic Quadrant for Business Intelligence Platforms — company-by-company comments
This is one of a series of posts on business intelligence and related analytic technology subjects, keying off the 2011/2012 version of the Gartner Magic Quadrant for Business Intelligence Platforms. The four posts in the series cover:
- Overview comments about the 2011/2012 Gartner Magic Quadrant for Business Intelligence Platforms, as well as a link to the actual document.
- Business intelligence industry trends — some of Gartner’s thoughts but mainly my own.
- (This post) Company-by-company comments based on the 2011/2012 Gartner Magic Quadrant for Business Intelligence Platforms.
- Third-party analytics, pulling together and expanding on some points I made in the first three posts.
The heart of Gartner Group’s 2011/2012 Magic Quadrant for Business Intelligence Platforms was the company comments. I shall expound upon some, roughly in declining order of Gartner’s “Completeness of Vision” scores, dubious though those rankings may be.
Business intelligence industry trends
This is one of a series of posts on business intelligence and related analytic technology subjects, keying off the 2011/2012 version of the Gartner Magic Quadrant for Business Intelligence Platforms. The four posts in the series cover:
- Overview comments about the 2011/2012 Gartner Magic Quadrant for Business Intelligence Platforms, as well as a link to the actual document.
- (This post) Business intelligence industry trends — some of Gartner’s thoughts but mainly my own.
- Company-by-company comments based on the 2011/2012 Gartner Magic Quadrant for Business Intelligence Platforms.
- Third-party analytics, pulling together and expanding on some points I made in the first three posts.
Besides company-specific comments, the 2011/2012 Gartner Magic Quadrant for Business Intelligence (BI) Platforms offered observations on overall BI trends in a “Market Overview” section. I have mixed feelings about Gartner’s list. In particular:
- Not inconsistently with my comments on departmental analytics, Gartner sees actual BI business users as favoring ease of getting the job done, while IT departments are more concerned about full feature sets, integration, corporate standards, and license costs.
- However, Gartner says as a separate point that all kinds of users want relief from some of the complexity of BI, and indeed of analytics in general. I agree, but don’t think Gartner did a great job of outlining how this complexity reduction could actually work.
- Gartner is bullish on mobile business intelligence, but doesn’t really contradict my more skeptical take. Even as it confesses that mobile BI use cases are somewhat thin (my word, not Gartner’s, and no pun intended), it sees mobile BI rapidly becoming mainstream technology.
- Gartner makes a distinction between “data discovery” tools and “enterprise BI” platforms. By “data discovery” I think Gartner means what I’d call the “pattern discovery” focus of investigative analytics. Anyhow, it seems that Gartner:
  - Sees users as being confused about how the traditional pattern-monitoring kinds of BI fit with the newer emphasis on investigative analytics, and …
  - … shares that confusion itself.
- Gartner observes that “Most BI platforms are deployed as systems of performance measurement, not for decision support.” It evidently sees this as a bad tendency, which is thankfully changing. Automated decisioning is part of the fix Gartner sees, along with collaboration. While I agree on both counts, Gartner oddly doesn’t also connect this to the general rise of investigative analytics.
- Gartner also had a catch-all trend of “new use cases”, listing some examples, but also sort of confessing it wasn’t doing a great job of articulating the point. I think part of the difficulty is Gartner’s contortions over what is or isn’t BI; Gartner seems to run into expositional difficulties whenever it touches on the core point that analytics isn’t all about performance-monitoring BI. Another problem is that Gartner doesn’t seem to have really thought through what does and doesn’t work in the area of analytic applications.
Here’s the forest that I suspect Gartner is missing for the trees:
- Even though all-in-one enterprise BI platforms are great at getting data to a multitude of endpoints …
- … and even though the number of endpoints for data is increasing (more users, more devices) …
- … all-in-one enterprise BI platforms fall short in helping the data be used once it arrives …
- … and all-in-one enterprise BI platform vendors will find it hard to catch up with other vendors’ data-use capabilities.
