Analysis of companies, products, and user strategies in the area of business intelligence. Related subjects include:
Teradata Aster 6 has been preannounced (beta in Q4, general release in Q1 2014). The general architectural idea is:
- There are multiple data stores, the first two of which are:
- The classic Aster relational data store.
- A file system that emulates HDFS (Hadoop Distributed File System).
- There are multiple processing “engines”, where an engine is what occupies and controls a processing thread. These start with:
- Generic analytic SQL, as Aster has had all along.
- SQL-MR, the MapReduce Aster has also had all along.
- SQL-Graph aka SQL-GR, a graph analytics system.
- The Aster parser and optimizer accept glorified SQL, and work across all the engines combined.
There’s much more, of course, but those are the essential pieces.
Just to be clear: Teradata Aster 6, aka the Teradata Aster Discovery Platform, includes HDFS compatibility, native MapReduce and ways of invoking Hadoop MapReduce on non-Aster nodes or clusters — but even so, you can’t run Hadoop MapReduce within Aster over Aster’s version of HDFS.
The most dramatic immediate additions are in the graph analytics area.* The new SQL-Graph is supported by something called BSP (Bulk Synchronous Parallel). I’ll start by observing (and some of this is confusing):
- BSP was thought of a long time ago, as a general-purpose computing model, but recently has come to the fore specifically for graph analytics. (Think Pregel and Giraph, along with Teradata Aster.)
- BSP has a kind of execution-graph metaphor, which is different from the graph data it helps analyze.
- BSP is described as being a combination hardware/software technology, but Teradata Aster and everybody else I know of implements it in software only.
- Aster long ago talked of adding a graph data store, but has given up that plan; rather, it wants you to do graph analytics on data stored in tables (or accessed through views) in the usual way.
Use cases suggested are a lot of marketing, plus anti-fraud.
*Pay no attention to Aster’s previous claims to do a good job on graph — and not only via nPath — in SQL-MR.
So far as I can infer from examples I’ve seen, the semantics of Teradata Aster SQL-Graph start:
- Ordinary SQL except in the FROM clause.
- Functions/operators that are the arguments for FROM; of course, they output tables. You can write these yourself, or use Teradata Aster’s prebuilt ones.
Within those functions, the core idea is: Read more
|Categories: Application areas, Aster Data, Business intelligence, Data models and architecture, Data warehousing, Hadoop, Parallelization, Predictive modeling and advanced analytics, RDF and graphs, Teradata||4 Comments|
In a general pontification on positioning, I wrote:
every product in a category is positioned along the same set of attributes,
and went on to suggest that summary attributes were more important than picky detailed ones. So how does that play out for investigative analytics?
First, summary attributes that matter for almost any kind of enterprise software include:
- Performance and scalability. I write about analytic performance and scalability a lot. Usually that’s in the context of analytic DBMS, but it also arises in analytic stacks such as Platfora, Metamarkets or even QlikView, and also in the challenges of making predictive modeling scale.
- Reliability, availability and security.* This is more crucial for short-request applications than analytic ones, but even your analytic systems shouldn’t leak data or crash.
- Goodness of fit with legacy systems. I hate that one, because enterprises often sacrifice way too much in favor of that benefit.
- Price. Duh.
*I picked up that phrase when — abbreviated as RAS — it was used to characterize the emphasis for Oracle 8. I like it better than a general and ambiguous concept of “enterprise-ready”.
The reason I’m writing this post, however, is to call out two summary attributes of special importance in investigative analytics — which regrettably which often conflict with each other — namely:
- Agility. People don’t want to submit requests for reports or statistical analyses; they want to get answers as soon as the questions come to mind.
- Completeness of feature set — for a particular use case, that is. There’s no such thing as an investigative analytics offering with a feature set that’s close to complete for all purposes; even SAS, IBM and other behemoths fall short.
Much of what I work on boils down to those two subjects. For example: Read more
|Categories: Aster Data, Business intelligence, Data warehousing, KXEN, Predictive modeling and advanced analytics, SAS Institute, Teradata||8 Comments|
I’ve suggested in the past, approximately, that the platform technology side of business intelligence is more significant than the user interface. That formulation, however, doesn’t exactly capture what I believe. To be more precise, let’s differentiate between a couple aspects of business intelligence UI.
It might seem that a lot of the action in business intelligence revolves around ever-better visualization. After all, Tableau is clearly identified as a visualization-centric technology; who’s hotter than Tableau? And numerous other vendors talk of “visualizations” too. But I don’t think that’s exactly right — rather, I see navigation as being a much bigger deal. And unlike most pure visualization, navigation usually depends strongly on underlying platform capabilities.
Examples of what I mean by innovative navigation — all of which have been developed or have gained prominence over the past decade or so — include:
- QlikView’s core behavior — all that associative navigation.
- QlikView’s collaboration, and every other BI collaboration capability I know of.
- ClearStory, although you won’t get to see what I mean until the launch next month.
- BI search or faceted-search UIs. (E.g. Endeca.)
- BI that is launched from operational applications.
As is the case for most important categories of technology, discussions of BI can get confused. I’ve remarked in the past that there are numerous kinds of BI, and that the very origin of the term “business intelligence” can’t even be pinned down to the nearest century. But the most fundamental confusion of all is that business intelligence technology really is two different things, which in simplest terms may be categorized as user interface (UI) and platform* technology. And so:
- The UI aspect is why BI tends to be sold to business departments; the platform aspect is why it also makes sense to sell BI to IT shops attempting to establish enterprise standards.
- The UI aspect is why it makes sense to sell and market BI much as one would applications; the platform aspect is why it makes sense to sell and market BI much as one would database technology.
- The UI aspect is why vendors want to integrate BI with transaction-processing applications; the platform aspect is, I suppose, why they have so much trouble making the integration work.
- The UI aspect is why BI is judged on … well, on snazzy UIs and demos. The platform aspect is a big reason why the snazziest UI doesn’t always win.
*I wanted to say “server” or “server-side” instead of “platform”, as I dislike the latter word. But it’s too inaccurate, for example in the case of the original Cognos PowerPlay, and also in various thin-client scenarios.
Key aspects of BI platform technology can include:
- Query and data management. That’s the area I most commonly write about, for example in the cases of Platfora, QlikView, or Metamarkets. It goes back to the 1990s — notably the Business Objects semantic layer and Cognos PowerPlay MOLAP (MultiDimensional OnLine Analytic Processing) engine — and indeed before that to the report writers and fourth-generation languages of the 1970s. This overlaps somewhat with …
- … data integration and metadata management. Business Objects, Qlik, and other BI vendors have bought data integration vendors. Arguably, there was a period when Information Builders’ main business was data connectivity and integration. And sometimes the main value proposition for a BI deal is “We need some way to get at all that data and bring it together.”
- Security and access control – authentication, authorization, and all the additional As.
- Scheduling and delivery. When 10s of 1000s of desktops are being served, these aren’t entirely trivial. Ditto when dealing with occasionally-connected mobile devices.
|Categories: Business intelligence, Business Objects, ClearStory Data, Cognos, Data warehousing, Endeca, Information Builders, Metamarkets and Druid, MOLAP, Platfora, Predictive modeling and advanced analytics, QlikTech and QlikView||11 Comments|
Some subjects just keep coming up. And so I keep saying things like:
Most generalizations about “Big Data” are false. “Big Data” is a horrific catch-all term, with many different meanings.
Most generalizations about Hadoop are false. Reasons include:
- Hadoop is a collection of disparate things, most particularly data storage and application execution systems.
- The transition from Hadoop 1 to Hadoop 2 will be drastic.
- For key aspects of Hadoop — especially file format and execution engine — there are or will be widely varied options.
Hadoop won’t soon replace relational data warehouses, if indeed it ever does. SQL-on-Hadoop is still very immature. And you can’t replace data warehouses unless you have the power of SQL.
Note: SQL isn’t the only way to provide “the power of SQL”, but alternative approaches are just as immature.
Most generalizations about NoSQL are false. Different NoSQL products are … different. It’s not even accurate to say that all NoSQL systems lack SQL interfaces. (For example, SQL-on-Hadoop often includes SQL-on-HBase.)
I made a remarkably rumpled video appearance yesterday with SiliconAngle honchos John Furrier and Dave Vellante. (Excuses include <3 hours sleep, and then a scrambling reaction to a schedule change.) Topics covered included, with approximate timechecks:
- 0:00 Introductory pabulum, and some technical difficulties
- 2:00 More introduction
- 3:00 Dynamic schemas and data model churn
- 6:00 Surveillance and privacy
- 13:00 Hadoop, especially the distro wars
- 22:00 BI innovation
- 23:30 More on dynamic schemas and data model churn
Edit: Some of my remarks were transcribed.
- I posted on dynamic schemas data model churn a few days ago.
- I capped off a series on privacy and surveillance a few days ago.
- I commented on various Hadoop distributions in June.
|Categories: Business intelligence, ClearStory Data, Data warehousing, Hadoop, MapR, MapReduce, Surveillance and privacy||Leave a Comment|
I lampoon the word “disruptive” for being badly overused. On the other hand, I often refer to the concept myself. Perhaps I should clarify.
- Market leaders serve high-end customers with complex, high-end products and services, often distributed through a costly sales channel.
- Upstarts serve a different market segment, often cheaply and/or simply, perhaps with a different business model (e.g. a different sales channel).
- Upstarts expand their offerings, and eventually attack the leaders in their core markets.
In response (this is the Innovator’s Solution part):
- Leaders expand their product lines, increasing the value of their offerings in their core markets.
- In particular, leaders expand into adjacent market segments, capturing margins and value even if their historical core businesses are commoditized.
- Leaders may also diversify into direct competition with the upstarts, but that generally works only if it’s via a separate division, perhaps acquired, that has permission to compete hard with the main business.
But not all cleverness is “disruption”.
- Routine product advancement by leaders — even when it’s admirably clever — is “sustaining” innovation, as opposed to the disruptive stuff.
- Innovative new technology from small companies is not, in itself, disruption either.
Here are some of the examples that make me think of the whole subject. Read more
|Categories: Business intelligence, Data warehousing, Hadoop, Microsoft and SQL*Server, MongoDB and 10gen, MySQL, Netezza, NewSQL, NoSQL, Oracle, Predictive modeling and advanced analytics, QlikTech and QlikView, Tableau Software||13 Comments|
I’ll start with three observations:
- Computer systems can’t be entirely tightly coupled — nothing would ever get developed or tested.
- Computer systems can’t be entirely loosely coupled — nothing would ever get optimized, in performance and functionality alike.
- In an ongoing trend, there is and will be dramatic refactoring as to which connections wind up being loose or tight.
As written, that’s probably pretty obvious. Even so, it’s easy to forget just how pervasive the refactoring is and is likely to be. Let’s survey some examples first, and then speculate about consequences. Read more
I talk with a lot of companies, and repeatedly hear some of the same application themes. This post is my attempt to collect some of those ideas in one place.
1. So far, the buzzword of the year is “real-time analytics”, generally with “operational” or “big data” included as well. I hear variants of that positioning from NewSQL vendors (e.g. MemSQL), NoSQL vendors (e.g. AeroSpike), BI stack vendors (e.g. Platfora), application-stack vendors (e.g. WibiData), log analysis vendors (led by Splunk), data management vendors (e.g. Cloudera), and of course the CEP industry.
Yeah, yeah, I know — not all the named companies are in exactly the right market category. But that’s hard to avoid.
Why this gold rush? On the demand side, there’s a real or imagined need for speed. On the supply side, I’d say:
- There are vast numbers of companies offering data-management-related technology. They need ways to differentiate.
- Doing analytics at short-request speeds is an obvious data-management-related challenge, and not yet comprehensively addressed.
2. More generally, most of the applications I hear about are analytic, or have a strong analytic aspect. The three biggest areas — and these overlap — are:
- Customer interaction
- Network and sensor monitoring
- Game and mobile application back-ends
Also arising fairly frequently are:
- Algorithmic trading
- Risk measurement
- Law enforcement/national security
- Stakeholder-facing analytics
I’m hearing less about quality, defect tracking, and equipment maintenance than I used to, but those application areas have anyway been ebbing and flowing for decades.
Well-resourced Silicon Valley start-ups typically announce their existence multiple times. Company formation, angel funding, Series A funding, Series B funding, company launch, product beta, and product general availability may not be 7 different “news events”, but they’re apt to be at least 3-4. Platfora, no exception to this rule, is hitting general availability today, and in connection with that I learned a bit more about what they are up to.
In simplest terms, Platfora offers exploratory business intelligence against Hadoop-based data. As per last weekend’s post about exploratory BI, a key requirement is speed; and so far as I can tell, any technological innovation Platfora offers relates to the need for speed. Specifically, I drilled into Platfora’s performance architecture on the query processing side (and associated data movement); Platfora also brags of rendering 100s of 1000s of “marks” quickly in HTML5 visualizations, but I haven’t a clue as to whether that’s much of an accomplishment in itself.
Platfora’s marketing suggests it obviates the need for a data warehouse at all; for most enterprises, of course, that is a great exaggeration. But another dubious aspect of Platfora marketing actually serves to understate the product’s merits — Platfora claims to have an “in-memory” product, when what’s really the case is that Platfora’s memory-centric technology uses both RAM and disk to manage larger data marts than could reasonably be fit into RAM alone. Expanding on what I wrote about Platfora when it de-stealthed: Read more
|Categories: Business intelligence, Columnar database management, Data warehousing, EAI, EII, ETL, ELT, ETLT, Hadoop, Market share and customer counts, Memory-centric data management, Platfora, Workload management||12 Comments|