Analytic technologies

Discussion of technologies related to information query and analysis. Related subjects include:

August 24, 2015

Multi-model database managers

I’d say:

Before supporting my claims directly, let me note that this is one of those posts that grew out of a Twitter conversation. The first round went:

Merv Adrian: 2 kinds of multimodel from DBMS vendors: multi-model DBMSs and multimodel portfolios. The latter create more complexity, not less.

Me: “Owned by the same vendor” does not imply “well integrated”. Indeed, not a single example is coming to mind.

Merv: We are clearly in violent agreement on that one.

Around the same time I suggested that InterSystems Caché was the last significant object-oriented DBMS, only to get the pushback that they were “multi-model” as well. That led to some reasonable-sounding justification (although the buzzwords of course aren’t from me), namely:

August 3, 2015

Data messes

A lot of what I hear and talk about boils down to “data is a mess”. Below is a very partial list of examples.

To a first approximation, one would expect operational data to be rather clean. After all, it drives and/or records business transactions. So if something goes awry, the result can be lost money, disappointed customers, or worse, and those are outcomes to be strenuously avoided. Up to a point, that’s indeed true, at least at businesses large enough to be properly automated. (Unlike, for example — :) — mine.)

Even so, operational data has some canonical problems. First, it could be inaccurate; somebody can just misspell or otherwise botch an entry. Further, there are multiple ways data can be unreachable, typically because it’s:

Inconsistency can take multiple forms, including:

July 7, 2015

Zoomdata and the Vs

Let’s start with some terminology biases:

So when my clients at Zoomdata told me that they’re in the business of providing “the fastest visual analytics for big data”, I understood their choice, but rolled my eyes anyway. And then I immediately started to check how their strategy actually plays against the “big data” Vs.

It turns out that:

*The HDFS/S3 aspect seems to be a major part of Zoomdata’s current story.

Core aspects of Zoomdata’s technical strategy include:

June 14, 2015

“Chilling effects” revisited

In which I observe that Tim Cook and the EFF, while thankfully on the right track, haven’t gone nearly far enough.

Traditionally, the term “chilling effect” referred specifically to inhibitions on what in the US are regarded as First Amendment rights — the freedoms of speech, the press, and in some cases public assembly. Similarly, when the term “chilling effect” is used in a surveillance/privacy context, it usually refers to the fear that what you write or post online can later be held against you. This concern has been expressed by, among others, Tim Cook of Apple, Laura Poitras, and the Electronic Frontier Foundation, and several research studies have supported the point.

But that’s only part of the story. As I wrote in July 2013:

… with the new data collection and analytic technologies, pretty much ANY action could have legal or financial consequences. And so, unless something is done, “big data” privacy-invading technologies can have a chilling effect on almost anything you want to do in life.

The reason, in simplest terms, is that your interests could be held against you. For example, models can estimate your future health, your propensity for risky hobbies, or your likelihood of changing your residence, career, or spouse. Any of these insights could be useful to employers or financial services firms, and not in a way that redounds to your benefit. And if you think enterprises (or governments) would never go that far, please consider an argument from the sequel to my first “chilling effects” post:

June 10, 2015

Hadoop generalities

Occasionally I talk with an astute reporter — there are still a few left :) — and get led toward angles I hadn’t considered before, or at least hadn’t written up. A blog post may then ensue. This is one such post.

There is a group of questions going around that includes:

To a first approximation, my responses are:

June 8, 2015

Teradata will support Presto

At the highest level:

Now let’s make that all a little more precise.

Regarding Presto (and I got most of this from Teradata):

Daniel Abadi said that Presto satisfies what he sees as some core architectural requirements for a modern parallel analytic RDBMS project:

May 26, 2015

IT-centric notes on the future of health care

It’s difficult to project the rate of IT change in health care, because:

Timing aside, it is clear that health care change will be drastic. The IT part of that starts with vastly comprehensive electronic health records, which will be accessible (in part or whole as the case may be) by patients, caregivers, care payers and researchers alike. I expect elements of such records to include:

These vastly greater amounts of data cited above will allow for greatly changed analytics.

May 13, 2015

Notes on analytic technology, May 13, 2015

1. There are multiple ways in which analytics is inherently modular. For example:

Also, analytics is inherently iterative.

If I’m right that analytics is or at least should be modular and iterative, it’s easy to see why people hate multi-year data warehouse creation projects. Perhaps it’s also easy to see why I like the idea of schema-on-need.

2. In 2011, I wrote, in the context of agile predictive analytics, that

… the “business analyst” role should be expanded beyond BI and planning to include lightweight predictive analytics as well.

I gather that a similar point is at the heart of Gartner’s new term “citizen data scientist”. I am told that the term resonates with at least some enterprises.

April 16, 2015

Notes on indexes and index-like structures

Indexes are central to database management.

Perhaps it’s time for a round-up post on indexing. :)
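To make the opening claim concrete, here is a toy Python sketch (not any particular DBMS’s design) of why an index helps. A full scan touches every row, while an ordered index of (key, row position) pairs can be binary-searched for point lookups and can also serve range queries. The table, function names, and data are hypothetical illustrations.

```python
import bisect

# Hypothetical table: (customer_id, name) rows stored in arbitrary order.
rows = [(42, "Ada"), (7, "Grace"), (19, "Edgar"), (7, "Linus"), (88, "Barbara")]

def full_scan(rows, key):
    """No index: examine every row, O(n)."""
    return [i for i, (k, _) in enumerate(rows) if k == key]

def build_index(rows):
    """Toy ordered index: (key, row_position) pairs sorted by key."""
    return sorted((k, i) for i, (k, _) in enumerate(rows))

def index_lookup(index, key):
    """Point lookup: binary search, then read only matching entries, O(log n + matches)."""
    lo = bisect.bisect_left(index, (key, -1))             # before any (key, pos >= 0)
    hi = bisect.bisect_right(index, (key, len(index)))    # after any (key, pos < n)
    return [pos for _, pos in index[lo:hi]]

def index_range(index, low, high):
    """Range lookup low <= key <= high, which an ordered index also supports."""
    lo = bisect.bisect_left(index, (low, -1))
    hi = bisect.bisect_right(index, (high, len(index)))
    return [pos for _, pos in index[lo:hi]]

index = build_index(rows)
print(index_lookup(index, 7))        # row positions holding key 7
print(index_range(index, 10, 50))    # row positions with 10 <= key <= 50
```

The price, as with real indexes, is that the sorted structure has to be maintained on every insert, update, or delete; production systems generally use structures such as B-trees rather than a flat sorted list partly for that reason.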

1. First, let’s review some basics. Classically:

2. Further:

April 9, 2015

Which analytic technology problems are important to solve for whom?

I hear much discussion of shortfalls in analytic technology, especially from companies that want to fill in the gaps. But how much do these gaps actually matter? In many cases, that depends on what the analytic technology is being used for. So let’s think about some different kinds of analytic task, and where they each might most stress today’s available technology.

In separating out the task areas, I’ll focus first on the spectrum “To what extent is this supposed to produce novel insights?” and second on the dimension “To what extent is this supposed to be integrated into a production/operational system?” Issues of latency, algorithmic novelty, etc. can follow after those. In particular, let’s consider the tasks:
