Predictive modeling and advanced analytics

Discussion of technologies and vendors in the overlapping areas of predictive analytics, predictive modeling, data mining, machine learning, Monte Carlo analysis, and other “advanced” analytics.

June 14, 2015

“Chilling effects” revisited

In which I observe that Tim Cook and the EFF, while thankfully on the right track, haven’t gone nearly far enough.

Traditionally, the term “chilling effect” referred specifically to inhibitions on what in the US are regarded as First Amendment rights — the freedoms of speech, the press, and in some cases public assembly. Similarly, when the term “chilling effect” is used in a surveillance/privacy context, it usually refers to the fear that what you write or post online can later be held against you. This concern has been expressed by, among others, Tim Cook of Apple, Laura Poitras, and the Electronic Frontier Foundation, and several research studies have supported the point.

But that’s only part of the story. As I wrote in July, 2013,

… with the new data collection and analytic technologies, pretty much ANY action could have legal or financial consequences. And so, unless something is done, “big data” privacy-invading technologies can have a chilling effect on almost anything you want to do in life.

The reason, in simplest terms, is that your interests could be held against you. For example, models can estimate your future health, your propensity for risky hobbies, or your likelihood of changing your residence, career, or spouse. Any of these insights could be useful to employers or financial services firms, and not in a way that redounds to your benefit. And if you think enterprises (or governments) would never go that far, please consider an argument from the sequel to my first “chilling effects” post: Read more

May 26, 2015

IT-centric notes on the future of health care

It’s difficult to project the rate of IT change in health care, because:

Timing aside, it is clear that health care change will be drastic. The IT part of that starts with vastly comprehensive electronic health records, which will be accessible (in part or whole as the case may be) by patients, care givers, care payers and researchers alike. I expect elements of such records to include:

These vastly greater amounts of data cited above will allow for greatly changed analytics.
Read more

May 13, 2015

Notes on analytic technology, May 13, 2015

1. There are multiple ways in which analytics is inherently modular. For example:

Also, analytics is inherently iterative.

If I’m right that analytics is or at least should be modular and iterative, it’s easy to see why people hate multi-year data warehouse creation projects. Perhaps it’s also easy to see why I like the idea of schema-on-need.

2. In 2011, I wrote, in the context of agile predictive analytics, that

… the “business analyst” role should be expanded beyond BI and planning to include lightweight predictive analytics as well.

I gather that a similar point is at the heart of Gartner’s new term citizen data scientist. I am told that the term resonates with at least some enterprises.  Read more

April 9, 2015

Which analytic technology problems are important to solve for whom?

I hear much discussion of shortfalls in analytic technology, especially from companies that want to fill in the gaps. But how much do these gaps actually matter? In many cases, that depends on what the analytic technology is being used for. So let’s think about some different kinds of analytic task, and where they each might most stress today’s available technology.

In separating out the task areas, I’ll focus first on the spectrum “To what extent is this supposed to produce novel insights?” and second on the dimension “To what extent is this supposed to be integrated into a production/operational system?” Issues of latency, algorithmic novelty, etc. can follow after those. In particular, let’s consider the tasks: Read more

February 28, 2015

Databricks and Spark update

I chatted last night with Ion Stoica, CEO of my client Databricks, for an update both on his company and Spark. Databricks’ actual business is Databricks Cloud, about which I can say:

I do not expect all of the above to remain true as Databricks Cloud matures.

Ion also said that Databricks is over 50 people, and has moved its office from Berkeley to San Francisco. He also offered some Spark numbers, such as: Read more

February 1, 2015

Information technology for personal safety

There are numerous ways that technology, now or in the future, can significantly improve personal safety. Three of the biggest areas of application are or will be:

Implications will be dramatic for numerous industries and government activities, including but not limited to law enforcement, automotive manufacturing, infrastructure/construction, health care and insurance. Further, these technologies create a near-certainty that individuals’ movements and status will be electronically monitored in fine detail. Hence their development and eventual deployment constitutes a ticking clock toward a deadline for society deciding what to do about personal privacy.

Theoretically, humans aren’t the only potential kind of tyrants. Science fiction author Jack Williamson postulated a depressing nanny-technology in With Folded Hands, the idea for which was later borrowed by the humorous Star Trek episode I, Mudd.

Of these three areas, crime prevention is the furthest along; in particular, sidewalk cameras, license plate cameras and internet snooping are widely deployed around the world. So let’s consider the other two.

Vehicle accident prevention

Read more

January 19, 2015

Where the innovation is

I hoped to write a reasonable overview of current- to medium-term future IT innovation. Yeah, right. :) But if we abandon any hope that this post could be comprehensive, I can at least say:

1. Back in 2011, I ranted against the term Big Data, but expressed more fondness for the V words — Volume, Velocity, Variety and Variability. That said, when it comes to data management and movement, solutions to the V problems have generally been sketched out.

2. Even so, there’s much room for innovation around data movement and management. I’d start with:

3. As I suggested last year, data transformation is an important area for innovation.  Read more

December 31, 2014

Notes on machine-generated data, year-end 2014

Most IT innovation these days is focused on machine-generated data (sometimes just called “machine data”), rather than human-generated. So as I find myself in the mood for another survey post, I can’t think of any better idea for a unifying theme.

1. There are many kinds of machine-generated data. Important categories include:

That’s far from a complete list, but if you think about those categories you’ll probably capture most of the issues surrounding other kinds of machine-generated data as well.

2. Technology for better information and analysis is also technology for privacy intrusion. Public awareness of privacy issues is focused in a few areas, mainly: Read more

December 16, 2014

WibiData’s approach to predictive modeling and experimentation

A conversation I have too often with vendors goes something like:

That was the genesis of some tidbits I recently dropped about WibiData and predictive modeling, especially but not only in the area of experimentation. However, Wibi just reversed course and said it would be OK for me to tell more or less the full story, as long as I note that we’re talking about something that’s still in beta test, with all the limitations (to the product and my information alike) that beta implies.

As you may recall:

With that as background, WibiData’s approach to predictive modeling as of its next release will go something like this: Read more

December 12, 2014

Notes and links, December 12, 2014

1. A couple years ago I wrote skeptically about integrating predictive modeling and business intelligence. I’m less skeptical now.

For starters:

I’ve also heard a couple of ideas about how predictive modeling can support BI. One is via my client Omer Trajman, whose startup ScalingData is still semi-stealthy, but says they’re “working at the intersection of big data and IT operations”. The idea goes something like this:

Makes sense to me.

* The word “cluster” could have been used here in a couple of different ways, so I decided to avoid it altogether.

Finally, I’m hearing a variety of “smart ETL/data preparation” and “we recommend what columns you should join” stories. I don’t know how much machine learning there’s been in those to date, but it’s usually at least on the roadmap to make the systems (yet) smarter in the future. The end benefit is usually to facilitate BI.

2. Discussion of graph DBMS can get confusing. For example: Read more

Next Page →

Feed: DBMS (database management system), DW (data warehousing), BI (business intelligence), and analytics technology Subscribe to the Monash Research feed via RSS or email:

Login

Search our blogs and white papers

Monash Research blogs

User consulting

Building a short list? Refining your strategic plan? We can help.

Vendor advisory

We tell vendors what's happening -- and, more important, what they should do about it.

Monash Research highlights

Learn about white papers, webcasts, and blog highlights, by RSS or email.