Application areas

Posts focusing on the use of database and analytic technologies in specific application domains. Related subjects include:

October 10, 2013

Libraries in Teradata Aster

I recently wrote (emphasis added):

My clients at Teradata Aster probably see things differently, but I don’t think their library of pre-built analytic packages has been a big success. The same goes for other analytic platform vendors who have done similar (generally lesser) things. I believe that this is because such limited libraries don’t do enough of what users want.

The bolded part has been, shall we say, confirmed. As Randy Lea tells it, Teradata Aster sales qualification includes the determination that at least one SQL-MR operator — be relevant to the use case. (“Operator” seems to be the word now, rather than “function”.) Randy agreed that some users prefer hand-coding, but believes a large majority would like to push work to data analysts/business analysts who might have strong SQL skills, but be less adept at general mathematical programming.

This phrasing will all be less accurate after the release of Aster 6, which extends Aster’s capabilities beyond the trinity of SQL, the SQL-MR library, and Aster-supported hand-coding.

Randy also said:

And Randy seemed to agree when I put words in his mouth to the effect that the prebuilt operators save users months of development time.

Meanwhile, Teradata Aster has started a whole new library for relationship analytics.

September 20, 2013

Trends in predictive modeling

I talked with Teradata about a bunch of stuff yesterday, including this week’s announcements in in-database predictive modeling. The specific news was about partnerships with Fuzzy Logix and Revolution Analytics. But what I found more interesting was the surrounding discussion. In a nutshell:

This is the strongest statement of perceived demand for in-database modeling I’ve heard. (Compare Point #3 of my July predictive modeling post.) And fits with what I’ve been hearing about R.

Read more

September 3, 2013

The Hemisphere program

Another surveillance slide deck has emerged, as reported by the New York Times and other media outlets. This one is for the Hemisphere program, which apparently:

Other notes include:

I’ve never gotten a single consistent figure, but typical CDR size seems to be in the 100s of bytes range. So I conjecture that Project Hemisphere spawned one of the first petabyte-scale databases ever.

Hemisphere Project unknowns start:  Read more

August 24, 2013

Hortonworks business notes

Hortonworks did a business-oriented round of outreach, talking with at least Derrick Harris and me. Notes  from my call — for which Rob Bearden* didn’t bother showing up — include, in no particular order:

*Speaking of CEO Bearden, an interesting note from Derrick’s piece is that Bearden is quoted as saying “I started this company from day one …”, notwithstanding that the now-departed Eric Baldeschwieler was founding CEO.

In Hortonworks’ view, Hadoop adopters typically start with a specific use case around a new type of data, such as clickstream, sensor, server log, geolocation, or social.  Read more

August 17, 2013

Aerospike 3

My clients at Aerospike are coming out with their Version 3, and as several of my clients do, have encouraged me to front-run what otherwise would be the Monday embargo.

I encourage such behavior with arguments including:

Aerospike 2′s value proposition, let us recall, was:

… performance, consistent performance, and uninterrupted operations …

  • Aerospike’s consistent performance claims are along the lines of sub-millisecond latency, with 99.9% of responses being within 5 milliseconds, and even a node outage only borking performance for some 10s of milliseconds.
  • Uninterrupted operation is a core Aerospike design goal, and the company says that to date, no Aerospike production cluster has ever gone down.

The major support for such claims is Aerospike’s success in selling to the digital advertising market, which is probably second only to high-frequency trading in its low-latency demands. For example, Aerospike’s CMO Monica Pal sent along a link to what apparently is:

Read more

August 4, 2013

Data model churn

Perhaps we should remind ourselves of the many ways data models can be caused to churn. Here are some examples that are top-of-mind for me. They do overlap a lot — and the whole discussion overlaps with my post about schema complexity last January, and more generally with what I’ve written about dynamic schemas for the past several years..

Just to confuse things further — some of these examples show the importance of RDBMS, while others highlight the relational model’s limitations.

The old standbys

Product and service changes. Simple changes to your product line many not require any changes to the databases recording their production and sale. More complex product changes, however, probably will.

A big help in MCI’s rise in the 1980s was its new Friends and Family service offering. AT&T couldn’t respond quickly, because it couldn’t get the programming done, where by “programming” I mainly mean database integration and design. If all that was before your time, this link seems like a fairly contemporaneous case study.

Organizational changes. A common source of hassle, especially around databases that support business intelligence or planning/budgeting, is organizational change. Kalido’s whole business was based on accommodating that, last I checked, as were a lot of BI consultants’. Read more

July 20, 2013

The refactoring of everything

I’ll start with three observations:

As written, that’s probably pretty obvious. Even so, it’s easy to forget just how pervasive the refactoring is and is likely to be. Let’s survey some examples first, and then speculate about consequences. Read more

July 12, 2013

More notes on predictive modeling

My July 2 comments on predictive modeling were far from my best work. Let’s try again.

1. Predictive analytics has two very different aspects.

Developing models, aka “modeling”:

More precisely, some modeling algorithms are straightforward to parallelize and/or integrate into RDBMS, but many are not.

Using models, most commonly:

2. Some people think that all a modeler needs are a few basic algorithms. (That’s why, for example, analytic RDBMS vendors are proud of integrating a few specific modeling routines.) Other people think that’s ridiculous. Depending on use case, either group can be right.

3. If adoption of DBMS-integrated modeling is high, I haven’t noticed.

Read more

June 13, 2013

How is the surveillance data used?

Over the past week, discussion has exploded about US government surveillance. After summarizing, as best I could, what data the government appears to collect, now I ‘d like to consider what they actually do with it. More precisely, I’d like to focus on the data’s use(s) in combating US-soil terrorism. In a nutshell:

Consider the example of Tamerlan Tsarnaev:

In response to this 2011 request, the FBI checked U.S. government databases and other information to look for such things as derogatory telephone communications, possible use of online sites associated with the promotion of radical activity, associations with other persons of interest, travel history and plans, and education history.

While that response was unsuccessful in preventing a dramatic act of terrorism, at least they tried.

As for actual success stories — well, that’s a bit tough. In general, there are few known examples of terrorist plots being disrupted by law enforcement in the United States, except for fake plots engineered to draw terrorist-leaning individuals into committing actual crimes. One of those examples, that of Najibullah Zazi, was indeed based on an intercepted email — but the email address itself was uncovered through more ordinary anti-terrorism efforts.

As for machine learning/data mining/predictive modeling, I’ve never seen much of a hint of it being used in anti-terrorism efforts, whether in the news or in my own discussions inside the tech industry. And I think there’s a great reason for that — what would they use for a training set? Here’s what I mean.  Read more

June 10, 2013

Where things stand in US government surveillance

Edit: Please see the comment thread below for updates. Please also see a follow-on post about how the surveillance data is actually used.

US government surveillance has exploded into public consciousness since last Thursday. With one major exception, the news has just confirmed what was already thought or known. So where do we stand?

My views about domestic data collection start:

*Recall that these comments are US-specific. Data retention legislation has been proposed or passed in multiple countries to require recording of, among other things, all URL requests, with the stated goal of fighting either digital piracy or child pornography.

As for foreign data: Read more

← Previous PageNext Page →

Feed: DBMS (database management system), DW (data warehousing), BI (business intelligence), and analytics technology Subscribe to the Monash Research feed via RSS or email:

Login

Search our blogs and white papers

Monash Research blogs

User consulting

Building a short list? Refining your strategic plan? We can help.

Vendor advisory

We tell vendors what's happening -- and, more important, what they should do about it.

Monash Research highlights

Learn about white papers, webcasts, and blog highlights, by RSS or email.