Telecommunications

Posts about database and analytic technologies applied to the telecommunications industry, especially in call detail record (CDR) applications. Related subjects include:

October 31, 2012

Notes and comments — October 31, 2012

Time for another catch-all post. First and saddest — one of the earliest great commenters on this blog, and a beloved figure in the Boston-area database community, was Dan Weinreb, whom I had known since some Symbolics briefings in the early 1980s. He passed away recently, much much much too young. Looking back for a couple of examples — even if you’ve never heard of him before, I see that Dan ‘s 2009 comment on Tokutek is still interesting today, and so is a post on his own blog disagreeing with some of my choices in terminology.

Otherwise, in no particular order:

1. Chris Bird is learning MongoDB. As is common for Chris, his comments are both amusing and enlightening.

2. When I relayed Cloudera’s comments on Hadoop adoption, I left out a couple of categories. One Cloudera called “mobile”; when I probed, that was about HBase, with an example being messaging apps.

The other was “phone home” — i.e., the ingest of machine-generated data from a lot of different devices. This is something that’s obviously been coming for several years — but I’m increasingly getting the sense that it’s actually arrived.

Read more

October 18, 2012

Notes on Hadoop adoption and trends

With Strata/Hadoop World being next week, there is much Hadoop discussion. One theme of the season is BI over Hadoop. I have at least 5 clients claiming they’re uniquely positioned to support that (most of whom partner with a 6th client, Tableau); the first 2 whose offerings I’ve actually written about are Teradata Aster and Hadapt. More generally, I’m hearing “Using Hadoop is hard; we’re here to make it easier for you.”

If enterprises aren’t yet happily running business intelligence against Hadoop, what are they doing with it instead? I took the opportunity to ask Cloudera, whose answers didn’t contradict anything I’m hearing elsewhere. As Cloudera tells it (approximately — this part of the conversation* was rushed):   Read more

September 24, 2012

Notes on Hadoop adoption

I successfully resisted telephone consulting while on vacation, but I did do some by email. One was on the oft-recurring subject of Hadoop adoption. I think it’s OK to adapt some of that into a post.

Notes on past and current Hadoop adoption include:

Thoughts on how Hadoop adoption will look going forward include: Read more

July 5, 2012

Introduction to Neo Technology and Neo4j

I’ve been talking some with the Neo Technology/Neo4j guys, including Emil Eifrem (CEO/cofounder), Johan Svensson (CTO/cofounder), and Philip Rathle (Senior Director of Products). Basics include:

Numbers and historical facts include:

Read more

May 28, 2012

Quick-turnaround predictive modeling

Last November, I wrote two posts on agile predictive analytics. It’s time to return to the subject. I’m used to KXEN talking about the ability to do predictive modeling, very quickly, perhaps without professional statisticians; that the core of what KXEN does. But I was surprised when Revolution Analytics told me a similar story, based on a different approach, because ordinarily that’s not how R is used at all.

Ultimately, there seem to be three reasons why you’d want quick turnaround on your predictive modeling: Read more

May 21, 2012

Cool analytic stories

There are several reasons it’s hard to confirm great analytic user stories. First, there aren’t as many jaw-dropping use cases as one might think. For as I wrote about performance, new technology tends to make things better, but not radically so. After all, if its applications are …

… all that bloody important, then probably people have already been making do to get it done as best they can, even in an inferior way.

Further, some of the best stories are hard to confirm; even the famed beer/diapers story isn’t really true. Many application areas are hard to nail down due to confidentiality, especially but not only in such “adversarial” domains as anti-terrorism, anti-spam, or anti-fraud.

Even so, I have two questions in my inbox that boil down to “What are the coolest or most significant analytics stories out there?” So let’s round up some of what I know. Read more

May 13, 2012

Notes on the analysis of large graphs

This post is part of a series on managing and analyzing graph data. Posts to date include:

My series on graph data management and analytics got knocked off-stride by our website difficulties. Still, I want to return to one interesting set of issues — analyzing large graphs, specifically ones that don’t fit comfortably into RAM on a single server. By no means do I have the subject figured out. But here are a few notes on the matter.

How big can a graph be? That of course depends on:

*Even if your graph has 10 billion nodes, those can be tokenized in 34 bits, so the main concern is edges. Edges can include weights, timestamps, and so on, but how many specifics do you really need? At some point you can surely rely on a pointer to full detail stored elsewhere.

The biggest graph-size estimates I’ve gotten are from my clients at Yarcdata, a division of Cray. (“Yarc” is “Cray” spelled backwards.) To my surprise, they suggested that graphs about people could have 1000s of edges per node, whether in:

Yarcdata further suggested that bioinformatics use cases could have node counts higher yet, characterizing Bio2RDF as one of the “smaller” ones at 22 billion nodes. In these cases, the nodes/edge average seems lower than in people-analysis graphs, but we’re still talking about 100s of billions of edges.

Recalling that relationship analytics boils down to finding paths and subgraphs, the naive relational approach to such tasks would be: Read more

May 7, 2012

Relationship analytics application notes

This post is part of a series on managing and analyzing graph data. Posts to date include:

In my recent post on graph data models, I cited various application categories for relationship analytics. For most applications, it’s hard to get a lot of details. Reasons include:

Even so, it’s fairly safe to say:

Read more

May 4, 2012

Notes on graph data management

This post is part of a series on managing and analyzing graph data. Posts to date include:

Interest in graph data models keeps increasing. But it’s tough to discuss them with any generality, because “graph data model” encompasses so many different things. Indeed, just as all data structures can be mapped to relational ones, it is also the case that all data structures can be mapped to graphs.

Formally, a graph is a collection of (node, edge, node) triples. In the simplest case, the edge has no properties other than existence or maybe direction, and the triple can be reduced to a (node, node) pair, unordered or ordered as the case may be. It is common, however, for edges to encapsulate additional properties, the canonical examples of which are:

Many of the graph examples I can think of fit into four groups: Read more

May 1, 2012

Thinking about market segments

It is a reasonable (over)simplification to say that my business boils down to:

One complication that commonly creeps in is that different groups of users have different buying practices and technology needs. Usually, I nod to that point in passing, perhaps by listing different application areas for a company or product. But now let’s address it head on. Whether or not you care about the particulars, I hope the sheer length of this post reminds you that there are many different market segments out there.

Last June I wrote:

In almost any IT decision, there are a number of environmental constraints that need to be acknowledged. Organizations may have standard vendors, favored vendors, or simply vendors who give them particularly deep discounts. Legacy systems are in place, application and system alike, and may or may not be open to replacement. Enterprises may have on-premise or off-premise preferences; SaaS (Software as a Service) vendors probably have multitenancy concerns. Your organization can determine which aspects of your system you’d ideally like to see be tightly integrated with each other, and which you’d prefer to keep only loosely coupled. You may have biases for or against open-source software. You may be pro- or anti-appliance. Some applications have a substantial need for elastic scaling. And some kinds of issues cut across multiple areas, such as budget, timeframe, security, or trained personnel.

I’d further say that it matters whether the buyer:

Now let’s map those considerations (and others) to some specific market segments. Read more

← Previous PageNext Page →

Feed: DBMS (database management system), DW (data warehousing), BI (business intelligence), and analytics technology Subscribe to the Monash Research feed via RSS or email:

Login

Search our blogs and white papers

Monash Research blogs

User consulting

Building a short list? Refining your strategic plan? We can help.

Vendor advisory

We tell vendors what's happening -- and, more important, what they should do about it.

Monash Research highlights

Learn about white papers, webcasts, and blog highlights, by RSS or email.