Analytic technologies

Discussion of technologies related to information query and analysis. Related subjects include:

November 2, 2011

The cool aspects of Odiago WibiData

Christophe Bisciglia and Aaron Kimball have a new company.

WibiData is designed for management of, investigative analytics on, and operational analytics on consumer internet data, the main examples of which are web site traffic and personalization and their analogues for games and/or mobile devices. The core WibiData technology, built on HBase and Hadoop,* is a data management and analytic execution layer. That’s where the secret sauce resides. Also included are:

The whole thing is in beta, with about three (paying) beta customers.

*And Avro and so on.

The core ideas of WibiData include:

Read more

October 25, 2011

Where Datameer is positioned

I’ve chatted with Datameer a couple of times recently, mainly with CEO Stefan Groschupf, most recently after XLDB last Tuesday. Nothing I learned greatly contradicts what I wrote about Datameer 1 1/2 years ago.  In a nutshell, Datameer is designed to let you do simple stuff on large amounts of data, where “large amounts of data” typically means data in Hadoop, and “simple stuff” includes basic versions of a spreadsheet, of BI, and of EtL (Extract/Transform/Load, without much in the way of T).

Stefan reports that these capabilities are appealing to a significant fraction of enterprise or other commercial Hadoop users, especially the EtL and the BI. I don’t doubt him.

October 19, 2011

What those nested data structures are about

As I’ve noted before, the very big web companies have an issue with nested data structures. The subject came up in XLDB talks yesterday too, so my big goal for lunch was to finally understand what was being talked about. Sitting at a table full of eBay and LinkedIn folks turned out to be a good tactic.

The explanation was led by Oliver Ratzesberger, late of eBay* and progenitor of eBay’s Singularity project. In simplest terms, one event can spawn a lot of event attribute information, perhaps in the form of name-value pairs, which it then makes sense to store together in some way. The example Oliver dwelled on was that, on any given web page, there can be 100+ pieces of information to record, including:

*Edit: Oliver subsequently moved on to Sears and then Teradata.

There are several reasons why one might wish to store this information in ways that grieve relational purists. First, reconstructing all this information via joins would be brutally expensive. What’s more, reconstructing all this information via joins could be impractical. Some comes from third party ad servers, which might not reproduce the same ads upon demand. Other is in the form of rankings, which can’t always be reliably reproduced from one query to the next. (That’s just one of several reasons text search and relational DBMS are an awkward fit.)

Also, there’s a strong dynamic schema flavor to these databases. The list of attributes for one web click might be very different in kind from the list for the next page. Forcing that kind of variability into a fixed relational schema, while theoretically possible, doesn’t necessarily make a lot of sense.

October 14, 2011

Commercial software for academic use

As Jacek Becla explained:

Even so, I think that academic researchers, in the natural and social sciences alike, commonly overlook the wealth of commercial software that could help them in their efforts.

I further think that the commercial software industry could do a better job of exposing its work to academics, where by “expose” I mean:

Reasons to do so include:

Read more

October 10, 2011

Text data management, Part 3: Analytic and progressively enhanced

This is Part 3 of a three post series. The posts cover:

  1. Confusion about text data management.
  2. Choices for text data management (general and short-request).
  3. Choices for text data management (analytic).

I’ve gone on for two long posts about text data management already, but even so I’ve glossed over a major point:

Using text data commonly involves a long series of data enhancement steps.

Even before you do what we’d normally think of as “analysis”, text markup can include steps such as:

Those processes can add up to dozens of steps. And maybe, six months down the road, you’ll think of more steps yet.

Read more

October 10, 2011

Text data management, Part 1: Confusion

This is Part 1 of a three post series. The posts cover:

  1. Confusion about text data management.
  2. Choices for text data management (general and short-request).
  3. Choices for text data management (analytic).

There’s much confusion about the management of text data, among technology users, vendors, and investors alike. Reasons seems to include:

Above all: The use cases for text data vary greatly, just as the use cases for simply-structured databases do.

There are probably fewer people now than there were six years ago who need to be told that text and relational database management are very different things. Other misconceptions, however, appear to be on the rise. Specific points that are commonly overlooked include: Read more

October 3, 2011

Teradata Unity and the idea of active-active data warehouse replication

Teradata is having its annual conference, Teradata Partners, at the same time as Oracle OpenWorld this week. That made it an easy decision for Teradata to preannounce its big news, Teradata Columnar and the rest of Teradata 14. But of course it held some stuff back, notably Teradata Unity, which is the name chosen for replication technology based on Teradata’s Xkoto acquisition.

The core mission of Teradata Unity is asynchronous, near-real-time replication across Teradata systems. The point of “asynchronous” is performance. The point of “near-real-time” is that it Teradata Unity can be used for high availability and disaster recovery, and further can be used to allow real work on HA and DR database copies. Teradata Unity works request-at-a-time, which limits performance somewhat;* Unity has a lock manager that makes sure updates are applied in the same order on all copies, in cases where locks are needed at all.

Read more

September 26, 2011

Highlights of a busy news week

I put up 14 posts over the past week, so perhaps you haven’t had a chance yet to read them all. 🙂 Highlights included:

Most of the posts, however, were reactions to news events. In particular:

September 25, 2011

Ingres deemphasized, company now named Actian

Ingres, the company, is:

It turns out that Actian was the name of an ancient athletic competition commemorating Augustus’ defeat of Anthony at Actium, a battle that was more recently memorialized in the movie Cleopatra. Frankly, I think Cleopatra Software might have been a more interesting company name, although that could mean execs would have to arrive at sales calls rolled up in a carpet.

Read more

September 25, 2011

Workload management and RAM

Closing out my recent round of Teradata-related posts, here’s a little anomaly:

Read more

← Previous PageNext Page →

Feed: DBMS (database management system), DW (data warehousing), BI (business intelligence), and analytics technology Subscribe to the Monash Research feed via RSS or email:

Login

Search our blogs and white papers

Monash Research blogs

User consulting

Building a short list? Refining your strategic plan? We can help.

Vendor advisory

We tell vendors what's happening -- and, more important, what they should do about it.

Monash Research highlights

Learn about white papers, webcasts, and blog highlights, by RSS or email.