April 14, 2011

Attensity update

I talked with Michelle de Haaff and Ian Hersey of Attensity back in February. We covered a lot of ground, so let’s start with a very high-level view.

The four most interesting technical points were probably:

Some more specific notes include: 

Attensity and relational DBMS

Notes on Attensity’s choice of DBMS to OEM include:

It seems there are two parts to the Attensity schema. The raw output of “exhaustive extraction” sounds as if it has rather narrow rows. But Attensity then builds something more star-schema-like to feed into BI tools. Perhaps the latter is the reason for preferring columnar DBMS. There don’t seem to be a lot of auxiliary tables; the only ones Ian cited were:

Previous Attensity database targets (partner, not OEM) included Teradata, SQL Server, Oracle, and MySQL. Hibernate layers were in the mix somewhere too. SQL Server actually had the best performance. I don’t think that’s counting a more recent Sybase IQ partnership, which only racked up a couple of sales.

Attensity, Hadoop, and other non-relational technologies

But that’s OEM. Attensity runs its own data centers, with approximately 60 Hadoop/HBase nodes and 30 nodes of Apache Solr (open source text search).* One reason for moving out of Amazon EC2 was that Solr cried out for solid-state drives; another was just cost.

*But those are just rough figures, from Ian’s memory.

Attensity uses HBase to store full-text documents. However, it doesn’t seem that this is a classic low-latency update HBase use case; Attensity reports doing 3 loads a day, 50 gigabytes of documents total. Apparently that works out to 1 billion documents/month; I gather Attensity just keeps them for 6 months. HBase has been nicely stable for Attensity.
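Those load figures invite a quick back-of-envelope check. The sketch below is only that: it assumes the 50 gigabytes is the daily total across all three loads, that a gigabyte is 10^9 bytes, and that a month has 30 days. None of those assumptions was confirmed by Attensity, and the variable names are purely illustrative.

```python
# Back-of-envelope sizing from the figures quoted in the post.
# Assumptions (NOT confirmed by Attensity): 50 GB is the daily total across
# all 3 loads, 1 GB = 10**9 bytes, and a month is 30 days.

GB = 10**9

daily_bytes = 50 * GB            # documents loaded per day (assumed daily total)
loads_per_day = 3
docs_per_month = 1_000_000_000   # "1 billion documents/month"
days_per_month = 30              # assumed
retention_months = 6             # "keeps them for 6 months"

docs_per_day = docs_per_month / days_per_month
avg_doc_bytes = daily_bytes / docs_per_day
bytes_per_load = daily_bytes / loads_per_day
retained_docs = docs_per_month * retention_months
retained_bytes = daily_bytes * days_per_month * retention_months

print(f"documents per day:     {docs_per_day:,.0f}")
print(f"average document size: {avg_doc_bytes:,.0f} bytes")
print(f"size of each load:     {bytes_per_load / GB:.1f} GB")
print(f"documents retained:    {retained_docs:,}")
print(f"raw text retained:     {retained_bytes / 10**12:.1f} TB")
```

Under those assumptions the average document is roughly 1.5 KB, and six months of retention comes to about 6 billion documents and 9 TB of raw text, before any replication, compression, or index overhead.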

Attensity uses Solr to build distributed search indexes. Solr has not been nicely stable.

What Attensity does in Hadoop seems to be rather simple NLP (Natural Language Processing), plus things one might do in a relational DBMS instead. Examples include:

There surely also is some basic preprocessing, ingesting text (and document metadata) in various forms and normalizing it into a more standard format. Some real-time ingesting is done outside of Hadoop, in more of a queuing system, the most obvious example of that being the Twitter firehose. Ian suggested that in the future this system will get more uses, in the form of a UIMA-like pipeline.

I further get the impression that Attensity uses Hadoop to do on a SaaS (Software As A Service) basis what its customers do in Vertica. The old idea that Attensity provides hosted services for about half its customers still seems to apply, at least on the new-customer front. However, I’m not sure exactly which product lines Attensity was referring to when they said that.

Comments

7 Responses to “Attensity update”

  1. Greg Holmberg on April 22nd, 2011 7:27 pm

    Is it just me, or does this posting end mid-sentence?

    However, I’m not sure exactly which product
    lines Attensity was referring to when

    That’s the last I see. Same in the email I received.

  2. Curt Monash on April 22nd, 2011 8:49 pm

    Youch. You do make an excellent point. Let me see what I can do about that …

  3. Curt Monash on April 22nd, 2011 8:56 pm

    OK. Added in the three missing words at the end. Thanks!

  4. Blaine Gaither on May 4th, 2011 6:58 pm

    Please consider workload management as a category

  5. Curt Monash on May 4th, 2011 8:12 pm
  6. Investigative analytics and derived data: Enzee Universe 2011 talk : DBMS 2 : DataBase Management System Services on April 24th, 2012 12:12 am

    […] involves lots of analytic tasks performed on lots of kinds of data. Specific examples cited include text analytics and graph/relationship […]

  7. Where things stand in US government surveillance | DBMS 2 : DataBase Management System Services on June 10th, 2013 12:12 pm

    […] for anybody to purchase, or can be scraped by anybody who invests in the equipment and bandwidth. Attensity’s service is just one […]
