Attensity update
I talked with Michelle de Haaff and Ian Hersey of Attensity back in February. We covered a lot of ground, so let’s start with a very high-level view.
- Two years ago, Attensity merged with two other companies in somewhat related businesses, thus expanding 4X or so in size.
- Due to the merger, Attensity now has two core lines of business:
- Text analytics.
- Driving actions, such as call center or social media response, based on text analytics.
- The combined Attensity is part American, part German.
- Attensity’s German part compels it to do some public financial reporting. Attensity will do $50-60 million in 2011 revenue.
- Attensity crunches text in 17 languages. English is preeminent. #2 is — you guessed it! — German.
- A big part of Attensity’s business (or at least of its value proposition) is analyzing the text in social media. Attensity boasts coverage of 75 million social media sources, such as blogs, forums, or review sites.
The four most interesting technical points were probably:
- Attensity has changed how it does exhaustive extraction. I’m having some trouble writing that part up, so for now I’ll just refer you to Attensity’s own description of the new way of doing things.
- Attensity has development work underway meant to address some of the problems in text analytics/other analytics integration. I don’t feel I got enough detail to want to talk about that yet.
- Attensity runs its own data centers, with approximately 60 Hadoop/HBase nodes and 30 nodes of Apache Solr (open source text search). More on that below.
- Attensity now OEMs Vertica. More on that below too.
Some more specific notes include: Read more
| Categories: Analytic technologies, Cloud computing, Hadoop, HBase, Predictive modeling and advanced analytics, Software as a Service (SaaS), Sybase, Vertica Systems | 6 Comments |
DataStax introduces a Cassandra-based Hadoop distribution called Brisk
Cassandra company DataStax is introducing a Hadoop distribution called Brisk, for use cases that combine short-request and analytic processing. Brisk in essence replaces HDFS (Hadoop Distributed File System) with a Cassandra-based file system called CassandraFS. The whole thing is due to be released (Apache open source) within the next 45 days.
The core claims for Cassandra/Brisk/CassandraFS are:
- CassandraFS has the same interface as HDFS. So, in particular, you should be able to use most Hadoop add-ons with Brisk.
- CassandraFS has comparable performance to HDFS on sequential scans. That’s without predicate pushdown to Cassandra, which is Coming Soon but won’t be in the first Brisk release.
- Brisk/CassandraFS is much easier to administer than HDFS. In particular, there are no NameNodes, JobTracker single points of failure, or any other form of head node. Brisk/CassandraFS is strictly peer-to-peer.
- Cassandra is far superior to HBase for short-request use cases, specifically with 5-6X the random-access performance.
There’s a pretty good white paper around all this, which also recites general Cassandra claims – [edit] and here at last is the link.
| Categories: Cassandra, DataStax, Hadoop, HBase, MapReduce, Open source | 3 Comments |
eBay followup — Greenplum out, Teradata > 10 petabytes, Hadoop has some value, and more
I chatted with Oliver Ratzesberger of eBay around a Stanford picnic table yesterday (the XLDB 4 conference is being held at Jacek Becla’s home base of SLAC, which used to stand for “Stanford Linear Accelerator Center”). Todd Walter of Teradata also sat in on the latter part of the conversation. Things I learned included: Read more
| Categories: Data warehousing, Derived data, eBay, Greenplum, Hadoop, HBase, Log analysis, Petabyte-scale data management, Teradata | 28 Comments |
More on NoSQL and HVSP (or OLRP)
Since posting last Wednesday morning that I’m looking into NoSQL and HVSP, I’ve had a lot of conversations, including with (among others):
- Dwight Merriman of 10gen (MongoDB)
- Damien Katz of Couchio (CouchDB)
- Matt Pfeil of Riptano (Cassandra)
- Todd Lipcon of Cloudera (HBase committer)
- Tony Falco of Basho (Riak)
- John Busch of Schooner
- Ori Herrnstadt of Akiban
| Categories: Akiban, Basho and Riak, Cache, Cassandra, Cloudera, Clustrix, CouchDB, DataStax, Facebook, Hadoop, HBase, memcached, MySQL, NoSQL, Object, OLTP, Open source, Parallelization, Schooner Information Technology, Theory and architecture, Tokutek | 3 Comments |
