September 8, 2011

Aster Data business trends

Last month, I reviewed with the Aster Data folks which markets they were targeting and selling into, subsequent to acquisition by their new orange overlords. The answers aren’t what they used to be. Aster no longer focuses much on what it used to call frontline (i.e., low-latency, operational) applications; those are of course a key strength for Teradata. Rather, Aster focuses on investigative analytics — they’ve long endorsed my use of the term — and on the batch run/scoring kinds of applications that inform operational systems.

Categories: Analytic technologies, Application areas, Aster Data, Data warehousing, DataStax, RDF and graphs, Surveillance and privacy, Teradata, Web analytics

1 Comment

September 7, 2011

Vertica projections — an overview

Partially at my suggestion, Vertica has blogged a three–part series explaining the “projections” that are central to a Vertica database. This is important, because in Vertica projections play the roles that in many analytic DBMS might be filled by base tables, indexes, AND materialized views. Highlights include:

A Vertica projection can contain:
- All the columns in a table.
- Some of the columns in a table.
- A prejoin among tables.
Vertica projections are updated and maintained just as base tables are. (I.e., there’s no kind of batch lag.)
You can import the same logical schema you use elsewhere. Vertica puts no constraints on your logical schema. Note: Vertica has been claiming good support for all logical schemas since Vertica 4.0 came out in early 2010.
Vertica (the product) will automatically generate a physical schema for you — i.e. a set of projections — that Vertica (the company) thinks will do a great job for you. Note: That also dates back to Vertica 4.0.
Vertica claims that queries are very fast even when you haven’t created projections explicitly for them. Note: While the extent to which this is true may be a matter of dispute, competitors clearly overreach when they make assertions like “every major Vertica query needs a projection prebuilt for it.”
On the other hand, it is advisable to build projections (automatically or manually) that optimize performance of certain parts of your query load.

The blog posts contain a lot more than that, of course, both rah-rah and technical detail, including reminders of other Vertica advantages (compression, no logging, etc.). If you’re interested in analytic DBMS, they’re worth a look.

Categories: Columnar database management, Data warehousing, Vertica Systems

5 Comments

September 6, 2011

Derived data, progressive enhancement, and schema evolution

The emphasis I’m putting on derived data is leading to a variety of questions, especially about how to tease apart several related concepts:

Derived data.
Many-step processes to produce derived data.
Schema evolution.
Temporary data constructs.

So let’s dive in. Read more

Categories: Data models and architecture, Data warehousing, Derived data, MarkLogic, Text

Data management at Zynga and LinkedIn

Mike Driscoll and his Metamarkets colleagues organized a bit of a bash Thursday night. Among the many folks I chatted with were Ken Rudin of Zynga, Sam Shah of LinkedIn, and D. J. Patil, late of LinkedIn. I now know more about analytic data management at Zynga and LinkedIn, plus some bonus stuff on LinkedIn’s People You May Know application. 🙂

It’s blindingly obvious that Zynga is one of Vertica’s petabyte-scale customers, given that Zynga sends 5 TB/day of data into Vertica, and keeps that data for about a year. (Zynga may retain even more data going forward; in particular, Zynga regrets ever having thrown out the first month of data for any game it’s tried to launch.) This is game actions, for the most part, rather than log files; true logs generally go into Splunk.

I don’t know whether the missing data is completely thrown away, or just stashed on inaccessible tapes somewhere.

I found two aspects of the Zynga story particularly interesting. First, those 5 TB/day are going straight into Vertica (from, I presume, memcached/Membase/Couchbase), as Zynga decided that sending the data to some kind of log first was more trouble than it’s worth. Second, there’s Zynga’s approach to analytic database design. Highlights of that include: Read more