Comments on: Introduction to Aster Data and nCluster

By: Confluence: Client: Telefonica I+D

Confluence: Client: Telefonica I+D — Fri, 05 Feb 2010 15:23:00 +0000

personalization server old code and architecture review…

(from SVN) (see part list at…

By: Web analytics — clickstream and network event data | DBMS2 -- DataBase Management System Services

Mon, 22 Sep 2008 10:11:09 +0000

[…] Data’s largest disclosed database, by almost two orders of magnitude, is at […]

By: Roger

Roger — Wed, 03 Sep 2008 21:53:26 +0000

What is their pricing model? Per terabyte? And how much does it cost?

By: Steve Wooledge

Steve Wooledge — Wed, 03 Sep 2008 18:21:42 +0000

Hi Curt,

Thanks for the post. Just a couple points for clarification:

[1] At MySpace, every piece of data has 2 copies on distinct nodes. More specifically, MySpace, like our other customers, uses RAID 0 on the Aster Worker nodes and RAID 10 on the Aster Queen nodes. [More on our 3-tiered architecture here: (http://www.asterdata.com/product/architecture.html)] Our recommendation is to always use RAID 0 on the workers, because it gives you better cluster performance when a disk fails. With RAID 10, if a disk fails, the node stays available, but the performance of that node drops by 50% (and thus so does the performance of the cluster). With RAID 0, if a disk fails, the entire node goes down, but because Aster nCluster has full replication and transparent failover, nCluster’s performance only drops by 1/n (where n is the number of nodes).

[2] re: “parallel query” – The local GROUP BY is just one example; our query optimization algorithms cover the relational algebra in general, not just that one case.
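
To make the arithmetic in point [1] concrete, here is a minimal sketch in Python. It is not Aster code. It assumes work is split evenly across nodes, that a parallel query finishes only when its slowest node finishes, and that a failed node's work is spread evenly over the survivors; the function names are hypothetical.

```python
def raid10_degraded_throughput(n_nodes: int, degraded_speed: float = 0.5) -> float:
    """Relative cluster throughput when one node stays up but runs slower.

    Assumption: every node holds 1/n of the work and the query is gated
    by the slowest node, so one half-speed node slows the whole cluster.
    """
    per_node_work = 1.0 / n_nodes
    baseline_time = per_node_work / 1.0             # all nodes at full speed
    slowest_time = per_node_work / degraded_speed   # the degraded node
    return baseline_time / slowest_time


def raid0_failover_throughput(n_nodes: int) -> float:
    """Relative cluster throughput when one node is lost entirely.

    Assumption: replicas let the n-1 surviving nodes pick up the failed
    node's share evenly, and each survivor still runs at full speed.
    """
    if n_nodes < 2:
        raise ValueError("failover needs at least two nodes")
    baseline_time = 1.0 / n_nodes
    failover_time = 1.0 / (n_nodes - 1)             # more work per survivor
    return baseline_time / failover_time


if __name__ == "__main__":
    for n in (4, 10, 100):
        print(n, raid10_degraded_throughput(n), raid0_failover_throughput(n))
```

Under these assumptions the RAID 10 case stays at half throughput regardless of cluster size, while the RAID 0 plus failover case loses only a 1/n share, which shrinks as nodes are added.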