Comments on: Notes on SciDB and scientific data management

By: Curt Monash

Curt Monash — Tue, 04 Oct 2016 02:12:47 +0000

Hi Juan,

As per http://www.dbms2.com/2016/08/28/are-analytic-rdbms-and-data-warehouse-appliances-obsolete/, I’m not sure that I have a good answer for you.

By: Juan

Juan — Wed, 28 Sep 2016 12:01:53 +0000

I need a database with advanced statistical functions, or a statistics program that works transparently on a very large database (ideally a distributed one).
What software do you suggest?
Spark (with a suitable underlying file system) could be the solution in the future, but for now it only lets you do basic things: you can’t fit mixed-effects models or Bayesian models. SciDB has the same problem: you can only use the functions implemented in it, and there are few of them. You can also write your own algorithms, but that is going to be quite difficult.

R and similar programs let you import data from a database, but you can’t run large operations on it properly; you can only pull summaries or process the data in chunks.
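
To make the "by chunks" workaround concrete, here is a minimal Python sketch under stated assumptions. An in-memory SQLite database stands in for the large database, and the table `measurements` and the function `chunked_mean_variance` are hypothetical names chosen for illustration. It streams one column in fixed-size chunks and merges per-chunk summaries into an overall mean and sample variance, so the full column never has to sit in memory.

```python
import sqlite3


def chunked_mean_variance(conn, query, chunk_size=1000):
    """Stream one numeric column in chunks; return (count, mean, sample variance)."""
    cur = conn.execute(query)
    n, mean, m2 = 0, 0.0, 0.0  # running count, mean, sum of squared deviations
    while True:
        rows = cur.fetchmany(chunk_size)  # at most chunk_size rows in memory
        if not rows:
            break
        values = [row[0] for row in rows]
        k = len(values)
        chunk_mean = sum(values) / k
        chunk_m2 = sum((v - chunk_mean) ** 2 for v in values)
        # Merge the chunk summary into the running summary
        # (pairwise update for combining two partial variances).
        delta = chunk_mean - mean
        total = n + k
        mean += delta * k / total
        m2 += chunk_m2 + delta * delta * n * k / total
        n = total
    variance = m2 / (n - 1) if n > 1 else float("nan")
    return n, mean, variance


# Hypothetical stand-in for a large remote table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measurements (value REAL)")
conn.executemany(
    "INSERT INTO measurements VALUES (?)",
    [(float(i),) for i in range(1, 10001)],
)

n, mean, var = chunked_mean_variance(
    conn, "SELECT value FROM measurements", chunk_size=512
)
print(n, mean, var)
```

This only works because a mean and a variance can be rebuilt from small per-chunk summaries. Models that need repeated passes over all the data at once, such as the mixed-effects or Bayesian fits mentioned above, do not obviously decompose this way, which is the limitation being described.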

By: Michael McIntire

Michael McIntire — Sat, 31 Jul 2010 17:27:14 +0000

What is driving the move to Hadoop and other non-relational platforms is the cost and culture of RDBMS implementations.

The culture problem is related to data management systems forcing data to be transformed into a private, internal form, and to all the process that sits in front of that. Dimensional modeling is an example. Let’s stop physicalizing dimensional designs just because that’s what RDBMS products support.

On the cost front, the cost of generating data declines at roughly the inverse of Moore’s law, not counting non-native per-transaction data growth (I’m collecting more and more data about every event).

On the analytics side of this problem, many more scans of the full dataset are needed to get a single metric, so the cost of this function grows non-linearly with data size.

So – data costs are declining at the same rate as hardware costs. Data analytics costs are RISING per unit of data. Put quite simply, at the upper end of the data size spectrum, data owners cannot afford to buy data management software.

By: Curt Monash

Curt Monash — Thu, 03 Jun 2010 10:11:11 +0000

Michael,

SciDB is for analytics; Cassandra is for OLTP, hold the “T”, which I called HVSP in http://www.dbms2.com/2010/03/13/the-naming-of-the-foo/.

Hadoop is a closer competitor, as are RDBMS, MapReduce-enabled or otherwise.

By: Michael

Michael — Wed, 26 May 2010 20:17:09 +0000

Why has interest from “web analytics users” receded recently? Could this be due to the increased interest in Hadoop/Cassandra and similar products?