Comments on: History, focus, and technology of HP Neoview

By: Notes on HBase | DBMS 2 : DataBase Management System Services

Notes on HBase | DBMS 2 : DataBase Management System Services — Tue, 10 Mar 2015 18:24:59 +0000

[…] Another such project is Trafodion — supposedly the Welsh word for “transaction” — open sourced by HP. This seems to be based on NonStop SQL and Neoview code, which counter-intuitively have always been joined at the hip. […]

By: Database Virtualization = Location Transparency. Old Wine in a New Bottle? « Share Virtual Machines

Thu, 05 Feb 2009 06:59:08 +0000

[…] 2006. Oracle’s acquisition of TangoSol, Microsoft’s Project Velocity are following HP NeoView’s usage of distributed caches for solving large BI queries. Strictly speaking these are not […]

By: Goetz Graefe

Goetz Graefe — Tue, 27 Jan 2009 23:44:16 +0000

For what it’s worth, the Cascades project was never associated with the University of Wisconsin – Madison. The only possible connection is that I got my degree there. I wrote the query optimizer code 1993-94 while on the faculty of Portland State University (in Oregon) and consulting for Tandem. In addition to the Tandem project (and now HP Neoview), the code also formed the foundation for query optimization in Microsoft SQL Server 7.0 and onwards.

By: Tom Williams

Tom Williams — Mon, 15 Dec 2008 04:42:30 +0000

It is a bit difficult to go through it in detail (and I did like your joke).

In short, most joins involve sorts and merges, which are expensive and can get very expensive when large data sets are involved. There are only two join plans that provide linear scalability, the hash join and the hash merge join. The hash merge join requires a hash-based file system (different from hash distribution). The hash join employs a similar technique but in memory. The problem is that memory runs out quickly and is often used for other operations like buffering. Teradata is the only RDBMS that provides the hash merge join.

So if I have to sort and merge large data sets, I really want the hash merge join available to the optimizer.
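
For readers following the distinction drawn above, here is a minimal Python sketch of an in-memory hash join next to a sort-merge join. It is an illustration only, not any vendor's implementation: the function names, the tuple-based row layout, and the `key_left`/`key_right` parameters are all assumptions made for the example.

```python
from collections import defaultdict


def hash_join(left, right, key_left=0, key_right=0):
    """Equi-join two lists of tuples using an in-memory hash table.

    One pass builds the table on `right`, one pass probes it with `left`,
    so the work grows roughly linearly as long as the table fits in memory.
    """
    table = defaultdict(list)
    for row in right:
        table[row[key_right]].append(row)

    out = []
    for row in left:
        for match in table.get(row[key_left], []):
            out.append(row + match)
    return out


def sort_merge_join(left, right, key_left=0, key_right=0):
    """Equi-join two lists of tuples by sorting both sides, then merging.

    The two sorts dominate the cost, which is where the n log n term comes from.
    """
    left = sorted(left, key=lambda r: r[key_left])
    right = sorted(right, key=lambda r: r[key_right])

    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key_left], right[j][key_right]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of right-side rows sharing this key.
            j_end = j
            while j_end < len(right) and right[j_end][key_right] == lk:
                j_end += 1
            # Pair every left-side row with that key against the whole run.
            while i < len(left) and left[i][key_left] == lk:
                for r in right[j:j_end]:
                    out.append(left[i] + r)
                i += 1
            j = j_end
    return out
```

Both functions return the same set of joined rows; they differ only in how they find matching keys, which is the cost difference being argued about in this thread.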

By: Curt Monash

Curt Monash — Sun, 14 Dec 2008 13:51:05 +0000

Tom,

I have a design that will ensure SUB-linear scalability, up to over a petabyte. On one terabyte of data, I’ll throttle performance by a factor of 10. On four terabytes, I’ll throttle it only by a factor of 8 … OK, I’m kidding. But to compare constant_1 times n vs. constant_2 times nlogn, it’s interesting to know what constant_1 and constant_2 are.
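
To make the constant_1 versus constant_2 point concrete, here is a small Python sketch under stated assumptions: the two cost functions, the base-2 logarithm, and the sample constants are all hypothetical, chosen only to show how the crossover point depends on the ratio of the constants.

```python
import math


def linear_cost(n, constant_1):
    """Cost model for a linear plan: constant_1 * n."""
    return constant_1 * n


def nlogn_cost(n, constant_2):
    """Cost model for a sort-based plan: constant_2 * n * log2(n)."""
    return constant_2 * n * math.log2(n)


def crossover_rows(constant_1, constant_2):
    """Row count n at which constant_1 * n equals constant_2 * n * log2(n).

    Solving constant_1 = constant_2 * log2(n) gives n = 2 ** (constant_1 / constant_2).
    """
    return 2 ** (constant_1 / constant_2)
```

With a hypothetical constant_1 of 30 and constant_2 of 1, `crossover_rows(30, 1)` is 2 ** 30 rows: below that size the nlogn plan is cheaper in this model, and only above it does the linear plan win.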

More generally, I’m confused by what you’re saying. You seem to be assigning a single scalability function to all join plans on a particular product, no matter what strategy the particular query’s execution plan uses. Taken literally, that’s totally absurd, and I’m not guessing successfully at your actual and surely more sensible meaning.

By: Tom Williams

Tom Williams — Sat, 13 Dec 2008 23:00:21 +0000

Which implementation besides Teradata provides linear scalability regardless of the size of the tables and the concurrent user level? From what I understand, the Oracle, IBM, and Neoview hash join plans are linear but depend on the availability of sufficient memory. After that, their join plans are nlogn.

Linear scalability is very rare in computing and I’d be interested in knowing if anyone besides Teradata provides it in their RDBMS.

By: Curt Monash

Curt Monash — Fri, 12 Dec 2008 05:37:47 +0000

Tom,

Most of the row-based competitors can, as one implementation option, do a hash partition, forgo indexes, and expect the queries to be satisfied by table scans.
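
As a rough Python sketch of that option, assuming rows are plain tuples and nodes are just in-memory lists: `partition_rows` and `scan_all_nodes` are hypothetical names, and CRC32 is an arbitrary stand-in for whatever hash function a real product uses.

```python
import zlib


def partition_rows(rows, key_index, num_nodes):
    """Assign each row to a node by hashing its distribution key."""
    nodes = [[] for _ in range(num_nodes)]
    for row in rows:
        # CRC32 is deterministic across runs, unlike Python's built-in hash() on strings.
        h = zlib.crc32(str(row[key_index]).encode("utf-8"))
        nodes[h % num_nodes].append(row)
    return nodes


def scan_all_nodes(nodes, predicate):
    """Answer a query with a full scan of every partition and no index lookups.

    In a real system each node would scan its own partition in parallel;
    here the partitions are scanned one after another for simplicity.
    """
    return [row for partition in nodes for row in partition if predicate(row)]
```

Every row lands on exactly one node, and a query is answered by filtering each partition independently and combining the results.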

So I’m not clear as to exactly what architectural point you are making that puts Teradata ahead of the newer guys, or for that matter that makes it impossible to use Oracle in the way that you described.

If all you’re saying is that b-trees aren’t the way to do decision support, and that the architectures of specialty products reflect this fact better than Oracle’s does, I agree completely. But it looked as if you were going to an extreme that I don’t see the foundation for.

CAM

By: Tom Williams

Tom Williams — Thu, 11 Dec 2008 13:43:40 +0000

You also have to consider the join algorithms when evaluating a decision support RDBMS. Teradata is the only vendor that can guarantee linear scalability, and that is because of its hash-based file system, which was built to solve decision support problems. Oracle, IBM and Neoview are all deployed on a b-tree file system, which was designed for OLTP. This forces them to use nlogn join algorithms when the queries involve very large tables or the concurrency level is high.

By: Curt Monash

Curt Monash — Sat, 04 Oct 2008 02:12:15 +0000

Joe,

Erin McCabe recently joined HP’s BI unit. Expect better PR from them in the future! 🙂

Best,

CAM

By: Curt Monash

Curt Monash — Thu, 02 Oct 2008 18:35:21 +0000

Thanks, Glenn — good points all!

CAM