October 2, 2008

History, focus, and technology of HP Neoview

On the basis of market impact to date, HP Neoview is just another data warehouse market participant – a dozen sales or so, a few systems in production, some evidence that it can handle 100 TB+ workloads, and so on. But HP’s BI Group CTO Greg Battas thinks Neoview is destined for greater things, because:

  1. HP is Really Serious about making Neoview into a great, high-end system, no matter how long it takes. Reasons are much as you’d expect, including that data warehousing is a large fraction of all computing expenditure, and HP CEO Mark Hurd came over from Teradata.
  2. There’s been a lot of investment and a long technical pedigree so far.
  3. Greg thinks Neoview’s technology is really cool.

Greg says that he actually started the Neoview project as – if I may be so bold as to paraphrase – a skunkworks, big-honking-data-mart, data warehouse appliance effort. However, his bosses redirected him toward a super-high-end emphasis, and the Neoview technical mandate is now to focus on “Three Cs” – Concurrency, Complexity, and Capacity – with Capacity being the easiest of the three. Performance, by way of contrast, is a relatively low priority, occupying perhaps 10-15% of the Neoview R&D budget. Workload management (25% of all R&D) is a bigger deal, as is the ability to execute a broad variety of queries, including complex ones.

But there also was some “technology in the bank” to draw on. Everybody knows that Neoview in some way grew out of Tandem’s NonStop SQL, which was one of the great relational DBMS of the 1980s. What is less obvious is exactly how that great OLTP DBMS – which was most famously used for running automatic teller machine (ATM) networks – turned into a data warehousing product. The story turns out to be that in the 1990s, Microsoft threw $40-50 million at the NonStop SQL group, to port Tandem’s system software onto Windows. Greg confusingly reports both that this effort occurred in the “late” 1990s and that it was shelved after the Tandem/Compaq merger, which occurred in 1997. Anyhow …

… at some time in the semi-distant past, Tandem wrote 1 million lines of new code. This included two major new components or redesigns to the DBMS that are highly relevant to data warehousing. (Per Goetz Graefe’s comment below, the Cascades-based query optimizer was part of this Tandem lineage.)

And then the code just sat on the shelf, until HP bought Compaq, Mark Hurd came over to run HP, and HP decided to get into the data warehouse appliance business.

So what does this all amount to for the HP Neoview technology architecture? Well, highlights include:

* Expressions – I assume this means projects and selects – are done via a kind of byte code, on the CPU. Greg suggested Teradata uses a similar approach.

* Neoview’s Cascades-based optimizer seems to be smart enough to, for example, do aggregations before joins when it makes sense. (Much the same is true of Aster Data’s optimizer.)

* By the way, the basic idea behind Cascades – or at least Neoview’s version of it – is that it uses more heuristics than conventional cost-based optimizers do. That is, Neoview starts out with a candidate plan – perhaps derived in the usual way – and then considers variants on it.

* A cynic might wonder exactly what vast real-life Neoview production experience Greg was referring to. But as I’ve said before – if you have a serious problem with disk failures affecting performance, you might want to reconsider either the quality of disks you’re using or your system management practices …

Comments

12 Responses to “History, focus, and technology of HP Neoview”

  1. Joe Harris on October 2nd, 2008 9:21 am

    Great post (as usual). This is by far the most insightful piece on NeoView to date. Which, frankly, does not speak well of HP’s PR.

    Here’s my question though: What’s so “high end” about NeoView? If it’s as clever and fast and whizz-bang as they say, then why not offer it in bite-size pieces as an appliance?

    I don’t see this going anywhere unless they make it an appliance and put it out in the public eye for scrutiny.

    Pretend you’re a big telco for a minute… Teradata offers a reputation for stability with speed, Netezza offers simplicity and speed, and you’ve already got a ton of Oracle so you’ll give them a look.

    Where is the NeoView hook? Being as good as Teradata isn’t enough. Even being twice as fast isn’t enough because Netezza is 5x when it counts.

    Maybe they should give it away with the Itanium hardware it needs and ask Intel to foot the bill as a marketing effort.

    Just a thought.

  2. Glenn Paulley on October 2nd, 2008 1:34 pm

    Some comments on a few of these technology points:

    “Expressions – I assume this means projects and selects – are done via a kind of byte code, on the CPU. Greg suggested Teradata uses a similar approach.” Actually this pertains to the computation of any expression value in the engine, including aggregate functions, arithmetic functions, string functions, and so on. The idea behind using a byte-code machine is that the machine can, in principle, be “compiled” (optimized) at query build time to eliminate code that is unnecessary for this computation in this particular context. Other systems, including Sybase SQL Anywhere, use this approach.
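    (To make the byte-code idea concrete, here is a minimal Python sketch, with invented opcodes and names rather than Neoview’s or SQL Anywhere’s actual machinery. The “compile” step specializes the program for its context, here by dropping a NULL check that the catalog makes unnecessary.)

        # Hypothetical byte-code expression machine (illustrative only).
        def compile_expr(ops, column_nullable):
            """Specialize at query build time: drop NULL checks that this
            context makes unnecessary (e.g., the column is NOT NULL)."""
            return [op for op in ops
                    if not (op[0] == "CHECK_NULL" and not column_nullable)]

        def evaluate(ops, row):
            """Tiny stack-based interpreter for the byte code."""
            stack = []
            for op in ops:
                if op[0] == "LOAD_COL":
                    stack.append(row[op[1]])
                elif op[0] == "CHECK_NULL":
                    if stack[-1] is None:
                        return None          # SQL semantics: NULL propagates
                elif op[0] == "PUSH_CONST":
                    stack.append(op[1])
                elif op[0] == "MUL":
                    b, a = stack.pop(), stack.pop()
                    stack.append(a * b)
            return stack.pop()

        # The expression price * 2, as byte code.
        program = [("LOAD_COL", "price"), ("CHECK_NULL",),
                   ("PUSH_CONST", 2), ("MUL",)]

        # If the catalog says price is NOT NULL, the compiled program is shorter.
        fast = compile_expr(program, column_nullable=False)
        print(evaluate(fast, {"price": 100.0}))   # 200.0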

    “Neoview’s Cascades-based optimizer seems to be smart enough to, for example, do aggregations before joins when it makes sense. (Much the same is true of Aster Data’s optimizer.)” – Pushing/pulling aggregation above/below a join was studied by a fellow graduate student, Paul Yan, at the University of Waterloo in the mid-1990s as part of his PhD thesis (under the direction of Paul Larson, now at Microsoft Research). As far as I know, DB2 was the first product to incorporate these optimizations.
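    (For readers who have not seen this transformation: pre-aggregating below the join can shrink the join input while preserving the answer, because SUM decomposes over the grouping key. A toy Python illustration with invented data; this is neither DB2’s nor Neoview’s actual plan.)

        from collections import defaultdict

        # Toy data: many sales rows per customer, one region per customer.
        sales = [("c1", 10), ("c1", 20), ("c2", 5), ("c2", 5), ("c2", 30)]
        customers = {"c1": "east", "c2": "west"}   # cust_id -> region

        # Plan A: join first, then aggregate (every sales row reaches the join).
        totals_a = defaultdict(int)
        for cust, amount in sales:
            totals_a[customers[cust]] += amount

        # Plan B: aggregate below the join (one row per customer reaches it).
        per_cust = defaultdict(int)
        for cust, amount in sales:
            per_cust[cust] += amount
        totals_b = defaultdict(int)
        for cust, subtotal in per_cust.items():
            totals_b[customers[cust]] += subtotal

        assert totals_a == totals_b   # same answer; Plan B joins far fewer rows
        print(dict(totals_b))         # {'east': 30, 'west': 40}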

    “By the way, the basic idea behind Cascades – or at least Neoview’s version of it – is that it uses more heuristics than conventional cost-based optimizers do. That is, Neoview starts out with a candidate plan – perhaps derived in the usual way – and then considers variants on it.” – It is difficult to know how much Neoview’s implementation differs from other transformation-based optimizer implementations (such as Microsoft SQL Server’s) based on the Cascades framework (originally developed by Goetz Graefe, now at HP Labs). Every optimizer uses heuristics to reduce the size of the search space; whether one uses “more” heuristics than another is difficult to assess, because those assumptions are rarely documented, if even made public.
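    (A rough picture of what “start with a candidate plan and consider variants” can mean in a transformation-based optimizer. This is a generic toy in Python, with invented rules and cost numbers; it is not Neoview’s, SQL Server’s, or the Cascades framework’s actual code.)

        # Toy transformation-based plan search. Plans are nested tuples.
        def commute_join(plan):
            # Rule: JOIN(A, B) -> JOIN(B, A)
            if plan[0] == "JOIN":
                yield ("JOIN", plan[2], plan[1])

        def push_filter(plan):
            # Rule: FILTER(JOIN(A, B)) -> JOIN(FILTER(A), B)
            # (pretend the filter predicate refers only to A)
            if plan[0] == "FILTER" and plan[1][0] == "JOIN":
                _, (_, a, b) = plan
                yield ("JOIN", ("FILTER", a), b)

        RULES = [commute_join, push_filter]

        def stats(plan):
            """Invented statistics: (cost, output_rows) for a plan tree."""
            if plan[0] == "SCAN":
                return (1000, 1000)
            if plan[0] == "FILTER":
                c, r = stats(plan[1])
                return (c + r, r // 10)            # keep ~10% of child rows
            if plan[0] == "JOIN":
                (cl, rl), (cr, rr) = stats(plan[1]), stats(plan[2])
                return (cl + cr + rl * rr, (rl * rr) // 1000)

        def optimize(start):
            """Explore variants of the starting plan, memoizing seen plans."""
            seen, frontier, best = {start}, [start], start
            while frontier:
                plan = frontier.pop()
                if stats(plan)[0] < stats(best)[0]:
                    best = plan
                for rule in RULES:
                    for variant in rule(plan):
                        if variant not in seen:
                            seen.add(variant)
                            frontier.append(variant)
                # A real optimizer also applies rules to subtrees; omitted.
            return best

        start = ("FILTER", ("JOIN", ("SCAN", "A"), ("SCAN", "B")))
        print(optimize(start))
        # ('JOIN', ('FILTER', ('SCAN', 'A')), ('SCAN', 'B'))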

  3. Curt Monash on October 2nd, 2008 2:35 pm

    Thanks, Glenn — good points all!

    CAM

  4. Curt Monash on October 3rd, 2008 10:12 pm

    Joe,

    Erin McCabe recently joined HP’s BI unit. Expect better PR from them in the future! 🙂

    Best,

    CAM

  5. Tom Williams on December 11th, 2008 9:43 am

    You also have to consider the join algorithms when evaluating a decision support RDBMS. Teradata is the only vendor who can guarantee linear scalability, and it is because of the hash-based file system, which was built to solve decision support problems. Oracle, IBM, and Neoview are all deployed on a b-tree file system, which was designed for OLTP. This forces them to use n log n join algorithms when the queries involve very large tables or the concurrency level is high.

  6. Curt Monash on December 12th, 2008 1:37 am

    Tom,

    Most of the row-based competitors can, as one implementation option, do a hash partition, forgo indexes, and expect the queries to be satisfied by table scans.

    So I’m not clear as to exactly what architectural point you are making that puts Teradata ahead of the newer guys, or for that matter that makes it impossible to use Oracle in the way that you described.

    If all you’re saying is that b-trees aren’t the way to do decision support, and that the architectures of specialty products reflect this fact better than Oracle’s does, I agree completely. But it looked as if you were going to an extreme that I don’t see the foundation for.

    CAM

  7. Tom Williams on December 13th, 2008 7:00 pm

    Which implementation besides Teradata provides linear scalability regardless of table size and concurrent user level? From what I understand, the Oracle, IBM, and Neoview hash join plans are linear but depend on the availability of sufficient memory. After that, their join plans are n log n.

    Linear scalability is very rare in computing and I’d be interested in knowing if anyone besides Teradata provides it in their RDBMS.

  8. Curt Monash on December 14th, 2008 9:51 am

    Tom,

    I have a design that will ensure SUB-linear scalability, up to over a petabyte. On one terabyte of data, I’ll throttle performance by a factor of 10. On four terabytes, I’ll throttle it only by a factor of 8 … OK, I’m kidding. But to compare constant_1 times n vs. constant_2 times n log n, it’s interesting to know what constant_1 and constant_2 are.

    More generally, I’m confused by what you’re saying. You seem to be assigning a single scalability function to all join plans on a particular product, no matter what strategy the particular query’s execution plan uses. Taken literally, that’s totally absurd, and I’m not guessing successfully at your actual and surely more sensible meaning.
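    (To put numbers behind Curt’s constant-factor point, here is a quick back-of-envelope in Python, with the constants invented for illustration: the “worse” complexity class can be the cheaper algorithm at very real-world sizes.)

        import math

        # Invented constants: 50 cost units per row for the linear plan,
        # 2 units per row for the n log n plan.
        c1, c2 = 50, 2

        for n in (10**6, 10**9, 10**12):
            linear = c1 * n
            n_log_n = c2 * n * math.log2(n)
            print(f"n={n:.0e}  c1*n={linear:.2e}  c2*nlogn={n_log_n:.2e}")

        # c2*n*log2(n) < c1*n  whenever  log2(n) < c1/c2 = 25,
        # i.e., the n log n plan wins until n is about 2**25 (~34M rows).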

  9. Tom Williams on December 15th, 2008 12:42 am

    It is a bit difficult to go through it in detail (and I did like your joke).

    In short, most joins involve sorts and merges, which are expensive and can get very expensive when large data sets are involved. There are only two join plans that provide linear scalability: the hash join and the hash merge join. The hash merge join requires a hash-based file system (different from hash distribution). The hash join employs a similar technique but in memory. The problem is that memory runs out quickly and is often used for other operations like buffering. Teradata is the only RDBMS that provides the hash merge join.

    So if I have to sort and merge large data sets, I really want the hash merge join available to the optimizer.
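    (For reference, the classic in-memory hash join Tom describes looks roughly like this textbook Python sketch, with toy data and invented names. It is linear because each input is scanned once, with the caveat that the build side’s hash table must fit in memory.)

        from collections import defaultdict

        def hash_join(build_rows, probe_rows, key):
            """Textbook hash join: O(|build| + |probe|) expected time,
            as long as the build side's hash table fits in memory."""
            table = defaultdict(list)
            for row in build_rows:                  # pass 1: build phase
                table[row[key]].append(row)
            for row in probe_rows:                  # pass 2: probe phase
                for match in table.get(row[key], ()):
                    yield {**match, **row}

        customers = [{"cust_id": 1, "region": "east"},
                     {"cust_id": 2, "region": "west"}]
        orders = [{"cust_id": 1, "amount": 10},
                  {"cust_id": 2, "amount": 30},
                  {"cust_id": 1, "amount": 5}]

        # Build on the smaller input (customers), probe with the larger.
        for joined in hash_join(customers, orders, "cust_id"):
            print(joined)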

  10. Goetz Graefe on January 27th, 2009 7:44 pm

    For what it’s worth, the Cascades project was never associated with the University of Wisconsin – Madison. The only possible connection is that I got my degree there. I wrote the query optimizer code 1993-94 while on the faculty of Portland State University (in Oregon) and consulting for Tandem. In addition to the Tandem project (and now HP Neoview), the code also formed the foundation for query optimization in Microsoft SQL Server 7.0 and onwards.

  11. Database Virtualization = Location Transparency. Old Wine in a New Bottle? « Share Virtual Machines on February 5th, 2009 2:59 am

    […] 2006. Oracle’s acquisition of TangoSol, Microsoft’s Project Velocity are following HP NeoView’s usage of distributed caches for solving large BI queries. Strictly speaking these are not […]

  12. Notes on HBase | DBMS 2 : DataBase Management System Services on March 10th, 2015 2:24 pm

    […] Another such project is Trafodion — supposedly the Welsh word for “transaction” — open sourced by HP. This seems to be based on NonStop SQL and Neoview code, which counter-intuitively have always been joined at the hip. […]
