October 5, 2006

Introduction to Kognitio WX-2

Kognitio called me for a briefing this morning on their WX-2 product. Technical highlights included:

Much like the other “new” MPP data warehouse vendors, Kognitio claims never to have knowingly been outbenchmarked, whether on performance or on TCO factors such as ease of installation.

Kognitio is essentially the company formerly known as White Cross, which recently merged with – well, with a company called Kognitio. They’re UK-based, and have sold about 10 copies of their database software, about 9 of those to UK customers. But they’re coming seriously to the US Real Soon Now. They also are releasing in a couple weeks a new version (Version 6) of their product, with improvements in query streaming, workload management, and so on. They report having about 70 employees.

White Cross originally had an integrated BI and DBMS product, running on custom hardware. The main investors were famed venture capitalists Warburg Pincus (the guys who also funded BEA and Veritas predecessor OpenVision) and Geoff Squire, a former Very Senior Executive at Oracle and then OpenVision, who brought them the deal. Warburg Pincus and the other VCs have been pretty much wiped out in a cramdown round (oops — I did due diligence for Warburg), and there are now two main investors, one of them still Geoff.

They’ve now ported to standard/commodity hardware, and seem to be deemphasizing some or all of their analytic technology. Along the way Kognitio ran a service bureau with 1000 nodes or so, confirming scalability, and are confident that they can do proofs-of-concept in the multiple-hundred-node range from service bureaus now. They also have a customer win with 290 nodes, although the rest seem to be more in the 30-125 node range.

A key part of the Kognitio message is “utterly standard hardware.” They run on standard blades from HP and IBM or other vendors (I get the impression more HP to date), in standard enclosures, using the standard NICs inside the enclosures (1 gigabit Ethernet) and 10 gigabit Ethernet between enclosures. To date they haven’t even purchased or assembled systems for their customers, but it seems clear that demand will drive them to do so, as the MPP market is pretty appliance-centric, at least in a weak sense of “appliance.” Two particular advantages they cite to this approach are:

  1. Essentially instant availability on new CPUs, whether from AMD or Intel, since blades using such chips are available within days of announcement. That said, I note that DATAllegro did announce a new version the day Woodcrest came out, and that IBM’s DB2 BCUs use standard hardware.
  2. Very low hardware maintenance prices (1-2% of purchase cost, apparently).

Edit: For more on the data warehouse appliance market overall, please see this December 2007 post on data warehouse appliance fact and fiction.

Comments

11 Responses to “Introduction to Kognitio WX-2”

  1. Stuart Frost on October 21st, 2006 2:38 pm

    Curt,

    I’m puzzled by your comment that data is just randomly spread across the nodes in Kognitio. When a join is carried out, surely this would mean that every row in the smaller table would need to be moved to EVERY node. That might be OK with very small dimension tables, but it clearly wouldn’t scale. Let’s assume a 20TB fact table and a 1TB dimension table (not at all uncommon for our customers). If a join needs most of the columns of the smaller table, we’d need to move 1TB to each node to do the join. OK, so we can do it row by row to avoid the need for a huge amount of disk space, but it would still be incredibly slow across GigE – or any current network, for that matter – and there’d also be a lot of time spent waiting for new rows to arrive.

    I just don’t understand the logic behind this architecture, which goes against every other parallel database design that I’m aware of. Am I missing something?

    Stuart
    DATAllegro
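A back-of-envelope sketch of the transfer arithmetic behind Stuart’s objection. The figures below are illustrative assumptions (a single 1 GbE link per node, no protocol overhead), not vendor measurements:

```python
# Rough transfer-time estimate for broadcasting a large dimension table
# to every node over gigabit Ethernet, per Stuart's 1TB example.
# All figures are illustrative assumptions, not measurements.

TB = 10**12  # bytes

def broadcast_time_hours(table_bytes: float, link_gbps: float) -> float:
    """Hours to push table_bytes through one link of link_gbps gigabits/s."""
    bytes_per_sec = link_gbps * 1e9 / 8
    return table_bytes / bytes_per_sec / 3600

# One node receiving 1 TB over a single 1 GbE link:
t = broadcast_time_hours(1 * TB, 1.0)
print(f"{t:.1f} hours per node")  # roughly 2.2 hours
```

With those assumptions, each node would spend on the order of hours just receiving the table, which is the scaling concern Stuart raises.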

  2. Curt Monash on October 21st, 2006 2:56 pm

    Stuart,

    How is that different from the situation for DATAllegro in cases where the join key is neither range-partitioned nor hashed on?

    Also, while I don’t think Kognitio moves their compressed bitmaps around from node to node, I’ll ping them to see if they want to jump into this discussion.

    Thanks,

    CAM

  3. Curt Monash on October 21st, 2006 2:59 pm

    Stuart,

    I’d also note that this design is analogous to those used in text search engines, including (I presume) every major web search engine, and also FAST’s.

    I’ve been meaning to post on exactly that point over on Text Technologies. Maybe I’ll do so now and point to this thread.

  4. Stuart Frost on October 21st, 2006 10:44 pm

    Curt,

    True, we have to move data around if a join is on a non-partitioned key, but that’s very unusual for all but fairly small tables in our architecture. Also, we can move data over Infiniband MUCH faster than over GigE, with almost zero impact on processor load. Kognitio’s comment about moving to memory being faster is meaningless, since GigE will be the (slower-than-disk) bottleneck.

    Moving data around for every join seems pointless and would appear to be an unnecessary overhead that impacts both scaling and concurrency.

    I don’t know much about text search, but the nature of that problem would seem to be very different to relational-based databases, where set logic allows you to process joins very efficiently. Why not take advantage of the inherent structure of the data where you can?

    Given all this, I just don’t understand why the Kognitio designers chose to distribute data randomly.

    Stuart
    DATAllegro
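A toy illustration of the design point Stuart is arguing for: when both tables are hash-distributed on the join key, matching rows are co-located, so the join needs no inter-node traffic. The node count and row data here are made up for illustration and reflect no vendor’s actual implementation:

```python
# Hash distribution co-locates matching rows, so joins can run
# node-locally. Toy example; node count and rows are invented.
import hashlib
from collections import defaultdict

NODES = 4

def node_for(key: str) -> int:
    """Deterministically map a join key to one of NODES nodes."""
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return h % NODES

fact_rows = [("cust1", 100), ("cust2", 250), ("cust1", 75)]
dim_rows = [("cust1", "London"), ("cust2", "Leeds")]

placement = defaultdict(list)
for key, payload in fact_rows + dim_rows:
    placement[node_for(key)].append((key, payload))

# Each node now holds every fact and dimension row for its keys, so the
# join can run entirely locally. Under random distribution there is no
# such guarantee, and rows must be shipped at query time.
```

The trade-off, as the thread goes on to discuss, is that hash distribution requires choosing distribution keys in advance.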

  5. Roger Gaskell, Kognitio on October 25th, 2006 9:16 am

    I do not wish to get into a tit-for-tat argument with Stuart about the merits of each other’s technology on this forum. I personally believe that DATAllegro have a much more attractive proposition than many in this space, and I am sure we will be crossing swords on competitive benchmarks in the very near future.

    However, I will answer this specific point.

    There are a couple of bits of information that Stuart is missing. WX2 does indeed randomly distribute the data across all available nodes when the data is loaded. No decisions have to be made in advance on how the data should be loaded to satisfy a particular query profile, which simplifies the whole process. It is also true that, if nothing else were done, WX2’s optimiser would automatically re-distribute unsuitably distributed data on a query-by-query basis. However, this is not the normal mode of operation. WX2 has simple SQL extensions that allow the data to be re-distributed as a separate step. The optimiser will then use the new distribution for any subsequent queries without needing to perform a re-distribution. Re-distributions can be created and dropped as required.

    To answer Stuart’s point about GbE and the performance of a re-distribution: a 20TB WX2 system is capable of re-distributing a 1TB table in around 20 seconds. This assumes that every WX2 node needs to re-distribute all of that table’s rows and columns. Typically this is not the case, because WX2 only re-distributes the rows and columns it needs for a particular query. WX2 nodes support multiple GbE links, and we have lots of nodes. Incidentally, we also work quite happily with Infiniband as well as GbE.
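A sanity check of Roger’s 20-second figure. Roger does not state a node count or links-per-node figure, so the cluster shape below is purely an assumption chosen to show what aggregate bandwidth the claim implies:

```python
# What aggregate bandwidth does "1 TB re-distributed in ~20 s" imply?
# Hypothetical cluster shape; nodes and links_per_node are assumptions.
nodes = 200
links_per_node = 2          # GbE links per node (assumed)
link_bps = 1e9              # 1 gigabit/s per link

aggregate_bytes_per_sec = nodes * links_per_node * link_bps / 8
table_bytes = 1e12          # 1 TB
seconds = table_bytes / aggregate_bytes_per_sec
print(f"{seconds:.0f} s")   # 20 s with these assumptions
```

In other words, the claim is arithmetically plausible if the cluster’s aggregate network bandwidth is around 400 gigabits/s, which a few hundred multi-link GbE nodes could supply.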

  6. Curt Monash on October 25th, 2006 9:53 am

    Roger,

    Are these redistributions additional copies? If not, what do you mean by “drop”?

    Thanks,

    CAM

  7. Stuart Frost on October 30th, 2006 8:43 pm

    Roger,

    I agree with you on avoiding a tit for tat argument, so I’ll just focus on asking for clarification.

    Does the redistribution have to be carried out every time the underlying table is changed, or can it be done incrementally?

    Since most DW schemas (especially star schemas) have very predictable access paths, does this mean most customers redistribute their fact and large dimension tables and leave the duplicates in place until the next load?

    Since these redistributed tables are effectively materialized views, how do you keep them in RAM, as implied by your earlier emails? Surely they are too big?

    Thanks for the comment on our value proposition. I agree that it’s much more attractive than any we’ve come across so far. I’m looking forward to seeing how it stacks up against yours :)

    Stuart

  8. DBMS2 — DataBase Management System Services»Blog Archive » Who’s who in columnar relational database management systems on January 22nd, 2007 7:28 am

    [...] The best known columnar RDBMS is surely Sybase’s IQ Accelerator, evolved from a product acquired in the mid-1990s. Problem – it doesn’t have a shared-nothing architecture of the sort needed to exploit grid/blade technology. Whoops. The other recognized player is SAND, but I don’t know a lot about them. Based on their website, it would seem that grids and compression play a big part in their story. Less established but pretty interesting is Kognitio, who are just beginning to make marketing noise outside the UK. SAP’s BI Accelerator is also a compressed columnar system, but operates entirely in-memory and hence is limited in possible database size. Mike Stonebraker’s startup Vertica is of course the new kid on the block, and there are other columnar startups as well whose names currently escape me. [...]

  9. DBMS2 — DataBase Management System Services»Blog Archive » Word of the day: “Compression” on March 16th, 2007 5:30 am

    [...] IBM sent over a bunch of success stories recently, with DB2’s new aggressive compression prominently mentioned. Mike Stonebraker made a big point of Vertica’s compression when last we talked; other column-oriented data warehouse/mart software vendors (e.g. Kognitio, SAP, Sybase) get strong compression benefits as well. Other data warehouse/mart specialists are doing a lot with compression too, although some of that is governed by please-don’t-say-anything-good-about-us NDA agreements. [...]

  10. Infology.Ru » Blog Archive » A quick overview of data warehouse technologies on August 20th, 2008 12:37 pm

    [...] Many data warehouse specialists offer architectures based on traditional row-oriented relational DBMS, but optimize them for analytic workloads. These include Teradata, Netezza, DATAllegro, Greenplum, Dataupia, and SAS. All of them, except SAS, are wholly or mainly vendors of MPP/shared-nothing data warehouse appliances. EDIT: See the comments regarding Kognitio. [...]

  11. Infology.Ru » Blog Archive » Hardware strategies of data warehouse appliance vendors on August 21st, 2008 2:15 pm

    [...] Kognitio ever becomes an appliance vendor, they will be [...]
