Comments on: Introduction to Kognitio WX-2

By: Infology.Ru » Blog Archive » Hardware strategies for data warehouse appliances

Thu, 21 Aug 2008 18:15:00 +0000

[…] Kognitio ever becomes an appliance vendor, they will […]

By: Infology.Ru » Blog Archive » A quick survey of data warehouse technologies

Infology.Ru » Blog Archive » A quick survey of data warehouse technologies — Wed, 20 Aug 2008 16:37:38 +0000

[…] A large number of data warehouse specialists offer architectures based on traditional row-oriented relational DBMSs, but optimize them for analytic workloads. Among them are Teradata, Netezza, DATAllegro, Greenplum, Dataupia, and SAS. All of them, with the exception of SAS, are wholly or primarily vendors of MPP/shared-nothing data warehouse appliances. EDIT: See the comments regarding Kognitio. […]

By: DBMS2 — DataBase Management System Services » Blog Archive » Word of the day: “Compression”

Fri, 16 Mar 2007 09:30:57 +0000

[…] IBM sent over a bunch of success stories recently, with DB2’s new aggressive compression prominently mentioned. Mike Stonebraker made a big point of Vertica’s compression when last we talked; other column-oriented data warehouse/mart software vendors (e.g. Kognitio, SAP, Sybase) get strong compression benefits as well. Other data warehouse/mart specialists are doing a lot with compression too, although some of that is governed by please-don’t-say-anything-good-about-us NDA agreements. […]

By: DBMS2 — DataBase Management System Services » Blog Archive » Who’s who in columnar relational database management systems

Mon, 22 Jan 2007 11:28:32 +0000

[…] The best known columnar RDBMS is surely Sybase’s IQ Accelerator, evolved from a product acquired in the mid-1990s. Problem – it doesn’t have a shared-nothing architecture of the sort needed to exploit grid/blade technology. Whoops. The other recognized player is SAND, but I don’t know a lot about them. Based on their website, it would seem that grids and compression play a big part in their story. Less established but pretty interesting is Kognitio, who are just beginning to make marketing noise outside the UK. SAP’s BI Accelerator is also a compressed columnar system, but operates entirely in-memory and hence is limited in possible database size. Mike Stonebraker’s startup Vertica is of course the new kid on the block, and there are other columnar startups as well whose names currently escape me. […]

By: Stuart Frost

Stuart Frost — Tue, 31 Oct 2006 00:43:32 +0000

Roger,

I agree with you on avoiding a tit for tat argument, so I’ll just focus on asking for clarification.

Does the redistribution have to be carried out every time the underlying table is changed, or can it be done incrementally?

Since most DW schemas (especially star schemas) have very predictable access paths, does this mean most customers redistribute their fact and large dimension tables and leave the duplicates in place until the next load?

Since these redistributed tables are effectively materialized views, how do you keep them in RAM, as implied by your earlier emails? Surely they are too big?

Thanks for the comment on our value proposition. I agree that it’s much more attractive than any we’ve come across so far. I’m looking forward to seeing how it stacks up against yours 🙂

Stuart

By: Curt Monash

Curt Monash — Wed, 25 Oct 2006 13:53:45 +0000

Roger,

Are these redistributions additional copies? If not, what do you mean by “drop”?

Thanks,

CAM

By: Roger Gaskell, Kognitio

Roger Gaskell, Kognitio — Wed, 25 Oct 2006 13:16:44 +0000

I do not wish to get into a tit for tat argument with Stuart about the merits of each other’s technology on this forum. I personally believe that DATAllegro have a much more attractive proposition than many in this space and I am sure we will be crossing swords on competitive benchmarks in the very near future.

However, I will answer this specific point.

There are a couple of bits of information that Stuart is missing. WX2 does indeed randomly distribute the data across all available nodes when the data is loaded. No decisions have to be made in advance on how the data should be loaded to satisfy a particular query profile, which simplifies the whole process. It is also true that, if nothing else were done, WX2’s optimiser would automatically re-distribute the data on a query by query basis whenever the data was not suitably distributed. However, this is not the normal mode of operation. WX2 has simple SQL extensions that allow the data to be re-distributed as a separate step. The optimiser will then use the new distribution for any subsequent queries without needing to perform a re-distribution. Re-distributions can be created and dropped as required.
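The distinction between random placement at load time and a separate re-distribution step can be illustrated with a toy model. This is only a minimal sketch and not WX2's mechanism: the function names `load_randomly` and `redistribute` are hypothetical, and the use of a hash of the join key to choose the target node is an assumption, since the comment above does not say how WX2 picks the new placement.

```python
import random

def load_randomly(rows, num_nodes, seed=0):
    """Place each row on a randomly chosen node (toy model of a random load)."""
    rng = random.Random(seed)
    nodes = [[] for _ in range(num_nodes)]
    for row in rows:
        nodes[rng.randrange(num_nodes)].append(row)
    return nodes

def redistribute(nodes, key):
    """Build a new placement in which rows sharing row[key] share a node.

    Assumption: the target node is chosen by hashing the key value.
    The original placement is left untouched, so the result can be
    kept for later queries or discarded.
    """
    num_nodes = len(nodes)
    placed = [[] for _ in range(num_nodes)]
    for node in nodes:
        for row in node:
            placed[hash(row[key]) % num_nodes].append(row)
    return placed

rows = [{"cust": i % 5, "amt": i} for i in range(50)]
nodes = load_randomly(rows, num_nodes=4)
by_cust = redistribute(nodes, "cust")
```

After the re-distribution, all rows for a given `cust` value sit on one node, so a join or aggregation on that key needs no further data movement until the placement is dropped.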

To answer Stuart’s point about GbE and the performance of a re-distribution: a 20TB WX2 system is capable of re-distributing a 1TB table in around 20 seconds. This assumes that every WX2 node needs to re-distribute all of that table’s rows and columns. Typically this is not the case, because WX2 only re-distributes the rows and columns it needs for a particular query. WX2 nodes support multiple GbE links and we have lots of nodes. Incidentally, we also work quite happily with Infiniband as well as GbE.
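As a back-of-the-envelope check on what the 1TB-in-20-seconds figure implies for the network, the sketch below converts it into aggregate throughput and an equivalent number of GbE links. The assumptions are not from the comment: 1TB is taken as 10^12 bytes, each GbE link is assumed to run at its full 1 Gbit/s line rate with no protocol overhead, and the helper names are hypothetical.

```python
import math

def aggregate_gbit_per_s(table_bytes, seconds):
    """Aggregate throughput in Gbit/s needed to move table_bytes in the given time."""
    return table_bytes * 8 / seconds / 1e9

def links_needed(table_bytes, seconds, link_gbit_per_s=1.0):
    """Links required if every link is saturated (assumed 1 Gbit/s for GbE)."""
    return math.ceil(aggregate_gbit_per_s(table_bytes, seconds) / link_gbit_per_s)

ONE_TB = 10**12  # assumption: decimal terabyte

throughput = aggregate_gbit_per_s(ONE_TB, 20)  # 400.0 Gbit/s, i.e. 50 GB/s
links = links_needed(ONE_TB, 20)               # 400 saturated GbE links
```

Under these assumptions the claim corresponds to roughly 50 GB/s moving across the system at once, which is only plausible with many nodes each driving multiple links, consistent with the remark about multiple GbE links and lots of nodes.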

By: Stuart Frost

Stuart Frost — Sun, 22 Oct 2006 02:44:48 +0000

Curt,

True, we have to move data around if a join is on a non-partitioned key, but that’s very unusual for all but fairly small tables in our architecture. Also, we can move data over Infiniband MUCH faster than GigE with almost zero impact on processor load. Kognitio’s comment about moving to memory being faster is meaningless, since GigE will be the (slower than disk) bottleneck.

Moving data around for every join seems pointless and would appear to be an unnecessary overhead that impacts both scaling and concurrency.

I don’t know much about text search, but the nature of that problem would seem to be very different to relational-based databases, where set logic allows you to process joins very efficiently. Why not take advantage of the inherent structure of the data where you can?

Given all this, I just don’t understand why the Kognitio designers chose to distribute data randomly.

Stuart
DATAllegro

By: Curt Monash

Curt Monash — Sat, 21 Oct 2006 18:59:27 +0000

Stuart,

I’d also note that this design is analogous to those used in text search engines, including (I presume) every major web search engine, and also FAST’s.

I’ve been meaning to post on exactly that point over on Text Technologies. Maybe I’ll do so now and point to this thread.

By: Curt Monash

Curt Monash — Sat, 21 Oct 2006 18:56:42 +0000

Stuart,

How is that different from the situation for DATAllegro in cases where the join key is neither range partitioned nor hashed on?

Also, while I don’t think Kognitio moves their compressed bitmaps around from node to node, I’ll ping them to see if they want to jump into this discussion.

Thanks,

CAM