March 28, 2008

Disruption versus chasm crossing in the database market

The 451 Group just released a report on open source DBMS adoption. In a blog post announcing same, Matthew Aslett wrote (emphasis mine):

you only have to look at the comparative revenues of the open source and proprietary vendors to see that there is a vast chasm to be crossed.

“Chasm” memes were introduced by Geoffrey Moore, founder of the Chasm Group and author of Crossing the Chasm. His defining example was Oracle, and the database market in general. The core insight was that platform markets get to tipping points, after which the leaders have tremendous advantages that make them tend to remain leaders for a good long time.

The sequel to “chasm” theory is Clayton Christensen’s “disruption” rubric, popularized in The Innovator’s Dilemma. I’ve argued previously that the DBMS market is being disrupted, in both the ways that Christensen records: Read more

Categories: Data warehouse appliances, Open source

1 Comment

March 28, 2008

XML versus sparse columns in variable schemas

Simon Sabin makes an interesting point: If you can have 30,000 columns in a table without sparsity management blowing up, you can handle entities with lots of different kinds of attributes. (And in SQL Server you can now do just that.) The example he uses is products — different products can have different sets of possible colors, different kinds of sizes, and so on. An example I’ve used in the past is marketing information — different prospects can reveal different kinds of information, which may have been gathered via non-comparable marketing programs.

I’ve suggested this kind of variability as a reason to actually go XML — you’re constantly adding not just new information, but new kinds of information, so your fixed schema is never up to date. But I haven’t detected many actual application designers who agree with me …

Categories: MySQL, Structured documents, Theory and architecture

3 Comments

March 27, 2008

The illuminate guys have a CTO blog

If you want to know more about illuminate’s data warehouse offerings, CTO Joe Foley has a blog. A good starting point might be the post on value-based storage. Two key points seem to be:

The VBS also provides some data access features that can not be duplicated in any other structure. A search can be executed starting with a data value in the pool. By going from the value pool back to the index, it is possible to quickly locate every use of the value wherever is may be used in the logical record structures.

which makes sense, and

This structure also enables our incremental query capability. As the result of a query, the database returns a set of instance identifiers rather than a set of records. This is because there are no records, only pointers and values. With the response being a set of pointers, it is a simple matter to perform the next query step and then get the union or difference between the two sets of pointers for the result of the second query step. This process can be continued indefinitely with the result set shrinking or growing as the new results are merged with the old.

which still sounds like gobbledygook to me. Read more

Categories: Analytic technologies, Business intelligence, Data warehousing, illuminate Solutions

iLuminate’s correlation/associative approach to data warehousing

illuminate Solutions (small “i”) is an interesting little company, still rough around the edges. (E.g., the Press Release Archive page at i-lluminate.com says, in its entirety, “We are in the process of loading our historical press releases. Please check back the second week in March!” And I only got that much when I corrected an obvious typo in the URL in the menu bar.) According to CTO Joe Foley, illuminate has 37 or so employees, and 40+ customers, ¾ of whom are in their home country of Spain and ½ the rest of whom are in Latin America. Now they’re entering the US.

illuminate’s basic idea is one I’ve heard before, but mainly from companies with more of a search orientation*, such as Attivio: Take a collection of tables, create a big inverted index on all the values in all columns at once, and do queries on that. This, illuminate claims, obviates all sorts of database design problems and similar hassles you otherwise might have. illuminate’s buzzword for all this is “CDBMS”, where the “C” stands for correlation. The actual CDBMS product is called iLuminate; related business intelligence tools are called iCorrelate and iAnalyze. What iLuminate actually indexes is a token that holds four pieces of information: Instance identifier, table identifier, column identifier, and value. Read more

Categories: Analytic technologies, Business intelligence, Data warehousing, illuminate Solutions

2 Comments

March 26, 2008

Pervasive is also pursuing simplicity and SaaS integration

I blogged recently about Cast Iron Systems, a simplicity-oriented data integration appliance vendor that is increasingly focusing on the SaaS market. Well, Pervasive Software is doing something similar.

Via Data Integrator, Pervasive is a leader in the low-cost integration market, with revenue split about 50/25/25 between direct sales, ISVs, and SaaS. Pervasive fondly believes that its products cost half as much as Cast Iron’s, and wind up taking no more installation effort when you factor in Pervasive’s broader capabilities in areas such as workflow. However, there’s some doubt as to whether this is apples-to-apples. Cast Iron does include hardware, after all, and as Pervasive itself points out, Cast Iron will bundle some professional services into a sale if you ask nicely.

Two things are new. Read more

Categories: Cloud computing, EAI, EII, ETL, ELT, ETLT, Pervasive Software, Software as a Service (SaaS)

5 Comments

March 25, 2008

Elastra launched today

At Elastra’s request, I didn’t write further about them back when I was interested in doing so. But you can go find out about them yourself. Basically, their secret sauce is that they write deployment instructions in a few hundred lines of two proprietary markup languages. They have ambitions beyond DBMS, and beyond the Amazon cloud.

According to their slides, they have 13 paying customers.

Categories: Cloud computing, Elastra

1 Comment

March 25, 2008

The eBay analytics guys have a blog now

Oliver Ratzesberger and his crew have started a blog, focusing on xldb analytics. Naturally, one of the early posts gives a quick overview of their system stats. Highlights include:

Incoming data volumes exceed 40TB per day, with more than 10^11 new items/lines/records being added per day. Our analytical processing infrastructure exceeds 6PB of physical storage with over 2.9PB(1.4+1.5) in our largest cluster.

We leverage compression technologies wherever possible and are achieving compression ratios as high as 99% on our highest volume data feeds.

On any given day our massive parallel systems process more than 27PB of data, not factoring in various levels of caches that serve similar activities or processes and reduce the amount of physical IOs significantly.

We execute millions of requests on a daily basis, spanning from near realtime highly localized access to enormous jobs that span 100s of TB in a single or series of models.

Categories: eBay, Specific users

GridSQL: What EnterpriseDB is and is not doing in Postgres-based MPP data warehousing

While talking with EnterpriseDB about today’s Postgres Plus announcements, I took the chance to clear up a point of confusion. Somebody told Seth Grimes that EnterpriseDB is out to compete with Greenplum, but that person was wrong. EnterpriseDB fondly hopes to manage multi-terabyte data warehouses, just as Oracle and Microsoft do with their respective general-purpose DBMS. However, EnterpriseDB is not going after the 10s-100s of terabytes sized DBMS that are the province of specialists such as Greenplum, Teradata, Netezza, or columnar DBMS vendors.

Even so, in GridSQL EnterpriseDB does seem to be open-sourcing MPP shared-nothing basics. There’s a lightweight optimizer that does a little (but only a little) more to minimize data movement beyond just optimizing queries on each node. And GridSQL knows how to replicate small tables across each node, a key aspect of many MPP designs. (Partition your facts; replicate your dimensions.)

Categories: Analytic technologies, Data warehousing, EnterpriseDB and Postgres Plus, Greenplum, Open source, Parallelization

1 Comment

March 25, 2008

EnterpriseDB unveils Postgres Plus

EnterpriseDB is making a series of moves and announcements. Highlights include:

Renaming/repositioning the product as “Postgres Plus.” The free product is now Postgres Plus, while the version you pay EnterpriseDB for is now Postgres Plus Advanced Server.
Repackaging the products, so that Postgres Plus Advanced Server is a strict superset of Postgres Plus.
New features added to Postgres Plus Advanced Server.
Features newly migrated from Advanced Server down to Postgres Plus.
A strategic investment by IBM.
Stressing Postgres in EnterpriseDB marketing, and dropping the tag-line defining themselves as “the Oracle-compatible database company.”

So far as I can tell, most of the technical differences between Advanced Server and regular Postgres Plus lie in three areas: Read more

Categories: Cache, Emulation, transparency, portability, EnterpriseDB and Postgres Plus, Mid-range, MySQL, OLTP, Open source, PostgreSQL

1 Comment

March 21, 2008

Cast Iron Systems focuses on SaaS data integration

When I wrote about data integration vendor Cast Iron Systems a year ago, its core message was “simplicity, simplicity, simplicity.” Supporting points included:

An appliance delivery format.
Lots of heuristics for automatic mapping and quick set-up. E.g., Cast Iron claims that 70% of a typical SAP-Salesforce.com connection can be done straight out of the box.
The absence of data cleaning/transformation features that might complicate things.

Cast Iron still believes in all that.

Even so, its messaging has changed a bit. Cast Iron now bills itself, in the first sentence of its press release boilerplate, as “the fastest growing SaaS integration appliance vendor.” And when I talked with marketing chief Simon Peel today, the only use cases we discussed were connections between SaaS and on-premises apps. Read more

Categories: Cast Iron Systems, Cloud computing, EAI, EII, ETL, ELT, ETLT, Informatica, Software as a Service (SaaS)

2 Comments

Monash Research blogs

DBMS 2 covers database management, analytics, and related technologies.
Text Technologies covers text mining, search, and social software.
Strategic Messaging analyzes marketing and messaging strategy.
The Monash Report examines technology and public policy issues.
Software Memories recounts the history of the software industry.

User consulting

Building a short list? Refining your strategic plan? We can help.

Vendor advisory

We tell vendors what's happening -- and, more important, what they should do about it.

Monash Research highlights

Learn about white papers, webcasts, and blog highlights, by RSS or email.

Links
- Monash Research
- White Papers
Admin
- Log in

Disruption versus chasm crossing in the database market

XML versus sparse columns in variable schemas

The illuminate guys have a CTO blog

iLuminate’s correlation/associative approach to data warehousing

Pervasive is also pursuing simplicity and SaaS integration

Elastra launched today

The eBay analytics guys have a blog now

GridSQL: What EnterpriseDB is and is not doing in Postgres-based MPP data warehousing

EnterpriseDB unveils Postgres Plus

Cast Iron Systems focuses on SaaS data integration

Search our blogs and white papers

Monash Research blogs

User consulting

Vendor advisory

Monash Research highlights

Recent posts

Categories

Date archives

Admin