March 28th, 2008 Curt Monash
The 451 Group just released a report on open source DBMS adoption. In a blog post announcing same, Matthew Aslett wrote (emphasis mine):
you only have to look at the comparative revenues of the open source and proprietary vendors to see that there is a vast chasm to be crossed.
“Chasm” memes were introduced by Geoffrey Moore, founder of the Chasm Group and author of Crossing the Chasm. His defining example was Oracle, and the database market in general. The core insight was that platform markets get to tipping points, after which the leaders have tremendous advantages that make them tend to remain leaders for a good long time.
The sequel to “chasm” theory is Clayton Christensen’s “disruption” rubric, popularized in The Innovator’s Dilemma. I’ve argued previously that the DBMS market is being disrupted, in both the ways that Christensen records: Read the rest of this entry »
Posted in Data warehouse appliances, Open source RDBMS, Relational database management systems | 1 Comment »
March 28th, 2008 Curt Monash
Simon Sabin makes an interesting point: If you can have 30,000 columns in a table without sparsity management blowing up, you can handle entities with lots of different kinds of attributes. (And in SQL Server you can now do just that.) The example he uses is products — different products can have different sets of possible colors, different kinds of sizes, and so on. An example I’ve used in the past is marketing information — different prospects can reveal different kinds of information, which may have been gathered via non-comparable marketing programs.
I’ve suggested this kind of variability as a reason to actually go XML — you’re constantly adding not just new information, but new kinds of information, so your fixed schema is never up to date. But I haven’t detected many actual application designers who agree with me …
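To make the wide-but-sparse idea concrete, here is a minimal sketch in Python. It is not how SQL Server stores sparse columns; the product names and attributes are invented for illustration. It only shows that entities with different attribute sets can be viewed as rows of one wide table in which most cells are empty.

```python
from typing import Any

# Hypothetical data: each product keeps only the attributes it actually has.
products: dict[str, dict[str, Any]] = {
    "t-shirt": {"color": "red", "size": "M"},
    "laptop": {"screen_inches": 13.3, "ram_gb": 16},
    "gift-card": {"amount_usd": 50},
}

def all_columns(rows):
    """Union of attribute names, i.e. the column list of the wide table."""
    cols = set()
    for attrs in rows.values():
        cols.update(attrs)
    return sorted(cols)

def as_wide_row(attrs, columns):
    """View one entity as a fixed-schema row, with None for absent attributes."""
    return [attrs.get(c) for c in columns]

columns = all_columns(products)
wide = {name: as_wide_row(attrs, columns) for name, attrs in products.items()}

stored_cells = sum(len(attrs) for attrs in products.values())  # values actually present
total_cells = len(products) * len(columns)                     # cells in the wide view
```

In this toy case only 5 of 15 cells hold a value, and the empty fraction grows as more kinds of products are added. That growth is the cost sparsity management has to keep from blowing up.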
Please subscribe to our feed!
Posted in Database theory and practice, MySQL, Native XML | 2 Comments »
March 27th, 2008 Curt Monash
If you want to know more about illuminate’s data warehouse offerings, CTO Joe Foley has a blog. A good starting point might be the post on value-based storage. Two key points seem to be:
The VBS also provides some data access features that can not be duplicated in any other structure. A search can be executed starting with a data value in the pool. By going from the value pool back to the index, it is possible to quickly locate every use of the value wherever is may be used in the logical record structures.
which makes sense, and
This structure also enables our incremental query capability. As the result of a query, the database returns a set of instance identifiers rather than a set of records. This is because there are no records, only pointers and values. With the response being a set of pointers, it is a simple matter to perform the next query step and then get the union or difference between the two sets of pointers for the result of the second query step. This process can be continued indefinitely with the result set shrinking or growing as the new results are merged with the old.
which still sounds like gobbledygook to me.
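For what it's worth, one charitable reading of the second quote is plain set arithmetic on instance identifiers. The sketch below is my own guess at that reading, not illuminate's implementation. The identifiers and the `refine` helper are hypothetical, and the intersection case is my addition (the quote mentions only union and difference).

```python
# Hypothetical instance identifiers returned by two successive query steps.
step1 = {101, 102, 103, 104}  # e.g. instances matching the first condition
step2 = {103, 104, 105}       # e.g. instances matching the second condition

def refine(current, new, how):
    """Combine the running result set with the result of the next query step."""
    if how == "union":
        return current | new   # result set grows
    if how == "difference":
        return current - new   # result set shrinks
    if how == "intersection":
        return current & new   # keep only instances matching both steps
    raise ValueError(f"unknown combination: {how}")

grown = refine(step1, step2, "union")
shrunk = refine(step1, step2, "difference")
both = refine(step1, step2, "intersection")
```

Because each step yields a set of identifiers rather than records, the steps can be chained indefinitely without materializing any rows until the end.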
Posted in Analytics and analytic technologies, Business intelligence, Data warehousing, illuminate Solutions and iLuminate | No Comments »
March 26th, 2008 Curt Monash
illuminate Solutions (small “i”) is an interesting little company, still rough around the edges. (E.g., the Press Release Archive page at i-lluminate.com says, in its entirety, “We are in the process of loading our historical press releases. Please check back the second week in March!” And I only got that much when I corrected an obvious typo in the URL in the menu bar.) According to CTO Joe Foley, illuminate has 37 or so employees, and 40+ customers, ¾ of whom are in their home country of Spain and ½ the rest of whom are in Latin America. Now they’re entering the US.
illuminate’s basic idea is one I’ve heard before, but mainly from companies with more of a search orientation*, such as Attivio: Take a collection of tables, create a big inverted index on all the values in all columns at once, and do queries on that. This, illuminate claims, obviates all sorts of database design problems and similar hassles you otherwise might have. illuminate’s buzzword for all this is “CDBMS”, where the “C” stands for correlation. The actual CDBMS product is called iLuminate; related business intelligence tools are called iCorrelate and iAnalyze. What iLuminate actually indexes is a token that holds four pieces of information: Instance identifier, table identifier, column identifier, and value.
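A minimal Python sketch of that idea follows, assuming one instance identifier per logical row. The `Token` type, the `build_index` function, and the sample tables are all hypothetical; this illustrates the general technique of a single inverted index across all columns, not iLuminate's actual storage format.

```python
from collections import defaultdict
from typing import NamedTuple

class Token(NamedTuple):
    """The four pieces of information described above."""
    instance_id: int
    table: str
    column: str
    value: object

def build_index(tables):
    """tables: {table_name: [row_dict, ...]} -> {value: [Token, ...]}."""
    index = defaultdict(list)
    next_id = 0  # assumption: one instance identifier per row
    for table, rows in tables.items():
        for row in rows:
            for column, value in row.items():
                index[value].append(Token(next_id, table, column, value))
            next_id += 1
    return index

# Invented sample data.
tables = {
    "customers": [
        {"name": "Ana", "city": "Madrid"},
        {"name": "Luis", "city": "Lima"},
    ],
    "suppliers": [
        {"company": "Acme", "city": "Madrid"},
    ],
}

index = build_index(tables)
madrid_hits = index["Madrid"]  # every use of the value, in any table or column
```

Looking up "Madrid" finds both the customer and the supplier without anybody having declared a relationship between the two tables in advance, which is presumably where the "correlation" in CDBMS comes from.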
Posted in Analytics and analytic technologies, Business intelligence, Data warehousing, illuminate Solutions and iLuminate | No Comments »
March 26th, 2008 Curt Monash
I blogged recently about Cast Iron Systems, a simplicity-oriented data integration appliance vendor that is increasingly focusing on the SaaS market. Well, Pervasive Software is doing something similar.
Via Data Integrator, Pervasive is a leader in the low-cost integration market, with revenue split about 50/25/25 between direct sales, ISVs, and SaaS. Pervasive fondly believes that its products cost half as much as Cast Iron’s, and wind up taking no more installation effort when you factor in Pervasive’s broader capabilities in areas such as workflow. However, there’s some doubt as to whether this is apples-to-apples. Cast Iron does include hardware, after all, and as Pervasive itself points out, Cast Iron will bundle some professional services into a sale if you ask nicely.
Two things are new.
Posted in Cloud computing, EII, ETL, and/or EAI, Pervasive Software, SaaS | 4 Comments »
March 25th, 2008 Curt Monash
At Elastra’s request, I didn’t write further about them back when I was interested in doing so. But you can go find out about them yourself. Basically, their secret sauce is that they write deployment instructions in a few hundred lines of two proprietary markup languages. They have ambitions beyond DBMS, and beyond the Amazon cloud.
According to their slides, they have 13 paying customers.
Posted in Cloud computing, Elastra | No Comments »
March 25th, 2008 Curt Monash
Oliver Ratzesberger and his crew have started a blog, focusing on xldb analytics. Naturally, one of the early posts gives a quick overview of their system stats. Highlights include:
Incoming data volumes exceed 40TB per day, with more than 10^11 new items/lines/records being added per day. Our analytical processing infrastructure exceeds 6PB of physical storage with over 2.9PB(1.4+1.5) in our largest cluster.
We leverage compression technologies wherever possible and are achieving compression ratios as high as 99% on our highest volume data feeds.
On any given day our massive parallel systems process more than 27PB of data, not factoring in various levels of caches that serve similar activities or processes and reduce the amount of physical IOs significantly.
We execute millions of requests on a daily basis, spanning from near realtime highly localized access to enormous jobs that span 100s of TB in a single or series of models.
Posted in Specific users, eBay | No Comments »
March 25th, 2008 Curt Monash
While talking with EnterpriseDB about today’s Postgres Plus announcements, I took the chance to clear up a point of confusion. Somebody told Seth Grimes that EnterpriseDB is out to compete with Greenplum, but that person was wrong. EnterpriseDB fondly hopes to manage multi-terabyte data warehouses, just as Oracle and Microsoft do with their respective general-purpose DBMS. However, EnterpriseDB is not going after the databases in the tens to hundreds of terabytes that are the province of specialists such as Greenplum, Teradata, Netezza, or columnar DBMS vendors.
Even so, in GridSQL EnterpriseDB does seem to be open-sourcing MPP shared-nothing basics. There’s a lightweight optimizer that does a little (but only a little) more to minimize data movement beyond just optimizing queries on each node. And GridSQL knows how to replicate small tables across each node, a key aspect of many MPP designs. (Partition your facts; replicate your dimensions.)
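"Partition your facts; replicate your dimensions" can be sketched in a few lines of Python. This is a toy model of the general shared-nothing pattern, not GridSQL's code. The node count, the modulo partitioning on `sale_id`, and the sample tables are all assumptions made for illustration.

```python
NODES = 3  # hypothetical cluster size

def distribute(facts, dims, key):
    """Partition fact rows across nodes by key; copy the dimension table to every node."""
    nodes = [{"facts": [], "dims": dict(dims)} for _ in range(NODES)]
    for row in facts:
        nodes[row[key] % NODES]["facts"].append(row)
    return nodes

def local_join_sum(node):
    """Join and aggregate using only data held on this node (no data movement)."""
    totals = {}
    for row in node["facts"]:
        region = node["dims"][row["store_id"]]
        totals[region] = totals.get(region, 0) + row["amount"]
    return totals

def query(nodes):
    """Coordinator step: merge the small per-node partial aggregates."""
    merged = {}
    for node in nodes:
        for region, amount in local_join_sum(node).items():
            merged[region] = merged.get(region, 0) + amount
    return merged

# Invented sample data: a small dimension table and a larger fact table.
dims = {1: "east", 2: "west", 3: "east"}  # store_id -> region
facts = [{"sale_id": i, "store_id": (i % 3) + 1, "amount": 10 * i} for i in range(1, 7)]

nodes = distribute(facts, dims, key="sale_id")
totals_by_region = query(nodes)
```

Because every node holds the whole dimension table, the join never needs rows from another node; only the per-node totals travel to the coordinator.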
Posted in Analytics and analytic technologies, Data warehousing, EnterpriseDB and Postgres Plus, Greenplum, Open source RDBMS, Relational database management systems | No Comments »
March 25th, 2008 Curt Monash
EnterpriseDB is making a series of moves and announcements. Highlights include:
- Renaming/repositioning the product as “Postgres Plus.” The free product is now Postgres Plus, while the version you pay EnterpriseDB for is now Postgres Plus Advanced Server.
- Repackaging the products, so that Postgres Plus Advanced Server is a strict superset of Postgres Plus.
- New features added to Postgres Plus Advanced Server.
- Features newly migrated from Advanced Server down to Postgres Plus.
- A strategic investment by IBM.
- Stressing Postgres in EnterpriseDB marketing, and dropping the tag-line defining themselves as “the Oracle-compatible database company.”
So far as I can tell, most of the technical differences between Advanced Server and regular Postgres Plus lie in three areas: Read the rest of this entry »
Posted in Cache, EnterpriseDB and Postgres Plus, Mid-range DBMS, MySQL, OLTP database management, Open source RDBMS, Portability, transparency, and plug-compatibility, PostgreSQL, Relational database management systems | 1 Comment »
March 21st, 2008 Curt Monash
When I wrote about data integration vendor Cast Iron Systems a year ago, its core message was “simplicity, simplicity, simplicity.” Supporting points included:
- An appliance delivery format.
- Lots of heuristics for automatic mapping and quick set-up. E.g., Cast Iron claims that 70% of a typical SAP-Salesforce.com connection can be done straight out of the box.
- The absence of data cleaning/transformation features that might complicate things.
Cast Iron still believes in all that.
Even so, its messaging has changed a bit. Cast Iron now bills itself, in the first sentence of its press release boilerplate, as “the fastest growing SaaS integration appliance vendor.” And when I talked with marketing chief Simon Peel today, the only use cases we discussed were connections between SaaS and on-premises apps.
Posted in Cast Iron Systems, Cloud computing, EII, ETL, and/or EAI, Informatica, SaaS | 1 Comment »
March 19th, 2008 Curt Monash
I talked with both Coral8 and Truviso this afternoon. They both have their financial services efforts, of course. Coral8 also continues to get business doing data reduction for sensor networks — mainly RFID and utilities, I think. Coral8 is working on some really cool and confidential other stuff as well.
But my biggest takeaway from this pair of calls was that Coral8 and Truviso are penetrating general BI.
Posted in Analytics and analytic technologies, Business intelligence, Complex event/stream processing (CEP), Coral8, Memory-centric data management, Truviso | No Comments »
March 19th, 2008 Curt Monash
It seems that the CEP folks are still concerned about what to call themselves. There really are only three choices:
- Complex event processing
- Event processing
- Event stream processing
“Stream processing” might once have been on the list, but it has too many other meanings, and “streaming” adds more meanings yet.
“Complex” has the virtue of inertia; CEP is the closest thing the category has to an agreed-upon name. But few people want to buy technology that describes itself as being “complex.” And in any case it’s not clear how complex many of those events are. “Event stream processing” isn’t terribly well established, and to some extent it runs afoul of the same ambiguities as “stream processing.” What’s worse, those names lead to four-word product category names. Who really wants to market or hear about “complex event processing engines” or “event stream processing platforms”?
So let’s just call the category “event processing” and have done with it, OK? Products can, if they want, be “event processing somethings.” Names like that wouldn’t be any more of a mouthful than “data warehouse appliance,” and the latter category is doing pretty well for itself.
Posted in Complex event/stream processing (CEP) | No Comments »
March 14th, 2008 Curt Monash
An interesting part of my conversation with Dataupia’s CTO John O’Brien came when we talked about data warehousing in general. On the one hand, he endorsed the view that using Oracle probably isn’t a good idea for data warehouses larger than 10 terabytes, with SQL Server’s limit being well below that. On the other hand, he said he’d helped build 50-60 terabyte warehouses in Oracle years ago.
The point is that to build warehouses that big in Oracle or other traditional DBMS, you have to pull out a large bag of tricks.
Posted in Analytics and analytic technologies, Data warehouse appliances, Data warehousing, Microsoft and SQL*Server, Oracle, Relational database management systems | 16 Comments »
March 14th, 2008 Curt Monash
I had a catch-up phone meeting with Dataupia, since I hadn’t spoken with the company since the middle of last year. Like several other companies in the data warehouse specialist market, Dataupia can be annoyingly secretive. On the plus side, and this is very refreshing, Dataupia doesn’t seem to expect credit for accomplishments beyond those they’re willing to provide actual evidence for.
What I’ve gleaned about Dataupia’s customer activity to date amounts to: Read the rest of this entry »
Posted in Analytics and analytic technologies, Data warehouse appliances, Data warehousing, Dataupia, Portability, transparency, and plug-compatibility | No Comments »
March 14th, 2008 Curt Monash
I wrote a few weeks ago about the H-Store project, which rejects a variety of assumptions underlying traditional OLTP database design. One of these is long transactions over open database connections. The idea is that the most demanding OLTP applications run on the Web, where abandonment is common, and hence the only sensible option is to break things up into simple chunks.
Posted in Application areas, OLTP database management | No Comments »
March 13th, 2008 Curt Monash
Twitter commonly has the problem of duplicate tweets. That is, if you post a message, it shows up twice. After a little while, the dupe disappears, but if you delete the dupe manually, the original is gone too.
I presume what’s going on is that tweets are cached, the tweets are eventually batched to disk, and they don’t always get deleted from cache until some time after they’re persisted. If you happen to check the page of your recent tweets in between, boom, you get two hits. But what I don’t understand is why the two versions have different timestamps.
Presumably, this could be explained at a MySQL User Conference session next month, one of whose topics will be Intelligent caching strategies using a hybrid MemCache / MySQL approach. I’m so glad they don’t use stupid strategies to do this …
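To spell out the guess above, here is a toy Python model of a write-behind cache whose read path naively concatenates cache and disk. It is purely my speculation rendered as code. The `TweetStore` class and its methods are hypothetical, it says nothing about Twitter's real architecture, and it does not explain the differing timestamps.

```python
class TweetStore:
    """Toy write-behind cache: tweets land in cache first and reach disk later."""

    def __init__(self):
        self.cache = {}  # tweet_id -> text, recently posted
        self.disk = {}   # tweet_id -> text, persisted
        self._next_id = 1

    def post(self, text):
        tweet_id = self._next_id
        self._next_id += 1
        self.cache[tweet_id] = text
        return tweet_id

    def flush(self):
        """Batch cached tweets to disk, but leave them in cache for now."""
        self.disk.update(self.cache)

    def evict(self):
        """Later, drop cached tweets that have already been persisted."""
        for tweet_id in list(self.cache):
            if tweet_id in self.disk:
                del self.cache[tweet_id]

    def recent(self):
        """Naive read path: disk plus cache, with no de-duplication by id."""
        return list(self.disk.values()) + list(self.cache.values())

    def delete(self, tweet_id):
        """Deleting by id removes the tweet everywhere, so the 'original' goes too."""
        self.cache.pop(tweet_id, None)
        self.disk.pop(tweet_id, None)

store = TweetStore()
first = store.post("hello")
before_flush = store.recent()
store.flush()
during_window = store.recent()  # the window in which the dupe shows up
store.evict()
after_evict = store.recent()
store.delete(first)
after_delete = store.recent()
```

In this model the dupe and the original are the same tweet seen through two paths, which would at least be consistent with deleting one making both vanish.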
Posted in Cache, MySQL, OLTP database management, Specific users | 3 Comments »
March 11th, 2008 Curt Monash
Last year, I thought that solidDB could at least potentially be an outstanding MySQL engine. But as per news posted on SourceForge last week, that’s not going to happen. At least, it’s not going to happen via any development efforts from IBM.
Posted in IBM and DB2, Mid-range DBMS, MySQL, Open source RDBMS, Relational database management systems, solidDB | 4 Comments »
March 7th, 2008 Curt Monash
Cartoon
Song (previously posted)
Poem
Another cartoon — not particularly funny on its own — that appears to come between two of the above
Posted in Humor | No Comments »
March 6th, 2008 Curt Monash
The relational DBMS industry is filled with startups. In some way or other, most of them are based on or make use of the open source project PostgreSQL. (Not all, of course; exceptions include DATAllegro and Infobright, which are based on Ingres and MySQL respectively.) But how they use PostgreSQL varies greatly.
Posted in EnterpriseDB and Postgres Plus, Greenplum, Open source RDBMS, PostgreSQL, Relational database management systems, Vertica Systems | 9 Comments »
March 6th, 2008 Curt Monash
As usual, Microsoft forgot to brief me, but Mary Jo Foley reports on Microsoft SQL Server Data Services. A look at the official site clarifies that this database-in-a-cloud offering uses “Microsoft SQL Server as a data storage node.” However, there seems to be a software layer on top of SQL Server providing scale-out and appropriate management.
In addition to the more-than-SQL-Server layer, there seems to be a less-than-SQL-Server aspect as well. In particular, Microsoft SQL Server Data Services boasts “Support for simple types: string, numeric, datetime, boolean.” XML is the “primary wire format,” and hints dropped about the schema philosophy sound XMLish too.
Interestingly, Foley reports that Microsoft plans to offer an on-premises version of Microsoft SQL Server Data Services as well.
Posted in Cloud computing, Microsoft and SQL*Server, Native XML | No Comments »
March 6th, 2008 Curt Monash
I previously wrote that EnterpriseDB-on-Elastra has very little enterprise traction, drawing most of its interest instead from online businesses or ISVs. Having used that as a starting point in a recent chat with EnterpriseDB marketing chief Derek Rodner, I can now add that overall:
- EnterpriseDB reports good traction with ISVs. In particular, those that resell Oracle would like a cheaper alternative. Sometimes, they can port their code with no rewriting at all.
- Online businesses of various kinds also are a significant fraction of the customer base.
- EnterpriseDB has some true large-enterprise customers — Derek rattled off some household names — but this isn’t yet the heart of its business.
- EnterpriseDB has an increasing business teleselling to SMBs.
Posted in EnterpriseDB and Postgres Plus, Mid-range DBMS, Portability, transparency, and plug-compatibility, Relational database management systems | No Comments »
March 4th, 2008 Curt Monash
Intelligent Enterprise has an article on Sybase IQ and columnar systems that leaves me shaking my head. E.g., it ends by saying Netezza has a columnar architecture (uh, no). It also quotes an IBM exec as saying only 10-20% of what matters in a data warehouse DBMS is performance (already an odd claim), and then has him saying columnar only provides a 10% performance gain (let’s be generous and hope that’s a misquote).
Also from the article — and this part seems more credible — is:
“Sybase IQ revenues were up 70% last year,” said Richard Pledereder, VP of engineering. … Sybase now claims 1,200 Sybase IQ customers. It runs large data warehouses powered by big, multiprocessor servers. Priced at $45,000 per CPU, those IQ customers now account for a significant share of Sybase’s revenues, although the company won’t break down revenues by market segment.
Posted in Analytics and analytic technologies, Columnar architectures, Data warehousing, Relational database management systems, Specific users, Sybase | 1 Comment »