Data warehousing
Analysis of issues in data warehousing, with extensive coverage of database management systems and data warehouse appliances that are optimized to query large volumes of data. Related subjects include:
Kickfire update
A Kickfire competitor tipped me off that he got 3 Kickfire salesmen’s resumes in 24 hours. I ran this by Kickfire CEO Bruce Armstrong, who confirmed that Kickfire has had a layoff, but gave me no further details.
Bruce also told me that Kickfire is now up to 10 paying customers, and that there are repeat deals.
Categories: Data warehouse appliances, Data warehousing, Kickfire, Market share and customer counts | 3 Comments |
Ingres VectorWise technical highlights
After working through problems w/ travel, cell phones, and so on, Peter Boncz of VectorWise finally caught up with me for a regrettably brief call. Peter gave me the strong impression that what I’d written in the past about VectorWise had been and remained accurate, so I focused on filling in the gaps. Highlights included: Read more
Categories: Actian and Ingres, Analytic technologies, Benchmarks and POCs, Columnar database management, Data warehousing, Database compression, Open source, VectorWise | 2 Comments |
Algebraix
I talked Friday with Chris Piedemonte and Gary Sherman, respectively the Cofounder/CTO and Chief Mathematician of Algebraix, who hooked up together for this project back in 2003 or 2004. (Algebraix is the company formerly known as XSPRADA.) Algebraix makes an analytic DBMS, somewhat based on the ideas of extended set theory, that runs on SMP (Symmetric MultiProcessing) boxes. Like all analytic DBMS vendors, Algebraix has on some occasions run some queries orders of magnitude faster than they ran on the systems users were looking to replace.
Algebraix’s secret sauce is that the DBMS keeps reorganizing and recopying the data on disk, to optimize performance in response to expected query patterns (automatically inferred from queries it’s seen so far). This sounds a lot like the Infobright story, with some of the more obvious differences being: Read more
Categories: Algebraix, Data warehousing, Database compression, Infobright, Theory and architecture | 3 Comments |
Various quick notes
As you might imagine, there are a lot of blog posts I’d like to write I never seem to get around to, or things I’d like to comment on that I don’t want to bother ever writing a full post about. In some cases I just tweet a comment or link and leave it at that.
And it’s not going to get any better. Next week = the oft-postponed elder care trip. Then I’m back for a short week. Then I’m off on my quarterly visit to the SF area. Soon thereafter I’ve have a lot to do in connection with Enzee Universe. And at that point another month will have gone by.
Anyhow: Read more
Categories: Analytic technologies, Business intelligence, Data warehousing, Exadata, GIS and geospatial, Google, IBM and DB2, Netezza, Oracle, Parallelization, SAP AG, SAS Institute | 3 Comments |
More on Sybase IQ, including Version 15.2
Back in March, Sybase was kind enough to give me permission to post a slide deck about Sybase IQ. Well, I’m finally getting around to doing so. Highlights include but are not limited to:
- Slide 2 has some market success figures and so on. (>3100 copies at >1800 users, >200 sales last year)
- Slides 6-11 give more detail on Sybase’s indexing and data access methods than I put into my recent technical basics of Sybase IQ post.
- Slide 16 reminds us that in-database data mining is quite competitive with what SAS has actually delivered with its DBMS partners, even if it doesn’t have the nice architectural approach of Aster or Netezza. (I.e., Sybase IQ’s more-than-SQL advanced analytics story relies on C++ UDFs — User Defined Functions — running in-process with the DBMS.) In particular, there’s a data mining/predictive analytics library — modeling and scoring both — licensed from a small third party.
- A number of the other later slides also have quite a bit of technical crunch. (More on some of those points below too.)
Sybase IQ may have a bit of a funky architecture (e.g., no MPP), but the age of the product and the substantial revenue it generates have allowed Sybase to put in a bunch of product features that newer vendors haven’t gotten around to yet.
More recently, Sybase volunteered permission for me to preannounce Sybase IQ Version 15.2 by a few days (it’s scheduled to come out this week). Read more
Notes on SciDB and scientific data management
I firmly believe that, as a community, we should look for ways to support scientific data management and related analytics. That’s why, for example, I went to XLDB3 in Lyon, France at my own expense. Eight months ago, I wrote about issues in scientific data management. Here’s some of what has transpired since then.
The main new activity I know of has been in the open source SciDB project. Read more
Categories: Analytic technologies, Data warehousing, eBay, GIS and geospatial, Microsoft and SQL*Server, SciDB, Scientific research, Web analytics | 5 Comments |
Technical basics of Sybase IQ
The Sybase IQ folks had been rather slow about briefing me, at least with respect to crunch. They finally fixed that in February. Since then, I’ve been slow about posting based on those briefings. But what with Sybase being acquired by SAP, Sybase having an analyst meeting this week, and other reasons – well, this seems like a good time to post about Sybase IQ. 🙂
For starters, Sybase IQ is not just a bitmapped system, but it’s also not all that closely akin to C-Store or Vertica. In particular,
- Sybase IQ stores data in columns – like, for example, Vertica.
- Sybase IQ relies on indexes to retrieve data – unlike, for example, Vertica, in which the column pretty much is the index.
- However, columns themselves can be used as indexes in the usual Vertica-like way.
- Most of Sybase IQ’s indexes are bitmaps, or a lot like bitmaps, ala’ the original IQ product.
- Some of Sybase IQ’s indexes are not at all like bitmaps, but more like B-trees.
- In general, Sybase recommends that you put multiple indexes on each column because — what the heck – each one of them is pretty small. (In particular, the bitmap-like indexes are highly compressible.) Together, indexes tend to take up <10% of Sybase IQ storage space.
Categories: Columnar database management, Data warehousing, Database compression, Sybase, Theory and architecture | 3 Comments |
SAP believes in database proliferation
For as long as we’ve had the concept of database management, there’s been a debate as to whether it is realistic for large enterprises to have a single Grand Unified Enterprise Storehouse Of All Information, or whether database proliferation actually makes sense. This argument has been particularly intense in the area of data warehouse/data marts. I’m generally on the side of data mart proliferation.
4 1/2 years ago, I noted that SAP believed strongly in database proliferation: Read more
Categories: Data warehousing, SAP AG, Theory and architecture | 3 Comments |
Quick reactions to SAP acquiring Sybase
SAP is acquiring Sybase. On the conference call SAP said Sybase would be run as a separate division of SAP (no surprise). Most of the focus was on Sybase’s mobile technology, which is forecast at >$400 million in 2010 revenues (which would be 30%ish of the total). My quick reactions include: Read more
8 not very technical problems with analytic technology
In a couple of talks, including last Thursday’s, I’ve rattled off a list of eight serious problems with analytic technology, all of them human or organizational much more than purely technical. At best, these problems stand in the way of analytic success, and at least one is a lot worse than that.
The bulleted list in my notes is:
-
Individual-human
-
Expense of expertise
-
Limited numeracy
-
-
Organizational
-
Limited budgets
-
Legacy systems
-
General inertia
-
-
Political
-
Obsolete systems
-
Clueless lawmakers
-
Obsolete legal framework
-
I shall explain. Read more