Data warehousing
Analysis of issues in data warehousing, with extensive coverage of database management systems and data warehouse appliances that are optimized to query large volumes of data.
Revisiting disk vibration as a data warehouse performance problem
Last April, I wrote about the problems disk vibration can cause for data warehouse performance. Possible performance hits exceeded 10X, wild as that sounds.
Now Slashdot and ZDNet have weighed in, although for the most part they suggest performance hits of only 50-100%. Read more
Categories: Data warehousing, Storage | 3 Comments |
Notes and cautions about new analytic technology
As previously noted, I headlined Aster’s Big Data Summit in Washington, DC last Thursday. More than others, that talk did reuse material I’d presented before. I promised the audience that when I got back I’d put up a blog post linking to supporting material for the talk.
Part of the time, I talked about things I’ve written about before. For example: Read more
Categories: Aster Data, Business intelligence, Data warehousing, Predictive modeling and advanced analytics, Presentations | 3 Comments |
Vertica update
Last month, Vertica’s CEO Ralph Breslauer quit,* and Vertica made it sound like there would be a new CEO late in April. And indeed, as of April 29, there was. He’s a guy I’ve never heard of before named Chris Lynch, apparently quite the sales machine builder. The most substance I’ve found is a pair of Mass High Tech articles — the latter exceedingly typo-ridden — to the general effect that:
- Vertica plans to build a massive, world-conquering sales force.
- If Vertica dips back into negative cash flow to do that and has to raise more venture capital, so be it.
- “Triple-digit” revenue growth is expected for this year.
I’ll be speaking in Washington, DC on May 6
My clients at Aster Data are putting on a sequence of conferences called “Big Data Summit(s)”, and wanted me to keynote one. I agreed to the one in Washington, DC, on May 6, on the condition that I would be allowed to start with the same liberty and privacy themes I started my New England Database Summit keynote with. Since I already knew Aster to be one of the multiple companies in this industry that is responsibly concerned about the liberty and privacy threats we’re all helping cause, I expected them to agree to that condition immediately, and indeed they did.
On a rough-draft basis, my talk concept is:
Implications of New Analytic Technology in four areas:
- Liberty & privacy
- Data acquisition & retention
- Data exploration
- Operationalized analytics
I haven’t done any work yet on the talk besides coming up with that snippet, and probably won’t until the week before I give it. Suggestions are welcome.
If anybody actually has a link to a clear discussion of legislative and regulatory data retention requirements, that would be cool. I know they’ve exploded, but I don’t have the details.
Categories: Analytic technologies, Archiving and information preservation, Aster Data, Data warehousing, Presentations, Surveillance and privacy | 1 Comment |
Greenplum et alia’s BigDataNews.com site
Greenplum recently started a website BigDataNews.com, and quickly signed up Aster Data as a co-sponsor. (Edit: As per a comment below, the decision to sign up additional sponsors was made by the site’s independent publisher.) It’s actually being run by Brett Sheppard, a former Gartner/DataQuest analyst who now gets involved in this kind of thing. (Brett and I may be working on another project soon, with Greenplum funding.)
The heart of the site is feeds* from a variety of high-profile blogs (DBMS2, Daniel Abadi’s, Joe Hellerstein’s, James Kobielus’, et al.), plus some additional posts written by Brett (primarily) or Greenplum folks. Highlights of Brett’s posts include:
- What I am told was an unauthorized revelation that Greenplum Chorus is built on CouchDB and Erlang.
- An impassioned defense of the integrity of Gartner’s analysis.
*At least in my case, that’s just a post title or snippet, plus a link back to the main post. The same goes for mapreduce.org, actually.
Categories: Analytic technologies, Data warehousing, Greenplum, NoSQL | 2 Comments |
Story of an analytic DBMS evaluation
One of our readers was kind enough to walk me through his analytic DBMS evaluation process. The story is:
- The X Company (XCo) has a <1 TB database.
- 100s of XCo’s customers log in at once to run reports. 50-200 concurrent queries is a good target number.
- XCo had been “suffering” with Oracle and wanted to upgrade.
- XCo didn’t have a lot of money to spend. Netezza pulled out of the sales cycle early due to budget (and this was recently enough that Netezza Skimmer could have been bid).
- Greenplum didn’t offer any references that approached the desired number of concurrent users.
- Ultimately the evaluation came down to Vertica and ParAccel.
- Vertica won.
Notes on the Vertica vs. ParAccel selection include: Read more
Categories: Analytic technologies, Benchmarks and POCs, Buying processes, Data warehousing, Greenplum, Netezza, Oracle, ParAccel, Vertica Systems | 7 Comments |
Greenplum Chorus and Greenplum 4.0
Greenplum is making two product announcements this morning. Greenplum 4.0 is a revision of the core Greenplum database technology. In addition, Greenplum is announcing Greenplum Chorus, which is the first product release instantiating last year’s EDC (Enterprise Data Cloud) vision statement and marketing campaign.
Greenplum 4.0 highlights and related observations include: Read more
Is the enterprise data warehouse a myth?
An enterprise data warehouse should:
- Manage data to high standards of accuracy, consistency, cleanliness, clarity, and security.
- Manage all the data in your organization.
Pick ONE. Read more
Categories: Data models and architecture, Data warehousing, Database diversity, Teradata, Theory and architecture | 8 Comments |
Examples of machine-generated data
Not long ago I pointed out that much future Big Data growth will be in the area of machine-generated data, examples of which include: Read more
Categories: Analytic technologies, Data warehousing, Games and virtual worlds, Investment research and trading, Log analysis, Oracle, Telecommunications, Web analytics | 28 Comments |
Notes on the evolution of OLTP database management systems
The past few years have seen a spate of startups in the analytic DBMS business. Netezza, Vertica, Greenplum, Aster Data and others are all reasonably prosperous, alongside older specialty product vendors Teradata and Sybase (the Sybase IQ part). OLTP (OnLine Transaction Processing) and general purpose DBMS startups, however, have not yet done as well, with such success as there has been (MySQL, InterSystems Caché, solidDB’s exit, etc.) generally accruing to products that originated in the 20th Century.
Nonetheless, OLTP/general-purpose data management startup activity has recently picked up, targeting what I see as some very real opportunities and needs. So as a jumping-off point for further writing, I thought it might be interesting to collect a few observations about the market in one place. These include:
- Big-brand OLTP/general-purpose DBMS have more “stickiness” than analytic DBMS.
- By number, most of an enterprise’s OLTP/general-purpose databases are low-volume and low-value.
- Most interesting new OLTP/general-purpose data management products are either MySQL-based or NoSQL.
- It’s not yet clear whether MySQL will prevail over MySQL forks, or vice versa, or whether they will coexist.
- The era of silicon-centric relational DBMS is coming.
- The emphasis on scale-out and reducing the cost of joins spans the NoSQL and SQL-based worlds.
- Users’ insistence on “free” could be a major problem for OLTP DBMS innovation.
I shall explain. Read more