May 8th, 2008 Curt Monash
Another TDWI conference approaches. Not coincidentally, I had another Vertica briefing. Primary subjects included some embargoed stuff, plus (at my instigation) outsourced data marts. But I also had the opportunity to follow up on a couple of points from February’s briefing, namely:
Vertica has about 35 paying customers. That doesn’t sound like a lot more than they had a quarter ago, but first quarters can be slow.
Vertica’s list price is $150K/terabyte of user data. That sounds very high versus the competition. On the other hand, if you do the math versus what they told me a few months ago — average initial selling price $250K or less, multi-terabyte sites — it’s obvious that discounting is rampant, so I wouldn’t actually assume that Vertica is a high-priced alternative.
Vertica does stress several reasons for thinking their TCO is competitive. First, with all that compression and performance, they think their hardware costs are very modest. Second, with the self-tuning, they think their DBA costs are modest too. Finally, they charge only for deployed data; the software that stores copies of data for development and test is free.
Posted in Analytics and analytic technologies, Columnar architectures, Data warehousing, Database compression, Vertica Systems | 4 Comments »
May 8th, 2008 Curt Monash
In which we bring you another instantiation of Monash’s First Law of Commercial Semantics: Bad jargon drives out good.
When Enterprise DB announced a partnership with Truviso for a “blade,” I naturally assumed they were using the term in a more-or-less standard way, and hence believed that it was more than a “Barney” press release.* Silly me. Rather than referring to something closely akin to “datablade,” EnterpriseDB’s “blade” program turns out to just to be a catchall set of partnerships.
*A “Barney” announcement is one whose entire content boils down to “I love you; you love me.”
According to EnterpriseDB CTO Bob Zurek, the main features of the “blade” program include:
Read the rest of this entry »
Posted in Data types, EnterpriseDB and Postgres Plus, Open source RDBMS, Portability, transparency, and plug-compatibility, PostgreSQL, Relational database management systems, Specialized data management in general | 3 Comments »
April 29th, 2008 Curt Monash
Truviso and EnterpriseDB announced today that there’s a Truviso “blade” for Postgres Plus. By email, EnterpriseDB Bob Zurek endorsed my tentative summary of what this means technically, namely:
-
There’s data being managed transactionally by EnterpriseDB.
-
Truviso’s DML has all along included ways to talk to a persistent Postgres data store.
-
If, in addition, one wants to do stream processing things on the same data, that’s now possible, using Truviso’s usual DML.
Read the rest of this entry »
Posted in Analytics and analytic technologies, Business intelligence, Complex event/stream processing (CEP), Data types, EnterpriseDB and Postgres Plus, Games and virtual worlds, Memory-centric data management, Open source RDBMS, PostgreSQL, Specialized data management in general, Truviso | 1 Comment »
April 21st, 2008 Curt Monash
After months of leaks, Teradata has unveiled its new lines of data warehouse appliances, raising the total number either from 1 to 3 (my view) or 0 to 2 (what you believe if you think Teradata wasn’t previously an appliance vendor). Most significant is the new Teradata 2500 series, meant to compete directly with the smaller data warehouse specialists. Highlights include:
-
An oddly precise estimated capacity of “6.12 terabytes”/node (user data). This estimate is based on 30% compression, which is low by industry standards, and surely explains part of the price umbrella the Teradata 2500 is offering other vendors.
-
$125K/TB of user data. Obviously, list pricing and actual pricing aren’t the same thing, and many vendors don’t even bother to disclose official price lists. But the Teradata 2500 seems more expensive than most smaller-vendor alternatives.
-
Scalability up to 24 nodes (>140 TB).
-
Full Teradata application-facing functionality. Some of Teradata’s rivals are still working on getting all of their certifications with tier-1 and tier-2 business intelligence tools. Teradata has a rich application ecosystem.
-
What will be controversial performance, until customer-benchmark trends clearly emerge.
The Teradata 2500 is coming out of the chute with two customers – a new-customer retailer buying a single cabinet (i.e., 6.12 TB), and an existing customer for whom fewer details seem available. So far as I can tell, the sales force has had the product since late January, although the first leaks I got incorrectly suggested the system would only scale to a limited number of nodes.
Other products in the announcement included:
-
The Teradata 5550, a routine annual upgrade to the Teradata 5500.
-
The Teradata 550. This is a low-end, single-server SMP box introduced 9 or so months ago, originally meant for application development and testing. But some customers have been using it for deployment, and Teradata is now officially acknowledging that. It only scales to 2-3 TB of user data.
The Teradata 2500’s performance should be below the Teradata 5550’s for three reasons:
The same considerations apply to a comparison between the Teradata 2500 and the older Teradata 5000, but in that case they’re offset by a year of Moore’s Law benefit.
Read the rest of this entry »
Posted in Analytics and analytic technologies, Data warehouse appliances, Data warehousing, Database compression, Relational database management systems, Teradata | 1 Comment »
April 18th, 2008 Curt Monash
I chatted with Raj Cherabuddi and others on the Kickfire (formerly C2) team for over an hour on Monday, and now have a better sense of their story. There are some very basic questions I still don’t have answers to; I’ll fill those in when I can.
Highlights of what I have and haven’t figured out so far include:
-
Kickfire’s technology has two main parts: A SQL co-processor chip and a MySQL storage engine.
-
Kickfire makes a Type 0 appliance. If I understood correctly, it contains the chip, a couple of standard CPU cores, and 64 gigs of RAM. Or else it contains just the chip, and is meant to be hooked up to a 2U box with 64 gigs of RAM. I’m confused.
-
The Kickfire box can handle up to 3 terabytes of user data. The disk required for that is 4-5 terabytes without redundancy, 2X with. Based on that formulation and other clues, I’m guessing Kickfire — unlike other appliance vendors — doesn’t build in storage itself.
-
I don’t know whether the Kickfire chip is true custom silicon or an FPGA emulation.
-
The essential idea of the chip is dataflow programming for SQL, with pipelining between operations. This eliminates the overhead of registers and context switching. I don’t know what the trade-offs are, if any.
-
Kickfire’s database software is columnar, operating on compressed data even in RAM. In that, Kickfire’s story is most similar to Vertica’s, although I’m guessing Exasol may do something similar as well. Like Vertica, Kickfire uses multiple compression methods (they’re reluctant to give detail, but agreed it would be fair to say they use both something like dictionary/token and something like delta compression).
-
Kickfire’s software is ACID-compliant. You can do incremental loads or trickle feeds. Bulk load speed is 100 Gb/hour. Kickfire’s solution for the traditional problem of updating column stores is called “snapshots.” Without giving details, they position that as similar to the Vertica solution.
-
Like other MySQL storage engines, Kickfire inherits whatever data connectivity, stored procedure capabilities, user-defined functions ability, etc. that MySQL has.
-
Kickfire has no paying customers, but does have a slide showing many logos of “prospects and beta customers.”
-
Kickfire has no MPP capabilities at this time, but says adding those is “on the roadmap” and will be “easy.”
-
Kickfire submitted a 100 Gb TPC-H result, in which it beat the previous leaders — Exasol, ParAccel, and Microsoft – on price-performance, and lagged only Exasol and ParAccel on absolute performance. Kickfire is extremely proud of this. Indeed, I don’t recall another vendor ascribing that much weight to them in the entire history of TPCs.* Kickfire seems unfazed by the fact that its result is for a system listed with a ship date 6 months in the future (I’m guessing that’s the latest the TPC will allow), while the other results are for systems available today.
*Somebody – perhaps adman extraordinaire Rick Bennett? — may want to check my memory on this, but I think Oracle’s famed “Gentlemen, start your snails” ad in the early 1990s was about PC World tests, not TPCs. Oracle also had an ad about WW1-style planes nosediving, but I don’t think those referenced TPCs either.
Posted in Analytics and analytic technologies, Columnar architectures, Data warehouse appliances, Data warehousing, Database compression, Database theory and practice, Kickfire, Open source RDBMS, Relational database management systems | 3 Comments »
April 13th, 2008 Curt Monash
I just put up a long post about a small development-stage company, ScaleDB. The punchline is that ScaleDB has a data access method — an extension of Patricia tries — that gives referential integrity and updatable views for free.
People who think current “relational” DBMS aren’t relational enough often suggest that’s the kind of foundation DBMS should have. And unlike Required Technologies’ TransRelational (TM) shtick, ScaleDB’s really is an OLTP-oriented approach.
Please subscribe to our feed!
Posted in Database theory and practice, MySQL, Relational database management systems, TransRelational | No Comments »
April 10th, 2008 Curt Monash
On a recent webcast, I presented an 11-node data management software taxonomy, updating a post commenting on Mike Stonebraker’s. It goes:
1. High-end OLTP/general-purpose DBMS
2. Mid-range OLTP/general-purpose DBMS
3. Row-based analytic RDBMS
4. Column- or array-based analytic RDBMS
5. Text search engines
6. XML and OO DBMS (but these may merge with search)
7. RDF and other graphical DBMS (but these may merge with relational)
8. Event/stream processing engines (aka CEP)
9. Embedded DBMS for devices
10. Sub-DBMS file managers (e.g. MapReduce/Hadoop)
11. Science DBMS
Obviously, this is a work in progress. In particular, while there’s clearly more than one kind of analytic DBMS, partitioning them into categories is not easy.
Please subscribe to our feed!
Posted in Database diversity | 3 Comments »
April 2nd, 2008 Curt Monash
Once or twice a year, EnterpriseDB sponsors a webcast for me. The last two were super well-attended. And most people stayed to the end, which is generally an encouraging sign!
The emphasis this time is on alternatives to the market-leading DBMS. I’ll highlight the advantages of both data warehousing specialists and general-purpose mid-range DBMS (naturally focusing on the latter, given who the sponsor is). The provocative title is taken from a January, 2008 post — What leading DBMS vendors don’t want you to realize. If you read every word of this blog, there probably won’t be much new for you.
But I’d love to have you listen in and perhaps ask a question anyway!
You can register on EnterpriseDB’s webcast page, which also has an archived webcast I did for them in October, 2007.
Posted in Database diversity, EnterpriseDB and Postgres Plus, Mid-range DBMS | No Comments »
March 28th, 2008 Curt Monash
Simon Sabin makes an interesting point: If you can have 30,000 columns in a table without sparsity management blowing up, you can handle entities with lots of different kinds of attributes. (And in SQL Server you can now do just that.) The example he uses is products — different products can have different sets of possible colors, different kinds of sizes, and so on. An example I’ve used in the past is marketing information — different prospects can reveal different kinds of information, which may have been gathered via non-comparable marketing programs.
I’ve suggested this kind of variability as a reason to actually go XML — you’re constantly adding not just new information, but new kinds of information, so your fixed schema is never up to date. But I haven’t detected many actual application designers who agree with me …
Please subscribe to our feed!
Posted in Database theory and practice, MySQL, Native XML | 2 Comments »
March 4th, 2008 Curt Monash
Intelligent Enterprise has an article on Sybase IQ and columnar systems that leaves me shaking my head. E.g., it ends by saying Netezza has a columnar architecture (uh, no). It also quotes an IBM exec as saying only 10-20% of what matters in a data warehouse DBMS is performance (already an odd claim), and then has him saying columnar only provides a 10% performance gain (let’s be generous and hope that’s a misquote).
Also from the article — and this part seems more credible — is:
“Sybase IQ revenues were up 70% last year,” said Richard Pledereder, VP of engineering. … Sybase now claims 1,200 Sybase IQ customers. It runs large data warehouses powered by big, multiprocessor servers. Priced at $45,000 per CPU, those IQ customers now account for a significant share of Sybase’s revenues, although the company won’t break down revenues by market segment.
Read the rest of this entry »
Posted in Analytics and analytic technologies, Columnar architectures, Data warehousing, Relational database management systems, Specific users, Sybase | 1 Comment »
February 20th, 2008 Curt Monash
Billy Newport of IBM sees a lot of similarities between his app-server-based product ObjectGrid and H-Store. In both cases, constrained tree schemas are assumed, and OLTP performance goodness ensues. A couple of points I noted on a quick skim through his blog:
- He calls out RAM consumption as a challenge for this kind of architecture.
- He points out that it’s a big advantage to have data called and used in the same address space.
Being based in RAM is obviously a huge part of the H-Store scheme. But so is having transaction execution be close to the database.
IBM now has both ObjectGrid and a memory-centric DBMS (solidDB) that they’ve been using as a front end for DBMS. Integration of the two could be pretty interesting.
Please sign up for our feed!
Posted in Cache, Database theory and practice, H-Store, IBM and DB2, Memory-centric data management, OLTP database management, Relational database management systems, solidDB | No Comments »
February 19th, 2008 Curt Monash
I wrote yesterday about the H-Store project, the latest from the team of researchers who also brought us C-Store and its commercialization Vertica. H-Store is designed to drastically improve efficiency in OLTP database processing, in two ways. First, it puts everything in RAM. Second, it tries to gain an additional order of magnitude on in-memory performance versus today’s DBMS designs by, for example, taking a very different approach to ensuring ACID compliance.
Today I had the chance to talk with two more of the H-Store researchers, Sam Madden and Daniel Abadi.
Read the rest of this entry »
Posted in Database diversity, H-Store, Memory-centric data management, OLTP database management | 1 Comment »
February 18th, 2008 Curt Monash
I recently caught up with ParAccel’s CTO Barry Zane and Marketing VP Kim Stanick for a long technical discussion, which they have graciously continued by email. It would be impolitic in the extreme to comment on what led up to that. Let’s just note that many things I’ve previously written about ParAccel are now inoperative, and go straight to the highlights.
Read the rest of this entry »
Posted in Columnar architectures, Data warehousing, Microsoft and SQL*Server, ParAccel, Portability, transparency, and plug-compatibility | 4 Comments »
February 18th, 2008 Curt Monash
Last week, Dan Weinreb tipped me off to something very cool: Mike Stonebraker and a group of MIT/Brown/Yale colleagues are calling for a complete rewrite of OLTP DBMS. And they have a plan for how to do it, called H-Store, as per a paper and an associated slide presentation.
Read the rest of this entry »
Posted in Database diversity, Database theory and practice, H-Store, Memory-centric data management, Michael Stonebraker, OLTP database management | 28 Comments »
February 16th, 2008 Curt Monash
In a response to my recent five-part series on DBMS diversity, Mike Stonebraker has proposed his own taxonomy of data management technologies over on Vertica’s Database Column blog.
- OLTP DBMSs focused on fast, reliable transaction processing
- Analytic/Data Warehouse DBMSs focused on efficient load and ad-hoc query performance
-
Science DBMSs — after all MatLab does not scale to disk-sized arrays
- RDF stores focused on efficiently storing semi-structured data in this format
-
XML stores focused on semi-structured data in this format
- Search engines — the big players all use proprietary engines in this area
- Stream Processing Engines focused on real-time StreamSQL
- “Lean and Mean,” less-than-a-database engines focused on doing a small number of things very well (embedded databases are probably in this category)
- MapReduce and Hadoop — after all Google has enough “throw weight” to define a category
He goes on to say that each will be architected differently, except that — as he already convinced me back in July — RDF will be well-managed by specialty data warehouse DBMS. Read the rest of this entry »
Posted in Data types, Database diversity, Database theory and practice, Michael Stonebraker, Mid-range DBMS, OLTP database management, RDF and graphs, Relational database management systems | No Comments »
February 15th, 2008 Curt Monash
This is the fifth of a five-part series on database management system choices. For the first post in the series, please click here.
Relational database management systems have three essential elements:
- Rows and columns. Theoretically, rows and columns may be inessential to the relational model. But in reality, they are built into the design of every real-world relational product. If you don’t have rows and columns, you’re not using the product to do what it was well-designed for.
- Predicate logic. Theoretically, everything can be fitted into a predicate Procrustean bed. But if you’re looking for relevancy rankings on a text search, binary logic is a highly convoluted way to get them.
- Fixed schemas. Database theorists commonly assume that databases have fixed schemas. If this means that 90%+ of all information is null or missing, they have elegant ways of dealing with that. Even so, as computing gets ever more concerned with individuals — each with his/her/its unique “profile(s)” — fixed schemas get ever harder to maintain.
If any of these three elements is missing or inappropriate, then a traditional relational database management system may not be the best choice.
Read the rest of this entry »
Posted in Data types, Database diversity, Database theory and practice | 1 Comment »
February 15th, 2008 Curt Monash
This is the fourth of a five-part series on database management system choices. For the first post in the series, please click here.
The other threat to the high-end relational DBMS vendors aims squarely at the heart of their business. It’s the mid-range relational database management systems, which are doing an ever-larger fraction of what their high-end cousins can. That said, different products do different things well. So if you’re not blindly paying up for the security of an all-things-to-all-people high-end DBMS, there are a number of factors you might want to consider.
Read the rest of this entry »
Posted in Database diversity, Database theory and practice, Mid-range DBMS, OLTP database management, Relational database management systems | 2 Comments »
February 15th, 2008 Curt Monash
This is the third of a five-part series on database management system choices. For the first post in the series, please click here.
High-end OLTP relational database management system vendors try to offer one-stop shopping for almost all data management needs. But as I noted in my prior post, their product category is facing two major competitive threats. One comes from specialty data warehouse database management system products. I’ve covered those extensively in this blog, with key takeaways including:
- Specialty data warehouse products offer huge cost advantages versus less targeted DBMS. This applies to purchase/maintenance and administrative costs alike. And it’s true even when the general-purposed DBMS boast data warehousing features such as star indexes, bitmap indexes, or sophisticated optimizers.
- The larger the database, the bigger the difference. It’s almost inconceivable to use Oracle for a 100+ terabyte data warehouse. But if you only have 5 terabytes, Oracle is a perfectly viable – albeit annoying and costly – alternative.
- Most specialty data warehouse products have a shared-nothing architecture. Smaller parts are cheaper per unit of capacity. Hence shared nothing/grid architectures are inherently cheaper, at least in theory. In data warehousing, that theoretical possibility has long been made practical.
- Specialty data warehouse products with row-based architectures are commonly sold in appliance formats. In particular, this is true of Teradata, Netezza, DATAllegro, and Greenplum. One reason is that they’re optimized to stream data off of disk fairly sequentially, as opposed to relying on random seeks.
- Specialty data warehouse products with columnar architectures are commonly available in software-only formats. Even so, Vertica and ParAccel also boast appliance deals, with HP and Sun respectively.
- There is tremendous technical diversity and differentiation in the specialty data warehouse system market.
Let me expand on that last point. Different features may or may not be important to you, depending on whether your precise application needs include:
Read the rest of this entry »
Posted in Analytics and analytic technologies, Data warehouse appliances, Data warehousing, Database diversity, Database theory and practice, Relational database management systems | 17 Comments »
February 15th, 2008 Curt Monash
This is the second of a five-part series on database management system choices. For the first post in the series, please click here.
For the most part, relational database management systems divide into four major classes:
- High-end OLTP (OnLine Transaction Processing) relational DBMS. Oracle is the flagship for this category, followed by DB2.
- Specialty data warehouse DBMS. Teradata is the leader here, followed by Netezza, DATAllegro, ParAccel, Vertica, Infobright, Greenplum, Kognitio, Sybase IQ, and a host of others.
- Mid-range relational database management systems. Most of the contenders here fall into one or more of three categories: Open-source-based relational DBMS (MySQL, PostgreSQL, EnterpriseDB); reseller-focused relational DBMS (Progress OpenEdge, Pervasive PSQL); or crippled “editions” of high-end systems. Microsoft SQL Server was once a clear mid-range system, but now is better classified as high-end OLTP.
- Embedded relational database management systems. The leader of this category is Sybase’s SQL Anywhere. Also significant are memory-centric products Oracle TimesTen and solidDB.
Read the rest of this entry »
Posted in Database diversity, Database theory and practice, OLTP database management, Relational database management systems | 8 Comments »
February 15th, 2008 Curt Monash
This is the first in a 5-part series of posts on data management product choices. By pre-arrangement, Mike Stonebraker is responding on The Database Column, starting with his own taxonomy of DBMS types.
In the 1990s, most database management experts believed that a single general-purpose DBMS could meet substantially all needs. If you just kept adding in enough datatypes and data access methods (e.g., specialized indexes), your DBMS could eventually do a good job of meeting almost any requirement. And so, from the late 1990s into the beginning of this decade, it seemed that technology was supporting business trends, and the DBMS industry was inexorably consolidating. There was an oligopoly of high-end vendors, who sold increasingly similar super-sophisticated database management systems. Nothing else in database management seemed to matter.
Well, we were wrong. The big thing we overlooked is that database optimizations go down to the level of actual storage.
Read the rest of this entry »
Posted in Database diversity, Database theory and practice | 9 Comments »
February 8th, 2008 Curt Monash
Please do not rely on the parts of the post below that are about ParAccel. See our February 18 post about ParAccel instead.
I’ve already posted about a chat I had with Mike Stonebraker regarding Vertica yesterday. I naturally raised the subject of load speed, unaware that Mike’s colleague Stan Zlodnik had posted at length about load speed the day before. Given that post, it seems timely to go into a bit more detail, and in particular to address three questions:
- Can columnar DBMS do operational BI?
- Can columnar DBMS do ELT (Extract-Load-Transform, as opposed to ETL)?
- Are columnar DBMS’ load speeds a problem other than in issues #1 and #2?
Read the rest of this entry »
Posted in Analytics and analytic technologies, Business intelligence, Columnar architectures, Data warehousing, Database theory and practice, EII, ETL, and/or EAI, Michael Stonebraker, ParAccel, Sybase, Vertica Systems | No Comments »
February 1st, 2008 Curt Monash
I’ve run into a research/alpha/whatever project called CouchDB a couple of times now. It’s yet another “Who needs relational databases? Who needs schemas?” kind of idea. Rather, CouchDB is for taking random documents and banging them into databases, then calculating views on the fly as needed. It’s REST-friendly. Lucene and a web server are built in.
Damien Katz seems to be the driving force behind CouchDB, and his discussion of document-oriented development seems to be a good starting point. Read the rest of this entry »
Posted in CouchDB, Database diversity, Database theory and practice, Native XML | 3 Comments »
January 24th, 2008 Curt Monash
Mark Chu-Carroll has weighed in with a passionate defense of MapReduce. I only see one thing he got wrong, which was to overlook the great shared-nothing parallelism of today’s data warehouse appliances and specialty data warehouse DBMS. But that doesn’t detract from his overall point, which is that MapReduce is designed to help with parallel computing in general, not database querying in particular.
He also has the best version I know of an old observation, namely:
… [relational database] people have found the most beautiful, wonderful, perfect hammer in the whole world. It’s perfectly balanced - not too heavy, not too light, and swings just right to pound in a nail just right every time. The grip is custom-made, fitted to the shape of the owners hand, so that they can use it all day without getting any blisters. It’s also beautifully decorated - encrusted with gemstones and gold filigree - but only in places that won’t detract from how well it works as a hammer. It really is the greatest hammer ever. Relational database guys love their hammer. It’s just such a wonderful tool! And when they make something with it, it really comes out great. In fact, they like it so much that they think it’s the only tool they need. If you give them a screw, they’ll just pound it in like it’s a nail. And when you point out to them that dammit, it’s a screw, not a nail, they’ll say “I know that. But you can’t expect me to use a crappy little screwdriver when I have a magnificent hammer like this!”
Please sign up for our feed!
Posted in Database diversity, Google, BigTable, and MapReduce | No Comments »
January 18th, 2008 Curt Monash
Anybody who cites — with approval — both Fabian Pascal and Joe Celko can’t be all bad. “Why Programmers Don’t Like Relational Databases” is a bit of polemic, but on the whole it’s a good reminder of why relational-bashing often is overdone.
Personally, I think the applications for which traditional schema-heavy relational/SQL programming make sense are less interesting that those for which it doesn’t — but the world is indeed chock full of less interesting tasks.
Please sign up for our feed!
Posted in Database theory and practice, Relational database management systems | No Comments »