Theory and architecture
Analysis of design choices in databases and database management systems. Related subjects include:
- Any subcategory
- Database diversity
- Explicit support for specific data types
- (in Text Technologies) Text search
Database management system architecture implications of an eventual move to solid-state memory
I’ve pointed out in the past that solid-state/Flash memory could be a good alternative to hard disks in PCs and enterprise systems alike. Well, when that happy day arrives, what will be some of the implications for database management software architecture?
- Compression will be even more important. Cost per terabyte will spike for whatever storage is moved from disk to solid-state.
- The sequential-rather-than-random reading strategy of data warehouse appliance makers may become less relevant. The one way to get rid of the disk-speed bottleneck is to get rid of disks.
- DBMS will need to write data as rarely as possible. Solid-state memory tends to wear out if you keep writing over it. Assuming this problem gets better over time (if it doesn’t, this whole discussion is moot) but isn’t totally solved, architectures which have fewer writes are on the whole better.
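To make the last point concrete, here is a minimal sketch of one way an architecture can cut physical writes: absorb repeated updates to the same page in memory and flush each dirty page only once. The class name, the page-granularity model, and the workload are all assumptions made for illustration, not a description of any particular DBMS. The sketch also ignores durability; a real system would still need some kind of log, which itself costs writes.

```python
class CoalescingWriteBuffer:
    """Hypothetical buffer: absorbs repeated writes to a page, flushes each dirty page once."""

    def __init__(self):
        self.dirty = {}            # page_id -> latest contents not yet flushed
        self.storage = {}          # stands in for the solid-state device
        self.logical_writes = 0    # writes requested by the DBMS
        self.physical_writes = 0   # writes that actually reach the device

    def write(self, page_id, data):
        self.logical_writes += 1
        self.dirty[page_id] = data  # overwrite in memory; no device write yet

    def read(self, page_id):
        # Serve the newest version, whether or not it has been flushed.
        if page_id in self.dirty:
            return self.dirty[page_id]
        return self.storage.get(page_id)

    def flush(self):
        # One device write per dirty page, however many times it was updated.
        for page_id, data in self.dirty.items():
            self.storage[page_id] = data
            self.physical_writes += 1
        self.dirty.clear()


buf = CoalescingWriteBuffer()
for i in range(1000):
    buf.write(page_id=i % 10, data=f"version {i}")  # 1000 updates spread over 10 hot pages
buf.flush()
print(buf.logical_writes, buf.physical_writes)  # prints: 1000 10
```

The longer the buffer can safely hold dirty pages, the fewer device writes each page sees, which is the trade-off a wear-sensitive architecture would have to tune.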
Categories: Data warehouse appliances, Data warehousing, Database compression, Netezza, Solid-state memory, Theory and architecture
The Netezza Developer Network
Netezza has officially announced the Netezza Developer Network. Associated with that is a set of technical capabilities, which basically boil down to programming user-defined functions or other capabilities straight onto the Netezza nodes (aka SPUs). And this is specifically onto the FPGAs, not the PowerPC processors. In C. Technically, I think what this boils down to is: Read more
Pervasive Summit PSQL v10
Pervasive Software has a long history – 25 years, in fact, as they’re emphasizing in some current marketing. Ownership and company name have changed a few times, as the company went from being an independent startup to being owned by Novell to being independent again. The original product, and still the cash cow, was a linked-list DBMS called Btrieve, eventually renamed Pervasive PSQL as it gained more and more relational functionality.
Pervasive Summit PSQL v10 has just been rolled out, and I wrote a nice little white paper to commemorate the event, describing some of the main advances over v9, primarily for the benefit of current Pervasive PSQL developers. In one major advance, Pervasive made the SQL functionality much stronger. In particular, you now can have a regular SQL data dictionary, so that the database can be used for other purposes – BI, additional apps, whatever. Apparently, that wasn’t possible before, although it had been possible in yet earlier releases. Pervasive also added view-based security permissions, which is obviously a Very Good Thing.
There also are some big performance boosts. Read more
Some pushback from DATAllegro against the columnar argument
I was chatting with Stuart Frost this evening (DATAllegro’s CEO). As usual, I grilled him about customer counts; as usual, he was evasive, but expressed general ebullience about the pace of business; also as usual, he was charming and helpful on other subjects.
In particular, we talked about the Vertica story, and he offered some interesting pushback. Part was blindingly obvious: Vertica’s not in the marketplace yet; when it is, the product won’t be mature; and so on. Part was the also obvious “we can do most of that ourselves” line of argument, some of which I’ve summarized in a comment here. But he made two other interesting points as well. Read more
Categories: Columnar database management, Data warehouse appliances, Data warehousing, DATAllegro, Theory and architecture, Vertica Systems | 1 Comment |
The core of the Vertica story still seems to be compression
Back in March, I suggested that compression was a central and compelling aspect of Vertica’s story. Well, in their new blog, the Vertica guys now strongly reinforce that impression.
I recommend those two Database Column posts (by Sam Madden) highly. I’ve rarely seen such a clear, detailed presentation of a company’s technical argument. My own thoughts on the subject boil down to:
- In principle, all the technology (and hence all the technological advantages) they’re talking about could be turned into features of one of the indexing options of a row-oriented RDBMS. But in practice, there’s no indication that this will happen any time soon.
- Release 1 of the Vertica product will surely have many rough edges.
- Some startups are surprisingly ignorant of the issues involved in building a successful, industrial-strength DBMS. But a company that has both Mike Stonebraker and Jerry Held seriously involved has a big advantage. They may make other kinds of errors, but they won’t make many ignorant ones.
Categories: Columnar database management, Data warehousing, Database compression, Michael Stonebraker, Theory and architecture, Vertica Systems | 5 Comments |
Three bold assertions by Mike Stonebraker
In the first “meat” — i.e., other than housekeeping — post on the new Database Column blog, Mike Stonebraker makes three core claims:
1. Different DBMS should be used for different purposes. I am in violent agreement with that point, which is indeed a major theme of this blog.
2. Vertica’s software is 50X faster than anything non-columnar and 10X faster than anything columnar. Now, some of these stats surely come from the syndrome of comparing the future release of your product, as tuned by the world’s greatest experts on it who also hope to get rich on their stock options in your company, vs. some well-established production release of your competitors’ products, tuned to an unknown level of excellence,* with the whole thing running test queries that you, in your impartial wisdom, deem representative of user needs. Or something like that … Read more
Categories: Benchmarks and POCs, Columnar database management, Data warehousing, Database diversity, Michael Stonebraker, OLTP, Theory and architecture, TransRelational | 3 Comments |
The Vertica guys have their own blog now
I’ve written a considerable amount about Vertica and/or the opinions of Mike Stonebraker. Now the Vertica guys have their own blog, which they pledge will not just be a rehash of Vertica marketing pitches — notwithstanding the Vertica-related wordplay in the blog’s name.*
*Those guys are good at wordplay.
Categories: Columnar database management, Humor, Vertica Systems | 1 Comment |
Big stuff coming from DATAllegro
In the literal sense, that is. While the details on what I wrote about this a few weeks ago* are still embargoed, I’m at liberty to drop a few more hints.
*Please also see DATAllegro CEO Stuart Frost’s two comments added today to that thread.
DATAllegro systems these days basically consist of Dell servers talking to EMC disk arrays, with Cisco InfiniBand to provide fast inter-server communication without significant CPU load. Well, if you decrease the number of Dell servers per EMC box, and increase the number of disks per EMC box, you can slash your per-terabyte price (possibly at the cost of lowering performance).
Read more
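The per-terabyte arithmetic behind that trade-off can be sketched in a few lines. Every number below (component prices, disk capacity, the two configurations) is a made-up placeholder in arbitrary currency units, not DATAllegro, Dell, or EMC pricing; the point is only the shape of the calculation.

```python
def price_per_terabyte(servers, arrays, disks_per_array, tb_per_disk,
                       server_price, array_price, disk_price):
    """Total hardware price divided by raw capacity (all inputs hypothetical)."""
    total_price = (servers * server_price
                   + arrays * (array_price + disks_per_array * disk_price))
    total_tb = arrays * disks_per_array * tb_per_disk
    return total_price / total_tb


# Placeholder component prices and capacity, chosen only for illustration.
PRICES = dict(tb_per_disk=0.5, server_price=5000, array_price=20000, disk_price=500)

# Configuration A: two servers per array, 12 disks per array.
config_a = price_per_terabyte(servers=2, arrays=1, disks_per_array=12, **PRICES)

# Configuration B: one server per array, 24 disks per array.
config_b = price_per_terabyte(servers=1, arrays=1, disks_per_array=24, **PRICES)

print(f"A: {config_a:.0f} per TB with {12 / 2:.0f} disks per server")
print(f"B: {config_b:.0f} per TB with {24 / 1:.0f} disks per server")
```

Under these assumed numbers, B costs roughly half as much per terabyte as A, but each server has four times as many disks to scan, which is where the possible performance cost comes from.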
Categories: Data warehouse appliances, Data warehousing, Database compression, DATAllegro
DATAllegro heads for the high end
DATAllegro CEO Stuart Frost called in for a prebriefing/feedback/consulting session. (I love advising my DBMS vendor clients on how to beat each other’s brains in. This was even more fun in the 1990s, when combat was generally more aggressive. Those were also the days when somebody would change jobs to an arch-rival and immediately explain how everything they’d told me before was utterly false …)
While I had Stuart on the phone, I did manage to extract some stuff I’m at liberty to use immediately. Here are the highlights: Read more
Categories: Data warehouse appliances, Data warehousing, Database compression, DATAllegro, Greenplum, Netezza, Teradata | 4 Comments |
Fast RDF in specialty relational databases
When Mike Stonebraker and I discussed RDF yesterday, he quickly turned to suggesting fast ways of implementing it over an RDBMS. Then, quite characteristically, he sent over a paper that allegedly covered them, but actually was about closely related schemes instead. 🙂 Edit: The paper has a new, stable URL. Hat tip to Daniel Abadi.
All minor confusion aside, here’s the story. At its core, an RDF database is one huge three-column table storing subject-property-object triples. In the naive implementation, you then have to join this table to itself repeatedly. Materialized views are a good start, but they only take you so far. Read more
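For readers who haven't seen the naive scheme, here is a small sketch of it using Python's built-in sqlite3 module. The table name, column names, and sample triples are invented for illustration and are not taken from the paper. The query asks for the title and author name of every book, and it needs one copy of the triples table per triple pattern.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The whole database is one three-column table (names are hypothetical).
conn.execute("CREATE TABLE triples (subject TEXT, property TEXT, object TEXT)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("book1", "type", "Book"),
        ("book1", "title", "Databases 101"),
        ("book1", "author", "person1"),
        ("person1", "name", "Alice"),
        ("book2", "type", "Book"),
        ("book2", "title", "Storage Notes"),
        ("book2", "author", "person2"),
        ("person2", "name", "Bob"),
    ],
)

# Four triple patterns, so the table is joined to itself four ways.
rows = conn.execute(
    """
    SELECT t_title.object, t_name.object
    FROM triples AS t_type
    JOIN triples AS t_title  ON t_title.subject  = t_type.subject
    JOIN triples AS t_author ON t_author.subject = t_type.subject
    JOIN triples AS t_name   ON t_name.subject   = t_author.object
    WHERE t_type.property   = 'type' AND t_type.object = 'Book'
      AND t_title.property  = 'title'
      AND t_author.property = 'author'
      AND t_name.property   = 'name'
    ORDER BY t_title.object
    """
).fetchall()

print(rows)  # [('Databases 101', 'Alice'), ('Storage Notes', 'Bob')]
```

Each extra property in a query adds another self-join against the same huge table, which is why the naive layout gets slow and why alternative physical designs are worth discussing.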