ANALYTIC is the antonym of TRANSACTIONAL
In 1993, Ted Codd introduced the term OLAP (OnLine Analytic Processing) to describe data management that wasn’t optimized for OLTP (OnLine Transaction Processing). Later in the 1990s, Henry Morris of IDC introduced the term analytic applications to describe apps that weren’t transactional. Since then, no better word than “analytic” has emerged to cover the broad class of IT apps and technologies that aren’t focused on transactional processing.
In the latest incarnation, analytic appliances are coming to the fore. Read more
| Categories: Analytic technologies, Data warehouse appliances, Netezza, Vertica Systems | Leave a Comment |
Netezza is finally opening the kimono
I’ve bashed Netezza repeatedly for secrecy and obscurity about its technology and technical plans. Well, they’re getting a lot better. The latest post in a Netezza company blog, by marketing exec Phil Francisco, lays out their story clearly and concisely. And it’s backed up by a white paper that does more of the same. In particular, Page 11 of that white paper spells out possible future directions for enhancement, such as better compression, encryption, join filtering, and Netezza Developer Network stuff. Read more
The Netezza strategy for data shipping
I talked with Netezza today, and finally understand better why they don’t have node-to-node data shipping problems with only 1-gigabit (gigE) interconnects:
- Netezza boxes have lots of relatively small nodes, so all else being equal, each individual node has less communicating to do than, say, a DATAllegro node does.
- It’s not just just 1-gigabit. There’s a hierarchical communications architecture, and at one level in the hierarchy switches are talking to each other through 32 parallel 1-gigabit channels at a time.
| Categories: Data warehouse appliances, Netezza | Leave a Comment |
Just what does Oracle-compatibility mean?
Quite a bit of DBMS plug-compatibility is being claimed these days. Lewis Cunningham’s post on a few new EnterpriseDB features illustrates just how picky compatibility features can get. One can run Oracle code but not get around to handling comments properly? Sheesh.
| Categories: Emulation, transparency, portability, EnterpriseDB and Postgres Plus, Oracle | Leave a Comment |
A nice EnterpriseDB replacement of MySQL
I’m going to praise EnterpriseDB’s marketing communications twice in two blog posts, because I really liked some of the crunch they put into a press release announcing a MySQL replacement at FortiusOne. To wit (emphasis mine):
The PostGIS geospatial extensions to PostgreSQL played a key role in FortiusOne’s selection of EnterpriseDB Advanced Server, a PostgreSQL-based solution, and dramatically improved performance. FortiusOne needed to run complex spatial queries against large datasets quickly and efficiently, and found the MySQL spatial extensions to be far less complete and comprehensive than PostGIS. EnterpriseDB Advanced Server processes some of GeoCommons’ database-intensive rendering requests in one-thirtieth of the time required by MySQL. During peak loads, GeoCommons processes more than one hundred thousand complex requests per hour, requiring true enterprise-class performance and scalability.
Another major factor in FortiusOne’s replacement of MySQL with EnterpriseDB Advanced Server was the company’s need for advanced partitioning, custom triggers, and functional indexing. EnterpriseDB’s advanced partitioning capabilities instantly enabled linear performance, even with tables having billions of rows.
| Categories: Data types, EnterpriseDB and Postgres Plus, GIS and geospatial, MySQL | 10 Comments |
EnterpriseDB grows rapidly and fires its field sales force
Ashlee Vance discovered that EnterpriseDB had shot its field sales force, and opined that EnterpriseDB might generally be in trouble. EnterpriseDB CEO Andy Astor and marketing exec Derek Rodner responded quickly in their respective blogs. Andy and I also talked on the phone.
As best as I can tell, here’s what’s actually going on: Read more
| Categories: EnterpriseDB and Postgres Plus, Open source | 2 Comments |
Data warehouse appliances – fact and fiction
Borrowing the “Fact or fiction?” meme from the sports world:
- Data warehouse appliances have to have specialized hardware. Fiction. Indeed, most contenders except Teradata and Netezza — for example, DATAllegro, Vertica, ParAccel, Greenplum, and Infobright — offer Type 2 appliances. (Dataupia is another exception.)
- Specialized hardware is a dead-end for data warehouse appliances. Fiction. If it were easy for Teradata to replace its specialized switch technology, it would have done so a decade ago. And Netezza’s strategy has a lot of appeal.
- Data warehouse appliances are nothing new, and failed long ago. Fiction, but only because of Teradata. 1980s appliance pioneer Britton-Lee didn’t do so well (it was actually bought by Teradata). IBM and ICL (Britain’s national-champion hardware company) had content-addressable data store technology that went nowhere.
- Since data warehouse appliances failed long ago, they’ll fail now too. Fiction. Shared-nothing MPP is a fundamental advantage of appliances. So are various index-light strategies. Data warehouse appliances are here to stay.
- Data warehouse appliances only make sense if your main database management system can’t handle the job. Fiction. There are dozens of data warehouse appliances managing under 5 terabytes of user data, if not under 1 terabyte. True, some of them are legacy installations, dating back to when Oracle couldn’t handle that much data well itself. But new ones are still going in. Even if Oracle or Microsoft SQL Server can do the job, a data warehouse appliance is often a far superior — cheaper, easier to deploy and keep running, and/or better performing — alternative.
- Data warehouse appliances are just for data marts. For your full enterprise data warehouse, use a conventional DBMS. Part fact, part fiction. It depends on the appliance, and on the complexity of your needs. Teradata systems can do pretty much everything. Netezza and DATAllegro, two of the oldest data warehouse appliance startups, have worked hard on their concurrency issues and now can support fairly large user or reporting loads. They also can handle reasonable volumes of transactional or trickle-feed updates, and probably can support full EDW requirements for decent-sized organizations. Even so, there are some warehouse use cases for which they’re ill-suited. Newer appliance vendors are more limited yet.
- Analytic appliances are just renamed data warehouse appliances. Fact, even if misleading. Netezza is using the term “analytic appliance” to highlight additional things one can do on its boxes beyond answering queries. But those are still operations on a data mart or data warehouse.
- Teradata is the leading data warehouse appliance vendor. More fact than fiction. Some observers say that Teradata systems aren’t data warehouse appliances. But I think they are. Competitors may be superior to Teradata in one or the other characteristic trait of appliances – e.g., speed of installation – but it’s hard to define “appliances” in an objective way that excludes Teradata.
If you liked this post, you might also like one on text mining fact and fiction.
Amazon Dynamo — when primary key access is enough
Amazon has a very decentralized technical operation. But even the individual pieces have interestingly huge scale. Thus, various different things they’re doing are of interest.
They recently presented a research paper on a high-performance transactional system called Dynamo. (Hat tip to Dare Obasanjo.) A key point is the following:
There are many services on Amazon’s platform that only need primary-key access to a data store. For many services, such as those that provide best seller lists, shopping carts, customer preferences, session management, sales rank, and product catalog, the common pattern of using a relational database would lead to inefficiencies and limit scale and availability. Dynamo provides a simple primary-key only interface to meet the requirements of these applications.
Now, I don’t think too many organizations past Amazon are going to decide that they can’t afford the overhead of an RDBMS for such OLTP-like applications. But I do think it will become increasingly common to find other reasons to eschew traditional OLTP relational architectures. Maybe you’ll want the schema flexibility of XML. Or perhaps you’ll be happy with a fixed relational schema, but will want to optimize for analytic performance.
