Database SaaS gains a little visibility
Way back in the 1970s, a huge fraction of analytic database management was done via timesharing, specifically in connection with the RAMIS and FOCUS business-intelligence-precursor fourth-generation languages. (Both were written by Gerry Cohen, who built his company Information Builders around the latter one.) The market for remote-computing business intelligence has never wholly gone away since. Indeed, it’s being revived now, via everything from the analytics part of Salesforce.com to the service category I call data mart outsourcing.
Less successful to date are efforts in the area of pure database software-as-a-service. It seems that if somebody is going for SaaS anyway, they usually want a more complete, integrated offering. The most noteworthy exceptions I can think of to this general rule are Kognitio and Vertica, and they only have a handful of database SaaS customers each. To wit: Read more
Gartner’s 2008 data warehouse database management system Magic Quadrant is out
February, 2011 edit: I’ve now commented on Gartner’s 2010 Data Warehouse Database Management System Magic Quadrant as well.
Gartner’s annual Magic Quadrant for data warehouse DBMS is out. Thankfully, vendors don’t seem to be taking it as seriously as usual, so I didn’t immediately hear about it. (I finally noticed it in a Greenplum pay-per-click ad.) Links to Gartner MQs tend to come and go, but as of now here are two working links to the 2008 Gartner Data Warehouse Database Management System MQ. My posts on the 2007 and 2006 MQs have also been updated with working links. Read more
Pervasive DataRush
I’ve made a few references to Pervasive DataRush in the past — like this one — but I’ve never gotten around to seriously writing it up. I’ll now try to make partial amends. The key points about Pervasive DataRush are:
- DataRush grew out of Pervasive Software’s ETL business, as the underpinnings for a new data transformation tool they were building.
- DataRush is a Java framework for doing parallel programming automagically.
- Unlike most modern parallelization technologies, DataRush is focused on single SMP (Symmetric MultiProcessing) boxes rather than loosely-coupled grids.
- DataRush is based on dataflow programming.
- Pervasive says that DataRush is really fast.
More details may be found at the rather rich Pervasive DataRush website, or in the following excerpt from an email by Pervasive’s Steve Hochschild: Read more
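To make the dataflow idea concrete: in a dataflow framework, each operator runs concurrently and processes records as they stream in from the operator upstream, rather than waiting for a whole batch. The sketch below is a minimal, generic illustration of that style in Python using queues and threads; it is emphatically not the DataRush API, and the names (`stage`, `run_pipeline`) are invented for illustration.

```python
import threading
from queue import Queue

SENTINEL = object()  # marks end-of-stream

def stage(fn, inq, outq):
    # Each operator runs in its own thread, consuming records
    # as they arrive and pushing results downstream immediately.
    while True:
        item = inq.get()
        if item is SENTINEL:
            outq.put(SENTINEL)
            return
        outq.put(fn(item))

def run_pipeline(source, *fns):
    # Wire the operators together with queues: source -> fn1 -> fn2 -> ...
    queues = [Queue() for _ in range(len(fns) + 1)]
    threads = [threading.Thread(target=stage, args=(fn, queues[i], queues[i + 1]))
               for i, fn in enumerate(fns)]
    for t in threads:
        t.start()
    for item in source:
        queues[0].put(item)
    queues[0].put(SENTINEL)
    results = []
    while True:
        item = queues[-1].get()
        if item is SENTINEL:
            break
        results.append(item)
    for t in threads:
        t.join()
    return results

rows = run_pipeline(range(5), lambda x: x * 10, lambda x: x + 1)
print(rows)  # [1, 11, 21, 31, 41]
```

The point of the pattern, and presumably part of DataRush’s pitch, is that the framework handles the threading and buffering, so the programmer only writes the per-record operators.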
Categories: Parallelization, Pervasive Software
Expressor pre-announces a data loading benchmark leapfrog
Expressor Software plans to blow the Vertica/Syncsort “benchmark” out of the water, to wit:
What I know already is that our numbers will [be] between 7 and 8 min to load one TB of data and will set another world record for the tpc-h benchmark.
The whole blog post has a delightful air of skepticism, e.g.:
Sometimes the mention of a join and lookup are documented but why? If the files are load ready what is there to join or lookup?
… If the files are load ready and the bulk load interface is used, what exactly is done with the DI product?
My guess… nothing.
… But what I can’t figure out is what is so complex about this test in the first place?
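For scale, the claim is easy to translate into the GB/minute terms the load-rate arguments elsewhere use. A quick back-of-envelope check (assuming a decimal terabyte, i.e. 1,000 GB):

```python
# Expressor's claim: 1 TB loaded in 7 to 8 minutes.
# Express that as a GB/minute range, assuming 1 TB = 1,000 GB.
tb_in_gb = 1000
low = tb_in_gb / 8   # slower end of the claim
high = tb_in_gb / 7  # faster end of the claim
print(f"{low:.0f}-{high:.0f} GB/minute")  # 125-143 GB/minute
```

That would indeed be a very large number compared with the bulk-load rates vendors have published to date.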
Categories: Benchmarks and POCs, Data integration and middleware, EAI, EII, ETL, ELT, ETLT, Expressor
More from Vertica on data warehouse load speeds
Last month, when Vertica released its “benchmark” of data warehouse load speeds, I didn’t realize it had previously released some actual customer-experience load rates as well. In a July 2008 white paper that seems thankfully free of any registration requirements, Vertica cited four examples:
- (Comcast) Trickle loads 48MB/minute – SNMP data generated by devices in the Comcast cable network is trickle loaded on a 24×7 basis at rates as high as 135,000 rows/second. The system runs on 5 HP ProLiant DL 380 servers.
- (Verizon) Bulk loads to memory 300MB/minute – 50MB to 300MB of call detail records (1K record size—150 columns per row) are loaded every 10 minutes. Vertica runs on 6 HP ProLiant DL380 servers.
- (Level 3 Communications) Bulk loads to disk 5GB/minute – The loading and enrichment (i.e., summary table creation) of 1.5TB of call detail records formerly took 5 days in a row-oriented data warehouse database. Vertica required 5 hours to load the same data.
- (“Global investment firm”) Trickle loads 2.6GB/minute – Historic financial trade and quote (TaQ) data was bulk loaded into the database at a rate of 125GB/hour. New TaQ data was trickled into the database at rates as high as 90,000 rows per second (480 bytes per row).
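The per-row figures in the last example can be sanity-checked against the headline rate. Reading the row size as 480 bytes, the arithmetic works out:

```python
# "Global investment firm" trickle-load claim:
# 90,000 rows/second at 480 bytes per row.
bytes_per_second = 90_000 * 480
gb_per_minute = bytes_per_second * 60 / 1e9
print(f"{gb_per_minute:.1f} GB/minute")  # 2.6 GB/minute
```

That matches the 2.6GB/minute headline, which also suggests the row size really is 480 bytes rather than bits.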
Categories: Vertica Systems
ParAccel’s market momentum
After my recent blog post, ParAccel is once again angry that I haven’t given it proper credit for its accomplishments. So let me try to redress the failing.
- ParAccel has disclosed the names of two customers, LatiNode and Merkle (presumably as an add-on to Merkle’s Netezza environment). And ParAccel has named two others under NDA. Four disclosed or semi-disclosed customers is actually more than DATAllegro has/had, although I presume DATAllegro’s three known customers are larger, especially in terms of database size.
- ParAccel sports a long list of partners, and has put out quite a few press releases in connection with these partnerships. While I’ve never succeeded in finding a company that took its ParAccel partnership especially seriously, I’ve only asked three or four of them, which is a small fraction of the total number of partners ParAccel has announced, so in no way can I rule out that somebody, somewhere, is actively helping ParAccel try to sell its products.
- ParAccel repeatedly says it has beaten Vertica in numerous proofs-of-concept (POCs), considerably more than the two cases in which it claims to have actually won a deal against Vertica competition.
- ParAccel has elicited favorable commentary from such astute observers as Seth Grimes and Doug Henschen.
- ParAccel has been noted for running TPC-H benchmarks in memory much more quickly than other vendors run them on disk.
Uh, that’s about all I can think of. What else am I forgetting? Surely that can’t be ParAccel’s entire litany of market success!
Categories: Data warehousing, Market share and customer counts, ParAccel
ParAccel actually uses relatively little PostgreSQL code
I often find it hard to write about ParAccel’s technology, for a variety of reasons:
- With occasional exceptions, ParAccel is reluctant to share detailed information.
- With occasional exceptions, ParAccel is reluctant to say anything for attribution.
- In ParAccel’s version of an “agile” development approach, product details keep changing, as do plans and schedules. (The gibe that ParAccel’s product plans are whatever their current sales prospect wants them to be — while of course highly exaggerated — isn’t wholly unfounded.)
- ParAccel has sold very few copies of its products, so it’s hard to get information from third parties.
ParAccel is quick, however, to send email if I post anything about them they think is incorrect.
All that said, I did get careless when I neglected to doublecheck something I already knew. Read more
Categories: Data warehousing, Netezza, ParAccel, PostgreSQL
Ordinary OLTP DBMS vs. memory-centric processing
A correspondent from China wrote in to ask about products that matched the following application scenario: Read more
Categories: In-memory DBMS, McObject, Memory-centric data management, OLTP, Oracle TimesTen, solidDB
More grist for the column vs. row mill
Daniel Abadi and Sam Madden are at it again, following up on their blog posts of six months ago arguing for the general superiority of column stores over row stores (for analytic query processing). The gist is to recite a number of bases for superiority, beyond the two standard ones of less I/O and better compression, and seems to be based largely on Section 5 of a SIGMOD paper they wrote with Nabil Hachem.
A big part of their argument is that if you carry the processing of columnar and/or compressed data all the way through in memory, you get lots of advantages, especially because everything’s smaller and hence fits better into Level 2 cache. There also is some kind of join algorithm enhancement, which seems to be based on noticing when the result wound up falling into a range according to some dimension, and perhaps using dictionary encoding in a way that will help induce such an outcome.
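The in-memory part of the argument is easy to illustrate. In a row layout, scanning one column still drags every other field of each record through memory; in a column layout, the same values sit in one dense, cache-friendly array. The sketch below is a generic Python illustration of that contrast, not Abadi and Madden’s code, and the sizes are arbitrary.

```python
import array

# Row layout: each record is a tuple of 10 fields.
rows = [tuple(range(i, i + 10)) for i in range(1000)]

# Column layout: the first field of every record stored contiguously.
# This is the representation that fits well in Level 2 cache and can
# be operated on while still compressed.
col0 = array.array('q', (r[0] for r in rows))

row_total = sum(r[0] for r in rows)  # touches all 10 fields of every record
col_total = sum(col0)                # scans one dense array
assert row_total == col_total
print(col_total)  # 499500
```

In CPython the timing difference is muted by interpreter overhead, but in a compiled execution engine the dense-array scan is where the cache and vectorization wins show up.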
The main enemy here is row-store vendors who say, in effect, “Oh, it’s easy to shoehorn almost all the benefits of a column-store into a row-based system.” They also take a swipe — for being insufficiently purely columnar — at unnamed columnar Vertica competitors, described in terms that seemingly apply directly to ParAccel.
Categories: Columnar database management, Data warehousing, Database compression, ParAccel, Vertica Systems
Database archiving and information preservation
Two similar companies reached out to me recently – SAND Technology and Clearpace. Their current market focus is somewhat different: Clearpace talks mainly of archiving, and sells first and foremost into the compliance market, while SAND has the most traction providing “near-line” storage for SAP databases.* But both stories boil down to pretty much the same thing: Cheap, trustworthy data storage with good-enough query capabilities. E.g., I think both companies would agree the following is a not-too-misleading first-approximation characterization of their respective products:
- Fully functional relational DBMS.
- Claims of fast query performance, but that’s not how they’re sold.
- Huge compression.
- Careful attention to time-stamping and auditability.
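The “huge compression” claim is plausible precisely because archival data is so repetitive: columns are often sorted, low-cardinality, or time-ordered, so simple schemes like run-length encoding go a long way. A minimal sketch of the idea (my illustration, not either vendor’s actual format):

```python
from itertools import groupby

def rle_encode(values):
    # Run-length encoding: collapse each run of equal values
    # into a (value, run_length) pair.
    return [(v, len(list(group))) for v, group in groupby(values)]

# Archived status codes are highly repetitive, so the runs are long.
status = ['OK'] * 9000 + ['RETRY'] * 900 + ['FAIL'] * 100
encoded = rle_encode(status)
print(encoded)       # [('OK', 9000), ('RETRY', 900), ('FAIL', 100)]
print(len(encoded))  # 3 pairs stand in for 10,000 values
```

Real products layer dictionary encoding, delta encoding, and the like on top, but the underlying reason the ratios are so large is the same.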