1. It boggles my mind that some database technology companies still don’t view compression as a major issue. Compression directly affects storage and bandwidth usage alike — for all kinds of storage (potentially including RAM) and for all kinds of bandwidth (network, I/O, and potentially on-server).
Trading off less-than-maximal compression so as to minimize CPU impact can make sense. Having no compression at all, however, is an admission of defeat.
2. People tend to misjudge Hadoop’s development pace in either of two directions. An overly expansive view is to note that some people working on Hadoop are trying to make it be all things for all people, and to somehow imagine those goals will soon be achieved. An overly narrow view is to note an important missing feature in Hadoop, and think there’s a big business to be made out of offering it alone.
At this point, I’d guess that Cloudera and Hortonworks have 500ish employees combined, many of whom are engineers. That allows for a low double-digit number of 5+ person engineering teams, along with a number of smaller projects. The most urgently needed features are indeed being built. On the other hand, a complete monument to computing will not soon emerge.
3. Schooner’s acquisition by SanDisk has led to the discontinuation of Schooner’s SQL DBMS SchoonerSQL. Schooner’s flash-optimized key-value store Membrain continues. I don’t have details, but the Membrain web page suggests both data store and cache use cases.
4. There’s considerable personnel movement at Boston-area database technology companies right now. Please ping me directly if you care.
5. I talked recently with Ashish Thusoo of Qubole. Qubole’s initial offering is a Hive-in-the-cloud, started by the guys who invented Hive. Qubole’s coolest new technical feature vs. generic Hive seems to be a disk-based columnar cache that lives with the servers, to help “smooth over the jitters” between Amazon EC2 and S3. Qubole company basics include:
- Founded last year.
- 15 early adopters, generally from mid-sized internet companies. Some of the adopters are already paying.
- 12 employees.
6. In my recent When I am a VC Overlord post, I wrote:
4. I will not fund any software whose primary feature is that it is implemented in the “cloud” or via “SaaS”. A me-too product on a different platform is still a me-too product.
5. I will not fund any pitch that emphasizes the word “elastic”. Elastic is an important feature of underwear and pajamas, but even in those domains it does not provide differentiation.
Cloud/SaaS deployments give you a chance at providing superior ease of use/installation/administration, without compromising functionality — but they don’t automatically guarantee it. It’s hard work to make your customers’ lives easier.*
*This is the second consecutive post in which I’ve used a similar line. I’ll try to stop now. What’s really scary is that I was inspired by the old Frank Perdue ad “It takes a tough man to make a tender chicken.”
7. Ofir Manor of EMC is skeptical about Oracle’s claims for Hybrid Columnar Compression. But he didn’t really dig up that much dirt, except that he seems to think 10X compression is more of a ceiling than the floor that Oracle marketing suggests it is. The money quote is:
Oracle used to provide 3x compression, now it provides 10x compression, so no wonder the best references customers are seeing about 3.4x savings…
That 3X is from Oracle’s Basic Compression, which seems to be a block-level dictionary scheme.
Code generation is most beneficial for queries that execute simple expressions and the interpretation overhead is most pronounced. For example, a query that is doing a regular expression match over each row is not going to benefit from code generation much because the interpretation overhead is low compared to the regex processing time.
Code generation may end up like compression — an architectural feature that DBMS just obviously should have.