Discussion of DataStax — formerly known as Riptano — a company founded to commercialize Cassandra.
I put up 14 posts over the past week, so perhaps you haven’t had a chance yet to read them all. Highlights included:
- My most important post of the week was a general guide to IT vendor strategy. That one has already spawned discussion at many companies, from the tiny to the multi-billion-dollar.
- The best comment thread of the week was probably on my post about scale-out relational OLTP choices, in which people discussed the merits of various particular alternatives.
- I recommended that people strongly consider attending XLDB 5 in Menlo Park on October 18-19.
Most of the posts, however, were reactions to news events. In particular:
- Teradata announced that Teradata 14 will be hybrid-columnar, more in Vertica’s way than in Greenplum’s or Aster Data’s. (Pay no attention to the Wall Street Journal’s apparent belief that no other analytic DBMS is hybrid-columnar at all.)
- Aster announced the unsurprising news that there will be a Teradata Aster appliance. Also, Aster talked about greater analytic flexibility in the forthcoming Aster 5.0.
- With Oracle OpenWorld coming up, Oracle decided to get some of its announcing out of the way early. In particular, it announced the Oracle Database Appliance, which is small-business-friendly hardware for running the Oracle DBMS. However, the Oracle Database Appliance doesn’t seem to do much about the complexity of running the Oracle DBMS software.
- In a catch-all Hadoop post, I noted that:
- Oracle has now clearly said it has a Hadoop appliance coming, no doubt next week at OpenWorld.
- I still can’t see why Hadoop appliances would succeed, but a lot of smart folks seem to disagree with me.
- Greenplum announced what looks like a nice but unimportant little product upgrade.
- It’s a really good thing that previously reported plans to revamp Hadoop are underway.
- DataStax announced that it really is a Cassandra company after all. Pay no attention to previous marketing that seemed to put DataStax in the same Hadoop-alternative category as, say, MapR.
- Ingres has changed its name to Actian. The announcement seems like a confession that Ingres and VectorWise are going nowhere.
|Categories: Actian and Ingres, Aster Data, Data warehousing, DataStax, Greenplum, Hadoop, Teradata, VectorWise||Leave a Comment|
The DataStax and Cassandra stories are somewhat confusing. Unfortunately, DataStax chose to clarify them in what has turned out to be a crazy news week. I’m going to use this post just to report on the status of the DataStax product line, without going into any analysis beyond that.
Last month, I reviewed with the Aster Data folks which markets they were targeting and selling into, subsequent to acquisition by their new orange overlords. The answers aren’t what they used to be. Aster no longer focuses much on what it used to call frontline (i.e., low-latency, operational) applications; those are of course a key strength for Teradata. Rather, Aster focuses on investigative analytics — they’ve long endorsed my use of the term — and on the batch run/scoring kinds of applications that inform operational systems.
|Categories: Analytic technologies, Application areas, Aster Data, Data warehousing, DataStax, RDF and graphs, Surveillance and privacy, Teradata, Web analytics||1 Comment|
There’s been a flurry of announcements recently in the Hadoop world. Much of it has been concentrated on Hadoop data storage and management. This is understandable, since HDFS (Hadoop Distributed File System) is quite a young (i.e. immature) system, with much strengthening and Bottleneck Whack-A-Mole remaining in its future.
Known HDFS and Hadoop data storage and management issues include but are not limited to:
- Hadoop is run by a master node, and specifically a namenode, that’s a single point of failure.
- HDFS compression could be better.
- HDFS likes to store three copies of everything, whereas many DBMS and file systems are satisfied with two.
- Hive (the canonical way to do SQL joins and so on in Hadoop) is slow.
Different entities have different ideas about how such deficiencies should be addressed. Read more
|Categories: Aster Data, Cassandra, Cloudera, Data warehouse appliances, DataStax, EMC, Greenplum, Hadapt, Hadoop, IBM and DB2, MapReduce, MongoDB and 10gen, Netezza, Parallelization||22 Comments|
Cassandra company DataStax is introducing a Hadoop distribution called Brisk, for use cases that combine short-request and analytic processing. Brisk in essence replaces HDFS (Hadoop Distributed File System) with a Cassandra-based file system called CassandraFS. The whole thing is due to be released (Apache open source) within the next 45 days.
The core claims for Cassandra/Brisk/CassandraFS are:
- CassandraFS has the same interface as HDFS. So, in particular, you should be able to use most Hadoop add-ons with Brisk.
- CassandraFS has comparable performance to HDFS on sequential scans. That’s without predicate pushdown to Cassandra, which is Coming Soon but won’t be in the first Brisk release.
- Brisk/CassandraFS is much easier to administer than HDFS. In particular, there are no NameNodes, JobTracker single points of failure, or any other form of head node. Brisk/CassandraFS is strictly peer-to-peer.
- Cassandra is far superior to HBase for short-request use cases, specifically with 5-6X the random-access performance.
There’s a pretty good white paper around all this, which also recites general Cassandra claims –  and here at last is the link.
Riptano, the Cassandra company, has changed its name to DataStax. DataStax has opened headquarters in Burlingame and hired some database-experienced folks – notably Ben Werther from Greenplum and Michael Weir from ParAccel, with Zenobia Godschalk (who worked with Aster Data) somewhere in the outside PR mix. Other than that, what’s new at DataStax is pretty much what could have been expected based on what DataStax folks said last spring.
Most notably, DataStax is introducing a software offering, whose full name is DataStax OpsCenter for Apache Cassandra. DataStax OpsCenter for Apache Cassandra seems to be, in essence, a monitoring tool for Cassandra clusters, with a bit of capacity planning bundled in. (If there are any outright operations parts to DataStax OpsCenter, they got overlooked in our conversation.)* Read more
|Categories: Cassandra, DataStax, Market share and customer counts, NoSQL, Specific users, Telecommunications||1 Comment|
Since posting last Wednesday morning that I’m looking into NoSQL and HVSP, I’ve had a lot of conversations, including with (among others):
- Dwight Merriman of 10gen (MongoDB)
- Damien Katz of Couchio (CouchDB)
- Matt Pfeil of Riptano (Cassandra)
- Todd Lipcon of Cloudera (HBase committer)
- Tony Falco of Basho (Riak)
- John Busch of Schooner
- Ori Herrnstadt of Akiban
Tonight’s Cassandra technology post got plenty long enough on its own, so I’m separating out business and adoption issues here. For starters, known Cassandra users include:
- Facebook, which has said it has 150 or so Cassandra nodes (but see below)
- Twitter, which has said it has 45 or so Cassandra nodes
- Rackspace, which used to be Jonathan Ellis’ employer, and now is backing Cassandra company Riptano
- Digg, which along with Twitter and Rackspace was one of the three major users helping advance the Cassandra project
- OpenX, Simple Geo, Digital Reasoning, who Jonathan cited as production users in March
- Cloudkick, as noted and linked in my other post
- Two customers Riptano named at launch (but I’ve forgotten who they were*)
Fetlife, Meebo, and others seem to at least have a healthy interest in Cassandra, based on their level of involvement in a forthcoming Cassandra Summit. That said, the @Fetlife tweetstream features numerous yelps of pain, and I don’t mean the recreational kind. Read more
|Categories: Cassandra, DataStax, Facebook, Market share and customer counts, NoSQL, Open source, Parallelization, Pricing, Specific users||5 Comments|
Back in March, I talked with Jonathan Ellis of Rackspace, who runs the Apache Cassandra project. I started drafting a blog post then, but never put it up. Then Jonathan cofounded Riptano, a company to commercialize Cassandra, and so I talked with him again in May. Well, I’m finally finding time to clear my Cassandra/Riptano backlog. I’ll cover the more technical parts below, and the more business- or usage-oriented ones in a companion Cassandra/Riptano post.
Jonathan’s core claims for Cassandra include:
- Cassandra is shared-nothing.
- Cassandra has good approaches to replication and partitioning, right out of the box.
- In particular, Cassandra is good for use cases that distribute a database around the world and want to access it at “local” latencies. (Indeed, Jonathan asserts that non-local replication is a significant non-big-data Cassandra use case.)
- Cassandra’s scale-out is application-transparent, unlike sharded MySQL’s.
- Cassandra is fast at both appends and range queries, which would be hard to accomplish in a pure key-value store.
In general, Jonathan positions Cassandra as being best-suited to handle a small number of operations at high volume, throughput, and speed. The rest of what you do, as far as he’s concerned, may well belong in a more traditional SQL DBMS. Read more
|Categories: Amazon and its cloud, Cassandra, DataStax, Facebook, Google, Log analysis, NoSQL, Open source, Parallelization||4 Comments|