NoSQL
Discussion of the NoSQL movement, whatever that is.
More on NoSQL and HVSP (or OLRP)
Since posting last Wednesday morning that I’m looking into NoSQL and HVSP, I’ve had a lot of conversations, including with (among others):
- Dwight Merriman of 10gen (MongoDB)
- Damien Katz of Couchio (CouchDB)
- Matt Pfeil of Riptano (Cassandra)
- Todd Lipcon of Cloudera (HBase committer)
- Tony Falco of Basho (Riak)
- John Busch of Schooner
- Ori Herrnstadt of Akiban
| Categories: Akiban, Basho and Riak, Cache, Cassandra, Cloudera, Clustrix, CouchDB, Facebook, HBase, Hadoop, MySQL, NoSQL, OLTP, Object, Open source, Parallelization, Riptano, Schooner, Theory and architecture, Tokutek, memcached | Leave a Comment |
The Workday architecture — a new kind of OLTP software stack
One of my coolest company visits in some time was to SaaS (Software as a Service) vendor Workday, Inc., earlier this month. Reasons included:
- Workday has forward-thinking ideas about SaaS enterprise applications and the integration of business intelligence into same.
- Workday has highly innovative ideas in how it manages data.
- Companies founded by Dave Duffield tend to feature smart, likeable people who talk to one pleasantly and forthrightly. Workday is no exception; CTO Stan Swete and the other Workday folks present were a delight to talk with.
- I’d invited Merv Adrian to come along with me. He asked great questions, and I could gather myself a bit despite how sleep-deprived I was for the first part of that trip.
Workday kindly allowed me to post this Workday slide deck. Otherwise, I’ve split out a quick Workday, Inc. company overview into a separate post.
The biggie for me was the data and object management part. Specifically: Read more
I’m collecting data points on NoSQL and HVSP adoption
I was asked to do a magazine article on NoSQL, where by “NoSQL” is meant “whatever they talk about at NoSQL conferences.” By now the number of publications planning to run the article is up to 2, the deadline is next week and, crucially, it has been agreed that I may talk about HVSP in general, NoSQL and SQL alike.
It also is understood that, realistically, I can’t be expected to know and mention the very latest news for all the many products in the categories. Even so, I think this would be fine time to check just where NoSQL and HVSP adoption stand. Here is most of what I know, or links to same; it would be great if you guys would contribute additional data in the comment thread.
In the NoSQL area: Read more
Finally confirmed: Membase has a reasonable product roadmap
On my recent trip to California, neither I nor my clients at Northscale covered ourselves in meeting-arranging glory. Still, from the rushed 30 minute meeting we did wind up having, I finally came away feeling good about Membase’s product direction.
To review, Membase is a reasonably elastic persistent data store, sporting the memcached API, making memcached/Membase an attractive alternative to memcached/sharded MySQL. As of now, Membase is a pure key-value store.
Northscale defends pure key-value stores by arguing, in effect: Read more
| Categories: NoSQL, Northscale, Parallelization, memcached | 3 Comments |
Links and observations
I’m back from a trip to the SF Bay area, with a lot of writing ahead of me. I’ll dive in with some quick comments here, then write at greater length about some of these points when I can. From my trip: Read more
Advice for some non-clients
Edit: Any further anonymous comments to this post will be deleted. Signed comments are permitted as always.
Most of what I get paid for is in some form or other consulting. (The same would be true for many other analysts.) And so I can be a bit stingy with my advice toward non-clients. But my non-clients are a distinguished and powerful group, including in their number Oracle, IBM, Microsoft, and most of the BI vendors. So here’s a bit of advice for them too.
Oracle. On the plus side, you guys have been making progress against your reputation for untruthfulness. Oh, I’ve dinged you for some past slip-ups, but on the whole they’ve been no worse than other vendors.’ But recently you pulled a doozy. The analyst reports section of your website fails to distinguish between unsponsored and sponsored work.* That is a horrible ethical stumble. Fix it fast. Then put processes in place to ensure nothing that dishonest happens again for a good long time.
*Merv Adrian’s “report” listed high on that page is actually a sponsored white paper. That Merv himself screwed up by not labeling it clearly as such in no way exonerates Oracle. Besides, I’m sure Merv won’t soon repeat the error — but for Oracle, this represents a whole pattern of behavior.
Oracle. And while I’m at it, outright dishonesty isn’t your only unnecessary credibility problem. You’re also playing too many games in analyst relations.
HP. Neoview will never succeed. Admit it to yourselves. Go buy something that can. Read more
Riptano, and Cassandra adoption
Tonight’s Cassandra technology post got plenty long enough on its own, so I’m separating out business and adoption issues here. For starters, known Cassandra users include:
- Facebook, which has said it has 150 or so Cassandra nodes (but see below)
- Twitter, which has said it has 45 or so Cassandra nodes
- Rackspace, which used to be Jonathan Ellis’ employer, and now is backing Cassandra company Riptano
- Digg, which along with Twitter and Rackspace was one of the three major users helping advance the Cassandra project
- OpenX, Simple Geo, Digital Reasoning, who Jonathan cited as production users in March
- Cloudkick, as noted and linked in my other post
- Two customers Riptano named at launch (but I’ve forgotten who they were*)
Fetlife, Meebo, and others seem to at least have a healthy interest in Cassandra, based on their level of involvement in a forthcoming Cassandra Summit. That said, the @Fetlife tweetstream features numerous yelps of pain, and I don’t mean the recreational kind. Read more
| Categories: Cassandra, Facebook, Market share, NoSQL, Open source, Parallelization, Pricing, Riptano, Specific users | 3 Comments |
Cassandra technical overview
Back in March, I talked with Jonathan Ellis of Rackspace, who runs the Apache Cassandra project. I started drafting a blog post then, but never put it up. Then Jonathan cofounded Riptano, a company to commercialize Cassandra, and so I talked with him again in May. Well, I’m finally finding time to clear my Cassandra/Riptano backlog. I’ll cover the more technical parts below, and the more business- or usage-oriented ones in a companion Cassandra/Riptano post.
Jonathan’s core claims for Cassandra include:
- Cassandra is shared-nothing.
- Cassandra has good approaches to replication and partitioning, right out of the box.
- In particular, Cassandra is good for use cases that distribute a database around the world and want to access it at “local” latencies. (Indeed, Jonathan asserts that non-local replication is a significant non-big-data Cassandra use case.)
- Cassandra’s scale-out is application-transparent, unlike sharded MySQL’s.
- Cassandra is fast at both appends and range queries, which would be hard to accomplish in a pure key-value store.
In general, Jonathan positions Cassandra as being best-suited to handle a small number of operations at high volume, throughput, and speed. The rest of what you do, as far as he’s concerned, may well belong in a more traditional SQL DBMS. Read more
| Categories: Amazon and its cloud, Cassandra, Facebook, Google, Log analysis, NoSQL, Open source, Parallelization, Riptano | 4 Comments |
VoltDB finally launches
VoltDB is finally launching today. As is common for companies in sectors I write about, VoltDB — or just “Volt” — has discovered the virtues of embargoes that end 12:01 am. Let’s go straight to the technical highlights:
- VoltDB is based on the H-Store technology, which I wrote about in February, 2009. Most of what I said about H-Store then applies to VoltDB today.
- VoltDB is a no-apologies ACID relational DBMS, which runs entirely in RAM.
- VoltDB has rather limited SQL. (One example: VoltDB can’t do SUMs in SQL.) However, VoltDB guy Tim Callaghan (Mark Callaghan’s lesser-known but nonetheless smart brother) asserts that if you code up the missing functionality, it’s almost as fast as if it were present in the DBMS to begin with, because there’s no added I/O from the handoff between the DBMS and the procedural code. (The data’s in RAM one way or the other.)
- VoltDB’s Big Conceptual Performance Story is that it does away with most locks, latches, logs, etc., and also most context switching.
- In particular, you’re supposed to partition your data and architect your application so that most transactions execute on a single core. When you can do that, you get VoltDB’s performance benefits. To the extent you can’t, you’re in two-phase-commit performance land. (More precisely, you’re doing 2PC for multi-core writes, which is surely a major reason that multi-core reads are a lot faster in VoltDB than multi-core writes.)
- VoltDB has a little less than one DBMS thread per core. When the data partitioning works as it should, you execute a complete transaction in that single thread. Poof. No context switching.
- A transaction in VoltDB is a Java stored procedure. (The early idea of Ruby on Rails in lieu of the Java/SQL combo didn’t hold up performance-wise.)
- Solid-state memory is not a viable alternative to RAM for VoltDB. Too slow.
- Instead, VoltDB lets you snapshot data to disk at tunable intervals. “Continuous” is one of the options, wherein a new snapshot starts being made as soon as the last one completes.
- In addition, VoltDB will also spool a kind of transaction log to the target of your choice. (Obvious choice: An analytic DBMS such as Vertica, but there’s no such connectivity partnership actually in place at this time.)
The Clustrix story
After my recent post, the Clustrix guys raised their hands and briefed me. Takeaways included: Read more
