From time to time I advise a software vendor on how, whether, or to what extent it should offer its technology in open source. In summary, I believe:
- The formal differences between “open source” and “closed source” strategies are of secondary importance.
- The attitudinal and emotional differences between “open source” and “closed source” approaches can be large.
- A pure closed source strategy can make sense.
- A closed source strategy with important open source aspects can make sense.
- A pure open source strategy will only rarely win.
An “open source software” business model and strategy might include:
- Software given away for free.
- Demand generation to encourage people to use the free version of the software.
- Subscription pricing for additional proprietary software and support.
- Direct sales, and further marketing, to encourage users of the free stuff to upgrade to a paid version.
A “closed source software” business model and strategy might include:
- Demand generation.
- Free-download versions of the software.
- Subscription pricing for software (increasingly common) and support (always).
- Direct sales, and associated marketing.
Those look pretty similar to me.
Of course, there can still be differences between open and closed source. In particular: Read more
I haven’t done a notes/link/comments post for a while. Time for a little catch-up.
1. MySQL now has a memcached integration story. I haven’t checked the details. The MySQL team is pretty hard to talk with, due to the heavy-handedness of Oracle’s analyst relations.
2. The Large Hadron Collider offers some serious numbers, including:
- 1 petabyte/second.
- 6 x 109 collisions/second.
- Only 1 in 1013 collision records kept (which I guess knocks things down to a 100 byte/second average, from the standpoint of persistent storage).
- Real-time filtering by a cluster of several thousand machines, over a 25 nanosecond period.
3. One application area we don’t talk about much for analytic technologies is education. However: Read more
|Categories: Cache, memcached, Memory-centric data management, MySQL, Open source, Petabyte-scale data management, RDF and graphs, Scientific research||Leave a Comment|
I talked with MemSQL shortly before today’s launch. MemSQL technology basics are:
- In-memory relational DBMS.
- Being released single-box only. Transparent sharding is under development for release in the fall. Basic replication is under development too.
- Subset of SQL-92.
- MySQL wire-compatible (SQL coverage issues excepted).
MemSQL’s performance claims include:
- Read performance 10% or so worse than memcached.
- Write performance 20% or so better than memcached.
- 1.2 million inserts/second on a 64-core, 1/2 TB of RAM machine.
- Similarly, 1/2 billion records loaded in under 20 minutes.
MemSQL company basics include: Read more
|Categories: Database compression, In-memory DBMS, Investment research and trading, Market share and customer counts, memcached, MemSQL, OLTP, Pricing, Web analytics||3 Comments|
I’m frequently asked to generalize in some way about in-memory or memory-centric data management. I can start:
- The desire for human real-time interactive response naturally leads to keeping data in RAM.
- Many databases will be ever cheaper to put into RAM over time, thanks to Moore’s Law. (Most) traditional databases will eventually wind up in RAM.
- However, there will be exceptions, mainly on the machine-generated side. Where data creation and RAM data storage are getting cheaper at similar rates … well, the overall cost of RAM storage may not significantly decline.
Getting more specific than that is hard, however, because:
- The possibilities for in-memory data storage are as numerous and varied as those for disk.
- The individual technologies and products for in-memory storage are much less mature than those for disk.
- Solid-state options such as flash just confuse things further.
Consider, for example, some of the in-memory data management ideas kicking around. Read more
According to the MySQL Cluster home page, today’s MySQL Cluster release has — give or take terminology details — added transparent sharding (Edit: Actually, please see the first comment below) and a memcached interface. My quick comments on all this to a reporter a couple of days ago were:
- Persistent memcached is a useful thing. Couchbase’s sales illustrate that point: http://www.dbms2.com/2012/02/01/couchbase-update/
- MySQL has always given good performance when used just as a key-value store, e.g. http://www.dbms2.com/2010/08/22/workday-technology-stack/ . So it’s reasonable to hope the memcached interface will have good performance out of the box.
- MySQL’s clustering capabilities have long been weak, providing a window of opportunity for companies and products such as Schooner Information and dbShards. The gold standard for clustering is:
- Efficient transparent sharding: http://www.dbms2.com/2011/02/24/transparent-sharding/
- Synchronous replication at much better than two-phase-commit speeds. http://www.dbms2.com/2011/10/23/schooner-pivots-further/
I don’t really know enough about MySQL Cluster right now to comment in more detail.
My Couchbase business update with Bob Wiederhold was very interesting, but it didn’t answer much about the actual Couchbase product. For that, I talked with Dustin Sallings. We jumped around a lot, and some important parts of the Couchbase product haven’t had their designs locked down yet anyway. But here’s at least a partial explanation of what’s up.
memcached is a way to cache data in RAM across a cluster of servers and have it all look logically like a single memory pool, extremely popular among large internet companies. The Membase product — which is what Couchbase has been selling this year — adds persistence to memcached, an obvious improvement on requiring application developers to write both to memcached and to non-transparently-sharded MySQL. The main technical points in adding persistence seem to have been:
- A persistent backing store (duh), namely SQLite.
- A change to the hashing algorithm, to avoid losing data when the cluster configuration is changed.
Couchbase is essentially Membase improved by integrating CouchDB into it, with the main changes being:
- Changing the backing store to CouchDB (duh). This will be in the first Couchbase release.
- Adding cross data center replication on CouchDB’s consistency model. This will not, I believe, be in the first Couchbase release.
- Offering CouchDB’s programming and query interfaces as an option. So far as I can tell, this will be implemented straightforwardly in the first Couchbase release, with elegance planned for later down the road.
Let’s drill down a bit into Membase/Couchbase clustering and consistency. Read more
|Categories: Cache, Clustering, Couchbase, memcached, Memory-centric data management, MySQL, Parallelization, Solid-state memory||6 Comments|
I decided I needed some Couchbase drilldown, on business and technology alike, so I had solid chats with both CEO Bob Wiederhold and Chief Architect Dustin Sallings. Pretty much everything I wrote at the time Membase and CouchOne merged to form Couchbase (the company) still holds up. But I have more detail now.
Context for any comments on customer traction includes:
- Membase went into limited production release in October, and full release in January. Similar things are true of CouchDB.
- Hence, most sales of Couchbase’s products have been made over the past 6 months.
- Couchbase (the merged product) is at this point only in a pre-production developer’s release.
- Couchbase has both a direct sales force and a classic open-source “funnel”-based online selling model. Naturally, Couchbase’s understanding of what its customers are doing is more solid with respect to the direct sales base.
- Most of Couchbase’s revenue to date seems to have come from a limited number of big-ticket “lighthouse” accounts (as opposed to, say, the larger number of smaller deals that come in through the online funnel).
- Most Membase purchases are for new applications, as opposed to memcached migrations. However, customers are the kinds of companies that probably also are using memcached elsewhere.
- Most other Membase purchases are replacements for the Membase/MySQL combination. Bob says those are easy sales with short sales cycles.
- Pure memcached support is a small but non-zero business for Couchbase, and a fine source of upsell opportunities.
- In the pipeline but not so much yet in the customer base are SaaS vendors and the like who use and may want to replace traditional DBMS such as Oracle. Other than among those, Couchbase doesn’t compete much yet with Oracle et al.
- Pure CouchDB isn’t all that much of a business, at least relative to community size, as CouchDB is a single-server product commonly used by people who are content not to pay for support.
Membase sales are concentrated in five kinds of internet-centric companies, which in declining order are: Read more
As a follow-up to the latest Stonebraker kerfuffle, Derrick Harris asked me a bunch of smart followup questions. My responses and afterthoughts include:
- Facebook et al. are in effect Software as a Service (SaaS) vendors, not enterprise technology users. In particular:
- They have the technical chops to rewrite their code as needed.
- Unlike packaged software vendors, they’re not answerable to anybody for keeping legacy code alive after a rewrite. That makes migration a lot easier.
- If they want to write different parts of their system on different technical underpinnings, nobody can stop them. For example …
- … Facebook innovated Cassandra, and is now heavily committed to HBase.
- It makes little sense to talk of Facebook’s use of “MySQL.” Better to talk of Facebook’s use of “MySQL + memcached + non-transparent sharding.” That said:
- It’s hard to see why somebody today would use MySQL + memcached + non-transparent sharding for a new project. At least one of Couchbase or transparently-sharded MySQL is very likely a superior alternative. Other alternatives might be better yet.
- As noted above in the example of Facebook, the many major web businesses that are using MySQL + memcached + non-transparent sharding for existing projects can be presumed able to migrate away from that stack as the need arises.
Continuing with that discussion of DBMS alternatives:
- If you just want to write to the memcached API anyway, why not go with Couchbase?
- If you want to go relational, why not go with MySQL? There are many alternatives for scaling or accelerating MySQL — dbShards, Schooner, Akiban, Tokutek, ScaleBase, ScaleDB, Clustrix, and Xeround come to mind quickly, so there’s a great chance that one or more will fit your use case. (And if you don’t get the choice of MySQL flavor right the first time, porting to another one shouldn’t be all THAT awful.)
- If you really, really want to go in-memory, and don’t mind writing Java stored procedures, and don’t need to do the kinds of joins it isn’t good at, but do need to do the kinds of joins it is, VoltDB could indeed be a good alternative.
And while we’re at it — going schema-free often makes a whole lot of sense. I need to write much more about the point, but for now let’s just say that I look favorably on the Big Four schema-free/NoSQL options of MongoDB, Couchbase, HBase, and Cassandra.
This post has a sequel.
Last week, Mike Stonebraker insulted MySQL and Facebook’s use of it, by implication advocating VoltDB instead. Kerfuffle ensued. To the extent Mike was saying that non-transparently sharded MySQL isn’t an ideal way to do things, he’s surely right. That still leaves a lot of options for massive short-request databases, however, including transparently sharded RDBMS, scale-out in-memory DBMS (whether or not VoltDB*), and various NoSQL options. If nothing else, Couchbase would seem superior to memcached/non-transparent MySQL if you were starting a project today.
*The big problem with VoltDB, last I checked, was its reliance on Java stored procedures to get work done.
Pleasantries continued in The Register, which got an amazing-sounding quote from Mike. If The Reg is to be believed — something I wouldn’t necessarily take for granted — Mike claimed that he (i.e. VoltDB) knows how to solve the distributed join performance problem. Read more
|Categories: Cache, Clustering, Couchbase, Games and virtual worlds, In-memory DBMS, memcached, Michael Stonebraker, MySQL, Parallelization, Theory and architecture, VoltDB and H-Store||20 Comments|
In January, 2010, I posited that it might be helpful to view data as being divided into three categories:
- Human/Tabular data –i.e., human-generated data that fits well into relational tables or arrays.
- Human/Nontabular data — i.e., all other data generated by humans.
- Machine-Generated data.
I won’t now stand by every nuance in that post, which may differ slightly from those in my more recent posts about machine-generated data and poly-structured databases. But one general idea is hard to dispute:
Traditional database data — records of human transactional activity, referred to as “Human/Tabular data above” — will not grow as fast as Moore’s Law makes computer chips cheaper.
And that point has a straightforward corollary, namely:
It will become ever more affordable to put traditional database data entirely into RAM. Read more
|Categories: Analytic technologies, Cache, In-memory DBMS, memcached, Memory-centric data management, OLTP, Oracle, Oracle TimesTen, SAP AG, solidDB, Storage, Theory and architecture, VoltDB and H-Store||24 Comments|