Cache

Analysis of technologies that accelerate database management via caching.

April 7, 2012

Many kinds of memory-centric data management

I’m frequently asked to generalize in some way about in-memory or memory-centric data management. I can start:

The desire for human real-time interactive response naturally leads to keeping data in RAM.
Many databases will be ever cheaper to put into RAM over time, thanks to Moore’s Law. (Most) traditional databases will eventually wind up in RAM.
However, there will be exceptions, mainly on the machine-generated side. Where data creation and RAM data storage are getting cheaper at similar rates … well, the overall cost of RAM storage may not significantly decline.
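The cost argument above can be made concrete with a little arithmetic. The sketch below is illustrative only: the growth and price-decline rates are assumptions chosen for the example, not figures from this post, and `relative_ram_cost` is a hypothetical helper.

```python
def relative_ram_cost(years, data_growth, price_decline):
    """Cost of holding a dataset in RAM after `years`, relative to today (1.0).

    data_growth   -- assumed annual growth in data volume (0.10 = +10%/year)
    price_decline -- assumed annual drop in RAM price per GB (0.30 = -30%/year)
    """
    yearly_factor = (1 + data_growth) * (1 - price_decline)
    return yearly_factor ** years


# Hypothetical rates, chosen only to illustrate the two cases.
traditional = relative_ram_cost(10, data_growth=0.10, price_decline=0.30)
machine_generated = relative_ram_cost(10, data_growth=0.40, price_decline=0.30)

print(f"traditional data after 10 years:       {traditional:.3f}x today's cost")
print(f"machine-generated data after 10 years: {machine_generated:.3f}x today's cost")
```

With these assumed rates, the slowly growing dataset costs less than a tenth as much to keep in RAM after ten years, while the fast-growing one still costs roughly four fifths of what it does today.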

Getting more specific than that is hard, however, because:

The possibilities for in-memory data storage are as numerous and varied as those for disk.
The individual technologies and products for in-memory storage are much less mature than those for disk.
Solid-state options such as flash just confuse things further.

Consider, for example, some of the in-memory data management ideas kicking around. Read more

Categories: Business intelligence, Cache, Cognos, Columnar database management, Couchbase, Data models and architecture, Data warehousing, Database diversity, Exasol, IBM and DB2, In-memory DBMS, Kognitio, memcached, MongoDB, MySQL, NoSQL, Oracle, Oracle TimesTen, ParAccel, QlikTech and QlikView, SAP AG, solidDB, Streaming and complex event processing (CEP), VoltDB and H-Store, Workday

15 Comments

August 13, 2011

Couchbase technical update

My Couchbase business update with Bob Wiederhold was very interesting, but it didn’t answer much about the actual Couchbase product. For that, I talked with Dustin Sallings. We jumped around a lot, and some important parts of the Couchbase product haven’t had their designs locked down yet anyway. But here’s at least a partial explanation of what’s up.

memcached is a way to cache data in RAM across a cluster of servers and have it all look logically like a single memory pool; it is extremely popular among large internet companies. The Membase product, which is what Couchbase has been selling this year, adds persistence to memcached, an obvious improvement on requiring application developers to write both to memcached and to non-transparently-sharded MySQL. The main technical points in adding persistence seem to have been:

A persistent backing store (duh), namely SQLite.
A change to the hashing algorithm, to avoid losing data when the cluster configuration is changed.
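Why the hashing scheme matters when the cluster changes can be seen in a small experiment. The sketch below is not Membase's actual algorithm, which this post does not describe. It assumes a naive `hash(key) % server_count` placement as the baseline, and a fixed table of buckets mapped to servers as one possible alternative. All names and the bucket count are hypothetical.

```python
import hashlib

NUM_BUCKETS = 1024  # assumed fixed bucket count, independent of cluster size


def stable_hash(key):
    # md5 is used only as a deterministic, well-mixed hash, not for security.
    return int(hashlib.md5(key.encode()).hexdigest(), 16)


def naive_server(key, servers):
    """Baseline: placement depends directly on the number of servers."""
    return servers[stable_hash(key) % len(servers)]


def build_bucket_map(servers):
    """Initial bucket -> server assignment, spread round-robin."""
    return {b: servers[b % len(servers)] for b in range(NUM_BUCKETS)}


def add_server(bucket_map, new_server):
    """Hand the new server an even share of buckets; leave the rest alone."""
    server_count = len(set(bucket_map.values())) + 1
    share = NUM_BUCKETS // server_count
    new_map = dict(bucket_map)
    for b in range(share):
        new_map[b] = new_server
    return new_map


def bucketed_server(key, bucket_map):
    """Alternative: key -> fixed bucket -> whichever server owns that bucket."""
    return bucket_map[stable_hash(key) % NUM_BUCKETS]


keys = [f"user:{i}" for i in range(10_000)]
old_servers = ["s1", "s2", "s3", "s4"]
new_servers = old_servers + ["s5"]

moved_naive = sum(
    naive_server(k, old_servers) != naive_server(k, new_servers) for k in keys
)
old_map = build_bucket_map(old_servers)
new_map = add_server(old_map, "s5")
moved_bucketed = sum(
    bucketed_server(k, old_map) != bucketed_server(k, new_map) for k in keys
)

print(f"naive modulo: {moved_naive / len(keys):.0%} of keys change servers")
print(f"bucket map:   {moved_bucketed / len(keys):.0%} of keys change servers")
```

In a pure cache, a key that silently changes servers is just a miss. Once the store is persistent, the same remapping means data is no longer where lookups expect it, so relocation has to be limited and explicit, which is what the bucket map makes possible in this sketch.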

Couchbase is essentially Membase improved by integrating CouchDB into it, with the main changes being:

Changing the backing store to CouchDB (duh). This will be in the first Couchbase release.
Adding cross data center replication on CouchDB’s consistency model. This will not, I believe, be in the first Couchbase release.
Offering CouchDB’s programming and query interfaces as an option. So far as I can tell, this will be implemented straightforwardly in the first Couchbase release, with elegance planned for later down the road.

Let’s drill down a bit into Membase/Couchbase clustering and consistency. Read more

Categories: Cache, Clustering, Couchbase, memcached, Memory-centric data management, MySQL, Parallelization, Solid-state memory

8 Comments

July 15, 2011

Soundbites: the Facebook/MySQL/NoSQL/VoltDB/Stonebraker flap, continued

As a follow-up to the latest Stonebraker kerfuffle, Derrick Harris asked me a bunch of smart follow-up questions. My responses and afterthoughts include:

Facebook et al. are in effect Software as a Service (SaaS) vendors, not enterprise technology users. In particular:
- They have the technical chops to rewrite their code as needed.
- Unlike packaged software vendors, they’re not answerable to anybody for keeping legacy code alive after a rewrite. That makes migration a lot easier.
- If they want to write different parts of their system on different technical underpinnings, nobody can stop them. For example …
- … Facebook originated Cassandra, and is now heavily committed to HBase.
It makes little sense to talk of Facebook’s use of “MySQL.” Better to talk of Facebook’s use of “MySQL + memcached + non-transparent sharding.” That said:
- It’s hard to see why somebody today would use MySQL + memcached + non-transparent sharding for a new project. At least one of Couchbase or transparently-sharded MySQL is very likely a superior alternative. Other alternatives might be better yet.
- As noted above in the example of Facebook, the many major web businesses that are using MySQL + memcached + non-transparent sharding for existing projects can be presumed able to migrate away from that stack as the need arises.
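For readers who have not lived with it, the “MySQL + memcached + non-transparent sharding” stack looks roughly like the sketch below. Plain dictionaries stand in for the memcached pool and the MySQL shards, and every name is hypothetical. The point is only how much routing and cache-maintenance logic ends up in application code.

```python
import zlib

cache = {}             # stand-in for a memcached pool
shards = [{}, {}, {}]  # stand-ins for three independent MySQL servers


def shard_for(user_id):
    """The application, not the DBMS, decides which shard holds a key."""
    return shards[zlib.crc32(user_id.encode()) % len(shards)]


def write_user(user_id, record):
    """Write to the database shard, then to the cache, in application code."""
    shard_for(user_id)[user_id] = record
    cache[user_id] = record


def read_user(user_id):
    """Cache-aside read: try the cache, fall back to the right shard."""
    if user_id in cache:
        return cache[user_id]
    record = shard_for(user_id).get(user_id)
    if record is not None:
        cache[user_id] = record  # repopulate the cache on a miss
    return record


write_user("alice", {"name": "Alice"})
cache.clear()              # simulate a cache node restart
print(read_user("alice"))  # served from the shard, then re-cached
```

A transparently sharded DBMS, or a store that speaks the memcached API and persists by itself, would take `shard_for` and the double write out of the application's hands.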

Continuing with that discussion of DBMS alternatives:

If you just want to write to the memcached API anyway, why not go with Couchbase?
If you want to go relational, why not go with MySQL? There are many alternatives for scaling or accelerating MySQL — dbShards, Schooner, Akiban, Tokutek, ScaleBase, ScaleDB, Clustrix, and Xeround come to mind quickly, so there’s a great chance that one or more will fit your use case. (And if you don’t get the choice of MySQL flavor right the first time, porting to another one shouldn’t be all THAT awful.)
If you really, really want to go in-memory, and don’t mind writing Java stored procedures, and don’t need to do the kinds of joins it isn’t good at, but do need to do the kinds of joins it is, VoltDB could indeed be a good alternative.

And while we’re at it — going schema-free often makes a whole lot of sense. I need to write much more about the point, but for now let’s just say that I look favorably on the Big Four schema-free/NoSQL options of MongoDB, Couchbase, HBase, and Cassandra.

Categories: Akiban, Cache, Cassandra, Clustrix, Couchbase, Data models and architecture, Database diversity, dbShards and CodeFutures, Facebook, HBase, In-memory DBMS, memcached, Michael Stonebraker, MongoDB, NoSQL, Open source, ScaleBase, ScaleDB, Schooner Information Technology, Software as a Service (SaaS), Tokutek and TokuDB, VoltDB and H-Store

19 Comments

July 14, 2011

An odd claim attributed to Mike Stonebraker

This post has a sequel.

Last week, Mike Stonebraker insulted MySQL and Facebook’s use of it, by implication advocating VoltDB instead. Kerfuffle ensued. To the extent Mike was saying that non-transparently sharded MySQL isn’t an ideal way to do things, he’s surely right. That still leaves a lot of options for massive short-request databases, however, including transparently sharded RDBMS, scale-out in-memory DBMS (whether or not VoltDB*), and various NoSQL options. If nothing else, Couchbase would seem superior to memcached/non-transparent MySQL if you were starting a project today.

*The big problem with VoltDB, last I checked, was its reliance on Java stored procedures to get work done.

Pleasantries continued in The Register, which got an amazing-sounding quote from Mike. If The Reg is to be believed — something I wouldn’t necessarily take for granted — Mike claimed that he (i.e. VoltDB) knows how to solve the distributed join performance problem. Read more

Categories: Cache, Clustering, Couchbase, Games and virtual worlds, In-memory DBMS, memcached, Michael Stonebraker, MySQL, Parallelization, Theory and architecture, VoltDB and H-Store

20 Comments

May 23, 2011

Traditional databases will eventually wind up in RAM

In January, 2010, I posited that it might be helpful to view data as being divided into three categories:

Human/Tabular data, i.e., human-generated data that fits well into relational tables or arrays.
Human/Nontabular data, i.e., all other data generated by humans.
Machine-Generated data.

I won’t now stand by every nuance in that post, which may differ slightly from those in my more recent posts about machine-generated data and poly-structured databases. But one general idea is hard to dispute:

Traditional database data (records of human transactional activity, referred to as “Human/Tabular data” above) will not grow as fast as Moore’s Law makes computer chips cheaper.

And that point has a straightforward corollary, namely:

It will become ever more affordable to put traditional database data entirely into RAM. Read more

Categories: Analytic technologies, Cache, In-memory DBMS, memcached, Memory-centric data management, OLTP, Oracle, Oracle TimesTen, SAP AG, solidDB, Storage, Theory and architecture, VoltDB and H-Store

28 Comments

May 21, 2011

Object-oriented database management systems (OODBMS)

There seems to be a fair amount of confusion about object-oriented database management systems (OODBMS). Let’s start with a working definition:

An object-oriented database management system (OODBMS, but sometimes just called “object database”) is a DBMS that stores data in a logical model that is closely aligned with an application program’s object model. Of course, an OODBMS will have a physical data model optimized for the kinds of logical data model it expects.

If you’re guessing from that definition that there can be difficulties drawing boundaries between the application, the application programming language, the data manipulation language, and/or the DBMS — you’re right. Those difficulties have been a big factor in relegating OODBMS to being a relatively niche technology to date.
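The working definition above can be illustrated with a toy contrast. This is a sketch only: a dictionary holding copies of objects stands in for an object store, lists of tuples stand in for relational tables, and the `Order` and `LineItem` classes are hypothetical.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class LineItem:
    sku: str
    quantity: int


@dataclass
class Order:
    order_id: int
    items: list = field(default_factory=list)


order = Order(1, [LineItem("widget", 2), LineItem("gadget", 1)])

# Relational style: the object graph is decomposed into flat rows in two
# tables, and the application must reassemble it with a join.
order_rows = [(order.order_id,)]
item_rows = [(order.order_id, item.sku, item.quantity) for item in order.items]

# Object style: the store keeps the graph in the application's own shape.
# A dict of deep copies stands in for the database here.
object_store = {order.order_id: copy.deepcopy(order)}
restored = object_store[1]

print(item_rows)
print(restored == order)
```

The second form needs no mapping layer, but it also shows where the boundary problems come from: the stored shape is whatever the application's classes happen to be.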

Examples of what I would call OODBMS include: Read more

Categories: Cache, In-memory DBMS, Intersystems and Cache', Memory-centric data management, Objectivity and Infinite Graph, OLTP, Software as a Service (SaaS), Starcounter

21 Comments

May 6, 2011

DB2 OLTP scale-out: pureScale

Tim Vincent of IBM talked me through DB2 pureScale Monday. IBM DB2 pureScale is a kind of shared-disk scale-out parallel OLTP DBMS, with some interesting twists. IBM’s scalability claims for pureScale, on a 90% read/10% write workload, include:

95% scalability up to 64 machines
90% scalability up to 88 machines
89% scalability up to 112 machines
84% scalability up to 128 machines

More precisely, those are counts of cluster “members,” but the recommended configuration is one member per operating system instance — i.e. one member per machine — for reasons of availability. In an 80% read/20% write workload, scalability is less — perhaps 90% scalability over 16 members.
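To get a feel for those figures, it can help to convert them into throughput measured in single-member equivalents. The sketch below assumes that “N% scalability” means N% of perfect linear scaling. That reading is an assumption, not IBM's stated definition, and `effective_members` is a hypothetical helper.

```python
# (members, scalability) pairs as quoted above for the 90% read/10% write workload.
claims = [(64, 0.95), (88, 0.90), (112, 0.89), (128, 0.84)]


def effective_members(members, scalability):
    """Throughput in single-member equivalents, assuming 'N% scalability'
    means N% of perfect linear scaling (an interpretation, not IBM's definition)."""
    return members * scalability


previous = None
for members, scalability in claims:
    effective = effective_members(members, scalability)
    line = f"{members:>3} members -> about {effective:.1f} members' worth of throughput"
    if previous is not None:
        prev_members, prev_effective = previous
        # Extra throughput gained per machine added since the previous data point.
        marginal = (effective - prev_effective) / (members - prev_members)
        line += f" (marginal gain per added member: {marginal:.2f})"
    print(line)
    previous = (members, effective)
```

Under that reading, the step from 112 to 128 members adds only about half a member's worth of throughput per machine added.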

Several elements of IBM’s DB2 pureScale architecture are pretty straightforward:

There are multiple pureScale members (machines), each with its own instance of DB2.
There’s an RDMA (Remote Direct Memory Access) interconnect, perhaps InfiniBand. (The point of InfiniBand and other RDMA is that moving data doesn’t require interrupts, and hence doesn’t cost many CPU cycles.)
The DB2 pureScale members share access to the database on a disk array.
Each DB2 pureScale member has its own log, also on the disk array.

Something called GPFS (General Parallel File System), which comes bundled with DB2, sits underneath all this. It’s all based on the mainframe technology IBM Parallel Sysplex.

The weirdest part (to me) of DB2 pureScale is something called the Global Cluster Facility, which runs on its own set of boxes. (Edit: Actually, see Tim Vincent’s comment below.) Read more

Categories: Cache, Clustering, IBM and DB2, OLTP, Oracle

15 Comments

May 3, 2011

Oracle and Exadata: Business and technical notes

Last Friday I stopped by Oracle for my first conversation since January, 2010, in this case for a chat with Andy Mendelsohn, Mark Townsend, Tim Shetler, and George Lumpkin, covering Exadata and the Oracle DBMS. Key points included: Read more

Categories: Analytic technologies, Cache, Clustering, Data warehouse appliances, Data warehousing, Emulation, transparency, portability, Exadata, MapReduce, Market share and customer counts, OLTP, Oracle, Parallelization, Predictive modeling and advanced analytics, Solid-state memory

9 Comments

February 8, 2011

Membase and CouchOne merged to form Couchbase

Membase, the company whose product is Membase and whose former company name is NorthScale, has merged with CouchOne, the company whose product is CouchDB and whose former name is Couch.io. The result (product and company) will be called Couchbase. CouchDB inventor Damien Katz will join the Membase (now Couchbase) management team as CTO. Couchbase can reasonably be regarded as a document-oriented NoSQL DBMS, a product category I not coincidentally posted about yesterday.

In essence, Couchbase will be CouchDB with scale-out. Alternatively, Couchbase will be Membase with a richer programming interface. The Couchbase sweet spot is likely to be: Read more

Categories: Application areas, Cache, Couchbase, CouchDB, Market share and customer counts, memcached, NoSQL, Open source, Parallelization, Solid-state memory

2 Comments

October 18, 2010

More notes on Membase and memcached

As a companion to my post about Membase last week, the company has graciously allowed me to post a rather detailed Membase slide deck. (It even has pricing.) Also, I left one point out.

Membase announced a Cloudera partnership. I couldn’t detect anything technically exciting about that, but it serves to highlight what I do find to be an interesting usage trend. A couple of big Web players (AOL and ShareThis) are using Hadoop to crunch data and derive customer profile data, then feed that back into Membase. Why Membase? Because it can serve up the profile in a millisecond, as part of a bigger 40-millisecond-latency request.

And why Hadoop, rather than Aster Data nCluster, which ShareThis also uses? Umm, I didn’t ask.

When I mentioned this to Colin Mahony, he said Vertica had similar stories. However, I don’t recall whether they were about Membase or just memcached, and he hasn’t had a chance to get back to me with clarification. (Edit: As per Colin’s comment below, it’s both.)
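The usage pattern described above, batch analytics feeding a low-latency key-value store, can be sketched in a few lines. Everything here is a stand-in: a Python loop plays the role of the Hadoop job, a dictionary plays the role of Membase, and the event log and profile fields are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical click log of (user_id, topic) pairs. In the pattern described
# above this would be crunched in Hadoop, not in a Python loop.
events = [("u1", "sports"), ("u1", "sports"), ("u1", "tech"), ("u2", "travel")]


def derive_profiles(log):
    """Batch step: reduce raw events to one small profile per user."""
    counts = defaultdict(Counter)
    for user_id, topic in log:
        counts[user_id][topic] += 1
    return {u: {"top_topic": c.most_common(1)[0][0]} for u, c in counts.items()}


# Load step: a dict stands in for the key-value store.
profile_store = derive_profiles(events)


def handle_request(user_id):
    """Serving step: one key lookup, no analytics at request time."""
    profile = profile_store.get(user_id, {"top_topic": None})
    return f"content for topic: {profile['top_topic']}"


print(handle_request("u1"))
```

The design point is that the expensive work happens offline, so the request path pays only for a single lookup inside its overall latency budget.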

Categories: Aster Data, Cache, Cloudera, Couchbase, Hadoop, memcached, Memory-centric data management, NoSQL, Pricing, Specific users, Vertica Systems, Web analytics

7 Comments
