Cache

Analysis of technologies that accelerate database management via caching. Related subjects include:

October 15, 2015

Couchbase 4.0 and related subjects

I last wrote about Couchbase in November, 2012, around the time of Couchbase 2.0. One of the many new features I mentioned then was secondary indexing. Ravi Mayuram just checked in to tell me about Couchbase 4.0. One of the important new features he mentioned was what I think he said was Couchbase’s “first version” of secondary indexing. Obviously, I’m confused.

Now that you’re duly warned, let me remind you of aspects of Couchbase timeline.

2 corporate name changes ago, Couchbase was organized to commercialize memcached. memcached, of course, was internet companies’ default way to scale out short-request processing before the rise of NoSQL, typically backed by manually sharded MySQL.
Couchbase’s original value proposition, under the name Membase, was to provide persistence and of course support for memcached. This later grew into a caching-oriented pitch even to customers who weren’t already memcached users.
A merger with the makers of CouchDB ensued, with the intention of replacing Membase’s SQLite back end with CouchDB at the same time as JSON support was introduced. This went badly.
By now, however, Couchbase sells for more than distributed cache use cases. Ravi rattled off a variety of big-name customer examples for system-of-record kinds of use cases, especially in session logging (duh) and also in travel reservations.
Couchbase 4.0 has been in beta for a few months.

Technical notes on Couchbase 4.0 — and related riffs 🙂 — start: Read more

Categories: Cache, Clustering, Couchbase, Data models and architecture, Databricks, Spark and BDAS, Exadata, Hadoop, MarkLogic, MongoDB, MySQL, NoSQL, Open source, Schema on need, Structured documents, Web analytics

1 Comment

September 8, 2013

Layering of database technology & DBMS with multiple DMLs

Two subjects in one post, because they were too hard to separate from each other

Any sufficiently complex software is developed in modules and subsystems. DBMS are no exception; the core trinity of parser, optimizer/planner, and execution engine merely starts the discussion. But increasingly, database technology is layered in a more fundamental way as well, to the extent that different parts of what would seem to be an integrated DBMS can sometimes be developed by separate vendors.

Major examples of this trend — where by “major” I mean “spanning a lot of different vendors or projects” — include:

The object/relational, aka universal, extensibility features developed in the 1990s for Oracle, DB2, Informix, Illustra, and Postgres. The most successful extensions probably have been:
- Geospatial indexing via ESRI.
- Full-text indexing, notwithstanding questionable features and performance.
MySQL storage engines.
MPP (Massively Parallel Processing) analytic RDBMS relying on single-node PostgreSQL, Ingres, and/or Microsoft SQL Server — e.g. Greenplum (especially early on), Aster (ditto), DATAllegro, DATAllegro’s offspring Microsoft PDW (Parallel Data Warehouse), or Hadapt.
Splits in which a DBMS has serious processing both in a “database” layer and in a predicate-pushdown “storage” layer — most famously Oracle Exadata, but also MarkLogic, InfiniDB, and others.
SQL-on-HDFS — Hive, Impala, Stinger, Shark and so on (including Hadapt).

Other examples on my mind include:

Data manipulation APIs being added to key-value stores such as Couchbase and Aerospike.
TokuMX, the Tokutek/MongoDB hybrid I just blogged about.
NuoDB’s willing reliance on third-party key-value stores (or HDFS in the role of one).
FoundationDB’s strategy, and specifically its acquisition of Akiban.

And there are several others I hope to blog about soon, e.g. current-day PostgreSQL.

In an overlapping trend, DBMS increasingly have multiple data manipulation APIs. Examples include: Read more

Categories: Aerospike, Akiban, Aster Data, Cache, Calpont, Cloudera, Data models and architecture, Database diversity, Databricks, Spark and BDAS, DATAllegro, Derived data, Greenplum, Hadapt, Hadoop, JPMorgan Chase, NoSQL, NuoDB, Parallelization, Solid-state memory, SQL/Hadoop integration, Structured documents, Text

7 Comments

January 12, 2013

Introduction to NuoDB

NuoDB has an interesting NewSQL story. NuoDB’s core design goals seem to be:

SQL.
Transactions.
Very flexible topology, including:
- Local replicas.
- Remote replicas.
- Easy deployment and management.

Categories: Cache, Cloud computing, Clustering, Database compression, NewSQL, NuoDB

5 Comments

January 7, 2013

Introduction to GenieDB

GenieDB is one of the newer and smaller NewSQL companies. GenieDB’s story is focused on wide-area replication and uptime, coupled to claims about ease and the associated low TCO (Total Cost of Ownership).

GenieDB is in my same family of clients as Cirro.

The GenieDB product is more interesting if we conflate the existing GenieDB Version 1 and a soon-forthcoming (mid-year or so) Version 2. On that basis:

GenieDB has three tiers.
GenieDB’s top tier is the usual MySQL front-end.
GenieDB’s bottom tier is either Berkeley DB or a conventional MySQL storage engine.
GenieDB’s bottom tier stores your entire database at every node.
If you replicate locally, GenieDB’s middle tier operates a distributed cache.
If you replicate wide-area, GenieDB’s middle tier allows active-active/multi-master replication.

The heart of the GenieDB story is probably wide-area replication. Specifics there include: Read more

Categories: Cache, Cloud computing, Clustering, GenieDB, Market share and customer counts, MySQL, NewSQL

4 Comments

November 19, 2012

Couchbase 2.0

My clients at Couchbase checked in.

After multiple delays, Couchbase 2.0 is well into beta, with general availability being delayed by the holiday season as much as anything else.
Couchbase (the company) now has >350 subscription customers, almost all for Couchbase (the product) — which is to say for what was known as Membase, which is basically a persistent version of Memcached.
There also are many users of open source Couchbase, most famously LinkedIn.
Orbitz is a much-mentioned flagship paying Couchbase customer.
Couchbase customers mainly seem to be replacing a caching layer, Memcached or otherwise.
Couchbase headcount is just under 100.

The big changes in Couchbase 2.0 versus the previous (1.8.x) version are:

JSON storage, including secondary indexes.
Multi-data-center replication.
A back-end change from SQLite to a heavily forked version of CouchDB, called Couchstore.

Couchbase 2.0 is upwards-compatible with prior versions of Couchbase (and hence with Memcached), but not with CouchDB.

Technology notes on Couchbase 2.0 include: Read more

Categories: Basho and Riak, Cache, Cassandra, Clustering, Couchbase, MapReduce, Market share and customer counts, MongoDB, NoSQL, Open source, Structured documents

5 Comments

September 7, 2012

Integrated internet system design

What are the central challenges in internet system design? We probably all have similar lists, comprising issues such as scale, scale-out, throughput, availability, security, programming ease, UI, or general cost-effectiveness. Screw those up, and you don’t have an internet business.

Much new technology addresses those challenges, with considerable success. But the success is usually one silo at a time — a short-request application here, an analytic database there. When it comes to integration, unsolved problems abound.

The top integration and integration-like challenges for me, from a practical standpoint, are:

Integrating silos — a decades-old problem still with us in a big way.
Dynamic schemas with joins.
Low-latency business intelligence.
Human real-time personalization.

Other concerns that get mentioned include:

Geographical distribution due to privacy laws, which for some users is a major requirement for compliance.
Logical data warehouse, a term that doesn’t actually mean anything real.
In-memory data grids, which some day may no longer always be hand-coupled to the application and data stacks they accelerate.

Let’s skip those latter issues for now, focusing instead on the first four.

Categories: About this blog, Business intelligence, Cache, Clustering, Data integration and middleware, Data warehousing, Database diversity, EAI, EII, ETL, ELT, ETLT, Exadata, NoSQL, OLTP, Oracle, Predictive modeling and advanced analytics, SAP AG, Surveillance and privacy

6 Comments

August 20, 2012

In-memory, (hybrid) memory-centric DBMS — three analytic glossary draft entries

These are three closely-related draft entries for the DBMS2 analytic glossary. Please comment with any ideas you have for their improvement!

1. We coined the term memory-centric data management to comprise several kinds of technology that manage data in RAM (Random Access Memory), including:

In-memory DBMS (DataBase Management Systems).
Hybrid memory-centric DBMS.
Other kinds of in-memory data stores, such as:
- Caching layers.
- In-memory data stores that are tightly tied to specific analytic tools, for example the in-memory data management part of QlikView.
Complex event/stream processing.

Related link

Many examples of memory-centric data management (April, 2012)

2. An in-memory DBMS is a DBMS designed under the assumption that substantially all database operations will be performed in RAM (Random Access Memory). Thus, in-memory DBMS form a subcategory of memory-centric data management systems.

Ways in which in-memory DBMS are commonly different from those that query and update persistent storage include: Read more

Categories: Analytic glossary, Cache, In-memory DBMS, Memory-centric data management, Streaming and complex event processing (CEP)

7 Comments

August 6, 2012

Notes, links and comments August 6, 2012

I haven’t done a notes/link/comments post for a while. Time for a little catch-up.

1. MySQL now has a memcached integration story. I haven’t checked the details. The MySQL team is pretty hard to talk with, due to the heavy-handedness of Oracle’s analyst relations.

2. The Large Hadron Collider offers some serious numbers, including:

1 petabyte/second.
6 x 10⁹ collisions/second.
Only 1 in 10¹³collision records kept (which I guess knocks things down to a 100 byte/second average, from the standpoint of persistent storage).
Real-time filtering by a cluster of several thousand machines, over a 25 nanosecond period.

3. One application area we don’t talk about much for analytic technologies is education. However: Read more

Categories: Cache, memcached, Memory-centric data management, MySQL, Open source, Petabyte-scale data management, RDF and graphs, Scientific research

Memory-centric data management when locality matters

Ron Pressler of Parallel Universe/SpaceBase pinged me about a data grid product he was open sourcing, called Galaxy. The idea is that a distributed RAM grid will allocate data, not randomly or via consistent hashing, but rather via a locality-sensitive approach. Notes include:

The original technology was developed to track moving objects on behalf of the Israeli Air Force.
The commercial product is focused on MMO (Massively MultiPlayer Online) games (or virtual worlds).
The underpinnings are being open sourced.
Ron suggests that, among other use cases, Galaxy might work well for graphs.
Ron argues that one benefit is that when lots of things cluster together — e.g. characters in a game — there’s a natural way to split them elastically (shrink the radius for proximity).
The design philosophy seems to be to adapt as many ideas as possible from the way CPUs manage (multiple levels of) RAM cache.

The whole thing is discussed in considerable detail in a blog post and a especially in a Hacker News comment thread. There’s also an error-riddled TechCrunch article. Read more

Categories: Cache, Clustering, Games and virtual worlds, GIS and geospatial, Open source, RDF and graphs, Scientific research, Streaming and complex event processing (CEP)

2 Comments

June 14, 2012

Workday update

In August 2010, I wrote about Workday’s interesting technical architecture, highlights of which included:

Lots of small Java objects in memory.
A very simple MySQL backing store (append-only, <10 tables).
Some modernistic approaches to application navigation.
A faceted approach to BI.

I caught up with Workday recently, and things have naturally evolved. Most of what we talked about (by my choice) dealt with data management, business intelligence, and the overlap between the two.

It is now reasonable to say that Workday’s servers fall into at least seven tiers, although we talked mainly about five that work together as a kind of giant app/database server amalgamation. The three that do noteworthy data management can be described as:

In-memory objects and transactions. This is similar to what Workday had before.
Persistent MySQL. Part of this is similar to what Workday had before. In addition, Workday is now storing certain data in tables in the ordinary relational way.
In-memory caching and indexing. This has three aspects:
- Indexes for the ordinary relational tables, organized in interesting ways.
- Indexes for Workday’s search-box navigation (as per my original Workday technical post, you can search across objects, task-names, etc.).
- Compressed copies of the Java objects, used to instantiate other servers as needed. The most obvious uses of this are:
  - Recovery for the object/transaction tier.
  - Launch for the elastic compute tier. (Described below.)

Two other Workday server tiers may be described as: Read more

Categories: Application servers, Business intelligence, Cache, Cloud computing, Data models and architecture, Database compression, Database diversity, EAI, EII, ETL, ELT, ETLT, Memory-centric data management, MySQL, Parallelization, Software as a Service (SaaS), Workday

5 Comments

Monash Research blogs

DBMS 2 covers database management, analytics, and related technologies.
Text Technologies covers text mining, search, and social software.
Strategic Messaging analyzes marketing and messaging strategy.
The Monash Report examines technology and public policy issues.
Software Memories recounts the history of the software industry.

User consulting

Building a short list? Refining your strategic plan? We can help.

Vendor advisory

We tell vendors what's happening -- and, more important, what they should do about it.

Monash Research highlights

Learn about white papers, webcasts, and blog highlights, by RSS or email.

Links
- Monash Research
- White Papers
Admin
- Log in

Cache

Couchbase 4.0 and related subjects

Layering of database technology & DBMS with multiple DMLs

Introduction to NuoDB

Introduction to GenieDB

Couchbase 2.0

Integrated internet system design

In-memory, (hybrid) memory-centric DBMS — three analytic glossary draft entries

Notes, links and comments August 6, 2012

Memory-centric data management when locality matters

Workday update

Search our blogs and white papers

Monash Research blogs

User consulting

Vendor advisory

Monash Research highlights

Recent posts

Categories

Date archives

Admin