Data/database virtualization seems to be a hot subject right now, and vendors of a wide variety of technologies all claim to be in the space. A terminological mess has ensued, as Monash’s First and Third Laws of Commercial Semantics are borne out in spades.
If something is like “virtualization”, then it should resemble hypervisors such as VMware. To me:
- The core feature of a hypervisor is that it allows many somethings to run and coexist where ordinarily only one something would come into play. Here the “many somethings” are virtual machines and what’s going on inside them, and the “one something” is the ordinary operating system/hardware computing stack.
- A core feature of original VMware was that the “many somethings” could be quite different — for example, the operating environments of numerous different hardware systems you wanted to decommission, or of new systems that you didn’t want to buy quite yet.
- Important features of hypervisors include:
  - The ability to have multiple virtual machines run side by side at once, safely.
  - Flexible and powerful workload management if the virtual machines do contend for resources.
  - Easy management.
  - Sufficiently low overhead, a feature defined by the absence of a problem.
Anything that claims to be “like virtualization” should be viewed in that light. I.e., it isn’t real virtualization unless it has the ex uno plures* feature.
*“Out of one, many”. It turns out that e unum pluribus just means the same as e pluribus unum, namely “Out of many, one”; word order isn’t as important in Latin as in English.
Most commonly, “data/database virtualization” is used to denote some kind of transparent data federation.
- Forrester Research, in a recent Forrester Wave, conflates that with “Information as a Service”.
- Informatica’s data virtualization marketing page gives one vendor’s view as to which capabilities could be involved.
- Logical data warehouse would seem to be a related concept.
I think “virtualization” is a bad name for this, because there isn’t much ex uno plures going on. But at least it’s a name that’s in widespread use.
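To make the federation sense concrete, here is a minimal sketch of a facade that presents several independent sources as one queryable view. Everything in it (`FederatedView`, `crm_rows`, `billing_rows`, the sample rows) is hypothetical and chosen for illustration; it is not any vendor's API.

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]


class FederatedView:
    """Hypothetical facade exposing several data sources as one table."""

    def __init__(self, sources: Iterable[Callable[[], Iterable[Row]]]):
        # Each source is a function that returns rows from one backing system.
        self._sources = list(sources)

    def select(self, predicate: Callable[[Row], bool]) -> List[Row]:
        # The caller issues one query; the view fans it out to every source
        # and concatenates the matching rows in source order.
        results: List[Row] = []
        for fetch in self._sources:
            results.extend(row for row in fetch() if predicate(row))
        return results


# Two made-up backing systems standing in for separate databases.
def crm_rows() -> List[Row]:
    return [
        {"customer": "acme", "region": "east"},
        {"customer": "globex", "region": "west"},
    ]


def billing_rows() -> List[Row]:
    return [{"customer": "initech", "region": "east"}]


view = FederatedView([crm_rows, billing_rows])
east = view.select(lambda row: row["region"] == "east")
```

Note the direction of the mapping: many sources are made to look like one, which is closer to e pluribus unum than to ex uno plures.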
More solid is the sense of “database virtualization” used by Delphix. Their core idea is to take all your different database copies for production, test, development, archiving and so on, and to the extent possible turn them into one real database, plus a bunch of diffs. Cost savings are obvious if that works. The ex uno plures feature is present.
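The "one real database, plus a bunch of diffs" idea can be sketched with a toy copy-on-write structure. This is an illustration under simplifying assumptions (an in-memory dict standing in for a database, a hypothetical `VirtualCopy` class), not a description of how Delphix actually implements it.

```python
class VirtualCopy:
    """Hypothetical virtual database copy: a shared base plus private diffs."""

    def __init__(self, base: dict):
        self._base = base      # shared by all copies; never modified here
        self._diff = {}        # keys written through this copy
        self._deleted = set()  # keys deleted through this copy

    def read(self, key):
        if key in self._deleted:
            raise KeyError(key)
        if key in self._diff:
            return self._diff[key]
        return self._base[key]

    def write(self, key, value):
        # Writes land in the private diff, so other copies are unaffected.
        self._deleted.discard(key)
        self._diff[key] = value

    def delete(self, key):
        self._diff.pop(key, None)
        self._deleted.add(key)

    def private_size(self) -> int:
        # Rough measure of what this copy stores beyond the shared base.
        return len(self._diff) + len(self._deleted)


production = {"row1": "alice", "row2": "bob", "row3": "carol"}
test_copy = VirtualCopy(production)
dev_copy = VirtualCopy(production)

test_copy.write("row2", "bob-masked")  # visible only in the test copy
dev_copy.delete("row3")                # visible only in the dev copy
```

Each copy behaves like a full database to its user, yet stores only its own changes, which is where the cost savings would come from.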
Recently, I’ve noticed that transparent sharding is being referred to as database virtualization, especially by ParElastic. Transparent sharding is a great feature, but I don’t think calling it “database virtualization” makes much sense.
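For readers unfamiliar with the term, transparent sharding can be sketched roughly as follows: the application talks to one logical store, and a routing layer decides which physical shard holds each key. The `ShardedStore` class and the hash-modulo routing rule are assumptions made for illustration; they are not ParElastic's design.

```python
import hashlib


class ShardedStore:
    """Hypothetical single-store interface backed by several shards."""

    def __init__(self, shard_count: int):
        # Each dict stands in for a separate physical database.
        self._shards = [{} for _ in range(shard_count)]

    def _shard_for(self, key: str) -> dict:
        # Stable hash of the key, reduced modulo the number of shards.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self._shards)
        return self._shards[index]

    def put(self, key: str, value) -> None:
        self._shard_for(key)[key] = value

    def get(self, key: str):
        return self._shard_for(key)[key]

    def shard_sizes(self) -> list:
        return [len(shard) for shard in self._shards]


store = ShardedStore(4)
for i in range(100):
    store.put(f"user{i}", i)
```

The caller never names a shard, which is the "transparent" part. As with federation, several physical databases are made to look like one, rather than one being made to look like many.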
I noted back in October that the essence of multitenancy is a special-case version of ex uno plures. If somebody offered that and wanted to call it “virtualization”, I might not argue too much.
Weirdest of all is ScaleDB’s use of the term. ScaleDB seems to be claiming that:
- Any interesting database topology should be called “database virtualization”.
- The highest and best form of database virtualization is a clustered, shared-everything DBMS approach such as Oracle RAC.
Neither logic nor language supports ScaleDB’s side.