salesforce.com, force.com, and database.com use exactly the same database infrastructure and architecture. That’s the good news. The bad news is that salesforce.com is somewhat obscure about technical details, for reasons such as:
- A long-ago marketing decision to not give infrastructure details, so as to convey a “Don’t worry; we’ll take care of everything” message.
- Even so, a long-ago and perhaps now-regretted marketing decision to disclose and even exaggerate salesforce.com’s reliance on Oracle, as part of an early-days attempt to prove salesforce was using enterprise-class technology.
- A desire to hide the recipe for salesforce.com’s secret sauce.
- Force of habit — I’m not sure salesforce even knows how to tell its technical story with any clarity.
Actually, salesforce.com has moved some kinds of data out of Oracle that previously used to be stored there. Besides Oracle, salesforce uses at least a file system and a RAM-based data store about which I have no details. Even so, much of salesforce.com’s data is stored in Oracle — a single instance of Oracle, which it believes may be the largest instance of Oracle in the world.
Salesforce did spell out some of its database story in a 2008 force.com white paper, which is good stuff, but potentially misleading in one important way. The paper tells of a level of abstraction, whereby what the application sees as logical “columns” are stored in a very different schema than one might assume. However, it doesn’t spell out a second level of abstraction, whereby that logical schema also isn’t how the database is actually laid out.
Another flaw in the paper is that it spins “We had to do this, to support multitenancy, so we did.” issues as “Because we’re multitenant, we can do this, while single-tenant systems can’t.” One example is the query optimization step around “user visibility” in Figure 11. Welcome to marketing.
At the first level of abstraction, data seems to be kept mainly in a single wide table, with hundreds of columns. What’s more, many of those are “flex columns”; a flex column can hold data of many different kinds and even datatypes. Notwithstanding the second level of abstraction, I imagine the idea of stuffing different kinds of thing into the same column has something to do with the fact that Oracle’s physical limit on columns falls far short of the number of logical columns salesforce wants to use.
If we imagine that the different kinds of data in a flex column were each in their own column instead, the whole thing might sound like BigTable/Cassandra/HBase-style column-group NoSQL. Thus, much as Workday uses MySQL to simulate a key-value store, salesforce.com can be said to use Oracle to simulate a different kind of NoSQL. In both cases, what’s going on seems to be a kind of object/relational mapping, but with the relational aspect strongly deemphasized. Or, if you take a more relational view, we could say that salesforce.com’s tables are a lot wider than any one user organization’s, because each user sees only its own custom columns (plus the standard ones common to all users).
The second layer of abstraction has a lot to do with multitenancy. If you want to stick data for many different user organizations into the same huge table, then you have to label it in some way to show who is permitted to see or update each part. Logically, this leads to a join, between one table carrying data plus a simple key showing which users/roles are entitled to see it, and a second table showing who actually is that kind of user/has that kind of role. But that join makes a lot of sense to store in a denormalized way, all the more because data is partitioned across the computer cluster in line with which user organization it actually belongs to.
Multitenant security isn’t the only reason for this denormalization, but it appears to be the biggest one.
The whole thing is doing 550 million or so transactions per day. salesforce.com thinks that fact should be regarded as evidence that it works.