September 15, 2011

The database architecture of salesforce.com, force.com, and database.com

salesforce.com, force.com, and database.com use exactly the same database infrastructure and architecture. That’s the good news. The bad news is that salesforce.com is somewhat obscure about technical details, for reasons such as:

Actually, salesforce.com has moved some kinds of data out of Oracle that used to be stored there. Besides Oracle, salesforce uses at least a file system and a RAM-based data store about which I have no details. Even so, much of salesforce.com’s data is stored in Oracle: a single instance of Oracle, which it believes may be the largest instance of Oracle in the world.

Salesforce did spell out some of its database story in a 2008 force.com white paper, which is good stuff, but potentially misleading in one important way. The paper tells of a level of abstraction, whereby what the application sees as logical “columns” are stored in a very different schema than one might assume. However, it doesn’t spell out a second level of abstraction, whereby that logical schema also isn’t how the database is actually laid out.

Another flaw in the paper is that it spins “We had to do this, to support multitenancy, so we did.” issues as “Because we’re multitenant, we can do this, while single-tenant systems can’t.” One example is the query optimization step around “user visibility” in Figure 11. Welcome to marketing.

At the first level of abstraction, data seems to be kept mainly in a single wide table, with hundreds of columns. What’s more, many of those are “flex columns”; a flex column can hold data of many different kinds and even datatypes. Notwithstanding the second level of abstraction, I imagine the idea of stuffing different kinds of thing into the same column has something to do with the fact that Oracle’s physical limit on columns falls far short of the number of logical columns salesforce wants to use.

If we imagine that the different kinds of data in a flex column were each in their own column instead, the whole thing might sound like BigTable/Cassandra/HBase-style column-group NoSQL. Thus, much as Workday uses MySQL to simulate a key-value store, salesforce.com can be said to use Oracle to simulate a different kind of NoSQL. In both cases, what’s going on seems to be a kind of object/relational mapping, but with the relational aspect strongly deemphasized. Or, if you take a more relational view, we could say that salesforce.com’s tables are a lot wider than any one user organization’s, because each user sees only its own custom columns (plus the standard ones common to all users).
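To make the flex-column idea concrete, here is a minimal sketch in Python. It is not salesforce.com's actual schema; the names (`FIELD_MAP`, `WIDE_TABLE`, `logical_rows`, `add_custom_field`), the two sample organizations, and the choice to store every slot as text are all assumptions made for illustration. The point is only that one physical slot can hold different kinds of data for different tenants, with per-tenant metadata turning slots back into named, typed columns.

```python
from datetime import date

# Hypothetical per-tenant metadata: (org, object) -> {field name: (slot, parser)}.
# Nothing here is taken from salesforce.com; it is illustration only.
FIELD_MAP = {
    ("org_a", "Invoice"): {"amount": (0, float), "due": (1, date.fromisoformat)},
    ("org_b", "Candidate"): {"name": (0, str), "years_experience": (1, int)},
}

# One shared wide table. Every row has the same generic slots, stored as text,
# so slot 0 holds a number for org_a and a person's name for org_b.
WIDE_TABLE = [
    {"org": "org_a", "obj": "Invoice", "slots": ["129.50", "2011-10-01"]},
    {"org": "org_b", "obj": "Candidate", "slots": ["Pat Lee", "7"]},
]


def logical_rows(org, obj):
    """Yield rows as one tenant sees them: named, typed columns."""
    fields = FIELD_MAP[(org, obj)]
    for row in WIDE_TABLE:
        if row["org"] != org or row["obj"] != obj:
            continue
        slots = row["slots"]
        out = {}
        for name, (slot, parse) in fields.items():
            # A slot that was never written for this row reads as None.
            raw = slots[slot] if slot < len(slots) else None
            out[name] = parse(raw) if raw is not None else None
        yield out


def add_custom_field(org, obj, name, parse):
    """Add a logical column by editing metadata only; no rows are rewritten."""
    fields = FIELD_MAP[(org, obj)]
    slot = len(fields)  # assumes slots are handed out densely, in order
    fields[name] = (slot, parse)
    return slot
```

In this sketch, adding a custom field for one organization changes only that organization's metadata, which is the sense in which each user sees only its own custom columns.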

The second layer of abstraction has a lot to do with multitenancy. If you want to stick data for many different user organizations into the same huge table, then you have to label it in some way to show who is permitted to see or update each part. Logically, this leads to a join, between one table carrying data plus a simple key showing which users/roles are entitled to see it, and a second table showing who actually is that kind of user/has that kind of role. But that join makes a lot of sense to store in a denormalized way, all the more because data is partitioned across the computer cluster in line with which user organization it actually belongs to.

Multitenant security isn’t the only reason for this denormalization, but it appears to be the biggest one.
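A rough sketch of the join-versus-denormalized trade-off, again in Python, may help. Everything in it is hypothetical: the table contents, the `share_key` field, and the function names are my own illustration, and I have no information that salesforce.com stores visibility this way. It shows only that the logical join can be precomputed per organization, which fits naturally with data that is already partitioned by organization.

```python
# Hypothetical data rows: each carries a simple key saying who may see it.
DATA = [
    {"id": 1, "org": "org_a", "share_key": "sales", "payload": "Acme deal"},
    {"id": 2, "org": "org_a", "share_key": "execs", "payload": "Board memo"},
    {"id": 3, "org": "org_b", "share_key": "sales", "payload": "Other tenant"},
]

# Hypothetical membership table: (org, share_key, user).
MEMBERSHIP = [
    ("org_a", "sales", "alice"),
    ("org_a", "execs", "bob"),
    ("org_a", "sales", "bob"),
    ("org_b", "sales", "carol"),
]


def visible_via_join(org, user):
    """Normalized form: join data rows to membership at query time."""
    keys = {k for (o, k, u) in MEMBERSHIP if o == org and u == user}
    return [r["id"] for r in DATA if r["org"] == org and r["share_key"] in keys]


def denormalize():
    """Precompute the join once: org -> user -> ids of visible rows."""
    index = {}
    for r in DATA:
        for (o, k, u) in MEMBERSHIP:
            if o == r["org"] and k == r["share_key"]:
                index.setdefault(o, {}).setdefault(u, []).append(r["id"])
    return index


def visible_denormalized(index, org, user):
    """Denormalized form: one lookup inside the org's own partition."""
    return index.get(org, {}).get(user, [])
```

Both functions return the same row ids for a given user; the denormalized version pays for that at write time, since any change to membership or sharing means the precomputed index has to be updated.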

The whole thing is doing 550 million or so transactions per day. salesforce.com thinks that fact should be regarded as evidence that it works. :)
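For a sense of scale, the daily figure can be converted to average rates. The short calculation below assumes the load is spread evenly over a full 24-hour day, which real traffic will not be, so peak rates would be higher than these averages.

```python
# Average rates implied by roughly 550 million transactions per day,
# assuming (unrealistically) an even spread over 24 hours.
TX_PER_DAY = 550_000_000

per_second = TX_PER_DAY / (24 * 60 * 60)
per_minute = TX_PER_DAY / (24 * 60)

print(f"{per_second:,.0f} transactions/second on average")
print(f"{per_minute:,.0f} transactions/minute on average")
```

That works out to a bit under 6,400 transactions per second, or roughly 380,000 per minute.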

Comments

18 Responses to “The database architecture of salesforce.com, force.com, and database.com”

  1. salesforce.com, force.com, database.com, data.com, heroku.com — notes and context | DBMS 2 : DataBase Management System Services on September 15th, 2011 11:11 am

    […] database architecture of salesforce.com, force.com, and database.com Categories: Pricing, Software as a Service (SaaS), salesforce.com  Subscribe to our […]

  2. Dan Koren on September 16th, 2011 3:05 am

    Curt,

    550M transactions per day (24h?) amounts to
    about 6366 transactions per second. Far less
    impressive than it sounds.

    Dan Koren

  3. Gary on September 16th, 2011 6:07 am

    ” which it believes may be the largest instance of Oracle in the world.” Is that a measure of the number of CPU/cores, or in data volume or some other measure ?

  4. State of Data #66 « Dr Data's Blog on September 16th, 2011 8:58 am

    […] – Database architecture of Salesforce.com, force.com and database.com – Oracle databases doing 550M […]

  5. Curt Monash on September 16th, 2011 1:29 pm

    Gary,

    I don’t know.

    I’d GUESS CPUs/cores or something — SaaS applications lend themselves to parallelization really well, because you’re doing the same thing at once on many small subsets of the data.

  6. Curt Monash on September 16th, 2011 1:36 pm

    Dan,

    Good point of arithmetic. I would guess those are heftier transactions than one sees on most other real-world systems that average 1000s of transactions/second. But yes, this is a really big, serious transactional database, not something that is off-the-charts humongous.

    One thing to note is that salesforce is not pitching “Buy from us and set up a database just like the one big one we’ve proven the technology on already.” Rather, salesforce is pitching “Buy from us and participate in the big database that’s already been up and running without serious incident for years.” The latter is indeed a somewhat stronger pitch.

  7. Gera Shegalov on September 16th, 2011 2:01 pm

    I would like to point out that it’s not a single Oracle instance but rather a single Oracle Real Application Cluster (Oracle RAC) on several high-end Sun boxes that backs all these force’s. At Oracle an instance refers to a single server. An Oracle database (collection of physical files) is managed either by a single instance or by RAC consisting of multiple instances.

    I attended SOCC’10 last year where Salesforce’s CTO Rob Woollen gave a keynote on the force.com architecture [http://research.microsoft.com/en-us/um/redmond/events/socc2010/woollen.htm]. Judging by my own recollection and the notes a fellow attendee took (http://sna-projects.com/blog/2010/06/socc-2010-updates/), Salesforce used an 8-node Oracle RAC as of 2010, and Rob was talking about gradual extension of this installation over the years. It might be more nodes by now.

  8. Curt Monash on September 16th, 2011 2:57 pm

    Dan,

    We should also note that it’s easy to confuse our intuitions about transactions/second and transactions/minute. 380,000 transactions/minute is quite a lot, especially as an average rather than a peak.

  9. Curt Monash on September 16th, 2011 5:00 pm

    Thanks for the data, Gera!

  10. Hubi on September 18th, 2011 1:49 am

    That kind of throughput over a long period of time is at the top end of what Microsoft also considers large database workloads

    210B txn p.a.
    396K txn per minute

    http://www.microsoft.com/casestudies/Microsoft-SQL-Server-2008/Mediterranean-Shipping-Company/Mediterranean-Shipping-Company-Managing-22-Terabytes-of-data-with-SQL-Server-2008/4000003470

  11. Meng Mao on September 18th, 2011 10:32 pm

    If some of the columns are what amount to custom fields for certain customers, does that mean signing a new major customer might require a (presumably extensive) migration of their table?

  12. La petite Revue de Presse du Décisionnel | www.LeGrandBI.com on September 19th, 2011 3:03 am

    […] salesforce.com, force.com, and database.com use exactly the same database infrastructure and architecture. That’s the good news. The bad news is that salesforce.com is somewhat obscure about technical details… Lire l’article […]

  13. Curt Monash on September 19th, 2011 10:32 am

    @Meng,

    Nope. You just add new custom columns (at least logically). Hence my analogy to BigTable/Cassandra/HBase.

  14. RussellH on March 21st, 2012 1:33 pm

  15. Is salesforce.com going to stick with Oracle? | DBMS 2 : DataBase Management System Services on June 26th, 2012 10:54 am

    […] going to stick with Oracle?” So let me refer to and expand upon my previous post about salesforce.com’s database architecture by […]

  16. Choosing and Using a CRM | the engine room on July 18th, 2012 12:55 pm

    […] Unknown […]

  17. Cloud databases 101: Who builds ‘em and what they do — Cloud Computing News on July 20th, 2012 1:53 pm

    […] standalone database service, Database.com, isn’t exactly NoSQL, but it isn’t exactly a relational database, either. What it is for sure is the same multitenant database architecture that has been underneath […]

  18. Salesforce vs. Microsoft CRM - An Evolutionary Tale | CRM Switch on June 25th, 2013 3:35 pm

    […] the most scalable database available at the time, which was Oracle. However, salesforce.com had to architect the database very in a specific way that would support multi-tenant scalability and security as well efficient […]
