That multi-tenancy discussion revisited
Keeping in mind Monash’s Third Law of Commercial Semantics,
No market categorization is ever precise
I’ll try to clarify my response to Oracle’s claims about Oracle12c being a “multi-tenant” DBMS.
I wrote a couple of days ago:
Oracle is confusing people with its comments on multi-tenancy. I suspect:
- What Oracle is talking about when it says “multi-tenancy” is more like consolidation than true multi-tenancy.
- Probably there are a couple of true multi-tenancy features as well.
Now I’m even having doubts about the second part.
In simplest terms:
- Multi-tenancy is about making a single thing appear to be many different ones — typically one for each customer. Here the “things” can be databases and/or instances of the (same) application that talks to them.
- Database consolidation is about letting many different databases be hosted or managed more as one.
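To make the distinction concrete, here’s a minimal Python sketch — hypothetical table and tenant names, nothing Oracle-specific — of the two patterns:

```python
# Multi-tenancy: ONE shared database serves MANY customers. Every row
# carries a tenant identifier, and every query is scoped to one tenant,
# so each customer sees what looks like a private database.
shared_orders = [
    {"tenant_id": "acme",   "order_id": 1, "total": 100},
    {"tenant_id": "zenith", "order_id": 2, "total": 250},
]

def orders_for(tenant_id):
    # One shared table made to appear private to each tenant.
    return [row for row in shared_orders if row["tenant_id"] == tenant_id]

# Consolidation: MANY separate databases are hosted or managed on one
# server/instance, but each remains its own self-contained database.
consolidated_databases = {
    "acme_db":   [{"order_id": 1, "total": 100}],
    "zenith_db": [{"order_id": 2, "total": 250}],
}

print(orders_for("acme"))                 # the multi-tenant view
print(consolidated_databases["acme_db"])  # the consolidated view
```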
But from everything I’ve heard:
- Oracle12c’s announced new features improve database consolidation, not multi-tenancy.
More detail may be found at the links above.
Notes on the Oracle OpenWorld Sunday keynote
I’m not at Oracle OpenWorld, but as usual that won’t keep me from commenting. My bottom line on the first night’s announcements is:
- At many large enterprises, Oracle has a lock on much of their IT efforts. (But not necessarily in the internet or investigative analytics areas.) Tonight’s announcements serve to strengthen that.
- Tonight’s announcements do little to help Oracle in other market segments.
In particular:
1. At the highest level, my view of Oracle’s strategy is the same as it’s been for several years:
Clayton Christensen’s The Innovator’s Solution teaches us that Oracle should focus on selling a thick stack of technology to its highest-end customers, and that’s exactly what Oracle does focus on.
2. Tonight’s news is closely in line with what Oracle’s Juan Loaiza told me three years ago, especially:
- Oracle thinks flash memory is the most important hardware technology of the decade, one that could lead to Oracle being “bumped off” if they don’t get it right.
- Juan believes the “bulk” of Oracle’s business will move over to Exadata-like technology over the next 5-10 years. Numbers-wise, this seems to be based more on Exadata being a platform for consolidating an enterprise’s many Oracle databases than it is on Exadata running a few Especially Big Honking Database management tasks.
3. Oracle is confusing people with its comments on multi-tenancy. I suspect:
- What Oracle is talking about when it says “multi-tenancy” is more like consolidation than true multi-tenancy.
- Probably there are a couple of true multi-tenancy features as well.
4. SaaS (Software as a Service) vendors don’t want to use Oracle, because they don’t want to pay for it.* This limits the potential impact of Oracle’s true multi-tenancy features. Even so: Read more
Hoping for true columnar storage in Oracle12c
I was asked to clarify one of my July comments on Oracle12c,
I wonder whether Oracle will finally introduce a true columnar storage option, a year behind Teradata. That would be the obvious enhancement on the data warehousing side, if they can pull it off. If they can’t, it’s a damning commentary on the core Oracle codebase.
by somebody smart who, however, seemed to have half-forgotten my post comparing (hybrid) columnar compression to (hybrid) columnar storage.
In simplest terms:
- Columnar storage and columnar compression are two different things. The main connections are:
- Columnar storage can make columnar compression more effective.
- In different ways, both technologies reduce I/O.
- EMC Greenplum, Teradata Aster, and Teradata Classic are all originally row-based systems that have gone hybrid columnar.
- Vertica is an originally column-based system that has gone hybrid columnar.
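To make the storage-versus-compression distinction concrete, here’s a minimal Python sketch with made-up data — not how any particular DBMS lays out pages, just an illustration of why keeping a column’s values together both helps compression and reduces I/O:

```python
rows = [
    ("2012-09-30", "US", 100),
    ("2012-09-30", "US", 250),
    ("2012-09-30", "DE", 75),
    ("2012-10-01", "US", 300),
]

# Row storage: each record's values are kept together.
row_store = rows

# Columnar storage: each column's values are kept together instead.
column_store = {
    "date":    [r[0] for r in rows],
    "country": [r[1] for r in rows],
    "amount":  [r[2] for r in rows],
}

def run_length_encode(values):
    """Toy run-length encoding; long runs of repeated values -- common
    within a single column -- compress very well."""
    encoded = []
    for v in values:
        if encoded and encoded[-1][0] == v:
            encoded[-1] = (v, encoded[-1][1] + 1)
        else:
            encoded.append((v, 1))
    return encoded

# Columnar storage groups similar values, so column-level compression
# such as RLE is more effective than compressing whole rows would be.
print(run_length_encode(column_store["date"]))  # [('2012-09-30', 3), ('2012-10-01', 1)]

# And a query touching only one column reads only that column -- less I/O.
total_amount = sum(column_store["amount"])
```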
Oops
Please disregard any intentions I expressed of traveling in October, in particular a trip to visit 20 or so California clients. I’m under doctor’s orders not to fly for several weeks, and also don’t feel like driving (or walking) any significant distances. Any meetings I have in the very near future will either be telephonic, or else within a few minutes’ drive of my home office in Acton, MA.
The story behind this is:
- Istanbul sidewalks have a lot of knee/shin-height metal poles to separate streets and driveways from sidewalks.
- I stumbled over one of the shin-height ones.
- I have some pretty dramatic bruising, and associated pain.
- Bruises + plane flights = risk of blood clot.
Fortunately, that’s all it is — no fracture, and the sprain per se is mild. But about 4 doctors and nurses have told me this is really unusual bruising. Nobody has offered a precise opinion as to how soon it will clear up, but I gather the good case is 2-4 weeks and the bad case is twice that.
I should have plenty of opportunity to blog.
When should analytics be in-memory?
I was asked today for rules or guidance regarding “analytical problems, situations, or techniques better suited for in-database versus in-memory processing”. There are actually two kinds of distinction to be drawn:
- Some workloads, in principle, should run on data to which there’s very fast and unfettered access — so fast and unfettered that you’d love the whole data set to be in RAM. For others, there is little such need.
- Some products, in practice, are coupled to specific in-memory data stores or to specific DBMS, even though other similar products don’t make the same storage assumptions.
Let’s focus on the first part of that — what work, in principle, should be done in memory? Read more
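As a rough illustration of that first distinction, compare a multi-pass algorithm with a single-pass aggregate — a toy Python sketch with generated data, not tied to any particular product:

```python
import random

data = [random.random() for _ in range(100_000)]  # stand-in for a large column of values

# Multi-pass, iterative work (here: a crude 1-D k-means) touches the whole
# data set over and over -- exactly the pattern that rewards keeping it in RAM.
centers = [0.25, 0.75]
for _ in range(10):                       # ten full passes over the data
    sums, counts = [0.0, 0.0], [0, 0]
    for x in data:
        i = min(range(2), key=lambda c: abs(x - centers[c]))
        sums[i] += x
        counts[i] += 1
    centers = [sums[i] / max(counts[i], 1) for i in range(2)]

# A single-pass aggregate reads each value exactly once; it gains much less
# from being memory-resident and pushes down nicely into a DBMS.
average = sum(data) / len(data)
```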
Notes on Hadoop adoption
I successfully resisted telephone consulting while on vacation, but I did do some by email. One was on the oft-recurring subject of Hadoop adoption. I think it’s OK to adapt some of that into a post.
Notes on past and current Hadoop adoption include:
- Enterprise Hadoop adoption is for experimental uses or departmental production (as opposed to serious enterprise-level production). Indeed, it’s rather tough to disambiguate those two. If an enterprise uses Hadoop to search for new insights and gets a few, is that an experiment that went well, or is it production?
- One of the core internet-business use cases for Hadoop is a many-step ETL, ELT, and data refinement pipeline, with Hadoop executing some or many of the steps (a toy sketch of the pattern follows this list). But I don’t think that’s in production at many enterprises yet, except in the usual forward-leaning sectors of financial services and (we’re all guessing) national intelligence.
- In terms of industry adoption:
- Financial services on the investment/trading side are all over Hadoop, just as they’re all over any technology. Ditto national intelligence, one thinks.
- Consumer financial services, especially credit card, are giving Hadoop a try too, for marketing and/or anti-fraud.
- I’m sure there’s some telecom usage, but I’m hearing of less than I thought I would. Perhaps this is because telcos have spent so long optimizing their data into short, structured records.
- Whatever consumer financial services firms do, retailers do too, albeit with smaller budgets.
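Here, for flavor, is a conceptual sketch of such a multi-step refinement pipeline — plain Python standing in for what would be Hadoop jobs, with invented step names, purely to show the shape of the pattern:

```python
# Each step stands in for what would be a Hadoop (e.g., MapReduce) job;
# the output of one stage feeds the next, refining raw records as it goes.
raw_log_lines = [
    "2012-10-01|user42|/pricing|200",
    "2012-10-01|user42|/signup|200",
    "garbage line",
]

def parse(lines):                      # step 1: extract structure
    for line in lines:
        parts = line.split("|")
        if len(parts) == 4:
            yield {"date": parts[0], "user": parts[1],
                   "page": parts[2], "status": int(parts[3])}

def cleanse(records):                  # step 2: filter out bad records
    for r in records:
        if r["status"] == 200:
            yield r

def aggregate(records):                # step 3: summarize for a warehouse load
    counts = {}
    for r in records:
        key = (r["date"], r["page"])
        counts[key] = counts.get(key, 0) + 1
    return counts

summary = aggregate(cleanse(parse(raw_log_lines)))
print(summary)   # {('2012-10-01', '/pricing'): 1, ('2012-10-01', '/signup'): 1}
```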
Thoughts on how Hadoop adoption will look going forward include: Read more
Database challenges in multi-tenancy support
I predicted 2 months ago that Oracle 12c would have some kind of improved support for multi-tenancy; Larry Ellison confirmed on this week’s earnings call that it will. So maybe it’s time to think about what such support could or should mean. I’m actually still on vacation, so I’d like to keep this short, but here are a few notes.
- The goal of multi-tenancy is:
- SaaS (Software as a Service) users should get all the flexibility, performance, security, and control they would expect if their SaaS vendor hosted a software instance and database just for them.
- SaaS vendors shouldn’t have to do any more than host a single instance of the application software and a single database.
- In its purest form, that goal is a nice dream.
- Separation-of-access and related security issues are the most obvious requirement of multi-tenancy. However, the simplest ways to meet the requirement stress your SELECT statements (see the sketch after this list). I alluded to that in a post about salesforce.com.
- In a clustered, multi-tenant SaaS database, you want each tenant’s individual database to be properly clustered. Perhaps you want it all on one server. Perhaps you want it striped across the cluster. In any case, your DBMS’ clustering has to be flexible and granular enough to make that possible.
- Caching should also be as good for each tenant as if that tenant had a standalone database.
- Individual tenants need to be able to administer their databases, at least in certain ways, as if they were standalone, or else the SaaS vendor needs to be able to do it for them.
- This is implicit in what I said above about users/roles/permissions, clustering, and caching.
- It really gets interesting when we take into account application customization and the resulting schema changes.
- And before we get too excited about any of this, please note that there are many SaaS vendors in the world doing just fine without explicit DBMS multi-tenancy features.
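To illustrate the earlier point about SELECT statements — a minimal sketch, with hypothetical table and column names, of how a shared-schema design touches every query:

```python
# In a shared-schema design, every table carries a tenant key, and the
# application (or a DBMS feature such as row-level security) must add a
# tenant predicate to every single query.

def scoped_select(table, columns, tenant_id, extra_where=None):
    """Build a SELECT that is always filtered to one tenant.
    (Illustration only; real code would use bind variables.)"""
    where = f"tenant_id = '{tenant_id}'"
    if extra_where:
        where += f" AND ({extra_where})"
    return f"SELECT {', '.join(columns)} FROM {table} WHERE {where}"

print(scoped_select("orders", ["order_id", "total"], "acme",
                    "order_date >= DATE '2012-01-01'"))
# SELECT order_id, total FROM orders WHERE tenant_id = 'acme'
#   AND (order_date >= DATE '2012-01-01')
```

Miss that predicate once and one tenant can see another’s data; meanwhile the extra predicate, and the indexing needed to support it, is overhead on every query — which is the stress in question.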
Integrated internet system design
What are the central challenges in internet system design? We probably all have similar lists, comprising issues such as scale, scale-out, throughput, availability, security, programming ease, UI, or general cost-effectiveness. Screw those up, and you don’t have an internet business.
Much new technology addresses those challenges, with considerable success. But the success is usually one silo at a time — a short-request application here, an analytic database there. When it comes to integration, unsolved problems abound.
The top integration and integration-like challenges for me, from a practical standpoint, are:
- Integrating silos — a decades-old problem still with us in a big way.
- Dynamic schemas with joins.
- Low-latency business intelligence.
- Human real-time personalization.
Other concerns that get mentioned include:
- Geographical distribution of data, which privacy laws make a major compliance requirement for some users.
- Logical data warehouse, a term that doesn’t actually mean anything real.
- In-memory data grids, which some day may no longer always be hand-coupled to the application and data stacks they accelerate.
Let’s skip those latter issues for now, focusing instead on the first four.
Uninterrupted DBMS operation — an almost-achievable goal
I’m hearing more and more stories about uninterrupted DBMS operation. There are no iron-clad assurances of zero downtime; if nothing else, you could crash your whole system yourself via some kind of application bug. Even so, it’s a worthy ideal, and near-zero downtime is a practical goal.
Uninterrupted database operations can have a lot of different aspects. The two most basic are probably:
- High availability/fail-over. If a system goes down, another one in the same data center is operational almost immediately.
- Disaster recovery. Same story, but not in the same data center, and hence not quite as immediate.
These work with single-server or scale-out systems alike. However, scale-out and the replication commonly associated with it raise additional issues in continuous database operation:
- Eventual consistency. Scale-out and replication create multiple potential new points of failure, server and network alike. Eventual consistency ensures that a single such failure doesn’t take any part of the database down.
- The use of replicas to avoid planned downtime. If you do rolling maintenance, then you can keep a set of servers with the full database up at all times.
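To sketch the rolling-maintenance idea — generic Python pseudologic with invented helper names, not any particular DBMS’s tooling:

```python
# Upgrade one replica at a time; as long as the remaining replicas hold a
# full copy of the database, the system as a whole never goes offline.

replicas = ["node1", "node2", "node3"]

def drain(node):                  # stop routing new work to this node
    print(f"draining {node}")

def upgrade(node):                # apply the patch/maintenance
    print(f"upgrading {node}")

def rejoin_and_catch_up(node):    # re-add the node and let replication catch it up
    print(f"{node} rejoining and catching up")

def rolling_maintenance(nodes):
    for node in nodes:
        drain(node)               # traffic fails over to the other replicas
        upgrade(node)
        rejoin_and_catch_up(node) # wait until it's current before touching the next one

rolling_maintenance(replicas)
```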
Finally, if you really care about uninterrupted operation, you might also want to examine:
- Administrative tools and utilities. The better your tools, the better your chances of keeping your system up. That applies to anything from administrative dashboards to parallel backup functionality.
- Fencing of in-database analytic processes. If you’re going to do in-database analytics, fenced/out-of-process ones are a lot safer than the alternative.
- Online schema changes. If you change a schema in a relational DBMS, that doesn’t necessarily entail taking the database offline.
Let’s discuss some of those points below.
Aerospike, the former Citrusleaf
My new clients at Aerospike have a range of minor news to announce:
- A company and product name change (they used to be Citrusleaf).
- Some new people and funding.
- In association with an acqui-hire — of AlchemyDB guy Russ Sullivan — some unspecified future technical plans.
- A community edition (Aerospike, née Citrusleaf, is closed-source).
Mainly, however, they want to call your attention to the fact that they’ve been selling a fast, reliable key-value store, with a number of production references, and want to suggest that other organizations should perhaps buy it as well.
Generally, the Aerospike product story is as I described in two posts last year. At the highest level:
- Aerospike has a key-value data model.
- Secondary indexes and so on are still futures.
- Aerospike is clustered, of course.
- Two hardware/storage choices are encouraged:
- Spinning disk for persistence, but with all your data also kept in RAM.
- Solid-state disk.
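For flavor, here’s what basic use of such a key-value store looks like — a sketch that assumes the later Aerospike Python client (which isn’t part of the news above), so treat the exact API as illustrative:

```python
import aerospike  # assumes the Aerospike Python client is installed

# Connect to a cluster node; keys are (namespace, set, user key),
# and values are a dict of named "bins".
config = {"hosts": [("127.0.0.1", 3000)]}
client = aerospike.client(config).connect()

key = ("test", "users", "user42")
client.put(key, {"name": "Alice", "visits": 1})

(key, metadata, bins) = client.get(key)
print(bins)          # {'name': 'Alice', 'visits': 1}

client.close()
```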
Aerospike’s three core marketing claims are performance, consistent performance, and uninterrupted operation.
- Aerospike’s performance claims are supported by a variety of blazing internal benchmarks.
- Aerospike’s consistent performance claims are along the lines of sub-millisecond latency, with 99.9% of responses being within 5 milliseconds, and even a node outage only borking performance for some 10s of milliseconds.
- Uninterrupted operation is a core Aerospike design goal, and the company says that to date, no Aerospike production cluster has ever gone down.
Aerospike technical details start with the expected: Read more