An enterprise data warehouse should:
- Manage data to high standards of accuracy, consistency, cleanliness, clarity, and security.
- Manage all the data in your organization.
There’s little to dislike in the enterprise data warehouse dream, as represented (for example) in this 2004 Teradata Magazine article. But in a world where ever more data comes in from ever more sources – and is needed ever faster – it simply isn’t realistic to expect that all an enterprise’s data will be vetted, organized, and managed to the highest of standards.
This is a core premise of Greenplum’s Enterprise Data Cloud (EDC)/Chorus marketing initiative, and in that respect Greenplum is correct.
If the EDW is a great idea that can never be 100% implemented, what should you do? At conventional enterprises, the answer is pretty obvious: Manage some of your data to enterprise data warehouse standards, but not all of it. Specifically, your highest-value data should be in something that looks like a classic enterprise data warehouse, and your lower-value data shouldn’t.
Of course, if you’re a data mart outsourcer or other analytic service provider, whose data is about your customers’ businesses rather than your own, and whose business is managing your customers’ data, this may not apply to you. But otherwise it’s a position with many supporting arguments, including:
- Financial reporting, compliance, and other legitimate concerns introduce rigidity into data models. This increases the cost and reduces the speed of getting data into enterprise data warehouses.
- Data governance procedures imposed for any other business purpose have the same effect. What’s deemed necessary for enterprise data warehouses can be fatal to timely analytics.
- The highest-value data typically comes from transactional systems, such as order entry or sales contact management. So it starts out with a degree of governance that, say, web log files may never enjoy.
- In some enterprises, it is affordable or even cost-effective to manage your highest-value data in your favorite big-brand DBMS, but necessary to manage most of your data in something with lower TCO (Total Cost of Ownership). Big-brand OLTP DBMS are often better (or at least less bad) at managing enterprise data warehouses than they are at running data mart workloads.
- At certain enterprise and database sizes, it may indeed make sense to run what amounts to an enterprise data warehouse out of the same database instance that does OLTP, while putting larger data sets into more cost-effective data marts. A trend to “operational BI” may actually make that option more appealing going forward than it has been in the past.
- And finally, there’s the empirical fact that not one really large enterprise on the whole planet has a true, perfectly comprehensive enterprise data warehouse. At least, I’ve never heard of one.
- Even Teradata doesn’t push an EDW-only strategy any more.
- When Greenplum first started pushing the EDC idea, I agreed that something like it would be the future of data marts.