Perhaps we should remind ourselves of the many ways data models can be caused to churn. Here are some examples that are top-of-mind for me. They do overlap a lot, and the whole discussion overlaps with my post about schema complexity last January, and more generally with what I’ve written about dynamic schemas for the past several years.
Just to confuse things further — some of these examples show the importance of RDBMS, while others highlight the relational model’s limitations.
The old standbys
Product and service changes. Simple changes to your product line may not require any changes to the databases recording their production and sale. More complex product changes, however, probably will.
A big help in MCI’s rise in the 1980s was its new Friends and Family service offering. AT&T couldn’t respond quickly, because it couldn’t get the programming done, where by “programming” I mainly mean database integration and design. If all that was before your time, this link seems like a fairly contemporaneous case study.
Organizational changes. A common source of hassle, especially around databases that support business intelligence or planning/budgeting, is organizational change. Kalido’s whole business was based on accommodating that, last I checked, as were a lot of BI consultants’.
That ability was also the most noteworthy feature of PeopleSoft’s application development technology, back in the 1990s, at least the way I remember Rick Bergquist explaining PeopleTools to me.
Mergers & acquisitions. Obviously, accommodating a business combination has a huge effect on data management, especially if you follow the usual path of starting with separate legacy systems and combining them where possible over time. And it plays merry hell with the trend-tracking parts of your accounting and BI systems.
Application replacement. Replace your third-party apps, for whatever reason, and you almost surely get a new database structure too. The same, of course, goes when you deploy entirely new apps. And when things get either more integrated (e.g. by replacing silos with an application suite) or less so (e.g. by introducing selective SaaS apps), special fun ensues.
Refactoring and MDM. There are numerous ways it can make sense to refactor your apps, including your custom/in-house ones. One important reason among many is to increase your adoption of master data management.
The new stuff
Marketing campaign data. Marketers are full of creative ideas, many of which involve generating responses or other data about targets. As I’ve noted before, this data can come in a variety of structures.
Further confusing matters:
- For legitimate business-agility reasons, marketers don’t have a lot of patience about getting those databases up and running.
- But siloing everything isn’t good; there are many reasons to combine data from different sources.
Social data. In particular, marketers like “social data”, whether through direct interaction with consumers, or from scraping online discussions. That includes a lot of text data, and representing text data in ways that work well with analytic tools is a never-ending battle.
Third-party data. Enterprises are making ever more use of data supplied by third parties. That data typically shows up whenever the customer chooses to pay for it, in whatever form the data vendor chooses to supply.
Internet log data. Website logs are a mess, and the same goes for many mobile-app equivalents. Part of the reason is nested data structures. But even leaving those aside, it’s a best practice to extract and directly store different fields at different points in time.
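To make the nested-structure point concrete, here is a minimal Python sketch. It assumes the raw log lines are kept as JSON; the log line, its field names, and the helpers `flatten` and `extract` are all hypothetical, invented for illustration. The same stored line yields a narrow row at first and a wider one later, once more fields are judged worth extracting.

```python
import json


def flatten(record, prefix=""):
    """Flatten nested dicts into one dict keyed by dotted paths."""
    flat = {}
    for key, value in record.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{path}."))
        else:
            flat[path] = value
    return flat


def extract(raw_line, fields):
    """Parse one JSON log line and keep only the requested dotted fields.

    Fields absent from the line come back as None.
    """
    flat = flatten(json.loads(raw_line))
    return {field: flat.get(field) for field in fields}


# Hypothetical log line; every field name here is an assumption.
RAW = (
    '{"ts": "2013-08-01T12:00:00Z", '
    '"request": {"path": "/pricing", "referrer": {"host": "example.com"}}, '
    '"user": {"id": 42}}'
)

# Fields chosen when the log was first loaded ...
row_v1 = extract(RAW, ["ts", "request.path"])

# ... and a wider set chosen later, from the same stored raw line.
row_v2 = extract(RAW, ["ts", "request.path", "request.referrer.host", "user.id"])
```

The design choice that matters is keeping the raw line around. Each change of mind about which fields to extract then changes the shape of the extracted rows, not the data collected.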
Machine-generated data. Besides the points I’ve already noted about log data, there are other issues with machine-generated data. In particular:
- There’s a lot of it. Moore’s Law teaches us that some sensors will always be able to throw off more data than it will be affordable to store. Hence, selections will have to be made as to what constitutes a signal, or is even worth storing. As choices change, data structures are apt to change as well.
- There are many different kinds of it. If you’re taking data from all the machines in a factory, or all the major parts of an automobile, then new sources or aspects of data will be introduced frequently, as engineers find new ways to use ever more affordable chips. And when the mobile devices are powerful multipurpose computers, such as smartphones, any one model can keep changing its mind as to what data it sends.
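The "new sources or aspects of data" problem can be sketched in a few lines of Python. This is only an illustration under assumed inputs: the device name, the sensor fields, and the helper `new_fields` are hypothetical. It shows one simple way to notice that a device has started sending a field the data model has not seen before.

```python
def new_fields(known_fields, reading):
    """Return fields in `reading` not seen before, and remember them.

    `known_fields` is a set that is updated in place.
    """
    added = sorted(set(reading) - known_fields)
    known_fields.update(added)
    return added


# Hypothetical readings from one machine; the second one adds a field.
readings = [
    {"device": "press-7", "temp_c": 71.5},
    {"device": "press-7", "temp_c": 72.0, "vibration_hz": 118.0},
    {"device": "press-7", "temp_c": 71.8, "vibration_hz": 120.5},
]

known = set()
drift = [new_fields(known, reading) for reading in readings]
```

Each non-empty entry in `drift` marks a point where downstream structures would have to change or the new field would be dropped.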
Derived data. As some of the cases cited above illustrate, derived data leads to schema change, whatever its reason for existing.
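In the simplest relational case, a derived value becomes a new column. The sketch below uses Python's standard `sqlite3` module with an in-memory database. The `orders` table, its columns, and the derived `order_total` are hypothetical, and real derived data (scores, aggregates, extracted entities) is usually more elaborate than a product of two columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, quantity INTEGER, unit_price REAL)"
)
conn.executemany(
    "INSERT INTO orders (quantity, unit_price) VALUES (?, ?)",
    [(2, 9.5), (10, 1.25)],
)


def columns(connection, table):
    """List column names via PRAGMA table_info (the name is at index 1)."""
    return [row[1] for row in connection.execute(f"PRAGMA table_info({table})")]


before = columns(conn, "orders")

# Storing the derived value means changing the schema, then backfilling.
conn.execute("ALTER TABLE orders ADD COLUMN order_total REAL")
conn.execute("UPDATE orders SET order_total = quantity * unit_price")

after = columns(conn, "orders")
totals = [row[0] for row in conn.execute("SELECT order_total FROM orders ORDER BY id")]
```

Computing the value at query time would avoid the `ALTER TABLE`, at the cost of recomputing it on every read. The schema change appears once the derived value is stored.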
Bottom line: Any data-related business processes you have — for example data governance — should assume that your data models will be in perpetual, rapid flux.
Related links
- In my recent post on the refactoring of everything, I focused on larger/coarser-grained systems. But the data model churn discussed in this post is yet another example.
- In July, 2011, I took a somewhat calmer view of remote machine-generated data than I would now.